Version française
Home     About     Download     Resources     Contact us    
Browse thread
[ANNOUNCE] Alpha release of Menhir, an LR(1) parser generator for ocaml
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: skaller <skaller@u...>
Subject: Re: [Caml-list] [ANNOUNCE] Alpha release of Menhir, an LR(1) parser generator for ocaml
On Wed, 2005-12-14 at 11:27 +0100, Alessandro Baretta wrote:
> Francois Pottier wrote:
> 
> > However, Menhir does not produce tables; it
> > compiles the automaton down to a bunch of mutually recursive
> > functions. We have not yet scientifically assessed the
> > difference in size between tables and code, but a few
> > quick experiments indicate that it is reasonable (the
> > code is larger than the tables by a factor of two or
> > three).
> > 
> 
> In general, I like the approach, as it can easily scale to an extensible parser 
> generator by late-binding the recursion through a record/array/table of 
> functions. Have you thought about this at all?

Actually the Felix parser does this right now -- and that is
why I'm asking for the parser functions to allow a parameter:
to pass in the data structure. I actually have simplistic
recursive descent interpreter which is called by ocamlyacc,
and which calls back into ocamlyacc: at present this
allows user defined statements to be added to the Felix
language. It is very crude, since there is no way
to properly type user non-terminals other than statements,
and it only allows extension of statements.

Interestingly the 'end of stream' issue discussed at length
in the Menhir manual then applies also to the points
of recursion. For a statement in Felix, there is no lookahead
at the end (they end with a semicolon). For expressions,
I have to read one token too many, keep a list of all
possible tokens that can terminate an expression,
introduce an 'augmented expression' which is an expression
followed by any of these tokens, and then in the action
code 'put back' the token into the token stream.

It would be really nice if I could write the
production as:

exprx: expr ~expr

where ~expr means 'any token which forces a reduction
of the whole expression' if you know what I mean.
I have to assemble this list by hand at the moment.
I did it by starting with a list of all the tokens,
and removing the ones which seemed to cause a conflict.
I will certainly forget to upgrade this list when I add
new tokens -- hence a desire for something like ~expr
which tells the parser to do it :)

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net