Version française
Home     About     Download     Resources     Contact us    
Browse thread
yacc question
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: skaller <skaller@u...>
Subject: Re: [Caml-list] yacc question
On Sun, 2005-12-04 at 09:25 -0600, Brian Hurt wrote:
> 
> On Mon, 5 Dec 2005, skaller wrote:
> 
> > I have the 'usual' kind of parser for expressions, with two
> > nonterminals 'expr' and 'atom', the latter including ( expr )
> > and INTEGER of course.
> >
> > When I have input like
> >
> > 1;
> > 1 + 2 ;
> > (1 + 2) ;
> >
> > none of the case parse as expressions, the first and
> > last do parse as atoms (leaving the ; in the buffer).
> >
> > What I want is to parse the longest possible match.
> > The only way I can think of doing this is:
> >
> > top_expr:
> >  | expr token_not_allowed_in_expressions
> >
> > and then 'put back' the trailing token into the buffer.
> > Is there another way?
> >
> 
> The standard way to implement this is:
> 
> statement:
>  	expression SEMICOLON
> 
> expression:
>  	| mul_expression
>  	| expression PLUS mul_expression
>  	| expression MINUS mul_expression
> 
> mul_expression:
>  	atom_expression
>  	| mul_expression TIMES atom_expression
>  	| mul_expression DIVIDE atom_expression
>  	| mul_expression MODULO atom_expression
> 
> atom_expression:
>  	ATOM
>  	| OPEN_PAREN expression CLOSE_PAREN
> 
> Notice that precedence is taken care of immediately- 3 + 4 * 5 is parsed 
> as 3 + (4 * 5).  Also notice that the semicolon is what terminates us- we 
> keep collecting up an expression until we see a semicolon- only then do we 
> complete the statement.

Expressions cannot be statements, but I have instead:

statement:
	NAME LEFT_ARROW expression SEMICOLON

for example, so basically your example holds with a modification.

The problem is I want to parse a *user defined statement*.
This will be done by a preprocessor definition, something like:

#keyword within
#statement select expr within statement ;

This makes 'select' and 'within' keywords, when ocamlyacc
sees the user defined statement keyword 'select', it looks
up a table and finds the grammar production. I then parse
by recursive descent -- backtracking is cool in an FPL,
my tokens are a list, so I can backtrack easily.

When interpreting the grammar, if I see the token 'expr'
then I parse an expression using the ocamlyacc entry point.
Similarly for statements -- in particular, since the 
user defined statements can be 'added' into the statement
grammar, recursion in the 'within' clause will also include
the new statements (the recursion is covariant?)

There is no problem with statements .. but for expressions,
I cannot just terminate on SEMICOLON -- I have to terminate
on many symbols, which can't be part of an expression --
and leave them in the token buffer too. In the example
grammar, 'within' terminates the expression.

The problem of course is that Ocamlyacc doesn't know about
the new grammar .. but I'm still using it to parse expressions.

Mixing LALR1 and an RD parser seems messy .. but I see
no other way to obtain high performance for 'the usual case'
whilst still allowing extensibility -- Ocaml doesn't come
with an LL parser generator.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net