Yet another yacc question
Date: -- (:)
From: David Allsopp <dra-news@m...>
Subject: Yet another yacc question
Suppose I have the context-free grammar

P -> A | A B | A B C | A C

In ocamlyacc, I code this up as:

%token A B C
%type <unit> parse
%start parse
%%
parse:
    A     {()}
  | A B   {()}
  | A B C {()}
  | A C   {()}
;

And set up a lexer (called lexer) so that the characters 'A', 'B' and 'C'
produce the tokens A, B and C. Then I write the following function:

let f s = parse lexer (Lexing.from_string s)

And use it a few times...

f "ABCZ"   ...   gives ()
f "ACZ"    ...   gives ()
f "AA"     ...   raises Parsing.Parse_error

The third case fails because "AA" is not in the grammar. However, the first
two work even though "ABCZ" and "ACZ" are also not in the grammar (and Z
isn't even a token!). They work because ocamlyacc doesn't need look-ahead
after the "C" in each case to determine that it can reduce to the entry
non-terminal and so return (). In the third case, look-ahead is required -
it looks ahead, sees an A and so fails.

I would quite like the third to match as well, ignoring the second A and
leaving it in the buffer ready for a future parse (so "peek-ahead" rather
than "look-ahead", I guess). I think I'm probably right in assuming that
ocamlyacc can't do this. I'm not willing to alter my parser to return a
list of tokens, which as far as I can see is the only way to make ocamlyacc
do this correctly, i.e.

parse:
    token parse {$1::$2}
  | EOF         {[]}
;
token:
    /* as for parse in the previous grammar */
;

(Incidentally, lest anyone have it confirmed that I'm mad, I'm trying to
parse batches of SQL statements, so I have no obvious terminating token for
a clause. The parser needs to do the longest possible match and ignore
anything following that would otherwise appear to be a syntax error.)
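
For illustration only, the "peek-ahead" behaviour described above can be
sketched by hand in plain OCaml. This is not ocamlyacc output and says
nothing about what menhir or dypgen generate. The token type, the source
record and the names of_string, peek and junk are all assumptions made up
for the example. The parser takes the longest match of P and leaves the
first token that cannot extend it unread, ready for the next call.

```ocaml
(* Hypothetical token type mirroring the grammar's terminals. Other covers
   characters such as 'Z' that are not tokens of the grammar. *)
type token = A | B | C | Other of char | EOF

(* Hypothetical token source with one-token peek: peek inspects the next
   token without consuming it, junk consumes it. *)
type source = { text : string; mutable pos : int }

let of_string s = { text = s; pos = 0 }

let peek src =
  if src.pos >= String.length src.text then EOF
  else
    match src.text.[src.pos] with
    | 'A' -> A
    | 'B' -> B
    | 'C' -> C
    | c -> Other c

let junk src =
  if src.pos < String.length src.text then src.pos <- src.pos + 1

exception Parse_error

(* P -> A | A B | A B C | A C, longest match. A token that cannot extend
   the match is only peeked at, never consumed and never an error. *)
let parse src =
  match peek src with
  | A ->
    junk src;
    (match peek src with
     | B -> junk src; if peek src = C then junk src
     | C -> junk src
     | _ -> ())
  | _ -> raise Parse_error

let () =
  let src = of_string "AA" in
  parse src;                  (* matches the first A, leaves the second *)
  parse src;                  (* matches the second A *)
  assert (peek src = EOF)
```

With this sketch, parsing "ABCZ" or "ACZ" stops just before the Z, and two
successive calls on "AA" each succeed. The cost is that the lookahead
decision is written out by hand rather than derived from the grammar.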

So my question: can menhir, dypgen or any of the other parser generators out
there do this - i.e. return one () on the first call and then another () on
the second with the string "AA"? It would finally be a reason for abandoning
ocamlyacc :o)

Thanks! (in hope that I haven't missed something blindingly obvious...)