[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | David Allsopp <dra-news@m...> |
| Subject: | Yet another yacc question |
Suppose I have the context-free grammar
P -> A | A B | A B C | A C
In ocamlyacc, I code this up as:
%token A B C
%type <unit> parse
%start parse
%%
parse:
A {()}
| A B {()}
| A B C {()}
| A C {()}
;
And set-up a lexer (called lexer) so that the characters 'A', 'B' and 'C'
produce the tokens A, B and C. Then I write the following function:
let f s = parse lexer (Lexing.from_string s)
And use it a few times...
f "ABCZ" ... gives ()
f "ACZ" ... gives ()
f "AA" ... raises Parsing.Parse_error
The third case fails because "AA" is not in the grammar. However, the first
two work even though "ABCZ" and "ACZ" are also not in the grammar (and Z
isn't even a token!). They work because ocamlyacc doesn't need look-ahead
after the "C" in each case to determine that it can reduce to the entry
non-terminal and so return (). In the third case, look-ahead is required -
it looks ahead, sees an A and so fails.
I would quite like the third to match as well and ignore the second A
(ignore and leave on the buffer ready for a future parse... so "peek-ahead"
rather than "look-ahead", I guess). I think I'm probably right in assuming
that ocamlyacc can't do this. I'm not willing to alter my parser to return a
list of tokens which as far as I can see is the only way to make ocamlyacc
do this correctly - i.e.
parse:
token parse {$1::$2}
| EOF {[]}
;
token:
/as for parse in the previous grammar/
(Incidentally, lest anyone have it confirmed that I'm mad, I'm trying to
parse batches of SQL statements so have no obvious terminating token for a
clause - the parser needs to do a longest possible match ignoring anything
else following that would appear to be a syntax error)
So my question: can menhir, dypgen or any of the other parser generators out
there do this - i.e. return one () on the first call and then another () on
the second with the string "AA"? It would finally be a reason for abandoning
ocamlyacc :o)
Thanks! (in hope that I haven't missed something blindingly obvious...)
David