English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    

Ce site est rarement mis à jour. Pour les informations les plus récentes, rendez-vous sur le nouveau site OCaml à l'adresse ocaml.org.

Browse thread
Yet another yacc question
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2007-05-24 (13:24)
From: Francois Pottier <Francois.Pottier@i...>
Subject: Re: [Caml-list] Yet another yacc question


On Wed, May 23, 2007 at 04:09:58PM +0100, David Allsopp wrote:

> The third case fails because "AA" is not in the grammar. However, the first
> two work even though "ABCZ" and "ACZ" are also not in the grammar (and Z
> isn't even a token!). They work because ocamlyacc doesn't need look-ahead
> after the "C" in each case to determine that it can reduce to the entry
> non-terminal and so return (). In the third case, look-ahead is required -
> it looks ahead, sees an A and so fails.

Your analysis is correct.

> I would quite like the third to match as well and ignore the second A
> (ignore and leave on the buffer ready for a future parse... so "peek-ahead"
> rather than "look-ahead", I guess).

It's hard to ignore the second A once one has requested it... The lexer
interface does not have a "peek" or "put-back" operation, and LR parsers
aren't designed to support these operations anyway.

> So my question: can menhir, dypgen or any of the other parser generators out
> there do this - i.e. return one () on the first call and then another () on
> the second with the string "AA"? It would finally be a reason for abandoning
> ocamlyacc :o)

In theory, menhir adopts the same semantics as ocamlyacc, but it will attempt
to help you by providing more warnings -- here, it will complain that your
grammar has an end-of-stream conflict.

> (Incidentally, lest anyone have it confirmed that I'm mad, I'm trying to
> parse batches of SQL statements so have no obvious terminating token for a
> clause - the parser needs to do a longest possible match ignoring anything
> else following that would appear to be a syntax error)

If you want to parse sequences of statements without clear separators, it
seems to me that the best approach would be to invoke the parser just once for
an entire sequence, instead of trying to invoke it once per statement and have
it leave the token stream in a consistent state. This approach would require
an inversion of control (the parser would drive the rest of your code, instead
of the code invoking the parser). If you can accept that, you should be fine.

Hope this helps,

François Pottier