Date: 2007-04-28 (10:32)
From: skaller <skaller@u...>
Subject: menhir
Just a note: I just built the Felix parser with Menhir.
First, it detected some duplicate definitions Ocamlyacc didn't! Good!

Second, I got "rather a lot" of states with end-of-stream conflicts.
What's that about?

Third, the generated ml file was 4.5 Meg.
Ocamlopt on amd64 hung for so long I almost posted a bug report
for Ocamlopt, but finally it finished. This was a 95% CPU, 25% memory
job, so no paging. I'd guess it took 100 times longer than
compilation of the ocamlyacc file (which is just a bunch of numbers :)
I didn't measure it .. no biggie for me now I know, but my
box is a LOT faster than some of the boxes my product gets built on.

After that, Felix built ok, and the parser worked for
'pure' code. However it failed when Felix preprocessor
syntax extensions were used (which is 90% of all programs).

Now, most of the system data transport for this is properly
built so it can't cause any problems. The one thing which 
is hacked is the pushback detection.

Basically: when Ocamlyacc reduces a production, it sometimes
ends on the last token, and sometimes it overshoots by 1.

My grammar uses a system like:

  expr expr_terminator { $1,$2 }

  | statement_aster statements_terminator { $1, $2 }

This is saying: a 'special expression' exprx is an expression
PLUS one of the tokens which will solidly terminate an 
expression AND NOT ITSELF BE OVERSHOT by the parser.

In other words when exprx is parsed the reduction must
leave the next token unread.

The semantics used are: when the exprx is processed the
action arranges to push the terminator token back
into the token stream.
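
A minimal sketch of this kind of pushback, assuming a one-slot buffer
sitting between the lexer and the parser. The token type and the names
make, next, push_back and of_list are hypothetical, made up for
illustration; they are not the actual Felix code. Note the sketch only
behaves as intended if the parser has not already fetched a token beyond
the terminator when the action runs, which is exactly the assumption
in question here.

```ocaml
(* Hypothetical token type; the real Felix tokens are not shown here. *)
type token = INT of int | SEMI | EOF

(* A token source with a one-slot pushback buffer. *)
type stream = {
  next_raw : unit -> token;       (* underlying lexer (assumed) *)
  mutable pushed : token option;  (* token handed back by a semantic action *)
}

let make next_raw = { next_raw; pushed = None }

(* Called by the parser to obtain the next token:
   a pushed-back token, if any, is delivered first. *)
let next s =
  match s.pushed with
  | Some t -> s.pushed <- None; t
  | None -> s.next_raw ()

(* Called by a semantic action to return the terminator to the stream. *)
let push_back s t =
  assert (s.pushed = None);  (* only one token of pushback is supported *)
  s.pushed <- Some t

(* Build a raw token source from a list; yields EOF once exhausted. *)
let of_list toks =
  let rest = ref toks in
  fun () ->
    match !rest with
    | [] -> EOF
    | t :: tl -> rest := tl; t
```

With ocamlyacc or Menhir's traditional interface, the parser would be
given something like (fun _lexbuf -> next s) as its lexer function, and
the action for exprx would call push_back s on its second component.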

Perhaps because Menhir is LR(1) not LALR(1), this technique
is failing. Or Menhir may simply be looking ahead further
than required in the token stream.

Whichever way, I am depending on this particular implementation
detail of Ocamlyacc, and Menhir is using a different implementation.

Any suggestions how to 'fix' this?

John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: