[Caml-list] ocamllex/yacc and camlp4
Date: 2004-06-17 (09:36)
From: skaller <skaller@u...>
Subject: Re: [Caml-list] ocamllex/yacc and camlp4
On Thu, 2004-06-17 at 08:48, Pierre Weis wrote:

>  1) ocamllex and ocamlyacc implementation technologies are damned fast
> and it is difficult to compete with them using streams.

They're not so fast when your problem exceeds the constraints
which determine what they're good at. All my lexers generate
an in memory token list for this reason.
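As a rough illustration of the "in memory token list" approach, here is a minimal hand-written sketch. The token type and the names `tokenize`, `is_digit` and `is_alpha` are invented for this example and are not taken from Felix or from any ocamllex output; a real lexer would of course handle far more.

```ocaml
(* Hypothetical token type, for illustration only. *)
type token =
  | INT of int
  | IDENT of string
  | PLUS
  | EOF

let is_digit c = c >= '0' && c <= '9'
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'

(* Scan the whole string up front and return every token as a list,
   terminated by EOF. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  (* First index >= i at which predicate [p] stops holding. *)
  let rec span p i = if i < n && p s.[i] then span p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev (EOF :: acc)
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '+' -> go (i + 1) (PLUS :: acc)
      | c when is_digit c ->
        let j = span is_digit i in
        go j (INT (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_alpha c ->
        let j = span (fun c -> is_alpha c || is_digit c) i in
        go j (IDENT (String.sub s i (j - i)) :: acc)
      | c ->
        failwith (Printf.sprintf "unexpected character %C at offset %d" c i)
  in
  go 0 []
```

Once the tokens are an ordinary list, later passes can inspect, filter or rewrite them freely before any parser sees them.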

> Last but not least, the actual ocamllex/ocamlyacc implementations work
> pretty well, so that there is no clear necessity to rewrite them.

I think there is: they haven't worked so well for any of the
parsers I've had to write -- not even the Felix parser which
is specifically designed to be unambiguous LALR(1)
and Ocamlyaccable.

The parser can't take a state argument, it can't accept
a token type, the generated interface can't be added to
by the client which is necessary when you need to define
a function callable by the lexer and parser which depends on the
type of a token, you can't use a meta-grammar notation with
the obvious interpretation (a* makes a list, a? an option).
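To make the "obvious interpretation" concrete: the sketch below uses plain parser combinators over a token list, which is a different technique from an LALR(1) generator and is shown only to illustrate the intended result types. The names `rule`, `satisfy`, `opt` and `star` are hypothetical.

```ocaml
(* A rule consumes a prefix of the token list and returns a value plus
   the remaining tokens, or None on failure. *)
type ('tok, 'a) rule = 'tok list -> ('a * 'tok list) option

(* Accept one token for which [f] returns Some value. *)
let satisfy (f : 'tok -> 'a option) : ('tok, 'a) rule = function
  | t :: rest -> (match f t with Some v -> Some (v, rest) | None -> None)
  | [] -> None

(* a? : zero or one occurrence, yielding an option. Never fails. *)
let opt (p : ('tok, 'a) rule) : ('tok, 'a option) rule = fun toks ->
  match p toks with
  | Some (v, rest) -> Some (Some v, rest)
  | None -> Some (None, toks)

(* a* : zero or more occurrences, yielding a list. Never fails. *)
let star (p : ('tok, 'a) rule) : ('tok, 'a list) rule = fun toks ->
  let rec loop acc toks =
    match p toks with
    (* Stop if [p] succeeded without consuming input, to avoid looping. *)
    | Some (v, rest) when rest != toks -> loop (v :: acc) rest
    | _ -> Some (List.rev acc, toks)
  in
  loop [] toks
```

With these, `star p` has a list-valued result and `opt p` an option-valued one, with no hand-written recursive grammar rules needed to build them.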

Considerable effort is required to decouple the
faulty interface which makes the parser depend on
a lexbuf.
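One common way to do that decoupling, sketched here under assumptions: an ocamlyacc entry point has type `(Lexing.lexbuf -> token) -> Lexing.lexbuf -> result`, so it can be handed a closure that ignores the lexbuf and pops tokens from a list, together with a dummy lexbuf. The name `lexer_of_list` and the `Parser.main` entry point mentioned in the comment are hypothetical.

```ocaml
(* Turn a token list into a function with the shape ocamlyacc expects.
   The lexbuf argument is ignored; [eof] is returned once the list is
   exhausted. *)
let lexer_of_list (eof : 'tok) (toks : 'tok list) : Lexing.lexbuf -> 'tok =
  let rest = ref toks in
  fun _lexbuf ->
    match !rest with
    | [] -> eof
    | t :: ts ->
      rest := ts;
      t

(* The parser still insists on a lexbuf, so give it an empty one. *)
let dummy_lexbuf () : Lexing.lexbuf = Lexing.from_string ""

(* Intended use, assuming a generated module Parser with entry point main:
     Parser.main (lexer_of_list Parser.EOF tokens) (dummy_lexbuf ())     *)
```

The cost of this workaround is that position information in the dummy lexbuf is meaningless, so source locations have to be carried inside the tokens themselves.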

LALR(1) is very hard to work with, and often the easiest
workaround is by doing some lookahead in the tokeniser:
the coupling of the parser and lexer make this difficult.
The Felix lexer/parser needs about 8 files: more than any
other part of the compiler.
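To show the kind of tokeniser-level lookahead meant here, a small sketch over an in-memory token list follows. The token type and the `mark_calls` pass are invented for illustration and are not how Felix does it: an identifier immediately followed by a left parenthesis is rewritten to a distinct token, so the grammar itself needs no extra lookahead to tell the two cases apart.

```ocaml
(* Hypothetical token type, for illustration only. *)
type token =
  | IDENT of string
  | CALL of string   (* an identifier known to be followed by LPAREN *)
  | LPAREN
  | RPAREN
  | EOF

(* One pass over the token list with one token of lookahead. *)
let rec mark_calls (toks : token list) : token list =
  match toks with
  | IDENT f :: LPAREN :: rest -> CALL f :: LPAREN :: mark_calls rest
  | t :: rest -> t :: mark_calls rest
  | [] -> []
```

Such a pass is trivial on a list, but awkward when the parser pulls tokens one at a time through a lexbuf.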

Also Ocamllex is only an 8 bit lexer which isn't
that useful these days where XML/Web stuff demands UTF-8
encoded Unicode.
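For concreteness, here is a minimal sketch of what an 8-bit lexer does not give you: decoding UTF-8 bytes into Unicode code points. The function name `decode_utf8` is made up for this example; it rejects truncated sequences and bad continuation bytes, but it does not reject overlong encodings or surrogates, so it is not a validating decoder.

```ocaml
(* Decode a UTF-8 string into a list of Unicode code points.
   Sketch only: no checks for overlong encodings or surrogates. *)
let decode_utf8 (s : string) : int list =
  let n = String.length s in
  (* Read [k] continuation bytes starting at [i], folding their low six
     bits into [acc]. Returns the code point and the next index. *)
  let rec cont i k acc =
    if k = 0 then (acc, i)
    else if i >= n then failwith "truncated UTF-8 sequence"
    else
      let b = Char.code s.[i] in
      if b land 0xC0 <> 0x80 then failwith "bad continuation byte"
      else cont (i + 1) (k - 1) ((acc lsl 6) lor (b land 0x3F))
  in
  let rec go i out =
    if i >= n then List.rev out
    else
      let b = Char.code s.[i] in
      let cp, j =
        if b < 0x80 then (b, i + 1)                            (* 1 byte  *)
        else if b land 0xE0 = 0xC0 then cont (i + 1) 1 (b land 0x1F)
        else if b land 0xF0 = 0xE0 then cont (i + 1) 2 (b land 0x0F)
        else if b land 0xF8 = 0xF0 then cont (i + 1) 3 (b land 0x07)
        else failwith "bad UTF-8 lead byte"
      in
      go j (cp :: out)
  in
  go 0 []
```

A lexer working on the resulting code points can then match character classes that do not fit in a single byte.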

> In conclusion: pure Camlp4 implementation of ocamllex/ocamlyacc is
> still an interesting and challenging programming task for the next few
> years, if you (or someone else) had the will and time to provide two
> ``great camlp4 examples'' to the rest of us...
> Happy hacking :)

Ulex already integrates lexing and provides UTF-8,
Camomile already provides 32 bit lexers. Code exists.
The problem here isn't hacking the code, but getting
INRIA to agree to sit down and work with the community
on designing an interface specification for a facility
good enough to put directly in the standard distro.
Once that was agreed, I'm quite sure the non-INRIA
community would rapidly provide an implementation.

As an added incentive: an integrated lexer automatically
provides a superior alternative to Str and PCRE, and if
you do a parser as well, an even better system: quite a
lot of 'regexp' stuff should actually be done by a real

John Skaller,
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Check out the Felix programming language
