Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
[Caml-list] Context Free Grammars?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2004-08-12 (19:25)
From: Paul Snively <psnively@m...>
Subject: Re: [Caml-list] Context Free Grammars?
Hash: SHA1

Hi David!

On Aug 12, 2004, at 11:14 AM, David McClain wrote:

> I am not a YACC expert (no kidding!?) but I always had the notion that  
> nonterminal reduction trees would somehow guide the reduce decisions.  
> Now, however, it appears that YACC is simply looking blindly for  
> syntactic token chains and reducing on them with absolutely no notion  
> of who might be interested in the result. Kind of takes the wind out  
> of my ancient understandings of YACC.
Quite right. YACC, as you no doubt know, is LALR(1), which expands to  
"Look Ahead Left-Right 1, which is a fancy way of saying basically two  

1) Only one token at a time is taken into account in reductions.
2) Right-recursion will screw you over.

More in a moment.

> [So I guess I was writing my grammars as though there were some kind  
> of recursive descent engine in control. Not apparently so...]
Right! And hand-written recursive-descent parsers are almost always  

1) Only one token at a time is taken into account in reductions.
2) Left-recursion will screw you over.

If you're accustomed to writing LL(1) grammars, you'll need to factor  
right recursion to left recursion in order to go LALR(1). But you're  
still left with that pesky (1), which really seems to be what your post  
is about. So forget about LALR vs. LL and let's look at some solutions  
to the (1) problem.

O'Caml as a suite already has some quite nice parsing and lexing tools,  
and third parties have made others. Let me first recommend that you  
have a look at the Camlp4 tutorial at  
<>. Camlp4 is O'Caml's  
"preprocessor," but unfortunately that term has become one of rather  
ill repute thanks to the C/C++ communities. Camlp4 is much closer in  
expressive power and safety to Scheme's hygenic macros or to a good  
LL-based parser-generator system, as the tutorial makes abundantly  

However, as nice as Camlp4 undeniably is, it still leaves you with your  
(1) problem barring lexer hacks as described at  
<>: "The input 'A  
C' raises Stream.Error. There is no simple solution to this problem,  
but there is a solution (not very clean, actually): create a entry from  
a parser (it is possible via the function Grammar.Entry.of_parser).  
This parser must scan the stream using Stream.npeek until the number of  
necessary tokens allows to make the difference; return unit or raise  
Stream.Failure according to the cases." This is the traditional "smart  
lexer" approach to handling ambiguous (i.e. real-world) grammars in  
either an LALR(1) (ocamlyacc) or LL(1) (Camlp4) framework.

A lot of work has been done on relaxing the (1) restriction while still  
allowing efficient parsing. I won't go into the history and  
alternatives, but rather will just refer to  
<>. Elkhound is a GLR  
parser generator that will generate either C++ or O'Caml parsers. Being  
GLR, it deals nicely with ambiguities, and will even let you defer  
implementing resolution of them until after you've prototyped your  
entire grammar if you wish. There's a tutorial at  
tutorial.html> that may be helpful, although it's in C++ terms rather  
than O'Caml. Simply putting:

option lang_OCaml;

at the beginning of your grammar will fix that. :-)

> David McClain
> Senior Corporate Scientist
> Avisere, Inc.
> +1.520.390.7738 (USA)
> -------------------
> To unsubscribe, mail Archives:  
> Bug reports: FAQ:  
> Beginner's list:
I hope this is helpful,
Paul Snively

Version: GnuPG v1.2.4 (Darwin)


To unsubscribe, mail Archives:
Bug reports: FAQ:
Beginner's list: