Version française
Home     About     Download     Resources     Contact us    

This site is updated infrequently. For up-to-date information, please visit the new OCaml website at

Browse thread
[Caml-list] Context Free Grammars?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2004-08-13 (04:45)
From: skaller <skaller@u...>
Subject: Re: [Caml-list] Context Free Grammars?
On Fri, 2004-08-13 at 03:18, David McClain wrote:

> Anyone have any hints about syntax transformations so that CFG's can 
> really be used here? 

Yeah, I've been through this pain and still bump into it
a lot.

There are two tricks. The first is to change your thinking
to *bottom up*. The grammar names 'syntactic fragments'
of increasing size and complexity as it sees them,
rather than going searching for some wholistic shape.
So name your fragments uniquely; think in terms:

"the terms I'm parsing have intrinsic (synthetic) structure" 
that is passed upwards, rather than "I'm looking down
the tree for something I expect, if I don't find it
I will try something else"

The second trick is: make your grammar coarse grained
and far too general. Don't use the LALR(1) parser to
enforce constraints. Do that in the { executable code }
part or in post processing. Type checking is the obvious
example of that.

In the Felix grammar (which is LALR(1) and entirely
unambiguous), I use exactly the same coarse syntax for
executable expressions and for type annotations. 

In both cases I allow x + y * z (yup, Felix has anonymous
sum types). So when I parse

	(x + y * z) : ( t + u * v)
	// executable  : type annotation

I use (the moral equivalent of):

	expr COLON expr {
		let x = $1 in
		let t = expr_as_type $3 in
		`Coercion (x,y)

Since not all executable expressions are type expressions,
I trap that in the function 'expr_as_type' and throw
an exception -- which produces a vastly superior error message
to 'Syntax Error' that is the best the parser can produce

Finally: if you are parsing a *nasty* language, such as Python,
that isn't even remotely LALR(1), you can still use a LALR(1)
grammar with some care are trickery, to do a lot of the work.
To parse Python, I wrote  multi-stage 'token filter' to preprocess
the token stream, generating 'INDENT' and 'UNDENT' tokens,
for example (Python uses indentation to specify block structure).
Another nastiness in Python is (a,) for a unary tuple: that trailing
comma is allowed in pairs too: (a,b,) and it really screws up
LALR(1) parsing.

It took around 10 separate passes to generate a list of tokens
that I could more easily parse with Ocamlyacc.

John Skaller,
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language

To unsubscribe, mail Archives:
Bug reports: FAQ:
Beginner's list: