Re: GenLex stream parsers too eager?

From: Xavier Leroy (Xavier.Leroy@inria.fr)
Date: Mon Feb 01 1999 - 14:13:28 MET


Date: Mon, 1 Feb 1999 14:13:28 +0100
From: Xavier Leroy <Xavier.Leroy@inria.fr>
To: David McClain <dmcclain@azstarnet.com>, Liste CAML <caml-list@inria.fr>
Subject: Re: GenLex stream parsers too eager?
In-Reply-To: <000c01be4952$09752290$210148bf@dylan>; from David McClain on Tue, Jan 26, 1999 at 10:33:46AM -0700

> It appears that the Genlex derived parsers always eagerly tokenize
> negative integer and float constants. This causes incorrect behavior
> in closely spaced code (no spaces):
>
> a-2*c --> parses as "a", "-2", "*", "c" instead of "a", "-", "2", "*", "c"
>
>

Right. This is a classic compiler problem: one can either tokenize
negative integer literals in the lexer (-?[0-9]+), which causes the
weird behavior above for expressions without spaces, or have the lexer
tokenize only unsigned integer literals ([0-9]+) and add a special
case in the parser to recognize "-" followed by an integer literal.

Genlex is very simple-minded and follows the former approach.
The Caml compilers follow the latter.
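
To make the difference concrete, here is a minimal hand-written lexer
sketch in plain OCaml. It does not use Genlex at all; the token type,
the helper names and the ~signed flag are all made up for this
illustration. With ~signed:true a "-" directly followed by a digit is
absorbed into the literal (the first approach); with ~signed:false "-"
is always an operator token (the second approach).

```ocaml
(* Hypothetical token type for this sketch; not Genlex's own type. *)
type token = Ident of string | Int of int | Op of char

let is_digit c = c >= '0' && c <= '9'
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* Return the first index >= i at which [p] no longer holds. *)
let rec span p s i =
  if i < String.length s && p s.[i] then span p s (i + 1) else i

(* [signed = true]  : integer literals match -?[0-9]+
   [signed = false] : integer literals match [0-9]+ only,
                      so '-' is always returned as an operator. *)
let tokenize ~signed s =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '-' when signed && i + 1 < n && is_digit s.[i + 1] ->
          (* Eager choice: swallow the sign into the literal. *)
          let j = span is_digit s (i + 1) in
          go j (Int (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_digit c ->
          let j = span is_digit s i in
          go j (Int (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_alpha c ->
          let j = span is_alpha s i in
          go j (Ident (String.sub s i (j - i)) :: acc)
      | c -> go (i + 1) (Op c :: acc)
  in
  go 0 []
```

On the input "a-2*c", tokenize ~signed:true yields the four tokens from
the report above (a, -2, *, c), while tokenize ~signed:false yields the
five tokens a, -, 2, *, c and leaves the meaning of "-" to the parser.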

(The latter approach has its own problems. For instance, in Caml,
"f -1" is parsed as "f minus 1", not as "f applied to the integer -1",
as many users expect.)
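
A small recursive-descent sketch of the parser-side special case may
help here. This is not the actual Caml grammar, only a toy one with
subtraction, multiplication and application by juxtaposition; the
types and function names are invented for the example. The lexer is
assumed to emit unsigned Int tokens only, and the parser folds "-"
followed by an integer literal into a negative constant when it
appears in operand position.

```ocaml
(* Tokens as produced by a lexer that only knows unsigned literals. *)
type token = Ident of string | Int of int | Op of char

type expr =
  | Var of string
  | Const of int
  | App of expr * expr
  | Mul of expr * expr
  | Sub of expr * expr

exception Parse_error

(* atom ::= Ident | Int | '(' expr ')' *)
let rec atom = function
  | Ident x :: rest -> (Var x, rest)
  | Int n :: rest -> (Const n, rest)
  | Op '(' :: rest ->
      (match expr rest with
       | e, Op ')' :: rest' -> (e, rest')
       | _ -> raise Parse_error)
  | _ -> raise Parse_error

(* app ::= atom atom*   (juxtaposition is function application) *)
and app toks =
  let rec args f toks =
    match toks with
    | (Ident _ | Int _ | Op '(') :: _ ->
        let a, rest = atom toks in
        args (App (f, a)) rest
    | _ -> (f, toks)   (* a '-' never starts an argument *)
  in
  let f, rest = atom toks in
  args f rest

(* factor ::= '-' Int | app
   The special case: '-' directly before a literal, in operand position. *)
and factor = function
  | Op '-' :: Int n :: rest -> (Const (-n), rest)
  | toks -> app toks

(* term ::= factor ('*' factor)* *)
and term toks =
  let rec loop lhs = function
    | Op '*' :: rest ->
        let rhs, rest' = factor rest in
        loop (Mul (lhs, rhs)) rest'
    | toks -> (lhs, toks)
  in
  let lhs, rest = factor toks in
  loop lhs rest

(* expr ::= term ('-' term)* *)
and expr toks =
  let rec loop lhs = function
    | Op '-' :: rest ->
        let rhs, rest' = term rest in
        loop (Sub (lhs, rhs)) rest'
    | toks -> (lhs, toks)
  in
  let lhs, rest = term toks in
  loop lhs rest

let parse toks =
  match expr toks with
  | e, [] -> e
  | _ -> raise Parse_error
```

With this grammar the tokens of "a-2*c" parse as a subtraction whose
right operand is 2*c, and "-2*c" parses with the constant -2. The
tokens of "f -1" parse as Sub (Var "f", Const 1), which is the
surprise mentioned above; one has to write "f (-1)" to get the
application.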

> Any suggestions? (Perhaps I should be using OCAMLLEX and OCAMLYACC instead?)

You'll have to write your own lexer, indeed. You can either use ocamllex
to generate it, or start with the source code of the Genlex module
and customize it to your needs.

Best regards,

- Xavier Leroy


