Re: GenLex stream parsers too eager?

Date: Mon Feb 01 1999 - 14:13:28 MET

> It appears that the Genlex derived parsers always eagerly tokenize
> negative integer and float constants. This causes incorrect behavior
> in closely spaced code (no-spaces):
> a-2*c --> parses as "a", "-2", "*", "c" instead of "a", "-", "2", "*", "c"

Right. This is a classic compiler problem: one can either tokenize
negative integer literals in the lexer (-?[0-9]+), which causes the
weird behavior above for expressions without spaces, or have the lexer
tokenize only unsigned integer literals ([0-9]+) and add a special
case in the parser to recognize "-" followed by an integer literal.

Genlex is very simple-minded and follows the former approach.
The Caml compilers follow the latter.
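To make the difference concrete, here is a minimal hand-written sketch of
both lexer rules. It is not Genlex's code: the token type, the helper names
and the ~signed flag are all invented for this illustration, and it only
handles integers, identifiers and one-character operators.

```ocaml
(* Hypothetical token type for this sketch; not the one Genlex defines. *)
type token = Int of int | Ident of string | Op of char

let is_digit c = c >= '0' && c <= '9'
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'

(* Advance from index [i] while [p] holds; return the first index past the run. *)
let rec span p s i =
  if i < String.length s && p s.[i] then span p s (i + 1) else i

(* [~signed:true] mimics the rule -?[0-9]+ (sign absorbed by the lexer);
   [~signed:false] mimics [0-9]+ (sign left to the parser). *)
let tokenize ~signed s =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '-' when signed && i + 1 < n && is_digit s.[i + 1] ->
          (* Eager rule: "-" directly followed by a digit starts a literal. *)
          let j = span is_digit s (i + 1) in
          go j (Int (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_digit c ->
          let j = span is_digit s i in
          go j (Int (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_alpha c ->
          let j = span is_alpha s i in
          go j (Ident (String.sub s i (j - i)) :: acc)
      | c -> go (i + 1) (Op c :: acc)
  in
  go 0 []
```

On the input "a-2*c", tokenize ~signed:true gives
[Ident "a"; Int (-2); Op '*'; Ident "c"], which is the behavior reported
above, while tokenize ~signed:false gives
[Ident "a"; Op '-'; Int 2; Op '*'; Ident "c"].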

(The latter approach has its own problems. For instance, in Caml,
it parses "f -1" as "f minus 1", not as "f applied to the integer -1",
as many users expect.)
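A small illustration of that pitfall, written against current OCaml (the
names f, n, applied and subtracted are made up for the example):

```ocaml
let f x = x + 1
let n = 10

(* Parentheses are needed to pass a negative literal as an argument. *)
let applied = f (-1)

(* Without parentheses, "-" is read as binary minus: this is n - 1. *)
let subtracted = n -1

(* Writing [f -1] is therefore a type error: it is read as "f minus 1",
   and f is a function, not an integer. *)
```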

> Any suggestions? (Perhaps I should be using OCAMLLEX and OCAMLYACC instead?)

You'll have to write your own lexer, indeed. You can either use ocamllex
to generate it, or start with the source code of the Genlex module
and customize it to your needs.
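If you go the hand-written route, the parser side of the second approach
could look roughly like the sketch below. It assumes a lexer that only
produces unsigned integer literals; the token and expr types and all
function names are hypothetical, and the grammar is deliberately tiny.

```ocaml
(* Hypothetical types for this sketch. *)
type token = Int of int | Ident of string | Op of char

type expr =
  | Num of int
  | Var of string
  | Neg of expr
  | Bin of char * expr * expr

(* Grammar handled here:
     expr   ::= term   (('+' | '-') term)*
     term   ::= factor (('*' | '/') factor)*
     factor ::= '-' factor | INT | IDENT
   Each function returns the parsed expression and the remaining tokens. *)
let rec parse_expr toks =
  let lhs, rest = parse_term toks in
  expr_tail lhs rest

and expr_tail lhs = function
  | Op (('+' | '-') as o) :: rest ->
      let rhs, rest = parse_term rest in
      expr_tail (Bin (o, lhs, rhs)) rest
  | rest -> (lhs, rest)

and parse_term toks =
  let lhs, rest = parse_factor toks in
  term_tail lhs rest

and term_tail lhs = function
  | Op (('*' | '/') as o) :: rest ->
      let rhs, rest = parse_factor rest in
      term_tail (Bin (o, lhs, rhs)) rest
  | rest -> (lhs, rest)

and parse_factor = function
  (* The special case: "-" followed by a literal, seen where an operand
     is expected, is folded into a negative constant. *)
  | Op '-' :: Int n :: rest -> (Num (-n), rest)
  | Op '-' :: rest ->
      let e, rest = parse_factor rest in
      (Neg e, rest)
  | Int n :: rest -> (Num n, rest)
  | Ident x :: rest -> (Var x, rest)
  | _ -> failwith "syntax error"
```

Because the minus sign is only treated as a sign where an operand is
expected, the tokens for "a-2*c" come out as a subtraction, and the tokens
for "-2*c" still yield the constant -2.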

Best regards,

- Xavier Leroy

