Re: lexer function .

Xavier Leroy (Xavier.Leroy@inria.fr)
Mon, 4 Nov 1996 17:45:56 +0100 (MET)

From: Xavier Leroy <Xavier.Leroy@inria.fr>
Message-Id: <199611041645.RAA05308@pauillac.inria.fr>
Subject: Re: lexer function .
In-Reply-To: <199610311025.LAA08481@aenegada.inria.fr> from Olivier Pons at "Oct 31, 96 11:25:48 am"
To: opons@aenegada.inria.fr (Olivier Pons)
Date: Mon, 4 Nov 1996 17:45:56 +0100 (MET)

> I have a little problem with the lexer function generated by make_lexer
> in the example below:
> let lexer = make_lexer ["("; ")"; "/\\"; "\//"; "~"; "->"];;
> let a = (lexer(stream_of_string"a/\~a"));; (* there is no blank*)
> a : token stream = <abstr>
> #let b = (lexer(stream_of_string"a/\ ~a"));;(* there is a blank*)
> b : token stream = <abstr>
> I don't understand why in the first case the keywords /\\ and ~ are
> not recognized ?

Lexers generated by make_lexer (module "genlex") first split the input
stream into literals, identifiers and runs of "special characters", and
only then match the identifiers and special-character runs against the
keyword table. Adjacent special characters are grouped into a single
symbol before any keyword lookup takes place.
In the first case of your example, the stream is split as
        a  /\~  a
and since /\~ is not a keyword, it's returned as an identifier.
In the second case, the blank ends the run, so the stream is split as
        a  /\  ~  a
and /\ and ~ are each recognized as keywords.
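
To make the two steps concrete, here is a rough sketch in present-day
OCaml syntax. It is not the genlex source: the token type, the set of
special characters and the string-based make_lexer below are all
assumptions made for illustration, and literals, comments and streams
are left out. It only shows the "split first, look up afterwards" order.

```ocaml
(* Hypothetical token type, loosely modelled on genlex's Kwd / Ident. *)
type token = Kwd of string | Ident of string

(* Characters that may be glued together into one symbol.
   This particular set is an assumption of the sketch. *)
let is_special c = String.contains "!%&$#+-/:<=>?@\\~^|*" c

let is_alnum c =
  (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
  || (c >= '0' && c <= '9') || c = '_'

(* Simplified make_lexer: takes a string rather than a char stream. *)
let make_lexer keywords s =
  let classify w = if List.mem w keywords then Kwd w else Ident w in
  let n = String.length s in
  (* Index just past the longest run of characters satisfying p. *)
  let rec run p i = if i < n && p s.[i] then run p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let c = s.[i] in
      if c = ' ' then go (i + 1) acc
      else
        (* Step 1: cut out a chunk without consulting the keyword table. *)
        let j =
          if is_alnum c then run is_alnum i
          else if is_special c then run is_special i
          else i + 1 (* any other character, e.g. a parenthesis, stands alone *)
        in
        (* Step 2: only now is the whole chunk looked up. *)
        go j (classify (String.sub s i (j - i)) :: acc)
  in
  go 0 []

let lexer = make_lexer ["("; ")"; "/\\"; "\\/"; "~"; "->"]

let a = lexer "a/\\~a"   (* no blank: the chunk /\~ is not a keyword *)
let b = lexer "a/\\ ~a"  (* blank: the chunks /\ and ~ are both keywords *)
```

With these definitions, a contains a single Ident for /\~ between the
two Ident "a" tokens, while b contains Kwd for /\ followed by Kwd for ~.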

This behavior is consistent with that of the Caml lexer itself:

#1+-2;;
Toplevel input:
>1+-2;;
> ^^
The value identifier +- is unbound.
#1 + -2;;
- : int = -1

So, use blanks liberally.

- Xavier Leroy