Re: another question on lexer function

Pierre Weis (Pierre.Weis@inria.fr)
Fri, 8 Nov 1996 14:21:55 +0100 (MET)

From: Pierre Weis <Pierre.Weis@inria.fr>
Message-Id: <199611081321.OAA28852@pauillac.inria.fr>
Subject: Re: another question on lexer function
In-Reply-To: <199611071007.LAA25922@aenegada.inria.fr> from Olivier Pons at "Nov 7, 96 11:07:04 am"
To: opons@aenegada.inria.fr (Olivier Pons)
Date: Fri, 8 Nov 1996 14:21:55 +0100 (MET)

> is it forbidden to use the ";;" token in a make_lexer function ?

You cannot use an arbitrary sequence of characters as a declared
keyword: the sequences you write must be recognized as an identifier
by the ``next_token'' function inside make_lexer, (or be a single
non-alphanumeric character). The next_token function recognizes two
kinds of identifiers: regular ones (roughly speaking sequences of
alphanumeric characters), and special ones or symbols (sequences of
non alphanumeric characters, such as ++). Once an identifier is
recognized, it is compared with the list of declared keywords: if
found in the list, a Kwd token is emitted, otherwise an Ident token is
returned.

Since the sequence ;; is recognized as two single characters by
next_token (same treatment as for parens, brackets or commas). But
the rule for single non-alphanumeric characters is that they must have
been declared as keywords, or it is a lexical error. Since ; has not
been declared as a keyword of the lexer, an error occurs.

Now, if you want to deal with a token ``;;'', you may:
1) Declare ";" as a keyword, then interpret to successive `;' tokens as a
";;" in your grammar rule.
2) Adapt the genlex module to your specific needs. I strongly
recommend this solution, in particular if you want to understand
stream parsing, or if you will use the lexer in an intensive way.

Best regards,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis