Re: Genlex
Date: -- (:)
From: Pierre Weis <Pierre.Weis@i...>
Subject: Re: Genlex
> Hi,
> I was writing a simple Scheme parser in Ocaml and I set up a lexer using
> the Genlex library. It seems that the lexer doesn't support keywords
> with the # character in them, making it hard to recognize #t and #f
> (true and false in Scheme).
[...]
> --Joerg
> -----------------------------------------------------
> Hans-Joerg Tiede
[...]
> www:    http://www.iwu.edu/~htiede
> -----------------------------------------------------

Right. This is because # starts a ``special ident'', which is made of
symbols only (here symbols == non-alphanumeric chars). You must
change the rule for ident2 so that alphanumeric chars are allowed
after a non-alphanumeric char. For instance:

and ident2 = parser
  | [< ' ('!'|'%'|'&'|'$'|'#'|'+'|'-'|'/'|':'|'<'|'='|'>'|'?'|'@'|'\\'|
         '~'|'^'|'|'|'*' as c); s >] ->
      store c; ident2 s
  (* added case: alphanumeric chars may follow a symbol char *)
  | [< ' ('A'..'Z'|'a'..'z'|'\192'..'\255'|'0'..'9'|'_'|'\'' as c); s >] ->
      store c; ident2 s
  | [< >] ->
      Some(ident_or_keyword(get_string()))

However, to build a lexical analyzer for Scheme, you would have to
rewrite a lot of the Genlex module, since the tokens recognized by
Genlex are far too similar to those of Caml (or Pascal or C or Java)
to accommodate Scheme symbols (for instance int->real is naturally
considered as 3 tokens by Genlex, when it is a regular ident name in
Scheme).
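
To illustrate the difference, here is a minimal sketch of a hand-written
tokenizer that follows the Scheme convention instead: an atom runs until
the next whitespace or parenthesis, and only then is it classified. It
does not use Genlex at all. The type token and the functions
is_delimiter, classify and tokenize are names made up for this example,
and strings, characters, comments and quote marks are deliberately left
out.

```ocaml
(* Illustrative sketch only: a tiny Scheme tokenizer, independent of Genlex.
   All names below are hypothetical and chosen for this example. *)
type token =
  | LParen
  | RParen
  | Bool of bool
  | Int of int
  | Symbol of string

(* Characters that end an atom: whitespace and parentheses. *)
let is_delimiter c =
  c = ' ' || c = '\t' || c = '\n' || c = '\r' || c = '(' || c = ')'

(* Classify a complete atom once its extent is known.
   Assumption: integers are recognized with OCaml's int_of_string_opt,
   whose accepted syntax is not exactly Scheme's. *)
let classify atom =
  match atom with
  | "#t" -> Bool true
  | "#f" -> Bool false
  | _ ->
    (match int_of_string_opt atom with
     | Some n -> Int n
     | None -> Symbol atom)

let tokenize (src : string) : token list =
  let n = String.length src in
  let rec loop i acc =
    if i >= n then List.rev acc
    else
      match src.[i] with
      | ' ' | '\t' | '\n' | '\r' -> loop (i + 1) acc
      | '(' -> loop (i + 1) (LParen :: acc)
      | ')' -> loop (i + 1) (RParen :: acc)
      | _ ->
        (* Scan to the next delimiter; everything in between is one atom. *)
        let j = ref i in
        while !j < n && not (is_delimiter src.[!j]) do incr j done;
        loop !j (classify (String.sub src i (!j - i)) :: acc)
  in
  loop 0 []
```

With this rule, tokenize "(int->real #t 42)" yields
[LParen; Symbol "int->real"; Bool true; Int 42; RParen], so int->real
stays a single symbol and #t is read as a boolean.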

Alternatively, you can consider using Ocamllex to write a conventional
lexer.
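
As a rough idea of what such a lexer could look like, here is a small
Ocamllex sketch. It is an assumption-laden outline, not a complete
Scheme lexer: the token type and the names digit, initial and subsequent
are invented for this example, the identifier classes loosely follow the
R5RS description, and strings, characters, quote marks and non-integer
numbers are not handled. The text would go in a .mll file to be
processed by ocamllex, not compiled directly as OCaml.

```ocaml
(* Hypothetical .mll file: a sketch of a Scheme lexer for ocamllex. *)
{
  type token =
    | LPAREN
    | RPAREN
    | BOOL of bool
    | INT of int
    | SYMBOL of string
    | EOF
}

let digit = ['0'-'9']
(* Characters allowed at the start of an identifier. *)
let initial =
  ['a'-'z' 'A'-'Z' '!' '$' '%' '&' '*' '/' ':' '<' '=' '>' '?' '^' '_' '~']
(* Characters allowed after the first one. *)
let subsequent = initial | digit | ['+' '-' '.' '@']

rule token = parse
  | [' ' '\t' '\r' '\n']+       { token lexbuf }   (* skip whitespace *)
  | ';' [^ '\n']*               { token lexbuf }   (* skip line comments *)
  | '('                         { LPAREN }
  | ')'                         { RPAREN }
  | "#t"                        { BOOL true }
  | "#f"                        { BOOL false }
  | '-'? digit+ as n            { INT (int_of_string n) }
  | initial subsequent* as s    { SYMBOL s }
  | ['+' '-'] as c              { SYMBOL (String.make 1 c) }
  | eof                         { EOF }
  | _ as c                      { failwith (Printf.sprintf "unexpected character %C" c) }
```

Because ocamllex picks the longest match, int->real is returned as one
SYMBOL here, and #t and #f get their own rules instead of going through
a keyword table.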

Hope this helps,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/