Date: 2000-12-08 (09:23)
From: Alain Frisch <frisch@c...>
Subject: Re: features of PCRE-OCaml
On Fri, 8 Dec 2000, John Max Skaller wrote:

> 	[Ocaml lex cannot support large enough tables for matching
> ISO-10646 identifiers, when encoded using UTF-8. This is a real pain,
> since all my languages specify UTF-8 encoded ISO-10646: I have to 
> cheat, and assume 'almost everything' is a suitable character to
> put in an identifier, and then check it afterwards. This makes it
> hard to use special symbols as tokens. I'm not sure why
> this is, but I guess it doesn't eliminate duplicate columns?]

Have a look at wlex:

<< This package consists of a lexer generator and the associated runtime
system. The new lexing model adds a "classification" layer between the
lexbuf and the lexer itself. This layer classifies characters from the
lexbuf into a small number of classes, on which the regexps in the lexer
specification are built. 

 This reduces the number of states and transitions in the automaton,
especially when working with large encodings such as UTF-8 (the primary
motivation for wlex).  >>
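
To illustrate the classification idea described above, here is a minimal
OCaml sketch. It is not wlex's actual code or API: the names classify,
decode and ident_prefix, the three classes, and the character ranges are
all assumptions made up for this example. The point is only that the
automaton branches on a handful of classes instead of on raw UTF-8 bytes.

```ocaml
(* Hypothetical sketch of a classification layer; not wlex's real code. *)

type cls = Letter | Digit | Other

(* Map a Unicode code point to a class. The ranges are illustrative only:
   ASCII letters, underscore, and everything from U+00C0 up are Letter. *)
let classify (cp : int) : cls =
  if (cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A) || cp = 0x5F
  then Letter
  else if cp >= 0x30 && cp <= 0x39 then Digit
  else if cp >= 0xC0 then Letter
  else Other

(* Decode one UTF-8 code point starting at byte i.
   Returns (code point, index of next byte). Assumes well-formed input. *)
let decode (s : string) (i : int) : int * int =
  let b k = Char.code s.[i + k] in
  let c = b 0 in
  if c < 0x80 then (c, i + 1)
  else if c < 0xE0 then
    (((c land 0x1F) lsl 6) lor ((b 1) land 0x3F), i + 2)
  else if c < 0xF0 then
    (((c land 0x0F) lsl 12)
     lor (((b 1) land 0x3F) lsl 6)
     lor ((b 2) land 0x3F), i + 3)
  else
    (((c land 0x07) lsl 18)
     lor (((b 1) land 0x3F) lsl 12)
     lor (((b 2) land 0x3F) lsl 6)
     lor ((b 3) land 0x3F), i + 4)

(* Byte length of the longest prefix of s matching Letter (Letter | Digit)*.
   The automaton only ever sees the three classes, never raw bytes. *)
let ident_prefix (s : string) : int =
  let n = String.length s in
  let rec go i started =
    if i >= n then i
    else
      let cp, j = decode s i in
      match classify cp, started with
      | (Letter, _) | (Digit, true) -> go j true
      | _ -> i
  in
  go 0 false
```

With these assumed ranges, ident_prefix "x1 + y" stops after the two
bytes of "x1", and an identifier with accented letters is consumed whole
even though each accented letter occupies two bytes in UTF-8.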

The development release of pxp may use wlex (same lexer for different
encodings: UTF-8, Latin-1).

wlex is distributed as a patch to ocamllex.

  Alain Frisch