features of PCRE-OCaml
Date: 2000-12-08 (09:23)
From: Alain Frisch <frisch@c...>
Subject: Re: features of PCRE-OCaml
On Fri, 8 Dec 2000, John Max Skaller wrote:

> [Ocaml lex cannot support large enough tables for matching
> ISO-10646 identifiers, when encoded using UTF-8. This is a real pain,
> since all my languages specify UTF-8 encoded ISO-10646: I have to
> cheat, and assume 'almost everything' is a suitable character to
> put in an identifier, and then check it afterwards. This makes it
> hard to use special symbols as tokens. I'm not sure why
> this is, but I guess it doesn't eliminate duplicate columns?]

Have a look at wlex:

http://www.eleves.ens.fr:8080/home/frisch/soft
http://www.eleves.ens.fr:8080/home/frisch/info/wlex-20001006.tar.gz

<<
This package consists of a lexer generator and the associated runtime
system. The new lexing model adds a "classification" layer between
the lexbuf and the lexer itself. This layer classifies characters from
the lexbuf into a small number of classes, on which the regexps in the
lexer specification are built. This reduces the number of states and
transitions in the automaton, especially when working with large
encodings such as UTF-8 (the primary motivation for wlex).
>>

The development release of pxp may use wlex (same lexer for different
encodings: UTF-8, Latin-1). wlex is distributed as a patch to ocamllex.

--
Alain Frisch
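To make the "classification" idea concrete, here is a minimal sketch in plain OCaml. It is not wlex's code or API: the class set, the code point ranges, and the names classify, decode and ident_len are all hypothetical, and the input is assumed to be well-formed UTF-8. The point is only that the matcher sees a handful of classes instead of raw bytes, so its transition table stays small.

```ocaml
(* Hypothetical character classes; not wlex's actual class set. *)
type cls = Letter | Digit | Space | Other

(* Map a Unicode code point to a class. The ranges are illustrative only;
   a real table would follow the ISO-10646 identifier rules. *)
let classify (cp : int) : cls =
  if (cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A) || cp = 0x5F
  then Letter
  else if cp >= 0x30 && cp <= 0x39 then Digit
  else if cp = 0x20 || cp = 0x09 || cp = 0x0A then Space
  else if cp >= 0xC0 && cp <= 0x24F && cp <> 0xD7 && cp <> 0xF7 then Letter
  else Other

(* Decode one UTF-8 sequence starting at byte i.
   Returns (code point, index of the next sequence).
   Assumes well-formed input; no validation is done. *)
let decode (s : string) (i : int) : int * int =
  let b k = Char.code s.[i + k] land 0x3F in
  let c = Char.code s.[i] in
  if c < 0x80 then (c, i + 1)
  else if c < 0xE0 then (((c land 0x1F) lsl 6) lor b 1, i + 2)
  else if c < 0xF0 then
    (((c land 0x0F) lsl 12) lor (b 1 lsl 6) lor b 2, i + 3)
  else
    (((c land 0x07) lsl 18) lor (b 1 lsl 12) lor (b 2 lsl 6) lor b 3, i + 4)

(* Length in bytes of the longest identifier starting at byte pos,
   where an identifier is Letter (Letter | Digit)*.
   The matching logic only ever looks at classes, never at bytes. *)
let ident_len (s : string) (pos : int) : int =
  let n = String.length s in
  let rec go i first =
    if i >= n then i
    else
      let cp, j = decode s i in
      match classify cp, first with
      | Letter, _ -> go j false
      | Digit, false -> go j false
      | _ -> i
  in
  go pos true - pos
```

With this split, supporting another encoding such as Latin-1 would mean swapping decode while leaving the class-level matcher unchanged, which is roughly the "same lexer for different encodings" property mentioned above.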