OCamllex 1.06 patch - adds let bound regexps

From: Christian Lindig (lindig@ips.cs.tu-bs.de)
Date: Thu Dec 04 1997 - 16:27:54 MET

Date: Thu, 4 Dec 1997 16:27:54 +0100 (MET)
Message-Id: <199712041527.QAA09344@infbsst3.ips.cs.tu-bs.de>
From: Christian Lindig <lindig@ips.cs.tu-bs.de>
To: caml-list@inria.fr
Subject: OCamllex 1.06 patch - adds let bound regexps

As an enthusiastic user of OCaml I would like to support the OCaml
project with a small contribution: an ocamllex patch that adds the
ability to bind frequently used regular expressions to names. This is
the feature I missed most in comparison to other lex implementations.

You can find the patch against OCaml 1.06 at our ftp server:


It includes examples and updated documentation. The README which
explains the patch in more detail can be found there, too, and at the
end of this mail.

Best regards,


 Christian Lindig lindig@ips.cs.tu-bs.de
 TU Braunschweig fon +49 531 391 7465
 Institut fuer Programmiersprachen fax +49 531 391 8140
 D-38106 Braunschweig http://www.cs.tu-bs.de

Ocamllex from OCaml 1.06 does not allow binding frequently used
regular expressions to names. This feature is provided by many other
lex implementations and makes lex specifications easier to write and
maintain. This text describes a patch that adds a "let" construct
to ocamllex, which provides exactly this feature.

After the header and before the entry points of an ocamllex
specification "let" can be used to bind regular expressions to names:

        (* header *)

        let whitespace = [' ' '\t']
        let digit      = ['0'-'9']
        let digits     = ['0'-'9']+
        let lowercase  = ['a'-'z']
        let uppercase  = ['A'-'Z']
        let ident      = (uppercase|lowercase) (uppercase|lowercase|digit)*

        rule token = parse
            whitespace  { token lexbuf }    (* skip blanks *)
          | ['\n' ]     { EOL }
          | digits      { INT(int_of_string(Lexing.lexeme lexbuf)) }
          | '+'         { PLUS }
          | '-'         { MINUS }
          | '*'         { TIMES }
          | '/'         { DIV }
          | '('         { LPAREN }
          | ')'         { RPAREN }
          | eof         { raise Eof }

Let-bound symbols can be used to define other symbols and inside
rules. The symbols used to bind regular expressions must be distinct
from any ocamllex keyword such as rule or eof, and symbols must be
defined before they can be used. Because let-bound names are optional,
all old lex specifications that do not use let-bindings are accepted by
the new ocamllex. The new feature is thus fully backward compatible.

The documentation of ocamllex and ocamlyacc has been updated to reflect
the new feature. However, because the original documentation was not
available as source, only an ASCII text file 'lex.doc' is provided. To
show the differences from the original documentation, 'lex.doc.orig' is
also provided.

As a larger example, the lex specification of ocamllex itself has been
edited to use the new feature. It can be used to bootstrap the new
ocamllex and is provided as 'example.mll'.

The implementation of the let construct is straightforward: a
preprocessing step builds a symbol table and replaces all symbols with
the regular expressions they denote. The resulting symbol-free regular
expressions are passed to the existing machinery. A better
implementation would have used the bound names to avoid constructing
certain automata multiple times, as is done now.
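
The preprocessing step described above can be pictured with a small
OCaml sketch. It is not the code from the patch: the type regexp and
the functions expand and build_table are hypothetical names chosen for
illustration, and ocamllex's real syntax tree looks different. The
sketch only assumes that bindings are processed in order, so that a
name can refer to earlier definitions only.

```ocaml
(* Hypothetical regular-expression syntax tree; ocamllex's own types differ. *)
type regexp =
  | Chars of char list          (* character class *)
  | Name of string              (* reference to a let-bound name *)
  | Seq of regexp * regexp      (* concatenation *)
  | Alt of regexp * regexp      (* alternation *)
  | Star of regexp              (* repetition *)

(* Replace every Name by the expression it denotes in the table [env]. *)
let rec expand env = function
  | Chars _ as r -> r
  | Name n ->
    (match List.assoc_opt n env with
     | Some r -> r
     | None -> failwith ("unbound regexp name: " ^ n))
  | Seq (a, b) -> Seq (expand env a, expand env b)
  | Alt (a, b) -> Alt (expand env a, expand env b)
  | Star r -> Star (expand env r)

(* Build the symbol table from the bindings in source order. Each
   definition is expanded against the earlier ones before it is stored,
   so the table only ever holds symbol-free expressions. *)
let build_table bindings =
  List.fold_left
    (fun env (name, r) -> (name, expand env r) :: env)
    [] bindings

(* Usage: a cut-down version of the bindings from the example above. *)
let table =
  build_table
    [ ("lowercase", Chars ['a']);
      ("digit", Chars ['0']);
      ("ident",
       Seq (Name "lowercase", Star (Alt (Name "lowercase", Name "digit")))) ]
```

In this sketch the expression stored for ident contains two copies of
the expansion of lowercase. That duplication is the cost mentioned
above: each use of a name is rebuilt from scratch instead of being
shared.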

Christian Lindig
Institut für Programmiersprachen
und Informationssysteme
Abteilung Softwaretechnologie
TU Braunschweig
D-38106 Braunschweig

