hacks using camlp4

From: David Monniaux (David.Monniaux@ens-lyon.fr)
Date: Wed Nov 26 1997 - 19:00:58 MET

Date: Wed, 26 Nov 1997 19:00:58 +0100 (MET)
From: David Monniaux <David.Monniaux@ens-lyon.fr>
To: Caml-list <caml-list@inria.fr>
Subject: hacks using camlp4

[ J'ai voulu mettre en place des expressions regulieres avec une syntaxe
plus agreable et bien typees, en utilisant camlp4. J'ai quelques
resultats mais aussi quelques problemes... ]


I lately checked the camlp4 preprocessor. I think this tool may have lots
of useful applications, since it allows especially custom syntaxes for the
input of certain kind of objects, in a more programmer-friendly fashion
than just inputting raw data structures into the source code.

This also lessens the pressure on putting library-dependent features into
the language itself, as was done for the lists, strings and more
stringently with the format strings for the Printf.*printf functions.

I thus tried to give a type-safe interface to regular expressions. My
small hack (available on request) adds a syntax to do matching on a
regular expression and return components in a friendly and type-safe way,
and also a replace function.


let x = "meuh je (suis) un [beau] chaton {suis}" =~ item
  => (String.uppercase)
  /{item: ['(' '['] ['a'-'z']+ [']' ')']}/
in Printf.printf "%s\n" x;;

will uppercase the substrings enclosed in () or [].

To give Perl-like efficiency, I precompile regexps (let-bindings for
precompiled regexps are prepended to the output); that allows efficient
matching even in the middle of a loop. However, this precludes using some
variable parts in the regexps (I for now only allow constant regexps); the
ideal solution would be to compile the regexps at a step in the program
where all the involved values can be computed.

This alas involves some semantic analysis.

I therefore have three problems:

1. Precompiled expressions.
   More generally, what would be needed would be some construction to
   evaluate an expression as soon as it is possible.

2. It would be nice if regexp precompilation could be done at compile or
   preprocessing time (I was thinking of marshalling the precompiled
   regexp, but I fear some C-library private data structure inside the
   regexp type).

3. The stupid emacs regexp syntax (my code is for now not really
   working if you put ^ or ] in ranges).

The third point is only a matter of library, but the others deserve some
thinking, I should say...

This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:13 MET