Re: hacks using camlp4

From: Michel Mauny (
Date: Sun Nov 30 1997 - 10:59:11 MET

Date: Sun, 30 Nov 1997 10:59:11 +0100
From: Michel Mauny <>
To: David Monniaux <>
Subject: Re: hacks using camlp4
In-Reply-To: <>; from Daniel de Rauglaudre on Thu, Nov 27, 1997 at 08:45:55PM +0100

David Monniaux wrote:

> > I lately checked the camlp4 preprocessor. I think this tool may have lots
> > of useful applications, since it allows especially custom syntaxes for the
> > input of certain kind of objects, in a more programmer-friendly fashion
> > than just inputting raw data structures into the source code.

Yes, indeed. One of Camlp4 applications we had in mind are to provide
dedicated syntactic support for the use of some libraries. We are (at
least, I am) eager to see non trivial examples, and regular
expressions could be one of these.

> > I therefore have three problems:
> >
> > 1. Precompiled expressions.
> > More generally, what would be needed would be some construction to
> > evaluate an expression as soon as it is possible.
> >
> > 2. It would be nice if regexp precompilation could be done at compile or
> > preprocessing time (I was thinking of marshalling the precompiled
> > regexp, but I fear some C-library private data structure inside the
> > regexp type).

Daniel de Rauglaudre answered:

> A solution is partial evaluation.

Another one (suggested by Xavier Leroy) is changing the regexp
compiler into a memo-function. Encapsulate the compiler together with
a private memory (assoc list, hash table) that can be used to return
immediately the result of already (that is, previously) compiled
regexprs. (Maybe that one could be directly provided by the Str
library as a `memo_regexp' function?)

More generally, Camlp4 is only a front-end to the OCaml compiler, not
part of it. It can therefore only perform source-level
transformations. If it was possible to express regexp compilation as a
source-to-source transformation, then a general partial evaluation
would definitely solve your problem #2. Unfortunately, the regexp
library is a black box, and can only be used as such, as far as I
understand. If the regexp compiler had some known properties (such as
compositionality), then something could also be done. Unfortunately,
this isn't the case either.

Coming to your point #1, where the question is pretty general, this
can be dealt with partial evaluation (which is a pretty general answer
to pretty general questions :-), as suggested by Daniel. At a compiler
level, classical compile-time optimizations (such as extraction of
loop invariants) or less classical ones (how to obtain full laziness
in lazy functional languages by extracting maximal subexpressions)
seem to address that question. Unfortunately, they are less effective
in languages such as Caml (because of side-effects) than in pure FL,
if semantic preservation is an issue (and it is, generally).

As a conclusion, for your application, Daniel's solution could be used
to globalize non-variable regexprs, and a `memo' interface to the
regexp compiler could do the rest.


Michel Mauny
INRIA  --  BP 105 --  F-78153 Le Chesnay Cedex
Tel.: +33 1 39 63 57 96 Fax: +33 1 39 63 56 84

This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:13 MET