Date: -- (:)
From: David Monniaux <David.Monniaux@e...>
Subject: programmer-friendly regexp package
Heya all camlers,

I have an alpha version of a CamlP4 preprocessing package enabling the use
of a straightforward, easy syntax for regexps:

(* Defines and compiles the regexp; substrings to be extracted are
   labelled "foo" and "bar" *)
let regexp = let k = "zzz" in (* notice the support for embedded
                                   variables and expressions *)
  RE { "X" (foo: (['0'-'9']+))? bar:(['a'-'z']+) "Y" k };;

(* Now the actual pattern matching *)
REmatch "+++X45aaabYzzz" with
  regexp as contents ~bar ~foo ->
    Printf.printf "str=%s bar=%s foo=%s\n" contents bar
      (match foo with Some x -> x | None -> "<NONE>")
| _ -> Printf.printf "Not found!\n";;

It uses the "Str" library, but it may come to support PCRE as well.
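For readers who want to see what the example above amounts to in terms of
"Str", here is a hand-written sketch. It is not the code the preprocessor
generates. It assumes that REmatch searches for the first match anywhere in
the string and that "contents" is the whole matched substring; the names
"find" and the group numbering are illustrative only.

```ocaml
(* Hand-written Str sketch of the example regexp; NOT the preprocessor's
   actual output. Link with the Str library, e.g.
   ocamlfind ocamlopt -package str -linkpkg sketch.ml *)

let k = "zzz"

(* Group 1 is the optional "foo" part, group 2 is the "bar" part.
   Str.quote escapes k so that it is matched literally. *)
let regexp = Str.regexp ("X\\([0-9]+\\)?\\([a-z]+\\)Y" ^ Str.quote k)

(* Hypothetical helper: returns (contents, bar, foo) for the first match
   found anywhere in s, or None if there is no match. *)
let find s =
  match Str.search_forward regexp s 0 with
  | exception Not_found -> None
  | _ ->
    let contents = Str.matched_string s in
    let bar = Str.matched_group 2 s in
    (* Str.matched_group raises Not_found for a group that did not
       take part in the match, hence the option type for foo. *)
    let foo = try Some (Str.matched_group 1 s) with Not_found -> None in
    Some (contents, bar, foo)

let () =
  match find "+++X45aaabYzzz" with
  | Some (contents, bar, foo) ->
    Printf.printf "str=%s bar=%s foo=%s\n" contents bar
      (match foo with Some x -> x | None -> "<NONE>")
  | None -> Printf.printf "Not found!\n"
```

Under these assumptions the program prints "str=X45aaabYzzz bar=aaab foo=45".
The manual group numbering and the escaping of k are the kind of bookkeeping
the labelled syntax is meant to hide.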

It requires OCaml and CamlP4 3.00 and makes use of labels (as a side
effect, when using the traditional mode, the user must supply the labels
in the "as" clause in ASCII order, hence ~bar before ~foo in the example
above).

No documentation is available yet. The regexp syntax is intended to be
compatible with that of ocamllex. The handling of character sets
currently has some pitfalls.

I specifically request comments on how to make this hack better,
especially regarding the syntactic choices. I plan to support predefined
regexp parts; it is not yet decided whether they will be nested at
preprocessing time, at execution time, or both.


David Monniaux  
Laboratoire d'informatique de l'École Normale Supérieure,
Paris, France