From: Markus Mottl <mottl@miss.wu-wien.ac.at>
Message-Id: <199907281902.VAA20083@miss.wu-wien.ac.at>
Subject: Announcement: PCRE-library for OCaml
To: caml-list@inria.fr (OCAML)
Date: Wed, 28 Jul 1999 21:02:12 +0100 (MET DST)
Hello all,
I have just released an interface library to the PCRE (Perl-Compatibility
regular expressions) library for OCaml. The original PCRE-library is
written in C.
Here some highlights of this distribution:
* The PCRE-library by Philip Hazel has been under development for
about two years now and is fairly advanced and stable. It implements
just about all of the convenient functionality of regular expressions
as you can find them in PERL.
* In contrast to PERL, the library creates DFAs (deterministic finite
automata) instead of NFAs (nondeterministic finite automata). DFAs
generally allow much faster pattern matching, because they need
not do backtracking on most patterns. Especially patterns with many
alternations can see a great speedup.
* It is reentrant - and thus thread safe. This is not the case with
the "Str"-module of OCaml, which builds on the GNU "regex"-library.
Using reentrant libraries also means more convenience for the
programmer. He does not have to reason about states in which the
library might be in.
* The higher level functions like substitution and splitting, they are
all implemented in OCaml, are much faster than the ones of the
"Str"-module. In fact, when compiled to native code, they even seem
to be significantly faster than those of PERL (PERL is written in C)
(possibly around 20-30% in average).
Testing performance is a tricky business, but in some quick examples
the speedup over PERL was somewhere between 20 and 100% for pattern
substitution (including the speedup due to faster pattern matching).
* You can also rely on the data returned being unique. In other terms:
if the result of a function is a string, you can safely use
destructive updates on it without having to fear side effects.
* The library also implements all of the functionality you can find
in the Str-module. This should make it easier to switch. The
compatibility functions have just one "incompatibility": they do
not crash with a stack overflow if you try to use patterns that can
match the null-string (eg. " *" - may match everywhere).
For downloading and more details look at:
http://miss.wu-wien.ac.at/~mottl/ocaml_sources/intro.html
Best regards,
Markus Mottl
-- Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl
This archive was generated by hypermail 2b29 : Sun Jan 02 2000 - 11:58:24 MET