Announcement: PCRE-library for OCaml

From: Markus Mottl (mottl@miss.wu-wien.ac.at)
Date: Wed Jul 28 1999 - 22:02:12 MET DST


From: Markus Mottl <mottl@miss.wu-wien.ac.at>
Message-Id: <199907281902.VAA20083@miss.wu-wien.ac.at>
Subject: Announcement: PCRE-library for OCaml
To: caml-list@inria.fr (OCAML)
Date: Wed, 28 Jul 1999 21:02:12 +0100 (MET DST)

Hello all,

I have just released an OCaml interface library to PCRE (Perl Compatible
Regular Expressions). The original PCRE-library is written in C.

Here are some highlights of this distribution:

  * The PCRE-library by Philip Hazel has been under development for
    about two years now and is fairly advanced and stable. It implements
    nearly all of the convenient regular expression functionality that
    you find in PERL.

  * Like PERL, the library compiles a pattern into an internal form and
    matches it by backtracking (NFA-style). It does not build DFAs
    (deterministic finite automata), so patterns with many alternations
    or nested repetitions can still backtrack heavily. Compiling a
    pattern once and reusing it keeps the cost per match low.

  * It is reentrant and thus thread-safe. This is not the case with
    the "Str"-module of OCaml, which builds on the GNU "regex"-library.
    A reentrant library is also more convenient for the programmer,
    who does not have to reason about the state the library might be
    in.

  * The higher-level functions such as substitution and splitting are
    all implemented in OCaml and are much faster than those of the
    "Str"-module. In fact, when compiled to native code, they even seem
    to be significantly faster than those of PERL, which is written in
    C (possibly around 20-30% on average).

    Testing performance is a tricky business, but in some quick examples
    the speedup over PERL was somewhere between 20 and 100% for pattern
    substitution (including the speedup due to faster pattern matching).

  * You can also rely on the data returned being unique. In other words:
    if the result of a function is a string, it is not shared with
    anything else, so you can safely update it destructively without
    having to fear side effects.

  * The library also implements all of the functionality you can find
    in the Str-module. This should make it easier to switch. The
    compatibility functions have just one "incompatibility": they do
    not crash with a stack overflow if you try to use patterns that can
    match the null-string (e.g. " *", which may match everywhere).
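
To illustrate the point about reentrancy above, here is a minimal sketch
in plain OCaml. It is only an assumption-laden toy: the functions
search_global, matched_pos and search are made up for this example and
are not part of the interface of this library or of the "Str"-module.
It merely contrasts an interface that keeps its last result in hidden
global state with one that returns the result to the caller.

```ocaml
(* Hypothetical toy matcher, NOT the Pcre or Str API: it only finds the
   first occurrence of a character, to contrast two interface styles. *)

(* Style 1: the result of the last search lives in hidden global state. *)
let last_pos = ref (-1)

let search_global c s =
  match String.index_opt s c with
  | Some i -> last_pos := i; true
  | None -> last_pos := -1; false

let matched_pos () = !last_pos

(* Style 2: reentrant; the result is handed back to the caller. *)
let search c s = String.index_opt s c

(* With the global style, a second search overwrites the first result. *)
let clobbered =
  let _ = search_global 'b' "abc" in   (* stores position 1 *)
  let _ = search_global 'c' "abc" in   (* overwrites it with position 2 *)
  matched_pos ()

(* With the reentrant style, both results remain available. *)
let independent = (search 'b' "abc", search 'c' "abc")
```

In the first style the caller has to remember which search ran last (and
two threads could interfere with each other); in the second style there
is nothing to remember.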

For downloads and more details, see:

  http://miss.wu-wien.ac.at/~mottl/ocaml_sources/intro.html

Best regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl


