[Caml-list] [ann] Regexp library supporting binding for * and +'s
Date: 2004-09-20 (06:54)
From: Yutaka OIWA <oiwa@y...>
Subject: Re: [Caml-list] [ann] Regexp library supporting binding for * and +'s
>> On 20 Sep 2004 10:38:33 +1000, skaller <> said:

skaller> On Mon, 2004-09-20 at 06:41, Yutaka OIWA wrote:
>> I plan to construct a neat syntax sugar over this library 
>> and build a next-generation version of Regexp/OCaml library.
>> Any comments are welcome.

skaller> Can you explain why/how Pcre is being used?

The reason is simply convenience for the current implementation.
PCRE is stable, has enough features (e.g. an unlimited number of captures,
non-capturing groups, many helper functions and runtime features),
and performs well. I do not intend to implement an automata engine
myself, at least in the near future.

However, as you can see in the README of Regexp-OCaml (main version), my
future plans include supporting backends other than PCRE/OCaml.
Having its own regexp parser and limiting the regexp syntax to strictly
regular languages are provisions for that possible future.
When OCaml 3.07 was released, I seriously considered supporting
the standard Str module, but unfortunately the current Str lacks some of
the features required by the current Regexp/OCaml implementation.
Anyway, backend is backend. And also, frontend is frontend. Period.
The frontend can be highly independent once it is designed so, and my
interests are mainly in the frontend part. I highly appreciate support from
people working on the backend part.
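
To illustrate the frontend/backend split described above, here is a
minimal sketch: the frontend keeps its own syntax tree for a strictly
regular language, and a backend is just a function that prints that tree
in its own pattern syntax. The type `re` and the functions `escape` and
`to_pcre` are hypothetical names chosen for this example; they are not
the actual Regexp/OCaml internals.

```ocaml
(* Hypothetical frontend syntax tree: only strictly regular constructs. *)
type re =
  | Char of char        (* a literal character *)
  | Seq of re * re      (* concatenation *)
  | Alt of re * re      (* alternation *)
  | Star of re          (* Kleene star *)
  | Group of re         (* capturing group *)

(* Escape characters that are special in PCRE-style pattern syntax. *)
let escape c =
  match c with
  | '.' | '*' | '+' | '?' | '(' | ')' | '[' | ']'
  | '{' | '}' | '|' | '\\' | '^' | '$' -> Printf.sprintf "\\%c" c
  | c -> String.make 1 c

(* One possible backend: print the tree as a PCRE-style pattern string.
   Non-capturing groups keep operator precedence explicit without
   disturbing the numbering of capturing groups. *)
let rec to_pcre = function
  | Char c -> escape c
  | Seq (a, b) -> to_pcre a ^ to_pcre b
  | Alt (a, b) -> "(?:" ^ to_pcre a ^ "|" ^ to_pcre b ^ ")"
  | Star r -> "(?:" ^ to_pcre r ^ ")*"
  | Group r -> "(" ^ to_pcre r ^ ")"
```

Under this design, supporting another backend would mean writing another
printer (or interpreter) over the same tree, which only works if that
backend offers every construct the tree can express.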

Multilingualization is one of the current high-priority to-do items.
At least one user has asked me to support EUC-JP patterns,
and you might be the second person :-)
I am considering how to support M17N: it may depend on the
underlying backend (e.g. Camomile?), or it may be supported solely in the
frontend layer, by encoding multibyte handling into the regexps.
This trick is used in the Japanese port of the Perl interpreter for MS-DOS,
and in (at least) one of the Japanese-handling modules for Perl5.
# As you can imagine, just using the M17N feature of the underlying library is
# not sufficient: the internal regexp parser must also be modified to accept
# multibyte-encoded regular expressions. This is one of the reasons that
# the current Regexp/OCaml does not support the UTF8 option of PCRE/OCaml.
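
The frontend-only approach mentioned above can be sketched as follows:
each multibyte construct is rewritten into a byte-level pattern, so the
backend needs no knowledge of the encoding. This is only an illustration
for EUC-JP in PCRE-style syntax; the names `eucjp_char`, `hex_byte`,
`bytes_of`, `literal` and `char_class` are hypothetical and do not come
from Regexp/OCaml or from the Perl modules mentioned.

```ocaml
(* Byte-level pattern matching exactly one EUC-JP character.
   A frontend could substitute this for "." in the user's pattern. *)
let eucjp_char =
  "(?:[\\x00-\\x7f]"                     (* one-byte ASCII *)
  ^ "|\\x8e[\\xa1-\\xdf]"                (* SS2 + half-width katakana *)
  ^ "|\\x8f[\\xa1-\\xfe][\\xa1-\\xfe]"   (* SS3 + JIS X 0212 *)
  ^ "|[\\xa1-\\xfe][\\xa1-\\xfe])"       (* two-byte JIS X 0208 *)

(* Write one byte as a \xHH escape. *)
let hex_byte c = Printf.sprintf "\\x%02x" (Char.code c)

(* Write every byte of an encoded character as escapes. *)
let bytes_of s =
  String.concat "" (List.init (String.length s) (fun i -> hex_byte s.[i]))

(* A literal multibyte character, grouped so that a following
   postfix operator such as * applies to the whole character. *)
let literal s = "(?:" ^ bytes_of s ^ ")"

(* A character class over multibyte characters cannot be a byte class,
   so it becomes an alternation of byte sequences. *)
let char_class chars =
  "(?:" ^ String.concat "|" (List.map bytes_of chars) ^ ")"
```

For example, `char_class ["\xa4\xa2"; "\xa4\xa4"]` yields
`(?:\xa4\xa2|\xa4\xa4)`. Note that this rewriting has to happen inside the
regexp parser itself, which is why the parser must understand the encoding.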

As for supporting list-binding of Kleene stars, I am very interested in
richer backends which support such features.  Alain Frisch's recent
posting interested me.  There is also a talk with a related title at
ICFP04, although I have not yet read the paper.
However, I also feel that the backend is not currently a show-stopper:
having such backends would certainly be better, but the feature can be
emulated without them, as I have shown in the combinators.  I can wait a
while for theoretical/practical progress. The current problem is mainly
the frontend: there are many language-design problems once we introduce
nested bindings.
I have already had discussions with some people at ICFP04, and I hope to
have more.
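
To make "list-binding of Kleene stars" concrete, here is a small
self-contained sketch of what such a binding means: a variable bound
inside a starred sub-pattern yields a list with one element per
iteration. It uses hand-written greedy matchers without backtracking,
purely for illustration; the type `matcher` and the functions below are
hypothetical and are not the API of the announced combinator library.

```ocaml
(* A matcher reads the input from a position and, on success, returns
   the value it binds together with the position after the match. *)
type 'a matcher = string -> int -> ('a * int) option

(* One or more decimal digits, bound as a string. *)
let digits : string matcher = fun s pos ->
  let n = String.length s in
  let rec stop i =
    if i < n && s.[i] >= '0' && s.[i] <= '9' then stop (i + 1) else i in
  let e = stop pos in
  if e > pos then Some (String.sub s pos (e - pos), e) else None

(* A single fixed character, binding nothing. *)
let char_ c : unit matcher = fun s pos ->
  if pos < String.length s && s.[pos] = c then Some ((), pos + 1) else None

(* Sequence: bind a pair of both results. *)
let seq (a : 'a matcher) (b : 'b matcher) : ('a * 'b) matcher = fun s pos ->
  match a s pos with
  | None -> None
  | Some (x, p) ->
    (match b s p with
     | None -> None
     | Some (y, q) -> Some ((x, y), q))

(* Transform the bound value. *)
let map f (m : 'a matcher) : 'b matcher = fun s pos ->
  match m s pos with
  | None -> None
  | Some (x, p) -> Some (f x, p)

(* Kleene star: the binding becomes a list, one element per iteration.
   Iterations that consume no input stop the loop. *)
let star (m : 'a matcher) : 'a list matcher = fun s pos ->
  let rec loop acc p =
    match m s p with
    | Some (x, q) when q > p -> loop (x :: acc) q
    | _ -> Some (List.rev acc, p)
  in
  loop [] pos

(* Roughly "(digits as x ",")*" with x bound to a string list. *)
let items : string list matcher = star (map fst (seq digits (char_ ',')))
```

Applied as `items "10,20,30," 0`, this returns `Some (["10"; "20"; "30"], 9)`.
The language-design questions arise when such stars are nested, since a
variable inside two stars would naturally bind a list of lists.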

Yutaka Oiwa              Yonezawa Lab., Dept. of Computer Science,
      Graduate School of Information Sci. & Tech., Univ. of Tokyo.
PGP fingerprint = C9 8D 5C B8 86 ED D8 07  EA 59 34 D8 F4 65 53 61
