[Caml-list] ANNOUNCE: mod_caml 1.0.6 - includes security patch
From: Yutaka OIWA <oiwa@y...>
Subject: Re: [Caml-list] ANNOUNCE: mod_caml 1.0.6 - includes security patch
Hello.

>> On Fri, 16 Jan 2004 09:34:54 +0000, Richard Jones <rich@annexia.org> said:

Richard> Being able to write:

Richard> var ~ /ab+/

Richard> and similar certainly makes string handling and simple parsing a lot
Richard> easier.

>> On Fri, 16 Jan 2004 13:05:15 -0600 (CST), Brian Hurt <bhurt@spnz.org> said:

Brian> What I'd like to see is to be able to pattern match on regexs, like:

Brian> match str with
Brian> 	| /ab+/ -> ...
Brian> 	| /foo(bar)*/ -> ...

Brian> etc.

My camlp4 macro package, Regexp/OCaml, may address most of these
requests.  It is available at http://www.yl.is.s.u-tokyo.ac.jp/~oiwa/caml/ .

With Regexp/OCaml, you can write code such as

    Regexp.match str with
      "^(\d+)-(\d+)$" as f : int, t : int ->
        for i = f to t do
          printf "%d\n" i
        done
    | "^(\d+)$" as s : int ->
        printf "%d\n" s

This branches on several regular-expression patterns and automatically
extracts the matched substrings (bound to f, t, and s respectively,
after conversion to int with int_of_string).  See
http://www.yl.is.s.u-tokyo.ac.jp/~oiwa/pub/caml/regexp-pp-0.9.3/README.match-regexp
for further details.
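
For readers without the macro installed, here is a rough hand-written
approximation of what the example does.  It is only a sketch: it uses
the Str library from the standard OCaml distribution (link with str)
rather than PCRE/OCaml, so \d becomes [0-9], and it is not the macro's
real expansion.  The function name expand and the option result for
"no case matched" are assumptions made for illustration.

```ocaml
(* Approximation of the Regexp.match example using Str; not the
   code that Regexp/OCaml actually generates. *)

(* Str has no \d, so [0-9] stands in for it. *)
let range_re  = Str.regexp "^\\([0-9]+\\)-\\([0-9]+\\)$"
let single_re = Str.regexp "^\\([0-9]+\\)$"

(* Hypothetical helper: returns the numbers the example would print,
   or None when neither pattern matches (assumed behaviour). *)
let expand str =
  if Str.string_match range_re str 0 then begin
    (* Groups 1 and 2 play the role of f and t. *)
    let f = int_of_string (Str.matched_group 1 str) in
    let t = int_of_string (Str.matched_group 2 str) in
    Some (List.init (max 0 (t - f + 1)) (fun i -> f + i))
  end else if Str.string_match single_re str 0 then
    (* Group 1 plays the role of s. *)
    Some [int_of_string (Str.matched_group 1 str)]
  else
    None

let () =
  match expand "3-5" with
  | Some ns -> List.iter (Printf.printf "%d\n") ns
  | None -> ()
```

The manual group extraction and int_of_string calls are the part that
the "as f : int, t : int" clause writes for you.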


Brian> The compiler could then combine all the matchings into a single DFA, 
Brian> improving performance over code like:

Brian> if (regex_match str "ab+") then
Brian>     ...
Brian> else if (regex_match str "foo(bar)*") then
Brian>     ...
Brian> else 
Brian>     ...

The code currently generated by Regexp/OCaml is similar to the above,
except that each pattern is compiled only once per execution.  If the
backend regexp engine (Regexp/OCaml currently uses PCRE/OCaml)
supported optimized matching against multiple regular expressions,
Regexp/OCaml could easily make use of it.  Pattern analysis could also
be performed during the compilation (camlp4 translation) phase, if
required.
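
The "compile once, then try each pattern in order" strategy can be
sketched in plain OCaml as follows.  This is an illustration under
assumptions, not the macro's output: it uses Str instead of PCRE/OCaml,
uses lazy values as one possible way to get once-per-execution
compilation, and the names case, dispatch, and compile_count are made
up for the example.

```ocaml
(* Each case pairs a lazily compiled pattern with a handler.
   Lazy.force compiles on first use and reuses the result afterwards. *)
type 'a case = { re : Str.regexp Lazy.t; handle : string -> 'a }

(* Counts compilations; instrumentation for this sketch only. *)
let compile_count = ref 0

let case pattern handle =
  { re = lazy (incr compile_count; Str.regexp pattern); handle }

(* Try the cases in order; the first pattern that matches at the
   start of the string wins, as in an if / else-if chain. *)
let rec dispatch cases str =
  match cases with
  | [] -> None
  | c :: rest ->
      if Str.string_match (Lazy.force c.re) str 0
      then Some (c.handle str)
      else dispatch rest str

(* The two patterns from the quoted example, in Str syntax. *)
let cases = [
  case "ab+"           (fun _ -> "first");
  case "foo\\(bar\\)*" (fun _ -> "second");
]
```

Calling dispatch cases repeatedly leaves compile_count at no more than
two, however many strings are tested.  The patterns are still tried one
after another, though, which is the cost a combined DFA would avoid.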

Brian> The regex matching would also let the compiler know if there were possible 
Brian> unmatched strings (these would should up as transitions to the error state 
Brian> in the DFA).

This feature is not yet implemented in Regexp/OCaml, but because the
macro package has its own parser for regular-expression patterns, it
could be implemented once I have enough time.  (It is on my personal
to-do list for Regexp/OCaml.)

-- 
Yutaka Oiwa              Yonezawa Lab., Dept. of Computer Science,
      Graduate School of Information Sci. & Tech., Univ. of Tokyo.
      <oiwa@yl.is.s.u-tokyo.ac.jp>, <yutaka@oiwa.shibuya.tokyo.jp>
PGP fingerprint = C9 8D 5C B8 86 ED D8 07  EA 59 34 D8 F4 65 53 61
