| From: | Matt Armstrong <matt@l...> |
| Subject: | [Caml-list] Re: Regular expression library: a poll on features |
Xavier Leroy <xavier.leroy@inria.fr> writes:

[...]

> Feature 2: partial string matching as per Str.string_partial_match, i.e.
> the ability to recognize that a string is a prefix of a string that
> match a regexp.
>
>   has already used            0
>   could use in some cases     6
>   no use                      8

[...]

> Feature 2 is unusual and I haven't heard from anyone that uses it
> :-)  I got two replies suggesting one plausible scenario where
> partial matching could come handy: find delimiters in a piece of
> text that is being read block by block.  However, I'm not sure
> Str.string_partial_match is adequate here, it looks like a "search
> forward for a partial match" operation is needed, which Str doesn't
> provide...

This is how a MIME message parser I wrote worked. (It was written in a
scripting language in which byte-by-byte string comparisons were more
costly than regexps.)

The parser read in the message chunk by chunk. I had a list of regexps
representing the current set of MIME boundaries, and for each regexp I
wanted to know whether the last N bytes of the current chunk ended with
a (possibly partial) match. If there was a match and it wasn't
complete, I had to deal with a MIME boundary that might cross a chunk
boundary.

> It was also suggested to me that the effect of partial matching
> against a regexp R can be achieved by exact matching against a
> regexp R' derived from R.  This is true for "textbook regexps",
> e.g. if R is "ab*c", then R' would be
> "epsilon|a(epsilon|b*(epsilon|c))", but doesn't work for more
> complex regexps languages, especially if back-references are
> supported.  (Consider R = "(a+)\1".)

And in the MIME parser, this is what I did, since the regexps were
simple.

In OCaml, I'm not sure I would use regexps for this at all, since (I
assume) comparing strings "by hand" would be fast.
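As a rough illustration of the "by hand" approach, here is a minimal sketch that assumes each boundary is a fixed delimiter string rather than a regexp. The name pending_boundary_len and the "--frontier" delimiter are made up for the example; neither comes from Str nor from the parser described above.

```ocaml
(* Hypothetical helper: length of the longest suffix of [chunk] that is
   also a prefix of [boundary], or 0 if there is none.  A result equal
   to [String.length boundary] means the chunk ends with the complete
   delimiter; a smaller non-zero result means the delimiter may
   continue in the next chunk. *)
let pending_boundary_len ~boundary chunk =
  let n = String.length chunk and m = String.length boundary in
  (* True if the last [len] bytes of [chunk] equal the first [len]
     bytes of [boundary]. *)
  let tail_matches len =
    let rec go i =
      i = len || (chunk.[n - len + i] = boundary.[i] && go (i + 1))
    in
    go 0
  in
  (* Try the longest candidate first and shrink until one fits. *)
  let rec search len =
    if len = 0 then 0
    else if tail_matches len then len
    else search (len - 1)
  in
  search (min n m)
```

A reader loop could keep the last pending_boundary_len bytes of each chunk and prepend them to the next chunk before scanning again. Only the tail of the chunk is examined, so the cost per chunk is at most quadratic in the delimiter length and does not depend on the chunk size.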
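The R' construction quoted above can be sketched for textbook regexps as follows. The AST type, the prefixes function and the naive matcher are all assumptions made for this illustration and are not part of Str. The derived expression accepts the same strings as the R' in the quote but is not syntactically identical to it, and, as the quote points out, nothing like this carries over to back-references.

```ocaml
(* Textbook regexps only: no back-references, no anchors. *)
type re =
  | Eps                 (* the empty string *)
  | Chr of char         (* one literal character *)
  | Seq of re * re      (* r1 r2 *)
  | Alt of re * re      (* r1 | r2 *)
  | Star of re          (* r* *)

(* [prefixes r] accepts every prefix of every string accepted by [r]. *)
let rec prefixes = function
  | Eps -> Eps
  | Chr c -> Alt (Eps, Chr c)
  | Seq (a, b) -> Alt (prefixes a, Seq (a, prefixes b))
  | Alt (a, b) -> Alt (prefixes a, prefixes b)
  | Star a -> Seq (Star a, prefixes a)

(* Naive backtracking matcher.  [k] is the continuation, called with
   the position reached so far.  The [j > i] test stops [Star] from
   looping on an iteration that consumed nothing. *)
let rec run r s i k =
  match r with
  | Eps -> k i
  | Chr c -> i < String.length s && s.[i] = c && k (i + 1)
  | Seq (a, b) -> run a s i (fun j -> run b s j k)
  | Alt (a, b) -> run a s i k || run b s i k
  | Star a -> k i || run a s i (fun j -> j > i && run r s j k)

(* Exact match of the whole string [s] against [r]. *)
let matches r s = run r s 0 (fun j -> j = String.length s)

(* R = "ab*c" from the quoted example, and its derived R'. *)
let r = Seq (Chr 'a', Seq (Star (Chr 'b'), Chr 'c'))
let r' = prefixes r
```

With these definitions, matches r' "abb" holds while matches r "abb" does not, which is the behaviour partial matching is meant to provide.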
-------------------
To unsubscribe, mail caml-list-request@inria.fr
Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs
FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners