Browse thread
Thread safe Str
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: | 2005-01-11 (17:55) |
From: | skaller <skaller@u...> |
Subject: | Re: [Caml-list] Thread safe Str |
On Tue, 2005-01-11 at 23:08, Gerd Stolpmann wrote: > Capturing just for the purpose of string extraction is not problematic. Unfortunately you're wrong. I had in fact hoped this was not the case, but having read a couple of paper on it, I now know that, unfortunately, they are. (See below). Alain Frisch is one of the leaders in this area, perhaps he can explain it better. You can get Cardelli and Frisch paper here: http://felix.sf.net/papers/greedy.pdf another by Ville Laurikari here: http://felix.sf.net/papers/spire2000-tnfa.ps and his Master thesis on the topic here: http://felix.sf.net/papers/regex-submatch.ps [and if you can figure out how to actually build a tagged DFA after reading any of that please let me know .. ] Frisch's algorithm used in CDuce works with a forward pass that ignores captures, but records the automaton states, and then a backwards pass the extracts the information (CDuce actually builds trees). Unfortunately this has a problem: it requires a bidirectional iterator, whereas a DFA only requires an input iterator. (NFA's require forward iterators) -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net