Version française
Home     About     Download     Resources     Contact us    
Browse thread
Thread safe Str
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: skaller <skaller@u...>
Subject: Re: [Caml-list] Thread safe Str
On Tue, 2005-01-11 at 23:08, Gerd Stolpmann wrote:


> Capturing just for the purpose of string extraction is not problematic.

Unfortunately you're wrong. I had in fact hoped this was not
the case, but having read a couple of paper on it,
I now know that, unfortunately, they are. (See below).

Alain Frisch is one of the leaders in this area, perhaps
he can explain it better. You can get Cardelli and Frisch
paper here:

http://felix.sf.net/papers/greedy.pdf

another by Ville Laurikari here:

http://felix.sf.net/papers/spire2000-tnfa.ps

and his Master thesis on the topic here:

http://felix.sf.net/papers/regex-submatch.ps

[and if you can figure out how to actually build
a tagged DFA after reading any of that please let
me know .. ]

Frisch's algorithm used in CDuce works with a forward
pass that ignores captures, but records the automaton states,
and then a backwards pass the extracts the information
(CDuce actually builds trees).

Unfortunately this has a problem: it requires a
bidirectional iterator, whereas a DFA only requires
an input iterator. (NFA's require forward iterators)


-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net