Version française
Home     About     Download     Resources     Contact us    
Browse thread
Str.string_match incorrect
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Gerd Stolpmann <info@g...>
Subject: Re: [Caml-list] Str.string_match incorrect
On Mit, 2004-12-22 at 09:00, William Lovas wrote:
> On Tue, Dec 21, 2004 at 11:44:55PM -0800, Evan Martin wrote:
> > This is consistent with the docs, which say:
> >   [string_match r s start] tests whether the characters in s starting at
> >   position start match the regular expression r.
> > and in general with how regular expression systems work.  string_match
> > corresponds to running your automaton directly and seeing whether you
> > end up in an accept state, while string_partial_match effectively adds
> > an extra ".*" to the beginning.
> > (It's more debatable whether this makes sense.)
> 
> I concur with your assessment, but i think you're characterization of the
> semantics of string_partial_match is inaccurate:
> 
>     # Str.string_match (Str.regexp "ab") "a" 0;;
>     - : bool = false
>     # Str.string_partial_match (Str.regexp "ab") "a" 0;;
>     - : bool = true
>     # Str.string_match (Str.regexp ".*ab") "a" 0;;
>     - : bool = false
> 
> ... unless of course, i've misunderstood you.  I don't think there's a
> simple transformation on regular expressions that allow you to emulate
> string_partial_match's behavior using only string_match.  (Does anyone
> care to prove me wrong? :)

The difference between string_match and string_partial_match is quite
simple:

- string_match: whether a prefix of the string matches the regexp

- string_partial_match: whether the string is a prefix of another string
  matching the regexp

On the level of the automaton, string_match is true when the automaton
reaches an accept state whereas string_partial_match is true when the
automaton is in a valid state at the end of the string and there is
still a path to an accept state.

Of course, it is easy to emulate string_partial_match when you already
have the automaton - just make all states accepting that have a path to
an accept state. But there is not a comparably simple rule for the
regular expression. You can do an emulating transformation by generating
the automaton, adding the additional accept states, and transforming the
automaton back to the regexp. There is an algorithm for this, but the
length of the regexp explodes (exponentially when I remember correctly).

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
------------------------------------------------------------