Version française
Home     About     Download     Resources     Contact us    
Browse thread
substring match like "strstr"
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Julian Assange <proff@i...>
Subject: Re: substring match like "strstr"
Gerd Stolpmann <gerd@gerd-stolpmann.de> writes:

> >The results (execution time in seconds) were as follows.
> >
> >  strstr   55.74
> >  regexp  154.37
> >  OCamlA  302.57
> >  OCamlB  129.23
> 
> I did not expect that OCamlB performs so well; so I suggested OcamlA and
> regexp (which are both fast for the bytecode interpreter, too). I think it
> depends also on the problem size (especially on the length of the substring).
> 
> Gerd

If you are really calling strstr 400,000,000 times, I strongly suggest
the use of Boyer-Moore. Is your alphabet amino-acids or base-pairs? If
the latter, I have developed parallel hashing version of Boyer-More
(lets call it Assange-Boyer-More :), which beats the pants off
Knuth-Moris-Pratt. 500 long substrings searches are only about 50%
slower than 1, although for this application, if you have the memory,
you might also want to look at suffix trees or suffix arrays.

--
 Julian Assange        |If you want to build a ship, don't drum up people
                       |together to collect wood or assign them tasks
 proff@iq.org          |and work, but rather teach them to long for the endless
 proff@gnu.ai.mit.edu  |immensity of the sea. -- Antoine de Saint Exupery