Version française
Home     About     Download     Resources     Contact us    
Browse thread
substring match like "strstr"
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: eijiro_sumii@a...
Subject: Re: substring match like "strstr"
> You can write it yourself:
...
> If speed is important, I recommend 
> 
> let find_substring s1 s2 =
>   Str.search_forward (Str.regexp (Str.quote s2)) s1 0
> ;;

OK, I benchmarked the following 4 implementations for the purpose of
comparison.

  strstr
    stub to call "strstr" in libc

  regexp
    combination of "Str.regexp_string" and "Str.search_forward"

  OCamlA
    the simple implementation in OCaml given in the previous message,
    a little tuned for more efficiency

  OCamlB
    another straightforward implementation in OCaml, attached at the
    end of this message

The results (execution time in seconds) were as follows.

  strstr   55.74
  regexp  154.37
  OCamlA  302.57
  OCamlB  129.23

The application is a kind of library for data mining of gene
sequences.  It calls the substring matching function more than
400,000,000 times.  The machine is a Sun UltraEnterprise 450 with 4
UltraSPARC II (480 MHz) processors and 2GB main memory, but the
program itself doesn't make any use of multiple processors or much
space.

Although I haven't checked the reasons for these results in detail,
this might hopefully be of some information for other people, too.

And probably I should also consider implementing more sophisticated,
efficient algorithm in OCaml...

// Eijiro Sumii (http://www.yl.is.s.u-tokyo.ac.jp/~sumii/)
// 
// Ph.D. Student at Department of IS, University of Tokyo
// Visiting Scholar at Department of CIS, University of Pennsylvania

----------------------------------------------------------------------
let strstr pat str =
  let patlen = String.length pat in
  let strlen = String.length str in
  let rec is_prefix pos spos epos =
    if pos < 0 then true else
    if String.unsafe_get pat pos <>
      String.unsafe_get str epos then false else
    is_prefix (pos - 1) spos (epos - 1) in
  let rec search spos =
    let epos = spos + patlen - 1 in
    if epos >= strlen then raise Not_found else
    if is_prefix (patlen - 1) spos epos then spos else
    search (spos + 1) in
  search 0
----------------------------------------------------------------------