[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | eijiro_sumii@a... |
| Subject: | Re: substring match like "strstr" |
> You can write it yourself:
...
> If speed is important, I recommend
>
> let find_substring s1 s2 =
> Str.search_forward (Str.regexp (Str.quote s2)) s1 0
> ;;
OK, I benchmarked the following 4 implementations for the purpose of
comparison.
strstr
stub to call "strstr" in libc
regexp
combination of "Str.regexp_string" and "Str.search_forward"
OCamlA
the simple implementation in OCaml given in the previous message,
a little tuned for more efficiency
OCamlB
another straightforward implementation in OCaml, attached at the
end of this message
The results (execution time in seconds) were as follows.
strstr 55.74
regexp 154.37
OCamlA 302.57
OCamlB 129.23
The application is a kind of library for data mining of gene
sequences. It calls the substring matching function more than
400,000,000 times. The machine is a Sun UltraEnterprise 450 with 4
UltraSPARC II (480 MHz) processors and 2GB main memory, but the
program itself doesn't make any use of multiple processors or much
space.
Although I haven't checked the reasons for these results in detail,
this might hopefully be of some information for other people, too.
And probably I should also consider implementing more sophisticated,
efficient algorithm in OCaml...
// Eijiro Sumii (http://www.yl.is.s.u-tokyo.ac.jp/~sumii/)
//
// Ph.D. Student at Department of IS, University of Tokyo
// Visiting Scholar at Department of CIS, University of Pennsylvania
----------------------------------------------------------------------
let strstr pat str =
let patlen = String.length pat in
let strlen = String.length str in
let rec is_prefix pos spos epos =
if pos < 0 then true else
if String.unsafe_get pat pos <>
String.unsafe_get str epos then false else
is_prefix (pos - 1) spos (epos - 1) in
let rec search spos =
let epos = spos + patlen - 1 in
if epos >= strlen then raise Not_found else
if is_prefix (patlen - 1) spos epos then spos else
search (spos + 1) in
search 0
----------------------------------------------------------------------