Version française
Home     About     Download     Resources     Contact us    
Browse thread
substring match like "strstr"
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Chris Hecker <checker@d...>
Subject: Re: substring match like "strstr"

>  let i_limit = strlen - patlen in

Ah, duh.  Good idea.

Here are some new timings, including the _imp2 (I fixed a minor fencepost bug in yours in an admittedly cheesy way; the code's below):

Under Win2k cmd.exe:
strstr total = 0.610000 (5,0,19,13)
strstr_imp total = 0.593000 (5,0,19,13)
strstr_imp2 total = 0.563000 (5,0,19,13)
strstr_c total = 0.469000 (5,0,19,13)

Under Win98 command.com:
strstr total = 1.460000 (5,0,19,13)
strstr_imp total = 1.020000 (5,0,19,13)
strstr_imp2 total = 0.955000 (5,0,19,13)
strstr_c total = 0.800000 (5,0,19,13)

Under Win98 emacs shell-command:
strstr total = 1.125000 (5,0,19,13)
strstr_imp total = 1.025000 (5,0,19,13)
strstr_imp2 total = 0.960000 (5,0,19,13)
strstr_c total = 0.800000 (5,0,19,13)

I doubled the number of iterations to try to figure out the emacs/command.com difference, but to no avail.

However, then I decided to check to see if this was actually the strstr in the library, so instead of calling that c code, I called string.h's strstr.  I discovered that there's actually an intel-asm version of strstr that's compiled into the crt:

strstr total = 1.130000 (5,0,19,13)
strstr_imp total = 1.020000 (5,0,19,13)
strstr_imp2 total = 0.960000 (5,0,19,13)
strstr_c total = 0.365000 (5,0,19,13)

Ouch!  We're back to 3x slower.  But hey, no one said ocaml could compete with asm.  :)

Anyway, as it stands, the ocaml code is 17% slower than the C code at relatively comparable code-simplicity.  I'd say the ocaml code is a bit harder to understand and wordier, but that's mostly because ocaml doesn't have modifyable pointers, and the C code just walks the pointers through the array.  Also, the C has an early-out return that ocaml can't do (without an exception), and that makes things a little simpler as well.

Of course, I'm not saying you can extrapolate anything about real programs from this simple example, but it's interesting to me nonetheless.

Chris

Here's the updated strstr_imp2.  If you really wanted to try to max out the ocaml, we could read the pattern into an int and try to match it a word at a time, etc.

(* "optimized" imperative strstr *)
let strstr_imp2 pat str = 
  let strlen = String.length str in
  let patlen = String.length pat and
      i = ref 0 and
      f = ref strlen in
  let limit = strlen - patlen + 1 in
  while !i < limit do
    let pati = ref 0 and
        j = ref !i in
    while !pati < patlen && 
      String.unsafe_get str !j == String.unsafe_get pat !pati do
      pati := succ !pati;
      j := succ !j;
    done;
    if !pati = patlen then
      begin f := !i; i := strlen end
    else
      i := succ !i
  done;
  !f