English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    

Ce site est rarement mis à jour. Pour les informations les plus récentes, rendez-vous sur le nouveau site OCaml à l'adresse ocaml.org.

Browse thread
substring match like "strstr"
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2000-12-12 (09:32)
From: Chris Hecker <checker@d...>
Subject: Re: substring match like "strstr"

>  let i_limit = strlen - patlen in

Ah, duh.  Good idea.

Here are some new timings, including the _imp2 (I fixed a minor fencepost bug in yours in an admittedly cheesy way; the code's below):

Under Win2k cmd.exe:
strstr total = 0.610000 (5,0,19,13)
strstr_imp total = 0.593000 (5,0,19,13)
strstr_imp2 total = 0.563000 (5,0,19,13)
strstr_c total = 0.469000 (5,0,19,13)

Under Win98 command.com:
strstr total = 1.460000 (5,0,19,13)
strstr_imp total = 1.020000 (5,0,19,13)
strstr_imp2 total = 0.955000 (5,0,19,13)
strstr_c total = 0.800000 (5,0,19,13)

Under Win98 emacs shell-command:
strstr total = 1.125000 (5,0,19,13)
strstr_imp total = 1.025000 (5,0,19,13)
strstr_imp2 total = 0.960000 (5,0,19,13)
strstr_c total = 0.800000 (5,0,19,13)

I doubled the number of iterations to try to figure out the emacs/command.com difference, but to no avail.

However, then I decided to check to see if this was actually the strstr in the library, so instead of calling that c code, I called string.h's strstr.  I discovered that there's actually an intel-asm version of strstr that's compiled into the crt:

strstr total = 1.130000 (5,0,19,13)
strstr_imp total = 1.020000 (5,0,19,13)
strstr_imp2 total = 0.960000 (5,0,19,13)
strstr_c total = 0.365000 (5,0,19,13)

Ouch!  We're back to 3x slower.  But hey, no one said ocaml could compete with asm.  :)

Anyway, as it stands, the ocaml code is 17% slower than the C code at relatively comparable code-simplicity.  I'd say the ocaml code is a bit harder to understand and wordier, but that's mostly because ocaml doesn't have modifyable pointers, and the C code just walks the pointers through the array.  Also, the C has an early-out return that ocaml can't do (without an exception), and that makes things a little simpler as well.

Of course, I'm not saying you can extrapolate anything about real programs from this simple example, but it's interesting to me nonetheless.


Here's the updated strstr_imp2.  If you really wanted to try to max out the ocaml, we could read the pattern into an int and try to match it a word at a time, etc.

(* "optimized" imperative strstr *)
let strstr_imp2 pat str = 
  let strlen = String.length str in
  let patlen = String.length pat and
      i = ref 0 and
      f = ref strlen in
  let limit = strlen - patlen + 1 in
  while !i < limit do
    let pati = ref 0 and
        j = ref !i in
    while !pati < patlen && 
      String.unsafe_get str !j == String.unsafe_get pat !pati do
      pati := succ !pati;
      j := succ !j;
    if !pati = patlen then
      begin f := !i; i := strlen end
      i := succ !i