mboxlib reloaded ;-)
Date: -- (:)
From: Bruno De Fraine <Bruno.De.Fraine@v...>
Subject: ocamllex speed [was Re: [Caml-list] mboxlib reloaded ;-)]
Hello,

On 28 Apr 2007, at 01:12, Oliver Bandel wrote:
> So, I then checked my mboxlib and saw that it is quite slow,
> compared to what I expected (expect! I did not try it
> on my development machine because I have no mutt installed there)
> and even if native code is much faster, it's nevertheless slow...
> ...so I thought I have to redesign my scanner stage.
> (I use the Str module and ocamllex mixed together; maybe
>  using a plain self-written OCaml scanner might be better here).

I don't know if Oliver ever got to the bottom of this speed problem,
but I also noticed that ocamllex can be quite slow for simple scanning.
For example, I used this ocamllex source:

{ }
rule translate = parse
| "current_directory" { print_endline (Sys.getcwd ()); translate lexbuf }
| _ { translate lexbuf }
| eof { () }
{
     for i = 1 to Array.length Sys.argv - 1 do
         translate (Lexing.from_channel (open_in Sys.argv.(i)))
     done ;;
}

And compared it against this version using the Str module:

let re = Str.regexp_string "current_directory" ;;
for i = 1 to Array.length Sys.argv - 1 do
     let ch = open_in Sys.argv.(i) in
     try
         while true do
             let line = input_line ch in
             try
                 let _ = Str.search_forward re line 0 in
                 print_endline (Sys.getcwd ())
             with Not_found -> ()
         done
     with End_of_file -> close_in ch
done ;;

Neither version does anything useful except print the current
directory when it encounters the string "current_directory". I tested
this on a 57 MB text file (with only a few "current_directory"
occurrences). The ocamllex version takes about 3.5s, while the Str
version takes only 0.35s. What causes this difference? Perhaps there
is a high overhead in calling the translate function for every input
character in such big input files, but I don't know how this can be
avoided.
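
One way to probe that hypothesis is a plain hand-written scanner of the
kind Oliver mentions, which skips non-matching characters in a tight
loop instead of invoking a lexer action per character. The following is
only a minimal sketch under that assumption: the names
count_occurrences and scan_file are hypothetical, it was not part of
the measurements above, and whether it is actually faster would have to
be timed on the same input.

```ocaml
(* Hypothetical helper (not from the original post): count non-overlapping
   occurrences of [needle] in [haystack] using a plain loop. *)
let count_occurrences needle haystack =
  let n = String.length needle and h = String.length haystack in
  if n = 0 then invalid_arg "count_occurrences: empty needle";
  let count = ref 0 and i = ref 0 in
  while !i <= h - n do
    (* Only pay for a substring comparison when the first character matches. *)
    if haystack.[!i] = needle.[0] && String.sub haystack !i n = needle then begin
      incr count;
      (* Resume after the match, as the lexer rule would. *)
      i := !i + n
    end else
      incr i
  done;
  !count

(* Scan one file line by line and print the working directory once per
   occurrence of the keyword, mirroring the ocamllex version. *)
let scan_file path =
  let ch = open_in path in
  (try
     while true do
       let line = input_line ch in
       for _k = 1 to count_occurrences "current_directory" line do
         print_endline (Sys.getcwd ())
       done
     done
   with End_of_file -> ());
  close_in ch

(* Process every file named on the command line. *)
let () =
  for i = 1 to Array.length Sys.argv - 1 do
    scan_file Sys.argv.(i)
  done
```

Saved under a hypothetical name such as scan.ml, this needs only the
standard library, so "ocamlopt scan.ml -o scan" should be enough to
build it and run it on the same input files as the two versions above.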

Thanks,
Bruno