mboxlib reloaded ;-)
Date: 2007-09-24 (18:22)
From: Bruno De Fraine <Bruno.De.Fraine@v...>
Subject: ocamllex speed [was Re: [Caml-list] mboxlib reloaded ;-)]

On 28 Apr 2007, at 01:12, Oliver Bandel wrote:
> So, I then checked my mboxlib and saw that it is quite slow
> compared to what I expected (expect! I did not try it
> on my development machine because I have no mutt installed there),
> and even if native code is much faster, it's nevertheless slow...
> ...so I thought I have to redesign my scanner stage.
> (I use the Str module and ocamllex mixed together; maybe
>  using a plain self-written OCaml scanner might be better here.)

I don't know if Oliver ever got to the bottom of this speed problem,
but I have also noticed that ocamllex can be quite slow for simple scanning.
For example, I used this ocamllex source:

{ }
rule translate = parse
| "current_directory" { print_endline (Sys.getcwd ()); translate lexbuf }
| _ { translate lexbuf }
| eof { () }
{
     (* Run the scanner over every file named on the command line. *)
     for i = 1 to (Array.length Sys.argv - 1) do
         translate (Lexing.from_channel (open_in Sys.argv.(i)))
     done ;;
}

And compared it against this version using the Str module:

(* Needs the Str library at link time. *)
let re = Str.regexp_string "current_directory" ;;
for i = 1 to (Array.length Sys.argv - 1) do
     let ch = open_in Sys.argv.(i) in
     try
         while true do
             let line = input_line ch in
             try
                 (* Raises Not_found when the line has no match. *)
                 let _ = Str.search_forward re line 0 in
                 print_endline (Sys.getcwd ())
             with Not_found -> ()
         done
     with End_of_file -> close_in ch
done ;;
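
As an aside on Oliver's idea of a plain self-written scanner: the following is only a minimal sketch of what that might look like for this test, using nothing but the standard library. It is not one of the two programs measured here, and the names contains and count_matching_lines are hypothetical. It counts matching lines instead of printing the current directory, and the substring search is deliberately naive.

```ocaml
(* Hypothetical helper, not from the original post: true if [needle]
   occurs somewhere in [line]. Naive scan, no Str dependency. *)
let contains line needle =
  let n = String.length line and m = String.length needle in
  (* [at i j]: do needle.[j..] and line.[i+j..] agree up to the end of needle? *)
  let rec at i j = j = m || (line.[i + j] = needle.[j] && at i (j + 1)) in
  (* [from i]: try every start position that still leaves room for needle. *)
  let rec from i = i + m <= n && (at i 0 || from (i + 1)) in
  from 0

(* Count the lines of channel [ch] that contain [needle]. *)
let count_matching_lines ch needle =
  let count = ref 0 in
  (try
     while true do
       if contains (input_line ch) needle then incr count
     done
   with End_of_file -> ());
  !count

(* Same command-line convention as the programs above: one file per argument. *)
let () =
  for i = 1 to Array.length Sys.argv - 1 do
    let ch = open_in Sys.argv.(i) in
    let n = count_matching_lines ch "current_directory" in
    close_in ch;
    Printf.printf "%s: %d matching lines\n" Sys.argv.(i) n
  done
```

Like the Str program, this works line by line, so it calls into the matching code once per line rather than once per input character.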

Neither the ocamllex version nor the Str version does anything useful,
except print the current directory when it encounters the string
"current_directory". I tested both on a 57M text file (that has only a
few "current_directory" occurrences). The ocamllex version takes about
3.5s, while the Str version takes only 0.35s, a factor of ten. What
causes this difference? Perhaps there is a high overhead in calling the
translate function for every input character in such big input files,
but I don't know how this can be