mboxlib reloaded ;-)
Date: 2007-09-24 (18:22)
From: Bruno De Fraine <Bruno.De.Fraine@v...>
Subject: ocamllex speed [was Re: [Caml-list] mboxlib reloaded ;-)]

Hello,

On 28 Apr 2007, at 01:12, Oliver Bandel wrote:

> So, I then checked my mboxlib and saw that it is quite slow,
> compared to what I expected (expect! I did not try it
> on my development machine because I have no mutt installed there)
> and even if native code is much faster, it's nevertheless slow...
> ...so I thought I have to redesign my scanner stage.
> (I use the Str module and ocamllex mixed together; maybe
> using a plain self-written OCaml scanner might be better here).

I don't know if Oliver ever got to the bottom of this speed problem,
but I have also noticed that ocamllex can be quite slow for simple
scanning. For example, I used this ocamllex source:

    { }

    rule translate = parse
      | "current_directory" { print_endline (Sys.getcwd ()); translate lexbuf }
      | _                   { translate lexbuf }
      | eof                 { () }

    {
      for i = 1 to Array.length Sys.argv - 1 do
        translate (Lexing.from_channel (open_in Sys.argv.(i)))
      done ;;
    }

And compared it against this version using the Str module:

    let re = Str.regexp_string "current_directory" ;;

    for i = 1 to Array.length Sys.argv - 1 do
      let ch = open_in Sys.argv.(i) in
      try
        while true do
          let line = input_line ch in
          try
            let _ = Str.search_forward re line 0 in
            print_endline (Sys.getcwd ())
          with Not_found -> ()
        done
      with End_of_file -> close_in ch
    done ;;

Neither version does anything useful, except print the current
directory when it encounters the string "current_directory". (The two
are not strictly equivalent: the ocamllex version prints once per
occurrence, the Str version at most once per line.)

I tested this on a 57M text file that has only a few
"current_directory" occurrences. The ocamllex version takes about
3.5s, while the Str version takes only 0.35s, a factor of ten.

What causes this difference? Perhaps there is a high overhead in
calling the translate function for every input character in such big
input files, but I don't know how this can be avoided.

Thanks,
Bruno
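
For comparison, here is a minimal sketch of the "plain self-written
OCaml scanner" that the quoted message mentions, which uses neither
Str nor ocamllex. The names count_occurrences and scan_channel are
hypothetical, chosen only for this illustration, and the sketch has not
been timed against the two versions above, so no speed claim is made
for it. It prints once per occurrence, as the ocamllex version does.

```ocaml
(* Count non-overlapping occurrences of [needle] in [line] with a
   plain loop over string indices. *)
let count_occurrences needle line =
  let n = String.length needle and len = String.length line in
  (* True when needle.[j..] matches line starting at offset i + j. *)
  let rec matches_at i j =
    j = n || (line.[i + j] = needle.[j] && matches_at i (j + 1))
  in
  let rec go i acc =
    if n = 0 || i + n > len then acc
    else if matches_at i 0 then go (i + n) (acc + 1)
    else go (i + 1) acc
  in
  go 0 0

(* Read [ch] line by line and print the current directory once for
   every occurrence of "current_directory". *)
let scan_channel ch =
  try
    while true do
      let line = input_line ch in
      for _i = 1 to count_occurrences "current_directory" line do
        print_endline (Sys.getcwd ())
      done
    done
  with End_of_file -> ()
```

A driver would open each file named in Sys.argv, pass the channel to
scan_channel, and close it afterwards, in the same way as the Str
version above.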