mboxlib reloaded ;-)
| From: | Bruno De Fraine <Bruno.De.Fraine@v...> |
| Subject: | ocamllex speed [was Re: [Caml-list] mboxlib reloaded ;-)] |
Hello,
On 28 Apr 2007, at 01:12, Oliver Bandel wrote:
> So, I then checked my mboxlib and saw that it is quite slow,
> compared to what I expected (expect! I did not try it
> on my development machine because I have no mutt installed there)
> and even if native code is much faster, it's nevertheless slow...
> ...so I thought I have to redesign my scanner-stage.
> (I use Str-module and ocamllex mixed together; maybe
> using a plain selfwritten OCaml-scanner might be better here).
I don't know if Oliver ever got to the bottom of this speed problem,
but I have also noticed that ocamllex can be quite slow for simple
scanning. For example, I used this ocamllex source:
{ }
rule translate = parse
  | "current_directory" { print_endline (Sys.getcwd ()); translate lexbuf }
  | _                   { translate lexbuf }
  | eof                 { () }
{
  for i = 1 to Array.length Sys.argv - 1 do
    translate (Lexing.from_channel (open_in Sys.argv.(i)))
  done ;;
}
And compared it against this version using the Str module:
let re = Str.regexp_string "current_directory" ;;
for i = 1 to Array.length Sys.argv - 1 do
  let ch = open_in Sys.argv.(i) in
  try
    while true do
      let line = input_line ch in
      try
        let _ = Str.search_forward re line 0 in
        print_endline (Sys.getcwd ())
      with Not_found -> ()
    done
  with End_of_file -> close_in ch
done ;;
Neither version does anything useful, except print the current
directory when it encounters the string "current_directory". I tested
this on a 57M text file (that has only a few "current_directory"
occurrences). The ocamllex version takes about 3.5s, while the Str
version takes only 0.35s, a factor of ten. What causes this
difference? Perhaps there is a high overhead in calling the translate
function for every input character in such big input files, but I
don't know how this can be avoided.
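For comparison, the "plain selfwritten OCaml-scanner" Oliver mentions could look roughly like the sketch below: it scans a whole line inside one function call instead of making one call per character. This is only an illustration, not something that was timed here; the names find_sub and count_sub are made up for this example and are not part of Str or the standard library, and the matching is deliberately naive.

```ocaml
(* Hypothetical helper (not from Str or the stdlib): return the index of
   the first occurrence of [needle] in [hay] at or after [start], or
   raise Not_found. Naive matching, no preprocessing of the needle. *)
let find_sub needle hay start =
  let n = String.length needle and h = String.length hay in
  (* true if needle.[j..] equals hay.[i+j..] for the rest of the needle *)
  let rec matches_at i j =
    j = n || (hay.[i + j] = needle.[j] && matches_at i (j + 1))
  in
  let rec go i =
    if i + n > h then raise Not_found
    else if matches_at i 0 then i
    else go (i + 1)
  in
  go (max 0 start)

(* Hypothetical helper: count non-overlapping occurrences of [needle]
   in [hay], resuming the search just past each match. *)
let count_sub needle hay =
  let n = String.length needle in
  let rec loop pos acc =
    match find_sub needle hay pos with
    | i -> loop (i + max 1 n) (acc + 1)
    | exception Not_found -> acc
  in
  loop 0 0
```

In the Str version above, the call to Str.search_forward could be swapped for find_sub "current_directory" line 0, which raises Not_found in the same way. Whether that is faster than either of the two versions would have to be measured.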
Thanks,
Bruno