zcat vs CamlZip
Date: 2006-08-29 (18:40)
From: Sam Steingold <sds@p...>
Subject: zcat vs CamlZip

I read through a huge *.gz file. I have two versions of the code:

1. use Unix.open_process_in "zcat foo.gz".
2. use gzip.mli (1.2 2002/02/18) as comes with godi 3.09.

It turns out that the zcat version is 3(!) times as fast as the gzip.mli one:

    Run time: 189.435840 sec
      Self: 189.435840 sec
        sys: 183.447465 sec
        user: 5.988375 sec
      Children: 0.000000 sec
        sys: 0.000000 sec
        user: 0.000000 sec
    GC: minor: 169778 major: 478 compactions: 3
    Allocated: 5510457762.0 words
    Wall clock: 206 sec (00:03:26)

vs

    Run time: 58.471655 sec
      Self: 54.855429 sec
        sys: 48.527033 sec
        user: 6.328396 sec
      Children: 3.616226 sec
        sys: 3.168198 sec
        user: 0.448028 sec
    GC: minor: 43174 major: 229 compactions: 5
    Allocated: 1401290543.0 words
    Wall clock: 78 sec (00:01:18)

Since gzip.mli lacks an input_line function, I had to roll my own:

    let buf = Buffer.create 1024

    let gz_input_line gz_in char_counter line_counter =
      Buffer.clear buf;
      let finish () = incr line_counter; Buffer.contents buf in
      let rec loop () =
        let ch = Gzip.input_char gz_in in
        char_counter := Int64.succ !char_counter;
        if ch = '\n' then finish () else (
          Buffer.add_char buf ch;
          loop ();
        ) in
      try loop ()
      with End_of_file ->
        if Buffer.length buf = 0 then raise End_of_file
        else finish ()

Is there something wrong with my gz_input_line? Is this a known performance issue with the CamlZip library?

Thanks.
Sam.
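
For comparison, here is a minimal sketch of a line reader that pulls data in blocks instead of one character at a time. It is not a diagnosis of the slowdown, only a variant that might be worth timing against gz_input_line. The names reader, make_reader and next_line are made up for this example, and the read callback is an assumption: any function that fills a byte buffer and returns the number of bytes read, with 0 meaning end of input.

```ocaml
(* Hypothetical block-buffered line reader. [read buf pos len] is assumed
   to store at most [len] bytes into [buf] starting at [pos] and to return
   the number of bytes stored, 0 at end of input. *)
type reader = {
  read : bytes -> int -> int -> int; (* block-read callback (assumed) *)
  chunk : bytes;                     (* refill buffer *)
  mutable pos : int;                 (* next unread byte in chunk *)
  mutable len : int;                 (* number of valid bytes in chunk *)
  line : Buffer.t;                   (* accumulates the current line *)
}

let make_reader ?(size = 65536) read =
  { read; chunk = Bytes.create size; pos = 0; len = 0;
    line = Buffer.create 1024 }

(* Refill the chunk from the callback; false means end of input. *)
let refill r =
  r.pos <- 0;
  r.len <- r.read r.chunk 0 (Bytes.length r.chunk);
  r.len > 0

(* Return the next line without its trailing newline.
   Raises End_of_file once the input is exhausted. *)
let next_line r =
  Buffer.clear r.line;
  let rec loop () =
    if r.pos >= r.len && not (refill r) then begin
      (* End of input: hand back a final unterminated line, if any. *)
      if Buffer.length r.line = 0 then raise End_of_file
      else Buffer.contents r.line
    end else begin
      (* Scan the valid part of the chunk for a newline. *)
      let stop = ref r.pos in
      while !stop < r.len && Bytes.get r.chunk !stop <> '\n' do
        incr stop
      done;
      Buffer.add_subbytes r.line r.chunk r.pos (!stop - r.pos);
      if !stop < r.len then begin
        r.pos <- !stop + 1; (* skip the newline itself *)
        Buffer.contents r.line
      end else begin
        r.pos <- r.len;     (* chunk used up, line continues *)
        loop ()
      end
    end
  in
  loop ()
```

If the installed CamlZip exposes a block read such as Gzip.input with a compatible signature (an assumption to check against the gzip.mli at hand), it could be passed as the read callback after partial application to the channel. The character and line counters from the original are left out here and would have to be added back by the caller.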