zcat vs CamlZip
Date: 2006-08-30 (06:03)
From: Jeff Henrikson <jehenrik@y...>
Subject: Re: [Caml-list] zcat vs CamlZip

I was planning on using the library "ocaml gz" in my application, which is a
binding to zlib. I haven't done any detailed benchmarking, but I presume its
speed is comparable to gzip/gunzip, since they just call out to zlib.

http://ocamlplot.sourceforge.net/

Jeff Henrikson

Sam Steingold wrote:
> I read through a huge *.gz file.
> I have two versions of the code:
>
> 1. use Unix.open_process_in "zcat foo.gz".
>
> 2. use gzip.mli (1.2 2002/02/18) as comes with godi 3.09.
>
> it turns out that the zcat version is 3(!) times as fast as the
> gzip.mli one:
>
> Run time: 189.435840 sec
>   Self: 189.435840 sec
>     sys: 183.447465 sec
>     user: 5.988375 sec
>   Children: 0.000000 sec
>     sys: 0.000000 sec
>     user: 0.000000 sec
> GC: minor: 169778
>     major: 478
>     compactions: 3
> Allocated: 5510457762.0 words
> Wall clock: 206 sec (00:03:26)
>
> vs
>
> Run time: 58.471655 sec
>   Self: 54.855429 sec
>     sys: 48.527033 sec
>     user: 6.328396 sec
>   Children: 3.616226 sec
>     sys: 3.168198 sec
>     user: 0.448028 sec
> GC: minor: 43174
>     major: 229
>     compactions: 5
> Allocated: 1401290543.0 words
> Wall clock: 78 sec (00:01:18)
>
> since gzip.mli lacks input_line function, I had to roll my own:
>
> let buf = Buffer.create 1024
> let gz_input_line gz_in char_counter line_counter =
>   Buffer.clear buf;
>   let finish () = incr line_counter; Buffer.contents buf in
>   let rec loop () =
>     let ch = Gzip.input_char gz_in in
>     char_counter := Int64.succ !char_counter;
>     if ch = '\n' then finish () else ( Buffer.add_char buf ch; loop (); ) in
>   try loop ()
>   with End_of_file ->
>     if Buffer.length buf = 0 then raise End_of_file else finish ()
>
> is there something wrong with my gz_input_line?
> is this a known performance issue with the CamlZip library?
>
> thanks.
> Sam.
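
The quoted gz_input_line makes one Gzip.input_char call per byte. Whether that accounts for the timings above has not been measured here, but a common alternative is to read in blocks and split lines out of the block. The sketch below is one way to do that, not CamlZip's own API: the names reader, make_reader, next_line and reader_of_string are hypothetical. It only assumes a block-read function of shape bytes -> pos -> len -> count that returns 0 at end of input; recent CamlZip releases are assumed to expose Gzip.input with that shape (the 2002 gzip.mli would have taken a string buffer). The string-backed reader is a stand-in so the block runs without CamlZip.

```ocaml
(* Line reader over any block-read function [read buf pos len] that
   returns the number of bytes read, or 0 at end of input. *)
type reader = {
  read : bytes -> int -> int -> int;  (* assumed shape of Gzip.input gz_in *)
  chunk : bytes;                      (* block buffer *)
  mutable pos : int;                  (* next unread index in [chunk] *)
  mutable len : int;                  (* number of valid bytes in [chunk] *)
  line : Buffer.t;                    (* accumulates the current line *)
}

let make_reader ?(chunk_size = 65536) read =
  { read; chunk = Bytes.create chunk_size; pos = 0; len = 0;
    line = Buffer.create 1024 }

(* Refill the block buffer; returns false at end of input. *)
let refill r =
  r.len <- r.read r.chunk 0 (Bytes.length r.chunk);
  r.pos <- 0;
  r.len > 0

(* Index of the first '\n' in the unread part of the block, if any. *)
let rec find_newline r i =
  if i >= r.len then None
  else if Bytes.get r.chunk i = '\n' then Some i
  else find_newline r (i + 1)

(* Next line without its trailing '\n'. Raises End_of_file only when
   no data at all is left, like the quoted gz_input_line. *)
let next_line r =
  Buffer.clear r.line;
  let rec loop () =
    if r.pos >= r.len && not (refill r) then begin
      if Buffer.length r.line = 0 then raise End_of_file
      else Buffer.contents r.line
    end else
      match find_newline r r.pos with
      | Some i ->
        Buffer.add_subbytes r.line r.chunk r.pos (i - r.pos);
        r.pos <- i + 1;
        Buffer.contents r.line
      | None ->
        (* No newline in this block: keep what we have and refill. *)
        Buffer.add_subbytes r.line r.chunk r.pos (r.len - r.pos);
        r.pos <- r.len;
        loop ()
  in
  loop ()

(* Stand-in data source for trying the reader without CamlZip. *)
let reader_of_string ?(chunk_size = 4) s =
  let off = ref 0 in
  make_reader ~chunk_size (fun buf pos len ->
    let n = min len (String.length s - !off) in
    Bytes.blit_string s !off buf pos n;
    off := !off + n;
    n)
```

With CamlZip installed, the same reader would presumably be built as make_reader (Gzip.input (Gzip.open_in "foo.gz")), and the character and line counters from the original could be updated by the caller from the length of each returned line.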