zcat vs CamlZip
Date: -- (:)
From: Jeff Henrikson <jehenrik@y...>
Subject: Re: [Caml-list] zcat vs CamlZip
I was planning to use the "ocaml gz" library, a binding to zlib, in my 
application.  I haven't done any detailed benchmarking, but I presume 
its speed is comparable to gzip/gunzip, since they just call out to 
zlib.

http://ocamlplot.sourceforge.net/


Jeff Henrikson



Sam Steingold wrote:

> I read through a huge *.gz file.
> I have two versions of the code:
>
> 1. use Unix.open_process_in "zcat foo.gz".
>
> 2. use gzip.mli (1.2 2002/02/18) as comes with godi 3.09.
>
> it turns out that the zcat version is 3(!) times as fast as the 
> gzip.mli one:
>
> Run time: 189.435840 sec
> Self:     189.435840 sec
>      sys: 183.447465 sec
>     user: 5.988375 sec
> Children: 0.000000 sec
>      sys: 0.000000 sec
>     user: 0.000000 sec
> GC:     minor: 169778
>         major: 478
>   compactions: 3
> Allocated:  5510457762.0 words
> Wall clock:  206 sec (00:03:26)
>
> vs
>
> Run time: 58.471655 sec
> Self:     54.855429 sec
>      sys: 48.527033 sec
>     user: 6.328396 sec
> Children: 3.616226 sec
>      sys: 3.168198 sec
>     user: 0.448028 sec
> GC:     minor: 43174
>         major: 229
>   compactions: 5
> Allocated:  1401290543.0 words
> Wall clock:  78 sec (00:01:18)
>
> since gzip.mli lacks an input_line function, I had to roll my own:
>
> let buf = Buffer.create 1024
> let gz_input_line gz_in char_counter line_counter =
>   Buffer.clear buf;
>   let finish () = incr line_counter; Buffer.contents buf in
>   let rec loop () =
>     let ch = Gzip.input_char gz_in in
>     char_counter := Int64.succ !char_counter;
>     if ch = '\n' then finish ()
>     else ( Buffer.add_char buf ch; loop (); ) in
>   try loop ()
>   with End_of_file ->
>     if Buffer.length buf = 0 then raise End_of_file else finish ()
>
> is there something wrong with my gz_input_line?
> is this a known performance issue with the CamlZip library?
>
> thanks.
> Sam.
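
The gz_input_line quoted above makes one Gzip.input_char call and one Buffer.add_char call per byte. Whether that explains the measured gap is not established in this thread. As a point of comparison, here is a minimal sketch of a line reader that pulls data in blocks and scans each block for '\n'. It is written against any function of the shape read buf pos len that returns the number of bytes stored and 0 at end of input. The names reader, make_reader, refill and next_line are hypothetical and not part of CamlZip.

```ocaml
(* Sketch: line reader over a block-read function.
   [read buf pos len] stores up to [len] bytes in [buf] starting at [pos]
   and returns how many were stored, 0 meaning end of input.
   All names here are hypothetical, chosen for illustration only. *)

type reader = {
  read : bytes -> int -> int -> int;  (* block-read function *)
  chunk : bytes;                      (* refill buffer *)
  mutable pos : int;                  (* next unread byte in [chunk] *)
  mutable len : int;                  (* number of valid bytes in [chunk] *)
  line : Buffer.t;                    (* accumulates the current line *)
}

let make_reader ?(size = 65536) read =
  { read; chunk = Bytes.create size; pos = 0; len = 0;
    line = Buffer.create 1024 }

(* Refill [chunk] from the source; returns false at end of input. *)
let refill r =
  r.len <- r.read r.chunk 0 (Bytes.length r.chunk);
  r.pos <- 0;
  r.len > 0

(* Return the next line without its '\n'.
   Raises End_of_file once no data is left. *)
let next_line r =
  Buffer.clear r.line;
  let rec loop () =
    if r.pos >= r.len && not (refill r) then begin
      if Buffer.length r.line = 0 then raise End_of_file
      else Buffer.contents r.line
    end else
      match Bytes.index_from_opt r.chunk r.pos '\n' with
      | Some i when i < r.len ->
        (* newline inside the valid part of the chunk: line is complete *)
        Buffer.add_subbytes r.line r.chunk r.pos (i - r.pos);
        r.pos <- i + 1;
        Buffer.contents r.line
      | _ ->
        (* no newline in the valid part: keep what we have and refill *)
        Buffer.add_subbytes r.line r.chunk r.pos (r.len - r.pos);
        r.pos <- r.len;
        loop ()
  in
  loop ()
```

Assuming a CamlZip version in which Gzip.input has type in_channel -> bytes -> int -> int -> int (older releases took a string buffer instead), it could be used as let r = make_reader (Gzip.input gz_in) followed by repeated next_line r calls. The character and line counters from the original function are left out to keep the sketch short.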