zcat vs CamlZip
From: Sam Steingold <sds@p...>
Subject: Re: zcat vs CamlZip
Bardur Arantsson wrote:
> Sam Steingold wrote:
>> I read through a huge *.gz file.
>> I have two versions of the code:
> [--snip--]
>>
>> let buf = Buffer.create 1024
>> let gz_input_line gz_in char_counter line_counter =
>>   Buffer.clear buf;
>>   let finish () = incr line_counter; Buffer.contents buf in
>>   let rec loop () =
>>     let ch = Gzip.input_char gz_in in
> 
> This is your most likely culprit. Any kind of "do this for every 
> character" is usually insanely expensive when you can do it in bulk.
> (This is especially true when needing to do system calls, or if the 
> called function cannot be inlined.)
> 

Yes, I thought about it, but I assumed that the OCaml Gzip module
inlines Gzip.input_char. (Obviously the module needs an internal
buffer, so Gzip.input_char does not _always_ translate to a system call;
most of the time it just pops a char from that buffer.)
At any rate, do you really expect that using Gzip.input and then
searching the result for newlines, slicing and dicing to get the
individual input lines, etc., would be faster?

Sam.
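
For reference, the bulk approach discussed in this message could be sketched roughly as follows. This is only an illustration, not code from the thread: `iter_lines`, `reader_of_string`, the 64 KiB chunk size and the file name `huge.gz` are all hypothetical, and the reader is passed in as a function so the sketch runs with the standard library alone. It assumes `Gzip.input gz_in` has the same shape as `Stdlib.input ic` (fill part of a `bytes` buffer, return the count, 0 at end of input), which should be checked against the installed camlzip version. Whether this beats the per-character loop is something to measure, not assume.

```ocaml
(* [read buf pos len] stores up to [len] bytes into [buf] starting at [pos]
   and returns how many were stored; 0 means end of input. *)
let iter_lines ?(chunk_size = 65536) read f =
  let chunk = Bytes.create chunk_size in
  (* Holds the fragment of a line that straddles a chunk boundary. *)
  let partial = Buffer.create 1024 in
  let rec scan start len =
    (* Emit every complete line found in chunk.[start .. len-1]. *)
    if start < len then
      match Bytes.index_from_opt chunk start '\n' with
      | Some nl when nl < len ->
        Buffer.add_subbytes partial chunk start (nl - start);
        f (Buffer.contents partial);
        Buffer.clear partial;
        scan (nl + 1) len
      | _ ->
        (* No newline in the valid part of the chunk (a match at or past
           [len] is stale data): carry the tail over to the next read. *)
        Buffer.add_subbytes partial chunk start (len - start)
  in
  let rec fill () =
    let n = read chunk 0 chunk_size in
    if n > 0 then begin scan 0 n; fill () end
    else if Buffer.length partial > 0 then
      (* Final line without a trailing newline. *)
      f (Buffer.contents partial)
  in
  fill ()

(* Hypothetical stand-in for a gzip channel, used only to exercise the code. *)
let reader_of_string s =
  let pos = ref 0 in
  fun buf off len ->
    let n = min len (String.length s - !pos) in
    Bytes.blit_string s !pos buf off n;
    pos := !pos + n;
    n

let () =
  iter_lines (reader_of_string "first\nsecond\nthird") print_endline

(* With camlzip this would presumably become (untested assumption):
     let gz_in = Gzip.open_in "huge.gz" in
     iter_lines (Gzip.input gz_in) (fun line -> ignore line);
     Gzip.close_in gz_in *)
```

The point of the design is that the newline search runs over a whole chunk at a time, and a string is allocated once per line, so the per-character function call disappears from the inner loop. Counters such as `char_counter` and `line_counter` from the original code could be updated inside the callback.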