zcat vs CamlZip
Date: 2006-08-29 (19:15)
From: Sam Steingold <sds@p...>
Subject: Re: zcat vs CamlZip
Bardur Arantsson wrote:
> Sam Steingold wrote:
>> I read through a huge *.gz file.
>> I have two versions of the code:
> [--snip--]
>>
>> let buf = Buffer.create 1024
>> let gz_input_line gz_in char_counter line_counter =
>>   Buffer.clear buf;
>>   let finish () = incr line_counter; Buffer.contents buf in
>>   let rec loop () =
>>     let ch = Gzip.input_char gz_in in
>
> This is your most likely culprit. Any kind of "do this for every
> character" is usually insanely expensive when you can do it in bulk.
> (This is especially true when needing to do system calls, or if the
> called function cannot be inlined.)
>

Yes, I thought about it, but I assumed that the OCaml Gzip module
inlines Gzip.input_char. (Obviously the Gzip module needs an internal
buffer, so Gzip.input_char does not _always_ translate into a system
call; most of the time it just pops a char from that buffer.)

At any rate, do you really expect that calling Gzip.input, then
searching the result for newlines and slicing out the individual
lines, etc., would be faster?

Sam.
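To make the "do it in bulk" alternative concrete, here is a minimal sketch (not measured, and not code from this thread) of a line reader that fetches a large block per call and scans it for newlines, instead of asking for one character at a time. To keep it self-contained it does not depend on CamlZip: it takes a `refill` function with the shape of `Stdlib.input` (`bytes -> int -> int -> int`, returning 0 at end of input). The names `reader`, `make_reader`, `next_line` and the 64 KiB default chunk size are hypothetical choices for illustration. Whether this actually beats `Gzip.input_char` depends on how CamlZip implements the per-character path, which is exactly the open question here.

```ocaml
(* Hypothetical bulk line reader: pull a whole chunk with one [refill]
   call, then scan it for '\n', rather than fetching one char per call.
   [refill buf pos len] must behave like Stdlib.input: store up to [len]
   bytes at offset [pos] of [buf], return the count, 0 = end of input. *)
type reader = {
  refill : bytes -> int -> int -> int;
  chunk : bytes;         (* raw data from the last refill *)
  mutable pos : int;     (* next unread byte in [chunk] *)
  mutable len : int;     (* number of valid bytes in [chunk] *)
  line : Buffer.t;       (* accumulates the current line *)
}

let make_reader ?(chunk_size = 65536) refill =
  { refill; chunk = Bytes.create chunk_size; pos = 0; len = 0;
    line = Buffer.create 1024 }

(* Returns the next line without its trailing '\n'.
   Raises End_of_file when no data remains. *)
let next_line r =
  Buffer.clear r.line;
  let rec loop () =
    if r.pos >= r.len then begin
      (* chunk exhausted: fetch the next block in a single call *)
      r.len <- r.refill r.chunk 0 (Bytes.length r.chunk);
      r.pos <- 0
    end;
    if r.len = 0 then begin
      (* end of input: return a final unterminated line, if any *)
      if Buffer.length r.line = 0 then raise End_of_file;
      Buffer.contents r.line
    end else
      match Bytes.index_from_opt r.chunk r.pos '\n' with
      | Some i when i < r.len ->
          (* newline inside the valid part of the chunk: one bulk copy *)
          Buffer.add_subbytes r.line r.chunk r.pos (i - r.pos);
          r.pos <- i + 1;
          Buffer.contents r.line
      | _ ->
          (* no newline in the valid bytes (any match past [len] is
             stale data): keep the rest and refill *)
          Buffer.add_subbytes r.line r.chunk r.pos (r.len - r.pos);
          r.pos <- r.len;
          loop ()
  in
  loop ()
```

With CamlZip, the reader would be created as `make_reader (Gzip.input gz_in)`, assuming a CamlZip version whose `Gzip.input` has type `in_channel -> bytes -> int -> int -> int` (older releases used `string` for the buffer). The only per-character work left is the newline scan in `Bytes.index_from_opt`; the decompressor is entered once per chunk instead of once per character.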