From: Eric Cooper <ecc@c...>
Subject: integration of compression with channels
I was interested to see Zack's work on integrating gzip and bzip2 with
I/O channels:
    http://upsilon.cc/~zack/blog/posts/2008/11/ocaml_batteries_gzip/

I initially tried something like this in the approx proxy server, but
found out the hard way that it was difficult to deal with corrupt .gz
files.  When decompressing on the fly, you might only discover the
corruption after reading garbage for a while, and an exception at that
point would be unexpected.

Eventually I switched to spawning a "gunzip" process that decompresses
into a temporary file, and then reading that file.  In addition to
detecting corruption early, this was also significantly faster than
CamlZip.
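A minimal sketch of that approach, assuming a `gunzip` executable on the PATH and using only the OCaml standard library (4.08 or later for `Fun.protect`). The names `gunzip_to_temp`, `read_lines` and `with_gunzipped` are hypothetical, not taken from approx.  The point is that a corrupt archive makes gunzip exit with a non-zero status before the caller has read a single byte.

```ocaml
(* Decompress [path] into a fresh temporary file by running an external
   "gunzip -c" (assumed to be on PATH).  Any non-zero exit status, including
   gzip's "warning" status, is treated as failure: the temp file is removed
   and Failure is raised before the caller sees any data. *)
let gunzip_to_temp path =
  let tmp = Filename.temp_file "gunzip" ".tmp" in
  let cmd =
    Printf.sprintf "gunzip -c %s > %s" (Filename.quote path) (Filename.quote tmp)
  in
  match Sys.command cmd with
  | 0 -> tmp
  | code ->
      Sys.remove tmp;
      failwith (Printf.sprintf "gunzip failed on %s (exit status %d)" path code)

(* Read a plain (already decompressed) file as a list of lines.  Corruption
   has been ruled out by this point, so only ordinary I/O errors remain. *)
let read_lines file =
  let ic = open_in file in
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file ->
        close_in ic;
        List.rev acc
  in
  loop []

(* Decompress [path], pass the temp file name to [f], and always clean up. *)
let with_gunzipped path f =
  let tmp = gunzip_to_temp path in
  Fun.protect ~finally:(fun () -> Sys.remove tmp) (fun () -> f tmp)
```

Usage would be something like `with_gunzipped "Packages.gz" read_lines`: a corrupt file raises `Failure` up front, rather than partway through processing its contents.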

I suppose one could argue that you can get an I/O error even from
reading an uncompressed file (bad disk block, or whatever), and that
a robust program should be equally prepared to deal with that.
But I think there's a real difference in practice.

The integrated approach is definitely more elegant, and perhaps its
performance will be competitive someday.  So I'd be interested to hear
if anyone has a better way of handling potentially corrupt files.

-- 
Eric Cooper             e c c @ c m u . e d u