integration of compression with channels
Date: -- (:)
From: Stefano Zacchiroli <zack@u...>
Subject: Re: [Caml-list] integration of compression with channels
On Tue, Nov 04, 2008 at 07:50:20AM -0500, Eric Cooper wrote:
> I initially tried something like this in the approx proxy server, but
> found out the hard way that it was difficult to deal with corrupt .gz
> files.  You might only discover the corruption after reading garbage
> for a while, and an exception at that point would be unexpected.

I think you are fighting an intrinsic, underlying problem.

Let's take the extreme end of integrity checks: a checksum over the
whole file. To verify it you need to read the entire file first,
compute its checksum, and compare it with the expected checksum.
Abstractions like channels, on the other hand, are precisely meant to
read files in a streaming fashion, rather than all at once.

Bottom line: there is a trade-off between "streamability" and integrity
checks, and it is up to you to choose where to place yourself on that
trade-off.
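
To make the two ends of the trade-off concrete, here is a minimal
sketch using only the standard library. Adler-32 is just a stand-in
for whatever checksum the file format uses, and the names
`stream_checked`, `buffered_checked` and `Corrupt` are hypothetical,
not part of Camlzip.

```ocaml
(* Toy incremental checksum (Adler-32), a stand-in for the real one. *)
let adler32_init = (1, 0)

let adler32_update (a, b) s =
  let a = ref a and b = ref b in
  String.iter
    (fun c ->
      a := (!a + Char.code c) mod 65521;
      b := (!b + !a) mod 65521)
    s;
  (!a, !b)

let adler32_finish (a, b) = (b lsl 16) lor a

(* Hypothetical exception signalling a checksum mismatch. *)
exception Corrupt

(* Streaming end: each chunk reaches the consumer as soon as it is
   read, so a mismatch can only be reported after the last chunk. *)
let stream_checked ~expected chunks consume =
  let st =
    List.fold_left
      (fun st chunk ->
        consume chunk;
        adler32_update st chunk)
      adler32_init chunks
  in
  if adler32_finish st <> expected then raise Corrupt

(* Buffered end: check everything first, so the consumer never sees
   data from a corrupt input, at the cost of holding it all. *)
let buffered_checked ~expected chunks consume =
  let st = List.fold_left adler32_update adler32_init chunks in
  if adler32_finish st <> expected then raise Corrupt;
  List.iter consume chunks
```

With a wrong `expected` value, both functions raise `Corrupt`, but the
consumer passed to `stream_checked` has already seen every chunk by
then, while the one passed to `buffered_checked` has seen nothing.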

Actually, often it is not even up to you, but rather up to the file
format you are reading.  I don't know the gory details of the GZip
format, but Camlzip does some sanity checks on GZip headers, spotting
*some* of the possible header corruptions. It might be that you hit
corruption cases that Camlzip does not check for; in that case the
proper solution is to add those checks to Camlzip.  On the other hand,
if you want to spot in advance corruptions which occur later on in the
compressed file (and I don't know if GZip supports that or not) you
have no choice but to buffer.
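
The buffering option could be sketched as follows, again with the
standard library only. `slurp` and `reader_of_string` are hypothetical
names; `read` is assumed to behave like a channel-style input function
(it returns the number of bytes read, 0 at end of input, and raises on
corrupt data). With Camlzip, a partial application such as
`Gzip.input ic` would presumably fit that slot, but that pairing is an
assumption and has not been tested here.

```ocaml
(* Drain a reader completely before handing anything to the caller.
   Any exception raised while reading is returned as [Error], so the
   caller never receives a partial, possibly garbage, result. *)
let slurp (read : bytes -> int -> int -> int) : (string, exn) result =
  let chunk = Bytes.create 4096 in
  let acc = Buffer.create 4096 in
  let rec loop () =
    match read chunk 0 (Bytes.length chunk) with
    | 0 -> Ok (Buffer.contents acc)
    | n ->
      Buffer.add_subbytes acc chunk 0 n;
      loop ()
  in
  try loop () with e -> Error e

(* Test double: reads from [s], and raises [Failure] on any call made
   once at least [k] bytes have been delivered, if [fail_at] is
   [Some k]. This imitates corruption found late in the input. *)
let reader_of_string ~fail_at s =
  let pos = ref 0 in
  fun buf off len ->
    (match fail_at with
     | Some k when !pos >= k -> failwith "corrupt input"
     | _ -> ());
    let n = min len (String.length s - !pos) in
    Bytes.blit_string s !pos buf off n;
    pos := !pos + n;
    n

let () =
  match slurp (reader_of_string ~fail_at:(Some 3) "hello") with
  | Ok data -> print_string data
  | Error _ -> prerr_endline "input rejected before any use"
```

The price is memory proportional to the uncompressed size, which is
exactly what a channel abstraction was meant to avoid.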

Cheers.

-- 
Stefano Zacchiroli -*- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è sempre /oo\ All one has to do is hit the right
uno zaino        -- A.Bergonzoni \__/ keys at the right time -- J.S.Bach