Date: 2008-11-04 (13:35)
From: Stefano Zacchiroli <zack@u...>
Subject: Re: [Caml-list] integration of compression with channels

On Tue, Nov 04, 2008 at 07:50:20AM -0500, Eric Cooper wrote:
> I initially tried something like this in the approx proxy server, but
> found out the hard way that it was difficult to deal with corrupt .gz
> files. You might only discover the corruption after reading garbage
> for a while, and an exception at that point would be unexpected.

I think you are fighting an intrinsic, underlying problem.

Let's take the extreme end of integrity checks: a checksum on the whole
file. To check that, you need to see the whole file in advance, compute
its checksum, and compare it with the expected checksum. On the other
hand, abstractions like channels are precisely meant to read files in a
streaming fashion, rather than all at once.

Bottom line: there is a trade-off between "streamability" and integrity
checks; it is up to you to choose where to put yourself in the
trade-off. Actually, often it is not even up to you, but rather up to
the file format you are reading.

I don't know the gory details of the GZip format, but Camlzip does some
sanity checks on GZip headers, spotting *some* of the possible header
corruptions. It might be that you hit some corruption cases that Camlzip
does not check for; in that case the proper solution is to add those
checks to Camlzip.

On the other hand, if you want to spot in advance corruptions which
occur later on in the compressed file (and I don't know if GZip supports
that or not), you have no choice besides buffering.

Cheers.

-- 
Stefano Zacchiroli -*- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è sempre /oo\ All one has to do is hit the right
uno zaino -- A.Bergonzoni        \__/ keys at the right time -- J.S.Bach
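
The buffering end of the trade-off discussed above can be illustrated with
a minimal OCaml sketch that uses only the standard library. It reads an
entire channel into memory, compares its digest with an expected one, and
only then hands the data to the caller. The names read_all, read_verified
and Corrupt_input are hypothetical and not part of Camlzip, and MD5 (via
the Digest module) merely stands in for whatever checksum the file format
actually provides.

```ocaml
(* Hypothetical helpers for illustration only; not part of Camlzip. *)
exception Corrupt_input

(* Read everything left on [ic] into memory: this is the buffering step
   that gives up streaming. *)
let read_all (ic : in_channel) : string =
  let buf = Buffer.create 65536 in
  let chunk = Bytes.create 65536 in
  let rec loop () =
    let n = input ic chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf

(* Return the data only if its MD5 digest matches [expected]; the caller
   never sees a single byte of a corrupt input. *)
let read_verified ~(expected : Digest.t) (ic : in_channel) : string =
  let data = read_all ic in
  if Digest.string data <> expected then raise Corrupt_input;
  data
```

The price is visible in the types: the caller gets a complete string
rather than a channel, so memory use grows with the size of the input.
With a streaming reader the same corruption could only be reported after
some of the data had already been consumed.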