Checksum mismatch for ocaml-3.12-refman.html.tar.gz #5381

Closed
vicuna opened this issue Oct 23, 2011 · 10 comments

Comments

@vicuna

vicuna commented Oct 23, 2011

Original bug ID: 5381
Reporter: gerd
Assigned to: @damiendoligez
Status: closed (set by @xavierleroy on 2015-12-11T18:19:28Z)
Resolution: won't fix
Priority: normal
Severity: minor
Version: 3.12.1
Category: documentation
Related to: #6022
Monitored by: @protz @ygrek

Bug description

Obviously, this file was modified in the download directory, because the checksum changed recently.

I do not know the reason, and can only speculate: Maybe the site was hacked, and an intruder replaced the file. Maybe there was a patch release under the same file name.

Additional information

If the change is because of a patch release, please don't do this. All distributors run into problems when a file is updated under the same name, because we cannot distinguish between intentional and unintentional changes, and our distribution mechanisms do not handle this type of change well. Either the file is rejected because an intrusion is suspected, or the update is simply ignored and goes unnoticed. Replication mechanisms are also sometimes confused. I'm reporting as GODI maintainer, but I know this problem exists basically everywhere in one form or another.

Workaround: add a patch version number to the file.

@vicuna

vicuna commented Dec 20, 2011

Comment author: @xavierleroy

I can confirm that, on our side, the file and its MD5 as written in MD5SUM agree, and moreover neither the file nor MD5SUM were changed after the release (Aug 29).

If I download ocaml-3.12-refman.html.tar.gz using wget, I obtain the right file (length 526923) with the right MD5. If I download using "save to file" in Mozilla or Chrome, I get a different file of length 521527 and "tar tzf" in Ubuntu complains about that file, but "tar tzf" in MacOSX is happy with it. Safari, on the other hand, gives me the right file but strips the .gz from its name.

This is all very confusing. Maybe the Web server at caml.inria.fr is wrongly configured or interacts bizarrely with clients that advertise more capabilities than wget. Suggestions or analyses from Web experts would be welcome.
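
To make the comparison above repeatable, here is a minimal Python sketch that reports both the MD5 digest and the byte length of a downloaded file. The function names are hypothetical, and the expected digest has to be copied from the published MD5SUM file, since it is not quoted in this thread.

```python
import hashlib


def md5_and_length(path):
    """Return (hex MD5 digest, size in bytes) for the file at `path`."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a large archive does not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def matches_release(path, expected_md5, expected_length):
    """True only if both the digest and the length match the published values."""
    actual_md5, actual_length = md5_and_length(path)
    return actual_md5 == expected_md5.lower() and actual_length == expected_length
```

Called as `matches_release("ocaml-3.12-refman.html.tar.gz", md5_from_md5sum, 526923)`, this would accept the wget download and reject the 521527-byte file saved by the browsers, assuming `md5_from_md5sum` holds the digest from MD5SUM.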

@vicuna

vicuna commented Dec 20, 2011

Comment author: @protz

The server is sending an invalid file (it's not even possible to extract it) when talking to Firefox. Investigating.

@vicuna

vicuna commented Dec 20, 2011

Comment author: @protz

Firefox gets:

HTTP/1.1 200 OK
Date: Tue, 20 Dec 2011 08:57:28 GMT
Server: Apache/2.2.16 (Debian)
Last-Modified: Fri, 29 Jul 2011 13:33:47 GMT
Etag: "176028-80a4b-4a93554bee4c0"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: application/x-gzip

Wget gets:

HTTP/1.1 200 OK
Date: Tue, 20 Dec 2011 08:58:41 GMT
Server: Apache/2.2.16 (Debian)
Last-Modified: Fri, 29 Jul 2011 13:33:47 GMT
ETag: "176028-80a4b-4a93554bee4c0"
Accept-Ranges: bytes
Content-Length: 526923
Vary: Accept-Encoding
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: application/x-gzip

What happens is Firefox, seeing gzip-compressed data, decides to de-compress it on the fly, because it's a standard practice to send responses over an HTTP channel compressed in gzip format. However, it still saves the file with the .tar.gz extension. Moreover, it looks like the data is improperly decompressed by Firefox.

I'm not sure whether this is a bug in Firefox or not. I think the server is sending borderline HTTP headers.

@vicuna

vicuna commented Dec 20, 2011

Comment author: @ygrek

When the client indicates Accept-Encoding: gzip, the server compresses the tar.gz with gzip once more and answers with Content-Encoding: gzip. Probably the browsers are not very comfortable with this? Opera saved a double-compressed file too; curl behaves as expected (with and without --compressed). AFAIK it is standard practice to disable compression for already compressed files.
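
The double compression described here can be reproduced offline. The sketch below uses only Python's standard `gzip` and `zlib` modules; the payload is a stand-in for the real archive, and `gzip_layers` is a hypothetical helper for checking how many gzip layers wrap a saved file.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def gzip_layers(data, limit=4):
    """Count how many nested gzip layers wrap `data` (at most `limit`)."""
    layers = 0
    while layers < limit and data[:2] == GZIP_MAGIC:
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            break  # starts like gzip but is truncated or corrupt
        layers += 1
    return layers


# Stand-in for the .tar.gz as stored on the server (already one gzip layer).
tarball = gzip.compress(b"stand-in for the tar archive")
# What the server sends when it adds Content-Encoding: gzip on top.
on_the_wire = gzip.compress(tarball)
# A client that undoes the content encoding exactly once gets the stored file back.
decoded_once = gzip.decompress(on_the_wire)
```

Under these assumptions, `decoded_once` is byte-identical to `tarball`, so its checksum matches. A client that saves `on_the_wire` unchanged ends up with two layers, and one that decodes too far (or incorrectly) ends up with something other than the published file.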

@vicuna

vicuna commented Dec 20, 2011

Comment author: gerd

I meant something else. Apparently there are two versions of ocaml-3.12-refman.html.tar.gz: One from July 16, 2010, and one from July 29, 2011. The old one has been archived by the GODI system: http://ocaml-programming.de/godi-backup/ocaml-3.12-refman.html.tar.gz.

What probably happened is that the manual was updated when 3.12.1 came out. If I diff the files I get a number of differences in the wording. I am not criticizing the update itself (of course not), but rather the reuse of the same file name. This is not transparent to users (as far as I remember, it is the first time this has happened for a minor OCaml update), and the packaging systems go crazy when there are two versions of the same file (and apparently humans do too, see the previous notes). I would not be surprised if other distributors also ran into the same problem, or did not even notice that there was an update and are shipping 3.12.1 with the old manual.

Regarding the "save the gz file" issue: Historically, this never worked well because of a bug introduced in early Netscape versions (file.gz was saved as file w/ compression), and so servers had to work around this, and so newer browsers just duplicate the bug that is now considered "standard". The lesson: Never download a .gz file with a browser. It activates "workarounds" that are just wrong in 50% of the cases. Use wget or curl instead.

@vicuna

vicuna commented Dec 20, 2011

Comment author: @protz

Xavier, or anyone who can modify the server configuration: from https://bugs.launchpad.net/ubuntu/+source/apache/+bug/220171, it looks like

SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|zip|tar\.gz|t?gz|bz2)$ no-gzip dont-vary

in Apache's .htaccess or httpd.conf should fix the download problem. http://httpd.apache.org/docs/2.0/mod/mod_deflate.html seems to be pretty good too. I'll let others argue about the rewording issue.
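
As a quick sanity check of which request URIs such a rule would exempt from compression, the pattern can be tried with Python's `re` module, which handles this particular expression the same way as Apache's PCRE. This sketch assumes the intended pattern has its dots escaped and reads `jpe?g`; `skips_compression` is a hypothetical helper, not part of Apache.

```python
import re

# Case-insensitive, unanchored at the start, anchored at the end of the URI,
# mirroring how SetEnvIfNoCase applies a regex to Request_URI.
NO_GZIP = re.compile(r"\.(?:gif|jpe?g|png|zip|tar\.gz|t?gz|bz2)$", re.IGNORECASE)


def skips_compression(request_uri):
    """True if the rule would set no-gzip for this request URI."""
    return NO_GZIP.search(request_uri) is not None


for name in ("ocaml-3.12-refman.html.tar.gz",
             "ocaml-3.12-refman.html.zip",
             "index.html"):
    print(name, skips_compression(name))
```

Both archive names match, so the server would send them without an extra Content-Encoding layer, while ordinary HTML pages would still be compressed.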

@vicuna

vicuna commented Jan 11, 2012

Comment author: @protz

I think this is also responsible for http://caml.inria.fr/~xleroy/ and others not being displayable (this website uses an unsupported form of compression).

@vicuna

vicuna commented Jan 17, 2012

Comment author: @lefessan

I tested on my own Apache server, which exhibits the same problem, and it is clear that the problem appears when a ".html." substring is found in the name of the archive.
Renaming "ocaml-3.12-refman.html.tar.gz" to "ocaml-3.12-refman-html.tar.gz" solved the problem. Unfortunately, I have no direct access to caml.inria.fr.

@vicuna

vicuna commented Jan 19, 2012

Comment author: @damiendoligez

OK, I'm renaming the file to ocaml-3.12.1-refman-html.tar.gz and from now on I will refrain from changing the file without changing the version number.

What should I do with the current file? Delete it? Restore the previous version? Leave it as it is?

Should I also rename ocaml-3.12-refman.html.zip, or is it treated correctly by the servers and browsers?

@vicuna

vicuna commented Jan 20, 2012

Comment author: @ygrek

I believe this is a webserver misconfiguration/bug - probably http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=565626 has something to do with this.
