New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binaries are not reproducible (temporary filenames are in the object files) #7037
Comments
Comment author: @hannesm The direct link to the diff between two compilations is: https://reproducible.debian.net/dbd/unstable/amd64/ocaml_4.02.3-5.debbindiff.html and text version https://reproducible.debian.net/dbdtxt/unstable/amd64/ocaml_4.02.3-5.debbindiff.txt |
Comment author: @gasche Thanks for starting this conversation. I think this issue is I had a quick look at the Debian patchset. (I think some of them 0010-Enable-ocamldoc-to-build-reproducible-manpages.patch: This 0010-Add-a-.file-directive-to-generated-.s-files.patch/: This The second patch looks like it is the cause of the diff you
It is not clear to me which way would be best. Stéphane Glondu |
Comment author: @hannesm The 0010-Add-a-.file-directive-to-generated-.s-files.patch inserts the .file directive using Location.input_name, which is /tmp/ocamlppXXXX -- without .file the linker adds the filename of the object file. I like the proposal to retain the original filename in the .file directive. This is then, as far as I understand, also useful for debugging (where a /tmp/ocamlppXXXX does not help, neither does any static string). |
Comment author: @gasche I rebased Stéphane's patch 0010-Add-a-.file-directive-to-generated-.s-files.patch on top of the current trunk, and uploaded the rebased patch here as add-a-file-directive-to-generated-s-files-rebased.patch in the "Attached Files" section above. |
Comment author: @gasche I am not sure what is the exact command used by the "reproducible Debian" folks to show the diff they have. Just doing "readelf -all ocamlopt.opt" does not suffice to see /tmp/* files in the output, so I use the following command find . -type f | xargs readelf -all 2>/dev/null | grep FILE | grep tmp |
Comment author: @gasche I just uploaded a second patch ( PR7037-fix-erroneous-Location.input_name-setting-in-Pparse-file.patch ) that fixes the /tmp/ocamlpp appearing in the I am unable to observe any other non-deterministic name appearing in this output, so it might be the case that this part of the build process is now deterministic. I rather suspect, however, that I don't look hard enough. (During testing I have seen "camlstartup"-prefixed random temporary files in the ELF output, but cleaning my development repository removed them and they haven't re-appeared.) Do you (hannes) know whether it would be possible to re-run the Debian testing machinery to see whether these patches indeed solve the problem? If so, I can propose these patches (and the ocamldoc-related one) for inclusion. |
Comment author: lamby I had a quick look at this last night, and alas the patch does not apply cleanly to the version in sid so I was unable to test it right then.. However, it seems like some of the issues the patch is intending to fix are already fixed elsewhere.. or I am just conflating them. Basically, I'm raising it now in case we can avoid (more?) duplicated work. The issues I am think of are: Debian#795784: ocamlopt startup file randomness ends up in final executable - https://bugs.debian.org/795784 .. and this patch links to #795784 and #796336, both already closed. |
Comment author: @gasche lamby: how can I get "the version in sid" to provide patches on top of? If there is a reasonably easy way for me to test for reproducibility directly, that could also let us iterate faster. Thanks for the pointers to the additional Debian bugtracker issues. They contain commands to reproduce the issues, which is very helpful -- to test whether the issue is only solved in Debian-patched ocaml, or also in upstream ocaml. I think a good and reasonable goal would be to aim for a reproducible upstream (although I don't know what maintainers will think of changes to support reproducibility convention that are Debian-specific currently, so there is no guarantee that everything will be upstreamed). |
Comment author: @glondu For the record, I was planning to submit reproducibility patches upstream eventually. For that, I was (first) waiting the transition to 4.02.3 to finish (it finished last week, on November 1st). During the transition, we noticed that one of the patches was bogus (see: https://anonscm.debian.org/cgit/pkg-ocaml-maint/packages/ocaml.git/commit/?id=0f93e9ee91dfa37f2e6209c306fe8f2dbc46e540 ). gasche: your "fix" was already tried (cf. commit above), but it breaks "ocamldoc -pp". I was rather thinking about adding a new reference in Location, say Location.original_file_name, that would contain the original file name, and using it for .file instead of Location.input_name. I didn't do it during the transition because it would break the ABI of Location. |
Comment author: @gasche Very good, I'll wait for the patches you want to propose. I am surprised that "ocamldoc -pp" is broken, because I checked that every call to Pparse in ocamldoc is dominated by an assignment to Location.input_name. Would you be willing to add a regression test for "ocamldoc -pp" in testsuite/tests/tools-ocamldoc, to be sure this feature is properly exercised by our test suite? Before implementing this one-line patch, I did a more complicated patch that propagated the "proper" input name through Pparse (instead of using a global mutable variable) -- but then I noticed that the one-line patch seemed to suffice. I'm uploading it (Pparse.file-correctly-set-input_name-to-the-before-p.patch) in case it can help. |
Comment author: @xavierleroy Rather than try to figure out a stable name for the input file, why not just emit
at the beginning of every assembly file? If we're compiling in -g mode, precise .file declarations will be generated later in the .s file, anyway. Also, quoting the Info manual for GNU as,
STRING is the new file name. In general, the filename is recognized Should we really rely on this statement not to "go away in the future"? |
Comment author: @glondu Using `.file ""' would be fine I guess. It is what gcc does when compiling standard input. It would solve many (if not all) reproducibility issues related to temporary file names. |
Comment author: @gasche Just one point of release-management information: there will be a feature freeze somewhere between mid- and end-December, in preparation for the next (4.03) release in first quarter 2016. If we want to patch the compiler to get reproducible builds (or at least to improve reproducibility), we need a concrete proposal before then. glondu, would you be able to provide a patch series in that time frame? Of course there is no particular hurry. We are trying to move to a shorter release cycle (6 months ?), so hopefully 4.04 will not be long to come. |
Comment author: @gasche I looked at the details of the issue Stéphane Glondu reported with my first one-line patch. I think the related ocamldoc-issue can be fixed and I uploaded an equally simple patch A bit of context first. This patch had been written independently by Stéphane and included in Debian as-is for a while, until the Debian bug report I analyzed the "ocamldoc -pp" in more details, and I think I understand where the issue comes from. ocamldoc re-opens source files to get comments, and at this point it needs to open the post-processing source, not the pre-processing source (whose path we want in !Location.input_file). Unfortunately some parts of odoc_analyse.ml use !Location.input_file to get the path of the post-processing source (which is currently correct, but a mistake in my opinion), while they have access to another variable that always denote the post-processing path (as returned by Pparse.preprocess). Fixing these two lines of code to use [input_file] instead of [!Location.input_name] lets us change !Location.input_name to point to the original source path without breaking "ocamldoc -pp". I tested this new patch against mlpost (the Debian package whose documentation was broken by the first patch); I could check that "make doc" breaks with the first patch, but that this new patch makes it work properly again. I inspected all uses of Location.input_name in ocamldoc, and I believe that there are no further issues in ocamldoc. |
Comment author: @xavierleroy Fixed in commit [trunk eef84c4] by emitting a '.file ""' directive at the beginning of every generated asm file. |
Original bug ID: 7037
Reporter: @hannesm
Status: closed (set by @xavierleroy on 2017-02-16T14:18:10Z)
Resolution: fixed
Priority: high
Severity: minor
Version: 4.02.3
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: back end (clambda to assembly)
Tags: junior_job
Monitored by: @glondu @jmeber @yallop
Bug description
When compiling OCaml, temporary files are generated, and their filenames end up in the resulting object file. There's no need for that.
The filenames should not end up in the binary at all (since these files are removed anyways).
Additional information
The reproducible builds project provides a web interface to look into differences: https://reproducible.debian.net/rb-pkg/unstable/amd64/ocaml.html
Additionally, they have an issue about OCaml https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=794583
The concrete diff can be seen at https://reproducible.debian.net/rb-pkg/unstable/amd64/ocaml.html
Debian uses several patches already https://sources.debian.net/src/ocaml/4.02.3-5/debian/patches/
File attachments
The text was updated successfully, but these errors were encountered: