|Anonymous | Login | Signup for a new account||2016-09-27 09:05 CEST|
|Main | My View | View Issues | Change Log | Roadmap|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0007037||OCaml||OCaml backend (code generation)||public||2015-11-06 17:44||2015-12-13 18:41|
|Target Version||4.03.0+dev / +beta1||Fixed in Version||4.03.0+dev / +beta1|
|Summary||0007037: Binaries are not reproducible (temporary filenames are in the object files)|
|Description||When compiling OCaml, temporary files are generated, and their filenames end up in the resulting object file. There's no need for that.|
The filenames should not end up in the binary at all (since these files are removed anyways).
|Additional Information||The reproducible builds project provides a web interface to look into differences: https://reproducible.debian.net/rb-pkg/unstable/amd64/ocaml.html [^]|
Additionally, they have an issue about OCaml https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=794583 [^]
The concrete diff can be seen at https://reproducible.debian.net/rb-pkg/unstable/amd64/ocaml.html [^]
Debian uses several patches already https://sources.debian.net/src/ocaml/4.02.3-5/debian/patches/ [^]
|Attached Files|| add-a-file-directive-to-generated-s-files-rebased.patch [^] (3,792 bytes) 2015-11-07 14:25 [Show Content]
PR7037-fix-erroneous-Location.input_name-setting-in-Pparse-file.patch [^] (1,195 bytes) 2015-11-07 15:08 [Show Content]
Pparse.file-correctly-set-input_name-to-the-before-p.patch [^] (4,709 bytes) 2015-11-09 10:44 [Show Content]
PR7037-fix-erroneous-Location.input_name-setting-in-Pparse-file-bis.patch [^] (2,949 bytes) 2015-12-06 17:13 [Show Content]
edited on: 2015-11-06 18:20
The direct link to the diff between two compilations is: https://reproducible.debian.net/dbd/unstable/amd64/ocaml_4.02.3-5.debbindiff.html [^] and text version https://reproducible.debian.net/dbdtxt/unstable/amd64/ocaml_4.02.3-5.debbindiff.txt [^]
Thanks for starting this conversation. I think this issue is
important and have been thinking of starting looking at it for
OCaml. The "reproducible Debian" folks are doing good work.
I had a quick look at the Debian patchset. (I think some of them
could be considered for inclusion again, in particular
Debian patches look related to reproducible builds:
change looks like a good idea in the context of reproducible
build. It implements a Debian-specific
convention (letting SOURCE_DATE_EPOCH override current time for
timestamps used for information on the build environment), but
I would be in favor of integrating it.
change adds a "file" directive to produced assembly outputs. It
replaces one form of non-reproducibility (random filenames
generated at assembly time) by another (random filenames
generated at preprocessing time) that affect less OCaml source
files (only the preprocessed ones).
The second patch looks like it is the cause of the diff you
observe -- rather than upstream code. I suppose that this could
be fixed by either:
- use a deterministic open_temp_file version for -pp
processing (this is what is suggested by the Debian
bug report), or
- threading the source name of the *original* file (not the
preprocessing file) to the assembly backend
It is not clear to me which way would be best. Stéphane Glondu
seems (from the dicussion in the bug report) to have thought
about this issue in more details, maybe he would be interested in
participating in the disucssion again. I'll send him an email.
The 0010-Add-a-.file-directive-to-generated-.s-files.patch inserts the .file directive using Location.input_name, which is /tmp/ocamlppXXXX -- without .file the linker adds the filename of the object file.
I like the proposal to retain the original filename in the .file directive. This is then, as far as I understand, also useful for debugging (where a /tmp/ocamlppXXXX does not help, neither does any static string).
I rebased Stéphane's patch
on top of the current trunk, and uploaded the rebased patch here as
in the "Attached Files" section above.
I am not sure what is the exact command used by the "reproducible Debian" folks to show the diff they have. Just doing "readelf -all ocamlopt.opt" does not suffice to see /tmp/* files in the output, so I use the following command
find . -type f | xargs readelf -all 2>/dev/null | grep FILE | grep tmp
edited on: 2015-11-07 15:12
I just uploaded a second patch ( PR7037-fix-erroneous-Location.input_name-setting-in-Pparse-file.patch ) that fixes the /tmp/ocamlpp appearing in the `readelf` output. This patch applies cleanly over both trunk and 4.02.3.
I am unable to observe any other non-deterministic name appearing in this output, so it might be the case that this part of the build process is now deterministic. I rather suspect, however, that I don't look hard enough.
(During testing I have seen "camlstartup"-prefixed random temporary files in the ELF output, but cleaning my development repository removed them and they haven't re-appeared.)
Do you (hannes) know whether it would be possible to re-run the Debian testing machinery to see whether these patches indeed solve the problem? If so, I can propose these patches (and the ocamldoc-related one) for inclusion.
I had a quick look at this last night, and alas the patch does not apply cleanly to the version in sid so I was unable to test it right then..
However, it seems like some of the issues the patch is intending to fix are already fixed elsewhere.. or I am just conflating them. Basically, I'm raising it now in case we can avoid (more?) duplicated work.
The issues I am think of are:
Debian#795784: ocamlopt startup file randomness ends up in final executable - https://bugs.debian.org/795784 [^]
Debian#796336: ocamlopt -pack produces non-reproducible output - https://bugs.debian.org/796336 [^]
Debian#794586: ocaml: Please make ocamldoc produce reproducible manpages - https://bugs.debian.org/794586 [^]
.. and this patch links to #795784 and #796336, both already closed.
lamby: how can I get "the version in sid" to provide patches on top of? If there is a reasonably easy way for me to test for reproducibility directly, that could also let us iterate faster.
Thanks for the pointers to the additional Debian bugtracker issues. They contain commands to reproduce the issues, which is very helpful -- to test whether the issue is only solved in Debian-patched ocaml, or also in upstream ocaml. I think a good and reasonable goal would be to aim for a reproducible upstream (although I don't know what maintainers will think of changes to support reproducibility convention that are Debian-specific currently, so there is no guarantee that everything will be upstreamed).
For the record, I was planning to submit reproducibility patches upstream eventually. For that, I was (first) waiting the transition to 4.02.3 to finish (it finished last week, on November 1st). During the transition, we noticed that one of the patches was bogus (see: https://anonscm.debian.org/cgit/pkg-ocaml-maint/packages/ocaml.git/commit/?id=0f93e9ee91dfa37f2e6209c306fe8f2dbc46e540 [^] ).
gasche: your "fix" was already tried (cf. commit above), but it breaks "ocamldoc -pp". I was rather thinking about adding a new reference in Location, say Location.original_file_name, that would contain the original file name, and using it for .file instead of Location.input_name. I didn't do it during the transition because it would break the ABI of Location.
edited on: 2015-11-09 10:45
Very good, I'll wait for the patches you want to propose.
I am surprised that "ocamldoc -pp" is broken, because I checked that every call to Pparse in ocamldoc is dominated by an assignment to Location.input_name. Would you be willing to add a regression test for "ocamldoc -pp" in testsuite/tests/tools-ocamldoc, to be sure this feature is properly exercised by our test suite?
Before implementing this one-line patch, I did a more complicated patch that propagated the "proper" input name through Pparse (instead of using a global mutable variable) -- but then I noticed that the one-line patch seemed to suffice. I'm uploading it (Pparse.file-correctly-set-input_name-to-the-before-p.patch) in case it can help.
@glondu thanks! I was not aware you're actively working on this (and hope I didn't step on your toe by opening this issue here).
@gasche thanks for preparing the patches!
Rather than try to figure out a stable name for the input file, why not just emit
at the beginning of every assembly file? If we're compiling in -g mode, precise .file declarations will be generated later in the .s file, anyway.
Also, quoting the Info manual for GNU as,
STRING is the new file name. In general, the filename is recognized
whether or not it is surrounded by quotes `"'; but if you wish to
specify an empty file name, you must give the quotes-`""'. This
statement may go away in future: it is only recognized to be compatible
with old `as' programs.
Should we really rely on this statement not to "go away in the future"?
|Using `.file ""' would be fine I guess. It is what gcc does when compiling standard input. It would solve many (if not all) reproducibility issues related to temporary file names.|
Just one point of release-management information: there will be a feature freeze somewhere between mid- and end-December, in preparation for the next (4.03) release in first quarter 2016. If we want to patch the compiler to get reproducible builds (or at least to improve reproducibility), we need a concrete proposal before then.
glondu, would you be able to provide a patch series in that time frame? Of course there is no particular hurry. We are trying to move to a shorter release cycle (6 months ?), so hopefully 4.04 will not be long to come.
I submitted the ocamldoc patch for inclusion in
edited on: 2015-12-06 17:25
I looked at the details of the issue Stéphane Glondu reported with my first one-line patch. I think the related ocamldoc-issue can be fixed and I uploaded an equally simple patch
that I think could be included upstream.
A bit of context first. This patch had been written independently by Stéphane and included in Debian as-is for a while, until the Debian bug report
let him know that it breaks some software using "ocamldoc -pp" -- mlpost in this case.
I analyzed the "ocamldoc -pp" in more details, and I think I understand where the issue comes from. ocamldoc re-opens source files to get comments, and at this point it needs to open the post-processing source, not the pre-processing source (whose path we want in !Location.input_file). Unfortunately some parts of odoc_analyse.ml use !Location.input_file to get the path of the post-processing source (which is currently correct, but a mistake in my opinion), while they have access to another variable that always denote the post-processing path (as returned by Pparse.preprocess). Fixing these two lines of code to use [input_file] instead of [!Location.input_name] lets us change !Location.input_name to point to the original source path without breaking "ocamldoc -pp".
I tested this new patch against mlpost (the Debian package whose documentation was broken by the first patch); I could check that "make doc" breaks with the first patch, but that this new patch makes it work properly again. I inspected all uses of Location.input_name in ocamldoc, and I believe that there are no further issues in ocamldoc.
I just merged the Debian patch on ocamldoc timestamps (support for SOURCE_DATE_EPOCH) in trunk:
Fixed in commit [trunk eef84c4] by emitting a '.file ""' directive at the beginning of every generated asm file.
|2015-11-06 17:44||hannes||New Issue|
|2015-11-06 17:53||hannes||Tag Attached: junior_job|
|2015-11-06 18:04||hannes||Note Added: 0014648|
|2015-11-06 18:20||hannes||Note Edited: 0014648||View Revisions|
|2015-11-06 18:50||gasche||Note Added: 0014649|
|2015-11-06 19:21||hannes||Note Added: 0014650|
|2015-11-07 14:25||gasche||File Added: add-a-file-directive-to-generated-s-files-rebased.patch|
|2015-11-07 14:26||gasche||Note Added: 0014651|
|2015-11-07 14:27||gasche||Note Added: 0014652|
|2015-11-07 15:08||gasche||File Added: PR7037-fix-erroneous-Location.input_name-setting-in-Pparse-file.patch|
|2015-11-07 15:11||gasche||Note Added: 0014653|
|2015-11-07 15:12||gasche||Note Edited: 0014653||View Revisions|
|2015-11-07 15:48||gasche||Status||new => acknowledged|
|2015-11-08 16:09||lamby||Note Added: 0014654|
|2015-11-08 16:45||gasche||Note Added: 0014655|
|2015-11-09 10:23||glondu||Note Added: 0014656|
|2015-11-09 10:44||gasche||Note Added: 0014657|
|2015-11-09 10:44||gasche||File Added: Pparse.file-correctly-set-input_name-to-the-before-p.patch|
|2015-11-09 10:45||gasche||Note Edited: 0014657||View Revisions|
|2015-11-09 11:10||hannes||Note Added: 0014658|
|2015-11-15 18:27||xleroy||Note Added: 0014690|
|2015-11-17 10:35||glondu||Note Added: 0014707|
|2015-11-20 19:04||gasche||Note Added: 0014761|
|2015-11-29 23:42||gasche||Note Added: 0014893|
|2015-12-06 17:13||gasche||File Added: PR7037-fix-erroneous-Location.input_name-setting-in-Pparse-file-bis.patch|
|2015-12-06 17:24||gasche||Note Added: 0015059|
|2015-12-06 17:25||gasche||Note Edited: 0015059||View Revisions|
|2015-12-12 11:33||gasche||Note Added: 0015144|
|2015-12-13 18:41||xleroy||Note Added: 0015150|
|2015-12-13 18:41||xleroy||Status||acknowledged => resolved|
|2015-12-13 18:41||xleroy||Resolution||open => fixed|
|2015-12-13 18:41||xleroy||Fixed in Version||=> 4.03.0+dev / +beta1|
|2015-12-13 18:41||xleroy||Target Version||=> 4.03.0+dev / +beta1|
|Copyright © 2000 - 2011 MantisBT Group|