Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007037OCamlOCaml backend (code generation)public2015-11-06 17:442015-12-13 18:41
Assigned To 
PlatformOSOS Version
Product Version4.02.3 
Target Version4.03.0+dev / +beta1Fixed in Version4.03.0+dev / +beta1 
Summary0007037: Binaries are not reproducible (temporary filenames are in the object files)
DescriptionWhen compiling OCaml, temporary files are generated, and their filenames end up in the resulting object file. There's no need for that.

The filenames should not end up in the binary at all (since these files are removed anyways).
Additional InformationThe reproducible builds project provides a web interface to look into differences: [^]

Additionally, they have an issue about OCaml [^]

The concrete diff can be seen at [^]

Debian uses several patches already [^]
Attached Filespatch file icon add-a-file-directive-to-generated-s-files-rebased.patch [^] (3,792 bytes) 2015-11-07 14:25 [Show Content]
patch file icon PR7037-fix-erroneous-Location.input_name-setting-in-Pparse-file.patch [^] (1,195 bytes) 2015-11-07 15:08 [Show Content]
patch file icon Pparse.file-correctly-set-input_name-to-the-before-p.patch [^] (4,709 bytes) 2015-11-09 10:44 [Show Content]
patch file icon PR7037-fix-erroneous-Location.input_name-setting-in-Pparse-file-bis.patch [^] (2,949 bytes) 2015-12-06 17:13 [Show Content]

- Relationships

-  Notes
hannes (reporter)
2015-11-06 18:04
edited on: 2015-11-06 18:20

The direct link to the diff between two compilations is: [^] and text version [^]

gasche (developer)
2015-11-06 18:50

Thanks for starting this conversation. I think this issue is
important and have been thinking of starting looking at it for
OCaml. The "reproducible Debian" folks are doing good work.

I had a quick look at the Debian patchset. (I think some of them
could be considered for inclusion again, in particular
0001-Pass-no-relax-to-ld-on-alpha.patch and
0008-Native-backtraces-don-t-work-on-powerpc-and-sparc.patch). Two
Debian patches look related to reproducible builds:

0010-Enable-ocamldoc-to-build-reproducible-manpages.patch: This
  change looks like a good idea in the context of reproducible
  build. It implements a Debian-specific
  convention (letting SOURCE_DATE_EPOCH override current time for
  timestamps used for information on the build environment), but
  I would be in favor of integrating it.

0010-Add-a-.file-directive-to-generated-.s-files.patch/: This
  change adds a "file" directive to produced assembly outputs. It
  replaces one form of non-reproducibility (random filenames
  generated at assembly time) by another (random filenames
  generated at preprocessing time) that affect less OCaml source
  files (only the preprocessed ones).

The second patch looks like it is the cause of the diff you
observe -- rather than upstream code. I suppose that this could
be fixed by either:

- use a deterministic open_temp_file version for -pp
  processing (this is what is suggested by the Debian
  bug report), or

- threading the source name of the *original* file (not the
  preprocessing file) to the assembly backend

It is not clear to me which way would be best. Stéphane Glondu
seems (from the dicussion in the bug report) to have thought
about this issue in more details, maybe he would be interested in
participating in the disucssion again. I'll send him an email.
hannes (reporter)
2015-11-06 19:21

The 0010-Add-a-.file-directive-to-generated-.s-files.patch inserts the .file directive using Location.input_name, which is /tmp/ocamlppXXXX -- without .file the linker adds the filename of the object file.

I like the proposal to retain the original filename in the .file directive. This is then, as far as I understand, also useful for debugging (where a /tmp/ocamlppXXXX does not help, neither does any static string).
gasche (developer)
2015-11-07 14:26

I rebased Stéphane's patch


on top of the current trunk, and uploaded the rebased patch here as


in the "Attached Files" section above.
gasche (developer)
2015-11-07 14:27

I am not sure what is the exact command used by the "reproducible Debian" folks to show the diff they have. Just doing "readelf -all ocamlopt.opt" does not suffice to see /tmp/* files in the output, so I use the following command

  find . -type f | xargs readelf -all 2>/dev/null | grep FILE | grep tmp
gasche (developer)
2015-11-07 15:11
edited on: 2015-11-07 15:12

I just uploaded a second patch ( PR7037-fix-erroneous-Location.input_name-setting-in-Pparse-file.patch ) that fixes the /tmp/ocamlpp appearing in the `readelf` output. This patch applies cleanly over both trunk and 4.02.3.

I am unable to observe any other non-deterministic name appearing in this output, so it might be the case that this part of the build process is now deterministic. I rather suspect, however, that I don't look hard enough.

(During testing I have seen "camlstartup"-prefixed random temporary files in the ELF output, but cleaning my development repository removed them and they haven't re-appeared.)

Do you (hannes) know whether it would be possible to re-run the Debian testing machinery to see whether these patches indeed solve the problem? If so, I can propose these patches (and the ocamldoc-related one) for inclusion.

lamby (reporter)
2015-11-08 16:09

I had a quick look at this last night, and alas the patch does not apply cleanly to the version in sid so I was unable to test it right then..

However, it seems like some of the issues the patch is intending to fix are already fixed elsewhere.. or I am just conflating them. Basically, I'm raising it now in case we can avoid (more?) duplicated work.

The issues I am think of are:

Debian#795784: ocamlopt startup file randomness ends up in final executable - [^]
Debian#796336: ocamlopt -pack produces non-reproducible output - [^]
Debian#794586: ocaml: Please make ocamldoc produce reproducible manpages - [^]

.. and this patch links to #795784 and #796336, both already closed.
gasche (developer)
2015-11-08 16:45

lamby: how can I get "the version in sid" to provide patches on top of? If there is a reasonably easy way for me to test for reproducibility directly, that could also let us iterate faster.

Thanks for the pointers to the additional Debian bugtracker issues. They contain commands to reproduce the issues, which is very helpful -- to test whether the issue is only solved in Debian-patched ocaml, or also in upstream ocaml. I think a good and reasonable goal would be to aim for a reproducible upstream (although I don't know what maintainers will think of changes to support reproducibility convention that are Debian-specific currently, so there is no guarantee that everything will be upstreamed).
glondu (reporter)
2015-11-09 10:23

For the record, I was planning to submit reproducibility patches upstream eventually. For that, I was (first) waiting the transition to 4.02.3 to finish (it finished last week, on November 1st). During the transition, we noticed that one of the patches was bogus (see: [^] ).

gasche: your "fix" was already tried (cf. commit above), but it breaks "ocamldoc -pp". I was rather thinking about adding a new reference in Location, say Location.original_file_name, that would contain the original file name, and using it for .file instead of Location.input_name. I didn't do it during the transition because it would break the ABI of Location.
gasche (developer)
2015-11-09 10:44
edited on: 2015-11-09 10:45

Very good, I'll wait for the patches you want to propose.

I am surprised that "ocamldoc -pp" is broken, because I checked that every call to Pparse in ocamldoc is dominated by an assignment to Location.input_name. Would you be willing to add a regression test for "ocamldoc -pp" in testsuite/tests/tools-ocamldoc, to be sure this feature is properly exercised by our test suite?

Before implementing this one-line patch, I did a more complicated patch that propagated the "proper" input name through Pparse (instead of using a global mutable variable) -- but then I noticed that the one-line patch seemed to suffice. I'm uploading it (Pparse.file-correctly-set-input_name-to-the-before-p.patch) in case it can help.

hannes (reporter)
2015-11-09 11:10

@glondu thanks! I was not aware you're actively working on this (and hope I didn't step on your toe by opening this issue here).

@gasche thanks for preparing the patches!
xleroy (administrator)
2015-11-15 18:27

Rather than try to figure out a stable name for the input file, why not just emit

    .file ""

at the beginning of every assembly file? If we're compiling in -g mode, precise .file declarations will be generated later in the .s file, anyway.

Also, quoting the Info manual for GNU as,

     .file STRING

   STRING is the new file name. In general, the filename is recognized
whether or not it is surrounded by quotes `"'; but if you wish to
specify an empty file name, you must give the quotes-`""'. This
statement may go away in future: it is only recognized to be compatible
with old `as' programs.

Should we really rely on this statement not to "go away in the future"?
glondu (reporter)
2015-11-17 10:35

Using `.file ""' would be fine I guess. It is what gcc does when compiling standard input. It would solve many (if not all) reproducibility issues related to temporary file names.
gasche (developer)
2015-11-20 19:04

Just one point of release-management information: there will be a feature freeze somewhere between mid- and end-December, in preparation for the next (4.03) release in first quarter 2016. If we want to patch the compiler to get reproducible builds (or at least to improve reproducibility), we need a concrete proposal before then.

glondu, would you be able to provide a patch series in that time frame? Of course there is no particular hurry. We are trying to move to a shorter release cycle (6 months ?), so hopefully 4.04 will not be long to come.
gasche (developer)
2015-11-29 23:42

I submitted the ocamldoc patch for inclusion in [^]
gasche (developer)
2015-12-06 17:24
edited on: 2015-12-06 17:25

I looked at the details of the issue Stéphane Glondu reported with my first one-line patch. I think the related ocamldoc-issue can be fixed and I uploaded an equally simple patch
that I think could be included upstream.

A bit of context first. This patch had been written independently by Stéphane and included in Debian as-is for a while, until the Debian bug report [^]
let him know that it breaks some software using "ocamldoc -pp" -- mlpost in this case.

I analyzed the "ocamldoc -pp" in more details, and I think I understand where the issue comes from. ocamldoc re-opens source files to get comments, and at this point it needs to open the post-processing source, not the pre-processing source (whose path we want in !Location.input_file). Unfortunately some parts of use !Location.input_file to get the path of the post-processing source (which is currently correct, but a mistake in my opinion), while they have access to another variable that always denote the post-processing path (as returned by Pparse.preprocess). Fixing these two lines of code to use [input_file] instead of [!Location.input_name] lets us change !Location.input_name to point to the original source path without breaking "ocamldoc -pp".

I tested this new patch against mlpost (the Debian package whose documentation was broken by the first patch); I could check that "make doc" breaks with the first patch, but that this new patch makes it work properly again. I inspected all uses of Location.input_name in ocamldoc, and I believe that there are no further issues in ocamldoc.

gasche (developer)
2015-12-12 11:33

I just merged the Debian patch on ocamldoc timestamps (support for SOURCE_DATE_EPOCH) in trunk: [^]
xleroy (administrator)
2015-12-13 18:41

Fixed in commit [trunk eef84c4] by emitting a '.file ""' directive at the beginning of every generated asm file.

- Issue History
Date Modified Username Field Change
2015-11-06 17:44 hannes New Issue
2015-11-06 17:53 hannes Tag Attached: junior_job
2015-11-06 18:04 hannes Note Added: 0014648
2015-11-06 18:20 hannes Note Edited: 0014648 View Revisions
2015-11-06 18:50 gasche Note Added: 0014649
2015-11-06 19:21 hannes Note Added: 0014650
2015-11-07 14:25 gasche File Added: add-a-file-directive-to-generated-s-files-rebased.patch
2015-11-07 14:26 gasche Note Added: 0014651
2015-11-07 14:27 gasche Note Added: 0014652
2015-11-07 15:08 gasche File Added: PR7037-fix-erroneous-Location.input_name-setting-in-Pparse-file.patch
2015-11-07 15:11 gasche Note Added: 0014653
2015-11-07 15:12 gasche Note Edited: 0014653 View Revisions
2015-11-07 15:48 gasche Status new => acknowledged
2015-11-08 16:09 lamby Note Added: 0014654
2015-11-08 16:45 gasche Note Added: 0014655
2015-11-09 10:23 glondu Note Added: 0014656
2015-11-09 10:44 gasche Note Added: 0014657
2015-11-09 10:44 gasche File Added: Pparse.file-correctly-set-input_name-to-the-before-p.patch
2015-11-09 10:45 gasche Note Edited: 0014657 View Revisions
2015-11-09 11:10 hannes Note Added: 0014658
2015-11-15 18:27 xleroy Note Added: 0014690
2015-11-17 10:35 glondu Note Added: 0014707
2015-11-20 19:04 gasche Note Added: 0014761
2015-11-29 23:42 gasche Note Added: 0014893
2015-12-06 17:13 gasche File Added: PR7037-fix-erroneous-Location.input_name-setting-in-Pparse-file-bis.patch
2015-12-06 17:24 gasche Note Added: 0015059
2015-12-06 17:25 gasche Note Edited: 0015059 View Revisions
2015-12-12 11:33 gasche Note Added: 0015144
2015-12-13 18:41 xleroy Note Added: 0015150
2015-12-13 18:41 xleroy Status acknowledged => resolved
2015-12-13 18:41 xleroy Resolution open => fixed
2015-12-13 18:41 xleroy Fixed in Version => 4.03.0+dev / +beta1
2015-12-13 18:41 xleroy Target Version => 4.03.0+dev / +beta1

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker