ocaml{c,opt} may truncate and recreate a .cmi, leading to (rare) failures of make -j #4991

vicuna · 2010-03-04T18:36:08Z

Original bug ID: 4991
Reporter: letouzey
Status: acknowledged (set by @xavierleroy on 2010-04-18T12:31:37Z)
Resolution: open
Priority: normal
Severity: feature
Version: 3.11.2
Category: compiler driver
Related to: #5000 #5282 #7472
Monitored by: mfp mehdi @glondu bobot "Julien Signoles" @dra27 @dbuenzli @yakobowski

Bug description

When a .ml file has no corresponding .mli file, ocamlc and ocamlopt
create both the same .cmi file. During a parallel build that mixes
the byte and opt phases, this could lead to failure: imagine for instance
a process trying to read the .cmi (made earlier by ocamlc) at the precise
moment ocamlopt is recreating the .cmi, resulting in a "corrupted .cmi" error.
We have from time to time this kind of issue when building Coq
via make -j (unlimited number of process) on a quad-core.

Btw, a strace on ocamlc or ocamlopt shows that the .cmi is
first opened with O_WRONLY|O_CREAT|O_TRUNC, then a second time
with RDONLY. Is it done for some kind of verification ?

Would it be possible to modify ocaml{c,opt} in order to make them check
for an existing .cmi even if no .mli exists, and modify this .cmi only
if it differs from what the compiler intend to write ?

If this isn't easily doable, an option preventing the creation of
the .cmi would be great. In fact, there already exists a
Clflags.dont_write_files option, which is used precisely at the right
spot of Typemod. Could this flag be toggle someday by the user ?

For the moment, I'm circumventing this issue in our makefile by making sure
that ocamlc comes first and then doing something like
ocamlopt -c -o foo.tmp.cmx foo.ml &&
rm foo.tmp.cmi && mv foo.tmp.o foo.o && mv foo.tmp.cmx foo.cmx
It works, but this isn't exactly pretty.

Thanks in advance,

Pierre Letouzey

vicuna · 2010-03-04T19:20:44Z

Comment author: @alainfrisch

OMake's approach is to force ocamlopt to use the .cmi file produced by ocamlc. This is done by creatin a dependency on this .cmi file (when then .mli file does not exist) for the ocamlopt rule, and to use "-intf-suffix .cmi" to let ocamlopt believe that the interface does exist.

That said, we also observe random and rare cases of "corrupted compiled interface" during parallel builds under Windows. Maybe there is something wrong with the approach above, or it's just Windows that decides to do random errors.

vicuna · 2010-03-04T19:39:04Z

Comment author: letouzey

:-)

Thanks Alain, I actually though of this -intf-suffix, but I was so
sure this kind of trick would simply not work that I didn't even take
the time to try. Guess what, I'll go and test that now.

And indeed, I've also made .cmx depend on .cmi...

Concerning the windows build, I played recently with mingw32 cross-compilation
(more precisely, the debianisation of the work of R. Jones for fedora). And
It's so cool to be able to build .exe (even coqide) for a decent dev platform.

Best,
Pierre

vicuna · 2010-03-08T10:25:56Z

Comment author: letouzey

Ok, after testing Alain's tip I can confirm that "-intf-suffix .cmi" is
exactly what I was looking for. Thanks. I think this bug report can be
closed. My only remaining suggestion would be to document more properly
this trick: as I told in my first comment above, after reading man pages for
ocaml{c,opt}, I was (wrongly) conviced that files indicated via -intf-suffix
would have to be source interface file, while they actually are only
checked for existence and never looked into.

Best regards,
Pierre

vicuna · 2010-03-25T10:25:14Z

Comment author: @damiendoligez

The way I see it, the best solution is to have a .mli for each .ml file. (For rather
obscure reasons, it also makes the GC very slightly more efficient.)

The next best thing would be to make the compiler output the compiled interface
to a temporary file, then move it atomically to the .cmi (although I'm not sure
if that's possible under Windows).

vicuna · 2010-04-06T21:14:53Z

Comment author: @alainfrisch

Damien: what's wrong with the -intf-suffix solution?

vicuna · 2010-04-18T12:31:37Z

Comment author: @xavierleroy

Re: atomic file rename: it's unclear this can be done under Windows. The rename() function from the C standard library, interfaced in Caml as Sys.rename, fails if the destination file exists. The MoveFileEx Win32 system call can be cajoled into replacing an existing file, but it is not specified that this happens atomically. (Google "MoveFileEx atomic".)

Re: opening the .cmi twice: the second open-for-reading is done to compute the MD5 digest of the data just written. OCaml's I/O channels don't support opening for reading and writing at the same time.

Re: the -intf-suffix trick: I'm happy that it provides a workaround. Still, I have the feeling that something is slightly wrong here, so I'll leave this PR as a feature wish.

vicuna · 2010-04-26T11:21:36Z

Comment author: @damiendoligez

I have no reason to believe there is currently something wrong with the -intf-suffix trick, but it still feels like a hack.

vicuna · 2014-04-10T16:49:21Z

Comment author: @gasche

I'm hitting a new instance of this problem that I thought I would document here for posterity. The race condition happens not between ocamlc and ocamlopt, but two instances of ocamlopt invoking -pack to produce the same packed module, which does not have a corresponding .mli.

The problem is the following: if the two runs try to produce the same .cmi file at precisely the same time, then one of them may make a mistake when computing the control sum (CRC) of the .cmi (which happens in between the beginning of writing the .cmi and before finishing the write), with some parts written or truncated by the other process taken into account. Both runs will then include the CRC they computed in the final .cmo, so it is likely that an incorrect CRC gets written in the .cmo file -- and that the .cmi produced is incorrect.

It seems to me that, in that case, the workaround presented are not effective:

I don't think we can expect every user of -pack to write a .mli file for the pack, especially when the individual packed modules already have a .mli
Passing -intf-suffix doesn't work (not quite sure why, I tried and it fails with a Syntax Error on the signature)

Reproductibility information: The race occurs about one times over three (so it's actually quite common) when running "make -j7" from the ocamlbuild subdirectory of the compiler (trunk@14532), after I first added a couple missing dependencies in the .depend that also broke compilation (ocamlbuild{_plugin,_unix_plugin,light} should depend on ocamlbuild_pack.cmi).

vicuna · 2014-04-11T18:43:12Z

Comment author: @gasche

I fixed this instance of the problem by adding more dependencies (from the pack .cmo to the pack .cmi). I still think that, long-term, it would be nice to change the way we produce side-files (.cmi and .cmx) to a more atomic replace operation or, even better, replace_if_changed (we can look at the checksum of the existing file).

vicuna · 2014-04-17T07:01:20Z

Comment author: bobot

Re: atomic file rename: why the atomicity guaranty of the rename is needed? It seems that the worse that can happen is to lose the compiled file .cm* during a power outage, but it is just a compiled file and discrepancy between .cm* will be detected.

The second problem with non-atomicity is that the destination file can be missing during a period of time. But here the acceptable use case is when the destination file is the same than the temporary file .cmi. So we can test that the checksum is identical and just "touch" the destination file.

In the other case, when the destination file is different, nobody should be opening the .cm* since they are not yet updated, it is an error of the build system.

vicuna · 2014-04-17T07:51:19Z

Comment author: @gasche

Sorry for being unclear: a plain (not particularily atomic) "rename" would already be much more atomic than the way cmi files are currently generated, which is as follow:

write the signature in the target file
read the target file to compute the checksum of the signature
write the checksum in the target file

Races between two processes writing partial information to the file happens quite often in this setting. A rename from an assumed-private temporary would not allow one compilation process to checksum a file that's been partially written by another, as it is currently possible.

vicuna · 2014-04-17T09:41:21Z

Comment author: bobot

So even if, as Xavier remarked, "it's unclear [atomicity] can be done under Windows.", patches are welcomed for using an intermediary file during generation of files?

Or patch that computes the checksum directly on the in memory data?

vicuna · 2014-04-17T09:54:04Z

Comment author: @gasche

I would be interested in such patches, in any case. But be careful before investing your time: any downside of a new approach (eg. increased compilation time, or too-invasive change¹) would be a sufficient argument to maintain the statu quo, that works for most users (with high probability).

¹: eg. you do not want to re-implement part of the marshalling logic to compute checksum, as this would probably be quite fragile. Marshalling to a string instead of directly to a file would be a possible approach, but then we'd have to make sure that memory consumption isn't a problem (how large can .cmi files be? I heard that core.cmi is quite large; would we apply the same strategy on .cmo?).

I think that write-in-temp than move is probably the easiest way forward in this case, but I'll let you decide if you're interested in working on this.

vicuna · 2014-04-26T21:42:43Z

Comment author: bobot

The write-in-temp inside the compiler is a nice solution, but there is some choice to make. For example, when the cmi is identical should we still "touch" it?

Another possibility can be to let the build system do the choice of the temporary file and do the test. The build system just need to be able to set the output filename for the cmi file using a command line option of ocamlc or ocamlopt, e.g. --outfile-cmi. This interface is simple and lets a lot of freedom to the user. One drawback is that Makefile/build tool must be modified for solving the problem of this bugreport.

With a more general point of view, the possibility to set the filename of all the output file of the compiler (.cmt, .annot, ...) is a very nice feature to have.

vicuna · 2014-06-30T07:30:39Z

Comment author: bobot

In fact -o does already the job, since every produced file filename as the same prefix than it (only the extension change)

vicuna · 2014-06-30T07:59:55Z

Comment author: bobot

#78 will try to work on this possible solution:

instead of creating foo.cm* create them in a new temporary directory $rnd.tmp
test if it is different than the foo.cm* previously or concurrently produced
replace it if it is the case
remove the temporary directory

vicuna · 2017-08-29T15:17:41Z

Comment author: @xavierleroy

Work in progress at #1307

nojb · 2019-03-15T14:56:13Z

PR merged, so closing this.

vicuna added the compiler-driver label Mar 14, 2019

This was referenced Mar 14, 2019

ocamldep misses dependencies #5000

Closed

.cmo files are generated even for native compilation #5282

Closed

Please replace cmi files atomically when writing new versions #7472

Closed

nojb closed this as completed Mar 15, 2019

vicuna added the feature-wish label Mar 20, 2019

silene mentioned this issue Aug 26, 2019

Parallel build errors coq-community/bignums#20

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ocaml{c,opt} may truncate and recreate a .cmi, leading to (rare) failures of make -j #4991

ocaml{c,opt} may truncate and recreate a .cmi, leading to (rare) failures of make -j #4991

vicuna commented Mar 4, 2010

vicuna commented Mar 4, 2010

vicuna commented Mar 4, 2010

vicuna commented Mar 8, 2010

vicuna commented Mar 25, 2010

vicuna commented Apr 6, 2010

vicuna commented Apr 18, 2010

vicuna commented Apr 26, 2010

vicuna commented Apr 10, 2014

vicuna commented Apr 11, 2014

vicuna commented Apr 17, 2014

vicuna commented Apr 17, 2014

vicuna commented Apr 17, 2014

vicuna commented Apr 17, 2014

vicuna commented Apr 26, 2014

vicuna commented Jun 30, 2014

vicuna commented Jun 30, 2014

vicuna commented Aug 29, 2017

nojb commented Mar 15, 2019

ocaml{c,opt} may truncate and recreate a .cmi, leading to (rare) failures of make -j #4991

ocaml{c,opt} may truncate and recreate a .cmi, leading to (rare) failures of make -j #4991

Comments

vicuna commented Mar 4, 2010

Bug description

vicuna commented Mar 4, 2010

vicuna commented Mar 4, 2010

vicuna commented Mar 8, 2010

vicuna commented Mar 25, 2010

vicuna commented Apr 6, 2010

vicuna commented Apr 18, 2010

vicuna commented Apr 26, 2010

vicuna commented Apr 10, 2014

vicuna commented Apr 11, 2014

vicuna commented Apr 17, 2014

vicuna commented Apr 17, 2014

vicuna commented Apr 17, 2014

vicuna commented Apr 17, 2014

vicuna commented Apr 26, 2014

vicuna commented Jun 30, 2014

vicuna commented Jun 30, 2014

vicuna commented Aug 29, 2017

nojb commented Mar 15, 2019