Mantis Bug Tracker

ID: 0007472
Project: OCaml
Category: compiler driver
View Status: public
Date Submitted: 2017-01-30 22:49
Last Update: 2017-11-12 17:17
Priority: normal
Severity: feature
Reproducibility: have not tried
Product Version: 4.04.0
Fixed in Version: 4.06.0
Summary: 0007472: Please replace cmi files atomically when writing new versions
Description: I've recently found a problem in the omake build rules, and while it is possible to tackle it there, I think it is better to do so in the compiler. In short, when both the bytecode and native code build paths are enabled, it is generally possible that both paths create the cmi file for a module at different points in time, and that the build in the other path briefly sees a destroyed cmi file (because it is being overwritten with identical contents). When there is an mli file, it is easy to write the build rules so that the problem is avoided (only one of the two paths, bytecode or native code, compiles the mli file). However, when there is no mli file, it is not that easy.

Apparently, omake used to contain a special ruleset to tackle that case: the bytecode and native code builds are then done together, i.e. ocamlc and ocamlopt are called in the same build step, just to prevent one of the build paths from seeing a corrupt cmi file. I recently discovered the problem again for the macro that creates packs (see [^]). The solution works, but there is a price to pay. When the user only requests, say, the native code version of an executable, and packs or units are involved that do not have an mli file, some parts of the bytecode build are also carried out.

There is an easy solution to the problem in the compiler: when writing a cmi file, do not overwrite the cmi file directly, but first write to a second file, and finally rename that file so it has the cmi suffix. This way there is no point in time where a truncated cmi file is visible to other processes of the build, and build tools need not take care of the problem.
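
The write-then-rename idea can be sketched in OCaml roughly as follows. This is only a minimal illustration, not the compiler's actual code: the function name write_file_atomically, the ".tmp" suffix and the demo file name are assumptions made for the example. It relies on Sys.rename replacing the target atomically, which holds on POSIX systems; the Windows situation is discussed in the notes.

```ocaml
(* Hypothetical helper (not taken from the OCaml compiler): write [contents]
   to [path] in such a way that other processes see either the old file or
   the complete new file, never a truncated one. *)
let write_file_atomically (path : string) (contents : string) : unit =
  (* Create the temporary file next to the target, so that the final rename
     stays within one filesystem. *)
  let tmp, oc =
    Filename.open_temp_file ~mode:[Open_binary]
      ~temp_dir:(Filename.dirname path)
      (Filename.basename path) ".tmp"
  in
  try
    output_string oc contents;
    close_out oc;
    (* Only now does the new version become visible under its final name. *)
    Sys.rename tmp path
  with e ->
    (* On failure, do not leave the temporary file behind. *)
    close_out_noerr oc;
    (try Sys.remove tmp with Sys_error _ -> ());
    raise e

(* Small demo: overwrite the same (hypothetical) file twice, then clean up. *)
let () =
  let path = Filename.concat (Filename.get_temp_dir_name ()) "x_demo.cmi" in
  write_file_atomically path "version 1";
  write_file_atomically path "version 2";
  Sys.remove path
```

Note that Filename.open_temp_file creates the file with restrictive default permissions; a real implementation would probably want to pass ~perms explicitly.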
Steps To Reproduce: Assume there is an implementation file for module x but no x.mli. We do both a bytecode and a native code build. Let's say the bytecode build was a little bit quicker so far, and the build runs

ocamlc -c

The bytecode build then continues, assuming that x.cmi is present. Somewhat later the native code build is also at this step, and runs

ocamlopt -c

When the file x.cmi is written, and at the same time the bytecode compiler accesses x.cmi, there is the danger that the bytecode compiler sees a truncated x.cmi file.

Note that this is more likely to occur for big cmi files. I guess this is why I recently stumbled upon the problem when creating packs.
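
The window in which a reader can see a truncated file can be made visible with a small OCaml sketch. It assumes that the writer opens the existing file in truncating mode, as a plain open_out_bin does; the function names are made up for the illustration and this is not how the compiler itself is structured.

```ocaml
(* Size of [path] as seen by an independent reader at this moment. *)
let visible_size (path : string) : int =
  let ic = open_in_bin path in
  let n = in_channel_length ic in
  close_in ic;
  n

(* Write [contents] to [path], then overwrite it in place with the same
   contents, and report the size a reader sees before, during and after
   the overwrite. *)
let sizes_seen_during_overwrite (path : string) (contents : string) =
  let oc = open_out_bin path in
  output_string oc contents;
  close_out oc;
  let before = visible_size path in
  (* open_out_bin truncates the existing file as soon as it is opened ... *)
  let oc = open_out_bin path in
  (* ... so a reader arriving now sees an empty file. *)
  let during = visible_size path in
  output_string oc contents;
  close_out oc;
  let after = visible_size path in
  (before, during, after)

let () =
  let path = Filename.temp_file "x_demo" ".cmi" in
  let before, during, after =
    sizes_seen_during_overwrite path "identical contents" in
  Printf.printf "before=%d during=%d after=%d\n" before during after;
  Sys.remove path
```

The "during" value is what the other build path may read if it opens the file at the wrong moment, even though the old and new contents are identical.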
Additional Information: The workaround for the problem in omake consists of these lines:

    section rule
        if ...
        elseif $(BYTE_ENABLED)
            %.cmx %.cmi %$(EXT_OBJ) %.cmo: :scanner:
                $(OCamlC) -c $<
                $(OCamlOpt) -c $<

This build rule is only in effect when (a) both bytecode and native code are enabled, and (b) there is no mli file. As you can see, both compilers are called, even if this is not needed.

In a recent PR I am now introducing something similar for creating packs: [^]
Tags: No tags attached.

Relationships
related to 0004991 (acknowledged): ocaml{c,opt} may truncate and recreate a .cmi, leading to (rare) failures of make -j

Notes
frisch (developer)
2017-01-31 14:20

I like the proposal.

What about .cmt files? In addition to the risk of concurrent writes (which will become more serious when tools that process .cmt files -- such as documentation generators -- start to be more widely used), there is the problem that .cmt files generated by ocamlc and ocamlopt for the same input file will be different since they keep a copy of the compiler's command-line.
dra (developer)
2017-01-31 15:54
edited on: 2017-01-31 15:54

I too like this proposal. My past solution with make either drops parallelisation or has to use ugly locking for the no .mli case.

Unrelated to this PR, but possibly related to Alain's comment, I wonder given safe-string, flambda, various available runtimes and eventually proper cross-compiling if it's time to review why we have a separate ocamlc and ocamlopt (rather than one driver with output options)? Much of this disappears if it were possible to output bytecode and native code in a single driver invocation.

gerd (reporter)
2017-01-31 16:25

dra: not sure what the benefit is (besides maybe cleaning up the code base). If we generate both bytecode and native code in one step, this is very much like always calling ocamlc and ocamlopt together. Ok, you can save time by avoiding doing the typing twice, but typically this isn't the time-consuming part of compilation.

Just for the record, there is still another radical solution: simply do not use the same cmi files in both bytecode and native builds. Separate the toolchains completely. I want to mention this because I think the bytecode compiler will become unimportant at some point (in particular when we have a native toplevel), at least for the majority of the users. At the same time ocamlopt advances into the field of whole-project optimizations, and unifying cmi and cmx could become interesting. In this world, the overhead of having parallel builds could become a burden anyway.
dim (developer)
2017-01-31 16:35

Note that at Jane Street we use a different trick. We choose ocamlc to build the .cmi and we pass [-intf-suffix .ml] to ocamlopt. ocamlopt then thinks that the mli exists and will read the .cmi rather than re-create it.

It'd still be nice to improve the situation in general, but that's a good workaround for existing versions of OCaml.
xleroy (administrator)
2017-01-31 16:47

See 0004991 for a somewhat more general discussion.

The main issue here is Windows support: it is not clear there exists a Win32 "rename file" function that guarantees atomicity.
dra (developer)
2017-01-31 17:23
edited on: 2017-01-31 17:24

TxF would provide this guarantee on Windows (Vista+ only) but it's partially deprecated. Given how easy these conflicts are to generate with build systems currently, it would not be too hard to see if ReplaceFile provides sufficient guarantees. Its API doesn't mention atomicity, though the TxF migration pages recommend using it as an alternative for this exact situation. For our purposes wouldn't this be OK - the problem is a truncated file existing, rather than no file existing? I think it may also be that the scenarios under which this call can fail would already be a problem for the build system (out of disk space, permissions)?

gerd (reporter)
2017-01-31 17:31

If there is no atomic rename on Windows, another option would be to lock the file while it is being written. This also needs some support from the reader, though.
xleroy (administrator)
2017-01-31 19:35

All right, ReplaceFile might be atomic enough... I welcome a patch to byterun/sys.c and byterun/win32.c and byterun/include/misc.h that reimplements Sys.rename in terms of ReplaceFile under Win32. The idea being to have Sys.rename behave as much as possible the same under POSIX and under Win32.
gerd (reporter)
2017-02-01 10:53

In omake we are tracking this now in [^]
xleroy (administrator)
2017-08-29 17:16
edited on: 2017-08-29 17:17

Work in progress at [^] and [^]

aha (reporter)
2017-11-11 14:22

The current solution (from the GitHub Pull Requests linked above) doesn't work very well on Windows.
From time to time I observe that, when two or more parallel processes call Sys.rename with the same target, one call fails with a "Permission denied" error.
The most common case is now parallel `ocamlc -bin-annot -c` and `ocamlopt -bin-annot -c` calls that race to create foo.cmt ( [^] via [^] ). It leads to misleading error messages like:

'File "", line 1:
Error: I/O error: Permission denied'

So the original hope remains unfulfilled. It must still be solved by build tools.
xleroy (administrator)
2017-11-12 17:16

I'm sorry to hear about those difficulties under Windows, but I think we have done everything we could on the OCaml side. If Windows is unable to rename a file atomically, that's a Windows problem.
xleroy (administrator)
2017-11-12 17:17

The GPRs were merged in OCaml 4.06 so I'm marking this MPR as resolved.

Issue History
Date Modified | Username | Field | Change
2017-01-30 22:49 | gerd | New Issue |
2017-01-31 14:20 | frisch | Note Added: 0017209 |
2017-01-31 15:54 | dra | Note Added: 0017210 |
2017-01-31 15:54 | dra | Note Edited: 0017210 |
2017-01-31 16:25 | gerd | Note Added: 0017211 |
2017-01-31 16:35 | dim | Note Added: 0017212 |
2017-01-31 16:45 | xleroy | Relationship added | related to 0004991
2017-01-31 16:47 | xleroy | Note Added: 0017213 |
2017-01-31 16:47 | xleroy | Status | new => acknowledged
2017-01-31 17:23 | dra | Note Added: 0017214 |
2017-01-31 17:24 | dra | Note Edited: 0017214 |
2017-01-31 17:31 | gerd | Note Added: 0017215 |
2017-01-31 19:35 | xleroy | Note Added: 0017219 |
2017-02-01 10:53 | gerd | Note Added: 0017228 |
2017-02-23 16:36 | doligez | Category | OCaml general => -OCaml general
2017-02-27 15:42 | doligez | Category | -OCaml general => compiler driver
2017-02-27 15:42 | doligez | Severity | feature => minor
2017-03-07 14:08 | shinwell | Severity | minor => feature
2017-08-29 17:16 | xleroy | Note Added: 0018201 |
2017-08-29 17:17 | xleroy | Note Edited: 0018201 |
2017-11-11 14:22 | aha | Note Added: 0018645 |
2017-11-12 17:16 | xleroy | Note Added: 0018646 |
2017-11-12 17:17 | xleroy | Note Added: 0018647 |
2017-11-12 17:17 | xleroy | Status | acknowledged => resolved
2017-11-12 17:17 | xleroy | Resolution | open => fixed
2017-11-12 17:17 | xleroy | Fixed in Version | => 4.06.0
