Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006329OCamlruntime system and C interfacepublic2014-02-18 10:392017-09-20 16:31
Reportergmelquiond 
Assigned Toshinwell 
PrioritylowSeveritymajorReproducibilityalways
StatusassignedResolutionopen 
PlatformOSOS Version
Product Version4.01.0 
Target Version4.06.0+devFixed in Version 
Summary0006329: File "camlinternalMod.ml", line 63, characters 6-12: Assertion failed.
DescriptionThere is an assertion failed in the OCaml runtime when dynamically loading a recursive module. Unfortunately, the bug is highly sensitive to conditions: changing the size of the code segment, changing the size of the data segment, changing the size of the read-only data segment, changing the number of relocations, etc, all cause the bug to vanish. As a consequence, we were unable to produce a small testcase, sorry. This bug report is mostly for the record. That said, it is perfectly reproducible and we were able to trigger it on three systems with OCaml 4.01.0 and Coq 8.4pl3.
Steps To Reproducegit clone git://scm.gforge.inria.fr/why3/why3.git [^]
cd why3
git checkout a8c13660788c726a134fab9109e384592881e8d7
autoconf
./configure
cat > src/util/config.ml << EOF
let version = "1234567812345678123456781234567812345678123456781234567812345678123456781234567"
let builddate = ""
let libdir = ""
let datadir = ""
let localdir = Some ""
let compile_time_support = [ "", ""; "", "" ]
EOF
make src/coq-tactic/.why3-vo-opt
Tagsrecmod
Attached Files

- Relationships

-  Notes
(0010963)
frisch (developer)
2014-02-20 11:36

Could it be the case that two units of the same name where dynlinked in the same process? Do you use -pack?
(0010964)
frisch (developer)
2014-02-20 11:59

$ make src/coq-tactic/.why3-vo-opt
...
...
Ocamldep src/util/extmap.ml
Ocamldep src/util/strings.ml
Ocamldep src/util/lists.ml
Ocamldep src/util/opt.ml
Ocamldep src/util/util.ml
Ocamldep src/util/config.ml
make: *** No rule to make target `src/coq-tactic/.why3-vo-opt'. Stop.


Do we need to install anything else in addition to OCaml itself?
(0010965)
gmelquiond (reporter)
2014-02-20 12:20

I don't think there are two units with the same name. (Any way to make sure?) We are using -pack, so that all our modules are submodules of Why3.

Also, you need Coq, since the dynamically-loaded module is actually a Coq plugin and it depends on various Coq modules. Due to the high sensitivity on the size of the segments and the number of relocations, I was unfortunately not able to remove Coq from the equation.
(0011590)
shinwell (developer)
2014-05-30 13:02

I tried to reproduce this, building a clean 4.01 OCaml installation, and then compiling coq-8.4pl3 from the distribution tarball. With both of the installation directories on my PATH, the configure script for why3 said:

                 Summary
-----------------------------------------
Verbose make : no
OCaml compiler : yes
    Version : 4.01.1+dev2-2013-12-18+CLOSED
    Library path : /home/mark/dev/pr/6329/ocaml/lib/ocaml
    Native compilation : yes
    Profiling : no
IDE : no (lablgtk2 not found)
Bench tool : no (sqlite3 not found)
Documentation : no (rubber not found)
Coq support : yes
    Version : 8.4pl3
    Library path : /home/mark/dev/pr/6329/coq-install/lib/coq
    "why3" tactic : no (camlp5o not found)
    Realization support : yes
        FP arithmetic : no (Flocq >= 2.2 not found)
PVS support : no (pvs not found)
Isabelle support : no (isabelle not found)
Frama-C support : no
Hypothesis selection : no (ocamlgraph not found)
Installable : yes
    Binary path : ${exec_prefix}/bin
    Lib path : ${exec_prefix}/lib/why3
    Data path : ${prefix}/share/why3
    Ocaml Library : /home/mark/dev/pr/6329/ocaml/lib/ocaml/why3
    Relocatable : no

However, I still get the same error "make: *** No rule to make target `src/coq-tactic/.why3-vo-opt'. Stop.". I think it might be because some dependency of lib/coq-tactic/why3tac.cmxs can't or won't build.

Could you please adjust the repro instructions, and then we can try again? Thanks.
(0011591)
shinwell (developer)
2014-05-30 13:17

Ah, the error was caused by missing camlp5...
(0011592)
shinwell (developer)
2014-05-30 13:55
edited on: 2014-05-30 13:58

OK, so I managed to get that make command to complete successfully.

Maybe you could prepare a tarball that includes all of the sources and objects from a tree where this bug has been observed.

Also, it would be useful if you could get this to fail having altered your OCaml source tree such that instead of just bombing out with an assertion failure, it prints the values of:
- n
- Obj.tag n
- Obj.size n
- Array.length comps

(0013426)
gmelquiond (reporter)
2015-03-10 18:15

This bug started popping up again, this time on our regression server. So I took the chance to create a minimal environment for reliably reproducing it. Since the chroot is 1GB-large, I will not post its tarball here. ("Minimal" meant the smallest Debian system able to compile the Why3 plugin nowadays, so it is quite large.)

Due to the high sensitivity to the compilation conditions (see original report: even adding a string causes the bug to vanish), I have not tried modifying the standard library of Ocaml to add some debug printfs. However, I am willing to keep the chroot around and try any non-intrusive method you can think of. In particular, if you can think of a way of recovering n, Obj.tag n, etc from e.g. gdb, please do to tell.
(0013862)
shinwell (developer)
2015-05-07 14:57

Can you please put the tarball somewhere on the web such that I can download it, and try to reproduce this problem?
(0014685)
xleroy (administrator)
2015-11-15 17:40

Any news about this mysterious failure?
(0015343)
doligez (administrator)
2016-02-11 14:18

I'm in contact with gmelquiond who will give me the repro file.
(0015674)
doligez (administrator)
2016-04-04 17:16

I got the repro file (364 MB).
(0016819)
shinwell (developer)
2016-12-08 09:24

@doligez Are you able to reproduce this bug using the repro file (which only you have)?
(0017281)
xleroy (administrator)
2017-02-16 11:18

So, is this issue reproducible or not?
(0017392)
doligez (administrator)
2017-02-21 17:19

I'm sorry I dropped the ball on this one. I still have the repro file and I'll try it shortly.
(0017605)
shinwell (developer)
2017-03-08 18:35

I have successfully reproduced this, and will investigate.
(0017609)
shinwell (developer)
2017-03-09 11:59

This problem arises because an empty module is passed to [CamlinternalMod.update_mod] as the [n] argument. The empty module lives in the data section of a shared library being loaded via natdynlink. It appears to be defined within another module called Why3.Mlw_ty. By chance, it is the last thing in the "data_begin .. data_end" range for Why3.Mlw_ty. On my system the address of camlWhy3__Mlw_ty__data_end is 0x7fffecda6000; this is also the address of the empty module, and the preceding word is indeed zero (= header for zero-sized array).

When CamlinternalMod.update_mod is in the "Module" case it examines the empty module's tag, using [caml_obj_tag], to ensure it is indeed zero. This should work, because the area should be registered in the page table as static data. Unfortunately it is not; the function returns 1001 meaning "out of heap", and the assertion fails.

This stems from fragility in the page table infrastructure. What should happen is that the entire data_begin .. data_end range for each of the modules in the .cmxs should be registered as static data. Unfortunately the calculation for the above end address goes like this:

(gdb) p/x (0x7fffecda6000 - 1) & (0xffffffffffffffff << 12)
$4 = 0x7fffecda5000

which means that various things, including the aforementioned empty module, won't get recognised as being in static data.

To make this robust I think you'd have to do two things:
1. Ensure data_begin .. data_end sections are a multiple of Page_size in size;
2. Ensure that shared libraries get linked such that every data_begin therein lands at an address which is divisible by the page size.

I'm not even sure 2 is possible in general. I would like to propose a perhaps more satisfactory solution which is that I stop procrastinating about https://github.com/ocaml/ocaml/pull/203 [^] with a view to completely removing the page table for 4.06 and imposing the no-naked-pointers conditions by default. We can try to write some kind of helper tool for migration based on 4.05 (maybe a different OPAM switch with more checks). The forthcoming multicore work also has no page table, so this would be consistent with that.

We could change camlinternalMod.ml in 4.05 so that it uses a different function for testing the tag, or we could be brave and temporarily disable the assertion.
(0017610)
shinwell (developer)
2017-03-09 12:10

Thinking about this a bit more, maybe it isn't as bad as all that (although I still favour the proposed solution above). The situation probably only arises because there's an empty block immediately before "_data_end". So adding a padding word at that point would likely fix the problem, because the subsequent page would get registered in the table.

I suppose in terms of the more general issue: we can assume that shared libraries won't be loaded at an address that isn't zero modulo the OS page size, as memory protection needs to be applied (and that presumably works in page units). So we might overshoot when registering one data_begin .. data_end pair in two ways: firstly, into another such pair---which is fine---or secondly, off the end of the data section of the shared library itself. However we shouldn't go further than the next OS page boundary if that condition holds. So maybe it's ok, just.
(0018099)
frisch (developer)
2017-07-19 11:43

Mark: do you think you can propose a fix in time for 4.06's freeze date?
(0018295)
xleroy (administrator)
2017-09-20 16:11

An inglorious quick fix would be to remove the assertion "Obj.tag n = 0" in the "Module" case of "update_mod".

In the longer term @shinwell is right that an extra world of padding at the end of the data segment can make sense.
(0018296)
shinwell (developer)
2017-09-20 16:31

Sorry I completely missed @frisch's ping on this one. I will try to do something for 4.06.

- Issue History
Date Modified Username Field Change
2014-02-18 10:39 gmelquiond New Issue
2014-02-19 20:46 doligez Tag Attached: recmod
2014-02-20 11:36 frisch Note Added: 0010963
2014-02-20 11:59 frisch Note Added: 0010964
2014-02-20 12:20 gmelquiond Note Added: 0010965
2014-05-30 13:02 shinwell Note Added: 0011590
2014-05-30 13:17 shinwell Note Added: 0011591
2014-05-30 13:55 shinwell Note Added: 0011592
2014-05-30 13:55 shinwell Assigned To => shinwell
2014-05-30 13:55 shinwell Status new => feedback
2014-05-30 13:58 shinwell Note Edited: 0011592 View Revisions
2014-07-08 11:30 doligez Target Version => 4.03.0+dev / +beta1
2015-03-10 18:15 gmelquiond Note Added: 0013426
2015-03-10 18:15 gmelquiond Status feedback => assigned
2015-05-07 14:57 shinwell Note Added: 0013862
2015-05-07 14:57 shinwell Status assigned => feedback
2015-11-15 17:40 xleroy Note Added: 0014685
2016-02-11 14:18 doligez Note Added: 0015343
2016-04-04 17:16 doligez Note Added: 0015674
2016-04-14 16:31 doligez Target Version 4.03.0+dev / +beta1 => 4.03.1+dev
2016-12-08 09:24 shinwell Note Added: 0016819
2016-12-08 09:46 shinwell Status feedback => assigned
2017-02-16 11:18 xleroy Note Added: 0017281
2017-02-16 11:18 xleroy Target Version 4.03.1+dev => 4.05.0 +dev/beta1/beta2/beta3/rc1
2017-02-21 17:19 doligez Note Added: 0017392
2017-02-23 16:43 doligez Category OCaml runtime system => runtime system
2017-03-03 17:45 doligez Category runtime system => runtime system and C interface
2017-03-08 18:35 shinwell Note Added: 0017605
2017-03-09 11:59 shinwell Note Added: 0017609
2017-03-09 12:10 shinwell Note Added: 0017610
2017-06-09 12:16 doligez Target Version 4.05.0 +dev/beta1/beta2/beta3/rc1 => 4.06.0+dev
2017-07-19 11:43 frisch Note Added: 0018099
2017-09-20 16:11 xleroy Note Added: 0018295
2017-09-20 16:31 shinwell Note Added: 0018296


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker