New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
File "camlinternalMod.ml", line 63, characters 6-12: Assertion failed. #6329
Comments
Comment author: @alainfrisch Could it be the case that two units of the same name where dynlinked in the same process? Do you use -pack? |
Comment author: @alainfrisch $ make src/coq-tactic/.why3-vo-opt Do we need to install anything else in addition to OCaml itself? |
Comment author: gmelquiond I don't think there are two units with the same name. (Any way to make sure?) We are using -pack, so that all our modules are submodules of Why3. Also, you need Coq, since the dynamically-loaded module is actually a Coq plugin and it depends on various Coq modules. Due to the high sensitivity on the size of the segments and the number of relocations, I was unfortunately not able to remove Coq from the equation. |
Comment author: @mshinwell I tried to reproduce this, building a clean 4.01 OCaml installation, and then compiling coq-8.4pl3 from the distribution tarball. With both of the installation directories on my PATH, the configure script for why3 said:
Verbose make : no However, I still get the same error "make: *** No rule to make target `src/coq-tactic/.why3-vo-opt'. Stop.". I think it might be because some dependency of lib/coq-tactic/why3tac.cmxs can't or won't build. Could you please adjust the repro instructions, and then we can try again? Thanks. |
Comment author: @mshinwell Ah, the error was caused by missing camlp5... |
Comment author: @mshinwell OK, so I managed to get that make command to complete successfully. Maybe you could prepare a tarball that includes all of the sources and objects from a tree where this bug has been observed. Also, it would be useful if you could get this to fail having altered your OCaml source tree such that instead of just bombing out with an assertion failure, it prints the values of:
|
Comment author: gmelquiond This bug started popping up again, this time on our regression server. So I took the chance to create a minimal environment for reliably reproducing it. Since the chroot is 1GB-large, I will not post its tarball here. ("Minimal" meant the smallest Debian system able to compile the Why3 plugin nowadays, so it is quite large.) Due to the high sensitivity to the compilation conditions (see original report: even adding a string causes the bug to vanish), I have not tried modifying the standard library of Ocaml to add some debug printfs. However, I am willing to keep the chroot around and try any non-intrusive method you can think of. In particular, if you can think of a way of recovering n, Obj.tag n, etc from e.g. gdb, please do to tell. |
Comment author: @mshinwell Can you please put the tarball somewhere on the web such that I can download it, and try to reproduce this problem? |
Comment author: @xavierleroy Any news about this mysterious failure? |
Comment author: @damiendoligez I'm in contact with gmelquiond who will give me the repro file. |
Comment author: @damiendoligez I got the repro file (364 MB). |
Comment author: @mshinwell @doligez Are you able to reproduce this bug using the repro file (which only you have)? |
Comment author: @xavierleroy So, is this issue reproducible or not? |
Comment author: @damiendoligez I'm sorry I dropped the ball on this one. I still have the repro file and I'll try it shortly. |
Comment author: @mshinwell I have successfully reproduced this, and will investigate. |
Comment author: @mshinwell This problem arises because an empty module is passed to [CamlinternalMod.update_mod] as the [n] argument. The empty module lives in the data section of a shared library being loaded via natdynlink. It appears to be defined within another module called Why3.Mlw_ty. By chance, it is the last thing in the "data_begin .. data_end" range for Why3.Mlw_ty. On my system the address of camlWhy3__Mlw_ty__data_end is 0x7fffecda6000; this is also the address of the empty module, and the preceding word is indeed zero (= header for zero-sized array). When CamlinternalMod.update_mod is in the "Module" case it examines the empty module's tag, using [caml_obj_tag], to ensure it is indeed zero. This should work, because the area should be registered in the page table as static data. Unfortunately it is not; the function returns 1001 meaning "out of heap", and the assertion fails. This stems from fragility in the page table infrastructure. What should happen is that the entire data_begin .. data_end range for each of the modules in the .cmxs should be registered as static data. Unfortunately the calculation for the above end address goes like this: (gdb) p/x (0x7fffecda6000 - 1) & (0xffffffffffffffff << 12) which means that various things, including the aforementioned empty module, won't get recognised as being in static data. To make this robust I think you'd have to do two things:
I'm not even sure 2 is possible in general. I would like to propose a perhaps more satisfactory solution which is that I stop procrastinating about #203 with a view to completely removing the page table for 4.06 and imposing the no-naked-pointers conditions by default. We can try to write some kind of helper tool for migration based on 4.05 (maybe a different OPAM switch with more checks). The forthcoming multicore work also has no page table, so this would be consistent with that. We could change camlinternalMod.ml in 4.05 so that it uses a different function for testing the tag, or we could be brave and temporarily disable the assertion. |
Comment author: @mshinwell Thinking about this a bit more, maybe it isn't as bad as all that (although I still favour the proposed solution above). The situation probably only arises because there's an empty block immediately before "_data_end". So adding a padding word at that point would likely fix the problem, because the subsequent page would get registered in the table. I suppose in terms of the more general issue: we can assume that shared libraries won't be loaded at an address that isn't zero modulo the OS page size, as memory protection needs to be applied (and that presumably works in page units). So we might overshoot when registering one data_begin .. data_end pair in two ways: firstly, into another such pair---which is fine---or secondly, off the end of the data section of the shared library itself. However we shouldn't go further than the next OS page boundary if that condition holds. So maybe it's ok, just. |
Comment author: @alainfrisch Mark: do you think you can propose a fix in time for 4.06's freeze date? |
Comment author: @xavierleroy An inglorious quick fix would be to remove the assertion "Obj.tag n = 0" in the "Module" case of "update_mod". In the longer term @shinwell is right that an extra world of padding at the end of the data segment can make sense. |
Comment author: @mshinwell Sorry I completely missed @Frisch's ping on this one. I will try to do something for 4.06. |
Comment author: @gasche Ping again :-) |
Comment author: @mshinwell I will try to look at this tomorrow (Friday) |
Comment author: @mshinwell Superceded by #1437 |
Original bug ID: 6329
Reporter: gmelquiond
Assigned to: @mshinwell
Status: resolved (set by @mshinwell on 2017-10-20T09:00:38Z)
Resolution: duplicate
Priority: low
Severity: major
Version: 4.01.0
Target version: 4.06.0 +dev/beta1/beta2/rc1
Category: runtime system and C interface
Tags: recmod
Bug description
There is an assertion failed in the OCaml runtime when dynamically loading a recursive module. Unfortunately, the bug is highly sensitive to conditions: changing the size of the code segment, changing the size of the data segment, changing the size of the read-only data segment, changing the number of relocations, etc, all cause the bug to vanish. As a consequence, we were unable to produce a small testcase, sorry. This bug report is mostly for the record. That said, it is perfectly reproducible and we were able to trigger it on three systems with OCaml 4.01.0 and Coq 8.4pl3.
Steps to reproduce
git clone git://scm.gforge.inria.fr/why3/why3.git
cd why3
git checkout a8c13660788c726a134fab9109e384592881e8d7
autoconf
./configure
cat > src/util/config.ml << EOF
let version = "1234567812345678123456781234567812345678123456781234567812345678123456781234567"
let builddate = ""
let libdir = ""
let datadir = ""
let localdir = Some ""
let compile_time_support = [ "", ""; "", "" ]
EOF
make src/coq-tactic/.why3-vo-opt
The text was updated successfully, but these errors were encountered: