New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing Z.of_int as %identity in Zarith (was: zarith sample code segfaults) #6501
Comments
Comment author: @diml When z.cmx is present functions from the Z module might be inlined. I diffed the two clambda and there are actually very few differences. Only [Z.zero] and [Z.of_int] are inlined. |
Comment author: @yakobowski The bug disappears (and the code becomes faster :-)) when the following definition is used for memoize let memoize f = However, the real problem probably lies in the loop
If total is instead initialized to Z.of_int 1, and the code returns 'Z.sub !total (Z.of_int 1)' instead, the segfault disappears. It is likely that 'total' changes from OCaml integers to Zarith internal representation at some point, which confuses either the code generator or the GC. EDIT: printing the linear code with initial value 0 and 1 shows that very little changes between the two versions. Either the problem is in the GC, or this is the wrong track. |
Comment author: Antoine Mine Thanks for the bug analysis. Indeed, Z.of_int (and co.) seems to be the culprit. It is defined as %identity, which was probably a bad idea. Redefining it as a C external "ml_z_of_int" solves the issue. I think I will go with this solution and patch Zarith. Nevertheless, I wonder if there is some relationship with Lazy.t as: 1) bug.ml is the first program I've seen that mixes Lazy and Zarith, 2) I've never had problems with Zarith for a long long time, 3) both do colorful magic mixing values and blocks. A wild guess is that OCaml performs type-based optimizations (or, more precise, using the abstraction "is a scalar", "is a block", "is polymorphic") and uses the knowledge that a mutable object's initial value is the result of calling %identity on an integer. However, the type of Z.of_int is "int -> t" where t is abstract, which may (should?) prevent type propagation. |
Comment author: @yakobowski My remark about replacing 'ref Z.zero' by 'ref (Z.of_int 1)' may actually be a red herring. When doing so, the arguments to some GC functions (prior to the crash) do change in a significant way. It is theoretically possible that the switch from OCaml short integers to Zarith long integers occur earlier because of this. This would change the allocation profile of the code. |
Comment author: @diml The typer doesn't really cares about the fact that "Z.of_int" is declared as an external. This is used by later phases in the compiler, so I don't think that's the problem. The assembly code has very few differences between the inlined and non-inlined versions anyway. Lazy shouldn't be a problem here. I tried replacing the lazy by a ref and the bug is still there. I played a bit with the configuration in caml_z.c, setting Z_PERFORM_CHECK to 1 produces a different error: ml_z_check: wrong custom block for arg2 at ml_z_mul:1339 also reducing the size of the minor heap make the bug happen much faster, for instance with OCAMLRUNPARAM=s=256. The bug always happen some time after the first minor collection, so something might get corrupted during the promotion. |
Comment author: @diml I spoke a bit too fasst. Looking at -dlive total is indeed considered as always an integer and so ignored by the gc. ocamlobjinfo shows that [Z.zero] is an integer constant and not a pointer constant which is probably the reason. |
Comment author: @xavierleroy I confirm that the problem is a missing GC root. A bignum gets allocated, then freed by a minor collection, then used again as argument to Z.mul. The typing issue mentioned by "dim" is very likely the culprit. This could be a misuse of "%identity" as mentioned by Antoine, or something else. I'll investigate more. |
Comment author: @xavierleroy Indeed, the problem lies with "%identity" when it is used to inject integers into a type that can also contain heap pointers. A similar issue can be observed with Obj.magic, which is also "%identity": let f n = let _ = f 10000 This segfaults shortly after the first GC, because r is not a root, Why is it not a root? Because it's a local mutable variable (after let-ref optimization) that is initialized with an integer (a C-- Cconst_int, not a C-- Cconst_pointer). So, it has machine type "int" and not "address", and is excluded from the roots given to the GC. The safe, short-term fix is to stop implementing "Z.of_int" as "%identity" and use a C function instead, as Antoine suggested. Additionally, Zarith could define type t' = Zero | One | Other of Obj.t let zero : t = Obj.magic Zero so that zero, one would be constants of machine type address. But it doesn't work for minus_one nor for of_int in general. In the longer term, there might be better solutions with some changes to the ocamlopt compiler. One would be a primitive like "%identity" that forces its result to have machine type "address". Unlike "%identity", it would have to be kept by the front-end all the way to C--. Another solution would be to treat mutable let-bound variables differently during C-- to Mach compilation, so that their machine type is inferred not just from their initial value but from all their assignments as well. |
Comment author: @xavierleroy I reclassify this PR as a feature wish so that it doesn't block the release of OCaml 4.02. |
Comment author: Antoine Mine Thank you very much for your help and the clear explanation. I've patched Zarith to use a C external instead of %identity for of_int. It works well. |
Comment author: @xavierleroy Tentative fix in SVN trunk, commit r15078. The machine type of a C-- mutable variable is now the lub of the types of its initial value and all the values assigned to it, with type Int being a subtype of Addr. In the problematic examples from this PR, it causes the mutable variables to have type Addr and be treated as GC roots (correctly). |
Original bug ID: 6501
Reporter: Antoine Mine
Status: closed (set by @xavierleroy on 2016-12-07T10:34:27Z)
Resolution: fixed
Priority: normal
Severity: feature
Platform: x86_64
OS: Linux
Version: 4.02.0+beta1 / +rc1
Target version: 4.02.1+dev
Fixed in version: 4.03.0+dev / +beta1
Category: back end (clambda to assembly)
Monitored by: avaron yminsky @yakobowski
Bug description
This is an upstream from Zarith's bug #1409
https://forge.ocamlcore.org/tracker/index.php?func=detail&aid=1409&group_id=243&atid=1095
The client code bug.ml (in Zarith's bug report) uses the Z module compiled by Zarith into zarith.cmxa. When compiling bug.ml with ocamlopt and the z.cmx file from the compilation of Zarith is present, the code segfaults in the GC:
#0 0x000000000044fc28 in invert_pointer_at ()
#1 0x000000000044fcb3 in invert_root ()
#2 0x0000000000441ef0 in caml_do_roots ()
#3 0x000000000044fd63 in do_compaction ()
#4 0x00000000004501e8 in caml_compact_heap ()
#5 0x000000000045056d in caml_compact_heap_maybe ()
#6 0x0000000000443507 in caml_major_collection_slice ()
#7 0x0000000000443c71 in caml_minor_collection ()
#8 0x00000000004429f5 in caml_garbage_collection ()
#9 0x000000000045179c in caml_call_gc ()
#10 0x0000000000432147 in camlHashtbl__mem_1128 () at hashtbl.ml:181
#11 0x000000000041054d in camlBug__fun_1093 ()
#12 0x000000000041068b in camlBug__count_1066 ()
#13 0x000000000041057f in camlBug__fun_1093 ()
#14 0x000000000041057f in camlBug__fun_1093 ()
#15 0x00000000004105c6 in camlBug__count_1057 ()
#16 0x000000000041057f in camlBug__fun_1093 ()
#17 0x0000000000410ae5 in camlBug__entry ()
#18 0x000000000040e689 in caml_program ()
#19 0x000000000045197e in caml_start_program ()
#20 0x0000000000000000 in ?? ()
Well, it might be a problem in Zarith, which does magic mixing of OCaml, C and asm.
But now for the weird part. If the z.cmx file is not present when bug.ml is compiled, the program does not segfault. This is always reproducible on my machine. I've checked that the code generated by ocamlopt in both cases is actually quite different.
Steps to reproduce
Download zarith and unpack it
https://forge.ocamlcore.org/frs/download.php/1199/zarith-1.2.1.tgz
./configure and make it
download bug.ml and copy it in zarith source
https://forge.ocamlcore.org/tracker/download.php/243/1095/1409/261/bug.ml
check that z.cmx is in the working directory as well as bug.ml
compile with "ocamlopt zarith.cmxa bug.ml -cclib -L."
./a.out segfaults
remove z.cmx
recompile with "ocamlopt zarith.cmxa bug.ml -cclib -L."
./a.out now works
The text was updated successfully, but these errors were encountered: