Mantis Bug Tracker

ID: 0006239
Project: OCaml
Category: OCaml general
View Status: public
Date Submitted: 2013-11-16 21:11
Last Update: 2013-11-28 15:30
Reporter: yminsky
Assigned To:
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Platform:
OS:
OS Version:
Product Version:
Target Version:
Fixed in Version: 4.01.1+dev
Summary: 0006239: make world.opt seems to crash on tip of trunk on up-to-date OS X
Description: I don't know if others can reproduce this, but on my mac, trunk segfaults when you try to build world.opt. Here's the github id of the version I tried.

df7e6c1ded6334937227fa1046102a79bc0f6a18

Here's the error message I got:

../boot/ocamlrun ../ocamlopt -nostdlib -I ../stdlib -I ../utils -I ../parsing -I ../typing -I ../bytecomp -I ../asmcomp -I ../driver -I ../toplevel -o read_cmt.opt ../utils/misc.cmx ../utils/warnings.cmx ../utils/tbl.cmx ../utils/consistbl.cmx ../utils/config.cmx ../utils/clflags.cmx ../parsing/location.cmx ../parsing/longident.cmx ../parsing/lexer.cmx ../parsing/pprintast.cmx ../parsing/ast_helper.cmx ../parsing/ast_mapper.cmx ../typing/ident.cmx ../typing/path.cmx ../typing/types.cmx ../typing/typedtree.cmx ../typing/btype.cmx ../typing/subst.cmx ../typing/predef.cmx ../typing/datarepr.cmx ../typing/cmi_format.cmx ../typing/env.cmx ../typing/ctype.cmx ../typing/oprint.cmx ../typing/primitive.cmx ../typing/printtyp.cmx ../typing/mtype.cmx ../typing/envaux.cmx ../typing/typedtreeMap.cmx ../typing/typedtreeIter.cmx ../typing/cmt_format.cmx ../typing/stypes.cmx untypeast.cmx tast_iter.cmx cmt2annot.cmx read_cmt.cmx
cd ocamldoc && /Applications/Xcode.app/Contents/Developer/usr/bin/make opt.opt
/Users/yminsky/Documents/code/ocaml-trunk/ocamlopt.opt -nostdlib -I ../stdlib -pp ./remove_DEBUG -I ../parsing -I ../utils -I ../typing -I ../driver -I ../bytecomp -I ../tools -I ../toplevel/ -I ../stdlib -I ../otherlibs/str -I ../otherlibs/dynlink -I ../otherlibs/unix -I ../otherlibs/num -I ../otherlibs/graph -warn-error A -c odoc_config.ml
/bin/sh: line 1: 73228 Segmentation fault: 11 ${CAMLOPT_BIN} -nostdlib -I ../stdlib -pp './remove_DEBUG' -I ../parsing -I ../utils -I ../typing -I ../driver -I ../bytecomp -I ../tools -I ../toplevel/ -I ../stdlib -I ../otherlibs/str -I ../otherlibs/dynlink -I ../otherlibs/unix -I ../otherlibs/num -I ../otherlibs/graph -warn-error A -c odoc_config.ml
make[3]: *** [odoc_config.cmx] Error 139
make[2]: *** [ocamldoc.opt] Error 2
make[1]: *** [opt.opt] Error 2
make: *** [world.opt] Error 2
Steps To Reproduce: I've attached the log of the build, as well as some stack traces from re-running the failing command using lldb.
Additional Information:
ocaml-trunk $ lldb -- /Users/yminsky/Documents/code/ocaml-trunk/ocamlopt.opt -nostdlib -I ../stdlib -pp ./remove_DEBUG -I ../parsing -I ../utils -I ../typing -I ../driver -I ../bytecomp -I ../tools -I ../toplevel/ -I ../stdlib -I ../otherlibs/str -I ../otherlibs/dynlink -I ../otherlibs/unix -I ../otherlibs/num -I ../otherlibs/graph -warn-error A -c odoc_config.ml
Current executable set to '/Users/yminsky/Documents/code/ocaml-trunk/ocamlopt.opt' (x86_64).
(lldb) run
Process 73259 launched: '/Users/yminsky/Documents/code/ocaml-trunk/ocamlopt.opt' (x86_64)
Process 73259 stopped
* thread #1: tid = 0x4ade4, 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50, queue = 'com.apple.main-thread, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50
libsystem_malloc.dylib`large_malloc + 50:
-> 0x7fff89c30d49: movaps %xmm0, -64(%rbp)
   0x7fff89c30d4d: cmoveq %r13, %r14
   0x7fff89c30d51: shlq %cl, %r14
   0x7fff89c30d54: cmpq $134217727, %r14
(lldb) bt
* thread #1: tid = 0x4ade4, 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50, queue = 'com.apple.main-thread, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50
    frame #1: 0x00007fff89c363b6 libsystem_malloc.dylib`szone_malloc_should_clear + 287
    frame #2: 0x00007fff89c3887c libsystem_malloc.dylib`malloc_zone_malloc + 71
    frame #3: 0x00007fff89c39290 libsystem_malloc.dylib`malloc + 42
    frame #4: 0x00000001001c7f6e ocamlopt.opt`caml_stat_alloc + 14
    frame #5: 0x00000001001c3f99 ocamlopt.opt`caml_init_frame_descriptors + 185
    frame #6: 0x00000001001d9e03 ocamlopt.opt`caml_next_frame_descriptor + 35
    frame #7: 0x00000001001d9efd ocamlopt.opt`caml_stash_backtrace + 93
    frame #8: 0x00000001001da95e ocamlopt.opt`caml_raise_exn + 54
    frame #9: 0x000000010019d251 ocamlopt.opt`.L200 + 13
(lldb) frame select 4
frame #4: 0x00000001001c7f6e ocamlopt.opt`caml_stat_alloc + 14
ocamlopt.opt`caml_stat_alloc + 14:
-> 0x1001c7f6e: testq %rax, %rax
   0x1001c7f71: je 0x1001c7f7a ; caml_stat_alloc + 26
   0x1001c7f73: addq $8, %rsp
   0x1001c7f77: popq %rbx
(lldb) register read
General Purpose Registers:
       rbx = 0x0000000000080000
       rbp = 0x00007fff5fbff5a8
       rsp = 0x00007fff5fbff598
       r12 = 0x00007fff5fbff568
       r13 = 0x00000001001da921 ocamlopt.opt`caml_start_program + 165
       r14 = 0x0000000000010000
       r15 = 0x00000001010009a0
       rip = 0x00000001001c7f6e ocamlopt.opt`caml_stat_alloc + 14
13 registers were unavailable.
Tags: No tags attached.
Attached Files:
typescript (7,597 bytes) 2013-11-16 21:11
patch-r14316.diff (1,034 bytes) 2013-11-27 18:11

Notes
(0010639)
frisch (developer)
2013-11-18 10:15

Can you check that the version you tried is after revision 14294 from the upstream SVN? (i.e. check that asmrun/fail.c includes "callback.h")
(0010643)
shinwell (developer)
2013-11-18 17:25

Also, which version of Mac OS X is this?
(0010647)
johnwhitington (reporter)
2013-11-18 17:44

SVN 14302 builds fine on OS X 10.9 with the latest XCode.
(0010651)
yminsky (reporter)
2013-11-19 12:45

I'm building with the latest xcode on 10.9. And the same box can build older versions, e.g., I built 4.00.1 on the same box after the build of trunk failed.

I'm not sure what extra debug info would be helpful for tracking this down. It's clearly not an issue with all os x builds.
(0010652)
yminsky (reporter)
2013-11-19 12:49

And I can confirm for Alain that it was exactly 14294 that I built.
(0010654)
shinwell (developer)
2013-11-19 13:54

I'm going to look at this on yminsky's machine.
(0010657)
frisch (developer)
2013-11-20 10:32

This should be fixed by commit 14307.
(0010658)
yminsky (reporter)
2013-11-21 05:06

Trying the latest version (14307), I now get it to fail in a different place:

ocaml-trunk $ lldb -- /Users/yminsky/Documents/code/ocaml-trunk/ocamlc.opt -nostdlib -I ../../stdlib -c -w +33..39 -warn-error A -g -nolabels unix.mli
Current executable set to '/Users/yminsky/Documents/code/ocaml-trunk/ocamlc.opt' (x86_64).
(lldb) run
Process 22317 launched: '/Users/yminsky/Documents/code/ocaml-trunk/ocamlc.opt' (x86_64)
Process 22317 stopped
* thread #1: tid = 0xd10c9, 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50, queue = 'com.apple.main-thread, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50
libsystem_malloc.dylib`large_malloc + 50:
-> 0x7fff89c30d49: movaps %xmm0, -64(%rbp)
   0x7fff89c30d4d: cmoveq %r13, %r14
   0x7fff89c30d51: shlq %cl, %r14
   0x7fff89c30d54: cmpq $134217727, %r14
(lldb) bt
* thread #1: tid = 0xd10c9, 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50, queue = 'com.apple.main-thread, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00007fff89c30d49 libsystem_malloc.dylib`large_malloc + 50
    frame #1: 0x00007fff89c363b6 libsystem_malloc.dylib`szone_malloc_should_clear + 287
    frame #2: 0x00007fff89c3887c libsystem_malloc.dylib`malloc_zone_malloc + 71
    frame #3: 0x00007fff89c39290 libsystem_malloc.dylib`malloc + 42
    frame #4: 0x000000010018c6ce ocamlc.opt`caml_stat_alloc + 14
    frame #5: 0x00000001001886f9 ocamlc.opt`caml_init_frame_descriptors + 185
    frame #6: 0x000000010019e563 ocamlc.opt`caml_next_frame_descriptor + 35
    frame #7: 0x000000010019e65d ocamlc.opt`caml_stash_backtrace + 93
    frame #8: 0x000000010019f1f6 ocamlc.opt`caml_raise_exn + 54
    frame #9: 0x0000000100161f41 ocamlc.opt`.L200 + 13
(0010659)
shinwell (developer)
2013-11-21 10:32

I'm trying to reproduce this now...
(0010660)
shinwell (developer)
2013-11-21 12:14

This is a horrid one. I couldn't reproduce it but then realized what's wrong: it's faulting because %rbp isn't 16-byte aligned on that 128-bit move in [large_malloc].

So it looks like this is very similar to mantis 5700. C functions have to be entered with %rsp mod 16 = 8. I have to go now, and I haven't yet identified exactly where this rule is being broken, but it should be enough for you (Alain!) to go on. My suspicion is that the assembly code of [caml_raise_exn] (and perhaps [caml_reraise_exn] in some cases) is being called with the wrong stack alignment.
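
The alignment rule can be checked against the register dump in the original report. The sketch below is only an illustration, not runtime code: `mod16` and `rsp_on_entry` are hypothetical helper names, and it assumes that the `rsp` value lldb printed for frame #4 is the stack pointer at the point where [caml_stat_alloc] called [malloc].

```ocaml
(* Hypothetical helper: residue of an address modulo 16. *)
let mod16 (addr : int64) : int = Int64.to_int (Int64.logand addr 0xFL)

(* rsp as printed by "register read" for frame #4 (caml_stat_alloc). *)
let rsp_frame4 = 0x00007fff5fbff598L

(* A call instruction pushes an 8-byte return address, so the callee
   is entered with rsp 8 bytes below its value at the call site. *)
let rsp_on_entry (rsp_at_call : int64) : int64 = Int64.sub rsp_at_call 8L

let () =
  Printf.printf "rsp at the call site, mod 16: %d\n" (mod16 rsp_frame4);
  Printf.printf "rsp on entry to the callee, mod 16: %d\n"
    (mod16 (rsp_on_entry rsp_frame4))
```

Under that assumption the residues come out as 8 at the call site and 0 on entry, the reverse of the "%rsp mod 16 = 8 on entry" rule, which is consistent with the misaligned 128-bit move in [large_malloc].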
(0010667)
shinwell (developer)
2013-11-27 12:21

I haven't managed to reproduce this yet. If anyone can reproduce it, please let me know. I expect to be able to get access to yminsky's machine in a couple of weeks.
(0010668)
avsm (reporter)
2013-11-27 14:34

I've successfully built trunk (r14390, remove camlp4) on OS X 10.9 and passed all tests with this gcc:

$ gcc -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix
flick:testsuite avsm$ clang -v
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.0.0
Thread model: posix

I've also tried a build with various Malloc options enabled to see if that'll make a difference, which it hasn't. Yaron, how much memory do you have in your laptop? (Mine is 8GB, so I should be in high memory too.)

$ env MallocScribble=1 MallocPreScribble=1 MallocGuardEdges=1 make world.opt

Not sure what else to try to reproduce this one.
(0010669)
doligez (administrator)
2013-11-27 17:01

I can't reproduce this problem on any of my Macs.
I've even tried to activate the -with-frame-pointers option to configure (had to patch configure to allow me), but that didn't crash.

I'm on r14310.
(0010670)
doligez (administrator)
2013-11-27 17:28

XL found the bug.
To reproduce, launch "OCAMLRUNPARAM=b make world.opt".

Xavier will explain the bug and post a patch soon.
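
For context on the reproduction recipe: the `b` flag in OCAMLRUNPARAM turns on exception backtrace recording, which is off by default, and `Printexc.record_backtrace true` flips the same switch from inside a program. That would explain why ordinary builds did not crash. The sketch below shows the programmatic form; the exception `Boom` and the function `fail_fast` are made-up names for illustration.

```ocaml
(* A constant exception (no arguments). *)
exception Boom

(* Hypothetical function that does nothing but raise. *)
let fail_fast () = raise Boom

let () =
  (* Same effect as running the program with OCAMLRUNPARAM=b. *)
  Printexc.record_backtrace true;
  Printf.printf "backtrace recording on: %b\n" (Printexc.backtrace_status ());
  try fail_fast () with
  | Boom -> print_endline "caught Boom"
```

On a compiler with the fix applied this simply prints the status and the "caught Boom" line.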
(0010671)
xleroy (administrator)
2013-11-27 18:20

Consider:

let f x = raise x

Compile this with ocamlopt -g, and you'll see that the stack (initially = 8 mod 16) is not realigned to 0 mod 16 before calling caml_raise_exn.

Why? Because ocamlopt treats this function as a leaf function (!Proc.contains_calls = false) which does not need allocation of a proper stack frame.

The criteria for a leaf function are pretty strict: it should
- not contain any call to a Caml function (except tail calls)
- not contain any call to a C function
- not allocate (because this can call the GC)
- not spill any temporaries to the stack.

Can you spot the missing case? Yes, there is one: if the function contains a "raise" and is compiled with -g, a call to a C function (caml_stash_backtrace) can occur, so it must not be treated as a leaf function.

This issue has been with us for a long time, but I believe it shows up only now because of Alain's recent optimization of constant exceptions. Before, raising such an exception would always allocate, causing the enclosing function to lose its leaf status. Now, we have more cases of useful functions that raise exceptions but don't allocate.

The fix is pretty simple: set Proc.contains_calls to true if the function contains a "raise" (not of the "notrace" kind) and is compiled with -g.
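
The leaf-function criteria and the fix can be modelled with a toy sketch. This is not the compiler's code: the `instr` type, `needs_frame_before_fix`, `needs_frame_after_fix` and `is_leaf` are all hypothetical names, and the instruction set is a deliberately tiny stand-in for what ocamlopt really analyses.

```ocaml
(* Toy instruction set; NOT the compiler's real intermediate code. *)
type raise_kind = Raise_regular | Raise_notrace

type instr =
  | Arith
  | Call_caml
  | Tailcall_caml
  | Call_c
  | Alloc
  | Spill
  | Raise of raise_kind

(* The four criteria listed above: a raise never forces a frame. *)
let needs_frame_before_fix = function
  | Call_caml | Call_c | Alloc | Spill -> true
  | Arith | Tailcall_caml | Raise _ -> false

(* With the fix: a regular raise compiled with -g also forces a frame,
   because it may call the C function caml_stash_backtrace. *)
let needs_frame_after_fix ~debug = function
  | Raise Raise_regular -> debug
  | i -> needs_frame_before_fix i

(* A function is a leaf if no instruction in its body needs a frame. *)
let is_leaf needs_frame body = not (List.exists needs_frame body)

(* Body of "let f x = raise x" in this toy model. *)
let f_body = [ Raise Raise_regular ]

let () =
  Printf.printf "leaf before fix: %b\n" (is_leaf needs_frame_before_fix f_body);
  Printf.printf "leaf after fix, with -g: %b\n"
    (is_leaf (needs_frame_after_fix ~debug:true) f_body);
  Printf.printf "leaf after fix, without -g: %b\n"
    (is_leaf (needs_frame_after_fix ~debug:false) f_body)
```

In this model the function is a leaf before the fix, stops being one after the fix when compiled with -g, and keeps its leaf status without -g or when the raise is of the "notrace" kind.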

This fix is committed on SVN trunk, r14316, and a patch is attached.

Please let us know if this fixes the crash; then, I'll port it to the 4.01 branch.
(0010672)
avsm (reporter)
2013-11-27 18:48

Confirmed the crash and the fix on OSX 10.9 and 4.02.0dev+trunk.
(0010673)
johnwhitington (reporter)
2013-11-27 19:54

Patch as applied in 14316 fixes the crash here.
(0010674)
doligez (administrator)
2013-11-28 14:57

Confirmed the fix on OSX 10.7 with Xcode 4.6.3.
(0010675)
xleroy (administrator)
2013-11-28 15:30

Thanks for the confirmations. Fix also applied to 4.01 bugfix branch, r14320. Marking this PR as resolved.

Issue History
Date Modified Username Field Change
2013-11-16 21:11 yminsky New Issue
2013-11-16 21:11 yminsky File Added: typescript
2013-11-18 10:15 frisch Note Added: 0010639
2013-11-18 17:25 shinwell Note Added: 0010643
2013-11-18 17:44 johnwhitington Note Added: 0010647
2013-11-19 12:45 yminsky Note Added: 0010651
2013-11-19 12:49 yminsky Note Added: 0010652
2013-11-19 13:54 shinwell Note Added: 0010654
2013-11-19 13:54 shinwell Assigned To => shinwell
2013-11-19 13:54 shinwell Status new => assigned
2013-11-20 10:32 frisch Note Added: 0010657
2013-11-21 05:06 yminsky Note Added: 0010658
2013-11-21 10:32 shinwell Note Added: 0010659
2013-11-21 12:14 shinwell Note Added: 0010660
2013-11-21 12:15 shinwell Assigned To shinwell => frisch
2013-11-25 10:24 shinwell Assigned To frisch => shinwell
2013-11-27 12:21 shinwell Note Added: 0010667
2013-11-27 14:34 avsm Note Added: 0010668
2013-11-27 17:01 doligez Note Added: 0010669
2013-11-27 17:28 doligez Note Added: 0010670
2013-11-27 18:11 xleroy File Added: patch-r14316.diff
2013-11-27 18:20 xleroy Note Added: 0010671
2013-11-27 18:20 xleroy Status assigned => feedback
2013-11-27 18:48 avsm Note Added: 0010672
2013-11-27 19:54 johnwhitington Note Added: 0010673
2013-11-28 14:57 doligez Note Added: 0010674
2013-11-28 15:30 xleroy Note Added: 0010675
2013-11-28 15:30 xleroy Assigned To shinwell =>
2013-11-28 15:30 xleroy Status feedback => resolved
2013-11-28 15:30 xleroy Resolution open => fixed
2013-11-28 15:30 xleroy Fixed in Version => 4.01.1+dev

