Mantis Bug Tracker

View Issue Details
ID: 0006484
Project: OCaml
Category: OCaml backend (code generation)
View Status: public
Date Submitted: 2014-07-13 22:00
Last Update: 2014-08-11 09:15
Reporter: Richard Jones
Assigned To: shinwell
Priority: normal
Severity: crash
Reproducibility: random
Status: resolved
Resolution: fixed
Platform, OS, OS Version: (not given)
Product Version: 4.02.0+beta1 / +rc1
Target Version: 4.02.0+dev
Fixed in Version: 4.02.0+dev
Summary: 0006484: ocamlopt.opt on 32 bit arm segfaults compiling ounit 2.0.0
Description: ocamlopt.opt (armv7hl) segfaults when compiling one file
from ounit 2.0.0.

I am using ocaml 4.02 (commit 8c1e5cdf915), on 32 bit ARM.

When building ounit, the following command segfaults:

+ ocamlfind ocamlopt -c -g -I src -I src -package threads -package unix -thread -I src -o src/oUnitConf.cmx src/oUnitConf.ml
File "src/oUnitConf.ml", line 77, characters 16-35:
Warning 3: deprecated feature: String.set
ocamlopt.opt got signal and exited
Command exited with code 2.

I captured a core dump and the stack trace is:

Core was generated by `ocamlopt.opt -c -g -I src -I src -I src -o src/oUnitConf.cmx -thread src/oUnitC'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0014c208 in caml_oldify_mopup () at minor_gc.c:206
206 oldify_todo_list = Field (new_v, 1); /* Remove from list. */
Missing separate debuginfos, use: debuginfo-install glibc-2.19.90-26.fc21.armv7hl libgcc-4.9.0-14.fc21.armv7hl
(gdb) bt
#0 0x0014c208 in caml_oldify_mopup () at minor_gc.c:206
#1 0x0014c308 in caml_empty_minor_heap () at minor_gc.c:237
#2 0x0014c410 in caml_minor_collection () at minor_gc.c:276
#3 0x0014b2b0 in caml_garbage_collection () at signals_asm.c:70
#4 0x0015be1c in caml_call_gc ()
#5 0x00032fac in camlSelectgen__fun_2006 ()
#6 0x00032fac in camlSelectgen__fun_2006 ()
#7 0x00032fac in camlSelectgen__fun_2006 ()
#8 0x00032fac in camlSelectgen__fun_2006 ()
#9 0x00032fac in camlSelectgen__fun_2006 ()
#10 0x00032fac in camlSelectgen__fun_2006 ()
#11 0x00032fac in camlSelectgen__fun_2006 ()
#12 0x00032fac in camlSelectgen__fun_2006 ()
[same stack frame repeated forever]

(gdb) list
201
202 while (oldify_todo_list != 0){
203 v = oldify_todo_list; /* Get the head. */
204 Assert (Hd_val (v) == 0); /* It must be forwarded. */
205 new_v = Field (v, 0); /* Follow forward pointer. */
206 oldify_todo_list = Field (new_v, 1); /* Remove from list. */
207
208 f = Field (new_v, 0);
209 if (Is_block (f) && Is_young (f)){
210 caml_oldify_one (f, &Field (new_v, 0));
(gdb) print oldify_todo_list
$1 = -1228923798
(gdb) print ((value*)oldify_todo_list)[-1]
$5 = 0
(gdb) print ((value*)oldify_todo_list)[0]
$2 = 25600
(gdb) print ((value*)oldify_todo_list)[1]
$3 = 1640235008
(gdb) print ((value*)oldify_todo_list)[2]
$4 = 243318
(gdb) print ((value*)25600)[0]
Cannot access memory at address 0x6400
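
gdb prints these `value`s as signed decimals, which hides what they look like as addresses. A minimal sketch for converting them to 32-bit hex and checking word alignment follows; the helper names `to_hex32` and `is_word_aligned` are made up for illustration and are not part of gdb or the OCaml runtime.

```bash
#!/bin/bash
# Hypothetical helpers for reading the gdb output above.

# Print a (possibly negative) decimal as an unsigned 32-bit hex address.
to_hex32() {
  printf '0x%08x\n' $(( $1 & 0xFFFFFFFF ))
}

# Succeed if the value is a multiple of 4, as a pointer to a
# word-aligned heap block on a 32-bit target should be.
is_word_aligned() {
  (( ($1 & 3) == 0 ))
}

to_hex32 -1228923798     # oldify_todo_list
to_hex32 25600           # Field (v, 0), the forward pointer
if is_word_aligned -1228923798; then
  echo "aligned"
else
  echo "not word-aligned"
fi
```

The first call prints 0xb6c01c6a and the second prints 0x00006400, which matches the address in gdb's "Cannot access memory" message. The list head itself is not a multiple of 4, which may be a further hint that the value is not a genuine heap pointer. Inside gdb, `print/x` gives the same hex form directly.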

Note that the crash is not 100% reliable by any means. On my
local hardware it happens, but rarely. On the Fedora builders
it happens quite often but not always.
Steps To Reproduce: Build & install ocaml 4.02 from git (https://github.com/ocaml/ocaml)

Download ounit 2.0.0 from http://ounit.forge.ocamlcore.org/

zcat ounit-2.0.0.tar.gz | tar xf -
./configure
make all

You may need to repeat `make clean ; make all'
several times since the crash isn't 100% reliable.
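
Since the crash is intermittent, the repeat-until-it-fails step can be automated. A rough sketch, assuming it is run from the unpacked ounit-2.0.0 directory after ./configure; the function name `repeat_until_failure` and the figure of 20 attempts are arbitrary choices, not anything provided by the ounit or OCaml build systems.

```bash
#!/bin/bash
# Hypothetical helper: rerun a command until it fails, at most max_attempts
# times. Returns the failing exit status, or 0 if every attempt succeeded.
repeat_until_failure() {
  local max_attempts=$1; shift
  local attempt status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status=0
    "$@" || status=$?
    if (( status != 0 )); then
      echo "attempt $attempt failed with status $status" >&2
      return "$status"
    fi
  done
  echo "no failure in $max_attempts attempts" >&2
  return 0
}

# Example (not executed here):
#   ulimit -c unlimited   # allow a core dump to be written on a crash
#   repeat_until_failure 20 sh -c 'make clean && make all'
```

A run that reports no failure only says the crash did not show up in that many attempts, not that the compiler is fine.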
Additional Information: Works fine on x86.
Tags: No tags attached.
Attached Files

- Relationships
related to 0006486 (resolved, shinwell): ocamlopt.opt on aarch64 runs out of memory compiling camlp4

-  Notes
(0011805)
Richard Jones (reporter)
2014-07-13 23:08

camlp4 also fails to build on arm 32 bit in the same way.

/usr/bin/ocamlopt.opt unix.cmxa -I /usr/lib/ocaml/ocamlbuild /usr/lib/ocaml/ocamlbuild/ocamlbuildlib.cmxa myocamlbuild_config.ml myocamlbuild.ml /usr/lib/ocaml/ocamlbuild/ocamlbuild.cmx -o myocamlbuild
+ /usr/bin/ocamlopt.opt unix.cmxa -I /usr/lib/ocaml/ocamlbuild /usr/lib/ocaml/ocamlbuild/ocamlbuildlib.cmxa myocamlbuild_config.ml myocamlbuild.ml /usr/lib/ocaml/ocamlbuild/ocamlbuild.cmx -o myocamlbuild
Command got signal -10.
make: *** [byte] Error 11

Stack trace is the same as above:

Core was generated by `/usr/bin/ocamlopt.opt unix.cmxa -I /usr/lib/ocaml/ocamlbuild /usr/lib/ocaml/oca'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 caml_oldify_mopup () at minor_gc.c:217
217 Field (new_v, i) = f;
Missing separate debuginfos, use: debuginfo-install glibc-2.19.90-26.fc21.armv7hl libgcc-4.9.0-14.fc21.armv7hl
(gdb) bt
#0 caml_oldify_mopup () at minor_gc.c:217
#1 0x0014c308 in caml_empty_minor_heap () at minor_gc.c:237
#2 0x0014c410 in caml_minor_collection () at minor_gc.c:276
#3 0x0014b2b0 in caml_garbage_collection () at signals_asm.c:70
#4 0x0015be1c in caml_call_gc ()
#5 0x00032ff8 in camlSelectgen__fun_2001 ()
#6 0x00032ff8 in camlSelectgen__fun_2001 ()
#7 0x00032ff8 in camlSelectgen__fun_2001 ()
[etc]
(0011807)
Richard Jones (reporter)
2014-07-14 15:13

The script I am now using to bisect this problem is below.

$HOME/d/ocaml contains ocaml sources (from git, branch 4.02)

$HOME/d/camlp4 contains camlp4 sources (from git)

-------------------
#!/bin/bash

sources=$HOME/d/ocaml

cd $HOME/d/camlp4
rm *.cmx
exec \
$sources/ocamlopt.opt -I $sources/stdlib -I $sources/otherlibs/unix \
    -I $sources/ocamlbuild \
    unix.cmxa ocamlbuildlib.cmxa \
    myocamlbuild_config.ml myocamlbuild.ml \
    ocamlbuild.cmx \
    -o /tmp/myocamlbuild
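
A script like this can also be driven by `git bisect run`, which treats exit status 0 as good, 125 as "skip this commit", any other status from 1 to 127 as bad, and aborts the bisection on anything else. A compiler killed by a signal shows up in the shell as 128 plus the signal number (139 for SIGSEGV), so that status needs remapping. A sketch under those assumptions; `bisect_status` is a hypothetical helper name, and treating an unbuildable commit or an ordinary compile error as "skip" is one possible policy, not necessarily the one used for this bisection.

```bash
#!/bin/bash
# Hypothetical helper: map the build result and the test command's exit
# status onto the statuses that `git bisect run` understands.
#   $1: exit status of building the compiler at this commit
#   $2: exit status of the test compile
bisect_status() {
  local build_status=$1 test_status=$2
  if (( build_status != 0 )); then
    echo 125            # compiler does not build here: skip
  elif (( test_status == 0 )); then
    echo 0              # test compile succeeded: good
  elif (( test_status >= 128 )); then
    echo 1              # killed by a signal (139 = SIGSEGV): bad
  else
    echo 125            # ordinary error, not the crash: skip
  fi
}

# Example wiring (not executed here; ./build.sh and ./test.sh are placeholders):
#   ./build.sh; b=$?
#   ./test.sh;  t=$?
#   exit "$(bisect_status "$b" "$t")"
```

Because the crash is intermittent, a single clean run does not prove a commit good; repeating the test compile several times before reporting 0 would reduce false "good" results.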
(0011809)
shinwell (developer)
2014-07-14 15:49
edited on: 2014-07-14 15:53

Does a stack overflow cause a segfault on that architecture?

Maybe try rerunning with an increased "ulimit -s".
If you disassemble the [caml_oldify_mopup] function, is there a spill to the stack at the point it fails?

(Although, from the values you've printed, it does look like the heap may be corrupted.)
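
Both checks can be scripted. A minimal sketch, assuming bash and gdb are available on the ARM machine; the function names `with_stack_limit` and `disassemble_crash` and the example paths are placeholders, not part of any OCaml tooling.

```bash
#!/bin/bash
# Run a command with a different soft stack limit (in kB, or "unlimited")
# without changing the limit of the calling shell.
with_stack_limit() {
  local limit=$1; shift
  ( ulimit -S -s "$limit" && "$@" )
}

# Dump the faulting function and the registers from a core file in one shot.
#   $1: path to the binary, $2: path to the core file
disassemble_crash() {
  gdb -batch \
      -ex 'disassemble caml_oldify_mopup' \
      -ex 'info registers' \
      "$1" "$2"
}

# Example (not executed here; paths are placeholders):
#   with_stack_limit unlimited make all
#   disassemble_crash /usr/bin/ocamlopt.opt core
```

The subshell keeps the raised limit local to the one command, so the rest of the session is unaffected. In the disassembly, gdb marks the faulting instruction with `=>`, which is where to look for a store through sp.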

(0011810)
Richard Jones (reporter)
2014-07-14 16:15

Yup, I've tried increasing the stack limits, and it makes no difference.

Dump of assembler code for function caml_oldify_mopup:
   0x0014c1e4 <+0>: push {r3, r4, r5, r6, r7, r8, r9, lr}
   0x0014c1e8 <+4>: movw r8, #17536 ; 0x4480
   0x0014c1ec <+8>: movt r8, #43 ; 0x2b
   0x0014c1f0 <+12>: mov r9, r8
   0x0014c1f4 <+16>: ldr r6, [r8, #8]
   0x0014c1f8 <+20>: cmp r6, #0
   0x0014c1fc <+24>: beq 0x14c29c <caml_oldify_mopup+184>
   0x0014c200 <+28>: ldr r7, [r6]
   0x0014c204 <+32>: add r4, r7, #4
   0x0014c208 <+36>: ldm r7, {r0, r3}
   0x0014c20c <+40>: tst r0, #1
   0x0014c210 <+44>: str r3, [r9, #8]
   0x0014c214 <+48>: bne 0x14c238 <caml_oldify_mopup+84>
   0x0014c218 <+52>: ldr r3, [r9]
   0x0014c21c <+56>: cmp r0, r3
   0x0014c220 <+60>: bcs 0x14c238 <caml_oldify_mopup+84>
   0x0014c224 <+64>: ldr r3, [r9, #4]
   0x0014c228 <+68>: cmp r0, r3
   0x0014c22c <+72>: bls 0x14c238 <caml_oldify_mopup+84>
   0x0014c230 <+76>: mov r1, r7
   0x0014c234 <+80>: bl 0x14c004 <caml_oldify_one>
   0x0014c238 <+84>: ldr r3, [r7, #-4]
   0x0014c23c <+88>: sub r7, r7, #4
   0x0014c240 <+92>: cmp r3, #2048 ; 0x800
   0x0014c244 <+96>: bcc 0x14c1f4 <caml_oldify_mopup+16>
   0x0014c248 <+100>: mov r5, #1
   0x0014c24c <+104>: b 0x14c288 <caml_oldify_mopup+164>
   0x0014c250 <+108>: ldr r2, [r8]
   0x0014c254 <+112>: cmp r3, r2
   0x0014c258 <+116>: bcs 0x14c294 <caml_oldify_mopup+176>
   0x0014c25c <+120>: ldr r2, [r9, #4]
   0x0014c260 <+124>: mov r0, r3
   0x0014c264 <+128>: mov r1, r4
   0x0014c268 <+132>: cmp r3, r2
   0x0014c26c <+136>: bls 0x14c294 <caml_oldify_mopup+176>
   0x0014c270 <+140>: bl 0x14c004 <caml_oldify_one>
   0x0014c274 <+144>: ldr r3, [r7]
   0x0014c278 <+148>: add r5, r5, #1
   0x0014c27c <+152>: add r4, r4, #4
   0x0014c280 <+156>: cmp r5, r3, lsr #10
   0x0014c284 <+160>: bcs 0x14c1f4 <caml_oldify_mopup+16>
   0x0014c288 <+164>: ldr r3, [r6, #4]!
   0x0014c28c <+168>: tst r3, #1
   0x0014c290 <+172>: beq 0x14c250 <caml_oldify_mopup+108>
=> 0x0014c294 <+176>: str r3, [r4]
   0x0014c298 <+180>: b 0x14c274 <caml_oldify_mopup+144>
   0x0014c29c <+184>: pop {r3, r4, r5, r6, r7, r8, r9, pc}
End of assembler dump.

(gdb) info registers
r0 0xf8f2f14f 4176671055
r1 0xb547cbb0 3041381296
r2 0xb6ddd000 3067990016
r3 0xc9ac0000 3383492608
r4 0xcc04 52228
r5 0x1 1
r6 0xb6cf6386 3067044742
r7 0xcbfc 52220
r8 0x2b4480 2835584
r9 0x2b4480 2835584
r10 0xb6cdcff4 3066941428
r11 0x2b448c 2835596
r12 0xb4295 737941
sp 0xbeb8bb18 0xbeb8bb18
lr 0x14cb2c 1362732
pc 0x14c294 0x14c294 <caml_oldify_mopup+176>
cpsr 0x20080010 537395216
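
Reading the register dump against the disassembly: r4 is set at +32 to r7 + 4 and advanced by one word per loop iteration, r5 counts iterations starting from 1, and r7 is moved back by one word at +88. Under that reading, which is an interpretation of the listing rather than something confirmed in this thread, r4 holds the address of `Field (new_v, i)`, r5 holds `i`, and both r4 and r7 should lead back to the same `new_v`. A small arithmetic check:

```bash
#!/bin/bash
# Values copied from the `info registers` output above.
r4=0xcc04   # store address at the faulting `str r3, [r4]`
r5=1        # loop index i
r7=0xcbfc   # new_v minus one word, after the `sub` at +88

# new_v recovered two ways (words are 4 bytes on 32-bit ARM).
new_v_from_r4=$(( r4 - 4 * r5 ))
new_v_from_r7=$(( r7 + 4 ))

printf 'new_v from r4: 0x%x\n' "$new_v_from_r4"
printf 'new_v from r7: 0x%x\n' "$new_v_from_r7"
```

Both come out as 0xcc00, a small address of the same kind as the 0x6400 forward pointer seen in the first crash, which would fit the suspicion that the heap is corrupted.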
(0011812)
shinwell (developer)
2014-07-14 17:15

OK. Can you build the compiler with the debug runtime enabled, and see if we hit any of the assertions?
(0011813)
Richard Jones (reporter)
2014-07-14 18:24

We don't hit any assertions. It simply crashes in the same place.

How do you enable debugging? In Fedora we carry a custom patch so we can supply our own $CFLAGS:
http://pkgs.fedoraproject.org/cgit/ocaml.git/tree/0005-configure-Allow-user-defined-C-compiler-flags.patch
Is there an official way to do this?
(0011814)
Richard Jones (reporter)
2014-07-14 19:50

Just to be clear about the previous comment:

(1) I enabled debug runtime and there are no assertions.

(2) The question about enabling debugging applies to the other bug I'm looking at (0006486).
(0011818)
shinwell (developer)
2014-07-15 10:42

I'm not sure what you mean by "enabling debugging". Could you clarify?

I think it's worth trying the same approach using a watchpoint as I've just written in mantis 6486.
(0011819)
shinwell (developer)
2014-07-15 10:43
edited on: 2014-07-15 10:43

(Oh, and I'm now suspecting that the repeated stack frames in Selectgen might be due to a different bug, as noted on 6486. Although doesn't aarch64 use frame pointers for unwinding? I don't remember offhand.)

(0011825)
Richard Jones (reporter)
2014-07-15 15:32

I am able to get through a compile of camlp4 by disabling the new CSE optimization added in 4.02. Didn't test yet whether this also fixes the oUnit build.

The patch I'm using is this one:

http://caml.inria.fr/mantis/view.php?id=6486#c11823
(0011826)
Richard Jones (reporter)
2014-07-15 16:48

I finally got through a git bisect. Unfortunately because I had to skip a lot of ARM breakage it's not very conclusive, but here it is:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
0cba565437e617cc5826cad64f4a1212e00fc1ae
9639370d40a4dd5d880b4aba4dd570c8c8b7b343
2633ff77ced16e223678b46d07067e541b233687
452390e0eadaafe92ff9d2c9d008035dfdb878f9
979fe8b8adb7a8d8c824e277b9cb7ca1c1cc9a77
9c1d005ebb21b9eff2804ac4d80450251ffe6b5a
29b34438e08e26ae8f8623eb32bb524386f0532f
95d98cd9782c0577b0c7290f6535b29e7bd4cd41
558f40e3446854913d5ce011441c4b10da03f27e ## NB: CSE
We cannot bisect more!
(0011884)
xleroy (administrator)
2014-07-18 16:14

Problem in CSE diagnosed and tentative fix pushed to 4.02 branch (commit 15012) and trunk (15013). With the fix, ocamlopt.opt reliably compiles Camlp4 on ARMv7-hardfloat.

- Issue History
Date Modified Username Field Change
2014-07-13 22:00 Richard Jones New Issue
2014-07-13 23:08 Richard Jones Note Added: 0011805
2014-07-14 15:13 Richard Jones Note Added: 0011807
2014-07-14 15:49 shinwell Note Added: 0011809
2014-07-14 15:53 shinwell Note Edited: 0011809
2014-07-14 16:15 Richard Jones Note Added: 0011810
2014-07-14 17:15 shinwell Note Added: 0011812
2014-07-14 17:15 shinwell Status new => acknowledged
2014-07-14 18:24 Richard Jones Note Added: 0011813
2014-07-14 19:50 Richard Jones Note Added: 0011814
2014-07-15 10:42 shinwell Note Added: 0011818
2014-07-15 10:43 shinwell Note Added: 0011819
2014-07-15 10:43 shinwell Note Edited: 0011819
2014-07-15 10:44 shinwell Assigned To => shinwell
2014-07-15 10:44 shinwell Status acknowledged => assigned
2014-07-15 15:32 Richard Jones Note Added: 0011825
2014-07-15 16:48 Richard Jones Note Added: 0011826
2014-07-16 16:45 doligez Target Version => 4.02.0+dev
2014-07-18 16:04 xleroy Relationship added related to 0006486
2014-07-18 16:14 xleroy Note Added: 0011884
2014-07-18 16:14 xleroy Resolution open => fixed
2014-07-18 16:14 xleroy Fixed in Version => 4.02.0+dev
2014-07-18 16:14 xleroy Status assigned => resolved

