ocamlopt.opt on 32 bit arm segfaults compiling ounit 2.0.0 #6484

Closed
vicuna opened this issue Jul 13, 2014 · 12 comments
vicuna commented Jul 13, 2014

Original bug ID: 6484
Reporter: Richard Jones
Assigned to: @mshinwell
Status: resolved (set by @xavierleroy on 2014-07-18T14:14:18Z)
Resolution: fixed
Priority: normal
Severity: crash
Version: 4.02.0+beta1 / +rc1
Target version: 4.02.0+dev
Fixed in version: 4.02.0+dev
Category: back end (clambda to assembly)
Related to: #6486 #7307
Monitored by: meurer @avsm

Bug description

ocamlopt.opt (armv7hl) segfaults when compiling one file
from ounit 2.0.0.

I am using OCaml 4.02 (commit 8c1e5cd) on 32-bit ARM.

When building ounit, the following command segfaults:

  + ocamlfind ocamlopt -c -g -I src -I src -package threads -package unix -thread -I src -o src/oUnitConf.cmx src/oUnitConf.ml
    File "src/oUnitConf.ml", line 77, characters 16-35:
    Warning 3: deprecated feature: String.set
    ocamlopt.opt got signal and exited
    Command exited with code 2.

I captured a core dump and the stack trace is:

Core was generated by `ocamlopt.opt -c -g -I src -I src -I src -o src/oUnitConf.cmx -thread src/oUnitC'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0014c208 in caml_oldify_mopup () at minor_gc.c:206
206 oldify_todo_list = Field (new_v, 1); /* Remove from list. */
Missing separate debuginfos, use: debuginfo-install glibc-2.19.90-26.fc21.armv7hl libgcc-4.9.0-14.fc21.armv7hl
(gdb) bt
#0 0x0014c208 in caml_oldify_mopup () at minor_gc.c:206
#1 0x0014c308 in caml_empty_minor_heap () at minor_gc.c:237
#2 0x0014c410 in caml_minor_collection () at minor_gc.c:276
#3 0x0014b2b0 in caml_garbage_collection () at signals_asm.c:70
#4 0x0015be1c in caml_call_gc ()
#5 0x00032fac in camlSelectgen__fun_2006 ()
#6 0x00032fac in camlSelectgen__fun_2006 ()
#7 0x00032fac in camlSelectgen__fun_2006 ()
#8 0x00032fac in camlSelectgen__fun_2006 ()
#9 0x00032fac in camlSelectgen__fun_2006 ()
#10 0x00032fac in camlSelectgen__fun_2006 ()
#11 0x00032fac in camlSelectgen__fun_2006 ()
#12 0x00032fac in camlSelectgen__fun_2006 ()
[same stack frame repeated forever]

(gdb) list
201
202 while (oldify_todo_list != 0){
203 v = oldify_todo_list; /* Get the head. */
204 Assert (Hd_val (v) == 0); /* It must be forwarded. */
205 new_v = Field (v, 0); /* Follow forward pointer. */
206 oldify_todo_list = Field (new_v, 1); /* Remove from list. */
207
208 f = Field (new_v, 0);
209 if (Is_block (f) && Is_young (f)){
210 caml_oldify_one (f, &Field (new_v, 0));
(gdb) print oldify_todo_list
$1 = -1228923798
(gdb) print ((value*)oldify_todo_list)[-1]
$5 = 0
(gdb) print ((value*)oldify_todo_list)[0]
$2 = 25600
(gdb) print ((value*)oldify_todo_list)[1]
$3 = 1640235008
(gdb) print ((value*)oldify_todo_list)[2]
$4 = 243318
(gdb) print ((value*)25600)[0]
Cannot access memory at address 0x6400

Note that the crash is not reliably reproducible. On my local
hardware it happens only rarely; on the Fedora builders it
happens often, but not always.

Steps to reproduce

Build & install ocaml 4.02 from git (https://github.com/ocaml/ocaml)

Download ounit 2.0.0 from http://ounit.forge.ocamlcore.org/

zcat ounit-2.0.0.tar.gz | tar xf -
./configure
make all

You may need to repeat `make clean; make all` several times,
since the crash is not 100% reliable.

Additional information

Works fine on x86.

vicuna commented Jul 13, 2014

Comment author: Richard Jones

Camlp4 also fails to build on 32-bit ARM in the same way.

/usr/bin/ocamlopt.opt unix.cmxa -I /usr/lib/ocaml/ocamlbuild /usr/lib/ocaml/ocamlbuild/ocamlbuildlib.cmxa myocamlbuild_config.ml myocamlbuild.ml /usr/lib/ocaml/ocamlbuild/ocamlbuild.cmx -o myocamlbuild

  + /usr/bin/ocamlopt.opt unix.cmxa -I /usr/lib/ocaml/ocamlbuild /usr/lib/ocaml/ocamlbuild/ocamlbuildlib.cmxa myocamlbuild_config.ml myocamlbuild.ml /usr/lib/ocaml/ocamlbuild/ocamlbuild.cmx -o myocamlbuild
    Command got signal -10.
    make: *** [byte] Error 11

Stack trace is the same as above:

Core was generated by `/usr/bin/ocamlopt.opt unix.cmxa -I /usr/lib/ocaml/ocamlbuild /usr/lib/ocaml/oca'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 caml_oldify_mopup () at minor_gc.c:217
217 Field (new_v, i) = f;
Missing separate debuginfos, use: debuginfo-install glibc-2.19.90-26.fc21.armv7hl libgcc-4.9.0-14.fc21.armv7hl
(gdb) bt
#0 caml_oldify_mopup () at minor_gc.c:217
#1 0x0014c308 in caml_empty_minor_heap () at minor_gc.c:237
#2 0x0014c410 in caml_minor_collection () at minor_gc.c:276
#3 0x0014b2b0 in caml_garbage_collection () at signals_asm.c:70
#4 0x0015be1c in caml_call_gc ()
#5 0x00032ff8 in camlSelectgen__fun_2001 ()
#6 0x00032ff8 in camlSelectgen__fun_2001 ()
#7 0x00032ff8 in camlSelectgen__fun_2001 ()
[etc]

vicuna commented Jul 14, 2014

Comment author: Richard Jones

The script I am now using to bisect this problem is below.

$HOME/d/ocaml contains ocaml sources (from git, branch 4.02)

$HOME/d/camlp4 contains camlp4 sources (from git)


#!/bin/bash

sources=$HOME/d/ocaml

cd $HOME/d/camlp4
rm -f *.cmx
exec \
  $sources/ocamlopt.opt -I $sources/stdlib -I $sources/otherlibs/unix \
  -I $sources/ocamlbuild \
  unix.cmxa ocamlbuildlib.cmxa \
  myocamlbuild_config.ml myocamlbuild.ml \
  ocamlbuild.cmx \
  -o /tmp/myocamlbuild

vicuna commented Jul 14, 2014

Comment author: @mshinwell

Does a stack overflow cause a segfault on that architecture?

Maybe try rerunning with an increased "ulimit -s".
If you disassemble the [caml_oldify_mopup] function, is there a spill to the stack at the point it fails?

(Although, from the values you've printed, it does look like the heap may be corrupted.)

vicuna commented Jul 14, 2014

Comment author: Richard Jones

Yup, I've tried increasing the stack limits, and it makes no difference.

Dump of assembler code for function caml_oldify_mopup:
0x0014c1e4 <+0>: push {r3, r4, r5, r6, r7, r8, r9, lr}
0x0014c1e8 <+4>: movw r8, #17536 ; 0x4480
0x0014c1ec <+8>: movt r8, #43 ; 0x2b
0x0014c1f0 <+12>: mov r9, r8
0x0014c1f4 <+16>: ldr r6, [r8, #8]
0x0014c1f8 <+20>: cmp r6, #0
0x0014c1fc <+24>: beq 0x14c29c <caml_oldify_mopup+184>
0x0014c200 <+28>: ldr r7, [r6]
0x0014c204 <+32>: add r4, r7, #4
0x0014c208 <+36>: ldm r7, {r0, r3}
0x0014c20c <+40>: tst r0, #1
0x0014c210 <+44>: str r3, [r9, #8]
0x0014c214 <+48>: bne 0x14c238 <caml_oldify_mopup+84>
0x0014c218 <+52>: ldr r3, [r9]
0x0014c21c <+56>: cmp r0, r3
0x0014c220 <+60>: bcs 0x14c238 <caml_oldify_mopup+84>
0x0014c224 <+64>: ldr r3, [r9, #4]
0x0014c228 <+68>: cmp r0, r3
0x0014c22c <+72>: bls 0x14c238 <caml_oldify_mopup+84>
0x0014c230 <+76>: mov r1, r7
0x0014c234 <+80>: bl 0x14c004 <caml_oldify_one>
0x0014c238 <+84>: ldr r3, [r7, #-4]
0x0014c23c <+88>: sub r7, r7, #4
0x0014c240 <+92>: cmp r3, #2048 ; 0x800
0x0014c244 <+96>: bcc 0x14c1f4 <caml_oldify_mopup+16>
0x0014c248 <+100>: mov r5, #1
0x0014c24c <+104>: b 0x14c288 <caml_oldify_mopup+164>
0x0014c250 <+108>: ldr r2, [r8]
0x0014c254 <+112>: cmp r3, r2
0x0014c258 <+116>: bcs 0x14c294 <caml_oldify_mopup+176>
0x0014c25c <+120>: ldr r2, [r9, #4]
0x0014c260 <+124>: mov r0, r3
0x0014c264 <+128>: mov r1, r4
0x0014c268 <+132>: cmp r3, r2
0x0014c26c <+136>: bls 0x14c294 <caml_oldify_mopup+176>
0x0014c270 <+140>: bl 0x14c004 <caml_oldify_one>
0x0014c274 <+144>: ldr r3, [r7]
0x0014c278 <+148>: add r5, r5, #1
0x0014c27c <+152>: add r4, r4, #4
0x0014c280 <+156>: cmp r5, r3, lsr #10
0x0014c284 <+160>: bcs 0x14c1f4 <caml_oldify_mopup+16>
0x0014c288 <+164>: ldr r3, [r6, #4]!
0x0014c28c <+168>: tst r3, #1
0x0014c290 <+172>: beq 0x14c250 <caml_oldify_mopup+108>
=> 0x0014c294 <+176>: str r3, [r4]
0x0014c298 <+180>: b 0x14c274 <caml_oldify_mopup+144>
0x0014c29c <+184>: pop {r3, r4, r5, r6, r7, r8, r9, pc}
End of assembler dump.

(gdb) info registers
r0 0xf8f2f14f 4176671055
r1 0xb547cbb0 3041381296
r2 0xb6ddd000 3067990016
r3 0xc9ac0000 3383492608
r4 0xcc04 52228
r5 0x1 1
r6 0xb6cf6386 3067044742
r7 0xcbfc 52220
r8 0x2b4480 2835584
r9 0x2b4480 2835584
r10 0xb6cdcff4 3066941428
r11 0x2b448c 2835596
r12 0xb4295 737941
sp 0xbeb8bb18 0xbeb8bb18
lr 0x14cb2c 1362732
pc 0x14c294 0x14c294 <caml_oldify_mopup+176>
cpsr 0x20080010 537395216

vicuna commented Jul 14, 2014

Comment author: @mshinwell

OK. Can you build the compiler with the debug runtime enabled, and see if we hit any of the assertions?

vicuna commented Jul 14, 2014

Comment author: Richard Jones

We don't hit any assertions. It simply crashes in the same place.

How do you enable debugging? In Fedora we carry a custom patch so we can supply our own $CFLAGS:
http://pkgs.fedoraproject.org/cgit/ocaml.git/tree/0005-configure-Allow-user-defined-C-compiler-flags.patch
Is there an official way to do this?

vicuna commented Jul 14, 2014

Comment author: Richard Jones

Just to be clear about the previous comment:

(1) I enabled debug runtime and there are no assertions.

(2) The question about enabling debugging applies to the other bug I'm looking at (#6486).

vicuna commented Jul 15, 2014

Comment author: @mshinwell

I'm not sure what you mean by "enabling debugging". Could you clarify?

I think it's worth trying the same watchpoint-based approach that I've just described in Mantis 6486.

vicuna commented Jul 15, 2014

Comment author: @mshinwell

(Oh, and I'm now suspecting that the repeated stack frames in Selectgen might be due to a different bug, as noted on 6486. Although doesn't aarch64 use frame pointers for unwinding? I don't remember offhand.)

vicuna commented Jul 15, 2014

Comment author: Richard Jones

I am able to get through a compile of camlp4 by disabling the new CSE optimization added in 4.02. Didn't test yet whether this also fixes the oUnit build.

The patch I'm using is this one:

#6486#c11823

vicuna commented Jul 15, 2014

Comment author: Richard Jones

I finally got through a git bisect. Unfortunately, because I had to skip many commits that were broken on ARM, the result is not very conclusive, but here it is:

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
0cba565
9639370
2633ff7
452390e
979fe8b
9c1d005
29b3443
95d98cd
558f40e ## NB: CSE
We cannot bisect more!

vicuna commented Jul 18, 2014

Comment author: @xavierleroy

Problem in CSE diagnosed and tentative fix pushed to 4.02 branch (commit 15012) and trunk (15013). With the fix, ocamlopt.opt reliably compiles Camlp4 on ARMv7-hardfloat.
