Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007405OCamlOCaml backend (code generation)public2016-11-09 15:412016-11-12 01:01
ReporterRichard Jones 
Assigned To 
PrioritynormalSeveritycrashReproducibilityalways
StatusresolvedResolutionfixed 
Platforms390xOSFedoraOS Version26
Product Version4.04.0 
Target VersionFixed in Version4.05.0+dev 
Summary0007405: Illegal instruction (core dumped) when running coqtop on s390x
Descriptionbin/coqtop -boot -native-compiler -compile theories/Init/Prelude.v -noinit -R theories Coq
Makefile.build:590: recipe for target 'theories/Init/Prelude.vo' failed
make[1]: *** [theories/Init/Prelude.vo] Segmentation fault (core dumped)

I'm afraid I don't have very much detail, but I am able to get
a core file.

Here is the stack trace according to gdb:

Core was generated by `bin/coqtop -boot -native-compiler -compile theories/Init/Prelude.v -noinit -R t'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000020080995e22 in ?? ()
[Current thread is 1 (Thread 0x2003db32870 (LWP 4231))]
(gdb) bt
#0 0x0000020080995e22 in ?? ()
#1 0x000002005b1ccfdc in camlG_extraction__entry ()
    at plugins/extraction/g_extraction.ml4:22
#2 0x000003ffd81f91c0 in ?? ()
PC not saved
(gdb) frame 1
#1 0x000002005b1ccfdc in camlG_extraction__entry ()
    at plugins/extraction/g_extraction.ml4:22
22 ARGUMENT EXTEND mlname

The disassembly is below. Current program counter is marked with "=>"

Dump of assembler code for function camlG_extraction__entry:
   0x000002005b1ccf30 <+0>: lay %r15,-40(%r15)
   0x000002005b1ccf36 <+6>: stg %r14,32(%r15)
   0x000002005b1ccf3c <+12>: lgrl %r3,0x2005b214fa0
   0x000002005b1ccf42 <+18>: lgrl %r4,0x2005b2159a8
   0x000002005b1ccf48 <+24>: stg %r3,0(%r4)
   0x000002005b1ccf4e <+30>: brasl %r14,0x2005b1ccf7a <camlG_extraction__entry+74>
   0x000002005b1ccf54 <+36>: lgrl %r9,0x2005b2152d0
   0x000002005b1ccf5a <+42>: lg %r12,16(%r9)
   0x000002005b1ccf60 <+48>: cgr %r2,%r12
   0x000002005b1ccf64 <+52>: jgne 0x2005b1ccf74 <camlG_extraction__entry+68>
   0x000002005b1ccf6a <+58>: lghi %r6,1
   0x000002005b1ccf6e <+62>: jg 0x2005b1cd00a <camlG_extraction__entry+218>
   0x000002005b1ccf74 <+68>: brasl %r14,0x20080995e22
   0x000002005b1ccf7a <+74>: lay %r15,-16(%r15)
   0x000002005b1ccf80 <+80>: stg %r14,0(%r15)
   0x000002005b1ccf86 <+86>: stg %r13,8(%r15)
   0x000002005b1ccf8c <+92>: lgr %r13,%r15
   0x000002005b1ccf90 <+96>: lgrl %r5,0x2005b215140
   0x000002005b1ccf96 <+102>: lg %r2,32(%r5)
   0x000002005b1ccf9c <+108>: brasl %r14,0x2005b182198 <camlGenarg__default_empty_value_2063@plt>
   0x000002005b1ccfa2 <+114>: cgfi %r2,1
   0x000002005b1ccfa8 <+120>: jge 0x2005b1ccfba <camlG_extraction__entry+138>
   0x000002005b1ccfae <+126>: lg %r8,0(%r2)
   0x000002005b1ccfb4 <+132>: jg 0x2005b1ccfdc <camlG_extraction__entry+172>
   0x000002005b1ccfba <+138>: lgrl %r9,0x2005b215458
   0x000002005b1ccfc0 <+144>: lghi %r12,0
   0x000002005b1ccfc4 <+148>: sty %r12,0(%r9)
   0x000002005b1ccfca <+154>: lgrl %r2,0x2005b2152d0
   0x000002005b1ccfd0 <+160>: lg %r2,16(%r2)
   0x000002005b1ccfd6 <+166>: brasl %r14,0x20080995e22
=> 0x000002005b1ccfdc <+172>: lay %r11,-16(%r11)
   0x000002005b1ccfe2 <+178>: clgr %r11,%r10
   0x000002005b1ccfe6 <+182>: jgl 0x2005b1d09a2 <camlG_extraction__entry+14962>
Steps To ReproduceCompile coq from source on s390x with OCaml 4.04.0
Additional InformationFedora has some patches on top of OCaml 4.04.0, but not many and
I don't believe any of them are relevant to this bug. However you
can find the patches here in case:

https://git.fedorahosted.org/cgit/fedora-ocaml.git/log/?h=fedora-26-4.04.0 [^]
TagsNo tags attached.
Attached Files

- Relationships

-  Notes
(0016527)
Richard Jones (reporter)
2016-11-09 15:50

Apparently brasl %r14,0x20080995e22 is a branch to a subroutine. However gdb thinks that is an unmapped address:

(gdb) disassemble 0x20080995e22
No function contains specified address.
(gdb) disassemble *0x20080995e22
Cannot access memory at address 0x20080995e22

That explains why it crashes, although not why it's jumping to a random place.
(0016528)
Richard Jones (reporter)
2016-11-09 16:11

The function it is trying to call is probably caml_raise_exn, given other code in the main coqtop program like this:

    80283d2c: c0 e5 00 38 90 7b brasl %r14,80995e22 <caml_raise_exn>

g_extraction.ml4 seems like it is part of a native dynlink library (confirmed by examining the strace output, showing that it is loading theories/extraction_plugin.cmxs). My suspicion would be that relocation of symbols in that library is going wrong somehow.
(0016529)
Richard Jones (reporter)
2016-11-09 16:11

open("/home/rjones/d/fedora/coq/master/coq-8.5pl3/plugins/extraction/extraction_plugin.cmxs", O_RDONLY|O_CLOEXEC) = 7
read(7, "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\3\0\26\0\0\0\1\0\0\0\0\0\10$\330"..., 832) = 832
fstat(7, {st_mode=S_IFREG|0755, st_size=1443584, ...}) = 0
mmap(NULL, 1238464, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x3ff6ca80000
mmap(0x3ff6cb8f000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x10e000) = 0x3ff6cb8f000
close(7) = 0
mprotect(0x3ff6ca80000, 1110016, PROT_READ|PROT_WRITE) = 0
mprotect(0x3ff6ca80000, 1110016, PROT_READ|PROT_EXEC) = 0
mprotect(0x3ff6cb8f000, 4096, PROT_READ) = 0
brk(NULL) = 0x8d7ac000
brk(0x8d894000) = 0x8d894000
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0x3ff80995e22} ---
+++ killed by SIGILL (core dumped) +++
(0016530)
Richard Jones (reporter)
2016-11-09 16:35

A simple reproducer is:

ocamlfind opt -g -package dynlink -linkpkg main.ml -o test
ocamlfind opt -g -shared -o lib.cmxs lib.ml
./test

--------------- main.c ----
let () =
  Dynlink.init ();
  (try Dynlink.loadfile "lib.cmxs"
   with
   | Dynlink.Error err ->
       print_endline ("dynlink error: " ^ Dynlink.error_message err)
   | exn ->
       failwith ("unknown error loading plugin: " ^ Printexc.to_string exn)
  )

--------------- lib.c ----
let () =
  print_endline "loading the plugin ...";
  raise Not_found


Note that it's debugging (-g) which causes the problem. If that is
disabled then the test case works fine for me.
(0016531)
Richard Jones (reporter)
2016-11-09 17:01

This patch seems to fix it:

diff --git a/asmcomp/s390x/emit.mlp b/asmcomp/s390x/emit.mlp
index 5d233a3..a099bdb 100644
--- a/asmcomp/s390x/emit.mlp
+++ b/asmcomp/s390x/emit.mlp
@@ -611,7 +611,7 @@ let emit_instr i =
     | Lraise k ->
         begin match k with
         | Cmm.Raise_withtrace ->
- ` brasl %r14, {emit_symbol "caml_raise_exn"}\n`;
+ ` brasl %r14, {emit_symbol "caml_raise_exn"}@PLT\n`;
           let lbl = record_frame Reg.Set.empty true i.dbg in
           `{emit_label lbl}:\n`
         | Cmm.Raise_notrace ->
(0016532)
Richard Jones (reporter)
2016-11-09 17:45
edited on: 2016-11-09 17:46

Applied to Fedora, and this fixes the Coq build.

https://git.fedorahosted.org/cgit/fedora-ocaml.git/commit/?h=fedora-26-4.04.0&id=e732c39340e86939530a087744caa8d8f1247878 [^]

(0016533)
gasche (developer)
2016-11-09 18:14
edited on: 2016-11-09 18:14

Very nice! Would you be willing to submit a PR on github? If not I can take care of it -- reusing your commit.

(0016534)
Richard Jones (reporter)
2016-11-09 18:32

Please submit one, I find the github workflow very tedious.
(0016535)
gasche (developer)
2016-11-09 23:53

Done, thanks: https://github.com/ocaml/ocaml/pull/903 [^]

(Note: there are command-line tools to submit github pull-requests, see https://github.com/github/hub [^] )
(0016537)
gasche (developer)
2016-11-10 14:42

Xavier Leroy suggested a change to the patch, which is to use `emit_call "caml_raise_exn";` instead. I updated my upstream patch proposal, the new version is available at:

  https://patch-diff.githubusercontent.com/raw/ocaml/ocaml/pull/903.patch [^]
(0016544)
xleroy (administrator)
2016-11-11 19:47

Seems to be fixed on trunk, see GPR 903. Should it be backported to 4.04.1?

Tentatively marking this MPR resolved. Yell if you disagree.
(0016545)
Richard Jones (reporter)
2016-11-11 19:51

I do believe this should be backported to the 4.04 branch, for the benefit of Debian and other distros.
(0016548)
gasche (developer)
2016-11-11 19:55

Fine with me (also un-invasive for other arches), I'll backport to 4.04.
(0016557)
gasche (developer)
2016-11-12 01:01

4.04 commit: 7020a0b682524b4a63bdeb395db12b72389d01da

  https://github.com/ocaml/ocaml/commit/7020a0b682524b4a63bdeb395db12b72389d01da [^]

- Issue History
Date Modified Username Field Change
2016-11-09 15:41 Richard Jones New Issue
2016-11-09 15:50 Richard Jones Note Added: 0016527
2016-11-09 16:11 Richard Jones Note Added: 0016528
2016-11-09 16:11 Richard Jones Note Added: 0016529
2016-11-09 16:35 Richard Jones Note Added: 0016530
2016-11-09 17:01 Richard Jones Note Added: 0016531
2016-11-09 17:45 Richard Jones Note Added: 0016532
2016-11-09 17:46 Richard Jones Note Edited: 0016532 View Revisions
2016-11-09 18:14 gasche Note Added: 0016533
2016-11-09 18:14 gasche Note Edited: 0016533 View Revisions
2016-11-09 18:32 Richard Jones Note Added: 0016534
2016-11-09 23:45 gasche Status new => confirmed
2016-11-09 23:53 gasche Note Added: 0016535
2016-11-10 14:42 gasche Note Added: 0016537
2016-11-11 19:47 xleroy Note Added: 0016544
2016-11-11 19:47 xleroy Status confirmed => resolved
2016-11-11 19:47 xleroy Resolution open => fixed
2016-11-11 19:47 xleroy Fixed in Version => 4.05.0+dev
2016-11-11 19:51 Richard Jones Note Added: 0016545
2016-11-11 19:55 gasche Note Added: 0016548
2016-11-12 01:01 gasche Note Added: 0016557


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker