New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arm64 crash when compiling coq #6507
Comments
Comment author: Richard Jones I was able to reproduce this with OCaml 4.02 branch from today (git commit 10e4575). |
Comment author: @mshinwell Richard, in which function is the return address you quote above? (0x20002978d0) (Looking at your backtrace, I'm going to guess that it isn't in your executable -- is it in libc?) |
Comment author: Richard Jones Different file in the Coq source, same failure: $ gdb --args ./bin/coqtop.opt -boot -compile plugins/ring/LegacyArithRing Program received signal SIGSEGV, Segmentation fault. Looking at /proc/PID/maps of the process, I can see that the pointer 7fb7770000-7fb7d76000 rw-p 00000000 00:00 0 |
Comment author: Richard Jones (gdb) hexdump (0x7fb7b74000-0x100) 0x200 |
Comment author: @mshinwell Richard, could you please apply this patch to your 4.02 trunk compiler, and rebuild? Then please paste the lines starting "roots:" that arise from executing coqtop.opt on one of the two offending input files. Index: asmrun/roots.c--- asmrun/roots.c (revision 15041)
@@ -218,6 +222,7 @@
@@ -226,6 +231,7 @@
|
Comment author: @mshinwell Sorry, and also: having got those lines of output, please decode the "retaddr" addresses. A quick way to decode such an address should be: gdb /my/executable -ex "inf sym 0x12345678" -ex "quit" |
Comment author: @mshinwell Fixed in trunk and 4.02. [caml_bottom_of_stack] was 16 bytes too low only when the GC was entered via the assembly [caml_alloc] veneers. This occurred as a result of the stack pointer being recorded after the two spills in those veneers. The upshot was that the return address of the previous frame was being read from the wrong stack slot. Ugh. (Richard Jones kindly compiled Coq on a real aarch64 machine, and ran the tests, with an earlier version of this patch that I am confident is semantically identical to the one committed.) |
Comment author: @mshinwell Oh, and, I wonder if we can optimize those caml_alloc sequences by not saving x29 prior to Lcaml_call_gc. We can think about that later, after 4.02. |
Original bug ID: 6507
Reporter: Richard Jones
Assigned to: @mshinwell
Status: closed (set by @xavierleroy on 2016-12-07T10:34:26Z)
Resolution: fixed
Priority: normal
Severity: crash
Platform: arm64
Version: 4.02.0+beta1 / +rc1
Target version: 4.02.0+dev
Fixed in version: 4.02.0+dev
Category: back end (clambda to assembly)
Monitored by: @gasche @mshinwell
Bug description
Another one, I'm afraid ...
$ gdb --args bin/coqtop.opt -boot -compile theories/FSets/FSetList
GNU gdb (GDB) Fedora 7.7.90.20140711-11.fc21
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bin/coqtop.opt...done.
(gdb) run
Starting program: /home/rjones/d/fedora/coq/master/coq-8.4pl4/bin/coqtop.opt -boot -compile theories/FSets/FSetList
Program received signal SIGSEGV, Segmentation fault.
caml_oldify_local_roots () at roots.c:190
190 if (d->retaddr == retaddr) break;
(gdb) bt
#0 caml_oldify_local_roots () at roots.c:190
#1 0x00000000009610dc in caml_empty_minor_heap () at minor_gc.c:233
#2 0x0000000000961214 in caml_minor_collection () at minor_gc.c:276
#3 0x000000000095f0b0 in caml_garbage_collection () at signals_asm.c:70
#4 0x0000000000971f38 in caml_call_gc ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) print d
$1 = (frame_descr *) 0x0
(gdb) print/x retaddr
$3 = 0x20002978d0
(gdb) print caml_frame_descriptors
$4 = (frame_descr **) 0xe0a370
(gdb) print h
$5 = 77594
Note this is not a general failure in coqtop.opt. The same program
runs probably hundreds of times during the build without failing, and
if I use `make -k' then it keeps running.
It only fails -- 100% reproducibly -- on this one input file.
Any suggestions how to debug this further?
Steps to reproduce
Compile coq on arm64.
The text was updated successfully, but these errors were encountered: