|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0005976||OCaml||OCaml runtime system||public||2013-04-05 16:05||2015-12-11 19:19|
|Target Version||4.01.0+dev||Fixed in Version||4.01.0+dev|
|Summary||0005976: Illegal recovery of 2 stack overflows with ocamlopt in Mac OS|
|Description||The following code produces the output|
Illegal instruction: 4
(only when compiled to native code)
|Steps To Reproduce||(* compile the following code with ocamlopt *)|
let rec f () = f () ; f ()
let rec loop i =
  if i <= 0 then print_string "OK\n" else
  try f ()
  with Stack_overflow -> loop (pred i)
let () = loop 2 (* works for 1 *)
|Tags||No tags attached.|
|This bug is cumbersome in Coq: whenever a computation raises a Stack_overflow, the user has no choice but to restart coqtop in order to recover properly from the next Stack_overflow failure.|
|It is also known that stack overflow recovery does not work well under Windows. What about a mode where the runtime would stop cleanly, with a proper error message, upon stack overflow, instead of trying to recover from it?|
The same Caml code works fine under Linux x86-64, so there's something specific to MacOS X to be investigated.
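To compare platforms, the reproduction case can be generalized to count how many consecutive stack overflows are actually recovered. The following is only a minimal sketch: the names `deep` and `recoveries` are made up for illustration and do not come from the report or from the OCaml runtime.

```ocaml
(* Hypothetical helper: non-tail recursion, so every call needs a new
   stack frame and the stack is eventually exhausted. *)
let rec deep n = 1 + deep (n + 1)

(* Hypothetical helper: trigger [n] stack overflows in a row and count
   how many of them surfaced as a catchable Stack_overflow exception. *)
let recoveries n =
  let count = ref 0 in
  for _ = 1 to n do
    try ignore (deep 0)
    with Stack_overflow -> incr count
  done;
  !count

let () =
  let n = 3 in
  Printf.printf "recovered from %d of %d stack overflows\n" (recoveries n) n
```

Compiled with ocamlopt on a platform where recovery works, this should report that all overflows were recovered. On an affected system, the process would presumably be killed at the second overflow, as in the original report, before anything is printed.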
@frisch: stack overflow as clean fatal error wouldn't help with the Coq use case mentioned by ppedrot. Also, even printing an error message can be challenging when your program is really out of stack space. But I welcome sample implementations, esp. for Windows.
Further investigations: I tried to reproduce the problem in pure C code, using setjmp/longjmp to simulate exceptions, and the problem does not show up. Looking further into the implementation of longjmp() on MacOS X, it appears that it goes to great lengths to call the undocumented "sigreturn" syscall when exiting from a signal handler. I have the impression that this is especially important when the signal was taken on an alternate stack.
My theory at this point is as follows: the OCaml runtime exits the handler for the stack overflow signal by raising an OCaml exception. This cuts the stack just fine, but does not call "sigreturn". As a consequence, the alternate stack for this handler may not be reset properly, and taking a second stack overflow signal on this alternate stack causes the kernel to abort the program.
This needs to be confirmed further, knowing that gdb under MacOS X is unable to step through a SIGSEGV signal handler...
A possible workaround would be to simulate the raising of the Stack_overflow exception from within the signal handler, by tweaking the saved registers from the ucontext, then returning "normally". This would be a major hack and I'm unsure it can be done in time for release 4.01.
Tentative fix in trunk, commits r13759 and r13760. The fix is to return normally from segv_handler, after changing the PC in the signal context to point to caml_stack_overflow in amd64.S, which actually raises the exception. Whether this trick is used depends on whether RETURN_AFTER_STACK_OVERFLOW is defined in asmrun/signals_osdep.h. For the time being, it is defined only for amd64/macosx.
Note: stack backtraces on Stack_overflow exceptions were not reliably recorded by the old implementation, to begin with, but this alternate implementation makes it fundamentally impossible to record them, as we don't have the stack space required to do so. This could be an additional reason to stick to the old implementation on all platforms where it works.
|2013-04-05 16:05||pboutill||New Issue|
|2013-04-05 17:01||ppedrot||Note Added: 0009039|
|2013-06-06 23:10||frisch||Note Added: 0009426|
|2013-06-06 23:10||frisch||Target Version||=> 4.01.0+dev|
|2013-06-07 13:47||xleroy||Note Added: 0009428|
|2013-06-07 13:47||xleroy||Status||new => acknowledged|
|2013-06-08 19:05||xleroy||Note Added: 0009441|
|2013-06-09 11:00||xleroy||Note Added: 0009442|
|2013-06-09 11:00||xleroy||Assigned To||=> xleroy|
|2013-06-09 11:00||xleroy||Status||acknowledged => resolved|
|2013-06-09 11:00||xleroy||Resolution||open => fixed|
|2013-06-09 11:00||xleroy||Fixed in Version||=> 4.01.0+dev|
|2015-12-11 19:19||xleroy||Status||resolved => closed|