Mantis Bug Tracker

ID: 0005976
Project: OCaml
Category: OCaml runtime system
View Status: public
Date Submitted: 2013-04-05 16:05
Last Update: 2013-06-09 11:00
Reporter: pboutill
Assigned To: xleroy
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: x86_64
OS: MacOS
OS Version: 10.5-10.8
Product Version: 4.00.1
Target Version: 4.01.0+dev
Fixed in Version: 4.01.0+dev
Summary: 0005976: Illegal recovery of 2 stack overflows with ocamlopt in Mac OS
Description: The following code produces the output

Illegal instruction: 4

(only when compiled to native code)

Steps To Reproduce:
(* compile the following code with ocamlopt *)
let rec f () = f () ; f ()

let rec loop i =
  if i <= 0 then print_string "OK\n" else
    try
      f ()
    with Stack_overflow -> loop (pred i)

let () = loop 2 (* works for 1 *)
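
To check how many consecutive overflows a given build recovers from, the reproduction above can be generalized. The following is only a minimal sketch: `deep` and `count_recoveries` are hypothetical names that do not appear in the report, and the helper takes the overflowing computation as an argument so that its counting logic can be exercised without actually exhausting the stack.

```ocaml
(* Hypothetical helper names; not part of the original report. *)

(* Non-tail-recursive function: each call keeps a frame alive, so the
   recursion is expected to end by exhausting the stack. *)
let rec deep n = 1 + deep (n + 1)

(* Run [overflow] [attempts] times and count how many times a
   Stack_overflow exception was caught. A healthy runtime should
   recover every time when [overflow] really exhausts the stack. *)
let count_recoveries ~attempts overflow =
  let recovered = ref 0 in
  for _i = 1 to attempts do
    try ignore (overflow ())
    with Stack_overflow -> incr recovered
  done;
  !recovered
```

With this sketch, `count_recoveries ~attempts:2 (fun () -> deep 0)` corresponds to `loop 2` above: on an affected native build one would expect the process to die during the second attempt instead of returning 2.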
Tags: No tags attached.

- Notes
(0009039)
ppedrot (reporter)
2013-04-05 17:01

This bug is a real nuisance in Coq: once a computation has raised Stack_overflow, the next Stack_overflow cannot be recovered from properly, so the user has no choice but to restart coqtop.
(0009426)
frisch (developer)
2013-06-06 23:10

It is also known that stack overflow recovery does not work well under Windows. What about a mode where the runtime would stop cleanly, with a proper error message, upon stack overflow, instead of trying to recover from it?
(0009428)
xleroy (administrator)
2013-06-07 13:47

The same Caml code works fine under Linux x86-64, so there's something specific to MacOS X to be investigated.

@frisch: stack overflow as clean fatal error wouldn't help with the Coq use case mentioned by ppedrot. Also, even printing an error message can be challenging when your program is really out of stack space. But I welcome sample implementations, esp. for Windows.
(0009441)
xleroy (administrator)
2013-06-08 19:05

Further investigations: I tried to reproduce the problem in pure C code, using setjmp/longjmp to simulate exceptions, and the problem does not show up. Looking further into the implementation of longjmp() on MacOS X, it appears that it goes to great lengths to call the undocumented "sigreturn" syscall when exiting from a signal handler. I have the impression that this is especially important when the signal was taken on an alternate stack.

My theory at this point is as follows: the OCaml runtime exits the handler for the stack overflow signal by raising an OCaml exception. This cuts the stack just fine, but does not call "sigreturn". As a consequence, the alternate stack for this handler may not be reset properly, and taking a second stack overflow signal on this alternate stack causes the kernel to abort the program.

This needs to be confirmed further, knowing that gdb under MacOS X is unable to step through a SIGSEGV signal handler...

A possible workaround would be to simulate the raising of the Stack_overflow exception from within the signal handler, by tweaking the saved registers from the ucontext, then returning "normally". This would be a major hack and I'm unsure it can be done in time for release 4.01.
(0009442)
xleroy (administrator)
2013-06-09 11:00

Tentative fix in trunk, commits r13759 and r13760. The fix is to return normally from segv_handler, after changing the PC in the signal context to point to caml_stack_overflow in amd64.S, which actually raises the exception. Whether this trick is used depends on whether RETURN_AFTER_STACK_OVERFLOW is defined in asmrun/signals_osdep.h. For the time being, it is defined only for amd64/macosx.

Note: stack backtraces on Stack_overflow exceptions were not reliably recorded by the old implementation, to begin with, but this alternate implementation makes it fundamentally impossible to record them, as we don't have the stack space required to do so. This could be an additional reason to stick to the old implementation on all platforms where it works.

- Issue History
Date Modified Username Field Change
2013-04-05 16:05 pboutill New Issue
2013-04-05 17:01 ppedrot Note Added: 0009039
2013-06-06 23:10 frisch Note Added: 0009426
2013-06-06 23:10 frisch Target Version => 4.01.0+dev
2013-06-07 13:47 xleroy Note Added: 0009428
2013-06-07 13:47 xleroy Status new => acknowledged
2013-06-08 19:05 xleroy Note Added: 0009441
2013-06-09 11:00 xleroy Note Added: 0009442
2013-06-09 11:00 xleroy Assigned To => xleroy
2013-06-09 11:00 xleroy Status acknowledged => resolved
2013-06-09 11:00 xleroy Resolution open => fixed
2013-06-09 11:00 xleroy Fixed in Version => 4.01.0+dev

