New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ocamlopt generated code: segmentation fault instead of stack overflow #5064
Comments
Comment author: @edwintorok I have measured the performance and code size impact of implementing my suggestion #2. There is no measurable performance difference (all differences are within the standard deviation, and are <1%)
Code size for ocamlopt.opt: I think this stack check could be eliminated when -unsafe is specified, and then you'd get back the original code size. Measurements done on Debian unstable, x86_64, Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz, 4GB RAM. |
Comment author: @edwintorok Attached the patch I used to test performance/codesize. |
Comment author: @damiendoligez Using CPS is a bad idea here, you don't want to bog down the machine by filling the whole memory just because of one buggy program. Stacks are limited for a good reason. |
Comment author: @ygrek Here is another example (from caml-list, "Segmentation faults in unexpected places" by Peter Frey at Tue, 9 Aug 2011 18:16:00 -0400) : $ cat a.ml let stol len = (* create some bogus int list of length len *) let () = $ uname -rms Note the random crashing. In this case offending C function is caml_call_gc and descendants : Core was generated by `./a 261900'. |
Comment author: gerd This seems to be difficult to solve. I think a strategy could be based on two ideas:
|
Comment author: @xavierleroy I implemented a 90% complete fix (commits r12159 and r12160) whereas the stack "touch", checking that 4K words of stack are available and raising Stack_overflow otherwise, is performed in the asm glue between ocamlopt-generated code and the OCaml runtime system, namely in glue functions caml_c_call and caml_call_gc. A good thing about this solution is that correct stack backtraces are generated, since caml_bottom_of_stack and caml_last_return_address are properly set up before the stack touch. Another good thing is that no extra code is generated by ocamlopt. A weakness, compared with the original proposal, is that no stack touch is performed (yet) before calls to "noalloc" C functions, which do not go through caml_c_call. I need to run more experiments to assess the overhead of this solution, esp. when calling small functions from the math library. The fix is currently implemented for x86 (32 and 64 bits) under Linux, BSD and MacOS X, as well as PowerPC under MacOS X. Testing on production code is most welcome. The 4K words should be enough to run the GC and most of the OCaml runtime functions, with the notable exception of hashing, marshaling and unmarshaling, which are recursive. An iterative reimplementation of these functions can be considered but is nontrivial work. |
Original bug ID: 5064
Reporter: @edwintorok
Status: closed (set by @xavierleroy on 2013-08-31T10:46:33Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 3.11.2
Fixed in version: 3.13.0+dev
Category: ~DO NOT USE (was: OCaml general)
Duplicate of: #4297
Related to: #5485
Monitored by: mfp mehdi @edwintorok @ygrek @glondu "Julien Signoles" @mjambon gerd
Bug description
This is related to #4297 , however I didn't find a way to reopen that bug. Please consider fixing this, see a possible solution in the 'Additional Information' section of this bug.
I encountered a segmentation fault when running menhir (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=583300), and I reduced the problem down to the attached testcase.
Problem is that there is not enough stack space available to invoke the C function caml_equal, and ocaml's sigsegv handler only raises Stack_overflow for OCaml code. This is generally a good idea, since the C code could itself be responsible for the overflow, and there is no way to know that raising an OCaml exception (instead of segfaulting) won't result in inconsistent state for the C part.
However the equality comparison is part of the OCaml language (and any native C code invoked by standard libraries would be too), and I think it is important to have a robust protection against stack overflows. Functional programs are recursive by their nature, and getting a segfault due to a simple runaway recursion is not acceptable.
One might need to run some cleanup code in all cases (like releasing a file lock), having the Stack_overflow exception ensures that, a segfault doesn't.
Additional information
Here are some suggestions:
check that there is always enough stack space left before invoking a C function. "enough" is tricky to define here, but I think we could have something minimal, like 8k or 16k ensured at all times.
implement Continuation Passing Style transform. that way the OCaml code could run in constant system stack space, and the C part would have plenty of stack left always. Sure the continuation will eventually grow too big and allocation will fail. Recovering from a failed allocation is safer than recovering from a segfault
Implementing 1. is very simple, and doesn't require much additional code:
Just add code like this before every call to non-OCaml code:
subq $8192, %rsp
movq $0, 0(%rsp)
addq $8192, %rsp
Then if you don't have at least 8k left on the stack, you'll get a SIGSEGV at the 'movq', which will be considered as still part of the OCaml code, and you get a nice Stack_Overflow instead of a segfault.
You could emit this code only when '-unsafe' is not specified. If -unsafe is specified ocamlopt could generate code just like now.
I tested this modification, and it works!
After modification:
$ as -o po/y.o y.s
$ gcc -o a.out -L/usr/lib/ocaml po/camlstartupaa0114.o po/std_exit.o po/y.o /usr/lib/ocaml/stdlib.a /usr/lib/ocaml/libasmrun.a -lm -ldl
Fatal error: exception Stack_overflow
Before the modification to y.s I got the segfault. (I captured the .o files by using a bash script that I passed as -cc to ocamlopt).
File attachments
The text was updated successfully, but these errors were encountered: