|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0004746||OCaml||OCaml general||public||2009-03-16 10:32||2012-01-03 05:40|
|Target Version||Fixed in Version||3.12.1+dev|
|Summary||0004746: incorrect calculation for stack overflow detection|
|Description||The calculation used on SIGSEGV to decide whether the faulting address is likely to lie in the stack fails on x86-64 machines with address space randomization enabled (which appears to be the default with at least some recent Red Hat kernels). The upshot is that a stack overflow produces a segfault rather than a Caml exception.|
With this randomization enabled and the ulimit -s value set to X, the stack is allowed to grow not to a maximum of X but to X + EXEC_STACK_BIAS, in order to leave room for the randomization. The bias constant defaults to 2 MB in the kernel I have seen, and the effect can easily be observed by growing the stack to the ulimit and looking at the memory map with pmap. The stack limit check in the kernel (mm/mmap.c) looks like this:
static int over_stack_limit(unsigned long sz)
if (sz < EXEC_STACK_BIAS)
return (sz - EXEC_STACK_BIAS) >
The upshot is that even with the ulimit at X, a stack-eating program won't fault at approximately (top of stack - X) but rather at (top of stack - X - EXEC_STACK_BIAS) or thereabouts. (The EXEC_STACK_BIAS offset on %rsp is extremely obscure until you realize what is happening!) This means that the check in the Caml default segfault handler doesn't catch it.
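To make the arithmetic concrete, here is a small C sketch of the mismatch described above. The 8 MB limit and the stack top address used in the discussion are illustrative values, and the function names are hypothetical; only the 2 MB bias comes from the report.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative values: an 8 MB "ulimit -s" (assumption) and the
   2 MB bias mentioned in the report. */
#define STACK_RLIMIT    ((uint64_t)8 << 20)
#define EXEC_STACK_BIAS ((uint64_t)2 << 20)

/* A check that trusts the reported limit: the faulting address
   must lie in [top - limit, top). */
bool within_reported_stack(uint64_t fault, uint64_t top, uint64_t limit)
{
    return fault < top && fault >= top - limit;
}

/* Where a stack-eating program faults according to the report:
   roughly top - X - EXEC_STACK_BIAS. */
uint64_t expected_fault_address(uint64_t top, uint64_t limit, uint64_t bias)
{
    return top - limit - bias;
}
```

With these numbers the fault lands 10 MB below the top of the stack, which is outside the 8 MB window that a check based on the reported limit accepts, so the fault is not recognized as a stack overflow.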
I haven't yet thought much about a solution to this problem but I'll add to this issue if I think of one.
|Tags||No tags attached.|
|Attached Files||stack_overflow_detection_around_rsp.diff (3,003 bytes) 2012-01-03 05:16|
I've never been a big fan of stack address randomization :-) I don't know what would be a good way to handle this issue, but if you can think of one, please let us know.
|Two years later, do we have new data about this issue? My understanding of Mark's analysis is that Linux kernels with stack address randomization report the wrong stack size when we ask getrlimit(RLIMIT_STACK), namely the stack size without the EXEC_STACK_BIAS slack, while the kernel's stack overflow handler uses another stack size, which includes EXEC_STACK_BIAS. Is it still the case today? Is my analysis wrong? Because if that's the root of the problem, it's a kernel issue and I don't see anything we could do about it in OCaml.|
You might have better luck comparing the CONTEXT_FAULTING_ADDRESS with the info provided in the sigcontext.rsp instead of trying to figure where the top of the stack is. I'd be happy to try cooking up a patch if you are interested.
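A rough sketch of what such an rsp-relative test could look like, written as a pure function so it can be tried outside a signal handler. The function name and both slack constants are hypothetical; how the faulting address and the saved rsp are extracted from the signal context is platform-specific and not shown.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical tolerances: how far from the saved stack pointer a
   faulting address may be and still count as a stack access. */
#define SLACK_BELOW_SP ((uint64_t)64 * 1024)
#define SLACK_ABOVE_SP ((uint64_t)4 * 1024)

/* Returns true when fault_addr lies within the tolerated window
   around sp. Both bounds are inclusive and are clamped so the
   computation cannot wrap around. */
bool fault_is_near_sp(uint64_t fault_addr, uint64_t sp)
{
    uint64_t lo = sp > SLACK_BELOW_SP ? sp - SLACK_BELOW_SP : 0;
    uint64_t hi = sp < UINT64_MAX - SLACK_ABOVE_SP
                      ? sp + SLACK_ABOVE_SP
                      : UINT64_MAX;
    return fault_addr >= lo && fault_addr <= hi;
}
```

The appeal of this approach is that it needs neither the top of the stack nor the value reported by getrlimit(), so EXEC_STACK_BIAS drops out of the picture. The cost is that the window sizes become a tuning question.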
Note that you will still have issues when the overflow is triggered in other parts of the runtime (e.g. a malloc in the GC). Since those overflows might leave internal data structures in incoherent states, there's probably no recovering from them. AFAIK the only foolproof way to recover from stack overflows is to detect them before they happen. This is a pretty intrusive change but could be used as a step towards segmented stacks à la Go (http://golang.org/doc/go_faq.html#goroutines).
All right, so I see no one discussed the issue with kernel developers.
> You might have better luck comparing the CONTEXT_FAULTING_ADDRESS with the info provided in the sigcontext.rsp instead of trying to figure where the top of the stack is. I'd be happy to try cooking up a patch if you are interested.
That would be worth a try, even though I'm not sure how much slack around rsp should be tolerated. Feel free to experiment and let me know of the results.
> Note that you will still have issues when the overflow is triggered in other parts of the runtime (e.g. a malloc in the GC). Since those overflows might leave internal data structures in incoherent states, there's probably no recovering from them.
No: segv_handler is careful not to turn the SIGSEGV into an exception unless the PC is inside ocamlopt-generated code.
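The guard described here can be sketched as follows. This is not the runtime's actual code: the struct, enum, and function names are hypothetical, and the address range of the compiled code is assumed to be known to the caller.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical description of the address range that holds
   ocamlopt-generated code; end is exclusive. */
struct code_range {
    uint64_t start;
    uint64_t end;
};

enum fault_action { RAISE_STACK_OVERFLOW, DEFAULT_SEGV };

bool pc_in_generated_code(uint64_t pc, const struct code_range *r)
{
    return pc >= r->start && pc < r->end;
}

/* Turn the fault into an exception only when the address looks
   like a stack access AND the faulting instruction belongs to
   generated code. Anything else (for instance a fault inside C
   runtime code) keeps the default SIGSEGV behaviour. */
enum fault_action classify_fault(bool addr_looks_like_stack,
                                 uint64_t pc,
                                 const struct code_range *r)
{
    if (addr_looks_like_stack && pc_in_generated_code(pc, r))
        return RAISE_STACK_OVERFLOW;
    return DEFAULT_SEGV;
}
```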
> AFAIK the only foolproof way to recover from stack overflows is to detect them before they happen.
That still needs to know exactly where the bottom of the system stack is, which is our current problem.
As a quick fix, under Linux only, I added 2 MB to the stack space reported by getrlimit(). It works again (tested on AMD64), but we'll have to find something better at some point. Done in the 3.12 branch (commit 11897) and in SVN trunk (commit 11898).
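For readers who want to see the shape of such a workaround, here is a minimal sketch. It is not the committed change: the function and macro names are hypothetical, and the 2 MB constant is simply the value quoted in this thread, which other kernels may not share.

```c
#include <stdint.h>
#include <stdbool.h>
#include <sys/resource.h>

/* The 2 MB slack quoted in this thread (assumed, not queried
   from the kernel). */
#define ASSUMED_STACK_BIAS ((uint64_t)2 << 20)

/* Pure helper: widen the reported limit on Linux only. */
uint64_t adjust_reported_limit(uint64_t reported, bool is_linux)
{
    return is_linux ? reported + ASSUMED_STACK_BIAS : reported;
}

/* Queries RLIMIT_STACK and stores the adjusted soft limit in *out.
   Returns 0 on success, -1 if the limit cannot be read or is
   unlimited (in which case there is no finite bound to adjust). */
int effective_stack_limit(uint64_t *out)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;
    if (lim.rlim_cur == RLIM_INFINITY)
        return -1;
#ifdef __linux__
    *out = adjust_reported_limit((uint64_t)lim.rlim_cur, true);
#else
    *out = adjust_reported_limit((uint64_t)lim.rlim_cur, false);
#endif
    return 0;
}
```

Hard-coding the bias keeps the change small but ties the check to one kernel configuration, which is presumably why it is described as a quick fix.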
>> No: segv_handler is careful not to turn the SIGSEGV into an exception unless the PC is inside ocamlopt-generated code
True, although the PC is not always available. What I should have said is that people might wrongly expect exceptions to always be raised on stack overflows, and this isn't technically possible. I guess I am mostly trying to slowly sell the idea of segmented stacks (which require accurate preemptive detection of overflow).
>> AFAIK the only foolproof way to recover from stack overflows is to detect them before they happen.
> That still needs to know exactly where the bottom of the system stack is, which is our current problem.
You could be conservative. This does seem like a very involved solution and would also make function calls more expensive at runtime.
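One way to read "conservative" is sketched below: ignore any extra room the kernel may grant beyond the reported limit, subtract a safety margin, and have each function check that its frame fits before allocating it. All names and the margin are hypothetical, and the per-call comparison is exactly the runtime cost mentioned above.

```c
#include <stdint.h>
#include <stdbool.h>

/* Lowest address the program allows itself to use. Only the
   reported limit minus a margin is trusted, so the floor sits
   above the point where the kernel would actually fault. */
uint64_t conservative_stack_floor(uint64_t stack_top,
                                  uint64_t reported_limit,
                                  uint64_t margin)
{
    uint64_t usable = reported_limit > margin ? reported_limit - margin : 0;
    return stack_top > usable ? stack_top - usable : 0;
}

/* Prologue-style check: true when frame_size more bytes can be
   allocated below sp without crossing the floor. */
bool frame_fits(uint64_t sp, uint64_t frame_size, uint64_t floor)
{
    return sp >= floor && sp - floor >= frame_size;
}
```

Because the floor only depends on a lower bound of the real stack size, an imprecise getrlimit() value costs some usable stack but does not break detection.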
|2009-03-16 10:32||shinwell||New Issue|
|2009-03-28 16:34||xleroy||Note Added: 0004869|
|2009-03-28 16:34||xleroy||Assigned To||=> xleroy|
|2009-03-28 16:34||xleroy||Status||new => feedback|
|2011-12-18 10:35||xleroy||Note Added: 0006371|
|2011-12-19 06:59||till||Note Added: 0006381|
|2011-12-19 12:01||xleroy||Note Added: 0006384|
|2011-12-20 15:41||xleroy||Note Added: 0006419|
|2011-12-20 15:41||xleroy||Status||feedback => resolved|
|2011-12-20 15:41||xleroy||Resolution||open => fixed|
|2011-12-20 15:47||xleroy||Fixed in Version||=> 3.12.1+dev|
|2012-01-03 05:16||till||File Added: stack_overflow_detection_around_rsp.diff|
|2012-01-03 05:40||till||Note Added: 0006579|