incorrect calculation for stack overflow detection #4746

vicuna · 2009-03-16T09:32:05Z

Original bug ID: 4746
Reporter: @mshinwell
Assigned to: @xavierleroy
Status: closed (set by @xavierleroy on 2013-08-31T10:44:06Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 3.11.0
Fixed in version: 3.12.1+dev
Category: ~DO NOT USE (was: OCaml general)
Monitored by: rdouglass @ygrek yminsky @oandrieu pzimmer @mmottl

Bug description

The calculation that decides, upon SIGSEGV, whether the faulting address is likely to lie in the stack fails on x86-64 machines with address space randomization enabled (which appears to be the default with at least some recent Red Hat kernels). As a result, a stack overflow produces a segfault rather than a Caml exception.

With this randomization enabled and the ulimit -s value set to X, the stack is allowed to grow not to a maximum of X but to X + EXEC_STACK_BIAS, which leaves room for the randomization. The bias constant defaults to 2 MB in the kernel I have seen, and the effect is easy to observe by growing the stack to the ulimit and looking at the memory map with pmap. The stack limit check in the kernel (mm/mmap.c) looks like this:

    static int over_stack_limit(unsigned long sz)
    {
        if (sz < EXEC_STACK_BIAS)
            return 0;
        return (sz - EXEC_STACK_BIAS) >
            current->signal->rlim[RLIMIT_STACK].rlim_cur;
    }

The upshot is that even with the ulimit at X, a stack-eating program won't fault at approximately (top of stack - X) but rather at (top of stack - X - EXEC_STACK_BIAS) or thereabouts. (The EXEC_STACK_BIAS offset on %rsp is extremely obscure until you realize what is happening!) This means that the check in the Caml default segfault handler doesn't catch it.
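To make the arithmetic concrete, here is a minimal sketch in C of an address-range check with and without the bias. It is not the code in the Caml runtime: the function names and the sample addresses are hypothetical, and the 2 MB value is only the default mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the 2 MB default described in this report; other kernels may differ. */
#define EXEC_STACK_BIAS (2ULL * 1024 * 1024)

/* Hypothetical check that trusts the ulimit alone:
   the fault counts as "in the stack" only if it lies in [top - limit, top). */
static bool in_stack_naive(uint64_t fault, uint64_t top, uint64_t limit)
{
    return fault < top && fault >= top - limit;
}

/* Same check, widened by the extra room the kernel allows the stack. */
static bool in_stack_with_bias(uint64_t fault, uint64_t top, uint64_t limit)
{
    return fault < top && fault >= top - limit - EXEC_STACK_BIAS;
}
```

With an 8 MB ulimit, a fault one page above (top of stack - X - EXEC_STACK_BIAS) is rejected by the first function and accepted by the second, which matches the behaviour described here.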

I haven't yet thought much about a solution to this problem but I'll add to this issue if I think of one.

File attachments

stack_overflow_detection_around_rsp.diff

vicuna · 2009-03-28T15:34:03Z

Comment author: @xavierleroy

I've never been a big fan of address stack randomization :-) I don't know what would be a good way to handle this issue, but if you can think of one, please let us know.

vicuna · 2011-12-18T09:35:01Z

Comment author: @xavierleroy

Two years later, do we have new data about this issue? My understanding of Mark's analysis is that Linux kernels with stack address randomization report the wrong stack size when we call getrlimit(RLIMIT_STACK), namely the stack size without the EXEC_STACK_BIAS slack, while the kernel's stack overflow handler uses another stack size, which includes EXEC_STACK_BIAS. Is this still the case today? Is my analysis wrong? Because if that's the root of the problem, it's a kernel issue and I don't see anything we could do about it in OCaml.

vicuna · 2011-12-19T05:59:26Z

Comment author: till

You might have better luck comparing the CONTEXT_FAULTING_ADDRESS with the info provided in the sigcontext.rsp instead of trying to figure where the top of the stack is. I'd be happy to try cooking up a patch if you are interested.
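A rough sketch of that comparison in C, as a pure function so it can be tried outside a signal handler. The function name and the tolerance RSP_SLACK are hypothetical; how much slack is appropriate is an open question. In a real handler the fault address would come from the signal information and the stack pointer from the saved context (sigcontext.rsp).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tolerance around the stack pointer; the right value is not known. */
#define RSP_SLACK 0x2000u

/* Treat the fault as a likely stack overflow when the faulting address
   lies within RSP_SLACK bytes of the stack pointer, on either side. */
static bool fault_near_sp(uint64_t fault, uint64_t sp)
{
    uint64_t dist = fault > sp ? fault - sp : sp - fault;
    return dist <= RSP_SLACK;
}
```

This avoids needing the top of the stack or the ulimit at all, at the cost of having to choose the tolerance.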

Note that you will still have issues when the overflow is triggered in other parts of the runtime (e.g. a malloc in the GC). Since those overflows might leave internal data structures in incoherent states, there's probably no recovering from them. AFAIK the only foolproof way to recover from stack overflows is to detect them before they happen. This is a pretty intrusive change but could be used as a step towards segmented stacks à la Go (http://golang.org/doc/go_faq.html#goroutines).

Till

vicuna · 2011-12-19T11:01:08Z

Comment author: @xavierleroy

All right, so I see no one discussed the issue with kernel developers.

> You might have better luck comparing the CONTEXT_FAULTING_ADDRESS with the info provided in the sigcontext.rsp instead of trying to figure where the top of the stack is. I'd be happy to try cooking up a patch if you are interested.

That would be worth a try, even though I'm not sure how much slack around rsp should be tolerated. Feel free to experiment and let me know of the results.

> Note that you will still have issues when the overflow is triggered in other parts of the runtime (e.g. a malloc in the gc). Since those overflow might leave internal data structures incoherent states there's probably no recovering from them.

No: segv_handler is careful not to turn the SIGSEGV into an exception unless the PC is inside ocamlopt-generated code.

> AFAIK the only full proof way to recover from stack overflows is to detect them before they happen.

That still needs to know exactly where the bottom of the system stack is, which is our current problem.

vicuna · 2011-12-20T14:41:52Z

Comment author: @xavierleroy

As a quick fix, under Linux only, I added 2Mb to the stack space reported by getrlimit(). It works again (tested on AMD64), but we'll have to find something better at some point. Done in 3.12 branch (commit 11897) and in SVN trunk (commit 11898).
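The idea of the quick fix can be sketched as follows. This is an illustration only, not the code from those commits: the names padded_stack_size, current_padded_stack_size and STACK_SLACK are hypothetical, and the handling of an unlimited stack is an assumption.

```c
#include <sys/resource.h>

/* The 2 MB added to the reported stack size, as described above. */
#define STACK_SLACK ((rlim_t)2 * 1024 * 1024)

/* Pad the size reported by getrlimit(); an unlimited stack is left
   unchanged (assumption: padding RLIM_INFINITY would be meaningless). */
static rlim_t padded_stack_size(rlim_t reported)
{
    if (reported == RLIM_INFINITY)
        return reported;
    return reported + STACK_SLACK;
}

/* Query RLIMIT_STACK and store the padded soft limit in *out.
   Returns 0 on success and -1 if getrlimit() fails. */
static int current_padded_stack_size(rlim_t *out)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;
    *out = padded_stack_size(lim.rlim_cur);
    return 0;
}
```

A stack-range check would then use the padded value in place of the raw soft limit.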

vicuna · 2012-01-03T04:40:11Z

Comment author: till

> No: segv_handler is careful not to turn the SIGSEGV into an exception unless the PC is inside ocamlopt-generated code

True, although the PC is not always available. What I should have said is that people might wrongly expect exceptions to always be raised on stack overflows, and this isn't technically possible. I guess I am mostly trying to slowly sell the idea of segmented stacks (which require accurate, preemptive detection of overflow).

> AFAIK the only full proof way to recover from stack overflows is to detect them before they happen.

> That still needs to know exactly where the bottom of the system stack is, which is our current problem.

You could be conservative. This does seem like a very involved solution and would also make function calls more expensive at runtime.

vicuna closed this as completed Aug 31, 2013

vicuna assigned xavierleroy Mar 14, 2019

vicuna added the bug label Mar 20, 2019
