Mantis Bug Tracker

ID: 0004746
Project: OCaml
Category: OCaml general
View Status: public
Date Submitted: 2009-03-16 10:32
Last Update: 2013-08-31 12:44
Reporter: shinwell
Assigned To: xleroy
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 3.11.0
Fixed in Version: 3.12.1+dev
Summary: 0004746: incorrect calculation for stack overflow detection
Description: The calculation that decides, on SIGSEGV, whether the faulting address is likely to lie in the stack does not work correctly on x86-64 machines with address space randomization enabled (which appears to be the default with at least some recent Red Hat kernels). As a result, stack overflow produces segfaults rather than Caml exceptions.

With this randomization enabled and the ulimit -s value set to X, the stack is not limited to X bytes but to X + EXEC_STACK_BIAS, in order to leave room for the randomization. The bias constant defaults to 2 MB in the kernel I have seen, and the effect is easy to observe by growing the stack up to the ulimit and inspecting the memory map with pmap. The stack limit check in the kernel (mm/mmap.c) looks like this:

static int over_stack_limit(unsigned long sz)
{
        if (sz < EXEC_STACK_BIAS)
                return 0;
        return (sz - EXEC_STACK_BIAS) >
                        current->signal->rlim[RLIMIT_STACK].rlim_cur;
}
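
To make the arithmetic concrete, the check above can be modelled in user space. This is only a sketch: `model_over_stack_limit` and `model_max_stack_size` are hypothetical names, the kernel's `current->signal->rlim[RLIMIT_STACK].rlim_cur` is replaced by an explicit `rlim_cur` argument, and `ASSUMED_EXEC_STACK_BIAS` encodes the 2 MB default mentioned above rather than a value read from any header.

```c
#include <stdbool.h>

/* Assumption: 2 MB is the EXEC_STACK_BIAS default observed in this report;
   it is not taken from any public header and may differ between kernels. */
#define ASSUMED_EXEC_STACK_BIAS (2UL * 1024 * 1024)

/* Hypothetical user-space model of the kernel's over_stack_limit():
   rlim_cur stands in for current->signal->rlim[RLIMIT_STACK].rlim_cur. */
bool model_over_stack_limit(unsigned long sz, unsigned long rlim_cur,
                            unsigned long bias)
{
    if (sz < bias)
        return false;               /* the bias region never counts */
    return (sz - bias) > rlim_cur;  /* only the excess over the bias is compared */
}

/* Largest stack size the model accepts: rlim_cur + bias, not rlim_cur. */
unsigned long model_max_stack_size(unsigned long rlim_cur, unsigned long bias)
{
    return rlim_cur + bias;
}
```

With `ulimit -s 8192` (8 MB) and a 2 MB bias, the model accepts stacks of up to 10 MB, i.e. 2 MB more than getrlimit() reports.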

So even with the ulimit at X, a stack-eating program does not fault at approximately (top of stack - X) but at roughly (top of stack - X - EXEC_STACK_BIAS). (The EXEC_STACK_BIAS offset on %rsp is extremely obscure until you realize what is happening!) This means that the check in the Caml default segfault handler does not catch it.
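
The failing check can be sketched as a window test on the faulting address. Assumptions: `looks_like_stack_overflow` is a hypothetical helper, addresses are modelled as `uint64_t`, and `slack` stands for whatever small tolerance a handler might allow below the limit; this is not the actual code of the Caml segfault handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical version of the heuristic described above: a fault counts as a
   stack overflow when it lies below the recorded stack top, but no more than
   rlim_cur + slack bytes below it. */
bool looks_like_stack_overflow(uint64_t fault_addr, uint64_t stack_top,
                               uint64_t rlim_cur, uint64_t slack)
{
    /* Saturate so an "unlimited" rlim_cur cannot wrap around. */
    uint64_t reach = rlim_cur > UINT64_MAX - slack ? UINT64_MAX
                                                   : rlim_cur + slack;
    /* Clamp the lower bound at 0 instead of underflowing. */
    uint64_t lowest = stack_top > reach ? stack_top - reach : 0;
    return fault_addr < stack_top && fault_addr >= lowest;
}
```

With a stack top of 0x7fff00000000, X = 8 MB and a 2 MB bias, a fault at top - 8 MB is inside the window, but the fault the kernel actually delivers, near top - 10 MB, falls outside it unless the limit passed in is raised by the bias.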

I haven't yet thought much about a solution to this problem but I'll add to this issue if I think of one.
Tags: No tags attached.
Attached Files: stack_overflow_detection_around_rsp.diff (3,003 bytes) 2012-01-03 05:16

Notes
(0004869)
xleroy (administrator)
2009-03-28 16:34

I've never been a big fan of address stack randomization :-) I don't know what would be a good way to handle this issue, but if you can think of one, please let us know.
(0006371)
xleroy (administrator)
2011-12-18 10:35

Two years later, do we have new data about this issue? My understanding of Mark's analysis is that Linux kernels with stack address randomization report the wrong stack size when we call getrlimit(RLIMIT_STACK), namely the stack size without the EXEC_STACK_BIAS slack, while the kernel's stack overflow check uses a different stack size, one that includes EXEC_STACK_BIAS. Is this still the case today? Is my analysis wrong? If that is the root of the problem, it is a kernel issue and I don't see anything we could do about it in OCaml.
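
For reference, the value in question is the soft limit returned by `getrlimit(RLIMIT_STACK)`. A minimal sketch that reads and prints it follows; the helper names are hypothetical. On the kernels described in the report, the stack can grow roughly EXEC_STACK_BIAS bytes beyond the printed value, which can be compared against the pmap output of a program that recurses until it faults.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Reads the soft stack limit reported by getrlimit(RLIMIT_STACK).
   Returns 0 on success, -1 on error. */
int query_stack_soft_limit(rlim_t *out)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;
    *out = lim.rlim_cur;
    return 0;
}

/* Prints the reported limit; RLIM_INFINITY means "unlimited". */
void print_stack_soft_limit(void)
{
    rlim_t cur;
    if (query_stack_soft_limit(&cur) != 0) {
        perror("getrlimit");
        return;
    }
    if (cur == RLIM_INFINITY)
        printf("RLIMIT_STACK soft limit: unlimited\n");
    else
        printf("RLIMIT_STACK soft limit: %llu bytes\n",
               (unsigned long long)cur);
}
```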
(0006381)
till (reporter)
2011-12-19 06:59

You might have better luck comparing the CONTEXT_FAULTING_ADDRESS with the info provided in the sigcontext.rsp instead of trying to figure where the top of the stack is. I'd be happy to try cooking up a patch if you are interested.

Note that you will still have issues when the overflow is triggered in other parts of the runtime (e.g. a malloc in the gc). Since those overflows might leave internal data structures in incoherent states, there's probably no recovering from them. AFAIK the only foolproof way to recover from stack overflows is to detect them before they happen. This is a pretty intrusive change, but it could be used as a step towards segmented stacks à la Go (http://golang.org/doc/go_faq.html#goroutines).

Till
(0006384)
xleroy (administrator)
2011-12-19 12:01

All right, so I see no one discussed the issue with kernel developers.

> You might have better luck comparing the CONTEXT_FAULTING_ADDRESS with the info provided in the sigcontext.rsp instead of trying to figure where the top of the stack is. I'd be happy to try cooking up a patch if you are interested.

That would be worth a try, even though I'm not sure how much slack around rsp should be tolerated. Feel free to experiment and let me know of the results.
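
The suggestion of comparing the faulting address with the saved stack pointer can be sketched as a window around that pointer. This is a hypothetical helper, not the attached patch: the `slack` value is an assumption (how much to tolerate is exactly the open question here), and reading the saved pointer via `uc_mcontext.gregs[REG_RSP]` applies to Linux/x86-64 with glibc only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: treat a fault as a stack overflow when the faulting
   address lies within `slack` bytes of the stack pointer saved in the signal
   context. On Linux/x86-64 with glibc (and _GNU_SOURCE), that pointer is
   typically read in a SA_SIGINFO handler as
   ((ucontext_t *)ctx)->uc_mcontext.gregs[REG_RSP]. The slack is an assumed,
   tunable tolerance. */
bool fault_is_near_sp(uint64_t fault_addr, uint64_t sp, uint64_t slack)
{
    /* Clamp both bounds instead of wrapping around the address space. */
    uint64_t lowest = sp > slack ? sp - slack : 0;
    uint64_t highest = sp < UINT64_MAX - slack ? sp + slack : UINT64_MAX;
    return fault_addr >= lowest && fault_addr <= highest;
}
```

Unlike the getrlimit()-based window, this does not depend on knowing where the top or bottom of the stack is, so the EXEC_STACK_BIAS offset does not matter.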

> Note that you will still have issues when the overflow is triggered in other parts of the runtime (e.g. a malloc in the gc). Since those overflows might leave internal data structures in incoherent states, there's probably no recovering from them.

No: segv_handler is careful not to turn the SIGSEGV into an exception unless the PC is inside ocamlopt-generated code.
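
The rule described here can be sketched as two conditions that must both hold before the signal becomes an exception. The `struct code_area` range and both function names are hypothetical; the real runtime has its own way of recognizing ocamlopt-generated code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address range holding ocamlopt-generated code. */
struct code_area {
    uint64_t start;   /* inclusive */
    uint64_t end;     /* exclusive */
};

bool pc_in_code_area(uint64_t pc, const struct code_area *area)
{
    return pc >= area->start && pc < area->end;
}

/* Convert SIGSEGV into an exception only when the fault looks like a stack
   overflow AND the faulting instruction is in compiled OCaml code, so that a
   fault inside C code (e.g. malloc in the GC) is left alone. */
bool should_raise_stack_overflow(bool fault_in_stack_window, uint64_t pc,
                                 const struct code_area *area)
{
    return fault_in_stack_window && pc_in_code_area(pc, area);
}
```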

> AFAIK the only foolproof way to recover from stack overflows is to detect them before they happen.

That still needs to know exactly where the bottom of the system stack is, which is our current problem.
(0006419)
xleroy (administrator)
2011-12-20 15:41

As a quick fix, under Linux only, I added 2 MB to the stack space reported by getrlimit(). It works again (tested on AMD64), but we'll have to find something better at some point. Done in 3.12 branch (commit 11897) and in SVN trunk (commit 11898).
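
The quick fix amounts to inflating the reported limit before using it. A minimal sketch, assuming a hypothetical `adjusted_stack_limit` helper and a 2 MB slack applied on Linux only; this is not the committed code. `RLIM_INFINITY` is passed through unchanged and the addition saturates instead of wrapping.

```c
#include <sys/resource.h>

/* Assumed slack mirroring the 2 MB quick fix; applied on Linux only. */
#if defined(__linux__)
#define ASSUMED_STACK_SLACK ((rlim_t)2 * 1024 * 1024)
#else
#define ASSUMED_STACK_SLACK ((rlim_t)0)
#endif

/* Returns the stack size an overflow check should assume, given the soft
   limit reported by getrlimit(RLIMIT_STACK). */
rlim_t adjusted_stack_limit(rlim_t reported)
{
    if (reported == RLIM_INFINITY)
        return RLIM_INFINITY;
    /* Saturate rather than wrap past RLIM_INFINITY. */
    if (reported >= RLIM_INFINITY - ASSUMED_STACK_SLACK)
        return RLIM_INFINITY;
    return reported + ASSUMED_STACK_SLACK;
}
```

Hard-coding the slack only works while it matches the kernel's EXEC_STACK_BIAS, which is why it is a stopgap.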
(0006579)
till (reporter)
2012-01-03 05:40

>> No: segv_handler is careful not to turn the SIGSEGV into an exception unless the PC is inside ocamlopt-generated code

True, although the PC is not always available. What I should have said is that people might wrongly expect an exception to be raised on every stack overflow, which isn't technically possible. I guess I am mostly trying to slowly sell the idea of segmented stacks (which require accurate pre-emptive detection of overflow).

>> AFAIK the only foolproof way to recover from stack overflows is to detect them before they happen.
>
> That still needs to know exactly where the bottom of the system stack is, which is our current problem.

You could be conservative. Admittedly, this is a very involved solution and would also make function calls more expensive at runtime.
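
A conservative pre-emptive check of the kind discussed here can be sketched as a test each function would run before allocating its frame, comparing the stack pointer with an estimated limit plus a safety margin. All names are hypothetical, and `stack_limit` is assumed to be estimated pessimistically, since the exact bottom of the system stack is unknown. The compare-and-branch on every call is the runtime cost mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical prologue check. The stack grows downward, so a frame of
   frame_size bytes fits only if sp - frame_size stays at or above
   stack_limit + margin. stack_limit is a conservative estimate of the
   lowest usable stack address. */
bool frame_would_overflow(uint64_t sp, uint64_t frame_size,
                          uint64_t stack_limit, uint64_t margin)
{
    uint64_t guard = stack_limit + margin;   /* lowest address we allow */
    if (sp < guard)
        return true;                         /* already inside the margin */
    return sp - guard < frame_size;          /* frame would cross the guard */
}
```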

Issue History
Date Modified | Username | Field | Change
2009-03-16 10:32 | shinwell | New Issue
2009-03-28 16:34 | xleroy | Note Added: 0004869
2009-03-28 16:34 | xleroy | Assigned To | => xleroy
2009-03-28 16:34 | xleroy | Status | new => feedback
2011-12-18 10:35 | xleroy | Note Added: 0006371
2011-12-19 06:59 | till | Note Added: 0006381
2011-12-19 12:01 | xleroy | Note Added: 0006384
2011-12-20 15:41 | xleroy | Note Added: 0006419
2011-12-20 15:41 | xleroy | Status | feedback => resolved
2011-12-20 15:41 | xleroy | Resolution | open => fixed
2011-12-20 15:47 | xleroy | Fixed in Version | => 3.12.1+dev
2012-01-03 05:16 | till | File Added: stack_overflow_detection_around_rsp.diff
2012-01-03 05:40 | till | Note Added: 0006579
2013-08-31 12:44 | xleroy | Status | resolved => closed

