New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
incorrect calculation for stack overflow detection #4746
Comments
Comment author: @xavierleroy I've never been a big fan of address stack randomization :-) I don't know what would be a good way to handle this issue, but if you can think of one, please let us know. |
Comment author: @xavierleroy Two years later, do we have new data about this issue? My understanding of Mark's analysis is that Linux kernels with stack address randomization report the wrong stack size when we ask getrlimit(RLIMIT_STACK), namely the stack size without the EXEC_STACK_BIAS slack, while the kernel's stack overflow handler uses another stack size, which includes EXEC_STACK_BIAS. Is is still the case today? Is my analysis wrong? Because if that's the root of the problem, it's a kernel issue and I don't see anything we could do about it in OCaml. |
Comment author: till You might have better luck comparing the CONTEXT_FAULTING_ADDRESS with the info provided in the sigcontext.rsp instead of trying to figure where the top of the stack is. I'd be happy to try cooking up a patch if you are interested. Note that you will still have issues when the overflow is triggered in other parts of the runtime (e.g. a malloc in the gc). Since those overflow might leave internal data structures incoherent states there's probably no recovering from them. AFAIK the only full proof way to recover from stack overflows is to detect them before they happen. This is a pretty intrusive change but could be used as a step towards segmented stacks a la go (http://golang.org/doc/go_faq.html#goroutines). Till |
Comment author: @xavierleroy All right, so I see no one discussed the issue with kernel developers.
That would be worth a try, even though I'm not sure how much slack around rsp should be tolerated. Feel free to experiment and let me know of the results.
No: segv_handler is careful not to turn the SIGSEGV into an exception unless the PC is inside ocamlopt-generated code.
That still needs to know exactly where the bottom of the system stack is, which is our current problem. |
Comment author: @xavierleroy As a quick fix, under Linux only, I added 2Mb to the stack space reported by getrlimit(). It works again (tested on AMD64), but we'll have to find something better at some point. Done in 3.12 branch (commit 11897) and in SVN trunk (commit 11898). |
Comment author: till
True although PC is not always available. What I should have said is that people might wrongly expect exceptions to always be raised on stack overflows and this isn't technically possible. I guess I am mostly trying to slowly sell the idea of segmented stacks (which require an accurate preemptive detection of overflow).
You could be conservative. This does seem like a very involved solution and would also make function calls more expensive at runtime. |
Original bug ID: 4746
Reporter: @mshinwell
Assigned to: @xavierleroy
Status: closed (set by @xavierleroy on 2013-08-31T10:44:06Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 3.11.0
Fixed in version: 3.12.1+dev
Category: ~DO NOT USE (was: OCaml general)
Monitored by: rdouglass @ygrek yminsky @oandrieu pzimmer @mmottl
Bug description
The calculation for detecting upon SIGSEGV whether the faulting address is likely to lie in the stack fails to work correctly on x86-64 machines with address space randomization enabled (as appears to be the default with at least some recent Redhat kernels). The upshot is that stack overflow ends up producing segfaults rather than Caml exceptions.
With this randomization enabled and the ulimit -s value set to X, then the stack is not allocated as maximum X in size but rather X + EXEC_STACK_BIAS, in order to give space for the randomization. The bias constant defaults to 2Mb in the kernel I have seen, and the effect can easily be seen by growing the stack to the ulimit and looking at the memory map with pmap. The stack limit check in the kernel (mm/mmap.c) looks like this:
static int over_stack_limit(unsigned long sz)
{
if (sz < EXEC_STACK_BIAS)
return 0;
return (sz - EXEC_STACK_BIAS) >
current->signal->rlim[RLIMIT_STACK].rlim_cur;
}
The upshot is that even with the ulimit at X, a stack-eating program won't fault at approximately (top of stack - X) but rather at (top of stack - X - EXEC_STACK_BIAS) or thereabouts. (The EXEC_STACK_BIAS offset on %rsp is extremely obscure until you realize what is happening!) This means that the check in the Caml default segfault handler doesn't catch it.
I haven't yet thought much about a solution to this problem but I'll add to this issue if I think of one.
File attachments
The text was updated successfully, but these errors were encountered: