|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0005485||OCaml||OCaml general||public||2012-01-19 22:23||2012-02-17 17:02|
|Target Version||Fixed in Version||3.13.0+dev|
|Summary||0005485: Reducing the risk of segfault due to stack overflow|
|Description||We just had a segfault due to a stack overflow. It occurred in the compare_val function during a Hashtbl.find_all on an excessively deep bucket. It took me close to two days to identify the source of the problem and to fix it in our unfriendly environment (Hadoop map/reduce). Having a stack trace in this case would have saved about a day of debugging.|
Here is a simple repro case:
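----- overflow.ml -----
let main () =
  let n = 1_000_000 in
  let tbl = Hashtbl.create n in
  let k = "a" in
  (* Add n bindings under the same key: they all land in one bucket. *)
  for i = 1 to n do
    Hashtbl.add tbl k ()
  done;
  (* Walking that single, very deep bucket is what overflows the stack. *)
  ignore (Hashtbl.find_all tbl k)

let () = main ()

$ ocamlopt -o overflow -g overflow.ml
$ ulimit -c unlimited
$ ./overflow
Segmentation fault (core dumped)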
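For comparison, here is a minimal sketch of one way to avoid building the deep bucket in the first place. It is not part of the original report: the helper names add_multi and find_multi are hypothetical, and the idea is simply to keep a single binding per key and accumulate the values in a list, so that a lookup never has to walk a million entries of one bucket.

```ocaml
(* Hypothetical helpers, for illustration only. *)

(* Keep one binding per key; prepend each new value to the stored list. *)
let add_multi tbl k v =
  let prev = try Hashtbl.find tbl k with Not_found -> [] in
  Hashtbl.replace tbl k (v :: prev)

(* Return every value stored under k, most recent first. *)
let find_multi tbl k =
  try Hashtbl.find tbl k with Not_found -> []

let () =
  let n = 1_000_000 in
  let tbl = Hashtbl.create 16 in
  for _i = 1 to n do
    add_multi tbl "a" ()
  done;
  (* List.length is tail-recursive, so the long list is safe to measure. *)
  Printf.printf "%d values under one binding\n" (List.length (find_multi tbl "a"))
```

This only sidesteps the problem in application code; it does nothing about the missing diagnostic when the overflow does happen.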
Xavier's last comment on a similar bug report is:
"Mark Shinwell's analysis is correct. We can catch SEGV arising from stack overflows in Caml code reasonably well, but we cannot recover from a SEGV arising in the middle of C code. I'm afraid this is a "cannot fix" situation."
(see http://caml.inria.fr/mantis/view.php?id=4843#c5094)
Here gdb tells us that the crash occurs during compare_val() which is
used to compare the query key with the keys in the hash table's bucket:
$ gdb overflow core
GNU gdb (Gentoo 7.3.1 p2) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /home/martin/tmp/overflow/overflow...(no debugging
[New LWP 7204]
warning: Can't read pathname for load map: Input/output error.
Core was generated by `./overflow'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000041affc in compare_val ()
In the absence of a better solution, something helpful would be a compile-time flag that adds a check before each function call. A Stack_overflow exception would be raised N bytes before reaching the stack size limit. It means that any sequence of C function calls would have at least N bytes of stack space to work on. This would make a C function much less likely to trigger a stack overflow.
A compile-time flag causing a reasonable slowdown (< 2x) would greatly facilitate debugging.
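To make the proposal concrete, here is a toy model of the check written in plain OCaml. It is only an illustration under loud assumptions: depth is counted in calls rather than bytes, nothing inspects the real stack, and the names limit, margin, guarded and Stack_margin_reached are all hypothetical. The real check would have to be emitted by the compiler before each call.

```ocaml
(* Toy model only: frames are counted by hand, the real stack is not examined. *)
exception Stack_margin_reached   (* stands in for Stack_overflow *)

let limit = 10_000   (* pretend stack size limit, in frames *)
let margin = 100     (* pretend reserve N left free for C code *)
let depth = ref 0

(* Perform a call through the check: refuse to go deeper once fewer
   than [margin] frames would remain below the limit. *)
let guarded f x =
  if !depth >= limit - margin then raise Stack_margin_reached;
  incr depth;
  Fun.protect ~finally:(fun () -> decr depth) (fun () -> f x)

(* A deliberately non-tail-recursive function whose calls are all checked. *)
let rec count n =
  if n = 0 then 0 else 1 + guarded count (n - 1)

let () =
  Printf.printf "count 50 = %d\n" (count 50);
  match count 20_000 with
  | _ -> print_endline "no limit hit"
  | exception Stack_margin_reached ->
    print_endline "stopped with margin still available"
```

The point of the margin is visible in the second call: the exception is raised while 100 frames are still free, which is the headroom a C function such as compare_val would get.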
|Tags||No tags attached.|
|It might be possible to raise a Stack_overflow in this case (if there is not enough stack space available before calling into a C function), either as suggested in PR#5064, or (even more cheaply) by putting the "stack touch" sequence in caml_c_call and caml_call_gc. Generating a meaningful stack backtrace is much more difficult, though.|
Reasonable (but not 100% perfect) fix implemented in SVN trunk. It does generate stack backtraces! See PR#5064 for a discussion.
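Once the overflow surfaces as an exception, user code can catch it and log the backtrace. The following is a minimal sketch using only the standard Printexc module; the wrapper name run_guarded is hypothetical, and whether a given overflow is actually turned into Stack_overflow depends on the platform and compiler version, so this is not a guarantee. The program should be compiled with -g for the backtrace to be informative.

```ocaml
(* Hypothetical wrapper: run f and report a stack overflow as an Error
   carrying the recorded backtrace instead of letting the program die. *)
let run_guarded f =
  Printexc.record_backtrace true;
  try Ok (f ()) with
  | Stack_overflow -> Error (Printexc.get_backtrace ())

let () =
  (* The exception is raised by hand here so the example is deterministic. *)
  match run_guarded (fun () -> raise Stack_overflow) with
  | Ok () -> print_endline "completed"
  | Error bt -> prerr_endline ("stack overflow caught\n" ^ bt)
```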
|2012-01-19 22:23||Martin Jambon||New Issue|
|2012-02-14 15:48||gasche||Status||new => confirmed|
|2012-02-16 11:16||xleroy||Relationship added||related to 0005064|
|2012-02-16 11:20||xleroy||Note Added: 0006930|
|2012-02-17 17:02||xleroy||Note Added: 0006935|
|2012-02-17 17:02||xleroy||Status||confirmed => resolved|
|2012-02-17 17:02||xleroy||Resolution||open => fixed|
|2012-02-17 17:02||xleroy||Fixed in Version||=> 3.13.0+dev|