Original bug ID: 5485
Reporter: @mjambon
Status: closed (set by @xavierleroy on 2015-12-11T18:19:42Z)
Resolution: fixed
Priority: normal
Severity: feature
Platform: Linux/AMD64
OS Version: 2.6.34
Version: 3.12.1
Fixed in version: 3.13.0+dev
Category: ~DO NOT USE (was: OCaml general)
Related to: #5064
Monitored by: @ygrek
Bug description
We just had a segfault due to a stack overflow. It occurred in the compare_val function during a Hashtbl.find_all on an excessively deep bucket. It took me close to two days to identify the source of the problem and to fix it in our unfriendly environment (Hadoop map/reduce). Having a stack trace in this case would have saved about a day of debugging.
Here is a simple repro case:
----- overflow.ml -----
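let main () =
  let n = 1_000_000 in
  let tbl = Hashtbl.create n in
  let k = "a" in
  for i = 1 to n do
    Hashtbl.add tbl k ()
  done;
  print_endline "find_all";
  ignore (Hashtbl.find_all tbl k)
let () =
  Printexc.record_backtrace true;
  main ()
$ ocamlopt -o overflow -g overflow.ml
$ ulimit -c unlimited
$ ./overflow
find_all
Segmentation fault (core dumped)
Xavier's last comment on a similar bug report is:
"Mark Shinwell's analysis is correct. We can catch SEGV arising from stack overflows in Caml code reasonably well, but we cannot recover from a SEGV arising in the middle of C code. I'm afraid this is a "cannot fix" situation."
(see #4843#c5094)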
Here gdb tells us that the crash occurs in compare_val(), which is used to compare the query key with the keys in the hash table's bucket:
$ gdb overflow core
GNU gdb (Gentoo 7.3.1 p2) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see: http://bugs.gentoo.org/...
Reading symbols from /home/martin/tmp/overflow/overflow...(no debugging
symbols found)...done.
[New LWP 7204]
warning: Can't read pathname for load map: Input/output error.
Core was generated by `./overflow'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000041affc in compare_val ()
(gdb)
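A possible caller-side workaround, which is not part of the original report: the crash comes from walking one very long bucket recursively, so the bindings for a key can instead be collected with `Hashtbl.fold`. The stdlib documentation states that `fold` passes several bindings of the same key most recent first. The sketch assumes that `fold` walks each bucket with a constant-stack loop, which holds for the standard library implementation as far as we know but is not a documented guarantee. `find_all_iter` is a hypothetical name. Note that it visits the whole table, so it costs O(table size) rather than O(bucket length).

```ocaml
(* Hypothetical replacement for [Hashtbl.find_all tbl key] that avoids deep
   recursion. Assumption: Hashtbl.fold walks each bucket with a constant-stack
   loop (true of the stdlib implementation we know of, but not documented). *)
let find_all_iter tbl key =
  (* Hashtbl.fold passes bindings of the same key most recent first, so
     consing them onto [acc] yields the oldest binding first. *)
  let oldest_first =
    Hashtbl.fold
      (fun k d acc -> if compare k key = 0 then d :: acc else acc)
      tbl []
  in
  (* Reverse so the result matches Hashtbl.find_all (most recent first).
     List.rev is tail-recursive. *)
  List.rev oldest_first

let () =
  (* Same shape as the repro: one key bound a million times. *)
  let n = 1_000_000 in
  let tbl = Hashtbl.create n in
  for i = 1 to n do
    Hashtbl.add tbl "a" i
  done;
  let all = find_all_iter tbl "a" in
  Printf.printf "%d bindings, most recent = %d\n"
    (List.length all) (List.hd all)
```

The trade-off is a full-table scan in exchange for bounded stack use. If the table really holds many bindings per key, keeping one binding per key whose value is a list (updated with `Hashtbl.replace`) avoids both problems.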
In the absence of a better solution, a compile-time flag that adds a check before each function call would help. The check would raise a Stack_overflow exception N bytes before the stack size limit is reached, so any sequence of C function calls would have at least N bytes of stack space to work with. This would make a C function much less likely to trigger a stack overflow.
A compile-time flag causing a reasonable slowdown (< 2x) would greatly facilitate debugging.
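The behaviour this flag would give can be approximated in user code to see how it plays out. The following is a minimal OCaml sketch, not the proposed compiler flag: it counts nesting depth through a hypothetical `guarded` wrapper instead of measuring remaining stack bytes, and raises `Stack_overflow` once a hypothetical `max_depth` budget is used up. Under the assumption that the budget is well below the real limit, the exception is raised in OCaml code, where it can be caught, instead of surfacing as a SEGV deep inside a C function.

```ocaml
(* Hypothetical budget: the number of nested guarded calls allowed before we
   raise Stack_overflow ourselves, well below the real stack limit. *)
let max_depth = 10_000

(* Current nesting depth of guarded calls. *)
let depth = ref 0

(* [guarded f x] evaluates [f x] one level deeper. If the budget is used up,
   it raises Stack_overflow before making the call. The counter is restored
   on both normal return and exceptional exit. *)
let guarded f x =
  if !depth >= max_depth then raise Stack_overflow;
  incr depth;
  match f x with
  | result -> decr depth; result
  | exception e -> decr depth; raise e

(* A deliberately non-tail-recursive function, standing in for a deep
   traversal such as find_all on a long bucket. *)
let rec count = function
  | [] -> 0
  | _ :: tl -> 1 + guarded count tl

let () =
  let long_list = List.init 20_000 (fun i -> i) in
  match count long_list with
  | n -> Printf.printf "counted %d elements\n" n
  | exception Stack_overflow ->
    Printf.printf "caught Stack_overflow early (depth now %d)\n" !depth
```

With these values, `count` on a 20,000-element list stops at a depth of 10,000, and the handler sees the counter back at 0. A real implementation would compare the stack pointer against the limit rather than count calls, which is what makes the proposal a compiler or runtime feature.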
It might be possible to raise a Stack_overflow in this case (if there is not enough stack space available before calling into a C function), either as suggested in #5064, or (even more cheaply) by putting the "stack touch" sequence in caml_c_call and caml_call_gc. Generating a meaningful stack backtrace is much more difficult, though.