Mantis Bug Tracker

ID: 0005485
Project: OCaml
Category: OCaml general
View Status: public
Date Submitted: 2012-01-19 22:23
Last Update: 2013-06-14 22:19
Reporter: Martin Jambon
Assigned To:
Priority: normal
Severity: feature
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: Linux/AMD64
OS:
OS Version: 2.6.34
Product Version: 3.12.1
Target Version:
Fixed in Version: 3.13.0+dev
Summary: 0005485: Reducing the risk of segfault due to stack overflow
Description: We just had a segfault due to a stack overflow. It occurred in the compare_val function while doing a Hashtbl.find_all on an excessively deep bucket. It took me close to two days to identify the source of the problem and to fix it in our unfriendly environment (Hadoop map/reduce). Having a stack trace in this case would have saved about a day of debugging.


Here is a simple repro case:

----- overflow.ml -----
let main () =
  let n = 1_000_000 in
  let tbl = Hashtbl.create n in
  let k = "a" in
  for i = 1 to n do
    Hashtbl.add tbl k ()
  done;
  print_endline "find_all";
  ignore (Hashtbl.find_all tbl k)

let () =
  Printexc.record_backtrace true;
  main ()
------------------------

$ ocamlopt -o overflow -g overflow.ml
$ ulimit -c unlimited
$ ./overflow
find_all
Segmentation fault (core dumped)
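
The repro works because Hashtbl.add shadows earlier bindings for a key instead of replacing them, so every call leaves one more entry for "a" in the same bucket. A minimal sketch of that behaviour follows; the function names and the small n are illustrative assumptions, chosen to stay far from any stack limit:

```ocaml
(* Sketch: count how many bindings a repeated key ends up with.
   Hashtbl.add keeps earlier bindings; Hashtbl.replace overwrites the
   current one. Function names and n are illustrative only. *)
let bindings_for_repeated_key n =
  let tbl = Hashtbl.create 16 in
  for _i = 1 to n do
    Hashtbl.add tbl "a" ()
  done;
  List.length (Hashtbl.find_all tbl "a")

let replaced_for_repeated_key n =
  let tbl = Hashtbl.create 16 in
  for _i = 1 to n do
    Hashtbl.replace tbl "a" ()
  done;
  List.length (Hashtbl.find_all tbl "a")

let () =
  Printf.printf "add: %d bindings, replace: %d binding(s)\n"
    (bindings_for_repeated_key 1000)
    (replaced_for_repeated_key 1000)
```

When only the latest binding matters, Hashtbl.replace keeps the bucket short and avoids the deep scan altogether.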

Xavier's last comment on a similar bug report is:

"Mark Shinwell's analysis is correct. We can catch SEGV arising from stack overflows in Caml code reasonably well, but we cannot recover from a SEGV arising in the middle of C code. I'm afraid this is a "cannot fix" situation."

(see http://caml.inria.fr/mantis/view.php?id=4843#c5094)

Here gdb tells us that the crash occurs during compare_val() which is
used to compare the query key with the keys in the hash table's bucket:

$ gdb overflow core
GNU gdb (Gentoo 7.3.1 p2) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /home/martin/tmp/overflow/overflow...(no debugging
symbols found)...done.
[New LWP 7204]

warning: Can't read pathname for load map: Input/output error.
Core was generated by `./overflow'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000041affc in compare_val ()
(gdb)
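
As an application-level workaround, and assuming the stack depth comes from a scan that recurses once per binding in the bucket, the matches can be collected with an accumulator so that the scan runs in constant stack space. The sketch below uses a plain association list as a hypothetical stand-in for a bucket; find_all_tailrec and make_bucket are illustrative names, not part of Hashtbl:

```ocaml
(* Sketch: tail-recursive scan of a bucket modelled as an association
   list. The accumulator replaces the pending "d :: ..." frames, so the
   call to compare always runs at a shallow stack depth. *)
let find_all_tailrec bucket key =
  let rec loop acc = function
    | [] -> List.rev acc
    | (k, d) :: rest ->
        if compare k key = 0 then loop (d :: acc) rest
        else loop acc rest
  in
  loop [] bucket

(* Hypothetical helper: build a bucket of n bindings for the same key. *)
let make_bucket n key =
  let rec loop acc i =
    if i = 0 then acc else loop ((key, i) :: acc) (i - 1)
  in
  loop [] n

let () =
  let bucket = make_bucket 1_000_000 "a" in
  Printf.printf "%d matches\n"
    (List.length (find_all_tailrec bucket "a"))
```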


In the absence of a better solution, something helpful would be a compile-time flag that adds a check before each function call. A Stack_overflow exception would be raised N bytes before reaching the stack size limit. It means that any sequence of C function calls would have at least N bytes of stack space to work on. This would make a C function much less likely to trigger a stack overflow.

A compile-time flag causing a reasonable slowdown (< 2x) would greatly facilitate debugging.
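
The intended behaviour of such a check can be imitated at the user level. The sketch below is only an analogy: it counts nested calls instead of measuring stack bytes, and max_depth, Depth_limit_exceeded, with_depth_check and the other names are hypothetical, not a compiler flag or a runtime feature:

```ocaml
(* Sketch: raise an ordinary exception once a call-depth budget is used
   up, well before the real stack limit. All names are hypothetical. *)
exception Depth_limit_exceeded

let max_depth = 10_000   (* illustrative budget, standing in for "N bytes" *)
let depth = ref 0

(* Run [f x] one level deeper, refusing to go past the budget. *)
let with_depth_check f x =
  if !depth >= max_depth then raise Depth_limit_exceeded;
  incr depth;
  let result = try f x with e -> decr depth; raise e in
  decr depth;
  result

(* A deliberately non-tail-recursive function with a check on each call. *)
let rec length_checked l =
  match l with
  | [] -> 0
  | _ :: tl -> 1 + with_depth_check length_checked tl

(* Hypothetical helper: build a list of n zeros without deep recursion. *)
let make_list n =
  let rec loop acc i = if i = 0 then acc else loop (0 :: acc) (i - 1) in
  loop [] n

let () =
  try ignore (length_checked (make_list 100_000))
  with Depth_limit_exceeded ->
    prerr_endline "depth budget exceeded; caught as a normal exception"
```

Because the failure surfaces as a regular exception raised from OCaml code, it can be caught and reported with a backtrace, which is the property the proposed flag is after.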
Tags: No tags attached.

- Relationships
related to 0005064 (closed): ocamlopt generated code: segmentation fault instead of stack overflow

-  Notes
(0006930)
xleroy (administrator)
2012-02-16 11:20

It might be possible to raise a Stack_overflow in this case (if there is not enough stack space available before calling into a C function), either as suggested in PR#5064, or (even more cheaply) by putting the "stack touch" sequence in caml_c_call and caml_call_gc. Generating a meaningful stack backtrace is much more difficult, though.
(0006935)
xleroy (administrator)
2012-02-17 17:02

Reasonable (but not 100% perfect) fix implemented in SVN trunk. It does generate stack backtraces! See PR#5064 for a discussion.

- Issue History
Date Modified    | Username      | Field              | Change
2012-01-19 22:23 | Martin Jambon | New Issue          |
2012-02-14 15:48 | gasche        | Status             | new => confirmed
2012-02-16 11:16 | xleroy        | Relationship added | related to 0005064
2012-02-16 11:20 | xleroy        | Note Added: 0006930 |
2012-02-17 17:02 | xleroy        | Note Added: 0006935 |
2012-02-17 17:02 | xleroy        | Status             | confirmed => resolved
2012-02-17 17:02 | xleroy        | Resolution         | open => fixed
2012-02-17 17:02 | xleroy        | Fixed in Version   | => 3.13.0+dev

