Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007118OCamlback end (clambda to assembly)public2016-01-06 21:482017-08-27 17:48
Reporterbartjacobs 
Assigned Togasche 
PrioritynormalSeveritymajorReproducibilityalways
StatusresolvedResolutionfixed 
PlatformOSOS Version
Product Version4.02.1 
Target Version4.03.1+devFixed in Version4.03.0+dev / +beta1 
Summary0007118: Crashes when ocamlopt-generated code calls into Objective-C code that adds an exception handler
DescriptionIt seems that Objective-C code sometimes performs a stack walk as part of its happy-path normal operation; such stack walks seem to sometimes crash on ocamlopt-generated stack frames.

Evidence for this is that my tool, VeriFast, has been crashing recently. See the attached crash report. VeriFast uses GTK through lablgtk for its GUI. On my OS X machine, I can make VeriFast 15.12 crash reliably (see below). Apparently, the sequence of events is as follows:
- VeriFast calls into lablgtk
- lablgtk calls into GTK
- GTK calls into Cocoa (AppKit)
- AppKit calls into the Objective-C runtime to add an exception handler.
- The runtime wants to find the innermost enclosing stack frame that handles Objective-C exceptions. It uses Apple's libunwind implementation to walk the stack to find such a frame.
- libunwind crashes while advancing the stack walking cursor.

I haven't been able to verify yet that the stack frame that libunwind crashes on is an ocamlopt-generated stack frame (I don't have debug symbols for libunwind and interpreting the machine state manually is a bit daunting), but that seems likely, given that ocamlopt-generated code generates unusually-shaped stack frames.

Note: initially, I though the problem manifested only when calling ocamlopt without the -g option, but recently, I'm seeing the crashing even with VeriFast 15.12, which is built with the -g option.

Unfortunately, it does not seem to be the case that libunwind always crashes on ocamlopt-generated stack frames. Apparently, it only does sometimes. For example, if I run VeriFast IDE inside a debugger (lldb), I don't get crashes. (I can only investigate the problem by generating a core dump.)

Assuming that it is indeed the case that libunwind crashes on ocamlopt-generated stack frames, there seem to be a number of possible solutions:
- Modify ocamlopt so that it generates stack frames that libunwind doesn't crash on
- Modify ocamlopt so that on OS X it generates stack walking backstops when calling into external functions.
- Modify libunwind so that it doesn't crash.
- Modify lablgtk so that it generates stack walking backstops.
- Modify GTK so that it generates stack walking backstops when calling into Cocoa.
- Modify the Objective-C runtime so that it doesn't walk the stack when adding an exception handler.
- Modify the Objective-C compiler so that it generates a stack walking backstop when C code calls Objective-C code.
Steps To Reproduce(This works on my machine reliably, but it seems rather brittle so it might not work on your machine.)
1. Download VeriFast 15.12 for OS X from http://www.cs.kuleuven.be/~bartj/ [^]
2. Unzip the tarball.
3. Run 'bin/vfide examples/termination/ackermann.c'
4. Issue the Verify command, either by clicking the Play toolbar button, pressing F5, or choosing the Verify menu item.
VeriFast crashes after verification finishes, when it tries to scroll to the error location.
Additional InformationI have also filed the following bugs:
- lablgtk bug 1518 (https://forge.ocamlcore.org/tracker/index.php?func=detail&aid=1518&group_id=220&atid=1032 [^])
- GTK bug 759278 (https://bugzilla.gnome.org/show_bug.cgi?id=759278 [^])
- Apple bug 24076902 (https://bugreport.apple.com/problem/viewproblem [^])
TagsNo tags attached.
Attached Files? file icon vfide-core_2016-01-05-192723_Bart-Jacobss-MacBook-Pro.crash [^] (49,134 bytes) 2016-01-06 21:48

- Relationships
related to 0007120resolvedgasche Extra .cfi_adjust_cfa_offset directive between ret and .cfi_endproc leads to incorrect unwind behavior on OS X 

-  Notes
(0015236)
gasche (administrator)
2016-01-06 22:03
edited on: 2016-01-07 17:17

One option you could try as a potential workaround/experiment is to use the variant of the OCaml distribution that is configured to add frame pointers (this makes stack easier to walks by some tools). In OPAM, you can use the switches 4.02.1+fp (or 4.02.3+fp) to get an OCaml environment with frame-pointers enabled -- but this is a global change, as it changes the global compilation strategy.

Long-term, if your analysis of the issue is valid, I would guess that your option

> Modify ocamlopt so that on OS X it generates stack walking backstops when calling into external functions.

is the more doable options on the OCaml side. I don't know about "stack walking backstops", but if was a clang attribute to generate such backstops at function entry, it would even be possible to add them in the C binding code directly, for example in the C bindings part of LablGtk. We could even modify only the C functions that we know may delegate to the Objective-C runtime, which would lower massively any performance impact of the change.

(0015237)
yallop (developer)
2016-01-06 22:53

We've seen similar issues with Objective C, libunwind, stack walking and ocamlopt on OS X:

  http://lists.ocaml.org/pipermail/ctypes/2013-December/000015.html [^]

I don't think we ever quite got to the bottom of the problem, but if the problems are related then the information in that thread might be helpful. For example, Daniel Buenzli traced the problem to the addition of -no_compact_unwind to the configure script:

  http://lists.ocaml.org/pipermail/ctypes/2013-December/000038.html [^]

According to my notes libunwind seemed to have trouble parsing the frame description entries (FDEs) generated with -no_compact_unwind.

We had the additional complication of using libffi, so it's possible there was something else going on.
(0015239)
bartjacobs (reporter)
2016-01-07 18:38

Many thanks to gasche and yallop for your help.

I have now familiarized myself a bit with DWARF, used that to read some of the relevant libunwind source code, and used that to make some sense of the core dump of my crashed VeriFast process. I can now confirm that libunwind does indeed crash while trying to step to the caller of an ocamlopt-generated frame. The frame it crashes on is the first ocamlopt-generated frame it encounters. This is a VeriFast frame.

libunwind does find DWARF call frame information for the ocamlopt-generated code, but apparently something goes wrong while interpreting that information, because the CFA computed is outside the stack. (The CFA, Canonical Frame Address, is supposed to be the stack pointer at the call site.) (This seems to confirm yallop's notes, except that I did not yet check that I'm using -no_compact_unwind.)

libunwind then tries to use this CFA to retrieve a register value saved on the stack by dereferencing the address at a given offset from the CFA, whence the EXC_BAD_ACCESS crash.

It just occurred to me that there might be an extremely simple and feasible solution: simply don't emit any unwind information for ocamlopt-generated frames! This will cause libunwind to stop walking the stack. This even reduces the file size! :-)

I will try this once I figure out what options to pass to ocamlopt and/or the linker to achieve this. (Any pointers?) However, this won't work if this information is required for a 'legitimate' purpose. Does the OCaml runtime use this information for its own exception propagation?

In that case, I see another very simple solution: do not emit unwind information for the C modules of lablgtk.

I'll try these approaches now.
(0015240)
bartjacobs (reporter)
2016-01-07 19:05

Update: I decided to first check why earlier versions of VeriFast didn't crash. By debugging VeriFast 15.05 with lldb, I could determine that in that version, libunwind doesn't walk into the VeriFast binary. Specifically, it walks through GTK, but it doesn't walk into the lablgtk C glue function. Presumably, in that build the lablgtk C functions don't have unwinding info. I'll try to figure out why in my current builds they do.
(0015241)
gasche (administrator)
2016-01-07 20:53

Mark Shinwell would be the expert on Dwarf information emitted by the compiler -- current and future state.
(0015242)
bartjacobs (reporter)
2016-01-08 03:01

Update: I seem to have found a fix for my problem. It seems to be sufficient to add -fno-unwind-tables to the C compiler command line when compiling OCaml-to-GTK glue code. After this fix, my previously reliable way to make VeriFast crash no longer works. I confirmed in the debugger that the Cocoa stack walks no longer walk past the lablgtk C glue functions.

I have reported this on the lablgtk bug tracker.

While, for defense-in-depth purposes, it might be good to also take further action on the OCaml side (such as fixing the unwind info or dropping it), for me, no further action seems to be required. Many thanks again to gasche and yallop for your help!
(0015452)
gasche (administrator)
2016-03-08 14:36

This has been fixed by Bart Jacob's PR
  https://github.com/ocaml/ocaml/pull/408 [^]
which has been merged in 4.03:
  https://github.com/ocaml/ocaml/commit/59a4fd6615454362aba2b7c4c5ea788d19edd739 [^]

- Issue History
Date Modified Username Field Change
2016-01-06 21:48 bartjacobs New Issue
2016-01-06 21:48 bartjacobs File Added: vfide-core_2016-01-05-192723_Bart-Jacobss-MacBook-Pro.crash
2016-01-06 22:03 gasche Note Added: 0015236
2016-01-06 22:53 yallop Note Added: 0015237
2016-01-07 17:16 gasche Note Edited: 0015236 View Revisions
2016-01-07 17:16 gasche Note Edited: 0015236 View Revisions
2016-01-07 17:17 gasche Note Edited: 0015236 View Revisions
2016-01-07 18:38 bartjacobs Note Added: 0015239
2016-01-07 19:05 bartjacobs Note Added: 0015240
2016-01-07 20:53 gasche Note Added: 0015241
2016-01-07 20:55 gasche Status new => acknowledged
2016-01-08 03:01 bartjacobs Note Added: 0015242
2016-01-09 14:15 gasche Relationship added related to 0007120
2016-02-03 12:07 doligez Target Version => 4.03.1+dev
2016-03-08 14:36 gasche Note Added: 0015452
2016-03-08 14:36 gasche Status acknowledged => resolved
2016-03-08 14:36 gasche Fixed in Version => 4.03.0+dev / +beta1
2016-03-08 14:36 gasche Resolution open => fixed
2016-03-08 14:36 gasche Assigned To => gasche
2017-02-23 16:35 doligez Category OCaml backend (code generation) => Back end (clambda to assembly)
2017-02-23 16:44 doligez Category Back end (clambda to assembly) => back end (clambda to assembly)


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker