Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra .cfi_adjust_cfa_offset directive between ret and .cfi_endproc leads to incorrect unwind behavior on OS X #7120

Closed
vicuna opened this issue Jan 9, 2016 · 9 comments
Assignees
Milestone

Comments

@vicuna
Copy link

vicuna commented Jan 9, 2016

Original bug ID: 7120
Reporter: bartjacobs
Assigned to: @gasche
Status: resolved (set by @gasche on 2016-03-08T13:36:32Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 4.02.3
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: back end (clambda to assembly)
Related to: #7118
Monitored by: @gasche @yallop

Bug description

ocamlopt (at least sometimes, perhaps always) generates an extra .cfi_adjust_cfa_offset directive between ret and .cfi_endproc at the end of an assembly routine. (.cfi_xxx directives generate DWARF stack unwinding information. This information enables debuggers such as LLDB to produce a stack backtrace even for code that does not adhere to the classical stack frame layout with base pointers. It is also used by the runtime system of certain programming languages, including C++ and Objective-C, to walk and unwind the stack for exception handling.)

This causes the .cfi_adjust_cfa_offset directives to not be pairwise balanced, i.e. this extra directive does not have a counterpart that cancels it out.

The resulting unwind info causes some programs interpreting this info (at least Apple's libunwind 35.3 on OS X 10.10.1, perhaps other consumers on other systems as well) to take the directive into account for subsequent assembly routines, causing the CFA offsets to accumulate and be incorrect for these subsequent routines.

The consequences are truncated call stacks in LLDB. But more importantly, this can lead to crashes if OCaml code calls into code that performs stack walks to find exception handlers. For example, Cocoa (the OS X GUI library) triggers such stack walks. See also bug 7118.

Steps to reproduce

Unzip the attached archive. Then:
make
lldb ./driver
break set -n camlMylib__bar_1223 # The number is non-deterministic. Check mylib.s for the right value.
run
bt # Observe that the backtrace is not correct: at this point there are really at least 10 stack frames: dylib`start -> main -> caml_main -> caml_start_program -> caml_program -> camlDriver__entry -> caml_apply11 -> camlMylib__foo1 -> caml_apply10 -> camlMylib__bar
target modules show-unwind -n camlMylib__foo1_1199
target modules show-unwind -n camlMylib__foo2_1211
target modules show-unwind -n camlMylib__bar_1223

The unwind info shown for foo1 is correct. Notice, however, that the unwind info shown for foo2 is different from that shown for foo1, even though these two OCaml functions have identical bodies and identical assembly routines. The info for foo2 is incorrect. The extra .cfi_adjust_cfa_offset value specified at the end of the assembly routine for foo1 has been accumulated onto the CFA offsets for foo2.

File attachments

@vicuna
Copy link
Author

vicuna commented Jan 9, 2016

Comment author: bartjacobs

I have a fix for this issue. See btj@4183ce5 .

I now get correct stack traces, except that they still stop at caml_start_program. This is an assembly routine defined in asmrun/amd64.S. I am investigating.

I will try to fix this problem, then add a regression test, and then create a pull request.

@vicuna
Copy link
Author

vicuna commented Jan 9, 2016

Comment author: bartjacobs

I now have clean, full backtraces in LLDB. I submitted a pull request (#408).

Some useful things I learned:

  • You can disassemble object files on OS X using otool -t -v myobj.o
  • You can see unwind info using dwarfdump --eh-frame myobj.o

@vicuna
Copy link
Author

vicuna commented Jan 18, 2016

Comment author: bartjacobs

I have now added an automated regression test to the abovementioned pull request. It runs only on OS X. It uses libunwind to walk the stack. It succeeds if the stack walk reaches main.

It took me some investigating to come up with the regression test. I had trouble getting the stack walk to walk the C functions.

It turned out that there was no unwind information (neither compact nor DWARF) for the C functions in the executable, even though the object files for the C modules did have both DWARF and compact unwind info. So, the linker dropped both.

The problem is that by default, ocamlopt passes the -Wl,-no_compact_unwind flag to clang, which causes clang to pass -no_compact_unwind to ld. That explains why ld did not copy the compact unwind info from the object files into the executable. It also doesn't seem to copy the DWARF info. Presumably, that's because there's a separate check that says: if there's compact unwind info, don't copy the DWARF info. (That makes sense: the whole point of the compact info is to reduce the size of the binary (as well as to speed up run-time lookups).)

It turns out you can force ld to copy the DWARF info using the -keep_dwarf_unwind flag. This solves the problem.

Note: it's important that ocamlopt passes -no_compact_unwind to ld. That's because without that flag, ld (at least on my machine, OS X 10.10.1, ld64-241.9) will infer compact unwind info from the DWARF info. Unfortunately, it seems to do so incorrectly: without the -no_compact_unwind flag, ld generates incorrect compact unwind info for the OCaml functions.

(BTW: The regression test is written in C and uses libunwind explicitly. I also wrote a test case in C++ with a custom main function, which throws a C++ exception through the OCaml code. The test case succeeds if the exception is successfully caught in main. Compiling this test case requires passing -cc clang++ to ocamlopt when building the executable. But when using -cc, ocamlopt does not generate the -Wl,-no_compact_unwind option! So, when using -cc clang++, explicitly pass -cclib -Wl,-no_compact_unwind,-keep_dwarf_unwind and the test case will work!)

@vicuna
Copy link
Author

vicuna commented Jan 18, 2016

Comment author: bartjacobs

Philosophical note: I don't know what the original motivation was for having ocamlopt generate DWARF unwinding info. Presumably, the goal was to facilitate debugging, not to enable run-time stack unwinding.

Of course, the stability and correctness requirements for the debugging scenario are much weaker than for the run-time stack unwinding scenario. It is, therefore, unfortunate that this info is being used inadvertently by the OS X unwinder at run time. However, there seems to be a solution: if the goal really is only to facilitate debugging, then the frame info should be put into the .debug_frame section of the object file, not the .eh_frame section! (Note: I didn't test that this does the right thing.)

Unfortunately, it seems that the LLVM assembler (invoked by passing a .s file to clang) always emits the information specified by the .cfi_xxx directives into the .eh_frame section, never into the .debug_frame section. I would say that's a bug in the LLVM assembler. (The offending code seems to be in class ASMPrinter::needsCFIMoves (http://llvm.org/doxygen/AsmPrinter_8cpp_source.html#l00799).) Participating in the run-time unwinding story should always be an opt-in, not a default.

I have filed an LLVM bug (https://llvm.org/bugs/show_bug.cgi?id=26191).

I have also filed a bug about the LLVM assembler not flagging unbalanced .cfi_adjust_cfa_offset directives as an error (https://llvm.org/bugs/show_bug.cgi?id=26098).

@vicuna
Copy link
Author

vicuna commented Jan 25, 2016

Comment author: bartjacobs

Update on .eh_frame versus .debug_frame: It turns out that there is an assembler directive called '.cfi_sections'. You can get the assembler to emit CFI information into the .debug_frame section of the object file instead of the .eh_frame section by putting '.cfi_sections .debug_frame' at the top of the assembly file. I tested it; it works on OS X with the clang assembler.

However, the OS X linker (ld64) does not seem to copy the .debug_frame section into either the executable or the .dSYM bundle. In fact, the string "debug_frame" does not appear at all in ld64's source code. So this seems to be a dead end.

In any case, even if ld64 supported .debug_frame, it should copy it into the .dSYM bundle that holds the debugging information, not the executable itself. Therefore, if the reason why ocamlopt emits CFI information is to ensure that processes running OCaml code have proper stack traces, then this would be achieved only if the .dSYM bundle for the executable was present. (And actually, I'm not even sure that debuggers or other backtrace producers such as OS X's crash diagnostic report generator would use the .debug_frame section if present. Conversely, I'm not sure libunwind would not use it.) So, if the goal is to support proper stack traces even for "stripped" executables (i.e., ones without a .dSYM), then emitting an .eh_frame section seems to be the only option. (Actually, the other option is to use frame pointers.)

For OCaml users who don't care about native backtraces and who don't want to be exposed to possible remaining or newly introduced CFI bugs, options include passing the -cclib -Wl,-no_keep_dwarf_unwind option to ocamlopt when generating the binary (didn't test it yet), and/or passing the -fno-unwind-tables option to clang when compiling OCaml-to-C glue code modules (tested).

For users who do want native backtraces, it seems advisable to test the CFI information. One way to do so would be to include a call to a perform_stack_walk() function (like the one in the regression test case in my pull request, see above) into each OCaml-to-C glue code function in debug builds. perform_stack_walk() would use libunwind to walk the stack, and fail if the walk did not reach the main function. I might try and contribute something like this to lablgtk.

@vicuna
Copy link
Author

vicuna commented Jan 28, 2016

Comment author: bartjacobs

Actually, the -fno-unwind-tables approach for OCaml-to-C glue code seems to be optimal: it stops libunwind stack walks, but it doesn't prevent proper backtraces from being generated by OS X on a crash, or by lldb, provided the OCaml-to-C glue code has "standard" stack frames (e.g. using a frame pointer). (Didn't test yet.)

Also, I realized that I can figure out the original intention of emitting the .cfi directives by looking into the OCaml repo history. It turns out the CFI information was originally added on Feb 21, 2012 (2eecf2d) in response to #5487 (#5487). The goal was to get decent backtraces in gdb on Linux/x86 and Linux/amd64.

@vicuna
Copy link
Author

vicuna commented Jan 28, 2016

Comment author: bartjacobs

For the record: OCaml's ./configure script has a -no-cfi option.

@vicuna
Copy link
Author

vicuna commented Mar 8, 2016

Comment author: @gasche

This has been fixed by Bart Jacob's PR
#408
which has been merged in 4.03:
59a4fd6

@vicuna vicuna closed this as completed Mar 8, 2016
@vicuna
Copy link
Author

vicuna commented Mar 8, 2016

Comment author: @gasche

There is a problematic bug reported against the Coq project, which regularly crashes CoqIDE on MacOSX, and I think it is the same issue (an OSX trace shows a crash in libunwind after trying to install an exception handler from Obj-C):
https://coq.inria.fr/bugs/show_bug.cgi?id=4548

Thanks a lot for the very detailed information in this bug report, I think it will be very helpful for the Coq team.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants