Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCaml + frame-pointer on + clang (and perhaps GCC) #7417

Closed
vicuna opened this issue Nov 18, 2016 · 15 comments
Closed

OCaml + frame-pointer on + clang (and perhaps GCC) #7417

vicuna opened this issue Nov 18, 2016 · 15 comments
Assignees
Milestone

Comments

@vicuna
Copy link

vicuna commented Nov 18, 2016

Original bug ID: 7417
Reporter: cullmann
Assigned to: @gasche
Status: resolved (set by @gasche on 2016-11-27T19:16:36Z)
Resolution: fixed
Priority: normal
Severity: minor
Platform: Linux 64-bit
OS: openSUSE
OS Version: 42.x
Version: 4.04.0
Target version: 4.04.1+dev
Fixed in version: 4.04.1+dev
Category: back end (clambda to assembly)
Tags: patch
Monitored by: @gasche @jmeber

Bug description

We tried to use OCaml together with the $with_frame_pointers option to have usable perf profiling. As we use clang and not gcc, we were so free to activate that config in configure, too:

if test "$with_frame_pointers" = "true"; then
case "$target,$cc" in
x86_64--linux,gcc*|x86_64--linux,clang*)
nativecccompopts="$nativecccompopts -g -fno-omit-frame-pointer"
bytecccompopts="$bytecccompopts -g -fno-omit-frame-pointer"
nativecclinkopts="$nativecclinkopts -g"
echo "#define WITH_FRAME_POINTERS" >> m.h
;;
*) err "Unsupported architecture with frame pointers";;
esac
fi

Unfortunately, that leads to random segfaults e.g. during unit test execution.
(mostly inside gc routines)

As clang normally behaves just like gcc (at least for the C code the ocaml stuff contains), this looks strange. After looking bit at the code to search potential issues that could lead to this, I found in amd64.S that lines:

LBL(caml_allocN):
pushq %rax; CFI_ADJUST(8) /* save desired size /
subq %rax, %r15
CMP_VAR(caml_young_limit, %r15)
jb LBL(103)
addq $8, %rsp; CFI_ADJUST (-8) /
drop desired size /
ret
LBL(103):
CFI_ADJUST(8)
RECORD_STACK_FRAME(8)
#ifdef WITH_FRAME_POINTERS
/
Do we need 16-byte alignment here ? /
ENTER_FUNCTION
#endif
call LBL(caml_call_gc)
#ifdef WITH_FRAME_POINTERS
LEAVE_FUNCTION
#endif
popq %rax; CFI_ADJUST(-8) /
recover desired size */
jmp LBL(caml_allocN)
CFI_ENDPROC

=> I would say you need 16-byte alignment here, given older bugs like

https://caml.inria.fr/mantis/print_bug_page.php?bug_id=6038

=> can it be that this ifdef breaks that? This would mean that with frame pointers is broken for GCC, too, which makes that assumption, but perhaps clang more often triggers that.

Steps to reproduce

See above, patch configure and compile with clang + frame pointers (we use clang 3.9.0)

@vicuna
Copy link
Author

vicuna commented Nov 18, 2016

Comment author: cullmann

The lines quoted above got introduced for

#6031

@vicuna
Copy link
Author

vicuna commented Nov 18, 2016

Comment author: @mshinwell

I think you do need 16-byte alignment here. The problem is probably that Clang has generated some instruction that faults if an operand isn't 16-byte aligned.

@vicuna
Copy link
Author

vicuna commented Nov 18, 2016

Comment author: cullmann

Perhaps other places modified with that #ifdef needs review, too. Given problems occur only for some instruction generations in the C compiler that is a hard to track thingy.

@vicuna
Copy link
Author

vicuna commented Nov 18, 2016

Comment author: @gasche

We tried to use OCaml together with the $with_frame_pointers option to have usable perf profiling.

You don't need frame pointers for usable perf profiling, you can use the DWARF information produced by (ocamlopt -g) with the --call-graph=dwarf option:

https://ocaml.org/learn/tutorials/performance_and_profiling.html#UsingperfonLinux

(Using perf on binaries without -g will give you per-function costs, but not aggregated by call stacks.)

@vicuna
Copy link
Author

vicuna commented Nov 18, 2016

Comment author: cullmann

You don't need frame pointers for usable perf profiling, you can use the DWARF information produced by (ocamlopt -g) with the --call-graph=dwarf option:

We tried that, like for C/C++ the resulting stack frame chains are much more imprecise and more often broken for large things.

@vicuna
Copy link
Author

vicuna commented Nov 18, 2016

Comment author: @mshinwell

You also need a fairly recent version of perf (and kernel maybe?) for DWARF/perf profiling.

@vicuna
Copy link
Author

vicuna commented Nov 18, 2016

Comment author: cullmann

You also need a fairly recent version of perf (and kernel maybe?) for DWARF/perf profiling.
I have both, that doesn't matter.
You just miss more frames as it is much more expensive the use the DWARF variant.
Anyways, I think that is offtopic, the with-frame-pointer option must still work OK and not generate code that breaks with compilers assuming 16 byte alignment (if that is the case here).

@vicuna
Copy link
Author

vicuna commented Nov 18, 2016

Comment author: @mshinwell

I will investigate making a patch for this issue, and see if I can spot any other cases where the alignment is wrong.

@vicuna
Copy link
Author

vicuna commented Nov 20, 2016

Comment author: cullmann

I would propose a patch like:

--- a/src/asmrun/amd64.S
+++ b/src/asmrun/amd64.S
@@ -438,12 +438,15 @@ LBL(103):
CFI_ADJUST(8)
RECORD_STACK_FRAME(8)
#ifdef WITH_FRAME_POINTERS

  •    /* Do we need 16-byte alignment here ? */
    
  •    /* ensure 16 byte alignment by subq + enter using 16-bytes, #7417 */
    
  •    subq    $8, %rsp; CFI_ADJUST (8)
       ENTER_FUNCTION
    

#endif
call LBL(caml_call_gc)
#ifdef WITH_FRAME_POINTERS

  •    /* ensure 16 byte alignment by leave + addq using 16-bytes #7417 */
       LEAVE_FUNCTION
    
  •    addq    $8, %rsp; CFI_ADJUST (-8)
    

#endif
popq %rax; CFI_ADJUST(-8) /* recover desired size */
jmp LBL(caml_allocN)

I inspected the other #ifdef'd locations, they look not wrong to me.

@vicuna
Copy link
Author

vicuna commented Nov 20, 2016

Comment author: cullmann

Created small pull request for that change:

#930

@vicuna
Copy link
Author

vicuna commented Nov 20, 2016

Comment author: cullmann

I can now confirm that all company internal regression tests work with that again (e.g. clang + -with-frame-pointers does again the same as without the option)

@vicuna
Copy link
Author

vicuna commented Nov 27, 2016

Comment author: @xavierleroy

Patch merged on trunk, commit 3ad607f. OK for closing? or should it also go on 4.04 release branch?

@vicuna
Copy link
Author

vicuna commented Nov 27, 2016

Comment author: cullmann

If there is some 4.04.x patch release, I think it would make sense to have it in the branch for that.

@vicuna
Copy link
Author

vicuna commented Nov 27, 2016

Comment author: @gasche

I understand Damien's initial triaging as an indication of support for a 4.0x fix, so I'll also cherry-pick in 4.04. Thanks to Xavier for the triaging and being careful about that.

@vicuna
Copy link
Author

vicuna commented Nov 27, 2016

Comment author: @gasche

Merged in the 4.04 release branch as a10770e. Thanks!

@vicuna vicuna closed this as completed Nov 27, 2016
@vicuna vicuna added this to the 4.04.1 milestone Mar 14, 2019
@vicuna vicuna added the bug label Mar 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants