Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005721OCamlback end (clambda to assembly)public2012-08-06 14:172015-12-11 19:23
Assigned Tolefessan 
PlatformOSOS Version
Product Version4.00.0 
Target VersionFixed in Version4.01.0+dev 
Summary0005721: patch to use frame pointer to profile code with Linux perf tools
DescriptionHere is a patch to allow profiling amd64 OCaml native code with modern Linux performance tools ("perf record/report"), based on kernel support of processor counters.

Since the kernel is unable to use DWARF information stored in the program, stack unwinding is currently only based on C-style frames with frame pointers (%rbp saved on entry, and restored on leave), and this is not going to change in the near future (Linus becomes angry as soon as "dwarf" is pronounced somewhere).

This patch modifies the amd64 code generator to implement C-style frames, with the frame pointer in %rbp. This behavior becomes the default, unless a new option '-fomit-frame-pointer' is provided.

The implementation is based on decreasing by one the number of available integer registers, and moving %rbp two positions later (after %r12 and %13), as the last available integer register, available only if -fomit-frame-pointer is provided. %rbp is also saved on entry, and restored on leave from every function, and replaced by %r13 as an argument register.

On the "standard" compcert short benchmark (rm -f lib/Integers.vo; time make lib/Integers.vo), the performance penalty is less than 5%.
TagsNo tags attached.
Attached Filespatch file icon omit-frame-pointer-4.00.0.patch [^] (19,996 bytes) 2012-08-06 14:17 [Show Content]
? file icon omit-frame-pointer-4.00.0.patch.v2 [^] (23,045 bytes) 2012-08-13 10:09
patch file icon ocaml-4.00.1-with-fp.patch [^] (136,506 bytes) 2012-10-24 17:22 [Show Content]
? file icon [^] (435 bytes) 2012-12-17 23:30 [Show Content]
diff file icon fix-destroyed_at_c_call.diff [^] (724 bytes) 2013-01-14 16:59 [Show Content]

- Relationships
related to 0005723closedmeyer add environment variables to control compiler options 

-  Notes
ygrek (reporter)
2012-08-06 14:57
edited on: 2012-08-06 15:20

> and this is not going to change in the near future

There have been recent attempts at dwarf unwinding kernel-recorded stack in userspace, e.g. [^] (previous attempt at!topic/linux.kernel/MDZE766vq7E [^])

lefessan (developer)
2012-08-06 16:28

Yes, the "dwarf unwind in user space" patch has been around for a few months, but it is only able to unwind the portion of the stack saved by the kernel in the perf event. It is expensive (you need to copy all the data instead of just the return addresses, for every event) and you only extract the top of the stack, which might not be enough in some cases. Anyway, it might take a long time before having this available in a stable linux kernel in a standard distribution.
xleroy (administrator)
2012-08-06 17:57

I'm very much interested in good performance monitoring tools for Linux. Any pointer to some documentation I could read to learn about this new "perf" tool? And why it needs the kernel to walk the call stack?

However: if Linus gets angry when "dwarf" is mentioned, I get angry when "frame pointer" is mentioned. The %ebp/%rbp business in the x86 code generation conventions is a terrible waste, not just of time, but even more of stack space. (It can *double* the stack usage of some recursive functions, for heaven's sake!)

So, there is no way that frame pointers can be maintained by default. I'm open to discussions, however, on whether to maintain it if some options are given (-g ? some new option?).
lefessan (developer)
2012-08-06 19:22

There is a wiki with some information on how to use the tools: [^]

but most of the documentation is in the tools themselves ("perf help report" à la git), or in the code (in the tools/perf/ subdirectory of recent linux kernel sources).

For the space taken by %rbp on the stack, it is true that it can double the frame size, but at the same time, having a program doing a stack overflow because of that looks more like a bug from the program than from the code generator.

Anyway, use of frame pointers could be triggered as in C by a -fno-omit-frame-pointer option instead of being the default, and of course by default with -g (but use of -g should not be mandatory, because it also might disable other optimizations). Maybe having two distinguished modes for profiling, like "-p-hard" and "-p-soft" for hardware and software profiling modes...
lefessan (developer)
2012-08-06 19:26

On a side note: it would be useful to be able to use an environment variable, like OCAML_PROFILING=1, to trigger a mode for generating code suitable for profiling, without modifying the build system files. And another option for debugging code...
frisch (developer)
2012-08-06 22:49

Will it be possible to link modules compile with and without "no-omit-frame-pointer"? (My understanding is that this is purely local to the function and does not change the calling convention, right?)


What about a more general "OCAMLFLAGS"? The compilers would simply parse this variable and synthesize extra arguments at the beginning of the command-line.
(And it would then be nice if all switches came with a negated form.)
lefessan (developer)
2012-08-06 23:27

> Will it be possible to link modules compile with and without "no-omit-frame-pointer"? (My understanding is that this is purely local to the function and does not change the calling convention, right?)

Yes, but since my patch modifies the calling convention, all the files have to be compiled with the patch applied, you can then link together files compiled with and without the option.


Does such an option really make sense ? How should an option in OCAMLFLAGS be interpreted by different tools like ocamlc, ocamlopt, ocamldep, etc. ? It looks better for me to have specific environment variables for specific purposes (ocamldep knows that it can disregard OCAML_PROFILING, but it doesn't know what to do with a '-g' present in OCAMLFLAGS).
frisch (developer)
2012-08-07 17:48

> How should an option in OCAMLFLAGS be interpreted by different tools like ocamlc, ocamlopt, ocamldep, etc. ?

Indeed, we should probably have OCAMLCFLAGS, OCAMLOPTFLAGS, OCAMLDEPFLAGS, etc.

If we create an environment variable per command-line option, we will get a lot of them, and the user needs to know all their names. Off the top of my head, I can imagine situations where it could be useful to support the following options:

lefessan (developer)
2012-08-08 13:07

The discussion is diverging. I created another bug report here of the environment variable feature: [^]
lefessan (developer)
2012-08-13 10:09

I uploaded a new version (some offsets in caml_raise_exn were wrong, so backtraces were not correctly printed)
meurer (developer)
2012-08-13 11:18

I'm currently working on adding iOS support to the ARM backend, and the iOS ABI uses r7 as frame pointer. Right now I'm just using r7 as regular caller-save register, making debugging OCaml code on iPhone/iPad impossible. But if there is some interest to support frame pointers (and debugging/performance monitoring techniques using frame pointers), we may want to add some generic support stuff and add only minimal backend specific glue for amd64/i386 (Linux) and arm (iOS).
Boris Yakobowski (reporter)
2012-09-23 21:16

Profiling programs with perf is far superior to using gprof, both in terms of precision and ease of use. I would be very glad to see this patch integrated in the trunk.
doligez (administrator)
2012-11-15 15:29
edited on: 2012-11-26 15:10

I agree with Xavier that this should not be the compiler's default. Also, I intensely dislike "-fno-omit-frame-pointers" as a command-line switch. Since this is a profiling option, it shoud be "-p<something>".

lefessan (developer)
2012-11-19 11:16

Actually, since the calling conventions are changed by that patch, it should probably not be a compilation option, but a ./configure option, so that you don't mix libraries linked with and without profiling.

I would propose "./configure -asm-keep-frame-pointer". The option would only work under amd64, for now.
dim (developer)
2012-12-17 23:30

I observed a bug with this path. I attached a file which reproduces it. The problem is that the compiler chooses to store the upper-bound of the for-loop into the r10 register but doesn't save it on the stack, and this register maybe be destroyed by functions called inside the loop, in this case by caml_modify (not itself, but by another function it may end up calling: realloc).
dim (developer)
2013-01-14 17:00

Fixed: register numbers were not updated in Proc.destroyed_at_c_call. I attached a patch.
dim (developer)
2013-04-15 16:29

What is the status of this feature? We have been using this patch for development everyday for a few month now at Jane Street and it would be great if it could be integrated upstream. Having it as a configure option is fine.
lefessan (developer)
2013-04-15 22:06

There was an agreement in the core team, that it could be integrated with the switch in the configure to activate it. So, it is just a matter of finding the time to merge it in trunk.
lefessan (developer)
2013-06-11 13:59

Integrated in trunk for version 4.01
nicolas_boulay (reporter)
2013-08-13 11:49

Does you need 2 compiler to have the profiling option and an other one to do binary release ?

- Issue History
Date Modified Username Field Change
2012-08-06 14:17 lefessan New Issue
2012-08-06 14:17 lefessan File Added: omit-frame-pointer-4.00.0.patch
2012-08-06 14:21 lefessan Description Updated View Revisions
2012-08-06 14:57 ygrek Note Added: 0007900
2012-08-06 15:20 ygrek Note Edited: 0007900 View Revisions
2012-08-06 16:28 lefessan Note Added: 0007902
2012-08-06 17:57 xleroy Note Added: 0007906
2012-08-06 17:57 xleroy Status new => feedback
2012-08-06 19:22 lefessan Note Added: 0007919
2012-08-06 19:22 lefessan Status feedback => new
2012-08-06 19:26 lefessan Note Added: 0007920
2012-08-06 22:49 frisch Note Added: 0007922
2012-08-06 23:27 lefessan Note Added: 0007924
2012-08-07 17:48 frisch Note Added: 0007931
2012-08-08 13:03 lefessan Relationship added related to 0005723
2012-08-08 13:07 lefessan Note Added: 0007932
2012-08-13 10:09 lefessan File Added: omit-frame-pointer-4.00.0.patch.v2
2012-08-13 10:09 lefessan Note Added: 0007938
2012-08-13 11:18 meurer Note Added: 0007939
2012-09-23 21:16 Boris Yakobowski Note Added: 0008147
2012-10-24 17:22 lefessan File Added: ocaml-4.00.1-with-fp.patch
2012-11-15 15:29 doligez Note Added: 0008516
2012-11-15 15:29 doligez Status new => feedback
2012-11-19 11:16 lefessan Note Added: 0008533
2012-11-19 11:16 lefessan Status feedback => new
2012-11-19 11:17 lefessan Assigned To => lefessan
2012-11-19 11:17 lefessan Status new => acknowledged
2012-11-26 15:10 doligez Note Edited: 0008516 View Revisions
2012-12-17 23:30 dim Note Added: 0008624
2012-12-17 23:30 dim File Added:
2013-01-14 16:59 dim File Added: fix-destroyed_at_c_call.diff
2013-01-14 17:00 dim Note Added: 0008750
2013-04-15 16:29 dim Note Added: 0009111
2013-04-15 22:06 lefessan Note Added: 0009123
2013-04-15 22:06 lefessan Status acknowledged => confirmed
2013-06-11 13:59 lefessan Note Added: 0009456
2013-06-11 13:59 lefessan Status confirmed => resolved
2013-06-11 13:59 lefessan Fixed in Version => 4.01.0+dev
2013-06-11 13:59 lefessan Resolution open => fixed
2013-08-13 11:49 nicolas_boulay Note Added: 0010169
2015-12-11 19:23 xleroy Status resolved => closed
2017-02-23 16:35 doligez Category OCaml backend (code generation) => Back end (clambda to assembly)
2017-02-23 16:44 doligez Category Back end (clambda to assembly) => back end (clambda to assembly)

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker