Date: Thu, 16 Jul 1998 20:42:29 -0400
From: Aleksey Nogin <firstname.lastname@example.org>
Subject: OCAML native code profiler for x86+gcc
I wrote a small patch that makes it possible to produce profiled code
with ocamlopt. Of course, even without any patch you can always pass the
"-ccopt -p" option to ocamlopt and get some profiling information, but
with "-ccopt -p" you only get the "flat" profile, while with my
patch you also get the full call graph, including:
- how many times each function was called
- which functions were called from some particular function and how
often
- how much time was spent in each function
- how much time was spent inside each function, including the
function itself, its children, grandchildren, great-grandchildren, etc.,
and how this time is distributed among the child functions
- which functions called some particular function
I wrote my patch to mimic the behavior of "gcc -pg": the patched ocamlopt
works in exactly the same way as the usual ocamlopt unless it is given
the "-p" option. When given the "-p" option, the patched ocamlopt:
- inserts calls to the counter function, mcount, at the beginning of
each function
- passes the "-pg" option to gcc
- links the code against the libasmprun library, which is the profiled
version of the usual libasmrun: it is compiled with the "-pg" option and
it also contains i386.p.S (a version of i386.S profiled by hand)
I tested my patch only on a RedHat Linux 5.1 system, but it will probably
work fine on any other x86 + gcc system, and I believe it should not
be too hard to port it to other C compilers (you will probably
need to change "-pg" to "-p" and "mcount" to something else) or even to
some other hardware architecture.
Of course there is NO WARRANTY - it works for me, but you are using it
at your own risk.
To use the profiler:
0) Get my patch from
1) Recompile your code with the "-p" option (you do not have to recompile
.mli files, only .ml files).
2) Run your code. It should generate the "gmon.out" file.
3) Run "gprof <program executable>".
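On the command line, steps 1 to 3 could look roughly like the sketch below. It assumes that the patched ocamlopt (one that accepts "-p") is the first ocamlopt on your PATH and that your program is a single .ml file. The function name profile_run, the example file name prog.ml and the report file name are made up for illustration.

```shell
# Sketch only: assumes a patched ocamlopt that accepts "-p" is on PATH.
# profile_run and the report file name are hypothetical.
profile_run() {
    src=$1                  # source file, e.g. prog.ml
    exe=${src%.ml}          # strip the suffix: prog.ml -> prog

    # Step 1: recompile the .ml file with "-p"
    ocamlopt -p -o "$exe" "$src" || return 1

    # Step 2: run the program; it should write gmon.out
    # into the current directory
    "./$exe" || return 1

    # Step 3: let gprof combine the executable with gmon.out
    gprof "$exe" gmon.out > "$exe.profile.txt"
}
```

For example, "profile_run prog.ml" would leave the gprof report in prog.profile.txt.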
Notes:
- The profiled binary will run much slower, and the timing
information will not be completely accurate.
- You will not get much information for functions that were compiled
without the "-p" option (e.g. the standard library).
- If you want a very accurate call graph (at the expense of less
accurate timing information), use "-p -compact -inline 0".
- If some function is called more than 2^32 times, the profiling
information becomes wrong. This happens especially quickly when you
compile your code with the "-compact" option and everybody starts
calling the caml_alloc functions (caml_alloc, caml_alloc2, caml_alloc3
and caml_alloc4).
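To illustrate the 2^32 limit, here is a small arithmetic sketch. It assumes that the call counters are unsigned 32-bit integers that silently wrap around (the text above only says that the information becomes wrong) and that your shell does 64-bit arithmetic. The call rate is a made-up figure, not a measurement.

```shell
# Assumption: a call counter is an unsigned 32-bit value that wraps around.
limit=4294967296                    # 2^32
real_calls=$(( limit + 5 ))         # calls that actually happened
recorded=$(( real_calls % limit ))  # what a wrapping counter would hold
echo "real calls: $real_calls, recorded: $recorded"

# Hypothetical rate of 10 million calls per second to one function
rate=10000000
seconds_to_wrap=$(( limit / rate ))
echo "seconds until the counter wraps at that rate: $seconds_to_wrap"
```

Under these assumptions a heavily used allocation function would overflow its counter after roughly seven minutes of run time.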
Home Page: http://www.cs.cornell.edu/nogin/
E-Mail: email@example.com (office), firstname.lastname@example.org (home)
Office: Upson 5162, tel: 1-607-255-7421