From: Warren Harris <warrensomebody@g...>
Subject: Re: [Caml-list] gc overhead
Peter,

Thanks, this is excellent info. I've been using both gprof and shark  
and understand the tradeoffs. I really was looking for a way to just  
provide a simple live "gc overhead" number that we could graph along  
with a bunch of other server health stats for our zenoss monitors.  
Looks like I'd need to hack my runtime a bit to get this though.
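Since the stock OCaml runtime does not report time spent in GC, one stopgap short of hacking the runtime is to poll the counters that `Gc.quick_stat` already exposes and graph those as a proxy for GC activity. A minimal sketch, assuming a periodic stats thread does the polling; treating these counters as an "overhead" signal is only an approximation, not a time measurement:

```ocaml
(* Sketch: derive per-interval GC activity numbers from Gc.quick_stat.
   The runtime does not expose time spent in GC, so collections and
   allocated words per interval serve only as a proxy. *)

let prev = ref (Gc.quick_stat ())

(* Return (minor collections, major collections, minor words allocated)
   since the previous call; suitable for periodic polling. *)
let gc_delta () =
  let s = Gc.quick_stat () in
  let p = !prev in
  prev := s;
  (s.Gc.minor_collections - p.Gc.minor_collections,
   s.Gc.major_collections - p.Gc.major_collections,
   s.Gc.minor_words -. p.Gc.minor_words)
```

Each monitoring tick, call `gc_delta ()` and emit the three numbers alongside the other server health stats.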

Warren

On Mar 2, 2010, at 4:55 PM, Peter Hawkins wrote:

> Hi...
>
> On Tue, Mar 2, 2010 at 3:08 PM, Warren Harris <warrensomebody@gmail.com> wrote:
>>
>> Peter - gprof with ocaml works quite well:
>> http://caml.inria.fr/pub/docs/manual-ocaml/manual031.html
>
> I'm fully aware of gprof and ocaml's support of profiling.
>
> OCaml's profiling support works by adding calls to the _mcount library
> function at the entry point to every compiled function, which takes
> approximately 10 instructions on x86 (pushes and pops to save
> registers, and a call instruction). The _mcount function records
> function call counts, and is also responsible for producing the call
> graph. Separately, the profile library samples the program counter at
> some frequency, which lets us work out in which functions the program
> is spending its time.
>
> Using OCaml's profiling support has three problems:
> 1) programs compiled with profiling are slower, and
> 2) the profiling instrumentation itself distorts the resulting  
> profile, and
> 3) the call graph accounting is inaccurate.
>
> Let's discuss each of these in turn:
>
> Problem (1) is simply that your program has extra overhead from all of
> those _mcount calls, which occur on every function invocation. You
> can't turn them off, and you can't make them happen less frequently.
> It's an all-or-nothing proposition. It would be unusual to include
> profiling instrumentation in a production system.
>
> Problem (2) is a little more subtle. Recall that the profiling
> instrumentation adds ~10 instructions to the start of each function,
> regardless of its size. For a large function, this may be a negligible
> overhead. For a small function, say one that is only 5 or 10
> instructions long to begin with, it is a substantial overhead.
> Since we determine how much time is spent in each function by sampling
> the program counter, small and frequently called functions will appear
> to take relatively longer than larger functions in the resulting
> profile. Small functions are common in OCaml code so we should see an
> appreciable amount of distortion.
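The distortion is easy to put rough numbers on. An illustrative calculation, taking the ~10-instruction prologue figure from the paragraph above (real per-call costs vary by CPU and calling convention):

```ocaml
(* Illustrative: relative cost of a fixed ~10-instruction profiling
   prologue as a fraction of a function's own instruction count. *)
let prologue_insns = 10

let relative_overhead body_insns =
  float_of_int prologue_insns /. float_of_int body_insns

let () =
  (* a 5-instruction accessor: the prologue triples its length *)
  Printf.printf "5-insn function: +%.0f%%\n" (100. *. relative_overhead 5);
  (* a 1000-instruction function: the prologue is ~1% *)
  Printf.printf "1000-insn function: +%.0f%%\n" (100. *. relative_overhead 1000)
```

Under PC sampling, the 5-instruction function appears three times its true size, while the large function is essentially unaffected.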
>
> Problem (3) is a criticism of the _mcount mechanism in general. For
> each function f(), the profiler knows (a) how long we spent executing
> f() in total, and (b) how many times each of f()'s callers invoked
> f(). We do not know how much time f() spent executing on behalf of any
> given caller. If we assume that all of f()'s invocations took
> approximately the same amount of time, then we can use the caller
> counts to approximate the time spent executing f() on behalf of each
> caller. However, the assumption that f() always takes approximately
> the same amount of time is not necessarily a good one. I think it's an
> especially bad assumption in a functional program.
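A concrete case where that assumption breaks down (the two callers here are hypothetical, for illustration): `List.length` is linear in the length of its argument, so call sites passing lists of very different sizes spend very different amounts of time inside it, yet call-count-based accounting splits its total time evenly per call.

```ocaml
(* List.length is O(n). Both callers invoke it exactly once, but the
   second does vastly more work; accounting that divides List.length's
   total time by call counts would charge each caller the same amount. *)
let short_caller () = List.length [1; 2; 3]

let long_caller () =
  List.length (List.init 1_000_000 (fun i -> i))
```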
>
> These problems are avoided by using a sampling profiler like oprofile
> or shark, which samples an _uninstrumented_ binary at  a particular
> frequency. Because the binary is unmodified, we can turn profiling on
> and off on a running system, avoiding point (1); furthermore we can
> adjust the sampling rate so profiling overhead is low enough to be
> tolerable. Since there is no instrumentation added to the program, the
> resulting profile does not suffer from the distortion of point (2).
> Some profilers (e.g. shark on Mac OS X) can deal with point (3) as
> well --- all we need to do is record a complete stack trace at
> sampling time.
>
> My point was that oprofile or one of its cousins (e.g. shark) is
> probably adequate for your needs. You can set the sampling rate low
> enough that your service can run more or less as normal. To determine
> GC overhead, you simply need to look at the total amount of time spent
> in the various GC functions of the runtime.
>
> Peter