gc overhead
| From: | Warren Harris <warrensomebody@g...> |
| Subject: | Re: [Caml-list] gc overhead |
Peter,

Thanks, this is excellent info. I've been using both gprof and shark and understand the tradeoffs. I really was looking for a way to just provide a simple live "gc overhead" number that we could graph along with a bunch of other server health stats for our zenoss monitors. Looks like I'd need to hack my runtime a bit to get this though.

Warren

On Mar 2, 2010, at 4:55 PM, Peter Hawkins wrote:

> Hi...
>
> On Tue, Mar 2, 2010 at 3:08 PM, Warren Harris <warrensomebody@gmail.com> wrote:
>>
>> Peter - gprof with ocaml works quite well:
>> http://caml.inria.fr/pub/docs/manual-ocaml/manual031.html
>
> I'm fully aware of gprof and ocaml's support of profiling.
>
> OCaml's profiling support works by adding calls to the _mcount library
> function at the entry point to every compiled function, which takes
> approximately 10 instructions on x86 (pushes and pops to save
> registers, and a call instruction). The _mcount function records
> function call counts, and is also responsible for producing the call
> graph. Separately, the profile library samples the program counter at
> some frequency, which lets us work out in which functions the program
> is spending its time.
>
> Using OCaml's profiling support has three problems:
> 1) programs compiled with profiling are slower, and
> 2) the profiling instrumentation itself distorts the resulting
> profile, and
> 3) the call graph accounting is inaccurate.
>
> Let's discuss each of these in turn:
>
> Problem (1) is simply that your program has extra overhead from all of
> those _mcount calls, which occur on every function invocation. You
> can't turn them off, and you can't make them happen less frequently.
> It's an all-or-nothing proposition. It would be unusual to include
> profiling instrumentation in a production system.
>
> Problem (2) is a little more subtle. Recall that the profiling
> instrumentation adds ~10 instructions to the start of each function,
> regardless of its size. For a large function, this may be a negligible
> overhead. For a small function, say one that was only 5 or 10
> instructions in size to begin with, that is a substantial overhead.
> Since we determine how much time is spent in each function by sampling
> the program counter, small and frequently called functions will appear
> to take relatively longer than larger functions in the resulting
> profile. Small functions are common in OCaml code so we should see an
> appreciable amount of distortion.
>
> Problem (3) is a criticism of the _mcount mechanism in general. For
> each function f(), the profiler knows (a) how long we spent executing
> f() in total, and (b) how many times each of f()'s callers invoked
> f(). We do not know how much time f() spent executing on behalf of any
> given caller. If we assume that all of f()'s invocations took
> approximately the same amount of time, then we can use the caller
> counts to approximate the time spent executing f() on behalf of each
> caller. However, the assumption that f() always takes approximately
> the same amount of time is not necessarily a good one. I think it's an
> especially bad assumption in a functional program.
>
> These problems are avoided by using a sampling profiler like oprofile
> or shark, which samples an _uninstrumented_ binary at a particular
> frequency. Because the binary is unmodified, we can turn profiling on
> and off on a running system, avoiding point (1); furthermore we can
> adjust the sampling rate so profiling overhead is low enough to be
> tolerable. Since there is no instrumentation added to the program, the
> resulting profile does not suffer from the distortion of point (2).
> Some profilers (e.g. shark on Mac OS X) can deal with point (3) as
> well --- all we need to do is record a complete stack trace at
> sampling time.
>
> My point was that oprofile or one of its cousins (e.g. shark) is
> probably adequate for your needs. You can set the sampling rate low
> enough that your service can run more or less as normal. To determine
> GC overhead, you simply need to look at the total amount of time spent
> in the various GC functions of the runtime.
>
> Peter
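
The calculation in Peter's last paragraph (sum the time spent in the runtime's GC functions and divide by the total) can be sketched as below. This is an illustrative sketch only, not something from the thread: it assumes the sampling profiler's flat profile has already been exported as a mapping from symbol name to sample count, and the GC symbol names and prefixes are assumptions that should be checked against the symbols that actually appear in your own profile. The sample counts are made up.

```python
# Symbol names/prefixes assumed to belong to the OCaml runtime's GC.
# These are assumptions for illustration; verify them against your profile.
GC_SYMBOLS = {"mark_slice", "sweep_slice"}
GC_PREFIXES = (
    "caml_oldify",
    "caml_empty_minor_heap",
    "caml_minor_collection",
    "caml_major_collection",
)


def is_gc_symbol(name):
    """Return True if the symbol looks like a GC function of the runtime."""
    return name in GC_SYMBOLS or name.startswith(GC_PREFIXES)


def gc_overhead(samples):
    """Fraction of all samples that landed in GC functions.

    `samples` maps symbol name -> number of program-counter samples.
    Returns 0.0 for an empty profile.
    """
    total = sum(samples.values())
    if total == 0:
        return 0.0
    gc = sum(count for sym, count in samples.items() if is_gc_symbol(sym))
    return gc / total


# Made-up flat profile, for illustration only.
example = {
    "camlServer__handle_request_123": 600,
    "caml_oldify_one": 150,
    "mark_slice": 100,
    "sweep_slice": 50,
    "memcpy": 100,
}

print(f"gc overhead: {gc_overhead(example):.1%}")  # prints "gc overhead: 30.0%"
```

Whether allocation functions should count toward "GC overhead" is a judgment call; this sketch leaves them out. The resulting ratio is a single number per profiling window, which is the kind of value that could be graphed alongside other health stats.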