Browse thread
Benchmarking different dispatch types
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Nathaniel Gray <n8gray@g...> |
| Subject: | Re: [Caml-list] Benchmarking different dispatch types |
Jacques, thanks for the very useful reply! A few more comments below... On 1/18/07, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> wrote: > There are a few problems in your methodology. > One is that you are running your test only once inside a function. > So what you are measuring ends up being (at least) the cost a calling > a closure + the real cost of your test. Usually the wrapping function > should itself be a loop. > let call_f () = for i = 1 to 1000 do ignore (f 1 + 1) done Right. I realized this myself and made an attempt to mitigate the problem, but probably not nearly enough. > Another problem is that with such micro-benchmarks, all kinds of > optimizations may skew results, either by the compiler or the CPU. > You disabled one with -inline 0, but there is noway to discard others > if you don't know what triggers them. > > For instance, when calling a method, normally you would have to search for > it in the method list stored inside the object. This is done by a > binary search, with logarithmic cost in the number of methods in the > list. Since having to do it for every method call would badly impact > performance, each call point caches the offset in the list for the > last object called. If the last object was from the same class, then > no search is done. There are only a few extra memory reads, to verify > that indeed this is the right offset. Ah, now this is the juicy stuff! > So if want to measure the cost in the worst situation, you have to > alternate calls (at the same point) between objects from different > classes, for which the offset is different. > In practice, hopefully this worst pattern doesn't occur too often, so > it is still safe to assume that method calls > > You should also look at the generated assembler (obtained with -S) to > verify that no strange optimization happens. I did glance at it but haven't had time to look in any detail. > > My own measurements on a Pentium M and PPC (using a slightly different > benchmark, using loops and several different methods and functions) > give (comparing to a direct function call): > Pentium M PPC G4 > Closure: 1.2x 3.2x > Method: 2.9x 5.6x > Unoptimized method: 6.9x 13x > > I'm a bit surprised by the low cost of a closure, particularly on > pentium M, but this may be related to some CPU optimization. > Note that with inlining you get a more than 10x speedup. > This suggests that even in the best case method calls are actually > about twice as expensive as closure calls, and 5 times in a > particularly bad case. Perhaps you could share your benchmark code? :-) Thanks, -n8 -- >>>-- Nathaniel Gray -- Caltech Computer Science ------> >>>-- Mojave Project -- http://mojave.cs.caltech.edu -->