Version française
Home     About     Download     Resources     Contact us    
Browse thread
Benchmarking different dispatch types
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Nathaniel Gray <n8gray@g...>
Subject: Re: [Caml-list] Benchmarking different dispatch types
Jacques, thanks for the very useful reply!  A few more comments below...

On 1/18/07, Jacques Garrigue <garrigue@math.nagoya-u.ac.jp> wrote:
> There are a few problems in your methodology.
> One is that you are running your test only once inside a function.
> So what you are measuring ends up being (at least) the cost a calling
> a closure + the real cost of your test. Usually the wrapping function
> should itself be a loop.
>   let call_f () = for i = 1 to 1000 do ignore (f 1 + 1) done

Right.  I realized this myself and made an attempt to mitigate the
problem, but probably not nearly enough.

> Another problem is that with such micro-benchmarks, all kinds of
> optimizations may skew results, either by the compiler or the CPU.
> You disabled one with -inline 0, but there is noway to discard others
> if you don't know what triggers them.
>
> For instance, when calling a method, normally you would have to search for
> it in the method list stored inside the object. This is done by a
> binary search, with logarithmic cost in the number of methods in the
> list. Since having to do it for every method call would badly impact
> performance, each call point caches the offset in the list for the
> last object called. If the last object was from the same class, then
> no search is done. There are only a few extra memory reads, to verify
> that indeed this is the right offset.

Ah, now this is the juicy stuff!

> So if want to measure the cost in the worst situation, you have to
> alternate calls (at the same point) between objects from different
> classes, for which the offset is different.
> In practice, hopefully this worst pattern doesn't occur too often, so
> it is still safe to assume that method calls
>
> You should also look at the generated assembler (obtained with -S) to
> verify that no strange optimization happens.

I did glance at it but haven't had time to look in any detail.

>
> My own measurements on a Pentium M and PPC (using a slightly different
> benchmark, using loops and several different methods and functions)
> give (comparing to a direct function call):
>                     Pentium M   PPC G4
> Closure:            1.2x        3.2x
> Method:             2.9x        5.6x
> Unoptimized method: 6.9x        13x
>
> I'm a bit surprised by the low cost of a closure, particularly on
> pentium M, but this may be related to some CPU optimization.
> Note that with inlining you get a more than 10x speedup.
> This suggests that even in the best case method calls are actually
> about twice as expensive as closure calls, and 5 times in a
> particularly bad case.

Perhaps you could share your benchmark code?  :-)

Thanks,
-n8

-- 
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->