From: Markus Mottl <markus.mottl@g...>
Subject: Re: [Caml-list] Slow allocations with 64bit code?
On 4/22/07, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
> > I wonder whether others have already noticed that allocations may
> > surprisingly be slower on 64bit platforms than on 32bit ones.
>
> As already mentioned, on 64-bit platforms almost all Caml data
> representations are twice as large as on 32-bit platforms (exceptions:
> strings, float arrays), so the processor has twice as much data to
> move through its memory subsystem.

Interesting. I was obviously under the wrong assumption that a 64-bit
machine would scale accordingly when accessing 64-bit words in memory.
Of course, I'm aware that cache effects also play a role, but the
minor heap should easily fit into the cache of any modern machine, and
this experiment is hardly memory-hungry anyway.
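To make the word-size point concrete, here is a hypothetical little
snippet (not my actual test) that prints how many bytes a small block
occupies: the block has the same number of words on both platforms,
but each word is 8 bytes instead of 4 on a 64-bit machine.

  (* Hypothetical illustration: Sys.word_size is 64 or 32; the block
     layout (header + fields) is the same in words, so the same value
     takes twice as many bytes on a 64-bit platform. *)
  let () =
    let v = (1, 2, 3) in
    let words = Obj.size (Obj.repr v) + 1 in  (* fields + header word *)
    Printf.printf "word size: %d bits, block: %d words = %d bytes\n"
      Sys.word_size words (words * Sys.word_size / 8)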

> However, you certainly don't get a slowdown by a factor of 2, for two
> reasons: 1- the processor doesn't spend all its time doing memory
> accesses, there are some computations here and there; 2- cache lines
> are much bigger than 32 bits, meaning that accessing 64 bits at a
> given address is much cheaper than accessing two 32-bit
> quantities at two random addresses (spatial locality).
>
> Moreover, x86 in 64-bit mode is much more compiler-friendly than in
> 32-bit mode: twice as many registers, a sensible floating-point model
> at last.  So, OCaml in 64-bit mode generates better code than in
> 32-bit mode.
>
> All in all, your 10% slowdown seems reasonable and in line with what
> others reported using C benchmarks.

This seems reasonable.  It just surprised me that in some of my tests
a 64-bit machine could be slower handling even "large" Int64 values
than in 32-bit mode, where it always has to perform two memory
accesses and possibly some additional computation steps.
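For reference, the kind of loop I had in mind looks roughly like this
(a hypothetical sketch, not the exact benchmark): every Int64.add
allocates a fresh boxed Int64, so the test measures boxing and
allocation at least as much as the arithmetic itself.

  (* Hypothetical Int64-heavy loop: Int64 values are boxed on both
     32-bit and 64-bit platforms, so each addition allocates a new
     custom block on the minor heap. *)
  let sum_int64 n =
    let rec go i acc =
      if i > n then acc
      else go (i + 1) (Int64.add acc (Int64.of_int i))
    in
    go 1 0L

  let () = Printf.printf "%Ld\n" (sum_int64 1_000_000)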

> Be careful with timings: I've seen simple changes in code placement
> (e.g. introducing or removing dead code) cause performance differences
> in excess of 20%.  It's an unfortunate fact of today's processors that
> their performance is very hard to predict.

That certainly calls for caution when interpreting mini-benchmarks, too.

> ocamlopt compiles module initialization code in the so-called
> "compact" model, where code size is reduced by not open-coding some
> operations such as heap allocation, but instead going through
> auxiliary functions like "caml_alloc2".  This makes sense since
> initialization code is usually large but not performance-critical.
> I recommend you put performance-critical code in functions, not in the
> initialization code.

Thanks, this is a very important bit of information that I wasn't
aware of!  I used to run mini-benchmarks from initialization code in
most cases, which is obviously a bad idea...
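To be sure I understood the advice, here is a minimal hypothetical
sketch of the difference (the function name and iteration count are
made up):

  (* Before: the hot loop sits directly in module initialization code,
     which ocamlopt compiles in the "compact" model, going through
     auxiliary allocation functions such as caml_alloc2.
  let () =
    for i = 1 to 10_000_000 do ignore (i, i) done
  *)

  (* After: the hot loop is wrapped in a function, where allocations
     are open-coded, and the initialization code merely calls it. *)
  let bench () =
    for i = 1 to 10_000_000 do ignore (i, i) done

  let () = bench ()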

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com