Slow allocations with 64bit code?
Date: 2007-04-22 (16:12)
From: Markus Mottl <markus.mottl@g...>
Subject: Re: [Caml-list] Slow allocations with 64bit code?
On 4/22/07, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
> > I wonder whether others have already noticed that allocations may
> > surprisingly be slower on 64bit platforms than on 32bit ones.
>
> As already mentioned, on 64-bit platforms almost all Caml data
> representations are twice as large as on 32-bit platforms (exceptions:
> strings, float arrays), so the processor has twice as much data to
> move through its memory subsystem.

Interesting, I was obviously under the wrong assumption that a 64-bit machine would scale appropriately when accessing 64-bit words in memory. Of course, I'm aware that cache effects also play a role, but the minor heap should easily fit into the cache of any modern machine anyway, and it's not as if this experiment were eating memory.

> However, you certainly don't get a slowdown by a factor of 2, for two
> reasons: 1- the processor doesn't spend all its time doing memory
> accesses, there are some computations here and there; 2- cache lines
> are much bigger than 32 bits, meaning that accessing 64 bits at a
> given address is much cheaper than accessing two 32-bit
> quantities at two random addresses (spatial locality).
>
> Moreover, x86 in 64-bit mode is much more compiler-friendly than in
> 32-bit mode: twice as many registers, a sensible floating-point model
> at last. So, OCaml in 64-bit mode generates better code than in
> 32-bit mode.
>
> All in all, your 10% slowdown seems reasonable and in line with what
> others reported using C benchmarks.

This seems reasonable. It just surprised me that in some of my tests a 64-bit machine could be slower at handling even "large" Int64 values than in 32-bit mode, where it always has to perform two memory accesses and possibly some additional computation steps.

> Be careful with timings: I've seen simple changes in code placement
> (e.g. introducing or removing dead code) cause performance differences
> in excess of 20%.
> It's an unfortunate fact of today's processors that
> their performance is very hard to predict.

This surely also requires some caution when interpreting mini-benchmarks.

> ocamlopt compiles module initialization code in the so-called
> "compact" model, where code size is reduced by not open-coding some
> operations such as heap allocation, but instead going through
> auxiliary functions like "caml_alloc2". This makes sense since
> initialization code is usually large but not performance-critical.
> I recommend you put performance-critical code in functions, not in the
> initialization code.

Thanks, this is a very important bit of information that I wasn't aware of! I used to run mini-benchmarks from initialization code in most cases, which is obviously a bad idea...

Regards,
Markus

--
Markus Mottl http://www.ocaml.info markus.mottl@gmail.com
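A minimal sketch of the advice about initialization code. It assumes OCaml 4.03 or later, which is needed for `Sys.opaque_identity`. The allocation-heavy loop lives in an ordinary function, and only the call to it runs as module initialization code. The names `alloc_pairs` and `time` are hypothetical and chosen only for illustration. No timings are claimed; as noted in the thread, results can shift noticeably with code placement alone.

```ocaml
(* Hypothetical micro-benchmark: the hot loop is a normal function, so
   ocamlopt compiles it normally rather than in the "compact" model
   used for module initialization code. *)

(* Allocate [n] short-lived pairs on the minor heap.
   Sys.opaque_identity keeps the optimizer from eliminating the tuple
   allocation; the accumulator sums the fields so the work is used. *)
let alloc_pairs n =
  let acc = ref 0 in
  for i = 1 to n do
    let p = Sys.opaque_identity (i, i + 1) in
    acc := !acc + fst p + snd p
  done;
  !acc

(* Run [f] and report elapsed processor time (Sys.time) together with
   the platform word size (32 or 64 bits). *)
let time label f =
  let t0 = Sys.time () in
  let r = f () in
  Printf.printf "%s: %.3fs (word size %d bits)\n"
    label (Sys.time () -. t0) Sys.word_size;
  r

(* Only this call sits in initialization code. *)
let () = ignore (time "alloc_pairs" (fun () -> alloc_pairs 10_000_000))
```

On a 64-bit build, each pair here is three words (header plus two fields), so it takes twice as many bytes as on a 32-bit build. The instruction count of the loop is the same either way.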