Date: 2007-04-22 (10:23)
From: Xavier Leroy <Xavier.Leroy@i...>
Subject: Re: [Caml-list] Slow allocations with 64bit code?

> I wonder whether others have already noticed that allocations may
> surprisingly be slower on 64bit platforms than on 32bit ones.

As already mentioned, on 64-bit platforms almost all Caml data
representations are twice as large as on 32-bit platforms (exceptions:
strings, float arrays), so the processor has twice as much data to move
through its memory subsystem.

However, you certainly don't get a slowdown by a factor of 2, for two
reasons:
1- the processor doesn't spend all its time doing memory accesses,
   there are some computations here and there;
2- cache lines are much bigger than 32 bits, meaning that accessing
   64 bits at a given address is much cheaper than accessing two 32-bit
   quantities at two random addresses (spatial locality).

Moreover, x86 in 64-bit mode is much more compiler-friendly than in
32-bit mode: twice as many registers, a sensible floating-point model
at last. So, OCaml in 64-bit mode generates better code than in 32-bit
mode.

All in all, your 10% slowdown seems reasonable and in line with what
others reported using C benchmarks.

> This is only a difference of about 10%, but I have seen more complex
> cases where there are timing differences in excess of 50%, which is
> already pretty substantial.

Be careful with timings: I've seen simple changes in code placement
(e.g. introducing or removing dead code) cause performance differences
in excess of 20%. It's an unfortunate fact of today's processors that
their performance is very hard to predict.

> Looking at the assembly, there is really no difference in the loop
> other than the use of the quad word instructions, which should not
> take longer on the exact same platform (i.e. same CPU-frequency). But
> there is a suspicious call to "caml_alloc2", which might cause these
> differences. Can it be that there are alignment problems or similar
> in the run time?
ocamlopt compiles module initialization code in the so-called "compact"
model, where code size is reduced by not open-coding some operations
such as heap allocation, but instead going through auxiliary functions
like "caml_alloc2". This makes sense since initialization code is
usually large but not performance-critical. I recommend you put
performance-critical code in functions, not in the initialization code.

- Xavier Leroy
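
A minimal sketch of the recommendation above, assuming a reasonably recent ocamlopt (Sys.opaque_identity needs OCaml 4.03 or later). The names alloc_pairs and time and the iteration count are hypothetical, chosen only for illustration. The allocating loop lives inside a function rather than directly in the module's top-level initialization code, so the timed work is compiled as ordinary function code. Whether a given compiler version still treats initialization code differently is not something this sketch establishes.

```ocaml
(* Hypothetical micro-benchmark: names and iteration count are illustrative. *)

(* Allocate n two-field heap blocks (pairs) inside a function, so that the
   loop is not part of the module initialization code. Returns the first
   component of the last pair allocated (0 when n <= 0). *)
let alloc_pairs n =
  let last = ref (0, 0) in
  for i = 1 to n do
    (* Sys.opaque_identity discourages the optimizer from removing
       the allocation of the pair. *)
    last := Sys.opaque_identity (i, i)
  done;
  fst !last

(* Run a thunk and return its result together with the processor time
   it consumed, in seconds. *)
let time f =
  let t0 = Sys.time () in
  let r = f () in
  (r, Sys.time () -. t0)

(* The initialization code only calls the function and prints the result. *)
let () =
  let n = 1_000_000 in
  let (r, dt) = time (fun () -> alloc_pairs n) in
  Printf.printf "word size: %d bits, last index: %d, time: %.3fs\n"
    Sys.word_size r dt
```

Building this with both a 32-bit and a 64-bit ocamlopt and comparing the printed times is one way to repeat the measurement. As noted above, differences of 10-20% can come from code placement alone, so single runs should be read with caution.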