[Benchmark] NBody
Date: 2005-02-25 (17:57)
From: Xavier Leroy <Xavier.Leroy@i...>
Subject: Re: [Caml-list] NBody (one more question)
> When I compile the C code with -O0 (with gcc -o nbody.gcc -Wall
> --fast-math nbody.c -lm), I get a time of 1.513s, which is comparable
> to OCaml (1.607s). But as soon as I turn on -O options (as with gcc
> -o nbody.gcc -Wall -O1 --fast-math nbody.c -lm), the running time
> drops down to 0.871s (58% of the -O0 time). Can somebody tell me
> which optimization has such an effect and whether it could be applied
> to OCaml?

First, make sure the Caml code is compiled with bounds checking off
(ocamlopt -unsafe); otherwise the comparison isn't quite fair.

But even with -unsafe, you are correct that the Caml code is
significantly slower than the gcc -O1 code. This is especially
surprising because the assembly code generated by ocamlopt and gcc
looks very similar. So I don't think you can pin the speed difference
on a particular optimization. My current guess would be alignment
issues:

- data alignment: float arrays are 4-aligned in OCaml and 8-aligned in
  C, so if you're unlucky you can end up with slower unaligned accesses
  on every Caml float.

- code alignment: it could be that OCaml doesn't align function entry
  points and loop heads sufficiently. The proper alignments for the
  various implementations of the x86 architecture are a mystery to me.

Again, these are just wild guesses. To understand what is going on
inside the chip, one would need to use performance monitoring
counters. Unfortunately, I never felt motivated enough to shell out
the $$$ for Intel's VTune analyzer...

- Xavier Leroy
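The data-alignment guess can be made concrete with a small C sketch. This is an illustration added here, not code from the thread: the names `is_aligned`, `store`, `shifted_payload` and `roundtrip_unaligned` are hypothetical, and the 4-byte offset is an assumption standing in for a float array whose payload is only word-aligned (4 bytes) on 32-bit x86. It shows only the address arithmetic; it does not measure the cost of an unaligned access.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the address p is a multiple of
   `align` bytes, 0 otherwise. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0); /* avoid a modulo by zero */
    return ((uintptr_t)p % align) == 0;
}

/* An 8-aligned backing store: the alignment the message attributes to
   C double arrays, forced here with alignas so the sketch is portable. */
static alignas(8) unsigned char store[4 * sizeof(double) + 4];

/* Returns a pointer 4 bytes past an 8-aligned address: 4-aligned but not
   8-aligned, like a float payload that is only word-aligned. */
static unsigned char *shifted_payload(void)
{
    return store + 4;
}

/* Stores x at the shifted (misaligned) address and reads it back.
   memcpy is used because dereferencing a misaligned double* is
   undefined behaviour in C. */
static double roundtrip_unaligned(double x)
{
    unsigned char *p = shifted_payload();
    double y;
    memcpy(p, &x, sizeof x);
    memcpy(&y, p, sizeof y);
    return y;
}
```

For example, `is_aligned(store, 8)` yields 1, while `is_aligned(shifted_payload(), 8)` yields 0 and `is_aligned(shifted_payload(), 4)` yields 1. Whether an access at such an address is actually slower depends on the processor, which is why only performance monitoring counters could settle the question.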