[Benchmark] NBody
Date: -- (:)
From: Xavier Leroy <Xavier.Leroy@i...>
Subject: Re: [Caml-list] NBody (one more question)
> When I compile the C code with -O0 (with gcc -o nbody.gcc -Wall
> --fast-math nbody.c -lm), I get a time of 1.513s which is comparable
> to OCaml (1.607s).  But as soon as I turn on -O options (as with gcc
> -o nbody.gcc -Wall -O1 --fast-math nbody.c -lm), the running time
> drops down to 0.871s (58% of the -O0 time).  Can somebody tell me what is the
> optimization that has such an effect and whether it could be applied
> to OCaml ?

First, make sure the Caml code is compiled with bounds checking off
(ocamlopt -unsafe), otherwise the comparison isn't quite fair.  But
even with -unsafe, you are correct that the Caml code is significantly
slower than the gcc -O1 code.  This is especially surprising because
the assembly code generated by ocamlopt and gcc looks very similar.
So, I don't think you can pin the speed difference on a
particular optimization.

My current guess would be alignment issues:
- data alignment: float arrays are 4-byte aligned in OCaml, 8-byte aligned in C,
  so if you're unlucky you can end up with slower unaligned accesses
  on every Caml float.
- code alignment: it could be that OCaml doesn't perform sufficient
  alignment on function entry points and loop points.  The proper
  alignments for various implementations of the x86 architecture are
  a mystery to me.
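
To make the data alignment point concrete, here is a rough C sketch (not taken from the benchmark or from ocamlopt). The helper names is_aligned and count_misaligned_doubles are made up for illustration, and the code assumes 8-byte doubles and a flat address space where converting a pointer to uintptr_t yields its byte address. It shows why a base address that is only 4-byte aligned can leave every element of a float array misaligned, not just some of them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of `alignment`
   bytes, 0 otherwise.  `alignment` must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: counts how many of the n consecutive doubles
   starting at `base` do not sit on an 8-byte boundary.  Because the
   stride between elements is sizeof(double) (assumed to be 8 here),
   the answer is either 0 or n: the alignment of the base address
   decides the alignment of every element. */
static size_t count_misaligned_doubles(const void *base, size_t n)
{
    const unsigned char *p = base;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_aligned(p + i * sizeof(double), 8))
            count++;
    }
    return count;
}
```

With an 8-byte aligned buffer buf, count_misaligned_doubles(buf, 4) gives 0, while count_misaligned_doubles(buf + 4, 4) gives 4: the second address is still 4-byte aligned, yet all four doubles straddle an 8-byte boundary.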

Again, these are just wild guesses.  To understand what is going on
inside the chip, one would need to use performance monitoring
counters.  Unfortunately, I never felt motivated enough to shell out
the $$$ for Intel's VTune analyzer...

- Xavier Leroy