[Caml-list] novice puzzled by speed tests
From: Xavier Leroy <xavier.leroy@i...>
Subject: Re: [Caml-list] novice puzzled by speed tests
> Toying around with 3.07, I found that ocamlopt.opt -unsafe (on Mandrake
> 9.1, Pentium 4, 2.4 GHz) actually produces slower code than ocamlopt.opt.
> On my box, the corresponding C program (gcc -O3) is slightly slower than
> the ocamlopt.opt compiled O'Caml program, but about 25-30% faster than the
> -unsafe one:
> Of course it's good that range checking increases the speed of programs,
> but, being a long-time C user, I'm a little bit puzzled by miracles like
> this. I suspected that the sense of the -unsafe flag was inverted, but it
> isn't: the -unsafe program dies with SEGV when I deliberately introduce a
> range overflow, the safe one gets an exception.

Welcome to the wonderful world of modern processors.  It's not
uncommon to observe "absurd" speed differences of up to 20%.  By
"absurd" I mean for instance adding or removing dead code (never
executed) and observing a slowdown, or executing more code and
observing a speedup.

As far as I can guess, this is due to two processor features:

- Lots of instruction-level parallelism is available.  Thus, if your
main code doesn't use all of the computational units, adding extra code
(such as array bound checks) that can execute in parallel doesn't
reduce execution speed.

- Performance is very sensitive to code placement.  Things like code
cache conflicts, (mis-) alignment of branch targets, and oddities in the
instruction decoding logic can cause insertion *or deletion* of a few
instructions to have significant impact on execution speed.
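One way to probe the first point on your own machine is to put a bounds-checked loop and an unchecked one in the same program and time them side by side. This is a minimal sketch, not a rigorous benchmark: the file name (say, bench.ml) and the helpers sum_checked, sum_unchecked and time_it are hypothetical. Array.unsafe_get is the standard-library accessor that skips the range test, which is roughly what -unsafe does to every a.(i) in a file. Depending on the processor, the two timings may differ in either direction, or hardly at all.

```ocaml
(* Hypothetical micro-benchmark: the same summation loop with and
   without array bounds checks. Compiling a file with [ocamlopt -unsafe]
   turns every [a.(i)] into the unchecked form; here both variants live
   in one program so they can be timed together. *)

(* Checked access: each [a.(i)] tests [0 <= i < Array.length a]
   and raises [Invalid_argument] if the test fails. *)
let sum_checked (a : int array) : int =
  let s = ref 0 in
  for i = 0 to Array.length a - 1 do
    s := !s + a.(i)
  done;
  !s

(* Unchecked access: no range test at all. An out-of-range index here
   is undefined behaviour (typically a segfault). *)
let sum_unchecked (a : int array) : int =
  let s = ref 0 in
  for i = 0 to Array.length a - 1 do
    s := !s + Array.unsafe_get a i
  done;
  !s

(* Call [f a] [reps] times; return elapsed processor time (from
   [Sys.time]) and the last result, so the work is not discarded. *)
let time_it (f : int array -> int) (a : int array) (reps : int) : float * int =
  let t0 = Sys.time () in
  let r = ref 0 in
  for _i = 1 to reps do
    r := f a
  done;
  (Sys.time () -. t0, !r)

let () =
  let a = Array.init 1_000_000 (fun i -> i land 0xff) in
  let (t_checked, r1) = time_it sum_checked a 50 in
  let (t_unchecked, r2) = time_it sum_unchecked a 50 in
  (* Both variants must compute the same sum. *)
  assert (r1 = r2);
  Printf.printf "checked:   %.3fs\nunchecked: %.3fs\n" t_checked t_unchecked
```

Built with plain ocamlopt, this compares the two access styles within one executable; building the same file again with ocamlopt -unsafe also changes code size and placement everywhere, which is exactly the second effect described above.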

These are just wild guesses.  The bottom line is that processors have
become so complex that explaining observed performance (let alone
predicting it) has become nearly impossible, at least for small
performance variations (say, less than a factor of 1.5).
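Before reading anything into a difference of a few percent, it can help to measure how much the very same program varies from run to run. A minimal sketch using only the standard library: measure, median and spread are hypothetical helper names, Sys.time reports processor time, and Sys.opaque_identity (available since OCaml 4.03) keeps the computed result from being optimized away.

```ocaml
(* Processor time, in seconds, taken by one call of [f ()]. *)
let measure (f : unit -> 'a) : float =
  let t0 = Sys.time () in
  ignore (Sys.opaque_identity (f ()));
  Sys.time () -. t0

(* Median of a non-empty float array; sorts a copy, input unchanged. *)
let median (xs : float array) : float =
  let s = Array.copy xs in
  Array.sort compare s;
  let n = Array.length s in
  if n mod 2 = 1 then s.(n / 2)
  else (s.(n / 2 - 1) +. s.(n / 2)) /. 2.0

(* Run [f] [runs] times; return (min, median, max) of the timings. *)
let spread (runs : int) (f : unit -> 'a) : float * float * float =
  let ts = Array.init runs (fun _ -> measure f) in
  let lo = Array.fold_left min infinity ts in
  let hi = Array.fold_left max neg_infinity ts in
  (lo, median ts, hi)

let () =
  (* Arbitrary placeholder workload; substitute the code under test. *)
  let work () =
    let s = ref 0 in
    for i = 1 to 10_000_000 do
      s := !s + (i land 7)
    done;
    !s
  in
  let (lo, med, hi) = spread 11 work in
  Printf.printf "min %.4fs  median %.4fs  max %.4fs  (range %.1f%% of median)\n"
    lo med hi (100.0 *. (hi -. lo) /. med)
```

If the gap between two variants is no larger than the min-to-max range of either one measured alone, it is most likely noise rather than a real effect of the change.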

(This makes compiler writers very unhappy, because they used to make a
living by cranking out 5% speed improvements, which are now lost in
the overall noise :-)

If you have access to a good performance analysis tool, such as
Intel's VTune, you could run it on your three executables and see if
some rational explanation comes out of VTune's figures.  But I
wouldn't bet on it.

- Xavier Leroy
