[Caml-list] DFT in OCaml vs. C
Date: 2003-03-27 (21:21)
From: Issac Trotts <ijtrotts@u...>
Subject: Re: OCaml performance (was: Re: [Caml-list] DFT in OCaml vs. C)
David Monniaux wrote:

>>The "Pentium 4 SSE2" column is an experimental code generator for the
>>Pentium 4 that uses SSE2 instructions and registers for floating-point
>>computations.  (Before you ask: no, it's not publicly available,
>In this case, to get meaningful comparison results, you should use
>gcc -march=pentium4 -msse2 or icc -march=pentium4
>>and it delivers about 2/3 of the performances of C, even on the Pentium.
>Let me tell you about our experience here. We are developing a large
>program consisting of
>- a large part of Caml code handling complex data structures
>- a smaller C library handling certain numerical matrix computations that
>  are triggered by the Caml code
>- some C (+ assembler) libraries dealing with system-dependent issues.
>I profiled the code using OProfile (http://oprofile.sourceforge.net), for
>expenses in clock cycles and cache faults. Earlier attempts were made with
>It turned out that we spent a significant amount of time in:
>- The Caml polymorphic compare function (15% time + some cache faults)
>  Part of the problem seems to lie with the fact that the same function is
>  called when comparing strings, int64's and other types, thus the
>  processor has to do lots of tests and jumps just to get at the correct
>  comparison function.
>  Wouldn't it be reasonable to define String.compare and Int64.compare to
>  call monomorphic functions?
>- The garbage collector (15% time + lots of cache faults)
>  There's little we can do about it. Changing the size of the minor heap,
>  adjusting it to optimize the use of L2 cache seems to gain 2.30% of the
>  total running time.
>  Curiously, using the compactor seems to slow things slightly.
>  Would it be possible to optimize the GC cache-wise? For instance, have
>  it ask the processor to "prefetch" data.
>- 17% in a particular matrix function written in C. There's little we can
>  do except trying to optimize it carefully and compiling it with the best
>  C compiler around.
>- The rest of the time is spent within the Caml code.
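The point about polymorphic compare in the list above can be illustrated with a short sketch. Current OCaml standard libraries do provide Int64.compare and String.compare; how much time they save depends on the compiler version and the data, so measure before relying on it. The name compare_int and the sample values are made up for illustration.

```ocaml
(* Polymorphic comparison: the generic routine inspects the values at
   run time to find out how to compare them. *)
let poly_sorted = List.sort compare [3L; 1L; 2L]

(* Type-specific comparisons from the standard library. *)
let mono_sorted = List.sort Int64.compare [3L; 1L; 2L]
let names = List.sort String.compare ["gc"; "dft"; "compare"]

(* A type annotation tells the compiler the arguments are ints, which
   gives it the chance to avoid the generic routine.
   compare_int is a hypothetical helper name. *)
let compare_int (a : int) (b : int) = compare a b
```

Both sorts return the same result; only the comparison function that gets called differs.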
>Now this was a bit surprising to us, because we thought we spent far more
>time in the numerical computations.
>Now back to the original question about DFTs. In your real-life
>application, will DFT computations make a major part of the clock cycles
>spent by the program?
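The minor heap adjustment mentioned in the GC item above can be tried from inside a program with the standard Gc module. This is a minimal sketch: the helper name set_minor_heap_words and the value 262_144 are assumptions for illustration, not the setting used in the quoted project, and the best size depends on the cache of the machine at hand.

```ocaml
(* Set the minor heap size. Gc.minor_heap_size is measured in words,
   not bytes. set_minor_heap_words is a hypothetical helper name. *)
let set_minor_heap_words words =
  Gc.set { (Gc.get ()) with Gc.minor_heap_size = words }

let () =
  (* Hypothetical value: 256k words. Tune against your own profile. *)
  set_minor_heap_words 262_144;
  Printf.printf "minor heap: %d words\n" (Gc.get ()).Gc.minor_heap_size
```

The same parameter can be set without recompiling through the OCAMLRUNPARAM environment variable (the s option), which makes it easier to compare several sizes under a profiler.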
There's a small image processing experiment I want to do that will compute
lots of DFTs on small sub-images and will probably spend most of its clock
cycles doing the transforms.

- Issac
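
For reference, a direct DFT of the kind discussed in this thread might look like the sketch below. It is not the code from the thread: it is a plain O(n^2) one-dimensional transform using the standard Complex module (Float.pi needs OCaml 4.07 or later), and the function name dft is chosen for illustration. A small sub-image could be handled by applying it to each row and then to each column.

```ocaml
(* Direct O(n^2) DFT of a complex signal: a sketch, not an FFT.
   Computes X.(k) = sum over j of x.(j) * exp(-2*pi*i*j*k/n). *)
let dft (x : Complex.t array) : Complex.t array =
  let n = Array.length x in
  Array.init n (fun k ->
    let acc = ref Complex.zero in
    for j = 0 to n - 1 do
      let angle =
        -2.0 *. Float.pi *. float_of_int (j * k) /. float_of_int n in
      (* Complex.polar 1.0 angle is the unit-modulus twiddle factor. *)
      acc := Complex.add !acc (Complex.mul x.(j) (Complex.polar 1.0 angle))
    done;
    !acc)
```

For a constant input, all the energy should land in the first output bin and the remaining bins should be zero up to rounding error.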

To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners