[Caml-list] DFT in OCaml vs. C
From: Issac Trotts <ijtrotts@u...>
Subject: Re: OCaml performance (was: Re: [Caml-list] DFT in OCaml vs. C)
David Monniaux wrote:

>>The "Pentium 4 SSE2" column is an experimental code generator for the
>>Pentium 4 that uses SSE2 instructions and registers for floating-point
>>computations.  (Before you ask: no, it's not publicly available,
>
>In this case, to get meaningful comparison results, you should use
>gcc -march=pentium4 -msse2 or icc -march=pentium4
>
>>and it delivers about 2/3 of the performance of C, even on the Pentium.
>
>Let me tell you about our experience here. We are developing a large
>program consisting of
>- a large part of Caml code handling complex data structures
>- a smaller C library handling certain numerical matrix computations that
>  are triggered by the Caml code
>- some C (+ assembler) libraries dealing with system-dependent issues.
>
>I profiled the code using OProfile (http://oprofile.sourceforge.net),
>measuring the cost in clock cycles and cache faults. Earlier attempts
>were made with gprof.
>
>It turned out that we spent a significant amount of time in:
>
>- The Caml polymorphic compare function (15% time + some cache faults)
>
>  Part of the problem seems to lie with the fact that the same function is
>  called when comparing strings, int64's and other types, thus the
>  processor has to do lots of tests and jumps just to get at the correct
>  comparison function.
>
>  Wouldn't it be reasonable to define String.compare and Int64.compare to
>  call monomorphic functions?
>
>- The garbage collector (15% time + lots of cache faults)
>
>  There's little we can do about it. Resizing the minor heap so that it
>  makes better use of the L2 cache seems to gain 2.30% of the total
>  running time.
>
>  Curiously, using the compactor seems to slow things down slightly.
>
>  Would it be possible to optimize the GC cache-wise? For instance, have
>  it ask the processor to "prefetch" data.
>
>- 17% in a particular matrix function written in C. There's little we can
>  do except try to optimize it carefully and compile it with the best
>  C compiler around.
>
>- The rest of the time is spent within the Caml code.
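
To illustrate the point about polymorphic compare above, here is a minimal sketch, assuming a recent OCaml standard library in which String.compare and Int64.compare are available. The function names (poly_max, int_max, string_max, int64_max) are hypothetical and chosen only for this example. The idea is that a type annotation or a type-specific compare lets the call avoid the generic comparison routine, which has to inspect the values at run time.

```ocaml
(* Generic version: goes through the polymorphic comparison routine,
   which must examine the runtime representation of its arguments. *)
let poly_max (a : 'a) (b : 'a) : 'a =
  if compare a b >= 0 then a else b

(* Annotated to int: the compiler can use a direct integer comparison. *)
let int_max (a : int) (b : int) : int =
  if a >= b then a else b

(* Type-specific comparison functions from the standard library. *)
let string_max (a : string) (b : string) : string =
  if String.compare a b >= 0 then a else b

let int64_max (a : int64) (b : int64) : int64 =
  if Int64.compare a b >= 0 then a else b
```

Whether this removes a measurable share of the 15% reported above would have to be checked with the profiler on the real program.
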
>
>Now this was a bit surprising to us, because we thought we spent far more
>time in the numerical computations.
>
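
For the minor heap experiment mentioned in the GC item, a sketch of how the size can be changed from inside the program with the standard Gc module. The helper name set_minor_heap_words and the value 32768 are assumptions made for illustration, not the settings used in the program described above.

```ocaml
(* Hypothetical helper: set the minor heap size, expressed in words. *)
let set_minor_heap_words (words : int) : unit =
  Gc.set { (Gc.get ()) with Gc.minor_heap_size = words }

let () =
  (* 32768 words is an illustrative value, not a recommendation;
     the best size depends on the cache and on the allocation pattern. *)
  set_minor_heap_words 32768;
  let stat = Gc.quick_stat () in
  Printf.printf "minor heap size (words): %d\n"
    (Gc.get ()).Gc.minor_heap_size;
  Printf.printf "minor collections so far: %d\n" stat.Gc.minor_collections
```

The same parameter can also be set without recompiling through the OCAMLRUNPARAM environment variable (for example s=32k), which makes it easier to time several sizes.
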
>
>Now back to the original question about DFTs. In your real-life
>application, will DFT computations make up a major part of the clock
>cycles spent by the program?
>
There's a small image processing experiment I want to do that will compute
lots of DFTs on small sub-images and will probably spend most of its clock
cycles doing the transforms.
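
As a rough picture of the kernel in question, here is a naive one-dimensional DFT written with the standard Complex module. It is only a reference sketch, assuming small inputs: it runs in O(n^2) time, and the function name dft is hypothetical. A real experiment would more likely use an FFT and apply it along the rows and columns of each sub-image.

```ocaml
(* Naive O(n^2) discrete Fourier transform of a complex array:
   y.(k) = sum over j of x.(j) * exp (-2 pi i j k / n). *)
let dft (x : Complex.t array) : Complex.t array =
  let n = Array.length x in
  let pi = 4.0 *. atan 1.0 in
  Array.init n (fun k ->
    let acc = ref Complex.zero in
    for j = 0 to n - 1 do
      let angle = -2.0 *. pi *. float_of_int (j * k) /. float_of_int n in
      (* Complex.polar 1.0 angle is the unit complex number exp (i * angle). *)
      acc := Complex.add !acc (Complex.mul x.(j) (Complex.polar 1.0 angle))
    done;
    !acc)
```

Each output element allocates boxed complex values in the inner loop, so a loop like this is also a reasonable place to look at the GC cost discussed above.
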

- Issac



