Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] DFT in OCaml vs. C
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Issac Trotts <ijtrotts@u...>
Subject: Re: [Caml-list] DFT in OCaml vs. C
Xavier Leroy wrote:

>>Here's a numerical mini-benchmark comparing C to OCaml
>>on a simple implementation of the Discrete Fourier Transform:
>>[...]
>>So the C version was about five times as fast.
>>I'd be interested if anyone on this list knows of a way
>>to make it perform as well as the C version (without using the FFT.)
>>    
>>
>
>It can be done, but not on a Pentium 3.  Here are my timings:
>
>                       Pentium 4        Pentium 4 SSE2     Alpha 21264
>                       (2 GHz)          (2 GHz)            (500 MHz)
>
>C                       20                20                 36
>OCaml (your code)      113                40                 52
>OCaml (variant 1)       90                26                 40
>OCaml (variant 2)       72                38                100
>
>Variants 1 and 2 differ on the complex multiply step:
>
>Your code:
>                let a2=c*. !a -. s*. !b 
>                and b2=c*. !b +. s*. !a in
>                a := a2; 
>                b := b2;
>Variant 1:
>                let x = s *. !a in
>                a := c*. !a -. s*. !b;
>                b := c*. !b +. x
>Variant 2:
>                let olda = !a and oldb = !b in
>                a := c *. olda -. s *. oldb;
>                b := c *. oldb +. s *. olda
>
>
>The "Pentium 4 SSE2" column is an experimental code generator for the
>Pentium 4 that uses SSE2 instructions and registers for floating-point
>computations.  (Before you ask: no, it's not publically available,
>but will be the basis for the x86_64 code generator as soon as the
>hardware becomes available.)
>
>As you can see above, variant 1 achieves almost the performance of C
>on platforms that have a regular register-based FP arithmetic unit.
>
>However, the x86 floating-point stack (what OCaml uses for
>compatibility with Pentium 3 and earlier processors) is notoriously
>cranky and hard to generate efficient code for.  gcc manages to
>exploit instruction-level parallelism between the "re" and "im"
>computations via amazing feats (fxch instructions, etc), but the
>ocamlopt x86 code generator just generates very sequential code...
>
>So, unless you have an Alpha at hand, you'd better consider FFT.
>There's an FFT implementation that I use as a benchmark here:
>
>        http://camlcvs.inria.fr/cgi-bin/cvsweb.cgi/ocaml/test/fft.ml
>
>and it delivers about 2/3 of the performances of C, even on the Pentium.
>
>- Xavier Leroy
>
Thanks for a very informative and helpful message.

Issac





-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners