Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] DFT in OCaml vs. C
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Xavier Leroy <xavier.leroy@i...>
Subject: Re: [Caml-list] DFT in OCaml vs. C
> Here's a numerical mini-benchmark comparing C to OCaml
> on a simple implementation of the Discrete Fourier Transform:
> [...]
> So the C version was about five times as fast.
> I'd be interested if anyone on this list knows of a way
> to make it perform as well as the C version (without using the FFT.)

It can be done, but not on a Pentium 3.  Here are my timings:

                       Pentium 4        Pentium 4 SSE2     Alpha 21264
                       (2 GHz)          (2 GHz)            (500 MHz)

C                       20                20                 36
OCaml (your code)      113                40                 52
OCaml (variant 1)       90                26                 40
OCaml (variant 2)       72                38                100

Variants 1 and 2 differ on the complex multiply step:

Your code:
                let a2=c*. !a -. s*. !b 
                and b2=c*. !b +. s*. !a in
                a := a2; 
                b := b2;
Variant 1:
                let x = s *. !a in
                a := c*. !a -. s*. !b;
                b := c*. !b +. x
Variant 2:
                let olda = !a and oldb = !b in
                a := c *. olda -. s *. oldb;
                b := c *. oldb +. s *. olda

The "Pentium 4 SSE2" column is an experimental code generator for the
Pentium 4 that uses SSE2 instructions and registers for floating-point
computations.  (Before you ask: no, it's not publically available,
but will be the basis for the x86_64 code generator as soon as the
hardware becomes available.)

As you can see above, variant 1 achieves almost the performance of C
on platforms that have a regular register-based FP arithmetic unit.

However, the x86 floating-point stack (what OCaml uses for
compatibility with Pentium 3 and earlier processors) is notoriously
cranky and hard to generate efficient code for.  gcc manages to
exploit instruction-level parallelism between the "re" and "im"
computations via amazing feats (fxch instructions, etc), but the
ocamlopt x86 code generator just generates very sequential code...

So, unless you have an Alpha at hand, you'd better consider FFT.
There's an FFT implementation that I use as a benchmark here:

        http://camlcvs.inria.fr/cgi-bin/cvsweb.cgi/ocaml/test/fft.ml

and it delivers about 2/3 of the performances of C, even on the Pentium.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners