Browse thread
[Caml-list] DFT in OCaml vs. C
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Issac Trotts <ijtrotts@u...> |
| Subject: | Re: [Caml-list] DFT in OCaml vs. C |
Xavier Leroy wrote: >>Here's a numerical mini-benchmark comparing C to OCaml >>on a simple implementation of the Discrete Fourier Transform: >>[...] >>So the C version was about five times as fast. >>I'd be interested if anyone on this list knows of a way >>to make it perform as well as the C version (without using the FFT.) >> >> > >It can be done, but not on a Pentium 3. Here are my timings: > > Pentium 4 Pentium 4 SSE2 Alpha 21264 > (2 GHz) (2 GHz) (500 MHz) > >C 20 20 36 >OCaml (your code) 113 40 52 >OCaml (variant 1) 90 26 40 >OCaml (variant 2) 72 38 100 > >Variants 1 and 2 differ on the complex multiply step: > >Your code: > let a2=c*. !a -. s*. !b > and b2=c*. !b +. s*. !a in > a := a2; > b := b2; >Variant 1: > let x = s *. !a in > a := c*. !a -. s*. !b; > b := c*. !b +. x >Variant 2: > let olda = !a and oldb = !b in > a := c *. olda -. s *. oldb; > b := c *. oldb +. s *. olda > > >The "Pentium 4 SSE2" column is an experimental code generator for the >Pentium 4 that uses SSE2 instructions and registers for floating-point >computations. (Before you ask: no, it's not publically available, >but will be the basis for the x86_64 code generator as soon as the >hardware becomes available.) > >As you can see above, variant 1 achieves almost the performance of C >on platforms that have a regular register-based FP arithmetic unit. > >However, the x86 floating-point stack (what OCaml uses for >compatibility with Pentium 3 and earlier processors) is notoriously >cranky and hard to generate efficient code for. gcc manages to >exploit instruction-level parallelism between the "re" and "im" >computations via amazing feats (fxch instructions, etc), but the >ocamlopt x86 code generator just generates very sequential code... > >So, unless you have an Alpha at hand, you'd better consider FFT. >There's an FFT implementation that I use as a benchmark here: > > http://camlcvs.inria.fr/cgi-bin/cvsweb.cgi/ocaml/test/fft.ml > >and it delivers about 2/3 of the performances of C, even on the Pentium. > >- Xavier Leroy > Thanks for a very informative and helpful message. Issac ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners