OCamlJIT2 vs. OCamlJIT
From: Benedikt Meurer <benedikt.meurer@g...>
Subject: Re: [Caml-list] OCamlJIT2 vs. OCamlJIT

On Nov 30, 2010, at 23:06 , Jon Harrop wrote:

> Because benchmarks like my HLVM ones have proven that LLVM can generate
> *much* faster code than ocamlopt does.
> 
> For example, Fibonacci function over floats in HLVM with optimization passes
> disabled and compilation time included in the measurement:
> 
> # let rec fib (x: float) : float =
>    if x < 1.5 then x else fib(x - 1.0) + fib(x - 2.0);;
> # fib 40.0;;
> - : `Float = 1.02334e+08
> Live: 0
> 2.48074s total; 0s suspend; 0s mark; 0s sweep
> 
> And ocamlopt without compilation time:
> 
> $ cat >fib.ml
> let rec fib x = if x < 1.5 then x else fib(x -. 1.0) +. fib(x -. 2.0);;
> fib 40.0;;
> $ ocamlopt fib.ml -o fib
> $ time ./fib
> 
> real    0m7.811s
> user    0m7.808s
> sys     0m0.000s
> 
> Note that HLVM's *REPL* is over 3x faster than ocamlopt-compiled OCaml.

This has nothing to do with LLVM, but is simply due to the fact that your code does not box the float parameters/results. The following piece of C code is most probably even faster than your code, so what?

double fib(double x) { if (x < 1.5) return x; else return fib(x-1) + fib(x-2); }

So this is about data representation, not code generation (using LLVM with boxed floats would result in the same or lower performance); HLVM ignores complex stuff like polymorphism, modules, etc. (at least last time I checked), which makes code generation almost trivial. The literature includes various solutions to implement stuff like ML polymorphism: tagged integers/boxed floats/objects is just one solution, not necessarily the best; but examples that simply ignore the complex stuff, and therefore deliver better performance, don't really help to make progress.

A possible solution to improve ocamlopt's performance in this case would be to compile the recursive fib calls in a way that the parameter/result is passed in a floating point register (not boxed), and provide a wrapper for calls from outside, which unboxes the parameter, invokes the actual function code, and boxes the result. This should be doable on the Ulambda/Cmm level, so it is actually quite portable and completely independent of the low-level code generation (which is where LLVM comes into play). That way ocamlopt code will be as fast as the C code for this example.
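
To make the wrapper idea concrete, here is a minimal sketch in C rather than at the Ulambda/Cmm level. Everything in it is an assumption made for illustration: boxed_float stands in for a heap-allocated OCaml float, box() uses malloc where the real runtime would allocate on the GC heap, and the names fib_unboxed and fib_entry are hypothetical. It is not what ocamlopt emits today.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a boxed OCaml float: a heap cell holding a double. */
typedef struct { double value; } boxed_float;

/* Allocate a box; malloc is only an approximation of a GC allocation. */
boxed_float *box(double d) {
    boxed_float *b = malloc(sizeof *b);
    assert(b != NULL);
    b->value = d;
    return b;
}

/* Worker: the recursive calls pass and return plain doubles,
   so nothing is allocated inside the recursion. */
double fib_unboxed(double x) {
    if (x < 1.5) return x;
    return fib_unboxed(x - 1.0) + fib_unboxed(x - 2.0);
}

/* Wrapper for calls from outside: unbox the parameter,
   run the worker, box the result exactly once. */
boxed_float *fib_entry(const boxed_float *x) {
    return box(fib_unboxed(x->value));
}
```

An outside caller would go through fib_entry(box(40.0)) and pay for one unboxing and one boxing in total, while the recursion itself stays in floating point registers.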

> I have microbenchmarks where LLVM generates code over 10x faster than
> ocamlopt (specifically, floating point code on x86) and larger numerical
> programs that also wipe the floor with OCaml.

ocamlopt's x86 floating point backend is easy to beat, as demonstrated in the original post. Even a simple byte-code JIT engine (which still boxes floating point results, etc.) is able to beat it.

Your benchmarks do not prove that LLVM generates faster code than ocamlopt. Instead you proved that OCaml's floating point representation comes at a cost for number crunching applications (which is obvious). Use the same data representation with LLVM (or C) and you'll notice that the performance is the same as, or most likely worse than, what ocamlopt delivers.
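
For a rough idea of what "the same data representation" means in C, the sketch below boxes every argument and result of fib. It rests on assumptions: boxed_float and box() are hypothetical stand-ins for OCaml's boxed floats, and malloc/free only approximate the cost model of a GC allocator. No timings are claimed here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a boxed OCaml float: a heap cell holding a double. */
typedef struct { double value; } boxed_float;

/* Allocate a box; malloc is only an approximation of a GC allocation. */
boxed_float *box(double d) {
    boxed_float *b = malloc(sizeof *b);
    assert(b != NULL);
    b->value = d;
    return b;
}

/* fib with boxed arguments and results: every call allocates its
   arguments and its result, and loads each operand from memory. */
boxed_float *fib_boxed(const boxed_float *x) {
    if (x->value < 1.5) return box(x->value);
    boxed_float *a1 = box(x->value - 1.0);
    boxed_float *a2 = box(x->value - 2.0);
    boxed_float *r1 = fib_boxed(a1);
    boxed_float *r2 = fib_boxed(a2);
    boxed_float *sum = box(r1->value + r2->value);
    /* Manual frees replace the garbage collector in this sketch. */
    free(a1);
    free(a2);
    free(r1);
    free(r2);
    return sum;
}
```

Compiled with any backend, this version performs several allocations per call, which is the cost being discussed; the quality of the instruction selection hardly matters next to that.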

> LLVM is also much better documented than ocamlopt's internals.

Definitely, but LLVM replaces just the low-level stuff of ocamlopt (most probably starting at the Cmm level), so a lot of (undocumented) ocamlopt internals remain.

> Cheers,
> Jon.

regards,
Benedikt