Value types (Was: [Caml-list] ocamlopt LLVM support)
Date: 2010-12-12 (17:14)
From: Jon Harrop <jon@f...>
Subject: RE: Value types (Was: [Caml-list] ocamlopt LLVM support)
Török Edwin wrote:
> Problem #1: Int64.rem n 2 -> another idiv instruction
> A C compiler would optimize this to an 'and' instruction.
> Change that to 'Int64.logand n 1L = 0L'.

Yes. LLVM did that for me.

> Problem #2: Int64.div n 2 -> idiv instruction.
> A C compiler would optimize this to a right shift. Changing that to
> 'Int64.shift_right n 1' speeds up the code.

Yes. LLVM also did that for me. In fact, I have been bitten before by ocamlopt not optimizing div and mod by a constant in real OCaml code. The same problem turns up in hash table implementations, where you want to reduce the hash modulo the length of the spine.
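To make the two rewrites concrete, here is a minimal sketch in OCaml. The function names and the power-of-two table length are assumptions made for illustration; none of them appear in the original benchmark. Note one caveat: the parity test is equivalent for every int64, but Int64.div truncates toward zero while Int64.shift_right rounds toward negative infinity, so the shift is only a drop-in replacement for non-negative inputs (as in the Collatz code quoted below).

```ocaml
(* Parity: remainder versus bit mask. Both agree for every int64,
   negatives included, because only the comparison with 0L is used. *)
let is_even_rem n = Int64.rem n 2L = 0L
let is_even_and n = Int64.logand n 1L = 0L

(* Halving: division versus arithmetic right shift. *)
let half_div n = Int64.div n 2L
let half_shift n = Int64.shift_right n 1

(* Hypothetical bucket index for a table whose spine length is a
   power of two: the modulo reduces to a mask for non-negative hashes. *)
let bucket_mod hash len = hash mod len
let bucket_mask hash len = hash land (len - 1)

let () =
  assert (is_even_rem (-7L) = is_even_and (-7L));
  assert (half_div 10L = half_shift 10L);
  (* The two halving functions differ on negative odd inputs. *)
  assert (half_div (-7L) = (-3L));
  assert (half_shift (-7L) = (-4L));
  assert (bucket_mod 1234567 1024 = bucket_mask 1234567 1024)
```

When the spine length is not a power of two, the mask trick does not apply and the division has to be handled some other way.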

> With these changes I get almost the same speed as the C code:
> $ ocamlopt -o x && time ./x
> 837799
> real    0m0.664s
> user    0m0.667s
> sys     0m0.000s
> $ gcc -O3 x.c && time ./a.out
> 837799
> real    0m0.635s
> user    0m0.633s
> sys     0m0.000s
> Here's the OCaml code:
> let rec collatzLen (c, n) : int =
>   if n = 1L then c else
>     collatzLen (c+1,
>                 if Int64.logand n 1L = 0L then Int64.shift_right n 1
>                 else Int64.add (Int64.mul 3L n) 1L);;
> let rec loop (i, (nlen, n)) =
>   if i = 1L then n else
>     let ilen = collatzLen (1, i) in
>     let nlen, n = if ilen > nlen then ilen, i else nlen, n in
>     loop (Int64.sub i 1L, (nlen, n));;
> let _ =
>   let s = loop (1000000L, (1, 1000000L)) in
>   print_int (Int64.to_int s);;

I am unable to reproduce your results. Here, the time falls from 24s to 19.5s (using ocamlopt 3.12.0 on Intel x86), which is still 26× slower than HLVM.

> > 1. Unboxing can give huge performance improvements on serial code,
> s/Unboxing/arithmetic optimizations/
> Please find an example where the performance benefit is due to
> unboxing, and not due to arithmetic optimizations performed on the
> unboxed code.

The last example I gave (array of key-value pairs) demonstrates some of the performance improvements offered by unboxing in the heap (12.3× faster than OCaml in that case). I'm still not sure that this example is invalid because I cannot reproduce your results.
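For readers unfamiliar with what "unboxing in the heap" means for an array of key-value pairs, the following is a rough sketch and not the benchmark from the earlier message. It contrasts OCaml's boxed array of tuples with a hand-flattened pair of parallel arrays, which approximates the layout that value types would give automatically. All names and the sample data are invented for illustration.

```ocaml
(* Boxed layout: each element is a pointer to a separately allocated
   (int * float) tuple, and the float inside the tuple is boxed too. *)
let boxed : (int * float) array =
  Array.init 4 (fun i -> (i, float_of_int i *. 0.5))

(* Flattened by hand: two parallel arrays. Ints are stored inline and
   OCaml normally stores the elements of a float array unboxed. *)
let keys : int array = Array.init 4 (fun i -> i)
let values : float array = Array.init 4 (fun i -> float_of_int i *. 0.5)

(* Summing the boxed array follows a pointer to every tuple and another
   to every float; summing the flat array reads contiguous memory. *)
let sum_boxed a = Array.fold_left (fun acc (_, v) -> acc +. v) 0.0 a
let sum_flat (vs : float array) = Array.fold_left ( +. ) 0.0 vs

let () = assert (sum_boxed boxed = sum_flat values)
```

Both traversals compute the same sum; the difference is in the number of allocations and pointer dereferences, which is where any speedup from unboxing would come from.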

> > let alone parallel code. The optimized HLVM is running 32× faster
> > than the OCaml here.
> >
> > 2. LLVM makes it easy to JIT fast code from OCaml. HLVM is using it
> > to beat GCC-compiled C code here.
> One advantage of using LLVM is that it would notice arithmetic
> optimizations like this and perform it itself (even if you use the
> boxed representation).

Yes. LLVM hopefully optimizes div/mod by any constant, which is quite tricky in the general case.
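As an illustration of why the general case is trickier than a shift, here is a sketch of the standard multiply-by-reciprocal technique for one specific divisor. It is written by hand in OCaml purely for exposition and is not a claim about the exact code LLVM emits. The function name is invented, and the identity is only valid under the stated assumption that 0 <= n < 2^32.

```ocaml
(* Unsigned division by 3 without a divide instruction.
   0xAAAAAAAB = (2^33 + 1) / 3, so for 0 <= n < 2^32:
     n / 3 = (n * 0xAAAAAAAB) >> 33
   The product is below 2^64, so Int64.mul keeps every bit; it may
   look negative as a signed value, hence the logical shift. *)
let div3_magic (n : int64) : int64 =
  Int64.shift_right_logical (Int64.mul n 0xAAAAAAABL) 33

let () =
  assert (div3_magic 7L = Int64.div 7L 3L);
  assert (div3_magic 4294967295L = Int64.div 4294967295L 3L)
```

Each divisor needs its own multiplier and shift amount, and signed division needs an extra correction step, which is why this is best left to the compiler.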