Version franaise
Home About Download Resources Contact us
Browse thread
Value types (Was: [Caml-list] ocamlopt LLVM support)
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Török Edwin <edwintorok@g...>
Subject: Re: Value types (Was: [Caml-list] ocamlopt LLVM support)
On Sun, 12 Dec 2010 14:54:14 -0000
"Jon Harrop" <jon@ffconsultancy.com> wrote:

> The Haskell guys got their best performance improvement moving to
> LLVM from the hailstone benchmark so it is interesting to examine
> this case as well. I just added support for 64-bit ints to HLVM to
> implement that benchmark and my code is:
> 
> Here’s the equivalent OCaml code:
> 
>   let rec collatzLen(c, n) : int =
>     if n = 1L then c else
>       collatzLen (c+1, if Int64.rem n 2L = 0L then Int64.div n 2L else

OK, but boxing itself has nothing to do with the performance degration
here. It is the lack of compiler optimizations on the Int64 type. This
could be solved by implementing compiler optimizations for it (or
manually optimizing some integer arithmetic that is slow).

Lets run the code under a profiler, or look at the assembly (I used
'perf record' and 'perf report'). 2 'idiv' instructions turn up as top
offenders in the profile.

Problem #1: Int64.rem n 2 -> another idiv instruction

A C compiler would optimize this to an 'and' instruction.
Change that to 'Int64.logand n 1L = 0L'/

Problem #2: Int64.div n 2 -> idiv instruction. 

A C compiler would optimize this to a right shift. Changing that to 'Int64.shift_right n 1' speeds
up the code.

With these changes I get almost the same speed as the C code:
$ ocamlopt x.ml -o x && time ./x
837799
real    0m0.664s
user    0m0.667s
sys     0m0.000s

$ gcc -O3 x.c && time ./a.out
837799
real    0m0.635s
user    0m0.633s
sys     0m0.000s

Here's the OCaml code:
let rec collatzLen(c, n) : int =
    if n = 1L then c else
      collatzLen (c+1, if Int64.logand n 1L = 0L then Int64.shift_right
n 1 else Int64.add (Int64.mul 3L n) 1L);;

  let rec loop(i, (nlen, n)) =
    if i = 1L then n else
      let ilen = collatzLen(1, i) in
      let nlen, n = if ilen > nlen then ilen, i else nlen, n in
      loop (Int64.sub i 1L, (nlen, n));;

  let _ =
      let s = loop (1000000L, (1,1000000L)) in
      print_int (Int64.to_int s);;

> 1. Unboxing can give huge performance improvements on serial code,

s/Unboxing/arithmetic optimizations/
Please find an example where the performance benefit is due to
unboxing, and not due to arithmetic optimizations performed on the
unboxed code.

> let alone parallel code. The optimized HLVM is running 32× faster
> than the OCaml here.
> 
> 2. LLVM makes it easy to JIT fast code from OCaml. HLVM is using it
> to beat GCC-compiled C code here.
> 

One advantage of using LLVM is that it would notice arithmetic
optimizations like this and perform it itself (even if you use the
boxed representation).

Best regards,
--Edwin