Value types (Was: [Caml-list] ocamlopt LLVM support)
| Date: | -- (:) |
| From: | Jon Harrop <jon@f...> |
| Subject: | RE: Value types (Was: [Caml-list] ocamlopt LLVM support) |
Török Edwin wrote:
> Problem #1: Int64.rem n 2 -> another idiv instruction
>
> A C compiler would optimize this to an 'and' instruction.
> Change that to 'Int64.logand n 1L = 0L'/

Yes. LLVM did that for me.

> Problem #2: Int64.div n 2 -> idiv instruction.
>
> A C compiler would optimize this to a right shift. Changing that to
> 'Int64.shift_right n 1' speeds
> up the code.

Yes. LLVM also did that for me. In fact, I have been bitten by ocamlopt not optimizing div and mod by a constant in real OCaml code before. This problem also turns up in the context of hash table implementations where you want to % by the length of the spine.

> With these changes I get almost the same speed as the C code:
> $ ocamlopt x.ml -o x && time ./x
> 837799
> real 0m0.664s
> user 0m0.667s
> sys 0m0.000s
>
> $ gcc -O3 x.c && time ./a.out
> 837799
> real 0m0.635s
> user 0m0.633s
> sys 0m0.000s
>
> Here's the OCaml code:
> let rec collatzLen(c, n) : int =
>   if n = 1L then c else
>   collatzLen (c+1, if Int64.logand n 1L = 0L then Int64.shift_right
> n 1 else Int64.add (Int64.mul 3L n) 1L);;
>
> let rec loop(i, (nlen, n)) =
>   if i = 1L then n else
>   let ilen = collatzLen(1, i) in
>   let nlen, n = if ilen > nlen then ilen, i else nlen, n in
>   loop (Int64.sub i 1L, (nlen, n));;
>
> let _ =
>   let s = loop (1000000L, (1,1000000L)) in
>   print_int (Int64.to_int s);;

I am unable to reproduce your results. Here, the time falls from 24s to 19.5s (using ocamlopt 3.12.0 on Intel x86) which is still 26× slower than HLVM.

> > 1. Unboxing can give huge performance improvements on serial code,
>
> s/Unboxing/arithmetic optimizations/
> Please find an example where the performance benefit is due to
> unboxing, and not due to arithmetic optimizations performed on the
> unboxed code.

The last example I gave (array of key-value pairs) demonstrates some of the performance improvements offered by unboxing in the heap (12.3× faster than OCaml in that case).
I'm still not sure that this example is invalid because I cannot reproduce your results.

> > let alone parallel code. The optimized HLVM is running 32× faster
> > than the OCaml here.
> >
> > 2. LLVM makes it easy to JIT fast code from OCaml. HLVM is using it
> > to beat GCC-compiled C code here.
>
> One advantage of using LLVM is that it would notice arithmetic
> optimizations like this and perform it itself (even if you use the
> boxed representation).

Yes. LLVM hopefully optimizes div/mod by any constant which is quite tricky in the general case.

Cheers,
Jon.
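
Note on the rewrites discussed in this message: the hand-applied replacements for rem and div by 2, and the general idea behind division by a non-power-of-two constant, can be checked with a short sketch. Everything below is illustrative and not code from the thread. The function names are hypothetical, and the multiply-and-shift constant for dividing by 3 is just one textbook instance, valid here only under the stated range assumption (0 <= x < 65536). It is not a claim about what LLVM or ocamlopt actually emit.

```ocaml
(* Parity test. The 'and' form agrees with the 'rem' form for every
   int64, because two's complement keeps the low bit clear exactly for
   even values, negative ones included. *)
let is_even_rem (n : int64) : bool = Int64.rem n 2L = 0L
let is_even_and (n : int64) : bool = Int64.logand n 1L = 0L

(* Halving. Int64.div truncates toward zero, while the arithmetic shift
   rounds toward negative infinity, so the two agree only for n >= 0.
   That holds in the Collatz loop, where n stays positive. *)
let half_div (n : int64) : int64 = Int64.div n 2L
let half_shift (n : int64) : int64 = Int64.shift_right n 1

(* Division by the constant 3 via multiply-and-shift.
   Assumption: 0 <= x < 65536. Since 43691 * 3 = 2^17 + 1, the product
   (x * 43691) shifted right by 17 equals x / 3 over that range. *)
let div3 (x : int) : int = (x * 43691) lsr 17

let () =
  for i = -1000 to 1000 do
    let n = Int64.of_int i in
    assert (is_even_rem n = is_even_and n);
    if i >= 0 then assert (half_div n = half_shift n)
  done;
  (* Negative odd inputs show where the shift and the division differ. *)
  assert (half_div (-3L) = (-1L));
  assert (half_shift (-3L) = (-2L));
  (* Exhaustive check of the multiply-and-shift form over its range. *)
  for x = 0 to 65535 do
    assert (div3 x = x / 3)
  done
```

The negative-input case is why a compiler cannot blindly turn a signed division by 2 into a single shift without a fix-up, and why the hand rewrite is only safe when the operand is known to be non-negative.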