Browse thread
Value types (Was: [Caml-list] ocamlopt LLVM support)
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Jon Harrop <jon@f...> |
| Subject: | Value types (Was: [Caml-list] ocamlopt LLVM support) |
The Haskell guys got their best performance improvement moving to LLVM from
the hailstone benchmark so it is interesting to examine this case as well. I
just added support for 64-bit ints to HLVM to implement that benchmark and
my code is:
let rec collatzLen ((c, n): int * int64) : int =
if n = 1L then c else
collatzLen (c+1, if n % 2L = 0L then n / 2L else 3L * n + 1L);;
let rec loop ((i, (nlen, n)): int64 * (int * int64)) : int64 =
if i = 1L then n else
let ilen = collatzLen(1, i) in
let nlen, n = if ilen > nlen then ilen, i else nlen, n in
loop (i-1L, (nlen, n));;
When run without using LLVM's optimization passes, this produces the
following output for 10k, 100k and 1M, respectively:
- : `Int64 = 6171L
Live: 0
0.00704098s total; 0s suspend; 0s mark; 0s sweep
- : `Int64 = 77031L
Live: 0
0.087815s total; 0s suspend; 0s mark; 0s sweep
- : `Int64 = 837799L
Live: 0
1.06907s total; 0s suspend; 0s mark; 0s sweep
With LLVM's default optimization passes enabled, I get:
- : `Int64 = 6171L
Live: 0
0.00508595s total; 0s suspend; 0s mark; 0s sweep
- : `Int64 = 77031L
Live: 0
0.0626569s total; 0s suspend; 0s mark; 0s sweep
- : `Int64 = 837799L
Live: 0
0.759738s total; 0s suspend; 0s mark; 0s sweep
Note the ~30% performance improvement in this case. In other cases, LLVM's
default optimization passes can degrade performance significantly.
Here’s the equivalent OCaml code:
let rec collatzLen(c, n) : int =
if n = 1L then c else
collatzLen (c+1, if Int64.rem n 2L = 0L then Int64.div n 2L else
Int64.add (Int64.mul 3L n) 1L);;
let rec loop(i, (nlen, n)) =
if i = 1L then n else
let ilen = collatzLen(1, i) in
let nlen, n = if ilen > nlen then ilen, i else nlen, n in
loop (Int64.sub i 1L, (nlen, n));;
and Haskell:
import Data.Int
collatzLen :: Int -> Int64 -> Int
collatzLen c 1 = c
collatzLen c n = collatzLen (c+1) $ if n `mod` 2 == 0 then n `div` 2 else
3*n+1
pmax x n = x `max` (collatzLen 1 n, n)
main = print $ foldl pmax (1,1) [2..1000000]
and C99:
#include <stdio.h>
int collatzLen(int c, long long n) {
return (n==1 ? c : collatzLen(c+1, (n%2==0 ? n/2 : 3*n+1)));
}
long long loop(long long m) {
int nlen=1;
long long n = 1;
for (long long i=2; i<=m; ++i) {
const int ilen = collatzLen(1, i);
if (ilen > nlen) {
nlen = ilen;
n = i;
}
}
return n;
}
int main() {
printf("%lld", loop(1000000));
}
And here are my timings:
OCaml: 24.3s ocamlopt
Haskell: 24.0s ghc6.10 --make -O2
F#.NET: 3.45s fsc --optimize+ --platform:x86
C: 1.20s gcc -O3 -std=c99
HLVM: 1.07s ./repl --compile
HLVM (opt): 0.76s opt -tailcallelim -std-compile-opts
These results really demonstrate two things:
1. Unboxing can give huge performance improvements on serial code, let alone
parallel code. The optimized HLVM is running 32× faster than the OCaml here.
2. LLVM makes it easy to JIT fast code from OCaml. HLVM is using it to beat
GCC-compiled C code here.
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com