Version franaise
Home About Download Resources Contact us
Browse thread
Value types (Was: [Caml-list] ocamlopt LLVM support)
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Goswin von Brederlow <goswin-v-b@w...>
Subject: Re: Value types
Török Edwin <edwintorok@gmail.com> writes:

> On Sun, 12 Dec 2010 14:54:14 -0000
> "Jon Harrop" <jon@ffconsultancy.com> wrote:
>
>> The Haskell guys got their best performance improvement moving to
>> LLVM from the hailstone benchmark so it is interesting to examine
>> this case as well. I just added support for 64-bit ints to HLVM to
>> implement that benchmark and my code is:
>> 
>> Here’s the equivalent OCaml code:
>> 
>>   let rec collatzLen(c, n) : int =
>>     if n = 1L then c else
>>       collatzLen (c+1, if Int64.rem n 2L = 0L then Int64.div n 2L else
>
> OK, but boxing itself has nothing to do with the performance degration
> here. It is the lack of compiler optimizations on the Int64 type. This
> could be solved by implementing compiler optimizations for it (or
> manually optimizing some integer arithmetic that is slow).
>
> Lets run the code under a profiler, or look at the assembly (I used
> 'perf record' and 'perf report'). 2 'idiv' instructions turn up as top
> offenders in the profile.
>
> Problem #1: Int64.rem n 2 -> another idiv instruction
>
> A C compiler would optimize this to an 'and' instruction.
> Change that to 'Int64.logand n 1L = 0L'/

But C would have an uint64_t there (if you are smart). Otherwise that
isn't equivalent:

void foo(int64_t x) {
  a = x % 2;
}

0000000000000000 <foo>:
   0:   48 89 f8                mov    %rdi,%rax
   3:   48 c1 e8 3f             shr    $0x3f,%rax
   7:   48 01 c7                add    %rax,%rdi
   a:   83 e7 01                and    $0x1,%edi
   d:   48 29 c7                sub    %rax,%rdi
  10:   48 89 3d 00 00 00 00    mov    %rdi,0x0(%rip)        # 17 <foo+0x17>
  17:   c3                      retq   

> Problem #2: Int64.div n 2 -> idiv instruction. 
>
> A C compiler would optimize this to a right shift. Changing that to 'Int64.shift_right n 1' speeds
> up the code.

Same thing again:

void foo(int64_t x) {
  a = x / 2;
}

0000000000000000 <foo>:
   0:   48 89 f8                mov    %rdi,%rax
   3:   48 c1 e8 3f             shr    $0x3f,%rax
   7:   48 8d 3c 38             lea    (%rax,%rdi,1),%rdi
   b:   48 d1 ff                sar    %rdi
   e:   48 89 3d 00 00 00 00    mov    %rdi,0x0(%rip)        # 15 <foo+0x15>
  15:   c3                      retq   

In both cases gcc avoids the expensive idiv instruction but it needs to
handle the sign of the number with some tricks. Still faster than idiv
though.


An UInt64 module would be nice here.

MfG
        Goswin