Value types (Was: [Caml-list] ocamlopt LLVM support)
From: Török Edwin <edwintorok@g...>
Subject: Re: Value types (Was: [Caml-list] ocamlopt LLVM support)
On Sun, 12 Dec 2010 18:01:13 -0000
"Jon Harrop" <jon@ffconsultancy.com> wrote:

> Török Edwin wrote:
> > Do you really need to use Int64 for that though? Won't the 63-bit
> > version do?
> 
> I'm running 32-bit.

That explains it: in a 32-bit chroot my modified version is still slow.
I thought that 32-bit systems were not that relevant anymore (except on
Windows, but people are starting to move to 64-bit there too).

> 
> > > I am unable to reproduce your results. Here, the time falls from
> > > 24s to 19.5s (using ocamlopt 3.12.0 on Intel x86) which is still
> > > 26× slower than HLVM.
> 
> Sorry, I'm actually using an Opteron x86 (logged in from an Intel
> x86!).
> 
> > Do you still have 'idiv' in the compiled code? See my attached
> > assembly, and compare it with yours please.
> > I was doing the test on 64-bit, with ocamlopt 3.11.2 and 3.12.0.
> 
> I get what appear to be calls to C code:
> 
> camlCollatz__collatzLen_1030:
>         subl    $8, %esp
> .L103:
>         movl    %eax, 4(%esp)
>         movl    %ebx, 0(%esp)
>         pushl   $camlCollatz__10
>         pushl   %ebx
>         movl    $caml_equal, %eax
>         call    caml_c_call

Yes, that is quite bad. I don't know how OCaml's code generator works,
but it looks like it calls the C implementation if the CPU doesn't
support the operation directly. And since this is 32-bit, you need all
the extra pushes and movs just to actually call something.
If only it could inline those calls, then it could optimize away most
of the overhead (LLVM would help here again).
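
For context, here is a minimal sketch of the kind of code under discussion.
The actual Collatz source is not included in this message, so the function
names and bodies below are assumptions and not the code that was benchmarked.
The first version works on boxed Int64 values, the second on native ints
(63-bit on a 64-bit system, 31-bit on a 32-bit one). In the listing quoted
above, caml_equal is the runtime's generic structural equality, reached
through caml_c_call.

```ocaml
(* Hypothetical reconstruction for illustration; not the thread's code. *)

(* Collatz chain length using Int64 arithmetic. *)
let rec collatz_len (n : int64) (acc : int) : int =
  if Int64.equal n 1L then acc
  else if Int64.equal (Int64.rem n 2L) 0L then
    collatz_len (Int64.div n 2L) (acc + 1)
  else
    collatz_len (Int64.add (Int64.mul 3L n) 1L) (acc + 1)

(* The same loop on native ints, which are unboxed. *)
let rec collatz_len_int (n : int) (acc : int) : int =
  if n = 1 then acc
  else if n land 1 = 0 then collatz_len_int (n lsr 1) (acc + 1)
  else collatz_len_int (3 * n + 1) (acc + 1)
```

Both functions count steps until 1 is reached, so collatz_len 6L 0 and
collatz_len_int 6 0 give the same answer. Which instructions a given
ocamlopt version emits for each is something to check in the assembly, not
something this sketch claims.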

> 
> > FWIW the original code took 2.8 seconds here, so only 4x slower
> > (this is an AMD Phenom II x6 1090T CPU). It probably depends how
> > fast/slow the 'idiv' is on your CPU.
> 
> The performance of idiv is irrelevant here. The bottleneck may be
> those C calls but I don't understand why they are being generated.

I think it is for the same reason gcc has __udivdi3 in libgcc: there is
no direct way of executing a 64-bit divide on a 32-bit machine, and it
saves code space to do it in a function.
However, that doesn't make much sense for mul and add, which don't need
that many instructions to implement on 32-bit.
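
To illustrate why add is cheap, here is a rough sketch of a 64-bit addition
built from two 32-bit halves plus a carry, which is roughly what an add/adc
instruction pair does on 32-bit x86. The names mask32, lo, hi and
add64_via_halves are made up for this example, and the halves are held in
Int64 values only so the sketch runs the same on any platform.

```ocaml
(* Illustrative only: models 64-bit add as two 32-bit adds with carry. *)

let mask32 = 0xFFFF_FFFFL

(* Low and high 32-bit halves of a 64-bit value. *)
let lo x = Int64.logand x mask32
let hi x = Int64.shift_right_logical x 32

let add64_via_halves (a : int64) (b : int64) : int64 =
  (* Add the low halves; the result may need 33 bits. *)
  let lo_sum = Int64.add (lo a) (lo b) in
  (* Bit 32 of the low sum is the carry into the high half (0 or 1). *)
  let carry = Int64.shift_right_logical lo_sum 32 in
  (* Add the high halves plus carry, wrapping at 32 bits. *)
  let hi_sum =
    Int64.logand (Int64.add (Int64.add (hi a) (hi b)) carry) mask32 in
  Int64.logor (Int64.shift_left hi_sum 32) (Int64.logand lo_sum mask32)
```

The result matches Int64.add, including wrap-around on overflow. Division has
no comparably short sequence, which is the reason for a helper function there.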

> 
> Cheers,
> Jon.
> 
>