Ocamlopt code generator question
From: Dmitry Bely <dmitry.bely@g...>
Subject: Re: [Caml-list] Ocamlopt x86-32 and SSE2
On Fri, May 8, 2009 at 2:21 PM, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:

>> I see. The reason I asked: trying to improve floating-point performance
>> on the 32-bit x86 platform, I have merged the floating-point SSE2 code
>> generator from the amd64 ocamlopt back end into the i386 one, making an
>> ia32sse2 architecture. It also inlines sqrt() via the -ffast-math flag and
>> slightly optimizes emit_float_test (usually eliminating an extra jump),
>> features that are missing from the original amd64 code generator.
>
> You just passed black belt in OCaml compiler hacking :-)

Thank you, sensei :-)

>> Is this of any interest to anybody?
>
> I'm definitely interested in the potential improvements to the amd64
> code generator.
>
> Concerning the i386 code generator (x86 in 32-bit mode), SSE2 float
> arithmetic does improve performance and fit ocamlopt's compilation
> model much better than the current x87 float arithmetic, which is a
> bit of a hack.  Several options can be considered:
>
> 1- Have an additional "ia32sse2" port of ocamlopt in parallel with the
>   current "i386" port.
>
> 2- Declare pre-SSE2 processors obsolete and convert the current
>   "i386" port to always use SSE2 float arithmetic.
>
> 3- Support both x87 and SSE2 float arithmetic within the same i386
>   port, with a command-line option to activate SSE2, like gcc does.
>
> I'm really not keen on approach 1.  We have too many ports (and
> their variants for Windows/MSVC) already.  Moreover, I suspect
> packagers would stick to the i386 port for compatibility with old
> hardware, and most casual users would, too, out of laziness, so this
> hypothetical "ia32sse2" port would receive little testing.
>
> Approach 2 is tempting for me because it would simplify the x86-32
> code generator and remove some historical cruft.  The issue is that it
> demands a processor that implements SSE2.  For a list of processors, see
>  http://en.wikipedia.org/wiki/SSE2
> As a rule of thumb, almost all desktop PCs bought since 2004 have SSE2,
> as do almost all notebooks bought since 2006.  That should be OK for
> professional users (it's nearly impossible to purchase maintenance
> beyond 3 years, anyway) and serious hobbyists.  However, packagers are
> going to be very unhappy: Debian still lists i486 as its bottom line;
> for Fedora, it's Pentium or Pentium II; for Windows, it's "a 1GHz
> processor", meaning Pentium III.  All these processors lack SSE2
> support.  Only MacOS X is SSE2-compatible from scratch.
>
> Approach 3 is probably the best from a user's point of view.  But it's
> going to complicate the code generator: the x87 cruft would still be
> there, and new cruft would need to be added to support SSE2.  Code
> compiled with the SSE2 flag could link with code compiled without,
> provided the SSE2 registers are not used for parameter and result
> passing.  But as Dmitry observed, this is already the case in the
> current ocamlopt compiler.

I am curious: is passing unboxed floats possible in the current
OCaml data model?
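
For context on the question, here is a minimal sketch of how floats are
represented at run time, using the standard Obj module. The helper
is_boxed_float is a name made up for this illustration, and the float-array
line assumes a compiler built with the default flat float array
representation.

```ocaml
(* Sketch: inspect the run-time representation of a few values.
   is_boxed_float is a hypothetical helper, not part of any library. *)
let is_boxed_float (v : Obj.t) : bool =
  (* A first-class float is a heap block tagged Obj.double_tag. *)
  Obj.is_block v && Obj.tag v = Obj.double_tag

let () =
  Printf.printf "float 1.5 is a boxed block: %b\n"
    (is_boxed_float (Obj.repr 1.5));
  Printf.printf "int 42 is an immediate:     %b\n"
    (Obj.is_int (Obj.repr 42));
  (* Assumption: default configuration, where float arrays store their
     elements unboxed in a block tagged Obj.double_array_tag. *)
  Printf.printf "float array is flat:        %b\n"
    (Obj.tag (Obj.repr [| 1.0; 2.0 |]) = Obj.double_array_tag)
```

A float handed to a function as an ordinary value is therefore a pointer to
a boxed block, which is why passing it in a float register would need a
change to the calling convention.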

As for the proposed options, I tend to vote for #3 (and will implement it if
there is a consensus). There is still plenty of low-power/embedded
x86 hardware that does not support SSE2. It would also let anyone
compare the performance of the x87 and SSE2 back ends and see for
themselves whether the game is really worth the candle :-)
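
A comparison of that kind could start from something as small as the sketch
below: a float-heavy loop timed with Sys.time, built once per back end. The
names norm_sum and time and the iteration count are arbitrary choices for
this illustration, not part of any existing benchmark.

```ocaml
(* Sketch of a float micro-benchmark; norm_sum and time are hypothetical. *)

(* Sum of sqrt (i*i + 1) for i = 1..n: multiplications, additions and one
   sqrt per iteration, all on floats. *)
let norm_sum (n : int) : float =
  let acc = ref 0.0 in
  for i = 1 to n do
    let x = float_of_int i in
    acc := !acc +. sqrt (x *. x +. 1.0)
  done;
  !acc

(* Run f x and return its result together with the CPU time used. *)
let time f x =
  let t0 = Sys.time () in
  let r = f x in
  (r, Sys.time () -. t0)

let () =
  let (r, dt) = time norm_sum 10_000_000 in
  Printf.printf "result = %g, cpu time = %.3f s\n" r dt
```

Printing the result keeps the loop from being optimized away. A single loop
like this says little about real programs, so it is only a starting point.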

- Dmitry Bely