Ocamlopt code generator question
Date: 2009-05-11 (07:55)
From: Dmitry Bely <dmitry.bely@g...>
Subject: Re: [Caml-list] Ocamlopt x86-32 and SSE2

On Fri, May 8, 2009 at 2:21 PM, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>> I see. Why I asked this: trying to improve floating-point performance
>> on 32-bit x86 platform I have merged floating-point SSE2 code
>> generator from amd64 ocamlopt back end to i386 one, making ia32sse2
>> architecture. It also inlines sqrt() via -ffast-math flag and slightly
>> optimizes emit_float_test (usually eliminates an extra jump) -
>> features that are missed in the original amd64 code generator.
>
> You just passed black belt in OCaml compiler hacking :-)

Thank you, sensei :-)

>> Is this of any interest to anybody?
>
> I'm definitely interested in the potential improvements to the amd64
> code generator.
>
> Concerning the i386 code generator (x86 in 32-bit mode), SSE2 float
> arithmetic does improve performance and fit ocamlopt's compilation
> model much better than the current x87 float arithmetic, which is a
> bit of a hack. Several options can be considered:
>
> 1- Have an additional "ia32sse2" port of ocamlopt in parallel with the
> current "i386" port.
>
> 2- Declare pre-SSE2 processors obsolete and convert the current
> "i386" port to always use SSE2 float arithmetic.
>
> 3- Support both x87 and SSE2 float arithmetic within the same i386
> port, with a command-line option to activate SSE2, like gcc does.
>
> I'm really not keen on approach 1. We have too many ports (and
> their variants for Windows/MSVC) already. Moreover, I suspect
> packagers would stick to the i386 port for compatibility with old
> hardware, and most casual users would, too, out of laziness, so this
> hypothetical "ia32sse2" port would receive little testing.
>
> Approach 2 is tempting for me because it would simplify the x86-32
> code generator and remove some historical cruft. The issue is that it
> demands a processor that implements SSE2. For a list of processors, see
> http://en.wikipedia.org/wiki/SSE2
> As a rule of thumb, almost all desktop PC bought since 2004 has SSE2,
> as well as almost all notebooks since 2006. That should be OK for
> professional users (it's nearly impossible to purchase maintenance
> beyond 3 years, anyway) and serious hobbyists. However, packagers are
> going to be very unhappy: Debian still lists i486 as its bottom line;
> for Fedora, it's Pentium or Pentium II; for Windows, it's "a 1GHz
> processor", meaning Pentium III. All these processors lack SSE2
> support. Only MacOS X is SSE2-compatible from scratch.
>
> Approach 3 is probably the best from a user's point of view. But it's
> going to complicate the code generator: the x87 cruft would still be
> there, and new cruft would need to be added to support SSE2. Code
> compiled with the SSE2 flag could link with code compiled without,
> provided the SSE2 registers are not used for parameter and result
> passing. But as Dmitry observed, this is already the case in the
> current ocamlopt compiler.

I am curious: is passing unboxed floats possible in the current OCaml
data model?

As for the proposed options, I tend to vote for #3 (and would implement
it if there is a consensus). There is still plenty of low-power/embedded
x86 hardware that does not support SSE2. And one would be able to compare
the performance of the x87 and SSE2 back ends to convince oneself that
the game really is worth the candle :-)

- Dmitry Bely
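
For anyone wanting to try the x87 versus SSE2 comparison mentioned above, here is a minimal sketch of a float-heavy micro-benchmark. It is not taken from the patch: the function names (norm2, count_above, time_it), the array size and the iteration counts are all illustrative assumptions. It only exercises the two things the thread singles out, sqrt and float comparisons followed by a branch.

```ocaml
(* Hypothetical micro-benchmark; names and sizes are illustrative only
   and do not come from the thread or from the ocamlopt sources. *)

(* Euclidean norm: float multiply/add in a loop, then one sqrt call. *)
let norm2 (v : float array) : float =
  let acc = ref 0.0 in
  for i = 0 to Array.length v - 1 do
    acc := !acc +. v.(i) *. v.(i)
  done;
  sqrt !acc

(* Count elements strictly above a threshold: one float comparison
   and one conditional branch per element. *)
let count_above (threshold : float) (v : float array) : int =
  let n = ref 0 in
  for i = 0 to Array.length v - 1 do
    if v.(i) > threshold then incr n
  done;
  !n

(* Run [f] [iters] times and print the elapsed CPU time (Sys.time). *)
let time_it (label : string) (iters : int) (f : unit -> 'a) : unit =
  let t0 = Sys.time () in
  for _i = 1 to iters do
    ignore (f ())
  done;
  Printf.printf "%s: %.3f s\n" label (Sys.time () -. t0)

let () =
  let v = Array.init 100_000 (fun i -> float_of_int i /. 1000.0) in
  time_it "norm2" 200 (fun () -> norm2 v);
  time_it "count_above" 200 (fun () -> count_above 50.0 v)
```

Building the same file with each ocamlopt build and comparing the printed timings would give a first, rough impression; a real comparison would need larger and more varied workloads.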