Ocamlopt code generator question
From: Dmitry Bely <dmitry.bely@g...>
Subject: Re: [Caml-list] Ocamlopt code generator question
On Tue, May 5, 2009 at 1:24 PM, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>> For amd64 we have in asmcomp/amd64/proc_nt.mlp:
>>
>> (*  xmm0 - xmm15  100 - 115       xmm0 - xmm9: Caml function arguments
>>                                xmm0 - xmm3: C function arguments
>>                                xmm0: Caml and C function results
>>                                xmm6-xmm15 are preserved by C *)
>>
>> let loc_arguments arg =
>>  calling_conventions 0 9 100 109 outgoing arg
>> let loc_parameters arg =
>>  let (loc, ofs) = calling_conventions 0 9 100 109 incoming arg in loc
>> let loc_results res =
>>  let (loc, ofs) = calling_conventions 0 0 100 100 not_supported res in loc
>>
>> What do these first_float=100 and last_float=109 values for loc_arguments
>> and loc_parameters affect? My impression is that floats are always passed
>> boxed, so xmm registers are in fact never used to pass parameters. And
>> float values are returned as a pointer in eax, not as a value in xmm0 as
>> loc_results would suggest.
>
> The ocamlopt code generators support unboxed floats as function
> parameters and results, as well as returning multiple results in
> several registers.  (Except for the x86-32 bits port, because of the
> weird floating-point model of this architecture.)  You're right that
> the ocamlopt "middle-end" does not currently take advantage of this
> possibility, since floats are passed between functions in boxed state.
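To make the first_float/last_float question concrete, here is a minimal sketch of what such a range does in a calling-convention function: each float argument takes the next register in 100..109 (xmm0-xmm9), and only when that range is exhausted does it fall back to the stack. This is a hypothetical, simplified model, not the actual ocamlopt calling_conventions code; the names `assign`, `arg_kind` and `location`, and the 8-byte stack slots, are assumptions for illustration. (In the real loc_results the stack fallback is `not_supported`, i.e. results that do not fit in registers are rejected.)

```ocaml
(* Hypothetical, simplified model of register assignment in the style of
   calling_conventions; NOT the actual ocamlopt implementation.
   Register numbers follow the proc_nt.mlp numbering (100 = xmm0). *)

type arg_kind = Int_arg | Float_arg

type location =
  | Reg of int    (* hardware register number; 100-115 are xmm0-xmm15 *)
  | Stack of int  (* byte offset in the argument area (8-byte slots assumed) *)

(* Give each argument the next free register of its class, falling back
   to a stack slot once that class's range is used up. Returns the
   locations in argument order and the total stack space consumed. *)
let assign first_int last_int first_float last_float args =
  let rec go next_int next_float ofs acc = function
    | [] -> (List.rev acc, ofs)
    | Int_arg :: rest when next_int <= last_int ->
        go (next_int + 1) next_float ofs (Reg next_int :: acc) rest
    | Float_arg :: rest when next_float <= last_float ->
        go next_int (next_float + 1) ofs (Reg next_float :: acc) rest
    | _ :: rest ->
        (* register class exhausted: spill to the next stack slot *)
        go next_int next_float (ofs + 8) (Stack ofs :: acc) rest
  in
  go first_int first_float 0 [] args

let () =
  (* Argument ranges as in loc_arguments: ints 0-9, floats 100-109. *)
  let locs, stack = assign 0 9 100 109 [Float_arg; Int_arg; Float_arg] in
  assert (locs = [Reg 100; Reg 0; Reg 101] && stack = 0);
  (* Result range as in loc_results: a single float register (xmm0),
     so a second float result would not fit in a register. *)
  let locs, stack = assign 0 0 100 100 [Float_arg; Float_arg] in
  assert (locs = [Reg 100; Stack 0] && stack = 8);
  print_endline "ok"
```

Integer and float arguments draw from independent ranges, so interleaving them does not waste registers. As Xavier notes, the middle-end never actually hands unboxed floats to this machinery today, so in practice only the integer range is exercised for boxed float pointers.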

I see. Here is why I asked: to improve floating-point performance on
the 32-bit x86 platform, I have merged the SSE2 floating-point code
generator from the amd64 ocamlopt back end into the i386 one, creating
an ia32sse2 architecture. It also inlines sqrt() via the -ffast-math
flag and slightly optimizes emit_float_test (it usually eliminates an
extra jump), features that are missing from the original amd64 code
generator. All this seems to work: besides my own code, all tests in
the OCaml CVS test directory pass. Of course this idea is not new: you
had a working IA32+SSE2 back end several years ago [1] but unfortunately
never released it to the public.

Is this of any interest to anybody?

- Dmitry Bely

[1] http://caml.inria.fr/pub/ml-archives/caml-list/2003/03/e0db2f3f54ce19e4bad589ffbb082484.fr.html