Ocamlopt code generator question
Date: 2009-05-05 (09:41)
From: Dmitry Bely <dmitry.bely@g...>
Subject: Re: [Caml-list] Ocamlopt code generator question
On Tue, May 5, 2009 at 1:24 PM, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
>> For amd64 we have in asmcomp/amd64/proc_nt.mlp:
>>
>> (* xmm0 - xmm15  100 - 115  xmm0 - xmm9: Caml function arguments
>>                             xmm0 - xmm3: C function arguments
>>                             xmm0: Caml and C function results
>>                             xmm6-xmm15 are preserved by C *)
>>
>> let loc_arguments arg =
>>   calling_conventions 0 9 100 109 outgoing arg
>> let loc_parameters arg =
>>   let (loc, ofs) = calling_conventions 0 9 100 109 incoming arg in loc
>> let loc_results res =
>>   let (loc, ofs) = calling_conventions 0 0 100 100 not_supported res in loc
>>
>> What these first_float=100 and last_float=109 for loc_arguments and
>> loc_parameters affect? My impression is that floats are always passed
>> boxed, so xmm registers are in fact never used to pass parameters. And
>> float values are returned as a pointer in eax, not a value in xmm0 as
>> loc_results would suggest.
>
> The ocamlopt code generators support unboxed floats as function
> parameters and results, as well as returning multiple results in
> several registers. (Except for the x86-32 bits port, because of the
> weird floating-point model of this architecture.) You're right that
> the ocamlopt "middle-end" does not currently take advantage of this
> possibility, since floats are passed between functions in boxed state.

I see. The reason I asked: trying to improve floating-point performance
on the 32-bit x86 platform, I have merged the SSE2 floating-point code
generator from the amd64 ocamlopt back end into the i386 one, making an
ia32sse2 architecture. It also inlines sqrt() via the -ffast-math flag
and slightly optimizes emit_float_test (it usually eliminates an extra
jump); both features are missing from the original amd64 code generator.
All this seems to work OK: beyond my own code, all tests found in the
OCaml CVS test directory pass. Of course this idea is not new: you had a
working IA32+SSE2 back end several years ago [1], but unfortunately it
was never released to the public.

Is this of any interest to anybody?

- Dmitry Bely

[1] http://caml.inria.fr/pub/ml-archives/caml-list/2003/03/e0db2f3f54ce19e4bad589ffbb082484.fr.html
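
For readers wondering what the first_float/last_float arguments quoted above would control if the middle-end did pass unboxed floats, here is a minimal, self-contained sketch. It is a simplified model, not the code from the ocamlopt sources: the types ty and loc, the fixed 8-byte slot size, and the list-based interface are assumptions made for illustration, and stack alignment is left out. Each argument gets the next free register of its class, and falls back to a stack slot once that class is exhausted.

```ocaml
(* Simplified model of register assignment for function arguments.
   Hypothetical types: the real back end works on Reg.t arrays. *)
type ty = Int | Addr | Float
type loc = Reg of int | Stack of int

(* Returns the location chosen for each argument and the number of
   stack bytes used. Slot size is assumed to be 8 bytes. *)
let calling_conventions first_int last_int first_float last_float args =
  let assign (next_int, next_float, ofs, acc) ty =
    match ty with
    | Int | Addr ->
        if next_int <= last_int
        then (next_int + 1, next_float, ofs, Reg next_int :: acc)
        else (next_int, next_float, ofs + 8, Stack ofs :: acc)
    | Float ->
        if next_float <= last_float
        then (next_int, next_float + 1, ofs, Reg next_float :: acc)
        else (next_int, next_float, ofs + 8, Stack ofs :: acc)
  in
  let (_, _, ofs, acc) =
    List.fold_left assign (first_int, first_float, 0, []) args in
  (List.rev acc, ofs)

(* With the amd64 numbering quoted above: integer registers 0..9,
   float registers 100..109 (xmm0..xmm9). *)
let example = calling_conventions 0 9 100 109 [Int; Float; Float; Addr]
```

In this model, example places the two float arguments in registers 100 and 101 and the integer and address arguments in registers 0 and 1, with no stack use. An eleventh float argument would spill to the stack, which is the only thing last_float=109 decides.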