English version
Accueil     À propos     Téléchargement     Ressources     Contactez-nous    

Ce site est rarement mis à jour. Pour les informations les plus récentes, rendez-vous sur le nouveau site OCaml à l'adresse ocaml.org.

Browse thread
Need for a built in round_to_int function
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: 2005-02-21 (16:00)
From: Xavier Leroy <Xavier.Leroy@i...>
Subject: Re: [Caml-list] Need for a built in round_to_int function
> [ "round float to int" primitive ]
> with the hacked ocamlopt compiler I'm getting about a 3 times speed
> improvement using round_to_int on a Pentium III. The speedup on P4
> is supposed to be even more because of the P4's deeper pipeline.

On the other hand, according to the P4 optimization manuals, the P4 is
supposed to special-case this particular use of fnstcw / fldcw, so
perhaps the situation is no worse than on the P3.  At any rate:

> I'd be interested in any comments from the official ocaml maintainers.
> What are the chances of this going into the official distribution?

Essentially zero :-(  Basically, this is a case where additional stuff is
introduced in the machine-independent parts of ocamlopt and in every
code generator just to work around the brain-dead x87 floating-point
instruction set.  Every other processor (as well as the SSE2 instr.set
on the P4) has a fast "truncate float to int" instruction, from which
an efficient "round to nearest int" is easy to obtain (truncate (x +. 0.5)).  

I spent a lot of time in the past trying to extract decent float
performance out of the x87 instruction set, under the strict
constraint that these performance hacks should be confined to the
x86-specific parts of ocamlopt.  Nowadays, I no longer care about
performance for x87: users who want good float performance should
simply use the x86-64 architecture (with SSE2 floats), it's cheaply
available from both AMD and Intel and works so much better for
floats...  At the extreme, I'd rather do a x86+SSE2 port (for P4 and
P3M-class processors) than spend more time on x86+x87.

- Xavier Leroy