Need for a built in round_to_int function
From: Robert Roessler <roessler@r...>
Subject: Re: [Caml-list] Need for a built in round_to_int function
Erik de Castro Lopo wrote:

> I am about to port some code from C to O'caml. This code uses the 
> C99 function:
> 
>     long int lrint (double d) ;
> 
> which performs rounding on the double and then converts that to
> a long int.
> 
> In O'caml the only option seems to be:
> 
>     let round_to_int f = int_of_float (f +. 0.5) ;;
> 
> The problem is that on i386 this compiles to really slow code:
> 
>     804b385:    dd 44 98 fc        fldl   0xfffffffc(%eax,%ebx,4)
>     804b389:    de c1              faddp  %st,%st(1)
>     804b38b:    83 ec 08           sub    $0x8,%esp
>     804b38e:    d9 7c 24 04        fnstcw 0x4(%esp)
>     804b392:    66 8b 44 24 04     mov    0x4(%esp),%ax
>     804b397:    b4 0c              mov    $0xc,%ah
>     804b399:    66 89 44 24 00     mov    %ax,0x0(%esp)
>     804b39e:    d9 6c 24 00        fldcw  0x0(%esp)
>     804b3a2:    db 1c 24           fistpl (%esp)
>     804b3a5:    8b 04 24           mov    (%esp),%eax
>     804b3a8:    d9 6c 24 04        fldcw  0x4(%esp)
>     804b3ac:    83 c4 08           add    $0x8,%esp
> 
> The killer here is the two fldcw (floating point load control word)
> instructions surrounding the fistpl (which actually does the float to int
> conversion). Loading the FP control word causes a flush of the FPU
> pipeline. In code where a lot of floating point arithmetic is interspersed
> with round-to-int conversions, the fldcw instructions can cause a
> significant slowdown.

I will preface this with a Slashdot-style "IANANA" (I Am Not A Numerical
Analyst).

The above approach is more or less what you would expect if you (as a
compiler code generator) a) want float-to-integer conversion to follow
the C/C++ standards ("Truncate (toward 0)"), and b) make no assumption
regarding the state of the IEEE hardware rounding setting. Hence the
control word is saved, switched to truncation for the fistpl, and then
restored.
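A small OCaml sketch of why the truncation rule matters here: `int_of_float` truncates toward zero, like a C `(int)` cast, so the `f +. 0.5` helper quoted above only rounds to nearest for non-negative inputs. Flooring first fixes negative values; `round_to_int_floor` is a hypothetical name used only for illustration.

```ocaml
(* The helper from the quoted message: add 0.5, then truncate.
   int_of_float truncates toward zero, like a C (int) cast. *)
let round_to_int f = int_of_float (f +. 0.5)

(* Hypothetical variant that floors first, so negative inputs also go to
   the nearest integer (ties still go toward +infinity, as with
   (int) floor (f + 0.5) in C). *)
let round_to_int_floor f = int_of_float (floor (f +. 0.5))

let () =
  List.iter
    (fun f ->
      Printf.printf "%5.1f  truncating: %3d  flooring: %3d\n"
        f (round_to_int f) (round_to_int_floor f))
    [ 1.7; 1.2; -1.2; -1.7 ]
```

For -1.2 and -1.7 the truncating helper yields 0 and -1, while the flooring one yields the nearest integers, -1 and -2. Both still send exact halves upward, so neither matches `lrint` under the default rounding mode.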

> The lrint function in C replaces all the above with one fistpl
> and a single mov instruction and leaves the floating point
> control word intact. In C code that moved from:
> 
>     (int) floor (f + 0.5)
> 
> to
>     lrintf (f)
> 
> I have seen up to a four-fold increase in speed.

You, on the other hand, are willing to make an assumption about the
hardware rounding mode: [presumably] that it is set to the power-on
default of "Round to nearest, or to even if equidistant". That may not
be unreasonable, but it needs to be explicit that this *is* the
assumption, and you need a way of verifying (or at least reason to
believe) that other software components in your app's environment are
not invalidating it.

It should also be mentioned that the default hardware rounding mode does
NOT match "(int) floor (f + 0.5)". The "floor" on its own corresponds to
the "Round down (toward -infinity)" mode; adding 0.5 first turns it into
round-to-nearest, but with exact halves always going up (toward
+infinity). So 2.5 becomes 3, whereas "Round to nearest, or to even if
equidistant" gives 2: the two do not agree on ties. :)
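To make the tie behaviour concrete, a sketch comparing the two conventions in OCaml. It assumes OCaml 4.07 or later for the `Float` module, whose `Float.round` rounds ties away from zero. `round_half_even` is a hypothetical software emulation of the IEEE default mode (not something the standard library provides); it computes the result arithmetically rather than relying on the hardware rounding setting.

```ocaml
(* Round half up: the behaviour of (int) floor (f + 0.5) in C *)
let round_half_up x = int_of_float (floor (x +. 0.5))

(* Round half to even: what lrint gives under the default IEEE mode.
   Float.round (OCaml >= 4.07) sends ties away from zero, so exact
   halves are detected separately and sent to the even neighbour. *)
let round_half_even x =
  if Float.abs (x -. Float.trunc x) = 0.5 then
    (* x /. 2.0 is exact and is never itself a tie, so rounding it and
       doubling yields the even neighbour of x *)
    int_of_float (2.0 *. Float.round (x /. 2.0))
  else int_of_float (Float.round x)

let () =
  List.iter
    (fun x ->
      Printf.printf "%5.1f  half-up: %3d  half-even: %3d\n"
        x (round_half_up x) (round_half_even x))
    [ 0.5; 1.5; 2.5; -0.5; -1.5; -2.5 ]
```

For the inputs above, half-up gives 1, 2, 3, 0, -1, -2 and half-even gives 0, 2, 2, 0, -2, -2; the two disagree on 0.5, 2.5 and -1.5.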

In case it isn't obvious, the IEEE default of rounding ties to even is
chosen to minimize the effects of accumulated rounding errors: always
rounding halves up biases a long series of calculations upward, while
rounding them to even tends to send as many down as up.
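The accumulated-error argument can be seen with a tiny OCaml sketch: round the exact ties 0.5, 1.5, ..., 9.5 both ways and sum the results. `half_even_of_tie` is a hypothetical helper that is only valid for exact ties; it simply picks whichever of the two neighbouring integers is even.

```ocaml
(* Inputs: the exact ties 0.5, 1.5, ..., 9.5 *)
let ties = List.init 10 (fun k -> float_of_int k +. 0.5)

(* Round half up, as in (int) floor (f + 0.5) *)
let half_up x = int_of_float (floor (x +. 0.5))

(* Round half to even, valid here only because every input is an exact
   tie k + 0.5: pick whichever of k and k + 1 is even. *)
let half_even_of_tie x =
  let k = int_of_float (floor x) in
  if k mod 2 = 0 then k else k + 1

(* Sum of the rounded inputs for a given rounding function *)
let sum f = List.fold_left (fun acc x -> acc + f x) 0 ties

let () =
  Printf.printf "exact sum:       %g\n" (List.fold_left ( +. ) 0.0 ties);
  Printf.printf "round half up:   %d\n" (sum half_up);
  Printf.printf "round half even: %d\n" (sum half_even_of_tie)
```

The exact sum is 50; rounding half up gives 55, while rounding half to even gives 50, because the upward and downward adjustments cancel.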

> I've looked at the code for the O'Caml compiler and I think I 
> know how to implement this, at least for x86 and PowerPC, the two
> architectures I have access to. If I were to supply a patch, would
> it be accepted?
> 
> 
> I know other suggestions like this one:
> 
>     http://sardes.inrialpes.fr/~aschmitt/cwn/2003.11.18.html#1
> 
> were not viewed favourably, but the addition of a single function
> with an explicit behaviour is a far neater solution.

This could take the form of a compiler switch exactly like "/QIfist",
which was added to VC7 (and to VC6 with the "Processor Pack"). Using
this switch means you are (or should be) aware of, and happy with, the
assumption detailed above.

Of course, if something like this were to be added to ocamlopt (for
target architectures using IEEE floating point), code emulating the
same behavior (an additional bytecode op?) could be added to the
runtime to keep the interpreted and native operating environments
consistent, or not.
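For code that needs identical results under ocamlc and ocamlopt without any assumption about the FPU rounding mode, one option is a sketch built on `Float.round` (available since OCaml 4.07, long after this thread), which the standard library documents as rounding ties away from zero regardless of the current rounding direction. Note that this is a third convention, different from both `floor (f +. 0.5)` and the ties-to-even default that `lrint` relies on; `round_to_int_explicit` is a hypothetical name.

```ocaml
(* Round to nearest, ties away from zero, independently of the FPU
   rounding mode. Requires OCaml >= 4.07 for the Float module. *)
let round_to_int_explicit (f : float) : int =
  let r = Float.round f in
  (* int_of_float is unspecified for NaN and out-of-range values, so
     reject them; the valid range is [min_int, -min_int). *)
  if Float.is_nan r
     || r < float_of_int min_int
     || r >= -. float_of_int min_int
  then invalid_arg "round_to_int_explicit: not representable as int"
  else int_of_float r

let () =
  List.iter
    (fun f -> Printf.printf "%5.1f -> %d\n" f (round_to_int_explicit f))
    [ 1.4; 2.5; -2.5 ]
```

The design trades the single fistpl for a library call plus a range check; whether that is acceptable depends on how hot the conversion is.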

Robert Roessler
roessler@rftp.com
http://www.rftp.com