Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Big_int.float_of_big_int is slightly unportable (was: Int64.to_float is slightly unportable) #5021

Closed
vicuna opened this issue Apr 7, 2010 · 8 comments

Comments

@vicuna
Copy link

vicuna commented Apr 7, 2010

Original bug ID: 5021
Reporter: Pascal Cuoq
Status: closed (set by @xavierleroy on 2017-03-07T17:36:32Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: 3.11.2
Fixed in version: 4.03.0
Category: otherlibs
Tags: patch
Related to: #6896
Monitored by: @ygrek "Julien Signoles" @dbuenzli

Bug description

Int64.to_float is implemented in byterun/ints.c as:

CAMLprim value caml_int64_to_float(value v)
{
int64 i = Int64_val(v);
return caml_copy_double(I64_to_double(i));
}

where I64_to_double is defined by

#define I64_to_double(x) ((double)(x))

This is implementation-defined when the int64 cannot be represented exactly as a double.
Unfortunately, Mac OS X Snow Leopard has made the choice of using the FPU rounding mode for the choice of the resulting double, whereas most other platforms seem always to use the nearest-even double regardless of the FPU rounding mode. This means OCaml programs behave differently on Snow Leopard than elsewhere..

I think that one way to obtain the same behavior everywhere is to mask the int64 with 0x00000000FFFFFFFF and with 0xFFFFFFFF00000000, convert the two results to doubles separately (the result of each conversion is then specified because each conversion is exact), and add the two resulting doubles. This gives FPU-rounding everywhere, at a slight cost for all platforms, including the one that didn't need it. I can't think of a better way at the moment.

The same applies to native ints when those are 63-bit.

@vicuna
Copy link
Author

vicuna commented Apr 19, 2010

Comment author: @xavierleroy

Hi Pascal,

As far as I know, all x86-64 code generators (gcc, ocamlopt, etc) use the "cvtsi2sdq" instruction to convert from 64-bit int to 64-bit float, and this instruction rounds according to the current rounding mode. So, this behavior is not new with Snow Leopard; the same code is generated e.g. in MacOS 10.4 or in Linux. I haven't yet checked what other processors do.

The workaround you propose is clever (as usual) but I'm concerned about two things:

1- It's slower than the default long->double conversion. Probably not a big deal for Int64.to_float and Nativeint.to_float, but a bit of a concern for floatofint.

2- Cross-platform compatibility is good, but compatibility between ocamlopt-generated code and C compiled code on the same platform is also desirable. If the C compiler chooses to round towards zero, your code sequence behaves differently.

What do you think?

@vicuna
Copy link
Author

vicuna commented Apr 19, 2010

Comment author: pascal_cuoq

I am really sorry, I think I got sidetracked when I was investigating the portability issue I was having. I have not been able to produce any problem with Int64.to_float, which rounds according to rounding mode everywhere I tried.

It is Big_int.float_of_big_int I meant to complain about.

Here is this time a bona fide bug report:

conv.ml:

external round_up: unit -> unit = "round_up" ;;

let b = 2000000001L ;;

let b = Big_int.big_int_of_int64 (Int64.mul b b) ;;

round_up () ;;

Format.printf "%Ld@." (Int64.bits_of_float (Big_int.float_of_big_int b)) ;;

stub.c:

#include <fenv.h>
#include <float.h>
#include <math.h>

#include <caml/mlvalues.h>

value round_up(value dummy)
{
fesetround(FE_UPWARD);
return Val_unit;
}

Commandline:

ocamlopt nums.cmxa stub.o conv.ml

Result on Snow Leopard:
4885210896450059669

Result on Leopard on an Intel mac (and pretty much everywhere else where people run Frama-C's regression tests):
4885210896450059668

Note that Mac OS X started to use the FPU rounding mode for displaying floating-point numbers in Snow Leopard, unlike every other platform. This other change may be part of the same wave. According to Stephen Canon from stackoverflow.com, Snow Leopard's behavior is correct and every other platform is wrong to systematically use round-to-nearest for displaying floats.

@vicuna
Copy link
Author

vicuna commented Apr 19, 2010

Comment author: @xavierleroy

If you look at the definition of float_of_big_int, you'll see that it is really a hack:

let float_of_big_int bi =
float_of_string (string_of_big_int bi)

So, what you observe is the behavior of float_of_string, nee strtod(). And, no, I'm not surprised that the last bit of the mantissa differs from platform to platform; I wouldn't even be surprised if several low bits were wrong...

If you're using float_of_big_int in a context where every bit matters, it's time to reimplement it better, perhaps along the lines you mentioned. (Extract the top two 32-bit ints from the bigint, then convert to float, then combine and scale by the appropriate power of 2.) Sounds like a job for a floating-point expert :-) Are you interested in giving it a try?

@vicuna
Copy link
Author

vicuna commented Apr 19, 2010

Comment author: pascal_cuoq

You are not giving the current implementation of float_of_big_int enough credit. The only two behaviors we have seen so far are "using the current rounding mode", and "using nearest-even", which is better than what you get if you adapt my suggestion (which was originally for int64s only).

First, a bit of context: I had solved my problem by setting the rounding mode to nearest-even before calling float_of_big_int. But that's not good enough for going into OCaml, because OCaml currently does not depend on the C99-only, sometimes missing, fenv.h functions, because a programmer who sets the rounding mode does not expect float_of_big_int to change it, because setting the rounding mode can be a bit slow, ... Also for context, I am only interested in converting to big_ints those floats that would fit into an int64, so I still had plenty of circumvention possibilities, I realize now that I understand what was happening.

Now, let us assume someone wants to implement a better version, and can choose a (reasonable) specification as long as he documents it. The better version is not allowed to get or set the current rounding mode, but it can specify that it does not take it into account and always rounds to nearest or truncates.

  • there is a floating-point addition in the algorithm we have so far, so the only choice seems to be to specify that float_of_big_int uses the current rounding mode.

  • if the current rounding mode is nearest-even, any algorithm that does not read all the bigint's digits at least some of the times cannot be correct (incidentally, in the other cases the top three digits are often necessary, because the top digit may contain many leading zeroes).

  • if the current rounding mode is a directed one, scaling and combining more than two digits will produce a result that is wrong by more than 1ULP.

That still doesn't mean it can't be done, and I will give it a try, but the more I think about it, the more the current implementation seems elegant.

@vicuna
Copy link
Author

vicuna commented Apr 19, 2010

Comment author: pascal_cuoq

Xavier said: "Sounds like a job for a floating-point expert"

Well, let me put it this way: if you give yourself the objective of producing a conversion function from Ocaml nats to IEEE 754 doubles, IEEE 754 doubles are going to be the least of your problems.

In the reasoning above, the wisest path was to backtrack to the point where we were using floating-point addition, and to decide not to use it. From there it is simply a matter of specifying that the function always rounds towards zero and the rest is simple.

The function below seems to do the job for nats at least for 64-bit OCaml. It tried to write it so it would work in 32-bit. I do not know if OCaml supports architectures where doubles and ints have a different endianness, but this function surely won't work there.

Please, please, please, if you decide to use it for a float_of_big_int lookalike, give it a name other than "float_of_big_int" and preserve the old function for a while, otherwise it will be impossible to write code that is portable across 3.11 and 3.12.

I vaguely inferred the documentation of Nat functions from the type and random google results. If I may, keeping around the non-nativeint digit access functions when the digits are 64 (or presumably 32) bit is particularly vicious.

open Nat;;

let float_of_nat_truncate n =
if is_zero_nat n 0 (length_nat n)
then 0. (* special case because 0. is denormalized *)
else
let num_digits = num_digits_nat n 0 (length_nat n) in
let current_digit = ref (pred num_digits) in
let num_leading_zeroes = num_leading_zero_bits_in_digit n !current_digit in
let m = ref (Int64.of_nativeint (nth_digit_nat_native n !current_digit)) in
let bits_to_go = ref (53 - (length_of_digit - num_leading_zeroes)) in
let e = (length_of_digit * num_digits - num_leading_zeroes) in
let m =
if !bits_to_go < 0
then Int64.shift_right_logical !m (-(!bits_to_go))
else begin
while (!bits_to_go) > 0 && !current_digit != 0
do
decr current_digit;
let bits =
Int64.of_nativeint (nth_digit_nat_native n !current_digit)
in
let length, bits =
if !bits_to_go >= length_of_digit
then length_of_digit, bits
else
!bits_to_go,
Int64.shift_right_logical bits (length_of_digit - !bits_to_go)
in
bits_to_go := !bits_to_go - length;
m := Int64.logor (Int64.shift_left !m length) bits
done;
Int64.shift_left !m !bits_to_go
end;
in
let e = e + 1022 in
let e = Int64.shift_left (Int64.of_int e) 52 in
let m = Int64.sub m (Int64.shift_left 1L 52) in
Int64.float_of_bits (Int64.logor m e)

@vicuna
Copy link
Author

vicuna commented Apr 22, 2010

Comment author: pascal_cuoq

The function above does not handle overflows for nats of more than ~ 308 decimal digits.

@vicuna
Copy link
Author

vicuna commented Apr 22, 2010

Comment author: @xavierleroy

If you're (mostly) happy with the current implementation of float_of_big_int, we can go on with it for a while, it's not urgent to reimplement. I still think a better implementation is possible (better = more predictable w.r.t. rounding + faster), but at this point it's just a feature wish.

GMP takes the same approach as the implementation you proposed: synthesize the IEEE float directly, and always round toward zero. But such rounding is inconsistent with other int-to-float conversions (that's the starting point of this PR, isn't it?).

  • if the current rounding mode is nearest-even, any algorithm that does not
    read all the bigint's digits at least some of the times cannot be correct
    (incidentally, in the other cases the top three digits are often necessary,
    because the top digit may contain many leading zeroes).

The latter issue is easily avoided by left-shifting the big int so that the most significant bit is 1, then converting the top 64 bits to a float, then compensating for the shift via exponent scaling. The former issue (with nearest-even) is more troublesome...

  • if the current rounding mode is a directed one, scaling and combining more
    than two digits will produce a result that is wrong by more than 1ULP.

Left-shifting ensures that we would combine at most 2 float values (if we convert the top 64 bits of the bigint as two 32-bit conversions combined).

@vicuna
Copy link
Author

vicuna commented Mar 7, 2017

Comment author: @xavierleroy

Fixed 2015-07-24 by commit 16247: complete reimplementation of float_of_big_int that does not go through strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant