|Anonymous | Login | Signup for a new account||2013-06-19 11:37 CEST|
|Main | My View | View Issues | Change Log | Roadmap|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0005021||OCaml||OCaml general||public||2010-04-08 00:37||2010-04-22 09:49|
|Target Version||Fixed in Version|
|Summary||0005021: Big_int.float_of_big_int is slightly unportable (was: Int64.to_float is slightly unportable)|
|Description||Int64.to_float is implemented in byterun/ints.c as:|
CAMLprim value caml_int64_to_float(value v)
int64 i = Int64_val(v);
where I64_to_double is defined by
#define I64_to_double(x) ((double)(x))
This is implementation-defined when the int64 cannot be represented exactly as a double.
Unfortunately, Mac OS X Snow Leopard has made the choice of using the FPU rounding mode for the choice of the resulting double, whereas most other platforms seem always to use the nearest-even double regardless of the FPU rounding mode. This means OCaml programs behave differently on Snow Leopard than elsewhere..
I think that one way to obtain the same behavior everywhere is to mask the int64 with 0x00000000FFFFFFFF and with 0xFFFFFFFF00000000, convert the two results to doubles separately (the result of each conversion is then specified because each conversion is exact), and add the two resulting doubles. This gives FPU-rounding everywhere, at a slight cost for all platforms, including the one that didn't need it. I can't think of a better way at the moment.
The same applies to native ints when those are 63-bit.
|Tags||No tags attached.|
As far as I know, all x86-64 code generators (gcc, ocamlopt, etc) use the "cvtsi2sdq" instruction to convert from 64-bit int to 64-bit float, and this instruction rounds according to the current rounding mode. So, this behavior is not new with Snow Leopard; the same code is generated e.g. in MacOS 10.4 or in Linux. I haven't yet checked what other processors do.
The workaround you propose is clever (as usual) but I'm concerned about two things:
1- It's slower than the default long->double conversion. Probably not a big deal for Int64.to_float and Nativeint.to_float, but a bit of a concern for floatofint.
2- Cross-platform compatibility is good, but compatibility between ocamlopt-generated code and C compiled code on the same platform is also desirable. If the C compiler chooses to round towards zero, your code sequence behaves differently.
What do you think?
I am really sorry, I think I got sidetracked when I was investigating the portability issue I was having. I have not been able to produce any problem with Int64.to_float, which rounds according to rounding mode everywhere I tried.
It is Big_int.float_of_big_int I meant to complain about.
Here is this time a bona fide bug report:
external round_up: unit -> unit = "round_up" ;;
let b = 2000000001L ;;
let b = Big_int.big_int_of_int64 (Int64.mul b b) ;;
round_up () ;;
Format.printf "%Ld@." (Int64.bits_of_float (Big_int.float_of_big_int b)) ;;
value round_up(value dummy)
ocamlopt nums.cmxa stub.o conv.ml
Result on Snow Leopard:
Result on Leopard on an Intel mac (and pretty much everywhere else where people run Frama-C's regression tests):
Note that Mac OS X started to use the FPU rounding mode for displaying floating-point numbers in Snow Leopard, unlike every other platform. This other change may be part of the same wave. According to Stephen Canon from stackoverflow.com, Snow Leopard's behavior is correct and every other platform is wrong to systematically use round-to-nearest for displaying floats.
If you look at the definition of float_of_big_int, you'll see that it is really a hack:
let float_of_big_int bi =
float_of_string (string_of_big_int bi)
So, what you observe is the behavior of float_of_string, nee strtod(). And, no, I'm not surprised that the last bit of the mantissa differs from platform to platform; I wouldn't even be surprised if several low bits were wrong...
If you're using float_of_big_int in a context where every bit matters, it's time to reimplement it better, perhaps along the lines you mentioned. (Extract the top two 32-bit ints from the bigint, then convert to float, then combine and scale by the appropriate power of 2.) Sounds like a job for a floating-point expert :-) Are you interested in giving it a try?
You are not giving the current implementation of float_of_big_int enough credit. The only two behaviors we have seen so far are "using the current rounding mode", and "using nearest-even", which is better than what you get if you adapt my suggestion (which was originally for int64s only).
First, a bit of context: I had solved my problem by setting the rounding mode to nearest-even before calling float_of_big_int. But that's not good enough for going into OCaml, because OCaml currently does not depend on the C99-only, sometimes missing, fenv.h functions, because a programmer who sets the rounding mode does not expect float_of_big_int to change it, because setting the rounding mode can be a bit slow, ... Also for context, I am only interested in converting to big_ints those floats that would fit into an int64, so I still had plenty of circumvention possibilities, I realize now that I understand what was happening.
Now, let us assume someone wants to implement a better version, and can choose a (reasonable) specification as long as he documents it. The better version is not allowed to get or set the current rounding mode, but it can specify that it does not take it into account and always rounds to nearest or truncates.
- there is a floating-point addition in the algorithm we have so far, so the only choice seems to be to specify that float_of_big_int uses the current rounding mode.
- if the current rounding mode is nearest-even, any algorithm that does not read all the bigint's digits at least some of the times cannot be correct (incidentally, in the other cases the top three digits are often necessary, because the top digit may contain many leading zeroes).
- if the current rounding mode is a directed one, scaling and combining more than two digits will produce a result that is wrong by more than 1ULP.
That still doesn't mean it can't be done, and I will give it a try, but the more I think about it, the more the current implementation seems elegant.
Xavier said: "Sounds like a job for a floating-point expert"
Well, let me put it this way: if you give yourself the objective of producing a conversion function from Ocaml nats to IEEE 754 doubles, IEEE 754 doubles are going to be the least of your problems.
In the reasoning above, the wisest path was to backtrack to the point where we were using floating-point addition, and to decide not to use it. From there it is simply a matter of specifying that the function always rounds towards zero and the rest is simple.
The function below seems to do the job for nats at least for 64-bit OCaml. It tried to write it so it would work in 32-bit. I do not know if OCaml supports architectures where doubles and ints have a different endianness, but this function surely won't work there.
Please, please, please, if you decide to use it for a float_of_big_int lookalike, give it a name other than "float_of_big_int" and preserve the old function for a while, otherwise it will be impossible to write code that is portable across 3.11 and 3.12.
I vaguely inferred the documentation of Nat functions from the type and random google results. If I may, keeping around the non-nativeint digit access functions when the digits are 64 (or presumably 32) bit is particularly vicious.
let float_of_nat_truncate n =
if is_zero_nat n 0 (length_nat n)
then 0. (* special case because 0. is denormalized *)
let num_digits = num_digits_nat n 0 (length_nat n) in
let current_digit = ref (pred num_digits) in
let num_leading_zeroes = num_leading_zero_bits_in_digit n !current_digit in
let m = ref (Int64.of_nativeint (nth_digit_nat_native n !current_digit)) in
let bits_to_go = ref (53 - (length_of_digit - num_leading_zeroes)) in
let e = (length_of_digit * num_digits - num_leading_zeroes) in
let m =
if !bits_to_go < 0
then Int64.shift_right_logical !m (-(!bits_to_go))
while (!bits_to_go) > 0 && !current_digit != 0
let bits =
Int64.of_nativeint (nth_digit_nat_native n !current_digit)
let length, bits =
if !bits_to_go >= length_of_digit
then length_of_digit, bits
Int64.shift_right_logical bits (length_of_digit - !bits_to_go)
bits_to_go := !bits_to_go - length;
m := Int64.logor (Int64.shift_left !m length) bits
Int64.shift_left !m !bits_to_go
let e = e + 1022 in
let e = Int64.shift_left (Int64.of_int e) 52 in
let m = Int64.sub m (Int64.shift_left 1L 52) in
Int64.float_of_bits (Int64.logor m e)
edited on: 2010-04-22 09:37
The function above does not handle overflows for nats of more than ~ 308 decimal digits.
If you're (mostly) happy with the current implementation of float_of_big_int, we can go on with it for a while, it's not urgent to reimplement. I still think a better implementation is possible (better = more predictable w.r.t. rounding + faster), but at this point it's just a feature wish.
GMP takes the same approach as the implementation you proposed: synthesize the IEEE float directly, and always round toward zero. But such rounding is inconsistent with other int-to-float conversions (that's the starting point of this PR, isn't it?).
>- if the current rounding mode is nearest-even, any algorithm that does not
> read all the bigint's digits at least some of the times cannot be correct
> (incidentally, in the other cases the top three digits are often necessary,
> because the top digit may contain many leading zeroes).
The latter issue is easily avoided by left-shifting the big int so that the most significant bit is 1, then converting the top 64 bits to a float, then compensating for the shift via exponent scaling. The former issue (with nearest-even) is more troublesome...
> - if the current rounding mode is a directed one, scaling and combining more
> than two digits will produce a result that is wrong by more than 1ULP.
Left-shifting ensures that we would combine at most 2 float values (if we convert the top 64 bits of the bigint as two 32-bit conversions combined).
|2010-04-08 00:37||Pascal Cuoq||New Issue|
|2010-04-19 09:57||xleroy||Note Added: 0005340|
|2010-04-19 09:57||xleroy||Status||new => feedback|
|2010-04-19 14:47||pascal_cuoq||Note Added: 0005347|
|2010-04-19 15:19||xleroy||Note Added: 0005348|
|2010-04-19 15:19||xleroy||Summary||Int64.to_float is slightly unportable => Big_int.float_of_big_int is slightly unportable (was: Int64.to_float is slightly unportable)|
|2010-04-19 17:01||pascal_cuoq||Note Added: 0005349|
|2010-04-19 22:46||pascal_cuoq||Note Added: 0005350|
|2010-04-22 09:37||pascal_cuoq||Note Added: 0005359|
|2010-04-22 09:37||pascal_cuoq||Note Edited: 0005359|
|2010-04-22 09:49||xleroy||Note Added: 0005360|
|2010-04-22 09:49||xleroy||Severity||tweak => feature|
|2010-04-22 09:49||xleroy||Status||feedback => acknowledged|
|Copyright © 2000 - 2011 MantisBT Group|