floating-point difference between native-code and bytecode on x86 #4163
Comments
Comment author: jeremy

I didn't mean to sound cranky.
Comment author: monniaux

This is not a bug in OCaml! You will get the same discrepancy between IA32 and other architectures if you try it in C.

#include <stdio.h> int main() {

The problem is known as "double rounding" and is well known among people interested in floating-point issues. On AMD64, and more generally on architectures other than IA32, ocamlopt and gcc use instructions that perform as follows:

float_plus_AMD64(a,b) = round64(real_plus(a,b))

However, on IA32 they use instructions that compute in 80-bit registers, then store back to memory. Thus:

float_plus_IA32(a,b) = round64(round80(real_plus(a,b)))

Contrary to intuition, round64(round80(x)) is not always equal to round64(x) in the default rounding mode (round to nearest). This is because this mode, when rounding a number exactly in the middle of two possible rounded values, "chooses" the one with the last bit equal to zero (ties have to be resolved somehow).

In your case, what you compute is essentially, up to a power-of-two multiplicative factor, 1/(1 - epsilon) with epsilon = 2^-53, whose exact value is 1 + epsilon + epsilon^2 + ... The nearest 64-bit float value is 1 + 2*epsilon, and that's what you get on AMD64. Now, on IA32, the system will first round to an 80-bit value, 1 + epsilon. This 80-bit value will then be rounded to 1, because 1 has a 0 at the last place of its mantissa.

http://www.di.ens.fr/~monniaux/biblio/pitfalls_of_floating_point.pdf
Comment author: jeremy

Thank you, I will be studying your reply and the pdf. But I still think there is an ocamlopt issue here. I certainly understand that passing through 80-bit floats can create discrepancies. But why does ocamlopt choose to pass through them? In other words, why does the following C program behave differently from the following OCaml program (on Intel, using gcc and ocamlopt)?

void main() {

let a = 0.5;;
Comment author: @damiendoligez

There are two considerations here:
- ocamlopt does its best to generate efficient code, and that means using the hardware FP instructions, even if they have slightly different rounding behaviour;
- if your computation diverges because of these rounding errors, it means your problem is ill-conditioned and the results are meaningless anyway.

So, although I don't have the last word on this, I seriously doubt this will ever be fixed in any way other than adding it to the documented list of differences between ocamlc and ocamlopt.
Comment author: monniaux

Jeremy, I don't know what platform you're testing on, but on IA32, with OCaml 3.09.2, I get 4602678819172646912 for both your C and OCaml code, regardless of whether ocamlc or ocamlopt is used. However, on AMD64 I get 4602678819172646913, and I explained the discrepancy in my preceding note.
Comment author: jeremy

monniaux: This is fascinating. I'm on an Intel Mac (OCaml 3.09.2, gcc 4.0.1). Both ocamlc and gcc give 4602678819172646913, whereas ocamlopt gives 4602678819172646912. Do you understand this? Maybe you can see why this seems like an OCaml bug to me.

doligez: "ocamlopt does its best to generate efficient code, and that means using the hardware FP instructions, even if they have slightly different rounding behaviour." Why not use the same hardware instructions when executing bytecode?

"And if your computation diverges because of these rounding errors, it means your problem is ill-conditioned and the results are meaningless anyway." The bytecode and native-code versions of my program perform quite similarly, but not identically. One reason identicality matters is that when something goes wrong in native code, I can reproduce the situation in bytecode for debugging.

"I seriously doubt this will ever be fixed in any way other than adding it to the documented list of differences between ocamlc and ocamlopt." This would be a valid way to close the bug, although I'd be disappointed.
Comment author: monniaux

Jeremy, I have a hypothesis about this, but since I don't have access to an Intel Mac you'll have to test it. Try compiling the C program that you cited with the gcc option -mfpmath=387. When executed, it should output ...12 and not ...13 as with the default options. Then recompile OCaml using ./configure -cc "gcc -mfpmath=387". You should also get ...12 and not ...13. I can also try it if somebody gives me a shell account on an Intel Mac.
Comment author: monniaux

Confirmed; I got somebody to run my code on an Intel Mac. Here's what happens.

Intel and compatible CPUs have two floating-point units: the "387" one, upward compatible with the 80387 coprocessor, and the "SSE" one, introduced in the Pentium II era. The 387 computes internally in 80 bits and thus has interesting quirks (read my paper), while SSE computes directly in the IEEE-754 formats. On your example, the 387 gives one result and SSE2 another, due to "double rounding".

On IA32, gcc/Linux (by default) and ocamlopt generate code for the 387. On AMD64, gcc/Linux (by default) and ocamlopt generate code for SSE2. It seems that for IA32 targets, Apple's gcc uses SSE2 by default (since there's no Intel Mac without SSE2 anyway). This explains the difference between gcc (for your C code) and ocamlopt on your Mac. Your Caml code under ocamlc will give the same result as your C code, because ocamlc produces bytecode for an interpreter written in C, which implements floating-point operations however the C compiler used to build it implements them.

A comment: again, read my paper; the IA32 387 quirks are documented there. I even have a piece of code that does

if (x != 0) { printf("hi there"); assert(x != 0); }

and gives an assert failure. No, this is not a bug in the compiler.
Comment author: jeremy

All very fascinating. Thanks for doing the detective work. So, would it be reasonable to propose that ocamlopt should use SSE2 on Intel Macs? If I'm understanding you correctly, (1) all else being equal, that is the preferred, less quirky floating-point subsystem, and (2) since Apple has already chosen SSE2, this would maintain the rule that bytecode and native code run identically (assuming everything happens within a single architecture).
Comment author: monniaux

OCaml could indeed support using SSE2 on later-generation 32-bit x86 processors. But, as Damien said earlier, you're wrong to assume reproducibility of floating-point computations. Think about it: with compiled C code, results may depend on optimization options, on whether a function is defined in the same file as another, and on other seemingly irrelevant details.
Comment author: @xavierleroy

Thanks for an interesting discussion; that's the first time I've seen a double-rounding problem in actual code! I will update the ocamlopt documentation in the reference manual to mention this problem with 80-bit floats in the "compatibility with the bytecode compiler" section.

I experimented with a 32-bit Intel / SSE2 code generator some time ago. The problem is that it would only work on Pentium 4 and more recent processors. You will be happy to learn, however, that the 64-bit Intel code generator uses 64-bit SSE2 floats for all its floating-point computations, making the problem go away on that platform.
Original bug ID: 4163
Reporter: jeremy
Status: closed (set by @xavierleroy on 2007-02-21T13:56:47Z)
Resolution: won't fix
Priority: normal
Severity: minor
Version: 3.09.3
Category: ~DO NOT USE (was: OCaml general)
Monitored by: jeremy monniaux
Bug description
I'm reopening this issue which is a near-duplicate of 4025, 4149, 4161 (all marked "won't fix").
The ocamlopt manual claims: "Compatibility with the bytecode compiler is extremely high: the same source code should run identically when compiled with ocamlc and ocamlopt". This claim is false for floating point on x86. I am working on a scientific application where the supposed identicality is important. There could be an ocamlopt flag that sacrifices identicality for speed, but I need to be able to turn it off.
The following program outputs 4602678819172646912 on x86 native-code and 4602678819172646913 on all other targets.
let a = 0.5;;
let b = Int64.float_of_bits (Int64.of_string "4607182418800017407");;
let x = a /. b;;
print_endline (Int64.to_string (Int64.bits_of_float x));;
I'm not satisfied with the explanation from bug 4161 that intermediate results are stored in extended precision. This is a simple division; what are the intermediate results?
Interestingly, I can work around the other floating-point discrepancies by adding
let (+.) = (+.)
let (-.) = (-.)
let ( *. ) = ( *. )
let (/.) = (/.)
to my program. But to work around this one, I have to write my own division code in C. And at this point, can I really trust multiplication...?
-Jeremy