Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

floating-point difference between native-code and bytecode on x86 #4163

Closed
vicuna opened this issue Nov 15, 2006 · 11 comments
Closed

floating-point difference between native-code and bytecode on x86 #4163

vicuna opened this issue Nov 15, 2006 · 11 comments

Comments

@vicuna
Copy link

vicuna commented Nov 15, 2006

Original bug ID: 4163
Reporter: jeremy
Status: closed (set by @xavierleroy on 2007-02-21T13:56:47Z)
Resolution: won't fix
Priority: normal
Severity: minor
Version: 3.09.3
Category: ~DO NOT USE (was: OCaml general)
Monitored by: jeremy monniaux

Bug description

I'm reopening this issue which is a near-duplicate of 4025, 4149, 4161 (all marked "won't fix").

The ocamlopt manual claims: "Compatibility with the bytecode compiler is extremely high: the same source code should run identically when compiled with ocamlc and ocamlopt". This claim is false for floating point on x86. I am working on a scientific application where the supposed identicality is important. There could be an ocamlopt flag that sacrifices identicality for speed, but I need to be able to turn it off.

The following program outputs 4602678819172646912 on x86 native-code and 4602678819172646913 on all other targets.

let a = 0.5;;
let b = Int64.float_of_bits (Int64.of_string "4607182418800017407");;
let x = a /. b;;
print_endline (Int64.to_string (Int64.bits_of_float x));;

I'm not satisfied the explanation from bug 4161 that intermediate results are stored in extended precision. This is a simple division. What are the intermediate results?

Interestingly, I can work around the other floating point discrepencies by adding
let (+.) = (+.)
let (-.) = (-.)
let (.) = (.)
let (/.) = (/.)
to my program. But to work around this one, I have to write my own division code in C. And at this point, can I really trust multiplication...?

-Jeremy

@vicuna
Copy link
Author

vicuna commented Nov 15, 2006

Comment author: jeremy

I didn't mean to sound cranky.
Thanks for any help!
-Jeremy

@vicuna
Copy link
Author

vicuna commented Nov 23, 2006

Comment author: monniaux

This is not a bug in OCaml! You will get the same discrepancy between IA32 and other architectures if you try it in C.

#include <stdio.h>

int main() {
double a = 0x1p-1;
double b = 0x1.fffffffffffffp-1;
double x = a / b;
long double x2 = (long double) a / (long double) b;
double x3 = (double) x2;
printf("%a %a %a %a\n", a, b, x, x3);
return 0;
}

The problem is known as "double rounding" and is well-known among people interested in floating-point issues.

On AMD64 and more generally architectures other than IA32, ocamlopt and gcc use instructions that perform as follows.

float_plus_AMD64(a,b) = round64(real_plus(a,b))

However, on IA32, they use instructions that compute on 80 bits registers, then store back to memory. Thus:

float_plus_IA32(a,b) = round64(round80(real_plus(a,b))

Contrary to intuition, round64(round80(x)) is not always equal to round64(x) in the default rounding mode (round to nearest). This is because this mode, when trying to round a number in the exact middle of two possible rounded values, will "choose" the one with the last bit equal to zero (there has to be a resolution for this).

In your case, what you do is essentially, up to a power-of-two multiplicative factor:
1/(1-epsilon) = 1 + epsilon + epsilon^2 + epsilon^3 + ...
where epsilon is the unit at the last place.

The nearest 64-bit float value is 1+2 epsilon, and that's what you get on AMD64.

Now, on IA32, the system will first round to a 80-bit value, 1+epsilon. This 80-bit value will then be rounded to 1, because 1 has a 0 at the last place in the mantissa.

http://www.di.ens.fr/~monniaux/biblio/pitfalls_of_floating_point.pdf
will tell you more about other weird issues on IA32 floating-point.

@vicuna
Copy link
Author

vicuna commented Nov 23, 2006

Comment author: jeremy

Thank you, I will be studying your reply and the pdf.

But, I still think there is an ocamlopt issue here. I certainly understand that passing through 80-bit floats can create discrepencies. But why does ocamlopt choose to pass through them?

In other words, why does the following C program behave differently from the following ocaml program (on Intel, using gcc and ocamlopt)?

void main() {
double a = 0.5;
long long int n = 4607182418800017407LL;
double b = (double)(&n);
double c = a/b;
printf("%lld\n", (long long int)(&c));
}

let a = 0.5;;
let b = Int64.float_of_bits (Int64.of_string "4607182418800017407");;
let c = a /. b;;
print_endline (Int64.to_string (Int64.bits_of_float c));;

@vicuna
Copy link
Author

vicuna commented Nov 24, 2006

Comment author: @damiendoligez

There are two considerations here:

  1. At some point it becomes an efficiency issue. ocamlopt does its best to generate efficient
    code, and that means using the hardware FP instructions, even if they have slightly different
    rounding behaviour.

  2. All this is about rounding. If your program is using FP, it shouldn't depend on the last
    bits of the result being correct. And if your computation diverges because of these rounding
    errors, it means your problem is ill-conditioned and the results are meaningless anyway.

So, although I don't have the last word on this, I seriously doubt this will ever be fixed in
any way other than adding it to the documented list of differences between ocamlc and
ocamlopt.

@vicuna
Copy link
Author

vicuna commented Nov 24, 2006

Comment author: monniaux

Jeremy, I don't know what platform you're testing on, but on IA32, which OCaml 3.09.2, I get 4602678819172646912 for both your C and OCaml code, regardless of whether ocamlc or ocamlopt is used.

However, on AMD64 I get 4602678819172646913, and I explained the discrepancy in my preceding note.

@vicuna
Copy link
Author

vicuna commented Nov 24, 2006

Comment author: jeremy

monniaux: This is fascinating. I'm on an Intel Mac (ocaml 3.09.2, gcc 4.0.1). Both ocamlc and gcc give 4602678819172646913, whereas ocamlopt gives 4602678819172646912. Do you understand this? Maybe you can see why this seems like an ocaml bug to me.

doligez: "ocamlopt does its best to generate efficient code, and that means using the hardware FP instructions, even if they have slightly different rounding behaviour." Why not use the same hardware instructions when executing bytecode?

"And if your computation diverges because of these rounding errors, it means your problem is ill-conditioned and the results are meaningless anyway." The bytecode and native code versions of my program perform quite similarly, but not identically. One reason that identicality is important is so that when something goes wrong in native code, I can reproduce the situation in bytecode, for debugging.

"I seriously doubt this will ever be fixed in any way other than adding it to the documented list of differences between ocamlc and ocamlopt." This would a valid way to close the bug, although I'd be disappointed.

@vicuna
Copy link
Author

vicuna commented Nov 24, 2006

Comment author: monniaux

Jeremy, I have a hypothesis about this, but since I don't have access to an Intel Mac you'll have to test it.

Try compiling the C program that you cited with gcc option -mfpmath=387. When executed, it should output ...12 and not ...13 as with the default options.

Then recompile ocaml using ./configure -cc "gcc -mfpmath=387". You should also get ...12 and not ...13.

I can also try it if somebody gives me a shell account on an Intel Mac.

@vicuna
Copy link
Author

vicuna commented Nov 24, 2006

Comment author: monniaux

Confirmed, I got somebody to run my code under an Intel Mac.

Here's what happens: Intel and compatible CPUs have two floating-point units, the "387" one upward compatible with the 80387 coprocessor, and the "SSE" one starting in the Pentium II era. 387 computes internally in 80-bit and thus has interesting quirks (read my paper) while SSE computes directly in IEEE-754 formats.

On your example, 387 gives one result and SSE2 another, due to "double rounding".

On IA32, gcc/Linux (by default) and ocamlopt generate code for 387. On AMD64, gcc/Linux (by default) and ocamlopt generate code for SSE2.

It seems that for IA32 targets, Apple's gcc uses SSE2 by default (since there's no Intel Mac without SSE2 anyway). This explains the difference between gcc (for your C code) and ocamlopt on your Mac.

Your Caml code under ocamlc will give the same result as your C code, because ocamlc is an interpreter written in C and will implement floating-point multiplication as the C compiler used to compile it implements it.

A comment: again, read my paper, IA32 387 quirks are documented there. I even have a piece of code that does:

if (x!=0) { printf("hi there"); assert(x!=0);} and gives an assert failure. No, this is not a bug in the compiler.

@vicuna
Copy link
Author

vicuna commented Nov 25, 2006

Comment author: jeremy

All very fascinating. Thanks for doing the detective work.

So, would it be reasonable to propose that ocamlopt should use SSE2 on Intel Macs? If I'm understanding you correctly, (1) all else equal, that is the preferred, less quirky floating point subsystem, and (2) since Apple has already chosen SSE2, this would maintain the rule that bytecode and native code run identically (assuming everything happens within a single architecture).

@vicuna
Copy link
Author

vicuna commented Nov 27, 2006

Comment author: monniaux

OCaml could indeed perhaps also support using SSE2 on later generation 32-bit x86 family processors.

But, as Damien said earlier, you're wrong if you assume reproducibility of floating-point computations. Think about it: with C compiled code, results may depend on optimization options, whether a function is defined in the same file as another, and other seemingly irrelevant details.

@vicuna
Copy link
Author

vicuna commented Feb 21, 2007

Comment author: @xavierleroy

Thanks for an interesting discussion; that's the first time I see a double rounding problem in actual code!

I will update the ocamlopt documentation in the reference manual to mention this problem with 80-bit floats in the "compatibility with the bytecode compiler" section.

I experimented with an Intel 32 bits / SSE2 code generator some time ago. The problem is that it would only work on Pentium 4 and more recent processors. You will be happy to learn however that the Intel 64 bits code generator uses 64-bit SSE2 floats for all its floating-point computations, making the problem go away on this platform.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant