floating-point difference between native-code and bytecode on x86 #4163
Comments
Comment author: jeremy

I didn't mean to sound cranky.
Comment author: monniaux

This is not a bug in OCaml! You will get the same discrepancy between IA32 and other architectures if you try it in C.

#include <stdio.h> int main() {

The problem is known as "double rounding" and is well known among people interested in floating-point issues. On AMD64, and more generally on architectures other than IA32, ocamlopt and gcc use instructions that perform as follows:

float_plus_AMD64(a,b) = round64(real_plus(a,b))

However, on IA32 they use instructions that compute in 80-bit registers, then store back to memory. Thus:

float_plus_IA32(a,b) = round64(round80(real_plus(a,b)))

Contrary to intuition, round64(round80(x)) is not always equal to round64(x) in the default rounding mode (round to nearest). This is because this mode, when rounding a number exactly in the middle of two possible rounded values, "chooses" the one with the last bit equal to zero (ties have to be resolved somehow).

In your case, what you compute is essentially, up to a power-of-two multiplicative factor, 1/(1 - epsilon) with epsilon = 2^-53, whose exact value is 1 + epsilon + epsilon^2 + ... The nearest 64-bit float value is 1 + 2*epsilon, and that's what you get on AMD64. Now, on IA32, the system will first round to an 80-bit value, 1 + epsilon. This 80-bit value will then be rounded to 1, because 1 has a 0 at the last place of its mantissa.

http://www.di.ens.fr/~monniaux/biblio/pitfalls_of_floating_point.pdf
Comment author: jeremy

Thank you, I will be studying your reply and the pdf. But I still think there is an ocamlopt issue here. I certainly understand that passing through 80-bit floats can create discrepancies. But why does ocamlopt choose to pass through them? In other words, why does the following C program behave differently from the following OCaml program (on Intel, using gcc and ocamlopt)?

void main() {

let a = 0.5;;
Comment author: @damiendoligez

There are two considerations here:
- ocamlopt does its best to generate efficient code, and that means using the hardware FP instructions, even if they have slightly different rounding behaviour;
- if your computation diverges because of these rounding errors, it means your problem is ill-conditioned and the results are meaningless anyway.

So, although I don't have the last word on this, I seriously doubt this will ever be fixed in any way other than adding it to the documented list of differences between ocamlc and ocamlopt.
Comment author: monniaux

Jeremy, I don't know what platform you're testing on, but on IA32, with OCaml 3.09.2, I get 4602678819172646912 for both your C and OCaml code, regardless of whether ocamlc or ocamlopt is used. However, on AMD64 I get 4602678819172646913, and I explained the discrepancy in my preceding note.
Comment author: jeremy

monniaux: This is fascinating. I'm on an Intel Mac (OCaml 3.09.2, gcc 4.0.1). Both ocamlc and gcc give 4602678819172646913, whereas ocamlopt gives 4602678819172646912. Do you understand this? Maybe you can see why this seems like an OCaml bug to me.

doligez: "ocamlopt does its best to generate efficient code, and that means using the hardware FP instructions, even if they have slightly different rounding behaviour." Why not use the same hardware instructions when executing bytecode?

"And if your computation diverges because of these rounding errors, it means your problem is ill-conditioned and the results are meaningless anyway." The bytecode and native-code versions of my program perform quite similarly, but not identically. One reason identicality matters is that when something goes wrong in native code, I can reproduce the situation in bytecode for debugging.

"I seriously doubt this will ever be fixed in any way other than adding it to the documented list of differences between ocamlc and ocamlopt." This would be a valid way to close the bug, although I'd be disappointed.
Comment author: monniaux

Jeremy, I have a hypothesis about this, but since I don't have access to an Intel Mac you'll have to test it. Try compiling the C program that you cited with the gcc option -mfpmath=387. When executed, it should output ...12 and not ...13 as with the default options. Then recompile OCaml using ./configure -cc "gcc -mfpmath=387". You should also get ...12 and not ...13. I can also try it if somebody gives me a shell account on an Intel Mac.
Comment author: monniaux

Confirmed; I got somebody to run my code on an Intel Mac. Here's what happens.

Intel and compatible CPUs have two floating-point units: the "387" one, upward compatible with the 80387 coprocessor, and the "SSE" one, introduced in the Pentium II era. The 387 computes internally in 80 bits and thus has interesting quirks (read my paper), while SSE computes directly in the IEEE-754 formats. On your example, the 387 gives one result and SSE2 another, due to "double rounding".

On IA32, gcc/Linux (by default) and ocamlopt generate code for the 387. On AMD64, gcc/Linux (by default) and ocamlopt generate code for SSE2. It seems that for IA32 targets, Apple's gcc uses SSE2 by default (since there's no Intel Mac without SSE2 anyway). This explains the difference between gcc (for your C code) and ocamlopt on your Mac. Your Caml code under ocamlc will give the same result as your C code, because ocamlc produces bytecode for an interpreter written in C, which implements floating-point operations however the C compiler used to build it implements them.

A comment: again, read my paper; the IA32 387 quirks are documented there. I even have a piece of code that does

if (x != 0) { printf("hi there"); assert(x != 0); }

and gives an assert failure. No, this is not a bug in the compiler.
Comment author: jeremy

All very fascinating. Thanks for doing the detective work. So, would it be reasonable to propose that ocamlopt should use SSE2 on Intel Macs? If I'm understanding you correctly, (1) all else being equal, that is the preferred, less quirky floating-point subsystem, and (2) since Apple has already chosen SSE2, this would maintain the rule that bytecode and native code run identically (assuming everything happens within a single architecture).
Comment author: monniaux

OCaml could indeed support using SSE2 on later-generation 32-bit x86 processors. But, as Damien said earlier, you're wrong to assume reproducibility of floating-point computations. Think about it: with compiled C code, results may depend on optimization options, on whether a function is defined in the same file as another, and on other seemingly irrelevant details.
Comment author: @xavierleroy

Thanks for an interesting discussion; that's the first time I've seen a double-rounding problem in actual code! I will update the ocamlopt documentation in the reference manual to mention this problem with 80-bit floats in the "compatibility with the bytecode compiler" section.

I experimented with a 32-bit Intel / SSE2 code generator some time ago. The problem is that it would only work on Pentium 4 and more recent processors. You will be happy to learn, however, that the 64-bit Intel code generator uses 64-bit SSE2 floats for all its floating-point computations, making the problem go away on that platform.
Original bug ID: 4163
Reporter: jeremy
Status: closed (set by @xavierleroy on 2007-02-21T13:56:47Z)
Resolution: won't fix
Priority: normal
Severity: minor
Version: 3.09.3
Category: ~DO NOT USE (was: OCaml general)
Monitored by: jeremy monniaux
Bug description
I'm reopening this issue which is a near-duplicate of 4025, 4149, 4161 (all marked "won't fix").
The ocamlopt manual claims: "Compatibility with the bytecode compiler is extremely high: the same source code should run identically when compiled with ocamlc and ocamlopt". This claim is false for floating point on x86. I am working on a scientific application where the supposed identicality is important. There could be an ocamlopt flag that sacrifices identicality for speed, but I need to be able to turn it off.
The following program outputs 4602678819172646912 on x86 native-code and 4602678819172646913 on all other targets.
let a = 0.5;;
let b = Int64.float_of_bits (Int64.of_string "4607182418800017407");;
let x = a /. b;;
print_endline (Int64.to_string (Int64.bits_of_float x));;
I'm not satisfied with the explanation from bug 4161 that intermediate results are stored in extended precision. This is a simple division; what are the intermediate results?
Interestingly, I can work around the other floating-point discrepancies by adding
let (+.) = (+.)
let (-.) = (-.)
let ( *. ) = ( *. )
let (/.) = (/.)
to my program. But to work around this one, I have to write my own division code in C. And at this point, can I really trust multiplication...?
-Jeremy