Float printing and round-trippability #7218
Comments
Comment author: @gasche
Note that the new hexadecimal representation in 4.03 allows pixel-perfect representation of float values, and thus roundtripping:
# Scanf.sscanf
    (Printf.sprintf "%h" epsilon_float)
    "%h" (fun x -> x = epsilon_float);;
- : bool = true
Comment author: braibant
That was also the case before, using conversion to int64:
# (epsilon_float |> Int64.bits_of_float |> Int64.to_string |> Int64.of_string |> Int64.float_of_bits) = epsilon_float;;
However, this is not really usable when humans need to be able to read and interpret those values (e.g., stored in configuration files).
Comment author: @gasche
Well the hexadecimal notation is actually rather readable (see examples below) and should be used whenever you care about precision or specific values. But I don't disagree with your point -- I'd vote to wait for someone to be motivated to implement one of the shiny float-printing algorithms and see how it fares performance-wise.
# List.iter (fun x -> Printf.printf "%14.2f = %h\n" x x)
    [0.; -0.; epsilon_float;
     1.; -1.;
     1.3; 1.5;
     2. ** 32.; 2. ** 32. -. 1.];;
          0.00 = 0x0p+0
         -0.00 = -0x0p+0
          0.00 = 0x1p-52
          1.00 = 0x1p+0
         -1.00 = -0x1p+0
          1.30 = 0x1.4cccccccccccdp+0
          1.50 = 0x1.8p+0
 4294967296.00 = 0x1p+32
 4294967295.00 = 0x1.fffffffep+31
Comment author: braibant
I will probably play a bit with the 2010 Grisu algorithm and see how it fares performance-wise.
Comment author: @dbuenzli
If someone wants to play, note that the 2016 paper braibant linked to has MIT-licensed code here: https://github.com/marcandrysco/Errol and uses only C. The code for the first paper is here: https://github.com/google/double-conversion but uses C++. Having strtod/dtoa in the runtime system would solve the various locale dependency problems and help people who are running the OCaml system on bare metal or as virtual machines. In the latter context it's a recurring problem that it would be nice to solve at this level, since after this and the snprintf usage through caml_alloc_sprintf, the remaining C that is needed to be able to compile the runtime system is quite minimal (see e.g. https://github.com/dbuenzli/rpi-boot-ocaml/tree/master/libc-ocaml).
Comment author: braibant
@dbuenzli Actually, this is a good point: apparently, Grisu is hard to build right (which is why Errol was initially measured to be faster). Maybe the ease of integration should weigh more than the relative performance between the two.
Comment author: @dbuenzli
Rather than performance, the advantage I would see in Grisu is that it seems to have been widely adopted by JavaScript engines, which means that it's very well tested. However, Errol's code base seems more approachable (and I guess C++ is simply a no-go for the runtime system) and, according to the paper, we'd still get significant speedups unless the system's libc libraries switched to Grisu (I don't know).
Comment author: @stedolan
I disagree somewhat with the readability point, and entirely with the precision point: the value of every finite IEEE754 double can be represented precisely as a finite decimal string. (The converse, that every finite decimal string can be represented precisely as a double, is of course not true).
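To make the point about exact decimal representations concrete, here is a minimal sketch that computes the full decimal expansion of a double by integer long division. The function name `exact_decimal` is hypothetical and not part of the stdlib. To stay within native integers it assumes a 64-bit OCaml (63-bit `int`) and only accepts 0 or values in [2^-6, 1); a general version would need arbitrary-precision arithmetic.

```ocaml
(* Hypothetical helper, not a stdlib function.
   Assumes 63-bit native ints; accepts x = 0 or 2^-6 <= x < 1. *)
let exact_decimal (x : float) : string =
  if not (x >= 0.0 && x < 1.0) then
    invalid_arg "exact_decimal: expected 0 <= x < 1";
  (* x = m * 2^e with 0.5 <= m < 1 (m = 0 when x = 0). *)
  let m, e = frexp x in
  (* m * 2^53 is an integer, so x = mant / 2^k exactly. *)
  let mant = int_of_float (ldexp m 53) in
  let k = 53 - e in
  if k > 58 then invalid_arg "exact_decimal: x too small for native ints";
  let den = 1 lsl k in
  let buf = Buffer.create 64 in
  Buffer.add_string buf "0.";
  (* Long division: each step emits one decimal digit. It terminates
     because the denominator is a power of two. *)
  let rec digits r =
    if r <> 0 then begin
      let r10 = r * 10 in
      Buffer.add_char buf (Char.chr (Char.code '0' + r10 / den));
      digits (r10 mod den)
    end
  in
  if mant = 0 then Buffer.add_char buf '0' else digits mant;
  Buffer.contents buf

let () =
  print_endline (exact_decimal 0.5);
  print_endline (exact_decimal 0.1)
```

The second line printed is a 55-digit expansion, which illustrates both halves of the comment: the double nearest to 0.1 has a finite exact decimal form, but that form is not the decimal string "0.1".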
Just stumbled across this; I don't know how I missed it before. I ported Grisu / double-conversion from C++ to C here: https://github.com/flowtype/ocaml-dtoa (it would be awesome to have this in the stdlib instead).
Are there some performance measurements available for ocaml-dtoa? Looking at the code, it seems to be a hard sell:
Looking at the Errol paper, it looks like Grisu3 does not actually suffer from a hard failure on those inputs, just that it generates a suboptimal result with more numbers than necessary. Maybe that would actually be okay.
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.
Just for the sake of completeness, for https://arxiv.org/pdf/2101.11408.pdf (https://github.com/fastfloat/fast_float) (Pointed to me by @let-def, as I was contemplating a profiling trace that spent 50% of the time in macOS's |
Duplicate of #11360. There are many issues discussing float printing precision (including #10744), and a small documentation improvement was merged recently in #11353.
Original bug ID: 7218
Reporter: @braibant
Status: acknowledged (set by @damiendoligez on 2017-04-14T14:36:18Z)
Resolution: open
Priority: normal
Severity: feature
Category: standard library
Related to: #4688
Monitored by: @ygrek @jmeber @dbuenzli @alainfrisch
Bug description
The stdlib functions float_of_string and string_of_float are not round-tripping by default.
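A small illustration of the problem, assuming the current behaviour where `string_of_float` formats with 12 significant digits (`"%.12g"`):

```ocaml
(* 0.1 +. 0.2 is the double 0.30000000000000004..., not 0.3. *)
let x = 0.1 +. 0.2

(* string_of_float keeps 12 significant digits, so the difference is lost. *)
let s = string_of_float x

(* Reading the string back yields the double nearest to 0.3 instead of x. *)
let round_trips = float_of_string s = x

let () = Printf.printf "%s -> round-trips: %b\n" s round_trips
(* prints "0.3 -> round-trips: false" *)
```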
It would be nice if the output of string_of_float was guaranteed to be "correct" in the sense of [1,2] (that is, the output string always belongs to the interval described by the input float). It would also be nice if this output kept a "small" number of digits when possible.
It's possible to get an always correct result by changing
https://github.com/ocaml/ocaml/blob/trunk/otherlibs/threads/pervasives.ml#L266-L268
to increase the default precision from 12 to a sufficient value. However, this would not necessarily preserve the second desirable property of producing "small" outputs when possible.
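The trade-off can be sketched as follows. Printing with 17 significant digits is enough to round-trip any double, but it turns 0.1 into "0.10000000000000001". One simple workaround, shown here as a hypothetical helper `short_round_trip_string` (not a stdlib function), is to try increasing precisions and keep the first output that reads back to the same value. This is not Grisu or Errol, it does not guarantee the shortest possible string, and it assumes the platform's printf and strtod are correctly rounded.

```ocaml
(* Hypothetical helper: try precisions 12..17 and return the first
   string that float_of_string maps back to x. Precision 17 is the
   fallback (it is also what nan ends up with, since nan <> nan). *)
let short_round_trip_string (x : float) : string =
  let rec go p =
    let s = Printf.sprintf "%.*g" p x in
    if p >= 17 || float_of_string s = x then s else go (p + 1)
  in
  go 12

let () =
  List.iter
    (fun x ->
      Printf.printf "%%.17g: %.17g   short: %s\n" x (short_round_trip_string x))
    [0.1; 0.1 +. 0.2; epsilon_float]
```

Unlike `string_of_float`, this sketch does not append a trailing "." to integral values, so its output is not always a valid OCaml float literal. It also formats the value several times in the worst case, which is the kind of cost a dedicated algorithm avoids.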
(At this point, let me mention that it could make sense for the toplevel to also use an extended number of digits to avoid confusion.)
I am wondering to what extent it would make sense to embed a portable C float printing routine in the runtime? It could give more flexibility and make it easier to solve other small issues like the printing of float being locale dependent. I agree with X. Leroy in #6701 that it's an awful lot of code, but it might be worth the effort.
[1] http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
[2] http://cseweb.ucsd.edu/~lerner/papers/fp-printing-popl16.pdf