Float printing and round-trippability #7218

Closed
vicuna opened this issue Apr 8, 2016 · 14 comments

@vicuna

vicuna commented Apr 8, 2016

Original bug ID: 7218
Reporter: @braibant
Status: acknowledged (set by @damiendoligez on 2017-04-14T14:36:18Z)
Resolution: open
Priority: normal
Severity: feature
Category: standard library
Related to: #4688
Monitored by: @ygrek @jmeber @dbuenzli @alainfrisch

Bug description

The stdlib functions float_of_string and string_of_float do not round-trip by default.

# let x = string_of_float epsilon_float |> float_of_string in x = epsilon_float;;
- : bool = false

It would be nice if the output of string_of_float were guaranteed to be "correct" in the sense of [1,2] (that is, the output string always lies in the interval of real numbers that round to the input float). It would also be nice if this output kept a "small" number of digits when possible.

It's possible to get an always correct result by changing
https://github.com/ocaml/ocaml/blob/trunk/otherlibs/threads/pervasives.ml#L266-L268
to increase the default precision from 12 to a sufficient value (17 significant digits always suffice for an IEEE 754 double). However, this would not necessarily preserve the second desirable property of producing "small" outputs when possible.
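
To illustrate the trade-off between correctness and short output, here is a minimal sketch (not the stdlib implementation, and not one of the algorithms from [1,2]). It tries increasing `%g` precisions and keeps the first string that parses back to the same float. The name `round_trip_string` is hypothetical, and the result is only the shortest among the 15-, 16- and 17-digit candidates.

```ocaml
(* Hypothetical helper: print a float with the smallest of 15, 16 or 17
   significant digits that still parses back to the same value.
   17 digits are always enough for an IEEE 754 double, so the last
   branch needs no check. NaN never compares equal to itself and
   therefore also ends up in the last branch. *)
let round_trip_string (x : float) : string =
  let s15 = Printf.sprintf "%.15g" x in
  if float_of_string s15 = x then s15
  else
    let s16 = Printf.sprintf "%.16g" x in
    if float_of_string s16 = x then s16
    else Printf.sprintf "%.17g" x

let () =
  (* Short values stay short; awkward ones get the digits they need. *)
  print_endline (round_trip_string 0.1);
  print_endline (round_trip_string epsilon_float)
```

This costs up to three formatting calls and two parses per float, which is one reason a dedicated printing algorithm may be preferable.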

(At this point, let me mention that it could make sense for the toplevel to also use an extended number of digits to avoid confusion.

# let x = float_of_string "0.1000000000000002" in x;;
- : float = 0.1 
# let x = float_of_string "0.1000000000000002" in let y = float_of_string "0.1" in x = y;;
- : bool = false    

)
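
The confusion can be reproduced outside the toplevel. The sketch below assumes the toplevel output above corresponds to a 12-significant-digit format such as `%.12g`; it shows that the two floats are distinct but print identically at 12 digits, and differently at 17. The names `same_at_12` and `same_at_17` are only for illustration.

```ocaml
let x = float_of_string "0.1000000000000002"
let y = float_of_string "0.1"

(* With 12 significant digits the two values are indistinguishable. *)
let same_at_12 = Printf.sprintf "%.12g" x = Printf.sprintf "%.12g" y

(* With 17 significant digits, distinct doubles get distinct strings. *)
let same_at_17 = Printf.sprintf "%.17g" x = Printf.sprintf "%.17g" y

let () =
  Printf.printf "equal as floats: %b\n" (x = y);
  Printf.printf "same text at %%.12g: %b\n" same_at_12;
  Printf.printf "same text at %%.17g: %b\n" same_at_17
```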

I am wondering to what extent it would make sense to embed a portable C float printing routine in the runtime. It could give more flexibility and make it easier to solve other small issues, like the printing of floats being locale-dependent. I agree with X. Leroy in #6701 that it's an awful lot of code, but it might be worth the effort.

[1] http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
[2] http://cseweb.ucsd.edu/~lerner/papers/fp-printing-popl16.pdf

@vicuna
Author

vicuna commented Apr 8, 2016

Comment author: @gasche

Note that the new hexadecimal representation in 4.03 allows pixel-perfect representation of float values, and thus roundtripping:

# Scanf.sscanf
  (Printf.sprintf "%h" epsilon_float)
  "%h" (fun x -> x = epsilon_float);;
- : bool = true

@vicuna
Author

vicuna commented Apr 8, 2016

Comment author: braibant

That was also the case before, using conversion to int64:

# (epsilon_float |> Int64.bits_of_float |> Int64.to_string |> Int64.of_string |> Int64.float_of_bits) = epsilon_float;;

However, this is not really usable when humans need to be able to read and interpret those values (e.g., stored in configuration files).

@vicuna
Author

vicuna commented Apr 8, 2016

Comment author: @gasche

Well the hexadecimal notation is actually rather readable (see examples below) and should be used whenever you care about precision or specific values. But I don't disagree with your point -- I'd vote to wait for someone to be motivated to implement one of the shiny float-printing algorithms and see how it fares performance-wise.

# List.iter (fun x -> Printf.printf "%14.2f = %h\n" x x)
  [0.; -0.; epsilon_float;
   1.; -1.;
   1.3; 1.5;
   2. ** 32.; 2. ** 32. -. 1.];;

          0.00 = 0x0p+0
         -0.00 = -0x0p+0
          0.00 = 0x1p-52
          1.00 = 0x1p+0
         -1.00 = -0x1p+0
          1.30 = 0x1.4cccccccccccdp+0
          1.50 = 0x1.8p+0
 4294967296.00 = 0x1p+32
 4294967295.00 = 0x1.fffffffep+31
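
As a sanity check on the table above, the following sketch verifies that each of these values survives a `%h` round trip bit for bit. It assumes OCaml 4.03 or later, where `float_of_string` accepts hexadecimal literals; the helper name `hex_round_trips` is hypothetical. Comparing the `Int64` bit patterns rather than using `=` is what distinguishes `0.` from `-0.`.

```ocaml
(* Hypothetical helper: true if printing with %h and parsing back
   yields exactly the same bit pattern. *)
let hex_round_trips (x : float) : bool =
  let s = Printf.sprintf "%h" x in
  Int64.equal
    (Int64.bits_of_float (float_of_string s))
    (Int64.bits_of_float x)

let () =
  List.iter
    (fun x -> assert (hex_round_trips x))
    [0.; -0.; epsilon_float;
     1.; -1.;
     1.3; 1.5;
     2. ** 32.; 2. ** 32. -. 1.]
```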

@vicuna
Author

vicuna commented Apr 8, 2016

Comment author: braibant

I will probably play a bit with the 2010 Grisu algorithm and see how it fares performance-wise.

@vicuna
Author

vicuna commented Apr 8, 2016

Comment author: @dbuenzli

If someone wants to play, note that the 2016 paper braibant linked to has MIT-licensed code here:

https://github.com/marcandrysco/Errol

and uses only C. The code for the first paper is at https://github.com/google/double-conversion but uses C++.

Having strtod/dtoa in the runtime system would solve the various locale dependency problems and help people that are running the OCaml system on bare metal or as virtual machines.

In the latter context it's a recurring problem that it would be nice to solve at this level, since after this and the snprintf usage through caml_alloc_sprintf, the remaining C that is needed to be able to compile the runtime system is quite minimal (see e.g. https://github.com/dbuenzli/rpi-boot-ocaml/tree/master/libc-ocaml).

@vicuna
Author

vicuna commented Apr 8, 2016

Comment author: braibant

@dbuenzli Actually, this is a good point: apparently, Grisu is hard to build right (which is why Errol was initially measured to be faster). Maybe the ease of integration should weigh more than the relative performance between the two.

@vicuna
Author

vicuna commented Apr 8, 2016

Comment author: @dbuenzli

Rather than performance, the advantage I see in Grisu is that it seems to have been widely adopted by JavaScript engines, which means that it's very well tested.

However, Errol's code base seems more approachable (and I guess C++ is simply a no-go for the runtime system), and according to the paper we'd still get significant speedups unless the system's libc libraries have switched to Grisu (I don't know).

@vicuna
Author

vicuna commented Apr 11, 2016

Comment author: @stedolan

> Well the hexadecimal notation is actually rather readable (see examples below) and should be used whenever you care about precision or specific values.

I disagree somewhat with the readability point, and entirely with the precision point: the value of every finite IEEE 754 double can be represented precisely as a finite decimal string. (The converse, that every finite decimal string can be represented precisely as a double, is of course not true.)
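
The reason is that a finite double is an integer divided by a power of two, and each division by two adds at most one decimal digit. A minimal sketch of that argument, using only integer arithmetic: the function name `exact_decimal` is hypothetical, and it assumes a 64-bit OCaml and an input in [0, 1) whose denominator is at most 2^58, so that the intermediate products fit in a native `int`.

```ocaml
(* Hypothetical helper: exact decimal expansion of a float in [0, 1).
   Assumptions: 63-bit native ints, and x = n / 2^k with k <= 58,
   so that 10 * n cannot overflow. *)
let exact_decimal (x : float) : string =
  assert (x >= 0. && x < 1.);
  let m, e = frexp x in                  (* x = m * 2^e, 0.5 <= m < 1 or m = 0 *)
  let n = int_of_float (ldexp m 53) in   (* 53-bit integer significand *)
  let k = 53 - e in                      (* x = n / 2^k, with n < 2^k *)
  assert (k <= 58);
  let mask = (1 lsl k) - 1 in
  let buf = Buffer.create 64 in
  Buffer.add_string buf "0.";
  (* Long multiplication by 10: the bits above position k are the next
     decimal digit, the bits below are the remaining fraction. Each step
     adds a factor of 2 to the remainder, so it reaches 0 within k steps. *)
  let rec go r =
    if r <> 0 then begin
      let r10 = r * 10 in
      Buffer.add_char buf (Char.chr (Char.code '0' + (r10 lsr k)));
      go (r10 land mask)
    end
  in
  go n;
  Buffer.contents buf

let () = print_endline (exact_decimal 0.1)
```

For 0.1 this prints a fraction of 55 digits, which is exact but clearly not what one wants to read in a configuration file.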

@mroch

mroch commented Feb 10, 2021

just stumbled across this, I don't know how I missed it before. I ported Grisu / double-conversion from C++ to C here: https://github.com/flowtype/ocaml-dtoa

would be awesome to have in stdlib instead

@gasche
Member

gasche commented Feb 10, 2021

Are there some performance measurements available for ocaml-dtoa?

Looking at the code, it seems to be a hard sell:

  • The implementation looks complex (but I guess all those implementations are complex anyway).
  • It is not complete, it will fail on some floating-point values, so we need a fallback algorithm (in particular, using the system/libc formatter as a fallback negates the consistency benefits of switching to a non-system implementation).

Looking at the Errol paper, it looks like Grisu3 does not actually suffer from a hard failure on those inputs, just that it generates a suboptimal result with more digits than necessary. Maybe that would actually be okay.

@github-actions

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@dbuenzli
Contributor

Just for the sake of completeness, for float_of_string there is also

https://arxiv.org/pdf/2101.11408.pdf (https://github.com/fastfloat/fast_float)

(Pointed out to me by @let-def, as I was contemplating a profiling trace that spent 50% of the time in macOS's strtod function – of course the problem is rather that people should not use XML to serialize a million floats :-).

@edwintorok
Contributor

Duplicate of #11360, there are many issues discussing float printing precision (including #10744), and there was a small documentation improvement merged recently #11353.
Probably best to link these issues together (might need someone with write access to mark as duplicate according to https://docs.github.com/en/issues/tracking-your-work-with-issues/marking-issues-or-pull-requests-as-a-duplicate), keep only one issue open and close the others.

@nojb
Contributor

nojb commented Dec 25, 2022

Duplicate of #11360

@nojb marked this as a duplicate of #11360 Dec 25, 2022
@nojb closed this as not planned Dec 25, 2022