Float printing and round-trippability #7218

vicuna · 2016-04-08T10:12:57Z

Original bug ID: 7218
Reporter: @braibant
Status: acknowledged (set by @damiendoligez on 2017-04-14T14:36:18Z)
Resolution: open
Priority: normal
Severity: feature
Category: standard library
Related to: #4688
Monitored by: @ygrek @jmeber @dbuenzli @alainfrisch

Bug description

The stdlib functions float_of_string and string_of_float are not round-tripping by default.

# let x = string_of_float epsilon_float |> float_of_string in x = epsilon_float;;
- : bool = false

It would be nice if the output of string_of_float was guaranteed to be "correct" in the sense of [1,2] (that is, the output string always belongs to the interval described by the input float). It would also be nice if this output kept a "small" number of digits when possible.

It's possible to get an always correct result by changing
https://github.com/ocaml/ocaml/blob/trunk/otherlibs/threads/pervasives.ml#L266-L268
to increase the default precision from 12 to a sufficient value. However, this would not necessarily preserve the second desirable property of producing "small" outputs when possible.
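
As an illustration of the trade-off described above, here is a minimal sketch (not the stdlib implementation) that tries increasing precisions and stops at the first one that parses back to the same float. The name round_trip_string is hypothetical. The sketch assumes the C library's printf and strtod round correctly, in which case 17 significant digits always suffice for a finite IEEE 754 double. It keeps outputs short when possible, but it does not guarantee the shortest possible digit string in the sense of [1,2].

```ocaml
(* Hypothetical helper: print with 12, 13, ... significant digits and
   return the first string that parses back to the same float. At 17
   digits we stop unconditionally; nan also ends up there, because
   nan = nan is false. *)
let round_trip_string (x : float) : string =
  let rec try_precision p =
    let s = Printf.sprintf "%.*g" p x in
    if p >= 17 || float_of_string s = x then s
    else try_precision (p + 1)
  in
  try_precision 12

let () =
  let x = epsilon_float in
  (* The printed form parses back to exactly the same float. *)
  assert (float_of_string (round_trip_string x) = x);
  (* Values that need few digits still print compactly. *)
  print_endline (round_trip_string 0.1)
```

The cost of this approach is up to six formatting and parsing passes per float, which is one reason a dedicated algorithm such as those in [1,2] may be preferable.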

(At this point, let me mention that it could make sense for the toplevel to also use an extended number of digits to avoid confusion.

# let x = float_of_string "0.1000000000000002" in x;;
- : float = 0.1 
# let x = float_of_string "0.1000000000000002" in let y = float_of_string "0.1" in x = y;;
- : bool = false

)

I am wondering to what extent it would make sense to embed a portable C float printing routine in the runtime? It could give more flexibility and make it easier to solve other small issues like the printing of float being locale dependent. I agree with X. Leroy in #6701 that it's an awful lot of code, but it might be worth the effort.

1 http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
2 http://cseweb.ucsd.edu/~lerner/papers/fp-printing-popl16.pdf


vicuna · 2016-04-08T12:47:10Z

Comment author: @gasche

Note that the new hexadecimal representation in 4.03 allows pixel-perfect representation of float values, and thus roundtripping:

# Scanf.sscanf
  (Printf.sprintf "%h" epsilon_float)
  "%h" (fun x -> x = epsilon_float);;
- : bool = true

vicuna · 2016-04-08T12:55:15Z

Comment author: braibant

That was also the case before, using conversion to int64

# (epsilon_float |> Int64.bits_of_float |> Int64.to_string |> Int64.of_string |> Int64.float_of_bits) = epsilon_float;;

However, this is not really usable when humans need to be able to read and interpret those values (e.g., stored in configuration files).

vicuna · 2016-04-08T13:03:44Z

Comment author: @gasche

Well, the hexadecimal notation is actually rather readable (see examples below) and should be used whenever you care about precision or specific values. But I don't disagree with your point -- I'd vote to wait for someone to be motivated to implement one of the shiny float-printing algorithms and see how it fares performance-wise.

# List.iter (fun x -> Printf.printf "%14.2f = %h\n" x x)
  [0.; -0.; epsilon_float;
   1.; -1.;
   1.3; 1.5;
   2. ** 32.; 2. ** 32. -. 1.];;

          0.00 = 0x0p+0
         -0.00 = -0x0p+0
          0.00 = 0x1p-52
          1.00 = 0x1p+0
         -1.00 = -0x1p+0
          1.30 = 0x1.4cccccccccccdp+0
          1.50 = 0x1.8p+0
 4294967296.00 = 0x1p+32
 4294967295.00 = 0x1.fffffffep+31
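
For storing floats in text while keeping every bit, the hexadecimal notation shown above can be wrapped in a pair of helpers. This is only a sketch: the names to_hex and of_hex are hypothetical, and it assumes OCaml 4.03 or later, where Printf supports "%h" and float_of_string accepts hexadecimal float literals.

```ocaml
(* Hypothetical helpers: write a float in hexadecimal notation and read
   it back. Assumes OCaml >= 4.03. *)
let to_hex (x : float) : string = Printf.sprintf "%h" x

let of_hex (s : string) : float = float_of_string s

let () =
  (* Every finite value in this list survives the round trip. *)
  List.iter
    (fun x -> assert (of_hex (to_hex x) = x))
    [0.; epsilon_float; 1.; -1.; 1.3; 1.5; 2. ** 32.; max_float; min_float]
```

Note that the comparison uses (=) on floats, so it does not distinguish 0. from -0.; checking the sign bit would need Int64.bits_of_float.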

vicuna · 2016-04-08T13:15:18Z

Comment author: braibant

I will probably play a bit with the 2010 Grisu algorithm and see how it fares performance-wise.

vicuna · 2016-04-08T13:41:35Z

Comment author: @dbuenzli

If someone wants to play, note that the 2016 paper braibant linked to has MIT-licensed code here:

https://github.com/marcandrysco/Errol

and uses only C. The code for the first paper is here https://github.com/google/double-conversion but uses C++.

Having strtod/dtoa in the runtime system would solve the various locale dependency problems and help people that are running the OCaml system on bare metal or as virtual machines.

In the latter context it's a recurring problem that it would be nice to solve at this level, since after this and the snprintf usage through caml_alloc_sprintf, the remaining C that is needed to be able to compile the runtime system is quite minimal (see e.g. https://github.com/dbuenzli/rpi-boot-ocaml/tree/master/libc-ocaml).

vicuna · 2016-04-08T14:19:59Z

Comment author: braibant

@dbuenzli Actually, this is a good point: apparently, Grisu is hard to build right (which is why Errol was initially measured to be faster). Maybe the ease of integration should weigh more than the relative performance between the two.

vicuna · 2016-04-08T14:45:44Z

Comment author: @dbuenzli

Rather than performance, the advantage I see in Grisu is that it seems to have been widely adopted by JavaScript engines, which means that it's very well tested.

However Errol's code base seems more approachable (and I guess C++ is simply a no go for the runtime system) and according to the paper we'd still get significant speed ups unless the system's libc libraries switched to Grisu (I don't know).

vicuna · 2016-04-11T15:29:36Z

Comment author: @stedolan

> Well the hexadecimal notation is actually rather readable (see examples below) and should be used whenever you care about precision or specific values.

I disagree somewhat with the readability point, and entirely with the precision point: the value of every finite IEEE754 double can be represented precisely as a finite decimal string. (The converse, that every finite decimal string can be represented precisely as a double, is of course not true).
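
To make the first claim concrete, here is a small sketch that asks printf for far more digits than the default. The name exact_decimal is hypothetical. It relies on an assumption about the platform: the C library must print exact digits at high precision (glibc does; other C libraries may stop early or pad with zeros). A finite double needs at most 767 significant decimal digits, and "%g" strips trailing zeros.

```ocaml
(* Hypothetical helper: the full decimal expansion of a finite double.
   Assumes the platform's printf produces exact digits (true for glibc). *)
let exact_decimal (x : float) : string = Printf.sprintf "%.767g" x

let () =
  (* 0.5 is a power of two, so its expansion is short. *)
  print_endline (exact_decimal 0.5);
  (* 0.1 is not a double; this prints the value of the nearest double. *)
  print_endline (exact_decimal 0.1)
```

With glibc the second line should read 0.1000000000000000055511151231257827021181583404541015625, which is finite and exact but clearly not what a human wants in a configuration file.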

mroch · 2021-02-10T01:59:46Z

just stumbled across this, I don't know how I missed it before. I ported Grisu / double-conversion from C++ to C here: https://github.com/flowtype/ocaml-dtoa

would be awesome to have in stdlib instead

gasche · 2021-02-10T07:09:56Z

Are there some performance measurements available for ocaml-dtoa?

Looking at the code, it seems to be a hard sell:

The implementation looks complex (but I guess all those implementations are complex anyway).
It is not complete, it will fail on some floating-point values, so we need a fallback algorithm (in particular, using the system/libc formatter as a fallback negates the consistency benefits of switching to a non-system implementation).

Looking at the Errol paper, it looks like Grisu3 does not actually suffer from a hard failure on those inputs; it just generates a suboptimal result with more digits than necessary. Maybe that would actually be okay.

github-actions · 2022-02-14T04:25:14Z

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

dbuenzli · 2022-07-24T23:17:07Z

Just for the sake of completeness, for float_of_string there is also

https://arxiv.org/pdf/2101.11408.pdf (https://github.com/fastfloat/fast_float)

(Pointed out to me by @let-def, as I was contemplating a profiling trace that spent 50% of the time in macOS's strtod function – of course the problem is rather that people should not use XML to serialize a million floats :-).

edwintorok · 2022-12-24T23:41:10Z

Duplicate of #11360, there are many issues discussing float printing precision (including #10744), and there was a small documentation improvement merged recently #11353.
Probably best to link these issues together (might need someone with write access to mark as duplicate according to https://docs.github.com/en/issues/tracking-your-work-with-issues/marking-issues-or-pull-requests-as-a-duplicate), keep only one issue open and close the others.

nojb · 2022-12-25T02:18:33Z

Duplicate of #11360

vicuna added the stdlib label Mar 14, 2019

vicuna mentioned this issue Sep 24, 2017

Special floating-point values aren't converted to strings correctly under Windows #4688

Closed

vicuna added the feature-wish label Mar 20, 2019

github-actions bot added the Stale label Feb 14, 2022

gasche removed the Stale label Feb 14, 2022

dbuenzli mentioned this issue Jul 20, 2022

Increase the precision used for string_of_float #11360

Closed

nojb marked this as a duplicate of #11360 Dec 25, 2022

nojb closed this as not planned (duplicate) Dec 25, 2022
