Mantis Bug Tracker

ID: 0007724
Project: OCaml
Category: runtime system and C interface
View Status: public
Date Submitted: 2018-02-12 16:55
Last Update: 2018-02-19 10:33
Reporter: sbleazard
Assigned To:
Priority: low
Severity: feature
Reproducibility: always
Status: acknowledged
Resolution: open
Summary: 0007724: Performance improvements when printing integers stored in floats
Description: When converting numbers to strings, string_of_float has a significant performance cost on the x86 architecture when the number holds an integer value. Detecting integer values and using string_of_int significantly improves performance. Thus, using

let f2s f =
  let i = int_of_float f in
  let f1 = float_of_int i in
  if f1 = f then string_of_int i
  else string_of_float f

has a cost of around 6.5% for non-integer floats but gives around a 4x performance improvement for integer values. Here are comparisons of plain string_of_float and f2s for 1,000,000 conversions on a 64-bit x86 machine:

string_of_float

1,552,011,783 cycles
2,958,207,469 instructions

f2s

  363,014,703 cycles
  945,099,124 instructions

Similar improvements would be expected on other architectures.
Tags: No tags attached.

-  Notes
(0018871)
xclerc (reporter)
2018-02-13 14:54
edited on: 2018-02-13 15:07

We can substantially lower the overhead by doing
the check in C (thus avoiding tagging/untagging the
integer value) with something along the lines of:

--- a/byterun/floats.c
+++ b/byterun/floats.c
@@ -95,11 +95,17 @@ CAMLprim value caml_format_float(value fmt, value arg)
 {
   value res;
   double d = Double_val(arg);
+  intnat i;
 
 #ifdef HAS_BROKEN_PRINTF
   if (isfinite(d)) {
 #endif
-    res = caml_alloc_sprintf(String_val(fmt), d);
+    i = (intnat) d;
+    if (d == (double) i) {
+      res = caml_alloc_sprintf("%ld", i); /* TODO: should use ARCH_INTNAT_PRINTF_FORMAT */
+    } else {
+      res = caml_alloc_sprintf(String_val(fmt), d);
+    }
 #ifdef HAS_BROKEN_PRINTF
   } else {
     if (isnan(d)) {


With a dummy test program generating and converting
numbers, I get:

old implementation with "really float" values:

    24,129,906,778 cycles
    37,797,606,083 instructions

new implementation with "really float" values:

    24,079,087,367 cycles
    37,844,377,075 instructions

old implementation with "actually int" values:

    12,010,671,909 cycles
    23,996,041,476 instructions

new implementation with "actually int" values:

     4,777,066,875 cycles
    10,938,940,262 instructions

(0018873)
gasche (developer)
2018-02-13 17:21

I think that with either proposal the output changes: (string_of_float 1.) returns "1.", not "1". Can you add the additional dot in your measurements?

(Also, if I read the cycle numbers correctly, the patch also decreased the cycle count (but not the instruction count) on "really float" values; how is this possible?)
(0018874)
xclerc (reporter)
2018-02-13 23:33
edited on: 2018-02-13 23:53

> I think that with either proposal the output changes: (string_of_float 1.) returns "1.", not "1". Can you add the additional dot in your measurements?

I beg to disagree: since the patch to `caml_format_float`
changes only the C side, the returned result still goes through
`valid_float_lexem`, which adds the missing dot (because
the libc would not: `printf("%.12g", 1.)` outputs "1").
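
For readers unfamiliar with that step, here is a minimal C sketch of the dot-appending behaviour described above. It assumes the rule is "append a dot when the string contains only digits and '-'"; the function name and signature are hypothetical, and this is a transcription for illustration, not the stdlib code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (name and signature are illustrative only).
   Appends '.' to s when s consists solely of digits and '-', i.e. when
   it would otherwise read as an integer literal. cap is the capacity of
   the buffer that holds s. */
static void add_dot_if_integer_lexeme(char *s, size_t cap)
{
  size_t n = strlen(s);
  for (size_t k = 0; k < n; k++) {
    if (!((s[k] >= '0' && s[k] <= '9') || s[k] == '-'))
      return;             /* already contains '.', 'e', "nan", "inf", ... */
  }
  if (n + 2 <= cap) {     /* room for '.' and the terminating NUL */
    s[n] = '.';
    s[n + 1] = '\0';
  }
}
```

Applied to the output of `snprintf(buf, sizeof buf, "%.12g", 1.)`, this turns "1" into "1." and leaves a string such as "1.5" untouched.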

> (Also, if I read the cycle numbers correctly, the patch also decreased the cycle count (but not the instruction count) on "really float" values; how is this possible?)

I will not go as far as to say it did not surprise me, but
given that modern CPUs have multiple (yet non-uniform)
ALUs, my understanding is that we should not expect to
observe a constant IPC (instructions per cycle).

(0018881)
xleroy (administrator)
2018-02-17 18:02

The patch to caml_format_float proposed by @xclerc is incorrect: the required format (e.g. "%5.2f") is not honored in the it-is-an-integer path.

The patch proposed by @sbleazard is probably correct. One could wonder why this optimization is not done by the C standard library functions, if it gives such impressive speedups. On the other hand, sprintf has to cope with all sorts of FP formats, so the optimization would be harder to exploit.

Speaking of the C standard library: which version is used for the timings? Performance of e.g. glibc and MSVC-CRT differs widely...
(0018889)
xclerc (reporter)
2018-02-19 10:33

> The patch to caml_format_float proposed by @xclerc is incorrect: the required format (e.g. "%5.2f") is not honored in the it-is-an-integer path.

Of course; the patch was not meant to be a drop-in replacement for
`caml_format_float` in general, but only for `caml_format_float` when
used to implement `string_of_float` (where the passed format is
always "%.12g").


> Speaking of the C standard library: which version is used for the timings? Performance of e.g. glibc and MSVC-CRT differs widely...

I used glibc (2.17).
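
A note on when an integer fast path can stand in for "%.12g" without changing the output: with precision 12, `%g` switches to exponent notation once the value needs more than 12 integer digits (so 1e12 prints as "1e+12"), and negative zero prints as "-0". The standalone sketch below is one possible way to guard the fast path accordingly. The function name is hypothetical, it uses plain `snprintf` rather than the runtime's `caml_alloc_sprintf`, and it has not been benchmarked against the figures in this thread.

```c
#include <math.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the OCaml runtime: formats d into buf
   and takes the integer path only where the result matches "%.12g".
   Returns what snprintf returns. */
static int format_g12(char *buf, size_t len, double d)
{
  /* |d| < 1e12 keeps the value within 12 significant digits, so "%.12g"
     would not use exponent notation; it also keeps the cast to long long
     well defined. NaN and infinities fail the test and fall through. */
  if (d > -1e12 && d < 1e12) {
    long long i = (long long) d;
    /* Skip negative zero: "%.12g" prints "-0", the integer path "0". */
    if ((double) i == d && !(i == 0 && signbit(d)))
      return snprintf(buf, len, "%lld", i);
  }
  return snprintf(buf, len, "%.12g", d);
}
```

The range check also sidesteps the out-of-range double-to-integer conversion, which is undefined behaviour in C.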

- Issue History
Date Modified Username Field Change
2018-02-12 16:55 sbleazard New Issue
2018-02-13 14:54 xclerc Note Added: 0018871
2018-02-13 15:07 xclerc Note Edited: 0018871
2018-02-13 15:07 xclerc Note Edited: 0018871
2018-02-13 17:21 gasche Note Added: 0018873
2018-02-13 23:33 xclerc Note Added: 0018874
2018-02-13 23:53 xclerc Note Edited: 0018874
2018-02-17 18:02 xleroy Note Added: 0018881
2018-02-17 18:02 xleroy Status new => acknowledged
2018-02-19 10:33 xclerc Note Added: 0018889

