New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements when printing integers stored in floats #7724
Comments
Comment author: @xclerc We can substantially lower the overhead by doing --- a/byterun/floats.c
+++ b/byterun/floats.c
@@ -95,11 +95,17 @@ CAMLprim value caml_format_float(value fmt, value arg)
{
value res;
double d = Double_val(arg);
+ intnat i;
#ifdef HAS_BROKEN_PRINTF
if (isfinite(d)) {
#endif
- res = caml_alloc_sprintf(String_val(fmt), d);
+ i = (intnat) d;
+ if (d == (double) i) {
+ res = caml_alloc_sprintf("%ld", i); /* TODO: should use ARCH_INTNAT_PRINTF_FORMAT */
+ } else {
+ res = caml_alloc_sprintf(String_val(fmt), d);
+ }
#ifdef HAS_BROKEN_PRINTF
} else {
if (isnan(d)) { With a dummy test program generating and converting old implementation with "really float" values:
new implementation with "really float" values:
old implementation with "actually int" values:
new implementation with "actually int" values:
|
Comment author: @gasche I think that with either proposal the output changes: (string_of_float 1.) returns "1.", not "1". Can you add the additional dot in your measurements? (Also, if I read the cycle numbers correct, the patch also decreased the cycle count (but not the instruction count) on "really float" values, how is this possible?) |
Comment author: @xclerc
I beg to disagree; the patch to
I will not go as far as to say it did not surprise me, but |
Comment author: @xavierleroy The patch to caml_format_float proposed by @xclerc is incorrect: the required format (e.g. "%5.2f") is not honored in the it-is-an-integer path. The patch proposed by @sbleazard is probably correct. One could wonder why this optimization is not done by the C standard library functions, if it gives such impressive speedups. On the other hand, sprintf has to cope with all sorts of FP formats, so the optimization would be harder to exploit. Speaking of the C standard library: which version is used for the timings? Performance of e.g. glibc and MSVC-CRT differ widely... |
Comment author: @xclerc
Of course; the patch was not meant to be a drop-in replacement for
I used glibc (2.17). |
Comment author: sbleazard The flowing re-implements string_of_float in C and adding NaN/Infinity support CAMLprim value caml_string_of_float_fast_int(value arg)
{
CAMLparam1(arg);
CAMLlocal1(res);
double d = Double_val(arg);
long long int l;
switch (fpclassify(d)) {
case FP_NAN:
res = caml_copy_string("nan");
break;
case FP_INFINITE:
if (d > 0)
res = caml_copy_string("inf");
else
res = caml_copy_string("-inf");
break;
default:
l = (long long int)d;
if (d == (double)l)
res = caml_alloc_sprintf("%lld.", l);
else
res = caml_alloc_sprintf("%.12g", d);
break;
}
CAMLreturn(res);
} |
Comment author: sbleazard Bench testing caml_string_of_float_fast_int with Core_bench gives the following results for 1000 conversions bench tested over 10s:
Note that in the case of a floating point number the cost of the test is negligable (1.5%) in the case of small floats but string_of_float_fast_int is 6% faster with larger (11 digits before the decimal) floats. Same test but using 11111111111.1 instead of 1.1:
I suspect valid_float_lexem is adding a significant cost |
Comment author: sbleazard Mantis does not display boxes well, here's another run with the results in ascii:
Same test but using 11111111111.1 instead of 1.1
|
Comment author: @alainfrisch FWIW, we use a similar approach for speeding up printing of floats, also supporting floats which can be printed with few decimal digits. I was also surprised by the gain of the approach (in the fast path case), esp. for the MSVC libc, but also glibc to a lesser extent. Here is our current code, which implements printing with full precision: static char format_buffer[100] = "(";
CAMLprim value caml_lexem_format_float(value v_paren, value v_arg)
{
const int shift = 5;
double d = Double_val(v_arg);
char *p = &format_buffer[1], *pp;
long k;
int zeroes, n;
if (isfinite(d)) {
/* Fast path when the double can be printed with a small number
of decimal digits.
We multiply by a bit more than 100000 to handle numbers
such as -0.00121
*/
long i = (long)(d * (double) 100000.00001);
if ((double) i / (double) 100000. == d) {
int neg = 0;
if (i == 0) {
if (1/d > 0) return caml_copy_string("0.");
else if (Bool_val(v_paren)) return caml_copy_string("(-0.)");
else return caml_copy_string("-0.");
}
pp = p;
/* Print the '-' sign if needed */
if (i < 0) {
neg = 1;
*(pp++) = '-';
i = -i;
/* if i = -2^32 (resp. 64), then -i = i... */
if (i < 0) goto fallback;
}
/* Count trailing 0s */
k = i;
zeroes = 0;
while (k % 10 == 0 && zeroes < shift) { k /= 10; zeroes++; }
/* Remember the part without trailing 0s */
i = k;
/* Count actual digits to be printed */
n = 0;
while (k > 0) { k /= 10; n++; }
/* At least (shift+1) digits, including trailing 0s (to force at least one digit before '.') */
if (n + zeroes <= shift) n = shift + 1 - zeroes;
/* Move to the end, write the final \0 (and ')' if needed) */
pp += n;
n -= (shift - zeroes);
if (neg && Bool_val(v_paren)) {
*(pp + 1)= ')';
*(pp + 2) = 0;
p--; /* This will return the string starting at '(' */
} else {
*(pp + 1)= 0;
}
/* Write the fractional part */
while (zeroes < shift) {
zeroes++;
*(pp--) = '0' + (i % 10);
i /= 10;
}
*(pp--) = '.';
/* Write the integral part */
while (n > 1) {
*(pp--) = '0' + (i % 10);
i /= 10;
n--;
}
*(pp--) = '0' + i;
return caml_copy_string(p);
}
else {
fallback:
sprintf(p, "%.15g", d);
if (strtod(p, NULL) != d)
sprintf(p, "%.17g", d);
/* Add '.' if needed */
for( ; ; p++) {
if (*p == 0) { *p = '.'; *(++p) = 0; break;}
else if ((*p == '.') || (*p == 'e')) {break;}
}
if (format_buffer[1] == '-' && Bool_val(v_paren)) {
/* Add ')' if needed (the initial '(' is already there) */
while (*p != 0) { p++; };
*p = ')'; *(++p) = 0;
return caml_copy_string(format_buffer);
} else
return caml_copy_string(&format_buffer[1]);
}
} else {
/* Could use pre-allocated values for nan/infinity */
if (isnan(d)) return caml_copy_string("nan");
else if (d > 0) return caml_copy_string("infinity");
else return caml_copy_string("neg_infinity");
}
} Other possible ways to optimize such printing code:
|
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
I have the impression that this optimization could make a sizeable difference for many user programs. (Excepting that numbers are floats is unfortunately common these days thanks to Javascript and JSON.) There are several proposals in this thread, I like this one ( (@alainfrisch's code may be faster and appropriate in other cases as well, it is worth measuring, and there are new algorithms proposed from time to time to print floating-point numbers. But I'm less excited about having to review all this code and maintain our own float-printing implementation in the runtime.) |
Not tested, but I seems it changes the result of |
Ouch! I have not actually reviewed this code yet; so "seems correct" was a bit of an exaggeration. |
Original bug ID: 7724
Reporter: sbleazard
Status: acknowledged (set by @xavierleroy on 2018-02-17T17:02:46Z)
Resolution: open
Priority: low
Severity: feature
Category: runtime system and C interface
Monitored by: sbleazard @nojb @hhugo @gasche @hcarty
Bug description
When converting numbers to strings string_of_float has a significant performance impact on the X86 architecture when the number is an INT. Detecting INTs and using string_of_int significantly improves performance. Thus, using
Has a cost of around 6.5% for floats but results in around 4x performance improvement for INTs. Here are comparisons of straight string_of_float and f2s for 1,000,000 conversions on a 64bit X86 machine:
Similar improvements would be expected on other architectures.
The text was updated successfully, but these errors were encountered: