let f a (x : float) =
  for i = 1 to 1000 do
    for j = 0 to Array.length a - 1 do
      a.(j) <- x;
      a.(j) <- x;
      a.(j) <- x;
      a.(j) <- x
    done
  done

let () =
  let a = Array.make 1024 0. in
  for i = 1 to 1000 do
    f a (float i)
  done
gives the following results:
Before patch, -unsafe mode: 1.8s
Before patch, safe mode: 3.2s
After patch, -unsafe mode: 1.8s
After patch, safe mode: 2.1s
which shows that most of the bounds-checking overhead comes from the spill.
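The issue does not show how the timings were taken. One possible harness is sketched below. It uses `Sys.time` from the standard library; `time_it`, `fill`, `run` and the `~reps`/`~calls` parameters are hypothetical names introduced here so that the iteration counts can be reduced. The safe and `-unsafe` figures would come from compiling the same file twice, with and without the `-unsafe` flag.

```ocaml
(* Times a thunk with Sys.time (processor time, in seconds) and
   returns the elapsed time. Hypothetical helper, not from the issue. *)
let time_it (label : string) (f : unit -> unit) : float =
  let t0 = Sys.time () in
  f ();
  let dt = Sys.time () -. t0 in
  Printf.printf "%s: %.2fs\n" label dt;
  dt

(* Same shape as the benchmarked loop: four bounds-checked stores of
   the same float per inner iteration. [reps] replaces the fixed 1000. *)
let fill ~reps a (x : float) =
  for _rep = 1 to reps do
    for j = 0 to Array.length a - 1 do
      a.(j) <- x;
      a.(j) <- x;
      a.(j) <- x;
      a.(j) <- x
    done
  done

(* Runs [calls] calls of [fill] on a 1024-element float array and
   returns the array so the result can be inspected. *)
let run ~reps ~calls =
  let a = Array.make 1024 0. in
  let _elapsed =
    time_it "fill" (fun () ->
      for i = 1 to calls do
        fill ~reps a (float i)
      done)
  in
  a
```

Calling `run ~reps:1000 ~calls:1000` reproduces the iteration counts of the benchmark above; smaller values are enough to check that the harness works.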
I strongly advocate including this patch: it gives big speedups (on x86) for a very common operation in numeric code (in-place modification of a float array with "simple" values) when you still want the benefit of bounds checking.
We noted this behavior during a more general initiative in our company to implement some numerical kernel routines "back" in OCaml (instead of having them as external C routines) without losing too much speed (and yes, we have to support the x86 platform for now).
Thanks for merging. That said, the patch only covers a very specific case and the problem might be more general. I wonder whether we should keep the ticket open for this reason.
Original bug ID: 6924
Reporter: @alainfrisch
Assigned to: @gasche
Status: closed (set by @xavierleroy on 2017-02-16T14:15:02Z)
Resolution: fixed
Priority: low
Severity: tweak
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: back end (clambda to assembly)
Monitored by: @gasche @jmeber
Bug description
The code generated for
on x86 (32-bit) unboxes the float argument, spills it, does the checkbound, and loads the float back from the stack before putting it in the array:
I guess that the current strategy for the x87 fp stack is to use it only within a basic block, and it would probably be quite difficult to change that in general. Fragments of numerical code which have branches only because of bounds checks could probably benefit a little from optimizing cases such as the one above. The spill above can be avoided by the attached patch, which doesn't bind the rhs of the assignment (i.e. the unboxing code) before the checkbound. The patch is quite restrictive: in general, the reordering should be ok as long as the rhs cannot raise an exception.
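The condition that the rhs must not raise can be illustrated with a source-level model of the two evaluation orders. This is only a sketch: the actual patch operates inside the compiler, not on code like this, and `check_bound`, `store_rhs_first`, `store_check_first` and `outcome` are hypothetical names introduced for the illustration.

```ocaml
(* Hypothetical stand-in for the checkbound operation. *)
let check_bound a i =
  if i < 0 || i >= Array.length a then invalid_arg "index out of bounds"

(* Before: the right-hand side is bound first, so its value is live
   across the bounds check (the situation that leads to the spill). *)
let store_rhs_first a i (rhs : unit -> float) =
  let v = rhs () in
  check_bound a i;
  Array.unsafe_set a i v

(* After: the check runs first, and the right-hand side is computed
   only once the index is known to be valid. *)
let store_check_first a i (rhs : unit -> float) =
  check_bound a i;
  Array.unsafe_set a i (rhs ())

(* Classifies what a store did, so the two orders can be compared. *)
let outcome store a i rhs =
  match store a i rhs with
  | () -> "ok"
  | exception Invalid_argument _ -> "bounds"
  | exception Failure _ -> "rhs failed"
```

With a rhs that cannot raise, such as `fun () -> 1.5`, both orders give the same outcome for any index. With a rhs that raises and an out-of-bounds index, `store_rhs_first` reports the rhs failure while `store_check_first` reports the bounds error, so the reordering would change which exception the program observes.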
The code generated after the patch is:
A very rough micro-benchmark on the code below:
gives the following results:
which shows that most of the bounds-checking overhead comes from the spill.