You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While the compiler generally optimizes patterns like "unbox (box v) -> v", here it does not work because of the let that appears inbetween. This happens every time a function is called with a non-trivial argument and returns a float that is used for further computations. The attached patch diff1.txt is a trivial fix to implement the optimization over the "let". I am fairly convinced this patch is correct. FYI, in the previous example, it generates the following optimized code:
When the compiler needs to access both the value of a float and the same boxed float (usually as a return value), it generates code like: "let z = float_value in let zbox = box(z) in body". Note in the example above that two boxed floats are always allocated, even though we will only use one of them. Moreover, if "unbox(zbox)" appears in "body" (because of inlining), we could optimize by expanding the binding. Generally speaking, whenever "zbox" appears only once in "body" (but not in a loop), it is strictly better to expand the definition (if it appears twice, it might also be better to expand twice and potentially allocate twice but it is harder to know in advance).
The second proposed patch (see file diff2.txt, includes diff1.txt) implements this idea. Here is the new code for the example:
It seems there are a few more cases one can optimise in diff1.txt. Besides Clet, Cifthenelse, and Csequence, which are already considered, it would be great to also optimise Cswitch, Ctrywith and Ccatch (and any other cases we may have missed).
Thanks for two very interesting suggestions. The first (unboxing across let and others) is really sweet; well spotted! I think it is almost always a win -- I can see a corner case on x86 32 bits where pushing the unbox down the branches of a conditional would be less efficient (because this platform has no float registers), but that's certainly negligible compared with other benefits. I have integrated it in the CVS working sources, generalized it to a few more constructs as suggested by Markus (but I don't think this makes much of a difference in practice), and applied it also to the unboxing of boxed integers.
I'm still evaluating the second suggestion (delayed boxing). For "min"-like functions, it's a win. For some straight-line functions, it can generate slightly less efficient code. Consider:
let test fn x =
let y = x +. 1.0 and z = x *. 2.0 in
fn z (fn x y)
If "x" and "y" are boxed at the point of "let", the two allocations are coalesced, resulting in just one test whether to call the GC. If the boxing is performed at point of use, we end up with two allocations in two different basic blocks, no coalescing, and two GC tests. On my (limited) tests, this doesn't seem to happen often: the number of GC check points globally decreases, but without noticeable performance gains. I keep this PR open until I have more data.
Original bug ID: 4558
Reporter: pzimmer
Assigned to: @xavierleroy
Status: closed (set by @xavierleroy on 2010-04-29T12:25:51Z)
Resolution: open
Priority: normal
Severity: feature
Version: 3.10.2
Category: ~DO NOT USE (was: OCaml general)
Monitored by: wagerlabs @lefessan dvaillancourt BenediktGrundmann @mshinwell sweeks sds @hcarty @dbuenzli yminsky @Chris00 @yakobowski pzimmer @alainfrisch @mmottl
Bug description
I would like to propose two patches to the compiler that prevent some useless boxing and unboxing of intermediate floats.
The following code:
let f x = x -. 3.
let g x = 2. *. f (x +. 1.)
is compiled into the following CMM code:
(function camlTest13__g_63 (x/64: addr)
(alloc 2301
(*f 2.
(load float64u
(let x/192 (+f (load float64u x/64) 1.) (alloc 2301 (-f x/192 3.)))))))
While the compiler generally optimizes patterns like "unbox (box v) -> v", here it does not work because of the let that appears inbetween. This happens every time a function is called with a non-trivial argument and returns a float that is used for further computations. The attached patch diff1.txt is a trivial fix to implement the optimization over the "let". I am fairly convinced this patch is correct. FYI, in the previous example, it generates the following optimized code:
(function camlTest13__g_64 (x/65: addr)
(alloc 2301 (*f 2. (let x/190 (+f (load float64u x/65) 1.) (-f x/190 3.)))))
Here is the second example:
let min (x : float) y = if x < y then x else y
let f x = min (x +. 1.) (x *. 2.)
(function camlTest13__f_108 (x/109: addr)
(let
(y/181 (*f (load float64u x/109) 2.) y/174 (alloc 2301 y/181)
x/182 (+f (load float64u x/109) 1.) x/175 (alloc 2301 x/182))
(if (<f x/182 y/181) x/175 y/174)))
When the compiler needs to access both the value of a float and the same boxed float (usually as a return value), it generates code like: "let z = float_value in let zbox = box(z) in body". Note in the example above that two boxed floats are always allocated, even though we will only use one of them. Moreover, if "unbox(zbox)" appears in "body" (because of inlining), we could optimize by expanding the binding. Generally speaking, whenever "zbox" appears only once in "body" (but not in a loop), it is strictly better to expand the definition (if it appears twice, it might also be better to expand twice and potentially allocate twice but it is harder to know in advance).
The second proposed patch (see file diff2.txt, includes diff1.txt) implements this idea. Here is the new code for the example:
(function camlTest13__f_109 (x/110: addr)
(let
(y/179 (*f (load float64u x/110) 2.) x/180 (+f (load float64u x/110) 1.))
(if (<f x/180 y/179) (alloc 2301 x/180) (alloc 2301 y/179))))
All my test cases worked fine with this patch, but some other people should definitely consider it carefully before including it.
File attachments
The text was updated successfully, but these errors were encountered: