Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0004558OCamlOCaml generalpublic2008-06-02 22:362010-04-29 14:25
Reporterpzimmer 
Assigned Toxleroy 
PrioritynormalSeverityfeatureReproducibilityN/A
StatusclosedResolutionopen 
PlatformOSOS Version
Product Version3.10.2 
Target VersionFixed in Version 
Summary0004558: Compiler patch to avoid useless float boxing
DescriptionI would like to propose two patches to the compiler that prevent some useless boxing and unboxing of intermediate floats.

The following code:

let f x = x -. 3.
let g x = 2. *. f (x +. 1.)

is compiled into the following CMM code:

(function camlTest13__g_63 (x/64: addr)
 (alloc 2301
   (*f 2.
     (load float64u
       (let x/192 (+f (load float64u x/64) 1.) (alloc 2301 (-f x/192 3.)))))))

While the compiler generally optimizes patterns like "unbox (box v) -> v", here it does not work because of the let that appears inbetween. This happens every time a function is called with a non-trivial argument and returns a float that is used for further computations. The attached patch diff1.txt is a trivial fix to implement the optimization over the "let". I am fairly convinced this patch is correct. FYI, in the previous example, it generates the following optimized code:

(function camlTest13__g_64 (x/65: addr)
 (alloc 2301 (*f 2. (let x/190 (+f (load float64u x/65) 1.) (-f x/190 3.)))))


Here is the second example:

let min (x : float) y = if x < y then x else y
let f x = min (x +. 1.) (x *. 2.)

(function camlTest13__f_108 (x/109: addr)
 (let
   (y/181 (*f (load float64u x/109) 2.) y/174 (alloc 2301 y/181)
    x/182 (+f (load float64u x/109) 1.) x/175 (alloc 2301 x/182))
   (if (<f x/182 y/181) x/175 y/174)))

When the compiler needs to access both the value of a float and the same boxed float (usually as a return value), it generates code like: "let z = float_value in let zbox = box(z) in body". Note in the example above that two boxed floats are always allocated, even though we will only use one of them. Moreover, if "unbox(zbox)" appears in "body" (because of inlining), we could optimize by expanding the binding. Generally speaking, whenever "zbox" appears only once in "body" (but not in a loop), it is strictly better to expand the definition (if it appears twice, it might also be better to expand twice and potentially allocate twice but it is harder to know in advance).

The second proposed patch (see file diff2.txt, includes diff1.txt) implements this idea. Here is the new code for the example:

(function camlTest13__f_109 (x/110: addr)
 (let
   (y/179 (*f (load float64u x/110) 2.) x/180 (+f (load float64u x/110) 1.))
   (if (<f x/180 y/179) (alloc 2301 x/180) (alloc 2301 y/179))))

All my test cases worked fine with this patch, but some other people should definitely consider it carefully before including it.
TagsNo tags attached.
Attached Filestxt file icon diff1.txt [^] (1,025 bytes) 2008-06-02 22:36 [Show Content]
txt file icon diff2.txt [^] (5,817 bytes) 2008-06-02 22:36 [Show Content]

- Relationships

-  Notes
(0004520)
mottl (reporter)
2008-06-11 21:02

It seems there are a few more cases one can optimise in diff1.txt. Besides Clet, Cifthenelse, and Csequence, which are already considered, it would be great to also optimise Cswitch, Ctrywith and Ccatch (and any other cases we may have missed).
(0004571)
xleroy (administrator)
2008-08-05 15:45

Thanks for two very interesting suggestions. The first (unboxing across let and others) is really sweet; well spotted! I think it is almost always a win -- I can see a corner case on x86 32 bits where pushing the unbox down the branches of a conditional would be less efficient (because this platform has no float registers), but that's certainly negligible compared with other benefits. I have integrated it in the CVS working sources, generalized it to a few more constructs as suggested by Markus (but I don't think this makes much of a difference in practice), and applied it also to the unboxing of boxed integers.

I'm still evaluating the second suggestion (delayed boxing). For "min"-like functions, it's a win. For some straight-line functions, it can generate slightly less efficient code. Consider:

let test fn x =
  let y = x +. 1.0 and z = x *. 2.0 in
  fn z (fn x y)

If "x" and "y" are boxed at the point of "let", the two allocations are coalesced, resulting in just one test whether to call the GC. If the boxing is performed at point of use, we end up with two allocations in two different basic blocks, no coalescing, and two GC tests. On my (limited) tests, this doesn't seem to happen often: the number of GC check points globally decreases, but without noticeable performance gains. I keep this PR open until I have more data.



- Issue History
Date Modified Username Field Change
2008-06-02 22:36 pzimmer New Issue
2008-06-02 22:36 pzimmer File Added: diff1.txt
2008-06-02 22:36 pzimmer File Added: diff2.txt
2008-06-11 21:02 mottl Note Added: 0004520
2008-08-01 14:55 xleroy Status new => assigned
2008-08-01 14:55 xleroy Assigned To => xleroy
2008-08-05 15:45 xleroy Note Added: 0004571
2008-08-05 15:45 xleroy Status assigned => resolved
2010-04-29 14:25 xleroy Status resolved => closed


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker