Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006242OCamlback end (clambda to assembly)public2013-11-18 16:332017-03-07 13:19
Assigned To 
PrioritynormalSeverityfeatureReproducibilityhave not tried
PlatformOSOS Version
Product Version 
Target VersionlaterFixed in Version 
Summary0006242: Better compilation of local functions only used for tail calls
DescriptionIt is a common idiom to define local functions to share a "continuation" (used in several branches of the current function). Example:

let foo x y =
  let ret n = x + n * y in
  if x mod 2 = 0 then ret 3
  else if y mod 2 = 0 then ret 4
  else 10

This requires to allocate a closure, even if it won't be called. I propose to detect such cases of local binding of a function which is only used, fully applied, in tail position within the body of the local binding. This can be compiled more efficiently by turning the function into a "staticcatch" handler. I attach a patch which implements this optimization at the level of the lambda code.


 - Avoid the allocation of the local closure.

 - Can benefit for function-wide register allocation and float unboxing.

 - More chance of inlining the whole function (because its body becomes simpler).

Possible drawbacks:

 - Slower compilation. The optimization is currently implemented as a single pass over the lambda code, plus an extra traversal to count the number of uses in tail position for each local binding to a function. It's possible to do a single traversal (but it's probably useless for hand-written code at least).

 - Remove chances of inlining the local function (currently, even when it is inlined, the closure is still allocated).

Micro-benchmark of the code above with:

let () =
  for k = 1 to 1000 do
    for i = 1 to 1000 do
      for j = 1 to 1000 do
        ignore (foo i j)

with -inline 0:
  BEFORE: 5.19s
  AFTER: 3.94s (25% speedup)

with -inline 10000:
  BEFORE: 4.98s
  AFTER: 3.04s (39% speedup)

Micro-benchmark of the code above with:

let () =
  for k = 1 to 1000 do
    for i = 1 to 1000 do
      for j = 1 to 1000 do
        ignore (foo 1 1)

with -inline 0:
  BEFORE: 5.36s
  AFTER: 3.88s (27% speedup)

with -inline 10000:
  BEFORE: 5.36s
  AFTER: 0.64s (88% speedup)

Possible improvements to the patch:

 - Detect the case where the "continuation" takes a single () argument (and remove it from the static handler). This is probably going to be mostly cosmetic.

 - Detect more "tail" contexts (in particular the r.h.s. of || and &&).

Note: the same effect could be achieved manually for critical code with "local exceptions" (0005879), but it doesn't seem they will be included.
Attached Filesdiff file icon local_continuations.diff [^] (3,682 bytes) 2013-11-18 16:33 [Show Content]
diff file icon local_continuations_2.diff [^] (4,584 bytes) 2013-12-05 09:37 [Show Content]
diff file icon local_continuations_3.diff [^] (4,978 bytes) 2015-06-15 10:46 [Show Content]

- Relationships
related to 0005879resolvedfrisch Static exception handlers (i.e. well-disciplined gotos!) 
related to 0007017confirmed Allow unboxing across "static fail" 

-  Notes
shinwell (developer)
2013-11-18 17:21

I wonder if Pierre Chambart's work on better inlining might subsume these cases. (In particular, I have a feeling that the return continuations might usually be small enough that it would be reasonable to inline them every time they occur.) If fully inlined, the definition of [ret] is dead code.
frisch (developer)
2013-11-18 17:33

It's true that a more aggressive inlining (and a fix to remove the closure allocation if it is not needed) could optimize away some cases, but in many cases I have in mind, there is a complex continuation (maybe called rarely enough) which won't be inlined.
chambart (developer)
2013-11-18 17:37

In those particular examples, the main limitation of the current compiler seems to
be its inability to eliminate the useless closure and for the second case, to inline
a function with a local closure.

An other way to improve this kind of code would be to do some lambda lifting. You
loose the possibility of sharing the context for register allocation, but you
could apply this even when the function is not tail called. For the unboxing,
this is doable at function interface, so it is not necessary lost.
frisch (developer)
2013-11-19 13:59

Inlining require heuristics, which might be hard to predict, and it will not always be applicable (it could blow up the code size).

Lambda lifting seems to be a better fit, and it would indeed be applicable to more contexts (although it would require special care for the case where the surrounding function is a recursive function called by the continuation). But for the specific cases I have in mind (sharing continuation between several branches), the overhead of crossing a function boundary can be eliminated, and the corresponding optimization is not only very simple, but also very predictable.

Unboxing at function interface is possible (and I hope it will be included!), but it will probably introduce multiple entry points or a wrapping function, in a case where this is useless (so we'd need to rely on a further optimization to remove this extra stuff). And there are more optimizations possible within a single function (not only register allocation and unboxing), such as turning references into mutable variables.
frisch (developer)
2013-12-05 09:39

Uploaded an improved patch:

 - More tail contexts are detected (r.h.s of || and &&).

 - Apply the rewriting before other optimizations on the lambda-code (in particular, this allows to benefit from the pass which turns local references into mutable variables).

 - Also inline any local function used only once (fully-applied).
frisch (developer)
2015-06-15 10:47

Added a version (local_continuations_3.diff) adapted for trunk.
shinwell (developer)
2017-03-07 13:19

I tend to think we should leave all of this to Flambda. It should already be able to do the lambda lifting and I think we could add a contification rule in the forthcoming version to avoid having to lambda lift.

As an aside, I don't think use-based optimisations such as inlining functions used once should be on by default, since they produce fragility at the user level.

- Issue History
Date Modified Username Field Change
2013-11-18 16:33 frisch New Issue
2013-11-18 16:33 frisch File Added: local_continuations.diff
2013-11-18 16:34 frisch Description Updated View Revisions
2013-11-18 16:37 frisch Note Added: 0010640
2013-11-18 16:42 frisch Description Updated View Revisions
2013-11-18 16:42 frisch Note Deleted: 0010640
2013-11-18 16:44 frisch Relationship added related to 0005879
2013-11-18 17:21 shinwell Note Added: 0010642
2013-11-18 17:33 frisch Note Added: 0010645
2013-11-18 17:37 chambart Note Added: 0010646
2013-11-19 13:59 frisch Note Added: 0010655
2013-11-30 23:19 doligez Tag Attached: patch
2013-11-30 23:23 doligez Status new => acknowledged
2013-12-02 18:58 frisch Note Added: 0010686
2013-12-03 16:21 frisch Note Deleted: 0010686
2013-12-05 09:37 frisch File Added: local_continuations_2.diff
2013-12-05 09:39 frisch Note Added: 0010693
2014-07-16 18:32 doligez Severity minor => tweak
2014-07-16 18:32 doligez Target Version => 4.03.0+dev / +beta1
2015-06-15 10:46 frisch File Added: local_continuations_3.diff
2015-06-15 10:47 frisch Note Added: 0014071
2015-10-15 15:36 frisch Relationship added related to 0007017
2015-11-27 18:27 frisch Target Version 4.03.0+dev / +beta1 => later
2016-12-07 17:45 shinwell Category OCaml general => OCaml backend (code generation)
2017-02-23 16:35 doligez Category OCaml backend (code generation) => Back end (clambda to assembly)
2017-02-23 16:44 doligez Category Back end (clambda to assembly) => back end (clambda to assembly)
2017-03-07 13:19 shinwell Note Added: 0017585
2017-03-07 13:19 shinwell Severity tweak => feature

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker