Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005894OCamlOCaml backend (code generation)public2013-01-17 21:072013-12-16 14:04
Reporterchambart 
Assigned To 
PrioritynormalSeverityfeatureReproducibilityhave not tried
StatusacknowledgedResolutionopen 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version 
Summary0005894: [patch] Avoid boxing float/int32/int64 when doing direct call
DescriptionFunctions taking floating points parameters always receive them boxed.
When a direct call is done, it is possible to specialise the call to
pass those parameters in registers. This also applies to boxed integers.
The return value can also be unboxed.

This allow avoiding all allocations in a function like

let (+) = Nativeint.add
let (-) = Nativeint.sub
let rec fib n =
  if n < 2n then 0n + 1n else fib(n-1n) + fib(n-2n)
Additional InformationThis patch take care of not unboxing parameters when it is possible that this
could increase the number of allocation.

For instance a function like this

let bad (x:float) =
  ignore [x] (* force boxing of x *)

bad will not be specialised because its parameter is used boxed inside its body.

In a case like:

let rec f x y =
  let _ = x +. 1. in
  let _ = y +. 1. in
  h 1. x

and g a b =
  let _ = a+.1. in
  let _ = b+.1. in
  f b 1.

and h c d =
  let _ = c +. 1. in
  let _ = d +. 1. in
  let _ = g 1. c in
  let _ = g 1. d in
  bad d;

only the y parameter of f and a of g will be unboxed.

The return value will be unboxed if every possible returned value is unboxed.

For instance

x +. y is considered unboxed

but

if b
then x +. y
else List.hd l

is considered boxed because List.hd return an already boxed value.

Current problems:

- Constants are considered boxed because they are already boxed at
  compile time, but this may not be the right default choice. It can still be
  circumvented by things like 0n + 1n in the fibonacci example.

- Specialised functions code is duplicated:
  * It is possible to know that some functions will never be used in direct call:
    those can be compiled only one time
  * specialisation could be forbidden over a certain size
  * the generic version of function specialised only for parameters (not return
    value) could be compiled by adding a stub before the specialised version.
Tagspatch
Attached Filespatch file icon 0001-Preparing-for-specialisation-patches-1.patch [^] (23,207 bytes) 2013-01-17 21:07 [Show Content]
patch file icon 0002-Preparing-for-specialisation-patches-2.patch [^] (7,497 bytes) 2013-01-17 21:08 [Show Content]
patch file icon 0003-Propagate-simple-type-informations-about-parameters-.patch [^] (7,392 bytes) 2013-01-17 21:08 [Show Content]
patch file icon 0004-Propagate-type-information-to-clambda.patch [^] (43,180 bytes) 2013-01-17 21:08 [Show Content]
patch file icon 0005-Use-type-information-in-cmm.patch [^] (8,296 bytes) 2013-01-17 21:08 [Show Content]

- Relationships

-  Notes
(0008773)
gasche (developer)
2013-01-17 23:45
edited on: 2013-01-17 23:46

Have you tested the efficiency of this optimization on some of the float-heavy benchmark code that started flowing in (from Alain for example)?

> Specialised functions code is duplicated

It should be possible to have the boxed version just unbox, and then call the unboxed version; that would result in little additional compiled code. But this is only efficient if the inliner reliably removes this indirection layer (which amounts to unboxing at call-site).

Can the inliner be relied upon for this? How much does this degrade the performances in the case where no inlining can happen anyway (no .cmx, functor application...)?

(A way to work around the "not sure if boxing will be needed" problem is to specialize to a function that takes *two* arguments, the boxed and the unboxed data. This is better than the boxing version because you don't have to unbox at definition site, could actually not have much overhead if the unboxed data is passed efficiently, and guarantees that no additional allocation is necessary -- it's essentially a specialization of the technique of "On the runtime complexity of type-directed unboxing", Garrigue and Minamide, 1998. I don't expect it to work in practice, but you never know.)

(0008774)
frisch (developer)
2013-01-18 09:06

This is great news. I fully support this project!

> will not be specialised because its parameter is used boxed inside its body.

I'm not sure this is a good criterion, although it is in line with the current intra-function behavior (don't unbox if there is a potential use site in boxed form). It could very well be the case that the parameter might be used boxed inside the body, but only in rare cases (e.g. to print an error message if some invariant is broken). If this disable the optimization, it means the caller might need to box the parameter.

Have you tried to combine your patch with the strategy I propose to deal with boxing (0005204), i.e. box "lazily" (on demand + memoization)?
(0008778)
chambart (developer)
2013-01-21 12:21
edited on: 2013-01-21 18:54

On Alain's bench on my computer:
without patch:
 time: 21.0s
 minor_words: 2243147047
 promoted_words: 450541

with patch (and a few forgoten cases present in my git branch):
 time: 20.1s
 minor_words: 1007129106
 promoted_words: 238960

Notice that more than half the running time is taken in __ieee754_exp

The majority of other float heavy benchmarks I encountered, where written with
a big while loop such that nothing was boxed.

I will update the patch with a few more things: function parameters
where not considered as unboxable float so a function like

  let f x = let a = if Random.bool () then x else x in a +.a

was compiled allocating x and loading from it.

This is fixed in my git branch:
  https://github.com/chambart/ocaml/tree/float_args [^]

testing it with opam:

  opam remote add comp https://github.com/chambart/opam-compilers-repository.git [^]
  opam switch install trunk+floatarg

Gabriel:
  No the inliner cannot (yet) generate direct calls so a code like:

    let apply f x = f x
    let g x = apply (+.) x

  will generate a generic call to (+.) I have another patch in preparation
  for that kind of things.

  The main interest of avoiding boxing is to avoid some allocation, passing both
  parameters would loose that. The cost of loading from the boxed value is not
  that much and you would still have to pay for if at each function call (reloading
  spilled registers).

Alain:
  I didn't merge both patch yet, I think that I will need to change the
  stategy a bit. I am not completely sure of which cases will not be worth
  unboxing, maybe when we can show that every possible path in the function
  use the value boxed.

Off topic: How do I remove an attached file ? Is the bug reporter
  supposed to have the rights to do that ?


- Issue History
Date Modified Username Field Change
2013-01-17 21:07 chambart New Issue
2013-01-17 21:07 chambart File Added: 0001-Preparing-for-specialisation-patches-1.patch
2013-01-17 21:08 chambart File Added: 0002-Preparing-for-specialisation-patches-2.patch
2013-01-17 21:08 chambart File Added: 0003-Propagate-simple-type-informations-about-parameters-.patch
2013-01-17 21:08 chambart File Added: 0004-Propagate-type-information-to-clambda.patch
2013-01-17 21:08 chambart File Added: 0005-Use-type-information-in-cmm.patch
2013-01-17 23:45 gasche Note Added: 0008773
2013-01-17 23:46 gasche Note Edited: 0008773 View Revisions
2013-01-17 23:46 gasche Note Edited: 0008773 View Revisions
2013-01-18 09:06 frisch Note Added: 0008774
2013-01-21 12:21 chambart Note Added: 0008778
2013-01-21 18:54 chambart Note Edited: 0008778 View Revisions
2013-06-19 20:48 doligez Status new => acknowledged
2013-12-16 14:04 doligez Tag Attached: patch


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker