Better compilation of functors #6710
Comments
Comment author: @gasche

Sharing environments between independent closures would risk widening the lifetime of some of the captured values. Consider:

```ocaml
let g x =
  let module F () = struct
    let f0 () = I_use_a_lot_of_memory
    let g () = x
  end in
  let module M = F () in
  M.g ()
```

If f0 and g shared the same environment, then I_use_a_lot_of_memory would still be reachable through M.g's environment at the end of this evaluation. In your case that cannot happen, because the functions already depend on each other, so each function's environment is reachable from the last function. This scenario is also the one that is currently costly, as the quadratic blowup comes from the growing size of the closed environments.

I think we could optimize consecutive function declarations that can be proved to have the same environment-reachable set by compiling them just as mutually-recursive functions. (We could even consider reordering effect-free function declarations by a topological sort, hoping to gain new opportunities for this.) This would improve the scenario you give without any increase in value lifetimes.
Comment author: @gasche

(It should be reasonably easy to test the real-life performance impact of this kind of optimization by looking for performance-sensitive, functor-heavy code in the wild. OCamlgraph would qualify, Mirage as well, maybe the functorized static-analysis programs. I expect we wouldn't see much change, but who knows.)
Comment author: @alainfrisch

In your very example, f0 is an abstraction: the only identifiers it could depend upon are the functor's argument and other variables in scope, such as "x" in your case (but that is already kept alive by g). The same example, but with "let f0 = ..." instead of "let f0 () = ...", could more easily exhibit the bad property you mention, but I doubt this occurs in practice: how often does a functor keep big data structures alive that could be released earlier than its functions?
Comment author: @gasche

I actually meant I_use_a_lot_of_memory as a reference to a variable in scope that takes a lot of memory -- sorry for the confusing use of a capital letter here. I can craft reasonably credible examples out of a large temporary variable created to cache a computation that happens locally, while a simple teardown function that doesn't depend on this memory is returned:

```ocaml
module Compute (Param : sig val len : int end) = struct
  let cache = Hashtbl.create ...
  let compute () = ... grows cache a lot ...
  let return_cps result k = k result
end

let f param =
  let module C = Compute (struct let len = param end) in
  let result = C.compute () in
  C.return_cps result
```

Do you think restricting your optimization to the safe case of consecutive bindings that have the same live set would limit its effectiveness in practice? Wherever it does not apply, the expert user can still turn the bindings into a "let rec .. and .." to enforce the sharing semantics -- whereas I'm not sure how users could control the liveness-unsafe version when it bites them.
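For concreteness, here is a runnable variant of the sketch above; the placeholder bodies (`Hashtbl.create ...`, `... grows cache a lot ...`) are filled in with made-up computations of my own, so only the shape matches the example:

```ocaml
(* Hypothetical runnable variant: the placeholder bodies from the
   comment above are filled in with invented computations. *)
module Compute (Param : sig val len : int end) = struct
  (* Large temporary structure, used only during [compute]. *)
  let cache : (int, int) Hashtbl.t = Hashtbl.create Param.len

  let compute () =
    (* Grow the cache, then summarize it into a small result. *)
    for i = 0 to Param.len - 1 do Hashtbl.replace cache i (i * i) done;
    Hashtbl.fold (fun _ v acc -> acc + v) cache 0

  (* Teardown function that does not depend on [cache]. *)
  let return_cps result k = k result
end

let f param =
  let module C = Compute (struct let len = param end) in
  let result = C.compute () in
  C.return_cps result  (* partial application: this closure escapes *)
```

If `return_cps` and `compute` shared one environment, `cache` would stay reachable for as long as the escaping partial application `C.return_cps result` does, which is exactly the lifetime-widening concern.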
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.
I don't remember the details of this discussion, but Stale-bot is asking @alainfrisch to confirm whether they still care about this idea.
Doesn't this discussion predate the inclusion of Flambda? While the particular patch that Alain had in mind hasn't been implemented, Flambda has significantly improved the compilation of functors, so I wouldn't be surprised if the issue was much less relevant now.
Another point in the balance is the recent discussion about possibly getting rid of Infix_tag. Plans to improve compilation by using it more often are maybe not as enticing as they once were.
As far as I remember, this was more a theoretical concern than a practical one, so I'm closing the issue. (And the pressure on the register allocator is no longer an issue for us thanks to the linear-scan allocator.)
Original bug ID: 6710
Reporter: @alainfrisch
Status: acknowledged (set by @damiendoligez on 2015-01-08T23:25:02Z)
Resolution: open
Priority: low
Severity: feature
Category: middle end (typedtree to clambda)
Monitored by: @diml @ygrek @jmeber @hcarty @yakobowski
Bug description
A functor typically returns a bunch of functions that refer to the functor's argument and to previous functions defined in the same functor body.
Consider for example:
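(The original listing was lost in the import; the following is a plausible reconstruction of a functor in the spirit described — the names and bodies are mine.)

```ocaml
(* Hypothetical reconstruction: each function refers to the functor's
   argument and to previously defined functions, so each successive
   closure captures a larger environment. *)
module F (X : sig val x : int end) = struct
  let f1 () = X.x + 1        (* environment: X *)
  let f2 () = f1 () + X.x    (* environment: f1, X *)
  let f3 () = f2 () + f1 ()  (* environment: f2, f1 (and X transitively) *)
end
```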
and compile it with -inline 0 -dcmm. The code for F looks like:
This shows that the memory allocated by the functor evaluation can easily become quadratic in the number of functions (and thus also the evaluation time). Also, this creates a number of potentially long-lived local variables (f1/f2/f3), which can put some pressure on the register allocator (compile time). All that is probably not a big deal, but it is unfortunate in certain scenarios such as using functors to simulate type classes.
Another (perhaps more problematic) performance issue is that referring to a previous function requires a memory load:
I did not try to exhibit a micro-benchmark, but I can imagine that this would induce a noticeable penalty e.g. in a functor that implements an efficient data structure with some shared helper functions (each call to such a function would require one extra memory load).
Interestingly, turning the set of function definitions into a single "let rec" actually produces nicer-looking and probably more efficient code. The functor body itself is compiled to a single common closure:
and calls between functions in the body don't require a memory load (just an offset):
Could this transformation be applied automatically, and what could be the potential drawbacks?
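At the source level, the "let rec" rewrite amounts to something like the following illustrative sketch (the function bodies are invented, since the original listings are not preserved in this import):

```ocaml
(* Consecutive bindings merged into one mutually-recursive group:
   the three functions then share a single closure block, and calls
   between them compile to offsets within that block rather than
   loads from a captured environment. *)
module F (X : sig val x : int end) = struct
  let rec f1 () = X.x + 1
  and f2 () = f1 () + X.x
  and f3 () = f2 () + f1 ()
end
```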
An even better compilation scheme could be to create a single block (Y), which would represent the resulting structure, with extra fields to represent the "big closure", including the reference to X. The environment of the closure would be Y itself, and functions would access X and previously-defined functions through Y (this removes the need for adjusting the environment pointer when calling another function). Non-functional fields defined by the functor (and used in functions) would also be accessible through Y, without requiring extra fields in the closure environment. Y would be created first, with some dummy values for fields (the closure part is mostly static, except for the reference to X), and then filled in during the evaluation of the functor body (as for toplevel structures).