New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closure computation is quadratic in the number of functions #7826
Comments
Comment author: @ppedrot Easy enough to reproduce with a trivial example: gen.ml: let file = "out.ml" let clos_size = int_of_string Sys.argv.(1) let rec gen chan n = let () = %%%%%%%%%%%%%%%%%% $ ocaml gen.ml 5000 && time ocamlopt -c out.ml $ ocaml gen.ml 10000 && time ocamlopt -c out.ml $ ocaml gen.ml 20000 && time ocamlopt -c out.ml |
Comment author: @xavierleroy Yes, this is one of the many quadratic behaviors in ocamlopt. (There are exponential behaviors too.) They are never encountered by hand-written code and sometimes encountered by auto-generated code. As a consequence, nobody feels a strong urge to rewrite from scratch with more efficient algorithms when they exist. |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
Please don't close this. It's still an issue with |
(Also, why does the bot not consider new mentions of this issue in other repos as activity?) |
cc @xavierleroy we have a request for the stale bot: consider "mentions from other issues" as activity. |
For the record, I cannot reproduce the suspicion that the Closure pass is the cause of the specific quadratic behavior that affects fiat-crypto, using my own hand-written generator or @ppedrot. My own generator seems to spend more time in register allocation, and @ppedrot's example creates terms that spend half their time in typing (I use It's not reasonable to expect us to fix "all quadratic behaviors" right now, we have to focus on those that block users in practice and for which we can reproduce the issue. If we cannot reproduce the specific shape of code that is produced by fiat-crypto (right now I have neither the actual code nor a generator of similar-problem-hitting code), I don't see how we can consider fixing this. |
I can still reproduce with the allegedly old 4.06.1+fp OCaml version, judging from the perf graph. I'll try with a more recent version. |
Something very wrong happened since OCaml 4.06. On the example above, typing is now taking most of the time and the overall runtime has been multiplied by 10 for n=5000. 4.06.1+fp
4.08.1+fp
4.10.0+fp
|
This is not a feature offered by the bot, as far as I can see. Anyone is welcome to write a better bot, but I have other things to do. |
"transl" also takes much longer. However, code generation got faster :-) Yes, this is worrying. There have been other reports of increased compilation times in 4.07, but none that dramatic. I think this would deserve opening a new issue, and closing this one. |
Well, I don't think that the underlying quadratic behaviour of free variable computation has been fixed, it's probably still there buried under the other, much more important regression. But I am fine with this bug being closed as WONTFIX if it's the morale of the story though. |
:-( |
The code is available on our CI. For example, here's the extracted code from a recent run. And here are just the .ml files, in more permanant storage. |
Wow. There are several >100MB |
Indeed! These are statically-allocated PHOASTs. When @ppedrot suggested that I avoid this issue by switching them to a non-closure-based representation (such as de Bruijn), I got an error message from the Haskell compiler on the Haskell equivalent of this code that it could not allocate such large compile-time data blobs. |
I think if the compiler is expected to perform decently on these monstrosities (and to keep performing decently), one of them needs to be checked in as a test case (perhaps in gzipped form), even if it's only run sporadically. |
Feel free to upload any or all of them as a test-case. For reference, the build log with timing info from OCaml 4.05.0 is
|
The regression in type-checking time comes from 15e8f62 (cc @trefis, thrown into the shark pool). I think it's worth investigating what is going on and whether we can fix it in a not-too-difficult way. It would be great if we could fix fiat-crypto without unnatural implementation efforts on the codebase, but equally or more importantly the regressions that we observe in this corner case may also occur with more common programs (but there maybe they are more diffuse and we hadn't been able to pinpoint them). (This all highlights that we really should have better monitoring of the OCaml compile times as the compiler evolves. Unfortunately no one has been able to work on this consistently yet.) |
Concerning fiat-crypto, we're talking about source files of size ONE HUNDRED MEGABYTES. Compiling 100 Mb of source code in 20 seconds is very fast. I've seen gcc segfault on less huge inputs. So count yourself lucky: this could have gone infinitely worse. Also, one could wonder why 100 Mb of OCaml are produced to describe one cryptographic primitive. |
Indeed, this is because our reification procedure duplicates many subterms (in particular, it unfolds any constant it doesn't recognize without caching the reification). That the file contains so much duplication is noticable in the fact that it compresses to be quite small.
Indeed, I'm grateful that I didn't have to spend the week or so I expect it would take to implement such a cache for my reification procedure. |
Note that this should reduce the overall build time of fiat-crypto related targets by about 10--20 minutes, as I've removed the heaviest jobs (about 25--30 minutes in serial) from the OCaml target. I'd like to keep the OCaml target around just to make sure that Coq doesn't introduce a change to extraction that breaks compilation of extracted OCaml code. See ocaml/ocaml#7826 for the issue tracking performance of compiling the extracted OCaml code (and perhaps there should be another issue opened on the OCaml bug tracker about flambda on the fiat-crypto extracted files?) Alternative to coq#12405 Closes coq#12405
Note that this should reduce the overall build time of fiat-crypto related targets by about 10--20 minutes, as I've removed the heaviest jobs (about 25--30 minutes in serial) from the OCaml target. I'd like to keep the OCaml target around just to make sure that Coq doesn't introduce a change to extraction that breaks compilation of extracted OCaml code. See ocaml/ocaml#7826 for the issue tracking performance of compiling the extracted OCaml code (and perhaps there should be another issue opened on the OCaml bug tracker about flambda on the fiat-crypto extracted files?) Alternative to coq#12405 Closes coq#12405 Fixes coq#12400
Note that this should reduce the overall build time of fiat-crypto related targets by about 10--20 minutes, as I've removed the heaviest jobs (about 25--30 minutes in serial) from the OCaml target. I'd like to keep the OCaml target around just to make sure that Coq doesn't introduce a change to extraction that breaks compilation of extracted OCaml code. See ocaml/ocaml#7826 for the issue tracking performance of compiling the extracted OCaml code (and perhaps there should be another issue opened on the OCaml bug tracker about flambda on the fiat-crypto extracted files?) Alternative to coq#12405 Closes coq#12405 Fixes coq#12400
Note that this should reduce the overall build time of fiat-crypto related targets by about 10--20 minutes, as I've removed the heaviest jobs (about 25--30 minutes in serial) from the OCaml target. I'd like to keep the OCaml target around just to make sure that Coq doesn't introduce a change to extraction that breaks compilation of extracted OCaml code. See ocaml/ocaml#7826 for the issue tracking performance of compiling the extracted OCaml code (and perhaps there should be another issue opened on the OCaml bug tracker about flambda on the fiat-crypto extracted files?) Alternative to coq#12405 Closes coq#12405 Fixes coq#12400 (cherry picked from commit f427613)
Note that this should reduce the overall build time of fiat-crypto related targets by about 10--20 minutes, as I've removed the heaviest jobs (about 25--30 minutes in serial) from the OCaml target. I'd like to keep the OCaml target around just to make sure that Coq doesn't introduce a change to extraction that breaks compilation of extracted OCaml code. See ocaml/ocaml#7826 for the issue tracking performance of compiling the extracted OCaml code (and perhaps there should be another issue opened on the OCaml bug tracker about flambda on the fiat-crypto extracted files?) Alternative to coq#12405 Closes coq#12405 Fixes coq#12400 (cherry picked from commit f427613)
Also bump Coq to 8.13.1 This is a somewhat inefficient way to test more versions of OCaml, but I want information on if the version of OCaml has any impact on the compilation time of the extracted binaries, a la ocaml/ocaml#7826 and in particular ocaml/ocaml#7826 (comment)
Also bump Coq to 8.13.1 This is a somewhat inefficient way to test more versions of OCaml, but I want information on if the version of OCaml has any impact on the compilation time of the extracted binaries, a la ocaml/ocaml#7826 and in particular ocaml/ocaml#7826 (comment)
Also bump Coq to 8.13.1 This is a somewhat inefficient way to test more versions of OCaml, but I want information on if the version of OCaml has any impact on the compilation time of the extracted binaries, a la ocaml/ocaml#7826 and in particular ocaml/ocaml#7826 (comment)
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
The issue is still open, so it's unlikely the issue has been fixed.
Which seems a lot worse than the initial benchmark, as observed by @ppedrot. Lift_constants in flambda also seems to perform poorly with this input. |
I know what is quadratic in Lift_constants, and will discuss possible fixes with @chambart. |
Original bug ID: 7826
Reporter: @ppedrot
Status: acknowledged (set by @xavierleroy on 2018-07-19T16:01:07Z)
Resolution: open
Priority: normal
Severity: minor
Version: 4.06.1
Category: middle end (typedtree to clambda)
Monitored by: @nojb
Bug description
The Closure.close_functions function computes the set of free variables of its arguments, which is O(n). Unluckily this function is recursive and it subcalls can do the same, which leads to a runtime quadratic in the number of nested abstraction nodes.
I doubt this is an issue in handwritten code, but the problem sure happens in generated code. I hit it in the native compilation scheme of Coq code from fiat-crypto, and the compiler takes ages for a term that is big but not crazy either (~ 500kloc of sparse code, every line being a constructor most of the time).
I do not have a small reproducible test-case at hand, this would require extracting it from the generated code and making it standalone, which is not easy.
The text was updated successfully, but these errors were encountered: