Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closure computation is quadratic in the number of functions #7826

Open
vicuna opened this issue Jul 19, 2018 · 27 comments
Open

Closure computation is quadratic in the number of functions #7826

vicuna opened this issue Jul 19, 2018 · 27 comments

Comments

@vicuna
Copy link

vicuna commented Jul 19, 2018

Original bug ID: 7826
Reporter: @ppedrot
Status: acknowledged (set by @xavierleroy on 2018-07-19T16:01:07Z)
Resolution: open
Priority: normal
Severity: minor
Version: 4.06.1
Category: middle end (typedtree to clambda)
Monitored by: @nojb

Bug description

The Closure.close_functions function computes the set of free variables of its arguments, which is O(n). Unluckily this function is recursive and it subcalls can do the same, which leads to a runtime quadratic in the number of nested abstraction nodes.

I doubt this is an issue in handwritten code, but the problem sure happens in generated code. I hit it in the native compilation scheme of Coq code from fiat-crypto, and the compiler takes ages for a term that is big but not crazy either (~ 500kloc of sparse code, every line being a constructor most of the time).

I do not have a small reproducible test-case at hand, this would require extracting it from the generated code and making it standalone, which is not easy.

@vicuna
Copy link
Author

vicuna commented Jul 19, 2018

Comment author: @ppedrot

Easy enough to reproduce with a trivial example:

gen.ml:
%%%%%%%%%%%%%%%%%%

let file = "out.ml"

let clos_size = int_of_string Sys.argv.(1)

let rec gen chan n =
if n = 0 then ()
else
let () = Printf.fprintf chan "fun x_%i ->\nlet () = print_newline () in\n" n in
gen chan (n - 1)

let () =
let chan = open_out file in
let () = Printf.fprintf chan "let f =\n%!" in
let () = Printf.fprintf chan "%a\n%!" gen clos_size in
let () = Printf.fprintf chan "()\n%!" in
close_out chan

%%%%%%%%%%%%%%%%%%

$ ocaml gen.ml 5000 && time ocamlopt -c out.ml
2.45user 0.11system 0:02.56elapsed 99%CPU (0avgtext+0avgdata 86292maxresident)k

$ ocaml gen.ml 10000 && time ocamlopt -c out.ml
10.11user 0.22system 0:10.33elapsed 99%CPU (0avgtext+0avgdata 163768maxresident)k

$ ocaml gen.ml 20000 && time ocamlopt -c out.ml
30.02user 0.45system 0:30.48elapsed 99%CPU (0avgtext+0avgdata 319000maxresident)k

@vicuna
Copy link
Author

vicuna commented Jul 19, 2018

Comment author: @xavierleroy

Yes, this is one of the many quadratic behaviors in ocamlopt. (There are exponential behaviors too.) They are never encountered by hand-written code and sometimes encountered by auto-generated code. As a consequence, nobody feels a strong urge to rewrite from scratch with more efficient algorithms when they exist.

@github-actions
Copy link

github-actions bot commented May 7, 2020

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 7, 2020
@JasonGross
Copy link

Please don't close this. It's still an issue with native_compute in Coq and extracted code in fiat-crypto, and is unlikely to go away without actually being fixed here.

@JasonGross
Copy link

(Also, why does the bot not consider new mentions of this issue in other repos as activity?)

@gasche gasche removed the Stale label May 7, 2020
@gasche
Copy link
Member

gasche commented May 7, 2020

cc @xavierleroy we have a request for the stale bot: consider "mentions from other issues" as activity.

@gasche
Copy link
Member

gasche commented May 7, 2020

For the record, I cannot reproduce the suspicion that the Closure pass is the cause of the specific quadratic behavior that affects fiat-crypto, using my own hand-written generator or @ppedrot. My own generator seems to spend more time in register allocation, and @ppedrot's example creates terms that spend half their time in typing (I use ocamlopt -dtimings to get this information, but initially I used perf record --call-graph=dwarf ocamlopt ...; perf report).

It's not reasonable to expect us to fix "all quadratic behaviors" right now, we have to focus on those that block users in practice and for which we can reproduce the issue. If we cannot reproduce the specific shape of code that is produced by fiat-crypto (right now I have neither the actual code nor a generator of similar-problem-hitting code), I don't see how we can consider fixing this.

@ppedrot
Copy link
Contributor

ppedrot commented May 7, 2020

I can still reproduce with the allegedly old 4.06.1+fp OCaml version, judging from the perf graph. I'll try with a more recent version.

@ppedrot
Copy link
Contributor

ppedrot commented May 7, 2020

Something very wrong happened since OCaml 4.06. On the example above, typing is now taking most of the time and the overall runtime has been multiplied by 10 for n=5000.

4.06.1+fp

pm@ouranos:/tmp/foo$ ocaml gen.ml 5000 && time ocamlopt -dtimings -c out.ml
2.993s out.ml
  0.045s parsing
    0.044s parser
  0.216s typing
  0.037s transl
  2.695s generate
    0.019s cmm
    0.308s compile_phrases
      0.022s selection
      0.009s comballoc
      0.015s cse
      0.026s liveness
      0.009s deadcode
      0.013s spill
      0.013s split
      0.020s regalloc
      0.007s available_regs
      0.008s linearize
      0.007s scheduling
      0.026s emit
      0.134s other
    0.236s assemble
    2.132s other
0.004s other

real    0m3,042s
user    0m2,873s
sys     0m0,127s

4.08.1+fp

34.307s out.ml
  00.058s parsing
    00.058s parser
  20.715s typing
  11.638s transl
  01.895s generate
    00.028s cmm
    00.285s compile_phrases
      00.021s selection
      00.010s comballoc
      00.015s cse
      00.025s liveness
      00.009s deadcode
      00.013s spill
      00.016s split
      00.017s regalloc
      00.010s available_regs
      00.009s linearize
      00.007s scheduling
      00.021s emit
      00.114s other
    00.180s assemble
    01.402s other
00.007s other

real    0m34,650s
user    0m34,126s
sys     0m0,191s

4.10.0+fp

35.445s out.ml
  00.064s parsing
    00.063s parser
    00.001s other
  20.435s typing
  12.964s transl
  01.981s generate
    00.016s cmm
    00.294s compile_phrases
      00.023s selection
      00.010s comballoc
      00.015s cse
      00.024s liveness
      00.010s deadcode
      00.012s spill
      00.016s split
      00.024s regalloc
      00.007s available_regs
      00.009s linearize
      00.007s scheduling
      00.025s emit
      00.112s other
    00.182s assemble
    01.490s other
00.006s other

real    0m35,697s
user    0m35,304s
sys     0m0,151s

@xavierleroy
Copy link
Contributor

cc @xavierleroy we have a request for the stale bot: consider "mentions from other issues" as activity.

This is not a feature offered by the bot, as far as I can see. Anyone is welcome to write a better bot, but I have other things to do.

@xavierleroy
Copy link
Contributor

Something very wrong happened since OCaml 4.06. On the example above, typing is now taking most of the time and the overall runtime has been multiplied by 10 for n=5000.

"transl" also takes much longer. However, code generation got faster :-)

Yes, this is worrying. There have been other reports of increased compilation times in 4.07, but none that dramatic. I think this would deserve opening a new issue, and closing this one.

@ppedrot
Copy link
Contributor

ppedrot commented May 7, 2020

I think this would deserve opening a new issue, and closing this one.

Well, I don't think that the underlying quadratic behaviour of free variable computation has been fixed, it's probably still there buried under the other, much more important regression. But I am fine with this bug being closed as WONTFIX if it's the morale of the story though.

@JasonGross
Copy link

But I am fine with this bug being closed as WONTFIX if it's the morale of the story though.

:-(

@JasonGross
Copy link

JasonGross commented May 7, 2020

right now I have neither the actual code nor a generator of similar-problem-hitting code

The code is available on our CI. For example, here's the extracted code from a recent run. And here are just the .ml files, in more permanant storage.

@bluddy
Copy link
Contributor

bluddy commented May 7, 2020

The code is available on our CI.

Wow. There are several >100MB .ml files in there!

@JasonGross
Copy link

Wow. There are several >100MB .ml files in there!

Indeed! These are statically-allocated PHOASTs. When @ppedrot suggested that I avoid this issue by switching them to a non-closure-based representation (such as de Bruijn), I got an error message from the Haskell compiler on the Haskell equivalent of this code that it could not allocate such large compile-time data blobs.

@bluddy
Copy link
Contributor

bluddy commented May 8, 2020

I think if the compiler is expected to perform decently on these monstrosities (and to keep performing decently), one of them needs to be checked in as a test case (perhaps in gzipped form), even if it's only run sporadically.

@JasonGross
Copy link

Feel free to upload any or all of them as a test-case. For reference, the build log with timing info from OCaml 4.05.0 is


2020-05-07T20:14:21.5046408Z OCAMLOPT src/ExtractionOCaml/unsaturated_solinas.ml -o src/ExtractionOCaml/unsaturated_solinas
2020-05-07T20:14:34.3746102Z src/ExtractionOCaml/perf_word_by_word_montgomery.ml (real: 22.79, user: 22.06, sys: 0.60, mem: 1704564 ko)
2020-05-07T20:14:34.6739624Z OCAMLOPT src/ExtractionOCaml/saturated_solinas.ml -o src/ExtractionOCaml/saturated_solinas
2020-05-07T20:14:57.6164541Z src/ExtractionOCaml/unsaturated_solinas (real: 36.10, user: 32.53, sys: 1.11, mem: 1212164 ko)
2020-05-07T20:14:57.6177444Z OCAMLOPT src/ExtractionOCaml/word_by_word_montgomery.ml -o src/ExtractionOCaml/word_by_word_montgomery
2020-05-07T20:15:05.0145548Z src/ExtractionOCaml/saturated_solinas (real: 30.33, user: 28.91, sys: 1.02, mem: 1052920 ko)
2020-05-07T20:15:05.0160523Z OCAMLOPT src/ExtractionOCaml/base_conversion.ml -o src/ExtractionOCaml/base_conversion
2020-05-07T20:15:34.5687435Z src/ExtractionOCaml/base_conversion (real: 29.55, user: 28.17, sys: 0.92, mem: 1088436 ko)
2020-05-07T20:15:34.5698996Z OCAMLOPT src/ExtractionOCaml/perf_unsaturated_solinas.ml -o src/ExtractionOCaml/perf_unsaturated_solinas
2020-05-07T20:15:40.5476673Z src/ExtractionOCaml/word_by_word_montgomery (real: 42.92, user: 41.26, sys: 1.45, mem: 1594944 ko)
2020-05-07T20:15:40.5488437Z OCAMLOPT src/ExtractionOCaml/perf_word_by_word_montgomery.ml -o src/ExtractionOCaml/perf_word_by_word_montgomery
2020-05-07T20:16:03.3492523Z src/ExtractionOCaml/perf_unsaturated_solinas (real: 28.77, user: 27.29, sys: 0.90, mem: 945320 ko)
2020-05-07T20:16:13.2625182Z src/ExtractionOCaml/perf_word_by_word_montgomery (real: 32.71, user: 31.59, sys: 1.02, mem: 1053020 ko)

@gasche
Copy link
Member

gasche commented May 8, 2020

The regression in type-checking time comes from 15e8f62 (cc @trefis, thrown into the shark pool).

I think it's worth investigating what is going on and whether we can fix it in a not-too-difficult way. It would be great if we could fix fiat-crypto without unnatural implementation efforts on the codebase, but equally or more importantly the regressions that we observe in this corner case may also occur with more common programs (but there maybe they are more diffuse and we hadn't been able to pinpoint them).

(This all highlights that we really should have better monitoring of the OCaml compile times as the compiler evolves. Unfortunately no one has been able to work on this consistently yet.)

@xavierleroy
Copy link
Contributor

Concerning fiat-crypto, we're talking about source files of size ONE HUNDRED MEGABYTES. Compiling 100 Mb of source code in 20 seconds is very fast. I've seen gcc segfault on less huge inputs. So count yourself lucky: this could have gone infinitely worse.

Also, one could wonder why 100 Mb of OCaml are produced to describe one cryptographic primitive.

@JasonGross
Copy link

one could wonder why 100 Mb of OCaml are produced to describe one cryptographic primitive.

Indeed, this is because our reification procedure duplicates many subterms (in particular, it unfolds any constant it doesn't recognize without caching the reification). That the file contains so much duplication is noticable in the fact that it compresses to be quite small.

So count yourself lucky: this could have gone infinitely worse.

Indeed, I'm grateful that I didn't have to spend the week or so I expect it would take to implement such a cache for my reification procedure.

JasonGross added a commit to JasonGross/coq that referenced this issue May 31, 2020
Note that this should reduce the overall build time of fiat-crypto
related targets by about 10--20 minutes, as I've removed the heaviest
jobs (about 25--30 minutes in serial) from the OCaml target.

I'd like to keep the OCaml target around just to make sure that Coq
doesn't introduce a change to extraction that breaks compilation of
extracted OCaml code.  See ocaml/ocaml#7826
for the issue tracking performance of compiling the extracted OCaml
code (and perhaps there should be another issue opened on the OCaml bug
tracker about flambda on the fiat-crypto extracted files?)

Alternative to coq#12405
Closes coq#12405
JasonGross added a commit to JasonGross/coq that referenced this issue May 31, 2020
Note that this should reduce the overall build time of fiat-crypto
related targets by about 10--20 minutes, as I've removed the heaviest
jobs (about 25--30 minutes in serial) from the OCaml target.

I'd like to keep the OCaml target around just to make sure that Coq
doesn't introduce a change to extraction that breaks compilation of
extracted OCaml code.  See ocaml/ocaml#7826
for the issue tracking performance of compiling the extracted OCaml
code (and perhaps there should be another issue opened on the OCaml bug
tracker about flambda on the fiat-crypto extracted files?)

Alternative to coq#12405
Closes coq#12405
Fixes coq#12400
JasonGross added a commit to JasonGross/coq that referenced this issue May 31, 2020
Note that this should reduce the overall build time of fiat-crypto
related targets by about 10--20 minutes, as I've removed the heaviest
jobs (about 25--30 minutes in serial) from the OCaml target.

I'd like to keep the OCaml target around just to make sure that Coq
doesn't introduce a change to extraction that breaks compilation of
extracted OCaml code.  See ocaml/ocaml#7826
for the issue tracking performance of compiling the extracted OCaml
code (and perhaps there should be another issue opened on the OCaml bug
tracker about flambda on the fiat-crypto extracted files?)

Alternative to coq#12405
Closes coq#12405
Fixes coq#12400
Zimmi48 pushed a commit to Zimmi48/coq that referenced this issue Jun 2, 2020
Note that this should reduce the overall build time of fiat-crypto
related targets by about 10--20 minutes, as I've removed the heaviest
jobs (about 25--30 minutes in serial) from the OCaml target.

I'd like to keep the OCaml target around just to make sure that Coq
doesn't introduce a change to extraction that breaks compilation of
extracted OCaml code.  See ocaml/ocaml#7826
for the issue tracking performance of compiling the extracted OCaml
code (and perhaps there should be another issue opened on the OCaml bug
tracker about flambda on the fiat-crypto extracted files?)

Alternative to coq#12405
Closes coq#12405
Fixes coq#12400

(cherry picked from commit f427613)
Zimmi48 pushed a commit to Zimmi48/coq that referenced this issue Jun 2, 2020
Note that this should reduce the overall build time of fiat-crypto
related targets by about 10--20 minutes, as I've removed the heaviest
jobs (about 25--30 minutes in serial) from the OCaml target.

I'd like to keep the OCaml target around just to make sure that Coq
doesn't introduce a change to extraction that breaks compilation of
extracted OCaml code.  See ocaml/ocaml#7826
for the issue tracking performance of compiling the extracted OCaml
code (and perhaps there should be another issue opened on the OCaml bug
tracker about flambda on the fiat-crypto extracted files?)

Alternative to coq#12405
Closes coq#12405
Fixes coq#12400

(cherry picked from commit f427613)
@JasonGross
Copy link

The regression in type-checking time comes from 15e8f62 (cc @trefis, thrown into the shark pool).

Has this regression been fixed?

JasonGross added a commit to JasonGross/fiat-crypto that referenced this issue Apr 11, 2021
Also bump Coq to 8.13.1

This is a somewhat inefficient way to test more versions of OCaml, but I
want information on if the version of OCaml has any impact on the
compilation time of the extracted binaries, a la
ocaml/ocaml#7826 and in particular
ocaml/ocaml#7826 (comment)
JasonGross added a commit to mit-plv/fiat-crypto that referenced this issue Apr 11, 2021
Also bump Coq to 8.13.1

This is a somewhat inefficient way to test more versions of OCaml, but I
want information on if the version of OCaml has any impact on the
compilation time of the extracted binaries, a la
ocaml/ocaml#7826 and in particular
ocaml/ocaml#7826 (comment)
andres-erbsen pushed a commit to mit-plv/fiat-crypto that referenced this issue May 19, 2021
Also bump Coq to 8.13.1

This is a somewhat inefficient way to test more versions of OCaml, but I
want information on if the version of OCaml has any impact on the
compilation time of the extracted binaries, a la
ocaml/ocaml#7826 and in particular
ocaml/ocaml#7826 (comment)
@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Feb 28, 2022
@JasonGross
Copy link

The regression in type-checking time comes from 15e8f62 (cc @trefis, thrown into the shark pool).

Has this regression been fixed?

Still waiting for an answer here...

@github-actions github-actions bot removed the Stale label Mar 2, 2022
@github-actions
Copy link

github-actions bot commented Mar 3, 2023

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label Mar 3, 2023
@gasche gasche removed the Stale label Mar 3, 2023
@smuenzel
Copy link
Contributor

smuenzel commented Mar 8, 2023

Has this regression been fixed?

Still waiting for an answer here...

The issue is still open, so it's unlikely the issue has been fixed.
Indeed, repeating the benchmark on trunk shows:

Iterations Typing Time Typing Time (flambda) Lift_constants time (flambda)
5000 7.204s 6.715s 26.814s
10000 41.361s 38.391s 145.052s
20000 287.313s ran out of patience ran out of patience

Which seems a lot worse than the initial benchmark, as observed by @ppedrot. Lift_constants in flambda also seems to perform poorly with this input.

@lthls
Copy link
Contributor

lthls commented Mar 8, 2023

I know what is quadratic in Lift_constants, and will discuss possible fixes with @chambart.
I can't help with the typing code though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants