Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4.02.0 compiles modules twice slower than 4.01 #6529

Closed
vicuna opened this issue Aug 30, 2014 · 9 comments
Closed

4.02.0 compiles modules twice slower than 4.01 #6529

vicuna opened this issue Aug 30, 2014 · 9 comments

Comments

@vicuna
Copy link

vicuna commented Aug 30, 2014

Original bug ID: 6529
Reporter: jpdeplaix
Status: closed (set by @xavierleroy on 2016-12-07T10:34:38Z)
Resolution: fixed
Priority: high
Severity: major
Version: 4.02.0+beta1 / +rc1
Fixed in version: 4.02.1+dev
Category: ~DO NOT USE (was: OCaml general)
Monitored by: @bobzhang @hcarty @yakobowski

Bug description

ocaml/opam-repository#2547

Steps to reproduce

Just test with any OCaml program.

@vicuna
Copy link
Author

vicuna commented Aug 31, 2014

Comment author: @xavierleroy

Smoking gun (shown by "perf" profiling): significant time is spent in List.assoc (none in 4.01) and (consequently) significantly more time in polymorphic comparison.

@vicuna
Copy link
Author

vicuna commented Aug 31, 2014

Comment author: @xavierleroy

Fixed quadratic-time code in Consistbl.extract (commit 15169 in 4.02, 15170 on trunk). Compilation time for Why3 is down to 11.0s user time, from 15.6s in 4.02 and 8.6s in 4.01. The remaining 1.4s are unaccounted for (yet).

@vicuna
Copy link
Author

vicuna commented Aug 31, 2014

Comment author: @lpw25

The fix to Consistbl.extract is a good change and reasonable solution to the problem. However, I think that the "root cause" of the bug is elsewhere.

The quadratic code was already around most uses of Consistbl.extract, but it was not around the one in Env.imports. This suggests that Env.imported_units is growing very large. So the cause of the bug is probably Env.add_imports which should be checking that it does not add duplicates to that list.

When I wrote Env.add_imports I think I thought that ps_crcs_checked would prevent duplicates. However, this obviously only works for directly imported modules; indirectly imported modules will still be duplicated. Worse, the number of (duplicated) indirect imports is probably quadratic in the number of units in a project. Combined with the quadratic code in Consistbl.extract this seems to have lead to some dramatic slowdowns.

@vicuna
Copy link
Author

vicuna commented Sep 1, 2014

Comment author: @alainfrisch

Following Leo's advice, I've committed a change to Env (15171 on trunk, 15172 on 4.02) to (i) avoid duplicates in imported_units (using a StringSet instead of a list) and (ii) avoid repeated consistency checks for the same unit (adding a bool ref to the cache of loaded persistent units to remember which ones have already been checked).

Compiling typing/typecore.ml now takes 0.52s (0.76s before the change) with ocamlopt.opt and 2.24s (3.52s before) with ocamlopt.

Since we're discussing performances of the compiler, it's also worth noting that tweaking GC parameters can have a huge impact on them. For instance, compiling typing/typecore.ml with ocamlopt.opt takes only 0.35s (instead of 0.52s) with OCAMLRUNPARAM=s=64M.

@vicuna
Copy link
Author

vicuna commented Sep 1, 2014

Comment author: jpdeplaix

Great ! Here is the new results on why3:

  • 4.01: ~38s
  • 4.02: ~77s
  • 4.02 dev: ~41s

and for js_of_ocaml:

  • 4.01: ~42s
  • 4.02: ~73s
  • 4.02 dev: ~47s

Thanks a lot ! :)
There is still a little overhead but I don't know if it is normal or not.

@vicuna
Copy link
Author

vicuna commented Sep 1, 2014

Comment author: @xavierleroy

Thanks for the retest. The approx 10% overhead that remains can be caused by many different things (e.g. ocamlopt 4.02 performs two additional optimization passes compared with 4.01) and I don't think it is worth deeper investigation. Marking this PR as resolved.

@vicuna
Copy link
Author

vicuna commented Sep 1, 2014

Comment author: jpdeplaix

Just for information, when do you think the 4.02.1 can be released ?

@vicuna
Copy link
Author

vicuna commented Sep 1, 2014

Comment author: @lpw25

(ii) avoid repeated consistency checks for the same unit (adding a bool ref to the cache of loaded persistent units to remember which ones have already been checked).

This is what the ps_crcs_checked field of pers_struct was supposed to do. However, it seems to have been removed. I think that a field of pers_struct is a bit cleaner than a separate bool ref, so perhaps ps_crcs_checked should be resurrected.

@vicuna
Copy link
Author

vicuna commented Sep 1, 2014

Comment author: @alainfrisch

Thanks Leo, I've moved the flag to the pers_struct record.

@vicuna vicuna closed this as completed Dec 7, 2016
avsm pushed a commit to avsm/ocaml that referenced this issue Jun 21, 2020
avsm pushed a commit to avsm/ocaml that referenced this issue Jun 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant