
Compile time performance of opens #5916

Closed
vicuna opened this issue Feb 7, 2013 · 20 comments

Comments

vicuna commented Feb 7, 2013

Original bug ID: 5916
Reporter: @sliquister
Status: resolved (set by @xavierleroy on 2015-12-11T12:49:06Z)
Resolution: suspended
Priority: normal
Severity: minor
Version: 4.00.0
Target version: 4.03.0+dev / +beta1
Category: typing
Tags: patch
Has duplicate: #5877
Monitored by: @diml @alainfrisch

Bug description

The attached patch is an attempt to improve compilation speed.

It does two things:

  1. compose the two substitutions applied when typing [open] constructs like [open Core.Std], instead of applying them one after the other
  2. apply the resulting substitution lazily. The substitution is only applied on inner modules of, say, [Core.Std] when they are looked up.

The motivation for this patch is that many libraries at Jane Street, like Core and Async, are packed as one module and accessed through a Std submodule. As a consequence, when writing [open Lib.Std], the aforementioned substitutions are currently applied (twice) to the whole library; with the patch they are applied once, and only to the modules actually used by the current compilation unit.
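The two ideas can be illustrated with a toy model (the types and names below are hypothetical, not the compiler's actual Subst or Env API): substitutions are composed once up front, and the composed substitution is applied to a component only when that component is looked up.

```ocaml
(* Toy model of the patch's two ideas. A substitution maps old names to
   new names; real compiler substitutions rewrite idents/paths in types. *)
type subst = (string * string) list

let apply (s : subst) (name : string) : string =
  match List.assoc_opt name s with Some v -> v | None -> name

(* Compose so that [apply (compose s1 s2) x = apply s2 (apply s1 x)]:
   one pass over the data instead of two. *)
let compose (s1 : subst) (s2 : subst) : subst =
  List.map (fun (k, v) -> (k, apply s2 v)) s1
  @ List.filter (fun (k, _) -> not (List.mem_assoc k s1)) s2

(* Lazy application: the (possibly huge) component table is rewritten
   one entry at a time, only when an entry is actually looked up. *)
let lookup (s : subst) (components : (string * string) list) (name : string) =
  match List.assoc_opt name components with
  | Some ty -> Some (apply s ty) (* substitution applied only here *)
  | None -> None

let () =
  let s = compose [ ("t", "u") ] [ ("u", "v") ] in
  assert (apply s "t" = "v");
  assert (lookup s [ ("M", "t") ] "M" = Some "v")
```

In this model, opening [Core.Std] would build [s] once and never touch the entries of [components] that the compilation unit never names.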

I haven't tried the patch completely by itself, but compiling (without linking) one big tree using

  1. a standard 4.00.0 (or 4.00.1?) compiler took 168min (wall clock time).
  2. the same compiler with OCAMLRUNPARAM=s=32M took between 141min and 151min.
  3. the same compiler with a patch equivalent to the one attached and OCAMLRUNPARAM=s=32M took 109min.

If we looked only at user time, the improvement would be more visible (3 would be roughly 3x faster than 1, if I remember right).

File attachments


vicuna commented Feb 7, 2013

Comment author: @alainfrisch

We already incorporated a patch to improve performance of opens in the compiler (#5877). Do you think your proposal brings an additional improvement, or maybe that it should replace the other patch?


vicuna commented Feb 7, 2013

Comment author: @sliquister

I don't understand the other patch, but judging by the description, I don't think they overlap. This one is about making the opening of big structures faster, because even a single open is already quite costly. It targets this kind of code:
open Core.Std
open Core_extended.Std
open Async.Std
...
There are no repeated opens here, so the other patch would do nothing.


vicuna commented Feb 7, 2013

Comment author: @lefessan

The two patches are independent. From what I understood, this one lazily applies the substitution, but if there are two opens, then the substitution is (lazily) applied twice. Maybe such laziness is enough to solve the problem of multiple opens; more investigation would be needed to know.

Anyway, I think both patches should be applied for now.


vicuna commented Feb 7, 2013

Comment author: @alainfrisch

I'm a bit worried by the introduction of a lazy component in the environment, which needs to be serialized. Shouldn't the "explicit lazy" mechanism that you (Fabrice) introduced be used here as well?


vicuna commented Feb 7, 2013

Comment author: @sliquister

I had the impression that the environment was not serialized, but I only looked at cmi files. Do you mean serialized in cmt files?


vicuna commented Feb 8, 2013

Comment author: @garrigue

Actually, if I remember correctly the discussion we had about binannot, the full environment is not serialized; only the env_summary field is.
So I'm not sure that EnvLazy is useful at all.
Why is it there?


vicuna commented Feb 8, 2013

Comment author: @alainfrisch

Precisely, the patch adds laziness to the summary:

  - | Env_module of summary * Ident.t * module_type
  + | Env_module of summary * Ident.t * module_type Lazy.t


vicuna commented Feb 8, 2013

Comment author: @sliquister

Well, it adds laziness both to the summary and to the environment. I can change that, but what is the problem with serializing lazy values?


vicuna commented Feb 8, 2013

Comment author: @garrigue

It is just that a non-evaluated lazy value contains a closure, which cannot be serialized.
The EnvLazy module avoids that by providing the evaluation function explicitly, at extraction time.
So it could solve that problem.
Another solution would be to force the evaluation of these lazy values before serialization, but then you get back your performance problem if you use binannot.
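The "explicit lazy" idea can be sketched as follows (illustrative names and interface, not the compiler's actual EnvLazy module): the pending state stores plain data rather than a closure, so an unforced value can still be marshaled; the evaluation function is supplied explicitly at forcing time.

```ocaml
(* Sketch of a marshal-safe explicit lazy, in the spirit of EnvLazy. *)
module ExplicitLazy : sig
  type ('a, 'b) t
  val create : 'a -> ('a, 'b) t
  val force : ('a -> 'b) -> ('a, 'b) t -> 'b
end = struct
  (* The thunk holds only the argument (data), never a closure. *)
  type ('a, 'b) state = Thunk of 'a | Done of 'b
  type ('a, 'b) t = ('a, 'b) state ref

  let create x = ref (Thunk x)

  let force f r =
    match !r with
    | Done v -> v
    | Thunk x ->
        let v = f x in
        r := Done v;
        v
end

let () =
  let t = ExplicitLazy.create 21 in
  (* Marshaling succeeds here: the unforced thunk contains no closure,
     whereas marshaling an unforced Stdlib.Lazy.t would raise. *)
  let _serialized = Marshal.to_string t [] in
  assert (ExplicitLazy.force (fun x -> x * 2) t = 42)
```

The trade-off is that every call site must pass the right evaluation function, since it is no longer captured in the value itself.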


vicuna commented Feb 8, 2013

Comment author: @alainfrisch

AFAIK, environment summaries are also used by the bytecode debugger.


vicuna commented Feb 20, 2013

Comment author: @sliquister

It took me some time, but I attached a new version of the patch. This one was written against a recent version of trunk, so the conflicts with Fabrice's changes have been resolved.
I replaced the lazy with an EnvLazy. Since Env.summary contains this lazy value and it is exposed, I could have exposed the Env module altogether, but I thought it was simpler to just expose a function to force these lazy values.

I retried the whole tree build and the performance improvement is the same.

What is new though is that now my /tmp contains 50000 ocamlpp* files (and I don't compile anything else on that machine), but I did not investigate that.


vicuna commented Mar 17, 2013

Comment author: @gasche

What is the status of this patch? Alain, Fabrice, do you think it could be applied now that EnvLazy is used?

PS: the ocamlpp* problem was independently submitted by sliquister (thanks!) as 0005930 and is now solved.


vicuna commented Mar 17, 2013

Comment author: @sliquister

Just a note: this patch would probably be much less useful if we had namespaces, assuming the design of namespaces allowed loading (and thus substituting) only the submodules of packed modules that are referenced (or the corresponding parts of whatever would replace packed modules).
I didn't know namespaces were in the works when I started this, and I don't know how certain it is that they will be incorporated into OCaml, nor how far away they are.


vicuna commented Mar 18, 2013

Comment author: @alainfrisch

I don't have anything against the patch, but I won't have time to review it soon. Gabriel: if you review it and feel comfortable with it, I think you can apply it.


vicuna commented Jul 15, 2013

Comment author: @alainfrisch

I've uploaded a new version of the patch adapted to current trunk (resolving conflicts with #5965 and #5980).

I'm still unable to produce a micro-benchmark to illustrate the gain. I've tried generating huge nested structures and opening only a sub-structure but I could not observe a difference in compilation time (yet).


vicuna commented Jul 29, 2013

Comment author: @sliquister

I would need to look at the patch again, but right now I can't compile the compiler at home. I think the description I initially gave was more general than what the patch actually does. Have you tried:

a.ml: big structure
b.ml: empty file
c.ml: module A = A module B = B
pack.cmi: pack of a.cmi b.cmi c.cmi

and then comparing the compilation times of a file containing only 'open C.B' with one containing only 'open C.A'? (Both should be slow without the patch, whereas the first should be fast with the patch.)


vicuna commented Sep 18, 2013

Comment author: @alainfrisch

a.ml: let x1 = 1;; let x2 = 2;; ...;; let x10000 = 10000
b.ml: (* empty file *)
c.ml: module A = A module B = B
da.ml: open C.A;; open C.A;; open C.A;; (* 10 times *)
db.ml: open C.B;; open C.B;; open C.B;; (* 10 times *)

Compiling with trunk:

$ time ocamlc -c da.ml
ocamlc -c da.ml 0.48s user 0.02s system 101% cpu 0.493 total
$ time ocamlc -c db.ml
ocamlc -c db.ml 0.02s user 0.00s system 99% cpu 0.020 total

So "open C.B" already seems to be quite fast. To eliminate the effect of the memoization patch, I've also tried with a single "open C.A" vs "open C.B":

$ time ocamlc -c da.ml
ocamlc -c da.ml 0.11s user 0.00s system 106% cpu 0.103 total
$ time ocamlc -c db.ml
ocamlc -c db.ml 0.02s user 0.00s system 99% cpu 0.020 total
$ time ocamlc.opt -c da.ml
ocamlc.opt -c da.ml 0.03s user 0.01s system 100% cpu 0.040 total
$ time ocamlc.opt -c db.ml
ocamlc.opt -c db.ml 0.01s user 0.01s system 129% cpu 0.015 total

Even there, "open C.B" is much faster than "open C.A" (without the patch).

Could you run your experiment again on your large code base to confirm the impact of the patch?


vicuna commented Dec 10, 2015

Comment author: @xavierleroy

Any new information about this patch? (Inactive since 2013.) If not, it won't be considered for 4.03.


vicuna commented Dec 11, 2015

Comment author: @sliquister

I haven't tried anything new, no. I think module aliases and -no-alias-deps essentially solved this problem.
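For context, a minimal sketch of what a module alias buys (this aliases a stdlib module purely for illustration; in practice the win comes from library "index" files made of such aliases, compiled with -no-alias-deps): an alias is recorded as an equation on module paths, so the compiler does not copy or substitute the aliased module's signature at the alias point, and the cost is paid only when the alias is actually accessed.

```ocaml
(* [L] is a module alias: the compiler records "L = List" as a path
   equation rather than duplicating List's signature, so declaring the
   alias is cheap no matter how large the aliased module is. *)
module L = List

(* The signature of List is consulted only here, at the use site. *)
let () = assert (L.length [ 1; 2; 3 ] = 3)
```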


vicuna commented Dec 11, 2015

Comment author: @xavierleroy

I think module aliases and -no-alias-deps essentially solved this problem.

Right. Then, I'm "suspending" this PR to mark it as not requiring action and to keep it for future reference.
