Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005916OCamlOCaml typingpublic2013-02-07 06:242013-04-15 12:13
Reportersliquister 
Assigned Tolefessan 
PrioritynormalSeverityminorReproducibilityalways
StatusconfirmedResolutionopen 
PlatformOSOS Version
Product Version4.00.0 
Target VersionFixed in Version 
Summary0005916: Compile time performance of opens
DescriptionThe attached patch is a try at improving compilation speed.

It does two things:
1. compose the two substitutions applied when typing [open] constructs like [open Core.Std], instead of applying the two substitutions one after the other
2. apply the resulting substitution lazily. The substitution is only applied on inner modules of, say, [Core.Std] when they are looked up.

The reason for this patch is because many libraries at jane street like core and async are packed as one module, and are accessed through the Std submodule. As a consequence, the aforementioned substitutions are currenty applied on the whole library (twice) when writing [open Lib.Std], as opposed to once on the modules used by the current compilation unit with the patch.

I haven't tried the patch completely by itself, but compiling (without linking) one big tree using
1. a standard 4.00.0 (or 4.00.1?) compiler took 168min (wall clock time).
2. the same compiler with OCAMLRUNPARAM=s=32M took between 141min and 151min.
3. the same compiler with a patch equivalent to the one attached and OCAMLRUNPARAM=s=32M took 109min.

If we just looked at user time, the improvement would be more visible (3. would be roughly 3x faster than 1. if I remember right).
TagsNo tags attached.
Attached Filesdiff file icon lazy_subst.diff [^] (13,259 bytes) 2013-02-07 06:24 [Show Content]
diff file icon lazy-subst2.diff [^] (16,693 bytes) 2013-02-20 15:18 [Show Content]

- Relationships
has duplicate 0005877resolvedfrisch multiple "open" can become expensive in memory 

-  Notes
(0008823)
frisch (developer)
2013-02-07 06:39

We already incorporated a patch to improve performance of opens in the compiler (0005877). Do you think your proposal brings an additional improvement, or maybe that it should replace the other patch?
(0008824)
sliquister (reporter)
2013-02-07 07:07

I don't understand the other patch but judging by the description, I don't think they overlap. This one is about making opening of big structure faster because even one open is already quite costly. It aims for this kind of code:
  open Core.Std
  open Core_extended.Std
  open Async.Std
  ...
There are no multiple opens here so the other patch would do nothing.
(0008825)
lefessan (developer)
2013-02-07 10:37

The two patches are independant. From what I understood, this one lazily applies the substitution, but if there are two opens, then the substitution is (lazily) applied twice. Maybe such laziness is enough to solve the problem of multiple opens, it would need more investigation to know.

Anyway, I think both patches should be applied for now.
(0008827)
frisch (developer)
2013-02-07 11:15

I'm a bit worried by the introduction of a lazy component in the environment, which needs to be serialized. Shouldn't the "explicit lazy" mechanism that you (Fabrice) introduced be used as well here?
(0008828)
sliquister (reporter)
2013-02-07 15:33

I had the impression that the environment was not serialized, but I looked at at cmi files. Do you mean serialized in cmt files?
(0008830)
garrigue (manager)
2013-02-08 03:17

Actually, if I remember correctly the discussion we had about binannot, the full environment is not serialized, only the env_summary field is.
So I'm not sure that EnvLazy is useful at all.
Why is it there ?
(0008831)
frisch (developer)
2013-02-08 06:44

Precisely, the patch adds laziness to the summary:

- | Env_module of summary * Ident.t * module_type
+ | Env_module of summary * Ident.t * module_type Lazy.t
(0008832)
sliquister (reporter)
2013-02-08 14:42

Well it adds laziness both to the summary and to the environment. I can change that but what is the problem with serializing lazy values?
(0008833)
garrigue (manager)
2013-02-08 15:18

It is just that a non-evaluated lazy value contains a closure, which cannot be serialized.
The EnvLazy module avoids that by providing the evaluation function explicitly, at extraction time.
So it could solve that problem.
Another solution would be to force the evaluation of these lazy values before serialization, but then you get back your performance problem if you use binannot.
(0008834)
frisch (developer)
2013-02-08 15:30

AFAIK, environment summaries are also used by the bytecode debugger.
(0008883)
sliquister (reporter)
2013-02-20 15:18

It took me some time, but I attached a new version of the patch. This one was written against a recent version of trunk, so the conflicts with Fabrice's changes have been solved.
I replaced the lazy by an EnvLazy. Since Env.summary contains this lazy value and it is exposed, I could have exposed the Env module altogether but I thought it was simpler to just expose a function to force these lazy values.

I retried the whole tree build and the performance improvement is the same.

What is new though is that now my /tmp contains 50000 ocamlpp* files (and I don't compile anything else on that machine), but I did not investigate that.
(0008978)
gasche (developer)
2013-03-17 17:40
edited on: 2013-03-17 17:42

What is the status of this patch? Alain, Fabrice, do you think it could be applied now that EnvLazy is used?

PS: the ocamlpp* problem was independently submitted by sliquister (thanks!) as 0005930 and is now solved.

(0008989)
sliquister (reporter)
2013-03-17 20:34

Just a note: this patch would probably be much less useful if we had namespaces, assuming the design of namespaces allowed to only load (thus substitute) the submodules of packed modules that are referenced (or the parts of whatever would replace packed modules).
I didn't know namespaces where in the works when I started that, and I don't know how sure it is that they will be incorporated in caml and how far away they are.
(0008991)
frisch (developer)
2013-03-18 09:34

I don't have anything against the patch but I won't have time to review it soon. Gabriel: if you review the patch and if you feel comfortable with the patch, I think you can apply it.

- Issue History
Date Modified Username Field Change
2013-02-07 06:24 sliquister New Issue
2013-02-07 06:24 sliquister File Added: lazy_subst.diff
2013-02-07 06:36 frisch Relationship added has duplicate 0005877
2013-02-07 06:39 frisch Note Added: 0008823
2013-02-07 07:07 sliquister Note Added: 0008824
2013-02-07 10:37 lefessan Note Added: 0008825
2013-02-07 11:15 frisch Note Added: 0008827
2013-02-07 15:33 sliquister Note Added: 0008828
2013-02-08 03:17 garrigue Note Added: 0008830
2013-02-08 06:44 frisch Note Added: 0008831
2013-02-08 14:42 sliquister Note Added: 0008832
2013-02-08 15:18 garrigue Note Added: 0008833
2013-02-08 15:30 frisch Note Added: 0008834
2013-02-20 15:18 sliquister Note Added: 0008883
2013-02-20 15:18 sliquister File Added: lazy-subst2.diff
2013-03-17 17:40 gasche Note Added: 0008978
2013-03-17 17:42 gasche Note Edited: 0008978 View Revisions
2013-03-17 20:34 sliquister Note Added: 0008989
2013-03-18 09:34 frisch Note Added: 0008991
2013-04-15 12:13 lefessan Assigned To => lefessan
2013-04-15 12:13 lefessan Status new => confirmed


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker