Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiple "open" can become expensive in memory #5877

Closed
vicuna opened this issue Jan 8, 2013 · 12 comments
Closed

multiple "open" can become expensive in memory #5877

vicuna opened this issue Jan 8, 2013 · 12 comments
Assignees

Comments

@vicuna
Copy link

vicuna commented Jan 8, 2013

Original bug ID: 5877
Reporter: @lefessan
Assigned to: @alainfrisch
Status: closed (set by @alainfrisch on 2016-10-04T08:10:58Z)
Resolution: not a bug
Priority: normal
Severity: minor
Version: 4.00.1
Category: ~DO NOT USE (was: OCaml general)
Duplicate of: #5916
Monitored by: @bobzhang @hcarty gerd @alainfrisch

Bug description

The "open" construct has a non-negligeable cost, as it applies a substitution to a signature before including it in the environment. With new syntax sugar like M.(x) and "let open X in", opening several times the same signature has become easy. However, if the signature is big (for example, if it is a pack of many interfaces), memory can become a problem.

For example, with Core, "open Core.Std" costs 50 MB of memory, opening it 20 times is 1 GB.

The provided patch improves sharing of substituted signatures, making multiple opens not more expensive than one open.

File attachments

@vicuna
Copy link
Author

vicuna commented Jan 8, 2013

Comment author: @alainfrisch

Shouldn't the two memoization tables be cleared between compilation units (e.g. in Env.reset_cache)?

I'm a little bit concerned by the structural equality on signatures, since they contain mutable components. I don't see an actual problem, but are you convinced that your approach is sound?

@vicuna
Copy link
Author

vicuna commented Jan 8, 2013

Comment author: @alainfrisch

I've uploaded a new version of your patch which applies to the current trunk (and in universal format, which I find more readable).

@vicuna
Copy link
Author

vicuna commented Jan 8, 2013

Comment author: @lefessan

For Env.reset_cache, it should be done, indeed, but it is not a problem if it is not done (since the comparison is done with physical equality and caches of persistent signatures are cleared, there is no risk to use a signature from an older compilation).

For the mutable parts, I think they either relate to lazy evaluation, or they are only mutated when the signature is being built, and a signature is not mutated afterwards. So there should be no problem with sharing these mutable components.

@vicuna
Copy link
Author

vicuna commented Jan 8, 2013

Comment author: @alainfrisch

I've uploaded a new version, which merges the two memoization tables. Fabrice: do you agree with this change?

@vicuna
Copy link
Author

vicuna commented Jan 8, 2013

Comment author: @alainfrisch

making multiple opens not more expensive than one open.

Each open still involves adding all toplevel elements of the opened structure to the environment, which is not a no-op. I've create a source file with 10000 lines equal to "open Hashtbl". The maximum resident set size for compiling this module with ocamlc is reduced by a factor about 4 with the patch (but not more!), and compilation time by a factor of 7. This is good, but I would not go as far as to say that multiple opens become not more expensive than one open.

@vicuna
Copy link
Author

vicuna commented Jan 8, 2013

Comment author: @lefessan

It would be possible to save some more space (a lot in your example) by removing redundant entries from the environment, i.e. [store_value env x y] would check first if [x] is not already associated (physically) with [y] before adding it.

Actually, it would even be possible to use such checks to display warnings when opening a module hides a value that was already bound in the environement.

@vicuna
Copy link
Author

vicuna commented Jan 9, 2013

Comment author: @alainfrisch

(a lot in your example)

The real use case is rather when the code has a lot of non-nested local opens, so this would not apply.

@vicuna
Copy link
Author

vicuna commented Jan 9, 2013

Comment author: @lefessan

Maybe you can memoize on Ident.hide ?

@vicuna
Copy link
Author

vicuna commented Jan 9, 2013

Comment author: @lefessan

Or check if a type before a substitution on a value is not polymorphic, in which case there is probably no need to do the substitution and thus, the value_description can be shared.

That sounds complex for a problem that has not yet been exhibited in practice on real code.

@vicuna
Copy link
Author

vicuna commented Jan 9, 2013

Comment author: @alainfrisch

That sounds complex for a problem that has not yet been exhibited in practice on real code.

Agreed! I just wanted to make sure that the patch indeed solves the problem it is intended to solve (repeated local opens taking too much memory).

@vicuna
Copy link
Author

vicuna commented Jan 15, 2013

Comment author: @alainfrisch

I've committed the revised patch to the trunk (r13236).

@vicuna
Copy link
Author

vicuna commented Oct 4, 2016

Comment author: @alainfrisch

Just to drop a note that #834 should fully address the problem reported here (repeated opening of the same module becomes really cheap).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants