Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

creating module values may lead to memory leaks #5098

Closed
vicuna opened this issue Jul 8, 2010 · 11 comments
Closed

creating module values may lead to memory leaks #5098

vicuna opened this issue Jul 8, 2010 · 11 comments
Labels
Milestone

Comments

@vicuna
Copy link

vicuna commented Jul 8, 2010

Original bug ID: 5098
Reporter: milanst
Status: closed (set by @damiendoligez on 2013-06-03T14:47:06Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 3.12.0+beta1 or 3.12.0+rc1
Target version: 4.01.0+dev
Fixed in version: 4.01.0+dev
Category: ~DO NOT USE (was: OCaml general)
Has duplicate: #5932
Related to: #5931
Monitored by: @gasche "Pascal Cuoq" @ygrek sweeks "Julien Signoles" @hcarty yminsky @Chris00 @mmottl

Bug description

compiler optimizes expressions (module T : S) and avoids block creation if values in T line up with S. The consequence is that values in T that are hidden by S are kept around for as long as the resulting module value is not collected, although they can't be accessed.

Additional information

in the following code
module type S = sig
val x : int
end

let pack1 x =
(module struct
let x = x
let should_be_collected = "but it isn't"
end : S)

let pack2 x =
(module struct
let should_be_collected = "and this one is"
let x = x
end : S)

let () =
let a1 = pack1 0 in
let a2 = pack2 0 in
...

If we check representation of [a1] we can see that it is a block with 2 elements, int and a string should_be_collected. [a2] on the other had is a block with one element, just an int.
I think that optimization is case of [pack1] leads to a memory leak.

The optimization should make a stricter check, namely in (module T:S) the allocation should be omitted only if representation of T is exactly the same as S, which is probably the case in practice.

File attachments

@vicuna
Copy link
Author

vicuna commented Jul 9, 2010

Comment author: @alainfrisch

Note that there is no allocation occurring when evaluating (module T : S) (if T is already evaluated). In fact, the problem you mentions is not directly related to turning modules into first-class values, although I agree that the new feature makes the issue more visible.

Before first-class modules, you could already capture a module in a closure and face the same problem:

let f =
let module M : sig val x : int end = struct
let x = 1
let s = "...."
end
in
fun () -> M.x

I believe the issue is minor, though, and can be avoided by explicitly copying the fields you want to keep. Better, the "include" mechanism can do it for you:

let cleanup x =
let module M = (val x : S) in
let module M = struct include M end in
(module M : S)

This is not to say that the optimization which avoids truncating a structure at runtime couldn't be questioned.

@vicuna
Copy link
Author

vicuna commented Feb 25, 2013

Comment author: @gasche

Markus Mottl relayed this surprise and change request in the discussion of bug 0005931 ( #5931 ), and the duplicate issue 0005932. Interestingly, it reportedly also mirrors a recommendation resulting from the LAFOSEC security-oriented study of the OCaml programming language: as the polymorphic equality will still structurally compare the fields hidden from the signature, their non-erasure breaks type abstraction.

We have no hard data to evaluate this, but a mild guess of the performance benefits of this optimization would be: "it doesn't change performances in practice".

I think we could consider changing this (removing the optimization of order-relevant signature prefixes, to only avoid the copy for exact signature correspondence). Markus or anyone, would you care to submit a patch for this? No guarantee that it will be implemented in the end (someone may still come up with a deadly real-world example where it is application-critical that no copy is made, in which case Alain remark that enforcing a copy is still expressible in the current system might prevail), but otherwise it will be "when someone else that cares look at it".

@vicuna
Copy link
Author

vicuna commented Feb 25, 2013

Comment author: @alainfrisch

Even without the "optimization/bug", polymorphic equality still breaks type abstraction.

module type S = sig type t val x : t end;;

module type S = sig type t val x : t end

(module struct type t = int let x = 1 end : S) = (module struct type t = bool let x = true end : S);;

  • : bool = true

But the same can be achieved with (i) exceptions and (ii) GADTs.

@vicuna
Copy link
Author

vicuna commented Feb 25, 2013

Comment author: @gasche

Indeed, but this is an orthogonal issue: it is the existential type that is being broken here, while the optimization/bug (nice name!) breaks signature ascription, even in absence of actually abstract types in the signature.

(I do agree that those are related issues and that a real solution for the existential problem would possibly also solve the optimization/bug problem. My initial mention of this work was more a way to say "by the way, other people have been unhappy about this feature/bug for yet another reason, so it starts to look like we could make more people happy than unhappy with this change.".)

@vicuna
Copy link
Author

vicuna commented Feb 25, 2013

Comment author: @mmottl

It hasn't occurred to me that people might want to polymorphically compare modules, because they usually contain closures. But since they can also be used as extensible records with ordinary value fields, that's certainly a sane thing to do. This is indeed another argument against the current semantics, because it not only affects memory management but also program logic, e.g.:


module type S = sig val x : int end
module type T = sig include S val y : int end

let () =
let module S = struct let x = 42 end in
let t = (module struct include S let y = 3 end : S) in
(* let t = (module struct include (struct include S let y = 3 end : S) end : S) in *)
assert ((module S : S) = t)

This program should succeed, but it doesn't. Uncommenting the alternative definition of "t" will yield the solution that a programmer implementing the above would likely have intended.

I also agree that this "optimization" of not allocating a new module will likely not matter that much in practice anyway. If the source and target module signatures are the same, then, of course, it should be done and will be sound. But otherwise there is no guarantee anyway that module signatures will line up properly. So depending on the ordering of module entries (e.g. changing the order of an "include") we may get different semantics (or at least memory leaks).

If this issue came up for a vote, I'd clearly fall into the "lets fix this" camp.

@vicuna
Copy link
Author

vicuna commented Feb 25, 2013

Comment author: @gasche

Let's fix this! Who volunteers to write the patch?

@vicuna
Copy link
Author

vicuna commented Feb 25, 2013

Comment author: @Chris00

Does this also happens for objects? If an object t is coerced, (t :> s), and only this latter object is reachable, are the values only reachable from the hidden methods of t collected?

@vicuna
Copy link
Author

vicuna commented Feb 25, 2013

Comment author: @xavierleroy

@Christophe Troestler: I'm speaking from memory here, but I'm pretty sure that coercions (t :> s) generate no code and don't change the representation of t. (Just like an ordinary type constraint.)

@ALL: yes, I agree this "prefix" optimization on structure thinning should go away, for reasons of space safety (this PR) and of security (as pointed out by the Lafosec study). The main reason for this optimization, as far as I can remember, was not so much to save execution time but to avoid generating big code to copy & thin the structure. I have no idea how often it fires in practice, though.

@vicuna
Copy link
Author

vicuna commented Feb 28, 2013

Comment author: @alainfrisch

A lightly tested patch is attached. I haven't even tried to evaluate the impact on performance / code size.

@vicuna
Copy link
Author

vicuna commented Mar 1, 2013

Comment author: @gasche

I find it a bit vexing that this patch makes the relevant part of the code slightly more complex, rather than less (which is why you would except after disabling an optimization except in its most trivial case).

I have tried to get a better formulation, but failed. I think the reason why checking "subset with identity coercions" is easier than "bijection with identity coercions" is that the code is doing a subtyping test, so naturally considers only the fields of sig1 that also appear in sig2. In this context the optimization for prefix is actually more natural than the optimization for whole-structures.

Count this as a token of support for the patch: I have reviewed it and believe it is correct and cannot be much improved. I think we should add a comment around the "if len1 = len2" test explaining why the heck we care about that, mentioning the memory leak issue.

@vicuna
Copy link
Author

vicuna commented Jun 3, 2013

Comment author: @damiendoligez

Patch applied and committed (rev 13735)

@vicuna vicuna closed this as completed Jun 3, 2013
@vicuna vicuna added this to the 4.01.0 milestone Mar 14, 2019
This was referenced Mar 14, 2019
@vicuna vicuna added the bug label Mar 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant