Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005931OCamlOCaml backend (code generation)public2013-02-24 06:292013-02-25 15:29
Reportermottl 
Assigned To 
PrioritynormalSeveritytweakReproducibilityalways
StatusclosedResolutionno change required 
PlatformMac OS XOSOS Version
Product Version4.00.1 
Target VersionFixed in Version 
Summary0005931: Inefficient record duplication
DescriptionThe following code demonstrates that duplicating a non-mutable record is surprisingly less efficient than duplicating a first-class module:

-----
type t = { foo : int }
let copy_rec { foo } = { foo }

module type T = sig val foo : int end
let copy_mod (module T : T) = (module T : T)
-----

The generated assembly shows that the record duplication will actually involve an allocation and copying of fields whereas the compiler is smart enough to recognize that the module duplication is really equivalent to the identity function.

I think this suggests a general optimization: if any non-mutable block is copied to another non-mutable one (maybe even of different type!), then the generated code should just be the identity function.
TagsNo tags attached.
Attached Files

- Relationships
related to 0005098closed creating module values may lead to memory leaks 

-  Notes
(0008899)
frisch (developer)
2013-02-24 10:48

There isn't any clever logic for the module case: it is just that the first-class module is represented exactly as the underlying module, so:

  let copy_mod (module T : T) = (module T : T)

is morally compiled as:

  let copy_mode t = t

The optimization you suggest would require some easy but heavy logic to keep track of which identifier come from which record, and try to re-use an existing record value if an exact copy is created. How often does this happen in practice? I'm not sure it is worth adding logic for such a case, especially because doing the optimization manually is trivial without degrading code readability.
(0008904)
gasche (developer)
2013-02-24 14:52

Indeed, this seems to be an issue of writing

  let copy (t : t) = t

vs the explicit destructuring then reconstruction

  let copy { foo } = { foo }

Do you have a real-world example where turning the second into the first is important for an application?
(0008905)
mottl (reporter)
2013-02-24 17:51

I actually found a solution that doesn't force me to duplicate the record. The problem was that a functor argument required a record declaration (or module type in the alternative solution). The argument implementation, however, did not declare this record (or module type) by itself, it had to use a type alias to some other place. The solution was to add a manifest type to that alias so that the compiler would know that the record declarations will match. No such manifest type was necessary with the module type solution to avoid copying.

I wonder how often it actually happens in practice that two non-mutable values would have the exact same physical representation. E.g. as in:

  function [x] -> Some x

Or:

  function (Tag0 (a, b, c)) -> { a; b; c}

Though there are many possible ways in which such optimization opportunities could arise in practice, it's probably still not a frequent enough occurrence to warrant an extra optimization step.
(0008906)
gasche (developer)
2013-02-24 18:23

Correct me if I'm wrong, but here is my rephrasing of what you say: because of a complex situation with functors, signatures and module system wizardry, you had two records types (or first-class module types) that had the same fields of the same type, but without an explicit equality.

Because module types are compared structurally (field-by-field) by the type checker, you could write (fun t -> t); but with record types that have a nominal flavor (two records with the same name and fields are not necessarily equal), you had to write an explicit coercion term that deconstructed and reconstructed the record, leading to an inefficiency.

Adding an explicit equality between the source and destination type solved the problem by allowing to write (fun t -> t) in both cases.

(Marking the issue resolved.)
(0008907)
mottl (reporter)
2013-02-25 01:59
edited on: 2013-02-25 03:00

That's basically correct. However, I've just found out that there is clearly something different if not to say fishy going on. I'll file a separate bug report (0005932) about a memory leak concerning first-class modules, which is somewhat orthogonal to this optimization discussion.

Anyway, in my test case concerning records the source and target type for the record were the same, not just structurally but also nominally. Yet, the generated assembly code was still inferior to the module case. That's what surprised me, because I always thought that modules were really just fancy records and would be either treated the same or worse, certainly not better.

I also thought that writing "(module T : T)" wasn't just some sort of declaration, but that the runtime would actually do something, i.e. copy an underlying record, because the type constraint on the destination might not be the same as for the source. There does seem to be some special treatment here that records apparently do not receive and which does not seem to be sound in terms of memory management.

(0008910)
frisch (developer)
2013-02-25 09:41

An expression "(module M : T)" is compiled as the coercion between T0, the module type inferred for M, and T. If the coercion is the identity, no code is needed. As an optimization, if the coercion is a "prefix projection" (i.e. value components of T0 form a prefix of value components of T), the compiler also uses the identity. This can indeed leads to memory leaks, since other values in M are still part of the block but might become dead otherwise. But note that this is not strictly related to first-class module per se; the same already happens with normal modules (which can be generated on the fly with local modules and hidden inside abstractions).
(0008914)
mottl (reporter)
2013-02-25 15:02

Thanks for the explanation, Alain, I've just read your comments for the duplicate bug report, too. It's good to know that there is at least a workaround using "include". In case anybody needs to know the details, the following alternative copy function will do the right thing when dealing with different source and target module signatures:

  let copy (module M : T) = (module struct include (M : S) end : S)

I think I'd still be happier if the current behavior could be changed. It's quite surely not what most people would intuitively expect. Even if the programmer knows about this issue, it's just too easy to make this mistake, especially with first-class modules.
(0008916)
frisch (developer)
2013-02-25 15:10

I don't have a strong opinion on this subject. In some cases, when using first-class modules as extensible records, it could be a nice property that a "prefix" extraction does not induce any allocation, but I tend to agree that being safer w.r.t. memory leaks is more important.

- Issue History
Date Modified Username Field Change
2013-02-24 06:29 mottl New Issue
2013-02-24 10:48 frisch Note Added: 0008899
2013-02-24 14:52 gasche Note Added: 0008904
2013-02-24 14:52 gasche Status new => feedback
2013-02-24 17:51 mottl Note Added: 0008905
2013-02-24 17:51 mottl Status feedback => new
2013-02-24 18:23 gasche Note Added: 0008906
2013-02-24 18:23 gasche Status new => resolved
2013-02-24 18:23 gasche Resolution open => no change required
2013-02-24 18:23 gasche Assigned To => gasche
2013-02-25 01:59 mottl Note Added: 0008907
2013-02-25 03:00 mottl Note Edited: 0008907 View Revisions
2013-02-25 09:41 frisch Note Added: 0008910
2013-02-25 15:02 mottl Note Added: 0008914
2013-02-25 15:10 frisch Note Added: 0008916
2013-02-25 15:23 gasche Relationship added related to 0005098
2013-02-25 15:29 gasche Status resolved => closed
2013-02-25 15:29 gasche Assigned To gasche =>


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker