Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007056OCamltypingpublic2015-11-23 10:542017-04-10 14:39
Reportershinwell 
Assigned Togarrigue 
PrioritynormalSeverityminorReproducibilityalways
StatusassignedResolutionopen 
PlatformOSOS Version
Product Version4.03.0+dev / +beta1 
Target VersionFixed in Version 
Summary0007056: "include" with module aliases
DescriptionIt seems that if you have something like:

std.ml
======
module A = Foo_a

test.ml
=======
include Std

then the generated code for test.ml includes the creation of module blocks that copy (presumably unit) values out of the dummy fields created for A in std.ml, even though A is an alias. I've seen this lead to a huge blowup in code when someone did something along the lines of "include Core.Std". Could we suppress this behaviour?
TagsNo tags attached.
Attached Files

- Relationships

-  Notes
(0014821)
garrigue (manager)
2015-11-24 14:31

How huge the blow-up ?
Supposedly, it's only one word per module.
And this should not be one per module in Foo_a, but one per module in Std.
I.e., here, only one word for A.

Of course, module aliases require no runtime information, so we could change the internal representation. However, this complicates the computation of offsets, which is easy to get wrong, so I would first like to know if this really matters.
(0014822)
shinwell (developer)
2015-11-24 14:46
edited on: 2015-11-24 14:47

I'm almost certain it wasn't one word per module. It looked like it was redefining the entire structure, starting at (something like) Core.Std and going all the way down. I only have a dump of the unsimplified flambda now. Here is an excerpt:

        Pmakeblock_arg/542087
          *(let
             (let/457002 Core_kernel__Validate.camlCore_kernel__Validate
              Pmakeblock_arg/457038
                *(let
                   (Pfield_arg/457036
                      Core_kernel__Validate.camlCore_kernel__Validate
                    Pfield/457037 (field 31 Pfield_arg/457036))
                   Pfield/457037)
              Pmakeblock_arg/457039 (field 30 let/457002)
              Pmakeblock_arg/457040 (field 29 let/457002)
              Pmakeblock_arg/457041 (field 28 let/457002)
              Pmakeblock_arg/457042 (field 27 let/457002)
              Pmakeblock_arg/457043 (field 26 let/457002)
              Pmakeblock_arg/457044 (field 25 let/457002)
              Pmakeblock_arg/457045 (field 24 let/457002)
              Pmakeblock_arg/457046 (field 23 let/457002)
              Pmakeblock_arg/457047 (field 22 let/457002)
              Pmakeblock_arg/457048 (field 21 let/457002)
              Pmakeblock_arg/457049 (field 20 let/457002)
              Pmakeblock_arg/457050 (field 19 let/457002)
              Pmakeblock_arg/457051 (field 18 let/457002)
              Pmakeblock_arg/457052 (field 17 let/457002)
              Pmakeblock_arg/457053 (field 16 let/457002)
              Pmakeblock_arg/457054 (field 15 let/457002)
              Pmakeblock_arg/457055 (field 14 let/457002)
              Pmakeblock_arg/457056 (field 13 let/457002)
              Pmakeblock_arg/457057 (field 12 let/457002)
              Pmakeblock_arg/457058 (field 11 let/457002)
              Pmakeblock_arg/457059 (field 10 let/457002)
              Pmakeblock_arg/457060 (field 9 let/457002)
              Pmakeblock_arg/457061 (field 8 let/457002)
              Pmakeblock_arg/457062 (field 7 let/457002)
              Pmakeblock_arg/457063 (field 6 let/457002)
              Pmakeblock_arg/457064 (field 5 let/457002)
              Pmakeblock_arg/457065 (field 4 let/457002)
              Pmakeblock_arg/457066 (field 3 let/457002)
              Pmakeblock_arg/457067 (field 2 let/457002)
              Pmakeblock_arg/457068 (field 1 let/457002)
              Pmakeblock_arg/457069 (field 0 let/457002)
              Pmakeblock/457070
                (makeblock 0 Pmakeblock_arg/457069 Pmakeblock_arg/457068
                  Pmakeblock_arg/457067 Pmakeblock_arg/457066
                  Pmakeblock_arg/457065 Pmakeblock_arg/457064
                  Pmakeblock_arg/457063 Pmakeblock_arg/457062
                  Pmakeblock_arg/457061 Pmakeblock_arg/457060
                  Pmakeblock_arg/457059 Pmakeblock_arg/457058
                  Pmakeblock_arg/457057 Pmakeblock_arg/457056
                  Pmakeblock_arg/457055 Pmakeblock_arg/457054
                  Pmakeblock_arg/457053 Pmakeblock_arg/457052
                  Pmakeblock_arg/457051 Pmakeblock_arg/457050
                  Pmakeblock_arg/457049 Pmakeblock_arg/457048
                  Pmakeblock_arg/457047 Pmakeblock_arg/457046
                  Pmakeblock_arg/457045 Pmakeblock_arg/457044
                  Pmakeblock_arg/457043 Pmakeblock_arg/457042
                  Pmakeblock_arg/457041 Pmakeblock_arg/457040
                  Pmakeblock_arg/457039 Pmakeblock_arg/457038))
             Pmakeblock/457070)

I'm not sure if this makes any difference, but flambda uses the bytecode compilation method for modules (all of the transl_store* functions have been removed).

(0014823)
shinwell (developer)
2015-11-24 14:46

(Oh, and the blowup was enormous. The file took about ten minutes to compile on a fast machine.)
(0014825)
shinwell (developer)
2015-11-24 14:49

(For information: in the excerpt above, Core_kernel__Validate.camlCore_kernel__Validate is the symbol assigned to the module block of Core_kernel.Validate in the .cmx file being imported.)
(0014831)
sliquister (reporter)
2015-11-25 01:59

Mark, are you sure the code is not instead:

include (Core.Std : module type of Core.Std)

or something similar? Because 'module type of' expands all aliases, which creates big typing environments and most likely the corresponding big ast after the module coercions are explicit.
(0014832)
shinwell (developer)
2015-11-25 09:14

I will check again, but I'm pretty sure "module type of" was not involved.
(0014848)
garrigue (manager)
2015-11-27 07:02

Note that the coercion need not be on the include itself; doing it in the mli for instance may have the same impact.
Anything that coerces a module alias to a given signature forces to populate an actual structure. There may be some hidden inefficiencies.
(0014850)
lpw25 (developer)
2015-11-27 11:10

IIRC module aliases still occupy a position in the module at runtime, even though this position always contains unit. It is not clear to me why this is the case. Couldn't they just be excluded from the runtime module?
(0014894)
garrigue (manager)
2015-11-30 02:44

Yes, but that wouldn't solve the problem described here.
Namely, this only takes one word per alias.
For include, there are as many aliases as immediate submodules, but we are still talking of a few dozens of words for large libraries.

There are 2 reasons I didn't to try to get rid of these few words: (1) this makes computing offsets more complex, and (2) this would require some changes in the way coercions are built. Both of those are tricky. But this could certainly be done.
(0014903)
shinwell (developer)
2015-11-30 10:02

Ah yes, the original code had "include module type of Foo.Std" in the .mli ...
(0014906)
garrigue (manager)
2015-11-30 12:15

If this is the case, then the right way to write this is going to be

include (module type of struct include Foo.Std)

which ensures that you really get aliases.
If this doesn't work then this should be fixed.
(0015365)
sliquister (reporter)
2016-02-16 15:53

(module type of struct include X end) doesn't ensure one gets aliases. They are expanded unconditionally (except for the bug in remove_aliases where it doesn't traverse functor types), which is a problem because it means [include (X : module type of struct include X end)] forces any compilation unit pointed to by an alias in the type of X to be linked in.
(0015385)
garrigue (manager)
2016-02-22 01:52

@sliquister:
You're right. "module type of" always calls Mtype.expand_aliases, so there is no way to keep aliases when you get your signature from an implementation.
This was done to keep backward compatibility with code that uses this idiom to provide multiple implementations of the same module.
If you just want to copy the contents of a namespace adding new aliases, the best approach may be to write only a .ml (no .mli)...
(In the case you describe, the .mli seems identical to the .ml, not hiding anything.)

Maybe we could add a "module type of this M", to mean that we keep exactly the original strengthened signature.
(0015386)
sliquister (reporter)
2016-02-22 03:55

The case where the expansion added unneeded link time dependencies for me was not the one reported here, but was of the following form:

include (X : module type of X with module A := X.A)
module A = struct
  include X.A
  ...
end

I would gladly not use [module type of], but the compiler forces me to provide a signature. Several language changes could fix this [1].

That being said, why not change [module type of] it to give the actual type of the module (perhaps behind a temporary flag for compatibility)?

If you use [module type of] to get compatible types, then the change would be an improvement since it gets rid of the wonky behavior discussed here, and these weird [struct include .. end].
If you use [module type of] to get incompatible types, there are 2 possibilities. Let's say you're taking a copy of the Queue interface.
If you use [module type of] to create a new abstract type that is just a queue underneath: [module My_queue : module type of Queue = Queue], this wouldn't be supported anymore by the change, because it's a bad idea: it's very fragile, because if anyone refactors Queue by moving its type in a different module, or if Queue becomes an alias to the actual implementation, then My_queue.t becomes equals to Queue.t silently.
If you use [module type of] to write a new implementation with a signature compatible with Queue: [module My_queue : module type of Queue = struct .. end] then you can still do that with the change to [module type of], except more explicitely. It also fixes the fragility from before:

module type Queue_intf = sig .. end
(* checking that the interface we wrote is really the interface of Queue, except
   for the type t *)
module type S1 = Queue_intf with type t = Queue.t
module type S2 = module type of Queue
module Id1(X : S1) : S2 = X
module Id2(X : S2) : S1 = X
module My_queue : Queue_intf = struct .. end

If you're going to write a whole new implementation, it's a fraction of the code to also write an interface.



[1]
Allow shadowing modules:
  include X
  module A = struct
    include X.A
    ...
  end

Removing components from modules instead of module types:
  include (X with module A := X.A)
  module A = struct
    include X.A
    ...
  end

Or as you say, a syntax for the actual type of a module:
  include (X : actual module type of X with module A := X.A)
  module A = struct
    include X.A
    ...
  end
(although I think the current construct should be [new module type of] and have [module type of] be the natural thing)
(0015387)
lpw25 (developer)
2016-02-22 11:41
edited on: 2016-02-22 11:42

> Maybe we could add a "module type of this M", to mean that we keep exactly the original strengthened signature.

I would much rather see "module type of" changed to mean the
original strengthened signature. I know this breaks the idiom of
using "module type of" to provide multiple implementations of the
same module, but that idiom is already a bad idea.

My arguments for changing from the current semantics to the
strengthened semantics are:

1) The most common use of "module type of" is for signatures for
   includes. In this case the signature you want is the
   strengthened one and that is not currently available ("module
   type of struct include .. end" is closer but as discussed
   above still not correct). The fact that there is no signature
   item that corresponds to the structure item "include M" is
   very frustrating.

2) Returning a weakened signature is broken with respect to
   abstraction, as demonstrated by:

     # module M : sig type t end = struct type t = int end;;
     module M : sig type t end

     # module N : module type of M = struct type t = float end;;
     module N : sig type t end

   works but if we reveal the definition of M.t:

     # module M : sig type t = int end = struct type t = int end;;
     module M : sig type t = int end

     # module N : module type of M = struct type t = float end;;
     Characters 30-55:
       module N : module type of M = struct type t = float end;;
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^
     Error: Signature mismatch:
            Modules do not match:
              sig type t = float end
            is not included in
              sig type t = int end
            Type declarations do not match:
              type t = float
            is not included in
              type t = int

   or if you define M via an alias:

     # module M' : sig type t end = struct type t = int end;;
     module M' : sig type t end

     # module M = M';;
     module M = M'

     # module N : module type of M = struct type t = float end;;
     Characters 30-55:
       module N : module type of M = struct type t = float end;;
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^
     Error: Signature mismatch:
            Modules do not match:
              sig type t = float end
            is not included in
              sig type t = M'.t end
            Type declarations do not match:
              type t = float
            is not included in
              type t = M'.t

3) It is only really safe to use "module type of" on modules with
   an explicit signature. Although if you are going to write an
   explicit signature anyway you might as well give that
   signature a name which avoids the need to use "module type of"
   in the first place.

4) Using "module type of" to define multiple implementations of
   an interface is bad practice. If there are multiple
   implementations of an interface then that interface clearly
   has some meaning independent of a particular implementation,
   so it should be a separate named entity. Defining "Bar" as
   being "similar to Foo" rather than defining "an interface Baz
   of which there are two implementations Foo and Bar" is not
   good programming style.

Essentially, we currently have a semantics designed to support an
idiom that is bad practice, and which prevents a fundamental part
of the language ("include") from working properly.

At Jane Street we have to deal with a new issue based on "module
type of" not working as expected approximately once a month. So I
really would like to address these issues.

(0015393)
garrigue (manager)
2016-02-25 08:43

lpw25, I understand your point.
But changing semantics (and breaking backward compatibility) cannot be done lightly.
This cannot be discussed here, only on the caml-devel list.

Concerning the problem with include, what about adding a new syntax "include module M",
which would exactly mean the signature resulting of "include M"? The advantage here is that there is no ambiguity about what it's supposed to do.
(0015400)
sliquister (reporter)
2016-02-25 21:15
edited on: 2016-02-25 21:15

For what it's worth, I happened to notice about 10 big cmts in our tree today (from 50MB to 190MB), and in all cases, the source file contains 'module type of something big' where the alias expansion is just an undesirable side effect.


- Issue History
Date Modified Username Field Change
2015-11-23 10:54 shinwell New Issue
2015-11-23 10:54 shinwell Status new => assigned
2015-11-23 10:54 shinwell Assigned To => garrigue
2015-11-24 14:31 garrigue Note Added: 0014821
2015-11-24 14:31 garrigue Status assigned => feedback
2015-11-24 14:46 shinwell Note Added: 0014822
2015-11-24 14:46 shinwell Status feedback => assigned
2015-11-24 14:46 shinwell Note Added: 0014823
2015-11-24 14:47 shinwell Note Edited: 0014822 View Revisions
2015-11-24 14:49 shinwell Note Added: 0014825
2015-11-25 01:59 sliquister Note Added: 0014831
2015-11-25 09:14 shinwell Note Added: 0014832
2015-11-27 07:02 garrigue Note Added: 0014848
2015-11-27 11:10 lpw25 Note Added: 0014850
2015-11-30 02:44 garrigue Note Added: 0014894
2015-11-30 10:02 shinwell Note Added: 0014903
2015-11-30 12:15 garrigue Note Added: 0014906
2016-02-03 15:48 doligez Target Version => 4.03.0+dev / +beta1
2016-02-16 15:53 sliquister Note Added: 0015365
2016-02-22 01:52 garrigue Note Added: 0015385
2016-02-22 03:55 sliquister Note Added: 0015386
2016-02-22 11:41 lpw25 Note Added: 0015387
2016-02-22 11:42 lpw25 Note Edited: 0015387 View Revisions
2016-02-25 08:43 garrigue Note Added: 0015393
2016-02-25 21:15 sliquister Note Added: 0015400
2016-02-25 21:15 sliquister Note Edited: 0015400 View Revisions
2016-04-15 15:22 doligez Target Version 4.03.0+dev / +beta1 => 4.03.1+dev
2017-02-16 14:01 doligez Target Version 4.03.1+dev => undecided
2017-02-23 16:45 doligez Category OCaml typing => typing
2017-04-10 14:39 doligez Target Version undecided =>


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker