Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"include" with module aliases #7056

Closed
vicuna opened this issue Nov 23, 2015 · 19 comments
Closed

"include" with module aliases #7056

vicuna opened this issue Nov 23, 2015 · 19 comments
Assignees

Comments

@vicuna
Copy link

vicuna commented Nov 23, 2015

Original bug ID: 7056
Reporter: @mshinwell
Assigned to: @garrigue
Status: assigned (set by @mshinwell on 2015-11-24T13:46:20Z)
Resolution: open
Priority: normal
Severity: minor
Version: 4.03.0+dev / +beta1
Category: typing
Monitored by: @gasche @diml @hcarty @yakobowski @alainfrisch

Bug description

It seems that if you have something like:

std.ml

module A = Foo_a

test.ml

include Std

then the generated code for test.ml includes the creation of module blocks that copy (presumably unit) values out of the dummy fields created for A in std.ml, even though A is an alias. I've seen this lead to a huge blowup in code when someone did something along the lines of "include Core.Std". Could we suppress this behaviour?

@vicuna
Copy link
Author

vicuna commented Nov 24, 2015

Comment author: @garrigue

How huge the blow-up ?
Supposedly, it's only one word per module.
And this should not be one per module in Foo_a, but one per module in Std.
I.e., here, only one word for A.

Of course, module aliases require no runtime information, so we could change the internal representation. However, this complicates the computation of offsets, which is easy to get wrong, so I would first like to know if this really matters.

@vicuna
Copy link
Author

vicuna commented Nov 24, 2015

Comment author: @mshinwell

I'm almost certain it wasn't one word per module. It looked like it was redefining the entire structure, starting at (something like) Core.Std and going all the way down. I only have a dump of the unsimplified flambda now. Here is an excerpt:

    Pmakeblock_arg/542087
      *(let
         (let/457002 Core_kernel__Validate.camlCore_kernel__Validate
          Pmakeblock_arg/457038
            *(let
               (Pfield_arg/457036
                  Core_kernel__Validate.camlCore_kernel__Validate
                Pfield/457037 (field 31 Pfield_arg/457036))
               Pfield/457037)
          Pmakeblock_arg/457039 (field 30 let/457002)
          Pmakeblock_arg/457040 (field 29 let/457002)
          Pmakeblock_arg/457041 (field 28 let/457002)
          Pmakeblock_arg/457042 (field 27 let/457002)
          Pmakeblock_arg/457043 (field 26 let/457002)
          Pmakeblock_arg/457044 (field 25 let/457002)
          Pmakeblock_arg/457045 (field 24 let/457002)
          Pmakeblock_arg/457046 (field 23 let/457002)
          Pmakeblock_arg/457047 (field 22 let/457002)
          Pmakeblock_arg/457048 (field 21 let/457002)
          Pmakeblock_arg/457049 (field 20 let/457002)
          Pmakeblock_arg/457050 (field 19 let/457002)
          Pmakeblock_arg/457051 (field 18 let/457002)
          Pmakeblock_arg/457052 (field 17 let/457002)
          Pmakeblock_arg/457053 (field 16 let/457002)
          Pmakeblock_arg/457054 (field 15 let/457002)
          Pmakeblock_arg/457055 (field 14 let/457002)
          Pmakeblock_arg/457056 (field 13 let/457002)
          Pmakeblock_arg/457057 (field 12 let/457002)
          Pmakeblock_arg/457058 (field 11 let/457002)
          Pmakeblock_arg/457059 (field 10 let/457002)
          Pmakeblock_arg/457060 (field 9 let/457002)
          Pmakeblock_arg/457061 (field 8 let/457002)
          Pmakeblock_arg/457062 (field 7 let/457002)
          Pmakeblock_arg/457063 (field 6 let/457002)
          Pmakeblock_arg/457064 (field 5 let/457002)
          Pmakeblock_arg/457065 (field 4 let/457002)
          Pmakeblock_arg/457066 (field 3 let/457002)
          Pmakeblock_arg/457067 (field 2 let/457002)
          Pmakeblock_arg/457068 (field 1 let/457002)
          Pmakeblock_arg/457069 (field 0 let/457002)
          Pmakeblock/457070
            (makeblock 0 Pmakeblock_arg/457069 Pmakeblock_arg/457068
              Pmakeblock_arg/457067 Pmakeblock_arg/457066
              Pmakeblock_arg/457065 Pmakeblock_arg/457064
              Pmakeblock_arg/457063 Pmakeblock_arg/457062
              Pmakeblock_arg/457061 Pmakeblock_arg/457060
              Pmakeblock_arg/457059 Pmakeblock_arg/457058
              Pmakeblock_arg/457057 Pmakeblock_arg/457056
              Pmakeblock_arg/457055 Pmakeblock_arg/457054
              Pmakeblock_arg/457053 Pmakeblock_arg/457052
              Pmakeblock_arg/457051 Pmakeblock_arg/457050
              Pmakeblock_arg/457049 Pmakeblock_arg/457048
              Pmakeblock_arg/457047 Pmakeblock_arg/457046
              Pmakeblock_arg/457045 Pmakeblock_arg/457044
              Pmakeblock_arg/457043 Pmakeblock_arg/457042
              Pmakeblock_arg/457041 Pmakeblock_arg/457040
              Pmakeblock_arg/457039 Pmakeblock_arg/457038))
         Pmakeblock/457070)

I'm not sure if this makes any difference, but flambda uses the bytecode compilation method for modules (all of the transl_store* functions have been removed).

@vicuna
Copy link
Author

vicuna commented Nov 24, 2015

Comment author: @mshinwell

(Oh, and the blowup was enormous. The file took about ten minutes to compile on a fast machine.)

@vicuna
Copy link
Author

vicuna commented Nov 24, 2015

Comment author: @mshinwell

(For information: in the excerpt above, Core_kernel__Validate.camlCore_kernel__Validate is the symbol assigned to the module block of Core_kernel.Validate in the .cmx file being imported.)

@vicuna
Copy link
Author

vicuna commented Nov 25, 2015

Comment author: @sliquister

Mark, are you sure the code is not instead:

include (Core.Std : module type of Core.Std)

or something similar? Because 'module type of' expands all aliases, which creates big typing environments and most likely the corresponding big ast after the module coercions are explicit.

@vicuna
Copy link
Author

vicuna commented Nov 25, 2015

Comment author: @mshinwell

I will check again, but I'm pretty sure "module type of" was not involved.

@vicuna
Copy link
Author

vicuna commented Nov 27, 2015

Comment author: @garrigue

Note that the coercion need not be on the include itself; doing it in the mli for instance may have the same impact.
Anything that coerces a module alias to a given signature forces to populate an actual structure. There may be some hidden inefficiencies.

@vicuna
Copy link
Author

vicuna commented Nov 27, 2015

Comment author: @lpw25

IIRC module aliases still occupy a position in the module at runtime, even though this position always contains unit. It is not clear to me why this is the case. Couldn't they just be excluded from the runtime module?

@vicuna
Copy link
Author

vicuna commented Nov 30, 2015

Comment author: @garrigue

Yes, but that wouldn't solve the problem described here.
Namely, this only takes one word per alias.
For include, there are as many aliases as immediate submodules, but we are still talking of a few dozens of words for large libraries.

There are 2 reasons I didn't to try to get rid of these few words: (1) this makes computing offsets more complex, and (2) this would require some changes in the way coercions are built. Both of those are tricky. But this could certainly be done.

@vicuna
Copy link
Author

vicuna commented Nov 30, 2015

Comment author: @mshinwell

Ah yes, the original code had "include module type of Foo.Std" in the .mli ...

@vicuna
Copy link
Author

vicuna commented Nov 30, 2015

Comment author: @garrigue

If this is the case, then the right way to write this is going to be

include (module type of struct include Foo.Std)

which ensures that you really get aliases.
If this doesn't work then this should be fixed.

@vicuna
Copy link
Author

vicuna commented Feb 16, 2016

Comment author: @sliquister

(module type of struct include X end) doesn't ensure one gets aliases. They are expanded unconditionally (except for the bug in remove_aliases where it doesn't traverse functor types), which is a problem because it means [include (X : module type of struct include X end)] forces any compilation unit pointed to by an alias in the type of X to be linked in.

@vicuna
Copy link
Author

vicuna commented Feb 22, 2016

Comment author: @garrigue

@sliquister:
You're right. "module type of" always calls Mtype.expand_aliases, so there is no way to keep aliases when you get your signature from an implementation.
This was done to keep backward compatibility with code that uses this idiom to provide multiple implementations of the same module.
If you just want to copy the contents of a namespace adding new aliases, the best approach may be to write only a .ml (no .mli)...
(In the case you describe, the .mli seems identical to the .ml, not hiding anything.)

Maybe we could add a "module type of this M", to mean that we keep exactly the original strengthened signature.

@vicuna
Copy link
Author

vicuna commented Feb 22, 2016

Comment author: @sliquister

The case where the expansion added unneeded link time dependencies for me was not the one reported here, but was of the following form:

include (X : module type of X with module A := X.A)
module A = struct
  include X.A
  ...
end

I would gladly not use module type of, but the compiler forces me to provide a signature. Several language changes could fix this [1].

That being said, why not change module type of it to give the actual type of the module (perhaps behind a temporary flag for compatibility)?

If you use module type of to get compatible types, then the change would be an improvement since it gets rid of the wonky behavior discussed here, and these weird struct include .. end.
If you use module type of to get incompatible types, there are 2 possibilities. Let's say you're taking a copy of the Queue interface.
If you use module type of to create a new abstract type that is just a queue underneath: module My_queue : module type of Queue = Queue, this wouldn't be supported anymore by the change, because it's a bad idea: it's very fragile, because if anyone refactors Queue by moving its type in a different module, or if Queue becomes an alias to the actual implementation, then My_queue.t becomes equals to Queue.t silently.
If you use module type of to write a new implementation with a signature compatible with Queue: module My_queue : module type of Queue = struct .. end then you can still do that with the change to module type of, except more explicitely. It also fixes the fragility from before:

module type Queue_intf = sig .. end
(* checking that the interface we wrote is really the interface of Queue, except
   for the type t *)
module type S1 = Queue_intf with type t = Queue.t
module type S2 = module type of Queue
module Id1(X : S1) : S2 = X
module Id2(X : S2) : S1 = X
module My_queue : Queue_intf = struct .. end

If you're going to write a whole new implementation, it's a fraction of the code to also write an interface.

[1]
Allow shadowing modules:

  include X
  module A = struct
    include X.A
    ...
  end

Removing components from modules instead of module types:

  include (X with module A := X.A)
  module A = struct
    include X.A
    ...
  end

Or as you say, a syntax for the actual type of a module:

  include (X : actual module type of X with module A := X.A)
  module A = struct
    include X.A
    ...
  end

(although I think the current construct should be new module type of and have module type of be the natural thing)

@vicuna
Copy link
Author

vicuna commented Feb 22, 2016

Comment author: @lpw25

Maybe we could add a "module type of this M", to mean that we keep exactly the original strengthened signature.

I would much rather see "module type of" changed to mean the
original strengthened signature. I know this breaks the idiom of
using "module type of" to provide multiple implementations of the
same module, but that idiom is already a bad idea.

My arguments for changing from the current semantics to the
strengthened semantics are:

  1. The most common use of "module type of" is for signatures for
    includes. In this case the signature you want is the
    strengthened one and that is not currently available ("module
    type of struct include .. end" is closer but as discussed
    above still not correct). The fact that there is no signature
    item that corresponds to the structure item "include M" is
    very frustrating.

  2. Returning a weakened signature is broken with respect to
    abstraction, as demonstrated by:

      # module M : sig type t end = struct type t = int end;;
      module M : sig type t end
    
      # module N : module type of M = struct type t = float end;;
      module N : sig type t end
    

    works but if we reveal the definition of M.t:

      # module M : sig type t = int end = struct type t = int end;;
      module M : sig type t = int end
    
      # module N : module type of M = struct type t = float end;;
      Characters 30-55:
        module N : module type of M = struct type t = float end;;
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^
      Error: Signature mismatch:
             Modules do not match:
               sig type t = float end
             is not included in
               sig type t = int end
             Type declarations do not match:
               type t = float
             is not included in
               type t = int
    

    or if you define M via an alias:

      # module M' : sig type t end = struct type t = int end;;
      module M' : sig type t end
    
      # module M = M';;
      module M = M'
    
      # module N : module type of M = struct type t = float end;;
      Characters 30-55:
        module N : module type of M = struct type t = float end;;
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^
      Error: Signature mismatch:
             Modules do not match:
               sig type t = float end
             is not included in
               sig type t = M'.t end
             Type declarations do not match:
               type t = float
             is not included in
               type t = M'.t
    
  3. It is only really safe to use "module type of" on modules with
    an explicit signature. Although if you are going to write an
    explicit signature anyway you might as well give that
    signature a name which avoids the need to use "module type of"
    in the first place.

  4. Using "module type of" to define multiple implementations of
    an interface is bad practice. If there are multiple
    implementations of an interface then that interface clearly
    has some meaning independent of a particular implementation,
    so it should be a separate named entity. Defining "Bar" as
    being "similar to Foo" rather than defining "an interface Baz
    of which there are two implementations Foo and Bar" is not
    good programming style.

Essentially, we currently have a semantics designed to support an
idiom that is bad practice, and which prevents a fundamental part
of the language ("include") from working properly.

At Jane Street we have to deal with a new issue based on "module
type of" not working as expected approximately once a month. So I
really would like to address these issues.

@vicuna
Copy link
Author

vicuna commented Feb 25, 2016

Comment author: @garrigue

lpw25, I understand your point.
But changing semantics (and breaking backward compatibility) cannot be done lightly.
This cannot be discussed here, only on the caml-devel list.

Concerning the problem with include, what about adding a new syntax "include module M",
which would exactly mean the signature resulting of "include M"? The advantage here is that there is no ambiguity about what it's supposed to do.

@vicuna
Copy link
Author

vicuna commented Feb 25, 2016

Comment author: @sliquister

For what it's worth, I happened to notice about 10 big cmts in our tree today (from 50MB to 190MB), and in all cases, the source file contains 'module type of something big' where the alias expansion is just an undesirable side effect.

@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 11, 2020
@lpw25
Copy link
Contributor

lpw25 commented May 11, 2020

I think this is now fixed. Module aliases no longer take up space in a module and module type of no longer expands aliases.

@lpw25 lpw25 closed this as completed May 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants