Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006016OCamlOCaml typingpublic2013-05-17 18:012014-10-22 14:25
Reporterdim 
Assigned To 
PrioritynormalSeverityfeatureReproducibilityalways
StatusacknowledgedResolutionopen 
PlatformOSOS Version
Product Versionlater 
Target VersionlaterFixed in Version 
Summary0006016: non-recursive type declarations
DescriptionAll type declarations are by default recursive. This is a problem when one wants to alias a type in a sub-module. The classic solution is to create a "proxy" type:

  type t
  module M = struct
    type tmp = t
    type t = tmp
  end

At Jane Street we have been using a camlp4 extension to do that automatically when a type is marked with the keyword "nonrec":

  type t
  module M = struct
    type nonrec t = t
  end

We believe it could be useful to everybody and it would be better to have it into the language proper, for instance to avoid polluting signatures with useless types everywhere.
TagsNo tags attached.
Attached Files

- Relationships
related to 0006623new Allow extension nodes after "type", e.g. "type%foo" 

-  Notes
(0009296)
frisch (developer)
2013-05-17 18:10
edited on: 2013-05-17 18:11

A more general request: it would also be useful to refer to a type defined above (especially in an outer module) in the module even if is has been shadowed.

 type t = A | B
 module M = struct
    type t = X | Y

    let f x : (*DOTDOT.t*) = match x with X -> A | Y -> B

    type map = t -> (*DOTDOT.t*)
 end

I don't have a syntax to propose, though. This is really the same problem as the one you mention and the work-around is the same, but only supporting a "nonrec" annotation would not be enough to solve it. So I guess it's worth thinking about a more general solution.

(0009297)
frisch (developer)
2013-05-17 18:18

Another remark: it is quite common in syntax extensions to introduce new identifiers (types in your example, and often also values) to be used in code also inserted by the syntax extension, but we don't really want them to show up in the inferred signature or pollute too much the processed code.

I think Hongbo suggested at some point to introduce a convention that identifiers starting with "__" should not be kept in the inferred signatures (for values, this is easy, for types, this is more difficult). An alternative could be to provide a way to "unset" an identifier for the remaining scope (and to drop it from the inferred signature).

Examples:

  type tmp = t
  type t = tmp
  unset type t


  module X = struct
    let local_name_XYZ12345 = ... (* a local value inserted by a syntax extension, to be used within this module *)
    ...
    ...
    unset val local_name_XYZ12345
  end
(0009298)
gasche (developer)
2013-05-17 18:19

I think all bindings constructs should have both a recursive and a non-recursive version, with the "general usage case" deciding which one is the default. I do support the proposal as suggested by Jeremy, but the addition of a new keyword "nonrec" is quite problematic -- I'm afraid this may be a case of doing nothing because no consensual syntax is acceptable, due to the lack of negative keyword.

Alain, I'm much less convinced by DOTDOT. Recursive vs. non-recursive binding is a familiar concept that already exists in the language (let, module...). On the contrary, few programming languages have a feature for referring to previous versions of shadowed variables. That seems to be a case of feature that is hard to design right, surprising to newcomers, and without which most of the world has apparently lived fine so far.
(0009301)
sliquister (reporter)
2013-05-17 20:30

type-conv is actually implemented in such a way that nonrec doesn't expose any temporary types.

type nonrec t = t

is rewritten (roughly):

include (struct
  type tmp = t
  type t = tmp
end with type tmp := t)
(0009303)
frisch (developer)
2013-05-17 23:12

> few programming languages have a feature for referring to previous versions of shadowed variables

And few programming languages distinguish support both recursive and non-recursive definitions for the same kind of objects.

I believe that the problem is not related to recursive vs. non-recursive declarations, but about the fact that there is no way to use a global name for components (e.g. Foo.t in my example, if it were was put in a file foo.ml). In C# for instance, all class declarations are mutually recursive, but you can refer to an outer class from an inner one using its global path. I'm not claiming that we should necessarily support something like that in OCaml, just that the proposed solution is still a specific work-around for this lack of naming feature (which is not to say that it's not a decent work-around), which is a more general problem.
(0009305)
lpw25 (developer)
2013-05-18 12:38

>I believe that the problem is not related to recursive vs. non-recursive declarations, but about the fact that there is no way to use a global name for components

Not all components have a global name, so providing global names would probably not be a full replacement for "nonrec". For example:

  module F = functor (M: sig end) -> struct
    type t = A | B
    module M = struct
      type nonrec t = t * string
    end
  end
(0009306)
frisch (developer)
2013-05-20 09:17

> but the addition of a new keyword "nonrec" is quite problematic

What about "not rec"?

> Not all components have a global name

But precisely, one could try to provide more ways to name components. I don't think this is really required, but it would provide a more general solution to the reported problem.

One possibility could be to name structures: struct ... end as M. Within the "..." one could refer to M.t (provided it has already been defined, i.e. this does not introduce recursivity; and "M" does not become a valid module identifier).

  module F = functor (M: sig end) -> struct
    type t = A | B
    module M = struct
      type t = N.t * string
    end
  end as N

Other possibilities: a "scope capture" feature (give a name to the current environment, to allow referring to it in an inner scope); relative paths (traversing nested structures).
(0009307)
dario (reporter)
2013-05-20 16:02

> > but the addition of a new keyword "nonrec" is quite problematic
>
> What about "not rec"?

I agree that "type not rec" would be readable and intuitive. However, "not" is presently not a keyword, and it would become one in this new syntax. Won't this have possibly undesirable ramifications? Note that currently all boolean operators are defined as functions in Pervasives...
(0009308)
sliquister (reporter)
2013-05-21 03:34

I don't know if 'not rec' is better than 'nonrec' but I don't see why we need to add a keyword. Just adding a token is enough.
I just made "not" a token locally and replaced all the occurrences of LIDENT in the parser by lident where:

lident:
  LIDENT { $1 }
| NOT { "not" }

and then added NOT REC in the production for type definitions. As expected, it all worked fine. There was just one tiny shift/reduce conflict that I solved by inlining the rule optional_type_parameters.
(0009309)
gasche (developer)
2013-05-21 08:53

We have been purposefully holding back from adding "contextual keywords" (words that take a predefined meaning only in some syntactic contexts) so far, and I'm not sure we are ready to change this.

(I personally prefer to have a single word in this place, because "not rec" or "nonrec" is a modality that comes as one atomic construct. Modalities after "let" are either absent or exactly one keyword (rec), we could think of having several modalities as several word ("method private virtual ..."), but it seems unwise to have a single modality split as several words.)
(0009310)
garrigue (manager)
2013-05-21 11:39
edited on: 2013-05-21 11:47

I basically agree with Alain that the real problem is that one cannot refer to values in an absolute way.

I'm afraid there is no really clean solution for that, as the ability to hide components is an essential part of the ML module system.
But one could imagine a way to control scope.
I'm not a fan of scope capture or module naming, because this adds a new category of binders, whereas the real problem is how to refer to them, more than how to bind them. Moreover if you refer to them using the usual dot notation, it looks like a scope is a module, which of course is not the case.
(Note that I'm not following the discussion on namespaces, which may be related)

A simple approach would be to use modules names as scope markers, and to refer to scopes using a new syntax, moving in a stack of environments. Everytime you enter a new named module (or module type), you push the current environment with the previous name.
A syntax could be:
^ would mean the global scope (only builtins and definitions from cmi's)
^M would mean go back to the scope inside module M (which must enclose the current position)
and since there maybe several modules with the same name, one could further allow nesting
^M^M: go back to module M inside module M (the default being the most external one).

type t = t1
module M = struct
  type t = t2
  module N = struct
     type t = t3
  end
  module P = struct
    type t = t4
    module M = struct
      type t = t5
      module type S = sig
        type t1 = ^.t
        type t2 = ^M.t
        type t3 = ^M.N.t (* or just N.t if N is not shadowed *)
        type t4 = ^P.t
        type t5 = ^M^M.t
      end
    end
  end
end

Actually I'm not sure starting from the outermost scope is the right approach, w.r.t refactoring.

(0009311)
lpw25 (developer)
2013-05-21 14:17
edited on: 2013-05-21 14:23

> A simple approach would be to use modules names as scope markers, and to refer to scopes using a new syntax, moving in a stack of environments. Everytime you enter a new named module (or module type), you push the current environment with the previous name.

This solution doesn't address the problem of unnamed modules (in particular
functors). In general I dislike these kind of solutions because they seem to
treat modules as static containers, and in doing so relegate functors to a kind
of second-class status.

> (Note that I'm not following the discussion on namespaces, which may be related)

Since namespaces are essentially named environments, this might be the best
route to a general solution. The ability to create a namespace by hand (or to
capture the current environment as a namespace) would definitely be a full
replacement for "nonrec" and provide a general mechanism for accessing any
previously defined component:

  module F = functor (M: sig end) -> struct
    type t = A | B
    open namespace {{ type t }} as N (* use an "open" syntax to emphasise that N will not be part of the struct *)
    module M = struct
      type t = N#t * string
    end
  end

(0009312)
garrigue (manager)
2013-05-21 14:36

> > A simple approach would be to use modules names as scope markers, and to refer to scopes using a new syntax, moving in a stack of environments. Everytime you enter a new named module (or module type), you push the current environment with the previous name.

> This solution doesn't address the problem of unnamed modules (in particular
> functors). In general I dislike these kind of solutions because they seem to
> treat modules as static containers, and in doing so relegate functors to a kind
> of second-class status.

This is not the case.
Functors can be just handled the same way as modules:

module F = functor (X : argsig) -> body

in both body and argsig you can refer to locally shadowed definitions coming from outside of F.
I see no real difference. If you have a submodule inside body you can also use ^F to refer to definitions in body outside of this submodule.

The point is that to use several times the same name you need nesting, and to get nesting you need to name modules anyway.
(0009313)
lpw25 (developer)
2013-05-21 15:04

> If you have a submodule inside body you can also use ^F to refer to definitions in body outside of this submodule.

I see, so ^M refers to the "module M = module_expr" construct itself rather than to the module M.

> The point is that to use several times the same name you need nesting, and to get nesting you need to name modules anyway.

This is not technically true, you can put module expressions (and therefore create nested scopes) in other places than "module M = ..." :

  type t = A
  include (struct
    type t = B
    module M = struct
      type t = t * int
    end
  end : sig end)

or

  type t = A
  let m =
  (module struct
    type t = B
    module M = struct
      type t = t * int
    end
  end)

although these are obviously less important than functors.
(0009319)
garrigue (manager)
2013-05-23 07:10
edited on: 2013-05-23 07:33

> I see, so ^M refers to the "module M = module_expr" construct itself rather than to the module M.

This is the idea.
A more restrictive definition would say: all the names in the current scope that were introduced inside module_expr, but not under another "module N = ..." construct.
I.e. this gives a partition of non-qualified identifiers.
It might be sensible to remove identifiers added through open from this list.

> This is not technically true, you can put module expressions (and therefore create nested scopes) in other places than "module M = ..."

Actually, your example with include is not valid: you end up defining two versions of t, which is not allowed in the same module.
A valid version would replace "type t = B" with "open N", with N containing a type t.
However, I think this goes outside of the original goal, which was about allowing access to t without changing the exported signature: you can already write the same code without using open.
(The problem is also solved by having ^ exclude all names introduced by open)

Concerning first-class modules, they are indeed a way to create modules without the "module" keyword.
However, your code is again easily rewritten as

 let m =
  let module M0 = struct
    type t = B
    module M = struct
      type t = ^M0.t * int
    end
  end in (module M0 : S)

without changing the signature.

(0009321)
lpw25 (developer)
2013-05-23 12:24
edited on: 2013-05-23 12:30

> Actually, your example with include is not valid: you end up defining two versions of t, which is not allowed in the same module.

It is valid because the signature prevents the double definition. However, these examples don't actually matter, I was only illustrating that it was possible.

Personally, I still don't like this feature because it uses module names to refer to things that are not modules, and I think it will mostly just confuse people. I would prefer either a simple "nonrec" feature or a more general feature built around namespaces (or some other explicit handling of environments).

(0010074)
yminsky (reporter)
2013-08-02 00:27

This thread seems to have died. What do people think about the original idea of adding a nonrec tag to the compiler? The syntax-extension version we use is an annoyance, and we'd like to stop using it.

My basic view is that it is that nonrec definitions are as natural at the type level as they are at the value level, and it would be best if the compiler made them available.
(0010774)
yallop (developer)
2014-01-03 02:19

It looks like it's going to be difficult to reach agreement on adding a new keyword. How about introducing a different binding symbol for simple non-recursive aliases?

  type t
  module M = struct
    type t := t
  end

(cf. make's recursively expanded vs simply expanded variables, which also use '=' and ':=' respectively: http://www.gnu.org/software/make/manual/make.html#Flavors [^])
(0010775)
gasche (developer)
2014-01-03 09:50

It's a bit doubtful as it mirrors the syntax for destructive substitution, and it's not regular (while we could allow both "rec" and "nonrec" on all binding forms, irrespectively of what their default is). I'm sorry to find myself in the position of impeding compromise and progress, but my gut feeling is that I'd rather have an explicit, non-surprising "nonrec" form (as is as a keyword, or something else), or keeping using the old workaround.
(0010776)
dim (developer)
2014-01-03 10:23

What are the obstacles to adding a nonrec keyword? Is there a disagreement on the name or the usefulness of such a keyword? I doubt it is going to break anything and it is trivial to fix anyway.
(0010788)
frisch (developer)
2014-01-09 18:15

> What are the obstacles to adding a nonrec keyword?

I don't think that the objection is against the addition of the "nonrec" keyword, but rather about the addition of any new keyword. I don't have a strong opinion on this general topic. I think I'd have preferred for instance to add an "override" keyword rather than to use the less descriptive "!" symbol on object methods, but it's true that we have plenty of occurrences of "override" as an identifier in our code base.

For the specific instance under discussion:

 - What about "not rec" (which does not require to reserve 'not' itself as a keyword)? Gabriel objected that the marker should be syntactically atomic. What do other people think?

 - One could also use "-rec" (the minus token followed by the 'rec' keyword), which would look more atomic.

 - I still believe that it's worth investigating the problem of referring to a type (or another kind of component) defined in an earlier scope.
(0010797)
trefis (reporter)
2014-01-13 15:25

> - What about "not rec" (which does not require to reserve 'not' itself as a keyword)?

I don't think that's any better than adding a new keyword and personally, I'd rather have the "nonrec" keyword than this construction (where "not" is not a keyword but has a special role in this particular context).

> One could also use "-rec" (the minus token followed by the 'rec' keyword), which would look more atomic.

I have no real argument for or against that, except that I find it ugly.
(0010811)
lpw25 (developer)
2014-01-20 12:20

This may be more ambitious, but we could try to fix this "properly" in a more long term way:

First allow the form

  type rec t = ...

to indicate that t will be used recursively in a type *equation*.

Then deprecate (with a warning) using a type recursively in a type *equation* without using the `rec` keyword. Recursive uses in type *definitions* will not be deprecated.

Then, in a version or two, make type equations without the `rec` keyword non-recursive.

I think that recursive type equations are fairly rare, and without `-rectypes` I think they can only exist with objects or polymorphic variants. So this change shouldn't cause problems for most people.
(0010812)
dario (reporter)
2014-01-20 12:44
edited on: 2014-01-20 12:45

Two things:

1) I prefer lpw25's proposal of a gradual deprecation.

2) If 1) is a no-go, I would rather have 'nonrec' added as new keyword than the 'non rec' hack. I understand the reluctance to add a new keyword. However, let's put things into perspective: it's trivial for users to grep their code bases to see if 'nonrec' is used as an identifier and do a search and replace. Moreover, the addition of 'nonrec' to the keyword set could also be done gradually. Version 4.02 would not have 'nonrec' as a keyword, but could issue a warning whenever it found an identifier with that name. Then version 4.03 could see the actual addition.

(0010813)
dim (developer)
2014-01-20 13:05

Wouldn't it be a bit confusing to have two different conventions for the same keyword?
(0010814)
lpw25 (developer)
2014-01-20 13:26

> Wouldn't it be a bit confusing to have two different conventions for the same keyword?

I'm not sure I understand what you mean. Do you mean the difference between equations and definitions?

The current form of type declarations is:

type foo = [equation] = [definition]

where both the equation and the definition can be left out. I'm basically suggesting that `foo` not be introduced to the environment until the second `=` unless `rec` is included.

I don't think people will find this confusing, they already need to understand the difference between a definition and an equation to understand many aspects of the language. Even if they don't fully grasp the difference, some notion along the lines of "You don't need to use `rec` for variant and record definitions" should suffice.
(0010815)
yallop (developer)
2014-01-20 13:55

I like some aspects of lpw25's proposal. The distinction between recursive definitions and non-recursive equations works well in SML, although SML has the advantage of a clearer syntactic distinction between the two.

However, I'm not sure how rare recursive type equations are in OCaml once you take recursive groups into account. I have this sort of thing in my code:

    type _ t =
      A : 'a r t
    | B : 'a s t
    and 'a r = (int, 'a) u
    and 'a s = (float, 'a) u
    and ('a, 'b) u = { u : 'a -> 'b; t : ('a -> 'b) t }

Of course, it's possible to refactor the code to expand away the aliases, at the loss of some clarity, but code like this is still going to break if the scope of type alias constructors is changed.
(0010816)
lpw25 (developer)
2014-01-20 14:15
edited on: 2014-01-20 14:15

> However, I'm not sure how rare recursive type equations are in OCaml once you take recursive groups into account.

Oh yes, I forgot about mutually recursive type equations. I think that they are still relatively rare though (perhaps a scan of OPAM is in order).

(0010817)
dim (developer)
2014-01-20 15:08

> I'm not sure I understand what you mean. Do you mean the difference between equations and definitions?

Yes.

> I don't think people will find this confusing, they already need to understand the difference between a definition and an equation to understand many aspects of the language. Even if they don't fully grasp the difference, some notion along the lines of "You don't need to use `rec` for variant and record definitions" should suffice.

If that's the general feeling I'm fine with the proposal.

- Issue History
Date Modified Username Field Change
2013-05-17 18:01 dim New Issue
2013-05-17 18:10 frisch Note Added: 0009296
2013-05-17 18:11 frisch Note Edited: 0009296 View Revisions
2013-05-17 18:18 frisch Note Added: 0009297
2013-05-17 18:19 gasche Note Added: 0009298
2013-05-17 18:20 gasche Status new => acknowledged
2013-05-17 20:30 sliquister Note Added: 0009301
2013-05-17 23:12 frisch Note Added: 0009303
2013-05-18 12:38 lpw25 Note Added: 0009305
2013-05-20 09:17 frisch Note Added: 0009306
2013-05-20 16:02 dario Note Added: 0009307
2013-05-21 03:34 sliquister Note Added: 0009308
2013-05-21 08:53 gasche Note Added: 0009309
2013-05-21 11:39 garrigue Note Added: 0009310
2013-05-21 11:47 garrigue Note Edited: 0009310 View Revisions
2013-05-21 14:17 lpw25 Note Added: 0009311
2013-05-21 14:18 lpw25 Note Edited: 0009311 View Revisions
2013-05-21 14:23 lpw25 Note Edited: 0009311 View Revisions
2013-05-21 14:36 garrigue Note Added: 0009312
2013-05-21 15:04 lpw25 Note Added: 0009313
2013-05-23 07:10 garrigue Note Added: 0009319
2013-05-23 07:33 garrigue Note Edited: 0009319 View Revisions
2013-05-23 12:24 lpw25 Note Added: 0009321
2013-05-23 12:29 lpw25 Note Edited: 0009321 View Revisions
2013-05-23 12:30 lpw25 Note Edited: 0009321 View Revisions
2013-08-02 00:27 yminsky Note Added: 0010074
2014-01-03 02:19 yallop Note Added: 0010774
2014-01-03 09:50 gasche Note Added: 0010775
2014-01-03 10:23 dim Note Added: 0010776
2014-01-09 18:15 frisch Note Added: 0010788
2014-01-13 15:25 trefis Note Added: 0010797
2014-01-20 12:20 lpw25 Note Added: 0010811
2014-01-20 12:44 dario Note Added: 0010812
2014-01-20 12:45 dario Note Edited: 0010812 View Revisions
2014-01-20 13:05 dim Note Added: 0010813
2014-01-20 13:26 lpw25 Note Added: 0010814
2014-01-20 13:55 yallop Note Added: 0010815
2014-01-20 14:15 lpw25 Note Added: 0010816
2014-01-20 14:15 lpw25 Note Edited: 0010816 View Revisions
2014-01-20 15:08 dim Note Added: 0010817
2014-10-22 14:25 whitequark Relationship added related to 0006623


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker