Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-recursive type declarations #6016

Closed
vicuna opened this issue May 17, 2013 · 59 comments
Closed

non-recursive type declarations #6016

vicuna opened this issue May 17, 2013 · 59 comments

Comments

@vicuna
Copy link

vicuna commented May 17, 2013

Original bug ID: 6016
Reporter: @diml
Assigned to: @diml
Status: closed (set by @xavierleroy on 2016-12-07T10:47:21Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: later
Target version: later
Fixed in version: 4.02.2+dev / +rc1
Category: typing
Related to: #6623
Monitored by: bobot "Julien Signoles" @Chris00

Bug description

All type declarations are by default recursive. This is a problem when one wants to alias a type in a sub-module. The classic solution is to create a "proxy" type:

  type t
  module M = struct
    type tmp = t
    type t = tmp
  end

At Jane Street we have been using a camlp4 extension to do that automatically when a type is marked with the keyword "nonrec":

  type t
  module M = struct
    type nonrec t = t
  end

We believe it could be useful to everybody and it would be better to have it into the language proper, for instance to avoid polluting signatures with useless types everywhere.

File attachments

@vicuna
Copy link
Author

vicuna commented May 17, 2013

Comment author: @alainfrisch

A more general request: it would also be useful to refer to a type defined above (especially in an outer module) in the module even if is has been shadowed.

 type t = A | B
 module M = struct
    type t = X | Y

    let f x : (*DOTDOT.t*) = match x with X -> A | Y -> B

    type map = t -> (*DOTDOT.t*)
 end

I don't have a syntax to propose, though. This is really the same problem as the one you mention and the work-around is the same, but only supporting a "nonrec" annotation would not be enough to solve it. So I guess it's worth thinking about a more general solution.

@vicuna
Copy link
Author

vicuna commented May 17, 2013

Comment author: @alainfrisch

Another remark: it is quite common in syntax extensions to introduce new identifiers (types in your example, and often also values) to be used in code also inserted by the syntax extension, but we don't really want them to show up in the inferred signature or pollute too much the processed code.

I think Hongbo suggested at some point to introduce a convention that identifiers starting with "__" should not be kept in the inferred signatures (for values, this is easy, for types, this is more difficult). An alternative could be to provide a way to "unset" an identifier for the remaining scope (and to drop it from the inferred signature).

Examples:

  type tmp = t
  type t = tmp
  unset type t


  module X = struct
    let local_name_XYZ12345 = ... (* a local value inserted by a syntax extension, to be used within this module *)
    ...
    ...
    unset val local_name_XYZ12345
  end

@vicuna
Copy link
Author

vicuna commented May 17, 2013

Comment author: @gasche

I think all bindings constructs should have both a recursive and a non-recursive version, with the "general usage case" deciding which one is the default. I do support the proposal as suggested by Jeremy, but the addition of a new keyword "nonrec" is quite problematic -- I'm afraid this may be a case of doing nothing because no consensual syntax is acceptable, due to the lack of negative keyword.

Alain, I'm much less convinced by DOTDOT. Recursive vs. non-recursive binding is a familiar concept that already exists in the language (let, module...). On the contrary, few programming languages have a feature for referring to previous versions of shadowed variables. That seems to be a case of feature that is hard to design right, surprising to newcomers, and without which most of the world has apparently lived fine so far.

@vicuna
Copy link
Author

vicuna commented May 17, 2013

Comment author: @sliquister

type-conv is actually implemented in such a way that nonrec doesn't expose any temporary types.

type nonrec t = t

is rewritten (roughly):

include (struct
  type tmp = t
  type t = tmp
end with type tmp := t)

@vicuna
Copy link
Author

vicuna commented May 17, 2013

Comment author: @alainfrisch

few programming languages have a feature for referring to previous versions of shadowed variables

And few programming languages distinguish support both recursive and non-recursive definitions for the same kind of objects.

I believe that the problem is not related to recursive vs. non-recursive declarations, but about the fact that there is no way to use a global name for components (e.g. Foo.t in my example, if it were was put in a file foo.ml). In C# for instance, all class declarations are mutually recursive, but you can refer to an outer class from an inner one using its global path. I'm not claiming that we should necessarily support something like that in OCaml, just that the proposed solution is still a specific work-around for this lack of naming feature (which is not to say that it's not a decent work-around), which is a more general problem.

@vicuna
Copy link
Author

vicuna commented May 18, 2013

Comment author: @lpw25

I believe that the problem is not related to recursive vs. non-recursive declarations, but about the fact that there is no way to use a global name for components

Not all components have a global name, so providing global names would probably not be a full replacement for "nonrec". For example:

  module F = functor (M: sig end) -> struct
    type t = A | B
    module M = struct
      type nonrec t = t * string
    end
  end

@vicuna
Copy link
Author

vicuna commented May 20, 2013

Comment author: @alainfrisch

but the addition of a new keyword "nonrec" is quite problematic

What about "not rec"?

Not all components have a global name

But precisely, one could try to provide more ways to name components. I don't think this is really required, but it would provide a more general solution to the reported problem.

One possibility could be to name structures: struct ... end as M. Within the "..." one could refer to M.t (provided it has already been defined, i.e. this does not introduce recursivity; and "M" does not become a valid module identifier).

  module F = functor (M: sig end) -> struct
    type t = A | B
    module M = struct
      type t = N.t * string
    end
  end as N

Other possibilities: a "scope capture" feature (give a name to the current environment, to allow referring to it in an inner scope); relative paths (traversing nested structures).

@vicuna
Copy link
Author

vicuna commented May 20, 2013

Comment author: dario

but the addition of a new keyword "nonrec" is quite problematic

What about "not rec"?

I agree that "type not rec" would be readable and intuitive. However, "not" is presently not a keyword, and it would become one in this new syntax. Won't this have possibly undesirable ramifications? Note that currently all boolean operators are defined as functions in Pervasives...

@vicuna
Copy link
Author

vicuna commented May 21, 2013

Comment author: @sliquister

I don't know if 'not rec' is better than 'nonrec' but I don't see why we need to add a keyword. Just adding a token is enough.
I just made "not" a token locally and replaced all the occurrences of LIDENT in the parser by lident where:

lident:
LIDENT { $1 }
| NOT { "not" }

and then added NOT REC in the production for type definitions. As expected, it all worked fine. There was just one tiny shift/reduce conflict that I solved by inlining the rule optional_type_parameters.

@vicuna
Copy link
Author

vicuna commented May 21, 2013

Comment author: @gasche

We have been purposefully holding back from adding "contextual keywords" (words that take a predefined meaning only in some syntactic contexts) so far, and I'm not sure we are ready to change this.

(I personally prefer to have a single word in this place, because "not rec" or "nonrec" is a modality that comes as one atomic construct. Modalities after "let" are either absent or exactly one keyword (rec), we could think of having several modalities as several word ("method private virtual ..."), but it seems unwise to have a single modality split as several words.)

@vicuna
Copy link
Author

vicuna commented May 21, 2013

Comment author: @garrigue

I basically agree with Alain that the real problem is that one cannot refer to values in an absolute way.

I'm afraid there is no really clean solution for that, as the ability to hide components is an essential part of the ML module system.
But one could imagine a way to control scope.
I'm not a fan of scope capture or module naming, because this adds a new category of binders, whereas the real problem is how to refer to them, more than how to bind them. Moreover if you refer to them using the usual dot notation, it looks like a scope is a module, which of course is not the case.
(Note that I'm not following the discussion on namespaces, which may be related)

A simple approach would be to use modules names as scope markers, and to refer to scopes using a new syntax, moving in a stack of environments. Everytime you enter a new named module (or module type), you push the current environment with the previous name.
A syntax could be:
^ would mean the global scope (only builtins and definitions from cmi's)
^M would mean go back to the scope inside module M (which must enclose the current position)
and since there maybe several modules with the same name, one could further allow nesting
^M^M: go back to module M inside module M (the default being the most external one).

type t = t1
module M = struct
  type t = t2
  module N = struct
     type t = t3
  end
  module P = struct
    type t = t4
    module M = struct
      type t = t5
      module type S = sig
        type t1 = ^.t
        type t2 = ^M.t
        type t3 = ^M.N.t   (* or just N.t if N is not shadowed *)
        type t4 = ^P.t
        type t5 = ^M^M.t
      end
    end
  end
end

Actually I'm not sure starting from the outermost scope is the right approach, w.r.t refactoring.

@vicuna
Copy link
Author

vicuna commented May 21, 2013

Comment author: @lpw25

A simple approach would be to use modules names as scope markers, and to refer to scopes using a new syntax, moving in a stack of environments. Everytime you enter a new named module (or module type), you push the current environment with the previous name.

This solution doesn't address the problem of unnamed modules (in particular
functors). In general I dislike these kind of solutions because they seem to
treat modules as static containers, and in doing so relegate functors to a kind
of second-class status.

(Note that I'm not following the discussion on namespaces, which may be related)

Since namespaces are essentially named environments, this might be the best
route to a general solution. The ability to create a namespace by hand (or to
capture the current environment as a namespace) would definitely be a full
replacement for "nonrec" and provide a general mechanism for accessing any
previously defined component:

module F = functor (M: sig end) -> struct
type t = A | B
open namespace {{ type t }} as N (* use an "open" syntax to emphasise that N will not be part of the struct *)
module M = struct
type t = N#t * string
end
end

@vicuna
Copy link
Author

vicuna commented May 21, 2013

Comment author: @garrigue

A simple approach would be to use modules names as scope markers, and to refer to scopes using a new syntax, moving in a stack of environments. Everytime you enter a new named module (or module type), you push the current environment with the previous name.

This solution doesn't address the problem of unnamed modules (in particular
functors). In general I dislike these kind of solutions because they seem to
treat modules as static containers, and in doing so relegate functors to a kind
of second-class status.

This is not the case.
Functors can be just handled the same way as modules:

module F = functor (X : argsig) -> body

in both body and argsig you can refer to locally shadowed definitions coming from outside of F.
I see no real difference. If you have a submodule inside body you can also use ^F to refer to definitions in body outside of this submodule.

The point is that to use several times the same name you need nesting, and to get nesting you need to name modules anyway.

@vicuna
Copy link
Author

vicuna commented May 21, 2013

Comment author: @lpw25

If you have a submodule inside body you can also use ^F to refer to definitions in body outside of this submodule.

I see, so ^M refers to the "module M = module_expr" construct itself rather than to the module M.

The point is that to use several times the same name you need nesting, and to get nesting you need to name modules anyway.

This is not technically true, you can put module expressions (and therefore create nested scopes) in other places than "module M = ..." :

  type t = A
  include (struct 
    type t = B
    module M = struct
      type t = t * int
    end
  end : sig end)

or

  type t = A
  let m = 
  (module struct
    type t  = B
    module M = struct
      type t = t * int 
    end
  end)

although these are obviously less important than functors.

@vicuna
Copy link
Author

vicuna commented May 23, 2013

Comment author: @garrigue

I see, so ^M refers to the "module M = module_expr" construct itself rather than to the module M.

This is the idea.
A more restrictive definition would say: all the names in the current scope that were introduced inside module_expr, but not under another "module N = ..." construct.
I.e. this gives a partition of non-qualified identifiers.
It might be sensible to remove identifiers added through open from this list.

This is not technically true, you can put module expressions (and therefore create nested scopes) in other places than "module M = ..."

Actually, your example with include is not valid: you end up defining two versions of t, which is not allowed in the same module.
A valid version would replace "type t = B" with "open N", with N containing a type t.
However, I think this goes outside of the original goal, which was about allowing access to t without changing the exported signature: you can already write the same code without using open.
(The problem is also solved by having ^ exclude all names introduced by open)

Concerning first-class modules, they are indeed a way to create modules without the "module" keyword.
However, your code is again easily rewritten as

 let m = 
  let module M0 = struct
    type t = B
    module M = struct
      type t = ^M0.t * int 
    end
  end in (module M0 : S)

without changing the signature.

@vicuna
Copy link
Author

vicuna commented May 23, 2013

Comment author: @lpw25

Actually, your example with include is not valid: you end up defining two versions of t, which is not allowed in the same module.

It is valid because the signature prevents the double definition. However, these examples don't actually matter, I was only illustrating that it was possible.

Personally, I still don't like this feature because it uses module names to refer to things that are not modules, and I think it will mostly just confuse people. I would prefer either a simple "nonrec" feature or a more general feature built around namespaces (or some other explicit handling of environments).

@vicuna
Copy link
Author

vicuna commented Aug 1, 2013

Comment author: yminsky

This thread seems to have died. What do people think about the original idea of adding a nonrec tag to the compiler? The syntax-extension version we use is an annoyance, and we'd like to stop using it.

My basic view is that it is that nonrec definitions are as natural at the type level as they are at the value level, and it would be best if the compiler made them available.

@vicuna
Copy link
Author

vicuna commented Jan 3, 2014

Comment author: @yallop

It looks like it's going to be difficult to reach agreement on adding a new keyword. How about introducing a different binding symbol for simple non-recursive aliases?

  type t
  module M = struct
    type t := t
  end

(cf. make's recursively expanded vs simply expanded variables, which also use '=' and ':=' respectively: http://www.gnu.org/software/make/manual/make.html#Flavors)

@vicuna
Copy link
Author

vicuna commented Jan 3, 2014

Comment author: @gasche

It's a bit doubtful as it mirrors the syntax for destructive substitution, and it's not regular (while we could allow both "rec" and "nonrec" on all binding forms, irrespectively of what their default is). I'm sorry to find myself in the position of impeding compromise and progress, but my gut feeling is that I'd rather have an explicit, non-surprising "nonrec" form (as is as a keyword, or something else), or keeping using the old workaround.

@vicuna
Copy link
Author

vicuna commented Jan 3, 2014

Comment author: @diml

What are the obstacles to adding a nonrec keyword? Is there a disagreement on the name or the usefulness of such a keyword? I doubt it is going to break anything and it is trivial to fix anyway.

@vicuna
Copy link
Author

vicuna commented Jan 9, 2014

Comment author: @alainfrisch

What are the obstacles to adding a nonrec keyword?

I don't think that the objection is against the addition of the "nonrec" keyword, but rather about the addition of any new keyword. I don't have a strong opinion on this general topic. I think I'd have preferred for instance to add an "override" keyword rather than to use the less descriptive "!" symbol on object methods, but it's true that we have plenty of occurrences of "override" as an identifier in our code base.

For the specific instance under discussion:

  • What about "not rec" (which does not require to reserve 'not' itself as a keyword)? Gabriel objected that the marker should be syntactically atomic. What do other people think?

  • One could also use "-rec" (the minus token followed by the 'rec' keyword), which would look more atomic.

  • I still believe that it's worth investigating the problem of referring to a type (or another kind of component) defined in an earlier scope.

@vicuna
Copy link
Author

vicuna commented Jan 13, 2014

Comment author: @trefis

  • What about "not rec" (which does not require to reserve 'not' itself as a keyword)?

I don't think that's any better than adding a new keyword and personally, I'd rather have the "nonrec" keyword than this construction (where "not" is not a keyword but has a special role in this particular context).

One could also use "-rec" (the minus token followed by the 'rec' keyword), which would look more atomic.

I have no real argument for or against that, except that I find it ugly.

@vicuna
Copy link
Author

vicuna commented Jan 20, 2014

Comment author: @lpw25

This may be more ambitious, but we could try to fix this "properly" in a more long term way:

First allow the form

type rec t = ...

to indicate that t will be used recursively in a type equation.

Then deprecate (with a warning) using a type recursively in a type equation without using the rec keyword. Recursive uses in type definitions will not be deprecated.

Then, in a version or two, make type equations without the rec keyword non-recursive.

I think that recursive type equations are fairly rare, and without -rectypes I think they can only exist with objects or polymorphic variants. So this change shouldn't cause problems for most people.

@vicuna
Copy link
Author

vicuna commented Jan 20, 2014

Comment author: dario

Two things:

  1. I prefer lpw25's proposal of a gradual deprecation.

  2. If 1) is a no-go, I would rather have 'nonrec' added as new keyword than the 'non rec' hack. I understand the reluctance to add a new keyword. However, let's put things into perspective: it's trivial for users to grep their code bases to see if 'nonrec' is used as an identifier and do a search and replace. Moreover, the addition of 'nonrec' to the keyword set could also be done gradually. Version 4.02 would not have 'nonrec' as a keyword, but could issue a warning whenever it found an identifier with that name. Then version 4.03 could see the actual addition.

@vicuna
Copy link
Author

vicuna commented Jan 20, 2014

Comment author: @diml

Wouldn't it be a bit confusing to have two different conventions for the same keyword?

@vicuna
Copy link
Author

vicuna commented Jan 20, 2014

Comment author: @lpw25

Wouldn't it be a bit confusing to have two different conventions for the same keyword?

I'm not sure I understand what you mean. Do you mean the difference between equations and definitions?

The current form of type declarations is:

type foo = [equation] = [definition]

where both the equation and the definition can be left out. I'm basically suggesting that foo not be introduced to the environment until the second = unless rec is included.

I don't think people will find this confusing, they already need to understand the difference between a definition and an equation to understand many aspects of the language. Even if they don't fully grasp the difference, some notion along the lines of "You don't need to use rec for variant and record definitions" should suffice.

@vicuna
Copy link
Author

vicuna commented Jan 20, 2014

Comment author: @yallop

I like some aspects of lpw25's proposal. The distinction between recursive definitions and non-recursive equations works well in SML, although SML has the advantage of a clearer syntactic distinction between the two.

However, I'm not sure how rare recursive type equations are in OCaml once you take recursive groups into account. I have this sort of thing in my code:

type _ t =
  A : 'a r t
| B : 'a s t
and 'a r = (int, 'a) u
and 'a s = (float, 'a) u
and ('a, 'b) u = { u : 'a -> 'b; t : ('a -> 'b) t }

Of course, it's possible to refactor the code to expand away the aliases, at the loss of some clarity, but code like this is still going to break if the scope of type alias constructors is changed.

@vicuna
Copy link
Author

vicuna commented Jan 20, 2014

Comment author: @lpw25

However, I'm not sure how rare recursive type equations are in OCaml once you take recursive groups into account.

Oh yes, I forgot about mutually recursive type equations. I think that they are still relatively rare though (perhaps a scan of OPAM is in order).

@vicuna
Copy link
Author

vicuna commented Jan 20, 2014

Comment author: @diml

I'm not sure I understand what you mean. Do you mean the difference between equations and definitions?

Yes.

I don't think people will find this confusing, they already need to understand the difference between a definition and an equation to understand many aspects of the language. Even if they don't fully grasp the difference, some notion along the lines of "You don't need to use rec for variant and record definitions" should suffice.

If that's the general feeling I'm fine with the proposal.

@vicuna
Copy link
Author

vicuna commented Oct 23, 2014

Comment author: @diml

I gave it a try: https://github.com/diml/ocaml/tree/explicit-manifest-rec-type

It requires adding 5 "type rec" in the compiler and 1 in the stdlib:

https://github.com/diml/ocaml/commit/de639b2fc2702caa9fc3a8037cc09b274421c4c5

@vicuna
Copy link
Author

vicuna commented Oct 24, 2014

Comment author: yminsky

I'm worried by the "type rec" proposal for two reasons: first, I'd be quite surprised if this didn't break a ton of code. From speaking to Dim offline, apparently this does in fact break lots of code across OPAM.

Second, it's just surprising. The fact that here:

type t = [Some of t | None]

the t on the rhs refers to a previously defined t, but here

type t = Some of t | None

the t refers to self, is quite confusing. I prefer the semantics proposed for nonrec, where [t]'s on the rhs of "type nonrec t =" always refer to previously defined [t]s, no matter where they are.

@vicuna
Copy link
Author

vicuna commented Oct 24, 2014

Comment author: @yallop

Well, the idea of the new proposal is to significantly reduce breakage of existing code.

@vicuna
Copy link
Author

vicuna commented Oct 24, 2014

Comment author: yminsky

Perhaps, and we can see how much it helps by a compiler patch and checking against Opam. I suspect it will still break things pretty broadly in opam, though.

But my main complaint is that the semantics is surprising, and assuming your adjustment to the proposal still distinguishes regular variants and polymorphic variants in the way described above, I still think it's more confusing than "nonrec".

@vicuna
Copy link
Author

vicuna commented Oct 24, 2014

Comment author: @yallop

I understand the concern that the adjusted scope is surprising, but I think in this case the surprise is not really justified. Type equations elsewhere in the language already have non-recursive semantics, so in some ways that fact that they have recursive semantics in definitions is a bit of an anomaly.

Here's an example. It's worth considering (1) why the example doesn't type check and (2) why a polymorphic variant description is allowed here, but a regular variant definition isn't.

  type t = int

  module M :
  sig type t end
    with type t = [`Some of t | `None] =
  struct type t = [`Some of t | `None] end

Under my proposal the example passes type checking, which I see as an improvement.

@vicuna
Copy link
Author

vicuna commented Oct 24, 2014

Comment author: @sliquister

At the value level and module level, we have recursive bindings and non recursive bindings. nonrec does the same thing at the type level, rather than create rules that are arbitrary (the 4 combinations of (rec | nonrec) * (generative definition | alias) make sense) and would make trivial code change like changing a record to a tuple or putting a gadt constructor around an alias affect the scoping.
The advantage of having the same scoping rules at the value and type level, apart that it's easy to learn, is that it's nice for preprocessors that generate values from types.
I don't know what the upside of Leo's proposal is, compared to nonrec.

@vicuna
Copy link
Author

vicuna commented Oct 24, 2014

Comment author: @yallop

What are your thoughts on the example I posted just above, sliquister?

@vicuna
Copy link
Author

vicuna commented Oct 24, 2014

Comment author: @sliquister

I think maybe you're looking at a half full glass.

You're saying [S with type t = typ] is the same as inlining [S] and replacing the body of [t] with [typ], or its unification with [typ] or something. Ok.

But in [type t = A of u and u = typ], right now (or with nonrec) I can inline [u] in [t] and create the same type. With your proposal, inlining [u] captures occurrences of [t] in [typ].

So it looks like you're trading one property for another, and it's not clear why the first property should be more desirable than the second.

@vicuna
Copy link
Author

vicuna commented Oct 24, 2014

Comment author: @yallop

Actually, with my proposal [t] is already in scope in [typ] in the definition [type t = A of u and u = typ], so inlining [u] doesn't capture additional occurrences of [t].

@vicuna
Copy link
Author

vicuna commented Oct 24, 2014

Comment author: @sliquister

Ah, sorry. How about that then:

type t = A of u
and u = t

works but:

type t = A of u
and u = v (* oops, previously defined v *)
and v = t

doesn't because one more type is named.

@vicuna
Copy link
Author

vicuna commented Oct 24, 2014

Comment author: @lpw25

As a side note to the discussion, if nonrec is added it should probably be done uniformly so that rec and nonrec work on all binding forms which can be recursive.

@vicuna
Copy link
Author

vicuna commented Oct 25, 2014

Comment author: @yallop

Looking at examples is a good idea, and can help compare the approaches. Here's a simple example: you have a signature and a type

  module type T = sig type t end
  type t = int

and you want to constrain and implement the signature, first as a module N using a non-recursive type and then as a module R using a recursive type.

In current OCaml both the non-recursive type and the recursive version are awkward. The non-recursive version is awkward because alias bindings are recursive (hence this Mantis issue) and the recursive version is awkward because constraint bindings are not recursive:

  (* In current OCaml *)

  module N :
  T with type t = [`T of t] =
  struct type t' = t
         type t = [`T of t'] end

  module R :
  T with type t = [`T of 't] as 't =
  struct type t = [`T of t] end

With the nonrec proposal the non-recursive version becomes a little less awkward, but rather non-uniform. The recursive version is unaffected:

  (* Using nonrec *)

  module N :
  T with type t = [`T of t] =
  struct type nonrec t = [`T of t] end

  module R :
  T with type t = [`T of 't] as 't =
  struct type t = [`T of t] end

With the 'type rec' proposal (i.e. a variant of lpw25's proposal) the implementation has a rather pleasing symmetry:

  (* Using type rec *)

  module N :
  T with type t = [`T of t] =
  struct type t = [`T of t] end

  module R :
  T with type rec t = [`T of t] =
  struct type rec t = [`T of t] end

@vicuna
Copy link
Author

vicuna commented Oct 25, 2014

Comment author: @gasche

To be clear, my vision of having nonrec in the language is the following:

  • all definition constructs (syntactic forms that bind names for some of their subterms) accept a "rec" and a "nonrec" version, that determine whether or not the name being bound also scopes over the defined subterms
  • each definition construct has a default (either rec or nonrec) according to common practices, past history, etc.
  • the actual implementation is done on a best-effort and best-uptream-acceptance basis (eg. if adding a "rec" in some specific place is judged too difficult for implementation or semantics, we can leave it out for now)

We can decide to give a different default to type synonyms vs. type declaration vs. "with" type equations. I think it would be better to always have the same default for all "type" declarations, but that's already not the case today. We could decide changing some defaults (eg. make synonyms non-recursive), but I think the greater impact comes from enabling both forms, less from discussing specifics, so I would rather adopt a conservative approach for now.

@vicuna
Copy link
Author

vicuna commented Nov 3, 2014

Comment author: @alainfrisch

Do we actually need to make "nonrec" a keyword (i.e. reject its use as a regular identifier and thus potentially break code) to support:

type nonrec t = ....

?

I've uploaded a very quick patch to parser.mly that allows an optional arbitrary LIDENT just after the "type" keyword (and also after the subsequent "and"s). One could then recognize "nonrec" (and fail on any other non-empty modifier).

@vicuna
Copy link
Author

vicuna commented Nov 4, 2014

Comment author: @lpw25

Do we actually need to make "nonrec" a keyword (i.e. reject its use as a regular identifier and thus potentially break code) to support:

type nonrec t = ....

?

We don't for that one, but we do in order to allow:

let nonrec x = ....

which I think we should for consistency.

I also think it is good practise to make keywords unavailable as identifiers, because any code using keywords as identifiers becomes confusing anyway.

For example, if we do not support nonrec on let then the above code would be valid and would create a function called nonrec, but to most users it would look like it was a non-recursive definition of x (especially as nonrec would probably be highlighted as a keyword in their editor).

@vicuna
Copy link
Author

vicuna commented Nov 4, 2014

Comment author: @alainfrisch

I'm in favor of adding a new keyword, but considering there is a strong resistance against adding keywords, I'd prefer to go with the hack above rather other proposed solutions or nothing at all.

@vicuna
Copy link
Author

vicuna commented Nov 11, 2014

Comment author: @diml

I updated my patch and created a pull request:

#116

@vicuna
Copy link
Author

vicuna commented Nov 11, 2014

Comment author: @gasche

I fully support the feature as it is presented in Jérémie's patch:

  • nonrec and rec are available to explicitly specify the recursivity status
  • in absence of the "recursivity flag", the default which was found most convenient by practice and history is chosen: values are non-recursive (because recursive values are relatively rare that making them explicit helps readability) and types are recursive (because (iso-)recursive types are on the contrary extremely frequest).

I think making "nonrec" not a keyword (eg. extensions type%nonrec ...) makes the syntax irregular and uses the extension mechanism for relatively nefarious purposes -- I would be against such a proposal. I think changing the default recursivity flag for types (all of them or just synonyms) is a change with respect to the current language that is too large (and thus too painful) to be justified.
This is a relatively small change, so we should neither under-do it (why tolerate the extra ugliness when we can keep the statu quo?) nor over-do it (is the transition cost worth it?). I feel dim's proposal hits the sweet spot: the resulting language is regular, well-justified, and has a reasonable transition cost.

@vicuna
Copy link
Author

vicuna commented Nov 11, 2014

Comment author: @yallop

I think we're (almost) all agreed that recursive type representation definitions and non-recursive value definitions are a reasonable default.

If we actually had recursive type representation definitions and non-recursive type equations I don't think that anybody would be saying it was "confusing", especially if we used separate keywords/syntax for the two forms, as in (e.g.) Caml Light.

Adding an optional 'nonrec' everywhere seems difficult to defend. OCaml doesn't have keywords that simply specify the default behaviour in other similar circumstances: there's no 'nonprivate', 'nonmutable' or 'nonvirtual'. Nobody seems to actually want 'let nonrec', except for the sake of uniformity: it doesn't make anything shorter or clearer. It's a significant addition to the language to solve a fairly minor problem which would have a much better solution if we could start from scratch.

@vicuna
Copy link
Author

vicuna commented Nov 12, 2014

Comment author: @diml

OCaml doesn't have keywords that simply specify the default behaviour in other similar circumstances: there's no 'nonprivate', 'nonmutable' or 'nonvirtual'.

The situation is a bit different for these cases as 'private', 'mutable' and 'virtual' are never the default.

@vicuna
Copy link
Author

vicuna commented Nov 12, 2014

Comment author: @garrigue

If nonrec is really needed, I believe it should only be allowed where it helps.
I.e. type nonrec, class type nonrec and class nonrec.

Also, for coherence, nonrec should be a keyword.
Independently of this, we might want to introduce a new category of "definition qualifiers", for keywords that appear only after another keyword, but this is not part of this discussion.

This said, I still have a strange feeling about this "nonrec" discussion, because this all boils down to questions of internal vs. external names, and they would be the only way to give a real solution to this problem. Namely, here we only discuss what can be defined, but there is also the question of printing signatures correctly in presence of identifier shadowing.

So yet another approach would be to allow making these names explicit:
type t as t_local = t * t
Note however that for this to work properly, the scope of t_local is not only the body of the definition, but everything else appearing after it in the module.
So if you want to export it as t to the rest of the module, you would have to write
include (struct type t as t_local = t * t end)
This is getting complicated, but I believe this is in theory the only solution....
If you introduce this kind of mechanism, then "type nonrec t = t * t" could be seen as syntactic sugar for the above include.

@vicuna
Copy link
Author

vicuna commented Dec 9, 2014

Comment author: @diml

If nonrec is really needed, I believe it should only be allowed where it helps.
I.e. type nonrec, class type nonrec and class nonrec.

OK. I allowed nonrec everywhere in the proposal for uniformity, but if the consensus is to not allow it everywhere I don't mind changing the proposal. The main part is adding the nonrec keyword.

So yet another approach would be to allow making these names explicit:
type t as t_local = t * t

To fix the signature printing problem we would need the same mechanism for modules. And probably for values too for so that it play nicely with deriving like code generators. I'm not sure about the as-syntax as it wouldn't be consistent with the "as" in patterns: [let x as y = 42] defines [x] and [y] both internally and externally.

But IMHO this doesn't seem natural. Doing it the other way around seems much better. i.e. [type t as t_local] would define both [t] and [t_local] (with [t_local] being local only) so that one would write:

type t as t_local

module M = struct
type t = t_local * t_local

Though that wouldn't work well when [t] comes from an [include]. And it still wouldn't be consistent with the "as" in patterns.

But back to the main discussion I think "nonrec" is quite simple, has been used for a long time through type_conv and the code is already written, so I'm still in favor of adding it.

If you introduce this kind of mechanism, then "type nonrec t = t * t" could be seen as syntactic sugar for the above include.

I believe "nonrec" should have the same status as "rec". Recursivity of a definition is something the compiler and ppx rewriters should know about. It shouldn't be a syntactic sugar.

@vicuna
Copy link
Author

vicuna commented Mar 13, 2015

Comment author: @diml

Commited in 4.02 (rev 15926-15933) and trunk (rev 15918-15925).

@vicuna
Copy link
Author

vicuna commented Mar 13, 2015

Comment author: @yallop

Printing's broken; it doesn't preserve 'and':

$ cat nr.ml
type t = int
module M = struct
type nonrec t = float
and s = t
end
$ ocamlc -i nr.ml
type t = int
module M : sig type nonrec t = float type nonrec s = t end

@vicuna
Copy link
Author

vicuna commented Mar 13, 2015

Comment author: @diml

Indeed, I attached a patch that try to fix the issue by making nonrec groups [Trec_not; Trec_next, ...] instead of [Trec_not; Trec_not; ...]. I need to check that it works with the other use of rec_status.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant