New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
non-recursive type declarations #6016
Comments
Comment author: @alainfrisch A more general request: it would also be useful to refer to a type defined above (especially in an outer module) in the module even if is has been shadowed. type t = A | B
module M = struct
type t = X | Y
let f x : (*DOTDOT.t*) = match x with X -> A | Y -> B
type map = t -> (*DOTDOT.t*)
end I don't have a syntax to propose, though. This is really the same problem as the one you mention and the work-around is the same, but only supporting a "nonrec" annotation would not be enough to solve it. So I guess it's worth thinking about a more general solution. |
Comment author: @alainfrisch Another remark: it is quite common in syntax extensions to introduce new identifiers (types in your example, and often also values) to be used in code also inserted by the syntax extension, but we don't really want them to show up in the inferred signature or pollute too much the processed code. I think Hongbo suggested at some point to introduce a convention that identifiers starting with "__" should not be kept in the inferred signatures (for values, this is easy, for types, this is more difficult). An alternative could be to provide a way to "unset" an identifier for the remaining scope (and to drop it from the inferred signature). Examples: type tmp = t
type t = tmp
unset type t
module X = struct
let local_name_XYZ12345 = ... (* a local value inserted by a syntax extension, to be used within this module *)
...
...
unset val local_name_XYZ12345
end |
Comment author: @gasche I think all bindings constructs should have both a recursive and a non-recursive version, with the "general usage case" deciding which one is the default. I do support the proposal as suggested by Jeremy, but the addition of a new keyword "nonrec" is quite problematic -- I'm afraid this may be a case of doing nothing because no consensual syntax is acceptable, due to the lack of negative keyword. Alain, I'm much less convinced by DOTDOT. Recursive vs. non-recursive binding is a familiar concept that already exists in the language (let, module...). On the contrary, few programming languages have a feature for referring to previous versions of shadowed variables. That seems to be a case of feature that is hard to design right, surprising to newcomers, and without which most of the world has apparently lived fine so far. |
Comment author: @sliquister type-conv is actually implemented in such a way that nonrec doesn't expose any temporary types. type nonrec t = t is rewritten (roughly): include (struct
type tmp = t
type t = tmp
end with type tmp := t) |
Comment author: @alainfrisch
And few programming languages distinguish support both recursive and non-recursive definitions for the same kind of objects. I believe that the problem is not related to recursive vs. non-recursive declarations, but about the fact that there is no way to use a global name for components (e.g. Foo.t in my example, if it were was put in a file foo.ml). In C# for instance, all class declarations are mutually recursive, but you can refer to an outer class from an inner one using its global path. I'm not claiming that we should necessarily support something like that in OCaml, just that the proposed solution is still a specific work-around for this lack of naming feature (which is not to say that it's not a decent work-around), which is a more general problem. |
Comment author: @lpw25
Not all components have a global name, so providing global names would probably not be a full replacement for "nonrec". For example: module F = functor (M: sig end) -> struct
type t = A | B
module M = struct
type nonrec t = t * string
end
end |
Comment author: @alainfrisch
What about "not rec"?
But precisely, one could try to provide more ways to name components. I don't think this is really required, but it would provide a more general solution to the reported problem. One possibility could be to name structures: struct ... end as M. Within the "..." one could refer to M.t (provided it has already been defined, i.e. this does not introduce recursivity; and "M" does not become a valid module identifier). module F = functor (M: sig end) -> struct
type t = A | B
module M = struct
type t = N.t * string
end
end as N Other possibilities: a "scope capture" feature (give a name to the current environment, to allow referring to it in an inner scope); relative paths (traversing nested structures). |
Comment author: dario
I agree that "type not rec" would be readable and intuitive. However, "not" is presently not a keyword, and it would become one in this new syntax. Won't this have possibly undesirable ramifications? Note that currently all boolean operators are defined as functions in Pervasives... |
Comment author: @sliquister I don't know if 'not rec' is better than 'nonrec' but I don't see why we need to add a keyword. Just adding a token is enough. lident: and then added NOT REC in the production for type definitions. As expected, it all worked fine. There was just one tiny shift/reduce conflict that I solved by inlining the rule optional_type_parameters. |
Comment author: @gasche We have been purposefully holding back from adding "contextual keywords" (words that take a predefined meaning only in some syntactic contexts) so far, and I'm not sure we are ready to change this. (I personally prefer to have a single word in this place, because "not rec" or "nonrec" is a modality that comes as one atomic construct. Modalities after "let" are either absent or exactly one keyword (rec), we could think of having several modalities as several word ("method private virtual ..."), but it seems unwise to have a single modality split as several words.) |
Comment author: @garrigue I basically agree with Alain that the real problem is that one cannot refer to values in an absolute way. I'm afraid there is no really clean solution for that, as the ability to hide components is an essential part of the ML module system. A simple approach would be to use modules names as scope markers, and to refer to scopes using a new syntax, moving in a stack of environments. Everytime you enter a new named module (or module type), you push the current environment with the previous name.
Actually I'm not sure starting from the outermost scope is the right approach, w.r.t refactoring. |
Comment author: @lpw25
This solution doesn't address the problem of unnamed modules (in particular
Since namespaces are essentially named environments, this might be the best module F = functor (M: sig end) -> struct |
Comment author: @garrigue
This is not the case. module F = functor (X : argsig) -> body in both body and argsig you can refer to locally shadowed definitions coming from outside of F. The point is that to use several times the same name you need nesting, and to get nesting you need to name modules anyway. |
Comment author: @lpw25
I see, so ^M refers to the "module M = module_expr" construct itself rather than to the module M.
This is not technically true, you can put module expressions (and therefore create nested scopes) in other places than "module M = ..." :
or
although these are obviously less important than functors. |
Comment author: @garrigue
This is the idea.
Actually, your example with include is not valid: you end up defining two versions of t, which is not allowed in the same module. Concerning first-class modules, they are indeed a way to create modules without the "module" keyword.
without changing the signature. |
Comment author: @lpw25
It is valid because the signature prevents the double definition. However, these examples don't actually matter, I was only illustrating that it was possible. Personally, I still don't like this feature because it uses module names to refer to things that are not modules, and I think it will mostly just confuse people. I would prefer either a simple "nonrec" feature or a more general feature built around namespaces (or some other explicit handling of environments). |
Comment author: yminsky This thread seems to have died. What do people think about the original idea of adding a nonrec tag to the compiler? The syntax-extension version we use is an annoyance, and we'd like to stop using it. My basic view is that it is that nonrec definitions are as natural at the type level as they are at the value level, and it would be best if the compiler made them available. |
Comment author: @yallop It looks like it's going to be difficult to reach agreement on adding a new keyword. How about introducing a different binding symbol for simple non-recursive aliases?
(cf. make's recursively expanded vs simply expanded variables, which also use '=' and ':=' respectively: http://www.gnu.org/software/make/manual/make.html#Flavors) |
Comment author: @gasche It's a bit doubtful as it mirrors the syntax for destructive substitution, and it's not regular (while we could allow both "rec" and "nonrec" on all binding forms, irrespectively of what their default is). I'm sorry to find myself in the position of impeding compromise and progress, but my gut feeling is that I'd rather have an explicit, non-surprising "nonrec" form (as is as a keyword, or something else), or keeping using the old workaround. |
Comment author: @diml What are the obstacles to adding a nonrec keyword? Is there a disagreement on the name or the usefulness of such a keyword? I doubt it is going to break anything and it is trivial to fix anyway. |
Comment author: @alainfrisch
I don't think that the objection is against the addition of the "nonrec" keyword, but rather about the addition of any new keyword. I don't have a strong opinion on this general topic. I think I'd have preferred for instance to add an "override" keyword rather than to use the less descriptive "!" symbol on object methods, but it's true that we have plenty of occurrences of "override" as an identifier in our code base. For the specific instance under discussion:
|
Comment author: @trefis
I don't think that's any better than adding a new keyword and personally, I'd rather have the "nonrec" keyword than this construction (where "not" is not a keyword but has a special role in this particular context).
I have no real argument for or against that, except that I find it ugly. |
Comment author: @lpw25 This may be more ambitious, but we could try to fix this "properly" in a more long term way: First allow the form type rec t = ... to indicate that t will be used recursively in a type equation. Then deprecate (with a warning) using a type recursively in a type equation without using the Then, in a version or two, make type equations without the I think that recursive type equations are fairly rare, and without |
Comment author: dario Two things:
|
Comment author: @diml Wouldn't it be a bit confusing to have two different conventions for the same keyword? |
Comment author: @lpw25
I'm not sure I understand what you mean. Do you mean the difference between equations and definitions? The current form of type declarations is: type foo = [equation] = [definition] where both the equation and the definition can be left out. I'm basically suggesting that I don't think people will find this confusing, they already need to understand the difference between a definition and an equation to understand many aspects of the language. Even if they don't fully grasp the difference, some notion along the lines of "You don't need to use |
Comment author: @yallop I like some aspects of lpw25's proposal. The distinction between recursive definitions and non-recursive equations works well in SML, although SML has the advantage of a clearer syntactic distinction between the two. However, I'm not sure how rare recursive type equations are in OCaml once you take recursive groups into account. I have this sort of thing in my code:
Of course, it's possible to refactor the code to expand away the aliases, at the loss of some clarity, but code like this is still going to break if the scope of type alias constructors is changed. |
Comment author: @lpw25
Oh yes, I forgot about mutually recursive type equations. I think that they are still relatively rare though (perhaps a scan of OPAM is in order). |
Comment author: @diml
Yes.
If that's the general feeling I'm fine with the proposal. |
Comment author: @diml I gave it a try: https://github.com/diml/ocaml/tree/explicit-manifest-rec-type It requires adding 5 "type rec" in the compiler and 1 in the stdlib: https://github.com/diml/ocaml/commit/de639b2fc2702caa9fc3a8037cc09b274421c4c5 |
Comment author: yminsky I'm worried by the "type rec" proposal for two reasons: first, I'd be quite surprised if this didn't break a ton of code. From speaking to Dim offline, apparently this does in fact break lots of code across OPAM. Second, it's just surprising. The fact that here: type t = [ the t on the rhs refers to a previously defined t, but here type t = Some of t | None the t refers to self, is quite confusing. I prefer the semantics proposed for nonrec, where [t]'s on the rhs of "type nonrec t =" always refer to previously defined [t]s, no matter where they are. |
Comment author: @yallop Well, the idea of the new proposal is to significantly reduce breakage of existing code. |
Comment author: yminsky Perhaps, and we can see how much it helps by a compiler patch and checking against Opam. I suspect it will still break things pretty broadly in opam, though. But my main complaint is that the semantics is surprising, and assuming your adjustment to the proposal still distinguishes regular variants and polymorphic variants in the way described above, I still think it's more confusing than "nonrec". |
Comment author: @yallop I understand the concern that the adjusted scope is surprising, but I think in this case the surprise is not really justified. Type equations elsewhere in the language already have non-recursive semantics, so in some ways that fact that they have recursive semantics in definitions is a bit of an anomaly. Here's an example. It's worth considering (1) why the example doesn't type check and (2) why a polymorphic variant description is allowed here, but a regular variant definition isn't.
Under my proposal the example passes type checking, which I see as an improvement. |
Comment author: @sliquister At the value level and module level, we have recursive bindings and non recursive bindings. nonrec does the same thing at the type level, rather than create rules that are arbitrary (the 4 combinations of (rec | nonrec) * (generative definition | alias) make sense) and would make trivial code change like changing a record to a tuple or putting a gadt constructor around an alias affect the scoping. |
Comment author: @yallop What are your thoughts on the example I posted just above, sliquister? |
Comment author: @sliquister I think maybe you're looking at a half full glass. You're saying [S with type t = typ] is the same as inlining [S] and replacing the body of [t] with [typ], or its unification with [typ] or something. Ok. But in [type t = A of u and u = typ], right now (or with nonrec) I can inline [u] in [t] and create the same type. With your proposal, inlining [u] captures occurrences of [t] in [typ]. So it looks like you're trading one property for another, and it's not clear why the first property should be more desirable than the second. |
Comment author: @yallop Actually, with my proposal [t] is already in scope in [typ] in the definition [type t = A of u and u = typ], so inlining [u] doesn't capture additional occurrences of [t]. |
Comment author: @sliquister Ah, sorry. How about that then: type t = A of u works but: type t = A of u doesn't because one more type is named. |
Comment author: @lpw25 As a side note to the discussion, if |
Comment author: @yallop Looking at examples is a good idea, and can help compare the approaches. Here's a simple example: you have a signature and a type
and you want to constrain and implement the signature, first as a module N using a non-recursive type and then as a module R using a recursive type. In current OCaml both the non-recursive type and the recursive version are awkward. The non-recursive version is awkward because alias bindings are recursive (hence this Mantis issue) and the recursive version is awkward because constraint bindings are not recursive:
With the nonrec proposal the non-recursive version becomes a little less awkward, but rather non-uniform. The recursive version is unaffected:
With the 'type rec' proposal (i.e. a variant of lpw25's proposal) the implementation has a rather pleasing symmetry:
|
Comment author: @gasche To be clear, my vision of having nonrec in the language is the following:
We can decide to give a different default to type synonyms vs. type declaration vs. "with" type equations. I think it would be better to always have the same default for all "type" declarations, but that's already not the case today. We could decide changing some defaults (eg. make synonyms non-recursive), but I think the greater impact comes from enabling both forms, less from discussing specifics, so I would rather adopt a conservative approach for now. |
Comment author: @alainfrisch Do we actually need to make "nonrec" a keyword (i.e. reject its use as a regular identifier and thus potentially break code) to support: type nonrec t = .... ? I've uploaded a very quick patch to parser.mly that allows an optional arbitrary LIDENT just after the "type" keyword (and also after the subsequent "and"s). One could then recognize "nonrec" (and fail on any other non-empty modifier). |
Comment author: @lpw25
We don't for that one, but we do in order to allow:
which I think we should for consistency. I also think it is good practise to make keywords unavailable as identifiers, because any code using keywords as identifiers becomes confusing anyway. For example, if we do not support |
Comment author: @alainfrisch I'm in favor of adding a new keyword, but considering there is a strong resistance against adding keywords, I'd prefer to go with the hack above rather other proposed solutions or nothing at all. |
Comment author: @gasche I fully support the feature as it is presented in Jérémie's patch:
I think making "nonrec" not a keyword (eg. extensions type%nonrec ...) makes the syntax irregular and uses the extension mechanism for relatively nefarious purposes -- I would be against such a proposal. I think changing the default recursivity flag for types (all of them or just synonyms) is a change with respect to the current language that is too large (and thus too painful) to be justified. |
Comment author: @yallop I think we're (almost) all agreed that recursive type representation definitions and non-recursive value definitions are a reasonable default. If we actually had recursive type representation definitions and non-recursive type equations I don't think that anybody would be saying it was "confusing", especially if we used separate keywords/syntax for the two forms, as in (e.g.) Caml Light. Adding an optional 'nonrec' everywhere seems difficult to defend. OCaml doesn't have keywords that simply specify the default behaviour in other similar circumstances: there's no 'nonprivate', 'nonmutable' or 'nonvirtual'. Nobody seems to actually want 'let nonrec', except for the sake of uniformity: it doesn't make anything shorter or clearer. It's a significant addition to the language to solve a fairly minor problem which would have a much better solution if we could start from scratch. |
Comment author: @diml
The situation is a bit different for these cases as 'private', 'mutable' and 'virtual' are never the default. |
Comment author: @garrigue If nonrec is really needed, I believe it should only be allowed where it helps. Also, for coherence, nonrec should be a keyword. This said, I still have a strange feeling about this "nonrec" discussion, because this all boils down to questions of internal vs. external names, and they would be the only way to give a real solution to this problem. Namely, here we only discuss what can be defined, but there is also the question of printing signatures correctly in presence of identifier shadowing. So yet another approach would be to allow making these names explicit: |
Comment author: @diml
OK. I allowed nonrec everywhere in the proposal for uniformity, but if the consensus is to not allow it everywhere I don't mind changing the proposal. The main part is adding the nonrec keyword.
To fix the signature printing problem we would need the same mechanism for modules. And probably for values too for so that it play nicely with deriving like code generators. I'm not sure about the as-syntax as it wouldn't be consistent with the "as" in patterns: [let x as y = 42] defines [x] and [y] both internally and externally. But IMHO this doesn't seem natural. Doing it the other way around seems much better. i.e. [type t as t_local] would define both [t] and [t_local] (with [t_local] being local only) so that one would write: type t as t_local module M = struct Though that wouldn't work well when [t] comes from an [include]. And it still wouldn't be consistent with the "as" in patterns. But back to the main discussion I think "nonrec" is quite simple, has been used for a long time through type_conv and the code is already written, so I'm still in favor of adding it.
I believe "nonrec" should have the same status as "rec". Recursivity of a definition is something the compiler and ppx rewriters should know about. It shouldn't be a syntactic sugar. |
Comment author: @diml Commited in 4.02 (rev 15926-15933) and trunk (rev 15918-15925). |
Comment author: @yallop Printing's broken; it doesn't preserve 'and': $ cat nr.ml |
Comment author: @diml Indeed, I attached a patch that try to fix the issue by making nonrec groups [Trec_not; Trec_next, ...] instead of [Trec_not; Trec_not; ...]. I need to check that it works with the other use of rec_status. |
Original bug ID: 6016
Reporter: @diml
Assigned to: @diml
Status: closed (set by @xavierleroy on 2016-12-07T10:47:21Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: later
Target version: later
Fixed in version: 4.02.2+dev / +rc1
Category: typing
Related to: #6623
Monitored by: bobot "Julien Signoles" @Chris00
Bug description
All type declarations are by default recursive. This is a problem when one wants to alias a type in a sub-module. The classic solution is to create a "proxy" type:
At Jane Street we have been using a camlp4 extension to do that automatically when a type is marked with the keyword "nonrec":
We believe it could be useful to everybody and it would be better to have it into the language proper, for instance to avoid polluting signatures with useless types everywhere.
File attachments
The text was updated successfully, but these errors were encountered: