Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redundant type annotation on argument causes polymorphic-recursive definition to be rejected #6673

Closed
vicuna opened this issue Nov 24, 2014 · 22 comments

Comments

@vicuna
Copy link

vicuna commented Nov 24, 2014

Original bug ID: 6673
Reporter: @fpottier
Assigned to: @garrigue
Status: acknowledged (set by @garrigue on 2014-12-18T01:17:40Z)
Resolution: duplicate
Priority: normal
Severity: minor
Version: 4.01.0
Category: typing
Has duplicate: #7483
Related to: #6569
Monitored by: @ygrek

Bug description

This is accepted:
let f : 'a . 'a -> 'a = fun x -> x;;

And this is accepted too:
let f = fun (x : 'a) -> x;;

But this is rejected:
let f : 'a . 'a -> 'a = fun (x : 'a) -> x;;
Error: This definition has type 'a -> 'a which is less general than
'a0. 'a0 -> 'a0

This behavior seems inconsistent to me, and somewhat unplesant.
(I have teaching in mind.) Is it a bug and could it be fixed?

Thanks
Francois.

File attachments

@vicuna
Copy link
Author

vicuna commented Nov 24, 2014

Comment author: @gasche

This is a known issue with type variables: because they have no explicit binder, it is hard to know where they are introduced; I think they are considered at the level of the innermost toplevel item (they can be nested if we are in a submodule), and this is above the annotation in your example.

Constant constructors where introduced to alleviate this issue by having an explicit binding site for rigid variables:

let id (type a) : a -> a = fun (x : a) -> x;;

The "type a ." syntax desugars to a combination of a constant-constructor binding and a polymorphic annotation:

let id : type a . a -> a = fun (x : a) -> x;;

(with the somewhat surprising and ugly effect that (type a . ...) scopes outside the (...).)

@vicuna
Copy link
Author

vicuna commented Nov 24, 2014

Comment author: @gasche

You may be amused to know that another workaround is to perform the type annotation "above" the innermost toplevel item by moving it into a signature constraint:

include (struct let id = fun (x : 'a) -> x end : sig val id : 'a -> 'a end);;

For teaching and readability purposes, I would like the following syntax to be accepted in the future:

val id : 'a . 'a -> 'a
let id x = x (* here (x : 'a) would be valid, but still an unrelated flexible variable *)

@vicuna
Copy link
Author

vicuna commented Nov 24, 2014

Comment author: @lpw25

This is a known issue with type variables: because they have no explicit binder, it is hard to know where they are introduced

Personally, I think the problem here is not that the regular type variables are introduced at the top-level, but that the universal type variables are not introduced at the top-level (even though they are syntactically part of the top-level construct).

Is there any strong reason this could not be supported, by introducing universal variables earlier?

@vicuna
Copy link
Author

vicuna commented Nov 24, 2014

Comment author: @lpw25

Can we stop marking this issue resolved (same as for #6569)? The code in the example is perfectly reasonable, and would ideally be accepted. The issue isn't resolved by the existence of a work-around, because it is not a question of expressibility but a question of disallowing perfectly reasonable code for a reason that is not obvious to even experienced users of the language.

@vicuna
Copy link
Author

vicuna commented Nov 24, 2014

Comment author: @lpw25

(Perhaps this issue is marked resolved because it is a duplicate of #6569, in which case my above complaint is about that issue being marked resolved rather than this one.)

@vicuna
Copy link
Author

vicuna commented Nov 26, 2014

Comment author: @garrigue

There is a confusion here.
In annotations of the form ('a. 'a -> 'a), the universal type variable is local to the annotation, and it is not bound anywhere else.

So, in the following code, the 'a on x denotes another type variable, global this time. Since variables at the global level cannot be generalized locally, this makes typing fail.

let f : 'a . 'a -> 'a = fun (x : 'a) -> x;;

The solution is to use locally abstract types:

let f : type a. a -> a = fun (x : a) -> x;;

By the way, allowing the first program would require changing the specification, possibly breaking existing programs.
A half baked solution could be to have a warning when the same type variable name is used to denote different variables.

@vicuna
Copy link
Author

vicuna commented Nov 26, 2014

Comment author: @fpottier

Jacques wrote: "allowing the first program would require changing the specification, possibly breaking existing programs".

I am not sure about that. If one decides to change \exists\forall to
\forall\exists, then one accepts more programs.

I would argue that in this code:

let f : 'a . 'a -> 'a = fun (x : 'b) -> x;;

it makes sense to view the left-hand type annotation as "at the toplevel"
and quantify 'a universally before quantifying 'b existentially. The
question of whether 'a and 'b have the same name or different names seems
irrelevant.

Best,
François.

@vicuna
Copy link
Author

vicuna commented Nov 26, 2014

Comment author: @yakobowski

So, in the following code, the 'a on x denotes another type variable,
global this time. Since variables at the global level cannot be generalized
locally, this makes typing fail.

So this is yet another instance of the fact that named type variables that that are not explicitly bound are quantified at toplevel? I noticed the issue for the first time when working on MLF, and I did not understand why those variables were not bound at the nearest (lowest) possible binding site -- especially given the way type generalization is implemented in OCaml's type checker. I'm afraid I still don't understand :-(

(This would be coherent with François's last proposal: 'b would be introduced at the same level as 'a.)

@vicuna
Copy link
Author

vicuna commented Nov 27, 2014

Comment author: @garrigue

Ok, again this proposal of moving to the SML approach of using the innermost level.
Indeed, this would solve some problems, at the cost of an extra pass on the syntax tree. This is still a change in the specification, but not one that breaks existing programs. I suppose we should take a decision on this.

My point was that, however, this change would not give the behavior that some seemed to expect, namely that 'a is shared between the two annotations. If you want that behavior, there is already a syntax for that.

@vicuna
Copy link
Author

vicuna commented Nov 27, 2014

Comment author: @lpw25

In annotations of the form ('a. 'a -> 'a), the universal type variable is local to the annotation, and it is not bound anywhere else.

Sorry, I should have been clearer. When I talked about universal type variables not being introduced at the top-level, I was not referring to their scope, but to their level.

At the moment, universal variables are handled at level !global_level + 1 whereas unification variables have level !global_level so they cannot be generalized when comparing with the universal variables. In other words, check_univars is called from within a begin_def/end_def pair.

When I suggest that universal variables be introduced earlier, I mean that they should be handled at the same level as their enclosing let construct (i.e. at !global_level). In other words, I am suggesting we move the call to check_univars to after the outmost end_def in type_let.

The attached patch may not quite be correct, but it illustrates what I am suggesting, and successfully types the example.

@vicuna
Copy link
Author

vicuna commented Dec 16, 2014

Comment author: @lpw25

Jacques, any thoughts on the patch I attached?

@vicuna
Copy link
Author

vicuna commented Dec 18, 2014

Comment author: @garrigue

OK, what you are suggesting is delaying check_univars until after all the definitions are generalized. This may make sense (i.e., be sound); have to think about that. In that case we probably wouldn't need all the level mangling.

@vicuna
Copy link
Author

vicuna commented Dec 22, 2014

Comment author: @garrigue

It took me a while, but no I recall why this cannot work (at least with the current approach).
Namely, we need one more level, to ensure that universal variables do not leak.
With your patch, we do not check that, and end up accepting wrong programs such as:

let rec depth : 'a. 'a t -> _ =
function Leaf x -> x | Node x -> depth x;;

(Well, the program is not wrong, but the type inferred is 'a t -> 'b, which is wrong).

I see no easy solution, at least level-based.

@vicuna
Copy link
Author

vicuna commented Dec 22, 2014

Comment author: @lpw25

With your patch, we do not check that, and end up accepting wrong programs such as:

let rec depth : 'a. 'a t -> _ =
function Leaf x -> x | Node x -> depth x;;

I see.

So the current scheme instantiates the poly type, unifying the result with the type of the expression and then uses check_univars to lift the instances of universal variables which are still generalisable back into being universal variables, to check whether every universal variable is appropriately generalised.

This essentially misses out the equivalent of the occur_univars check that prevents 'a. 'a -> 'b from unifying with 'a. 'a -> 'a. This doesn't currently matter because the levels of free variables in the poly type ensure that they are not generalisable during check_univars and so will not be replaced by a universal variable.

My patch changes the levels of the free variables in the poly type, so that they may be inappropriately changed into universal variables.

It seems to me that a simple solution to this is just to run occur_univars over the poly type at the end of check_univars. This will ensure that none of the free variables in the poly type have been turned into universal variables, because the universal variables created by check_univars are fresh and so will not be bound by a surrounding Tpoly node.

The attached patch implements this suggestion, and does indeed give an appropriate error for the above example.

@vicuna
Copy link
Author

vicuna commented Dec 22, 2014

Comment author: @lpw25

It seems to me that a simple solution to this is just to run occur_univars over the poly type at the end of check_univars. This will ensure that none of the free variables in the poly type have been turned into universal variables, because the universal variables created by check_univars are fresh and so will not be bound by a surrounding Tpoly node.

The attached patch implements this suggestion, and does indeed give an appropriate error for the above example.

There is definitely something wrong with this patch, as it fails one test, but I haven't worked out what it is yet. I will investigate further.

@vicuna
Copy link
Author

vicuna commented Dec 22, 2014

Comment author: @lpw25

There is definitely something wrong with this patch, as it fails one test, but I haven't worked out what it is yet. I will investigate further.

The problem is that the universal variables created by check_univars can appear in the types of other bindings in a set of mutually recursive bindings. Since these bindings don't have a poly type ascription, they do not have occur_univars run on them and so the escaping universal variable goes unnoticed. It seems something a touch more sophisticated is required.

@vicuna
Copy link
Author

vicuna commented Jan 17, 2015

Comment author: @lpw25

I've attached a patch implementing a more complete solution.

It changes the approach for checking polymorphic types. Instead of generalising the unification variables and then mutating them to be universal variables, it generalises the whole type and then instantiates it with univeral variables for the generalised unification variables. This instance is then unified with the intended polymorphic type -- which allows the usual unification checks to ensure that universal variables do not escape.

I believe this approach to be correct, and the patch successfully types the examples in this thread. The patch passes all tests (modulo error messages) but one: unlike current OCaml, the patch allows the following definition to type-check:

let rec depth : 'a. 'a t -> _ =
  function Leaf _ -> 1 | Node x -> 1 + d x

and d x = depth x

I believe this is sound and just illustrates the improvement in flexibility of this approach.

(Note that another test was also failing, but it was just an instance of #6744).

@vicuna
Copy link
Author

vicuna commented Jan 19, 2015

Comment author: @garrigue

I didn't check your patch, but I see two problems:

  1. What you are checking is the equality of polymorphic types.
    However, the actual function type is allowed to be more general than the required one.
    If this already works, we need to know why.

  2. Due to recursive types, substituting on a type is more complex than that...

@vicuna
Copy link
Author

vicuna commented Jan 19, 2015

Comment author: @lpw25

  1. What you are checking is the equality of polymorphic types.
    However, the actual function type is allowed to be more general than the required one.
    If this already works, we need to know why.

Sorry, when I said "instantiates it with universal variables for the generalised unification variables" I meant the unification variables returned by instance_poly not every single generalised variable in the type. In other words, it uses the exact same approach as the existing handling of polymorphic types.

The checking of whether the expressions type is sufficiently generalised is basically identical to what is there at the moment. The difference is that, during the check, instead of mutating the expression's type, we create a copy (instance) and surround it with a Tpoly node to give us the poly-type of the expression. This allows us to then use regular unification to check that universal variables do not escape, rather than relying on levels.

  1. Due to recursive types, substituting on a type is more complex than that...

I'm afraid I don't really understand what you mean here. Could you make your point more concretely by referring to the patch itself?

@vicuna
Copy link
Author

vicuna commented Jan 20, 2015

Comment author: @garrigue

Sorry, for point (2) I misunderstood your code.
You're calling Ctype.copy, so this should be ok.

I now think that this is sound, but we need a more concrete argument.
Also, if we choose this approach, we need an exact specification of what is accepted and what is not, independently of the algorithm.
I expect a rule like:

E, x1 : s1, ..., xn : sn |- Mi : ti Gen(E,ti) < si si = Si(gi) (i=1..n)
E, x1 : Gen(E,s1), ..., xn : Gen(E,sn) |- M : t

E |- let rec x1 : g1 = M1 ... xn : gn = Mn in M : t

(where Gen(E,ti) < si means that si is an instance of Gen(E,ti), the maximal generalization of ti under E, and S_i is a substitution on non-quantified source variables)
but we want to be sure that we are sound and complete.

@vicuna
Copy link
Author

vicuna commented Apr 3, 2017

Comment author: @lpw25

I've moved my patch into a pull request, as that's the fashion these days:

#1132

@lpw25
Copy link
Contributor

lpw25 commented Jan 21, 2020

This has been fixed on trunk by #1132.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants