Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There is no easy way to give names to existential variables introduced by GADT pattern-matching #7074

Closed
vicuna opened this issue Dec 4, 2015 · 23 comments
Assignees
Labels
feature-wish typing typing-GADTS GADT typing and exhaustiveness bugs

Comments

@vicuna
Copy link

vicuna commented Dec 4, 2015

Original bug ID: 7074
Reporter: @garrigue
Assigned to: @garrigue
Status: assigned (set by @garrigue on 2015-12-04T01:38:20Z)
Resolution: open
Priority: normal
Severity: feature
Version: 4.02.3
Target version: later
Category: typing
Has duplicate: #5867
Related to: #5780
Child of: #5998
Monitored by: @gasche @diml "Julien Signoles" @yallop @hcarty @alainfrisch

Bug description

When you do pattern-matching on GADTs, new existential variables may be introduced in the environment, either by refining existing locally abstract types, or because the constructor contains some existential variables. One cannot give them a name: they are incompatible with any existing existential variable, and using a normal type variable name causes a scope extrusion.

One often wants to introduce type annotations just for readability, but in some cases they are even required for typing.

There has been already a lengthy discussion in #5780 on what kind of syntax could be used to do that, but no conclusion yet. Let's move the discussion here.

Steps to reproduce

type zero = Zero
type 'a succ = Succ of 'a
type _ nat =
  | NZ : zero nat
  | NS : 'a nat -> 'a succ nat

type (_,_) equal = Eq : ('a,'a) equal

let rec compare : type m n. m nat -> n nat -> (m,n) equal option = fun m n ->
  match m, n with
  | NZ, NZ -> Some Eq
  | NS m', NS n' -> (match compare m' n' with Some Eq -> Some Eq | None -> None)
  | _ -> None

(* There is no way to write a type annotation on m' or n' in the above function *)

Additional information

(* The following workaround does work, but it is verbose, introduces some eta-expansion,
    and forgets some local equalities (i.e., one cannot write (m,n) equal option) *)

let rec compare : type m n. m nat -> n nat -> (m,n) equal option = fun m n ->
  match m, n with
  | NZ, NZ -> Some Eq
  | NS m', NS n' ->
     let cmp : type m' n'. m' nat -> n' nat -> (m' succ,n' succ) equal option = fun m' n' ->
        match compare m' n' with Some Eq -> Some Eq | None -> None
     in cmp m' n'
  | _ -> None
@vicuna
Copy link
Author

vicuna commented Jun 6, 2016

Comment author: @stedolan

How about allowing (type a) before the patterns in a match case, since it's already allowed before the pattern of a fun?

| (type a) (type b) NS (m' : a nat), NS (n' : b nat) -> ...

I interpret the above as introducing new locally abstract types, which are immediately unified with the GADT existentials.

@vicuna
Copy link
Author

vicuna commented Jun 7, 2016

Comment author: @garrigue

@stedolan
This was already considered in #5780, and explained why it doesn't work from the point of view of scoping.
Still have to decide on something eventually.

@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 11, 2020
@garrigue
Copy link
Contributor

We should do something about that, eventually.
The problem is mostly a design question.

@yallop
Copy link
Member

yallop commented May 16, 2020

Here's a concrete proposal, distilled from the discussion in #5780:

  1. We allow (but do not require) explicit quantifiers for type variables in existential definitions. The following are equivalent:

    type t = C : 'a * ('a -> 'b list) -> t
    type t = C : 'a 'b. 'a * ('a -> 'b list) -> t
    type t = C : 'b 'a. 'a * ('a -> 'b list) -> t

    Ordering of quantifiers not significant, just as it is not significant in the universal quantifiers in polymorphic record fields and polymorphic methods. The syntax is also very close to Haskell's syntax:

    data T = C : forall a b. (a, (a -> [b])) -> T
  2. We allow quantified type ascriptions for type constructors during pattern matching. The following are equivalent:

    fun (C : type a b. a * (a -> b list) -> t. (x, f)) ->
       (module struct t = b let l = f x end)
    fun (C : type b a. a * (a -> b list) -> t. (x, f)) ->
       (module struct t = b let l = f x end)
    fun (C (type a b) ((x : a), (f : a -> b list))) ->
       (module struct t = b let l = f x end)

    The first two forms mirror the syntax of the declaration and amount to giving the declared type of the constructor a second time; the third more concise form separates quantifiers and annotations, so that constructor arguments can be annotated in-place.

@gasche
Copy link
Member

gasche commented May 16, 2020

Distillation is warmly welcome here, thanks!

I don't like 2.a and 2.b, because I find (C : type a b . <foo> . <p>) visually confusing. Could we require ((C : <ty>) p) for this form? It may be less discoverable and it is not perfect, but it is less confusing.

(Should we also allow (C : ty) args in expressions then?)

With this design patterns can introduce local abstract type constructors in addition to term variables. The same restriction for or-patterns must apply that they bind the same variables -- or at least that only the intersection of the two sides stays in the resulting environment. Do we know how to implement this under or-patterns, or is this syntax forbidden under or-patterns for now?

With the proposal, I guess we can also write either of the following:

type _ tag = Int : int tag

let default (type a) : a tag -> a = function
| (Int : int t) -> 0
(* or *)
| (Int : type a = int . a t) -> 0

which is cool. (The second one is slightly confusing because it is not checking that the two a coincide, although the syntax suggest it does. I would not recommend using it.)

@garrigue
Copy link
Contributor

@yallop's 3rd version does indeed make sense.

Yet, re-reading the discussion, the solution proposed by @stedolan above, and argued by @lpw25 in #5780, feels more natural. It's just a question putting a correct semantics on it.

It seems to me that, if we intend to match the existential type variables through normal unification, these new names should actually represent plain unification variables.
For instance

function 'a. C ((x : 'a), f) -> ...

makes perfect sense (assuming that we accept that 'a. in a pattern introduces a fresh unification variable, not to be mixed with its meaning in types). Of course, we would prefer to be able to use a type name for this rather than a variable name, but maybe this is an independent problem. An hacky solution would be to have the existential type variable named after the name of the type variable.

@lpw25
Copy link
Contributor

lpw25 commented May 16, 2020

I would be happy with parts 1 and 2c. The syntax is natural and easy to work out what it means. I have previously argued that 2c shouldn't need full type annotation. However, it is clearly forwards compatible to force full annotation now, and allow less annotation if we work out how to do it well later.

@gasche
Copy link
Member

gasche commented May 16, 2020

I would also agree to go for 1 and 2c, which have a nice consensus and do not open further questions (unlike 2a/2b).

@Octachron Octachron added the typing-GADTS GADT typing and exhaustiveness bugs label May 18, 2020
@craigfe
Copy link
Contributor

craigfe commented May 18, 2020

It seems to me that, if we intend to match the existential type variables through normal unification, these new names should actually represent plain unification variables.
For instance

function 'a. C ((x : 'a), f) -> ...

makes perfect sense (assuming that we accept that 'a. in a pattern introduces a fresh unification variable, not to be mixed with its meaning in types).

From an outsider's perspective, this does not make perfect sense to me. (Especially with the <'a> variant in the linked PR, since it is no longer benefiting from the analogy to 'a. in record fields.)

One of the most common misunderstandings I see in OCaml is in thinking that constraints of the form (x : 'a) will achieve something, as in:

let id (x : 'a) : 'a = x + 1
module type T = sig type t val print : t -> unit end
let f (module T : T with type t = 'a) (x : 'a) = T.print x

My standard response to this is something like: "Type variables like 'a are only implicitly universally quantified in signatures; in structures you need to introduce locally abstract types with (type), which you can then easily see in expressions because they don't start with '."

I end up giving this explanation to a lot of students, colleagues, etc. and the new syntax would force me to add "unless the local type is introduced by function <'a>, in which case the constraint refers to the variable bound there".1

If we're going to get more ways to introduce local types (which I really appreciate!), I'd much prefer it if they didn't have a leading quote mark. I think this

  • (a) is much less surprising, and
  • (b) comes with the extra ergonomics / visual hinting that (x : a) requires a to have been bound somewhere in the surrounding scope.

Footnotes

  1. There's at least one existing exception to this in the form of class parameters, but (a) these are far less prevalent to the point where I think most OCaml users will never see them and (b) class definitions are already "operating at the type level" in some naive sense.

@garrigue
Copy link
Contributor

But that was exactly my point: locally abstract types arising from existentials cannot be easily bound other than through some kind of unification process. However, if we use unification variables, we can at least have a way to refer to them (which doesn't imply anything more, i.e. the 'a could as well refer to int * int). The role of the local binder is only to introduce this variable in the right scope, since currently using a type variable in this situation causes a scoping error. The need for such variables has been discussed repeatedly, and is not limited to this case.
(BTW, the syntax chosen in the related PR is just a place-holder, as menhir wouldn't let me compile a conflicting grammar)

Of course, I agree with you that using properly introduced type names rather than type variables is more comfortable. Unfortunately it is not always easy to retrofit that on a language based on type inference. I'm still considering how to implement @yallop's suggestion, as it will probably require to do something like for universal type variables (as in polymorphic methods), but using locally abstract types instead... much harder to specify properly.

@gasche
Copy link
Member

gasche commented May 18, 2020

Intuitively, one way to relate the notations let f (type a) (x : a) = ... and | C (type a) (x : a) -> ... is the following:

  • in functions, a locally abstract variable is entirely determined by the caller, so the function cannot constrain it in any way
  • in patterns, a locally abstract variable is entirely determined by the scrutinee, so the usage context cannot constrain it in any way (the context around the match cannot even refer to it, only the right-hand-side can).

From an inference point of view, this interpretation would suggest having those variables be flexible when checking the pattern, and rigid when checking the right-hand-side. This works well when they are used at the position of existential variables, but it behaves weirdly in other situations (if they are unified with non-variables or variables from the context, then only the remaining flexible parts would be rigidified?).

@garrigue
Copy link
Contributor

This was also my analysis. The trouble being that, in normal unification mode, reification does not commute with unification. So that the syntax you suggested would actually be the safest bet.

function
| (Int : type a = int . a t) -> ...
| (C : type a b. a * (a -> b list) -> t) (x,f) -> ...

I.e. give a complete type, and check that it properly instantiates the constructor. In particular, that all existentials are fresh (and distinct). Actually, this follows @yallop's 2(1) with a more standard syntax (hoping that it doesn't generate conflicts...)

@gasche
Copy link
Member

gasche commented May 19, 2020

The trouble being that, in normal unification mode, reification does not commute with unification.

I don't understand what you mean here, can you elaborate? (In my intuition, but I don't know how local type variables are implemented, we would first type-check the pattern part with just local inference variables, then "ridify them" by turning the locally-bound parts into (rigid) local type variables, and then type-check the body. First we do unification, then rigification, and then more type-checking.)

@gasche
Copy link
Member

gasche commented May 19, 2020

Intuitively, another approach would be to create locally abstract types from the start, and then type-check the pattern arguments against an instance of the constructor type where existentials are replaced by fresh unification variables (that can unify with the locally-abstract types). The unification variables could unify "too much", so we then need to check that the resulting inferred type preserved the existential types., I suppose can be done by generalizing it (including the locally-abstract types) and then unifying it with a rigid instance of the constructor type. This reminds me of the discussion in #1132 which contains some of the same ingredients (an inference process that is first flexible, then compared against a rigid version).

@garrigue
Copy link
Contributor

It was a bit fuzzy. The essential problem is that rigid type variables cannot be refined during normal unification, so we need to decide them before using unification. The unify-then-rigidify approach only works if we know exactly the expected type.
So in practice we need something like your second approach, but I prefer a syntax where the expected instance comes in one piece, rather than having to stitch them together. I.e., the problem is not about the arguments, but just how to instantiate the type of the constructor, of course without losing the generativity of the existential type variables.

@gasche
Copy link
Member

gasche commented May 19, 2020

but I prefer a syntax where the expected instance comes in one piece, rather than having to stitch them together

I can see how this is easier for the implementation, but it is also very constraining for the user who has to repeat the whole constructor type each time just for the usability benefits of naming the existential variables. In particular, this is in fact heavier in term of amount of information than the unpleasant trick of re-abstracting over the parameters that contain the existential variables to name them with locally-abstract types.

We could provide both forms, but I think that this "fully annotated" form is going to be more rarely used.

For functions we have both forms: : type a . a -> a list -> _ = fun x li -> which is often fairly heavy and unpleasant to use, and (type a) (x : a) (li : a list) which feels much nicer when we can use it (not in polymorphic recursion, unfortunately). (Sometimes the fully-annotated form is actually nice because it doubles as providing a type annotation for the function, which is useful for callers; it is a bit less clear that this is so important here.)

@garrigue
Copy link
Contributor

Well, I have a lot to answer.
First, and mostly unrelated, in general I prefer the : type a. ... = fun form because it is clearer, and not much more verbose (if you count all the punctuation).
Next, and foremost, in the case of introducing abstract types, (type a) is the primitive, and : type a. ... = is a derived form (just syntactic sugar). But for existentials in constructors, this seems to be the other way round: for a proper semantics, we need to know the mapping between fresh names and internal existentials at instantiation time, so that we would need to collect the information syntactically from the type annotations on the arguments, before starting to typecheck the arguments. And I'm afraid the extra flexibility would be confusing, as one might want to write the annotation somewhat deeper in the argument, and expect it to be propagated to the constructor. Having the annotation on the constructor itself avoids this ambiguity.
It is still more verbose, if you use partial information

| (C : type a b. _ * (a -> b list) -> _) (x, f) -> ...
vs
| C (type a b) (x, (f : a -> b list)) -> ...

but I'm not really convinced this feature will be much used anyway. It comes handy, but only In very specific cases.

This said, I've battled with menhir to get a valid grammar, and I'm not sure it works yet.
A slight variation that might make sense is:

| C (type a b) (x, f : a * (a -> b list)) -> ...

Note that the above annotation is currently illegal, so this may make the new semantics clearer (at least when there are several arguments).

@gasche
Copy link
Member

gasche commented May 19, 2020

For me the problem with the fully-annotated form is not so much the length as the loss of locality: if I want to understand the type of f, I need to go back to the full annotation and traverse it the right amount to locate the type of f. This is much less useful in many cases than having just (f : a -> b list). Also, giving variable names at the same time as the types often helps understanding what the types mean. (In Mezzo function prototypes are just one pi-type, here the corresponding feature would be C : type a b . (x : a) * (f : a -> b list) -> ...).

for a proper semantics, we need to know the mapping between fresh names and internal existentials at instantiation time, so that we would need to collect the information syntactically from the type annotations on the arguments, before starting to typecheck the arguments

This sounds very brittle and I'm certainly not suggesting we do this. Is my "your second approach" suggestion not viable to avoid it? In the not-fully-annotated case, (after creating the locally abstract types) we would generate fresh inference variables at a level that lets then unify with the internal existentials, and then use pattern checking unification (between the flexible-instantiated constructor type and the user-given pattern with locally-abstract types) to trace which of those inference variables get unified with the locally abstract types.

@garrigue
Copy link
Contributor

This sounds very brittle and I'm certainly not suggesting we do this. Is my "your second approach" suggestion not viable to avoid it? In the not-fully-annotated case, (after creating the locally abstract types) we would generate fresh inference variables at a level that lets then unify with the internal existentials, and then use pattern checking unification (between the flexible-instantiated constructor type and the user-given pattern with locally-abstract types) to trace which of those inference variables get unified with the locally abstract types.

I'm not sure what you are suggesting here. If you mean that the matching of the names and existentials can be obtained by unifying an instance of the constructor type with the annotations given by the user, and doing extra checks, this is indeed what I have in mind.
However, if you suggest using type inference on the patterns to collect this information, this becomes incomplete, since existentials represented as abstract types can be refined by pattern matching on further GADTs, while normal type variables wouldn't. That's why I'm saying that we can only work locally.
There are more hacky ways, such as inferring as we do currently (using generated abstract types), and then trying to rename the generated types when they actually correspond to the ones given in annotations, but it becomes difficult to give a proper semantics then (i.e., we end up with "molten" rigid types).

@lpw25
Copy link
Contributor

lpw25 commented May 19, 2020

So that the syntax you suggested would actually be the safest bet.
| (C : type a b. a * (a -> b list) -> t) (x,f) -> ...
I.e. give a complete type, and check that it properly instantiates the constructor

I haven't followed all of today's discussion, but I was interpreting Jeremy's 2c option to be:

| C (type a b) (x : a) (f : a -> b list) -> ...

where the type annotations are a required part of the syntax -- it is always an error to omit them. This seems basically equivalent to the above suggestion with an explicit type for the constructor, so I'd be surprised if it were somehow harder to type-check.

@garrigue
Copy link
Contributor

After various attempts, I implemented the following in #9584:

| C (type a b) (x, f : a * (a -> b list))

The reason the annotation is on the tuple rather than on individual arguments is that it is simple to enforce it at the parser level.
Note that the parser doesn't now anything about multi-ary constructors, they are only handled during type check.
I'm not sure how to handle inline records either, except if we allow using an explicit type name for the inline record.

@github-actions github-actions bot removed the Stale label Jun 15, 2020
@trefis
Copy link
Contributor

trefis commented Mar 9, 2021

#9584 got merged, so I believe this issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-wish typing typing-GADTS GADT typing and exhaustiveness bugs
Projects
None yet
Development

No branches or pull requests

8 participants