Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange type variable names in error messages (GADTs) #5780

Closed
vicuna opened this issue Oct 9, 2012 · 49 comments
Closed

Strange type variable names in error messages (GADTs) #5780

vicuna opened this issue Oct 9, 2012 · 49 comments
Assignees
Labels
bug typing typing-GADTS GADT typing and exhaustiveness bugs

Comments

@vicuna
Copy link

vicuna commented Oct 9, 2012

Original bug ID: 5780
Assigned to: @garrigue
Status: closed (set by @xavierleroy on 2017-02-16T14:18:35Z)
Resolution: fixed
Priority: normal
Severity: text
Version: 4.00.1
Target version: later
Fixed in version: 4.03.0+dev / +beta1
Category: typing
Related to: #7074
Child of: #5998
Monitored by: @lpw25 @yallop @hcarty

Bug description

Ocaml sometimes uses strange type variable names in error messages when type-checking GADTs (maybe to avoid name capture?).

Steps to reproduce

Enter the following code:

type _ t =
| B: bool -> bool t
| I: int -> int t
| A: ('a -> 'b) t * 'a t -> 'b t
| F: ('a -> 'b) -> ('a -> 'b) t

let rec eval: type s. s t -> s = function
| B b -> b
| I i -> i
| A (e, e') -> (eval e) e'
| F f -> f
;;

Ocaml reports the following error:

      Characters 94-96:
| A (e, e') -> (eval e) e'
                        ^^

Error: This expression has type ex#0 t but an expression was expected of type
ex#0

@vicuna
Copy link
Author

vicuna commented Oct 10, 2012

Comment author: @gasche

This is a known problem that I've discussed with Jacques and I think could and should be improved before the next version.

The fundamental issue is that the OCaml syntax has no way to name the existential variables corresponding to a GADT constructor. The name generation it uses now ("ex#" + fresh integer) is particularly obnoxious, but in any case there is a type corresponding to your variable 'a in the type of the constructor A that has no name.

My personal opinion is that not having explicit binding of those variables in the syntax (in the constructors: (A : 'a. ('a -> 'b) * ... -> 'b t), and in the matching (| A a (e, e') -> ...)) is a defect of the GADT syntax -- but syntax is about hard compromises, and here the more concise syntax was chosen.

I believe the current behavior could be improved by:

  • choosing names for existential that are more natural; I would like them to include either the constructor they come from, or the original user-chosen name of the corresponding type variable in the type definition, or both
  • improving the wording of the error message to avoid the annoying "has type 'foo' but was expected at type 'foo'" and possible display something GADT-specific

In the meantime, this is not a bug (the behavior is perfectly correct) but a rather shortcoming on the user interface side. And it is not "sometimes", all existential variables have those unreadable names, and all escape situations generate those unreadable error messages.

Thanks for your report anyway: it's helpful to see that people care about this.

@vicuna
Copy link
Author

vicuna commented Oct 10, 2012

Comment author: @garrigue

This is indeed a well known problem.
Actually part of it was a bug, as there was already some code to reuse variable names from the definition, but it didn't work properly.
This part is now fixed in trunk, at revision 12998.
The new error message is:

Error: This expression has type a#0 t but an expression was expected of type
         a#0

Unfortunately, this is only a small improvement, as origin of the name is not so clear, and many existential types do not come directly from constructors.
As for the "#0" part, this is indeed to avoid name collisions.

As Gabriel mentioned, there has always been a plan to support naming of existential types in patterns.
However this requires new syntax, and this was delayed waiting for more experience.
This should be in the next release.

I'm open to improvements of the wording, but clever error reporting is very difficult.

@vicuna
Copy link
Author

vicuna commented Oct 12, 2012

Comment author: @garrigue

I have added a patch in trunk/experimental/garrigue/pattern-local-types.diffs which allows to define types in patterns.
They are only allowed at the beginning of match cases, and the syntax is:

    | type a b c.  <pattern> -> <body>

They are different from local abstract types, as they are not abstract, but inferred from their uses as type annotations inside <pattern>. If the inferred type is a fresh existential type, then some special manipulation is done to rename it to the given name, so that it will appear in error messages.

Here are some examples:

(* alias *)
function type a. (() : a) -> ([] : a list);;
(function type a. (() : a) -> ([] : a list)) ();;
function type a. () -> ();; (* fail *)

(* existential *)
type t = D : 'a * ('a -> int) -> t;;
let f = function type b. D ((x:b), f) -> (f:b->int) x;;
let f = function type b. D ((x:b), f) -> (f:t->int) x;; (* fail *)

@vicuna
Copy link
Author

vicuna commented Oct 12, 2012

Comment author: @gasche

To be clear to other potential readers of this bug item, the problem is that as there is currently no explicit binding syntax on GADT type definitions, the existential types are introduced in no particular order. If you define a constructor type K with two existential types 'a and 'b, we cannot use, say, the syntax (K (type a) (type b) (x, f) -> ...) because there is no notion of "first" and "second" existential type. So Jacques' choice here is to allow to introduce names for some types, that are later unified with the corresponding existential variable by additional annotations on the pattern arguments (in his example, (x:b) is necessary to link the fresh name b to the actual existential variable 'a of D).

I'm not really satisfied with this compromise, because the notations are still heavier than necessary (though arguably it helps readability to force to annotate use of the types) and we have yet another semantic distinction around type variable binding.

What about the following other compromise:

  • allow the optional use of a type binding syntax at the GADT declaration site
    type t = D : 'a . 'a * ('a -> int) -> t
    
    (I'm not hung up on this particular syntax, for example it would be ok to have type a . a * (a -> int) -> t instead)
  • use the (type a) syntax to locally introduce the type after the constructor name and before the value parameters, using the order in the definition
    function D (type a) (x, f) -> f x
    
  • for GADTs that have not used an explicit order in their definition, reject this syntax (or possibly use the left-to-right order of occurence of the variables, but this would be a hack)

I believe this gives us a reasonable transitional path to a more principle syntax, with ultimately lighter annotations at the matching site than what is possible with your suggestion.

(Of course I'm no expert on the grammatical ambiguities here so one of the syntax proposed may actually not be an option. The important idea is having explicit type binding syntax at both the definition and use site of the constructors.)

I'm actually not so sure that this use of the (type a) syntax is consistent with its current use (which is important: I don't want to introduce new semantic notion). Is it the case? Or should we prefer another syntax for existential types in GADT patterns?

(Where are we on the (type a b) sugar? Is it currently implemented?)

@vicuna
Copy link
Author

vicuna commented Oct 12, 2012

Comment author: @garrigue

Well, I was also a promoter of explicit syntax for existentials in GADT definitions, so I cannot say I don't agree with Gabriel for the first part.

OK, your proposal makes sense, but it has the major drawback of requiring changes in type definitions, while we still allow the old format.
So if say, you are using a library with implicit existentials, you would have no way to use them.
Well, maybe using left-to-right order is okay, but we must be very careful about avoiding bugs (the internal representation must keep the syntactic order, which is not usually true for objects and polymorphic variants for instance.)
Also there is the question of how to handle row variables.
I think your use of (type a) is consistent (you do not really abstract an external type, since the existential is already abstract, and cannot leak anyway, but otherwise this is the same thing), so at least this part is okay.
There is currently not (type a b) sugar, but this would cost nothing.

As for a transition path for a more principled syntax, I would say either this is too late, or we must decide now to change the syntax and maybe break some existing programs, because otherwise it will be too late...

If I would give one argument for my approach, I would say that while it can be in some situations lightly more verbose, it is in general pretty light-weight, thanks to the inference-based approach.

@vicuna
Copy link
Author

vicuna commented Oct 16, 2012

And why not simply improving the error message by printing (for instance):

Error: This expression has type (type a) a t but an expression was expected of type
         (type a) a

This would explicitly tell the user that there is some existential around. After all, my surprise essentially came from the strange format of the variable name and the fact that its abstract nature was not explicit in the message.

@vicuna
Copy link
Author

vicuna commented Jun 16, 2013

Comment author: @gasche

I'm not sure (type a) a more clearly indicates an existential type than ex#23, and besides I'm not sure it's meaningful to close on all existentials (that may also be involved in some other part of the context) when printing the type.

@vicuna
Copy link
Author

vicuna commented Jun 16, 2013

Comment author: @gasche

However, I think the error messages of the form "This expression has type but an expression was expected of type ", with the same type printed at both places, are always harmful for non-experts and could be systematically skipped (or turned into a less confusing "There is a typing conflict at type ", not very informative but also prints the type). I don't know whether that would be easy to implement, though.

@vicuna
Copy link
Author

vicuna commented Jul 5, 2013

Comment author: @alainfrisch

A generic (?) work-around for the lack of naming facility is to introduce a local function with a locally abstract type:

type t = D : 'a * ('a -> int) -> t

module type S = sig
  type t
  val x: t
  val f: t -> int
end

let f = function
  | D (arg0, arg1) ->
    (fun (type b) (x : b) f ->
       (module (struct type t = b let x = x let f = f end) : S)
    )
    arg0 arg1

What about considering that the introduced type names behave exactly like that, so that the code above could be written:

let f = function
  | type b. D ((x : b), f) ->
       (module (struct type t = b let x = x let f = f end) : S)

The syntax is the same as in Jacques' proposal, but the static semantics is different. I think it would be less confusing to the user, since locally named types currently always enforce genericity.

@vicuna
Copy link
Author

vicuna commented Jul 5, 2013

Comment author: @lpw25

Since, in Alain's proposal the b in

| type b. D ((x : b), f) ->

is a regular locally abstract type (I think), could we have the syntax:

  | (type b) D ((x : b), f) ->

so that we can also allow:

  let f (type b) (D ((x: b), f)) = ...

because I often want to use a single-case GADT for existentials, and let feels more natural for them than match.

@vicuna
Copy link
Author

vicuna commented Jul 5, 2013

Comment author: @alainfrisch

let f (type b) (D ((x: b), f)) = ...

This would be a change to the current interpretation of locally abstract type. Here the "body" of the "new type" construction is not generic w.r.t. the new type b.

| type b. D ((x : b), f) ->
is a regular locally abstract type (I think)

It is a locally abstract type "inside" the constructor (but its scope includes the right-hand side of the match clause).

@vicuna
Copy link
Author

vicuna commented Jul 5, 2013

Comment author: @lpw25

This would be a change to the current interpretation of locally abstract type.

I'm not sure that it is a fundamental change to the interpretation.

type ext = Ext : 'a -> ext

let f (type a) (Ext(x : a)) = Ext([x] : a list)

feels very similar to:

module type S = sig type t val x : t end;;

let f (type a) (module X : S with type t = a) = 
  (module struct type t = a list let x = [X.x] end : S)

with the addition of some check that (type a) doesn't escape the scope of the existential.

@vicuna
Copy link
Author

vicuna commented Jul 5, 2013

Comment author: @alainfrisch

The type of:

   let f (type a) (module X : S with type t = a) = ...

has the form: (module S with type t = 'a) -> ...

This is plain polymorphic function, there is no existential involved here. This is quite different from:

  let f (type a) (Ext(x : a)) = ...

which is monomorphic, and would be more similar to:

 let f (module X : S) =  .... X.t ...

(i.e. when unpacking a module, we already get a way to refer to the existential type).

@vicuna
Copy link
Author

vicuna commented Jul 5, 2013

Comment author: @lpw25

The type of:

   let f (type a) (module X : S with type t = a) = ...

has the form: (module S with type t = 'a) -> ...

I understand that, but

  'a. (module S with type t = 'a) -> (module S)

is really equivalent to

  (module S) -> (module S)

the type system just isn't able to deduce this equality. But it could, if it
checked whether 'a appeared elsewhere in the type and then safely removed it.
This check is very similar to checking that the existential doesn't escape in:

  let f (type a) (Ext(x : a)) = ...

@vicuna
Copy link
Author

vicuna commented Jul 5, 2013

Comment author: @alainfrisch

I understand that, but

 'a. (module S with type t = 'a) -> (module S)

is really equivalent to

 (module S) -> (module S)

The difference between the two is quite important; the existential is opened in one case by the function and in the other case by the caller. Since we are talking about naming the existential, this distinction is rather important.

With your proposal, what would the following mean?

 fun (type a) (z : a) (Ext(x : a)) -> ...

@vicuna
Copy link
Author

vicuna commented Jul 5, 2013

Comment author: @lpw25

With your proposal, what would the following mean?

fun (type a) (z : a) (Ext(x : a)) -> ...

That would be an error because the existential would escape its scope. It is equivalent to:

  # let f z (Ext x) =
      ignore [z; x];       
      true;;
  Error: This expression has type ex#0 but an expression was expected of type
           ex#0
         The type constructor ex#0 would escape its scope

The point is

  forall A. A -> B

is equivalent to

  (exists A. A) -> B

under the condition that A is not in B.

So it is perfectly reasonable to type

  fun (exists t. (x : t)) -> ...

as

  forall t. (fun (x : t) -> ... )

as long as we check that t does not escape its scope (i.e. is not mentioned in the rest of the type). This is precisely what

  let f (type a) (Ext(x : a)) = ... 

does.

@vicuna
Copy link
Author

vicuna commented Jul 9, 2013

Comment author: @alainfrisch

I still feel uncomfortable with the proposal, because there is an explicit binder for the existential (the GADT constructor), and the equivalence breaks this scope.

@vicuna
Copy link
Author

vicuna commented Jul 9, 2013

Comment author: @gasche

One way to understand Leo's proposal is to say that

  fun (type a) (Ext(x : a)) -> ...

first defines a rigid variable a, and that the patterns Ext(x : a) first introduces an existential variable ex#N, and then upon processing the pattern (x : a) adds the local equation a = ex#N in the branch of the pattern clause. We are not naming a the actual existential variable, but working in a context where both are equivalent, just as

  let cast (type a) (Refl : (a, int) eq) = fun (x : a) -> (x : int)

works in a local context enriched with the equation a = int.

Currently such GADT equations are only added by an annotation on the type of the whole GADT value. This does not allow to name existentials as they precisely don't appear in the outer GADT type.

It seems this is a different mechanism from the one Jacques described for type a . Ext (x : a). Similar subtleties/hacks would probably be needed to get error messages to mention a rather than the existential names.

PS: I'm copying below the message from Oleg on caml-list about error messages for GADTs, which could be an interesting inspiration for useful error messages orthogonally to the naming of existentials discussed here.

Just in case, here is the error message produced by GHC for the
equivalent example.

Code:

{-# LANGUAGE GADTs #-}

data E a where
    B :: Bool -> E Bool
    A :: E (a -> b) -> E a -> E b

eval :: E a -> a
eval (B x)   = x
eval (A f x) = (eval f) x

The error message:

/tmp/ga.hs:9:25:
    Couldn't match type `a1' with `E a1'
      `a1' is a rigid type variable bound by
           a pattern with constructor
             A :: forall b a. E (a -> b) -> E a -> E b,
           in an equation for `eval'
           at /tmp/ga.hs:9:7
    In the second argument of `eval', namely `x'
    In the expression: (eval f) x
    In an equation for `eval': eval (A f x) = (eval f) x

The gist is using the type variable name 'a' from the definition of
GADT, and attaching the numeric suffix to disambiguate. (GHC does that all
the time -- and I believe OCaml does something similar in many cases,
only not for GADTs.) What helps in particular I believe is quoting the
clause from the GADT declaration, so we can see where the name 'a'
comes from. So, perhaps if the error message can be improved with
giving more information about the error, the problem will be solved
without additional syntax?

@vicuna
Copy link
Author

vicuna commented Jun 4, 2014

Comment author: @garrigue

I had dropped this discussion hoping that a natural solution would arise,
but unfortunately a feature that was expected to be in 4.01 will not even
be in 4.02.

Looking at the proposals so far, they all seem to be incoherent with the
current constructs (using a type construction in a pattern context, missing
notion of parameter, or incompatible scoping rules).

I've been discussing that with Jeremy Yallop, and we think we have a potential
solution:
(Ext (x : type a. a))
with the scope of "type a" being everything to its right.

The syntax is nice, and the meaning seems clear.
And since we are on the right of x, there is no longer the problem of
rebinding the type.
There are however a few difficulties

  1. originally, in
    let f : type a. typ = expr1 in expr2
    the scope of "type a" is expr1, whereas the pattern-matching equivalent
    of the above would be
    match expr1 with (f : type a. typ) -> expr2
    which gives a different scope.
    Note however that the meaning of "let f : typ = expr" is "let f = (expr : typ)",
    not "let (f : typ) = expr", so there is no formal ambiguity.
  2. The original "type a. typ" is defined as syntactic sugar, and forces the
    pattern to have a really polymorphic type. Here this is not the case since
    we want to match an existential type. This could be solved by rather
    using the syntax
    (Ext (x (type a) : a)
    This is ok too, since (type a) is to the right of x.
    However, this is less readable, and this does not solve point (1),
    i.e. there is the same potential confusion with "let f (type a) = expr1 in expr2".
  3. There is some interaction between these abstract types appearing inside
    patterns and type propagation. Consider this pattern:
    (... (Ext (x : type a. a)) ... : a t)
    Independently of whether it will be typable or not (in many cases it wouldn't,
    since the type would leak), the scoping rules say that the "a" in "a t" is the type
    of x. For this reason, we need an extra traversal, to check which names are,
    and if a type annotation contains names bound in a subpattern, we must delay
    building the type to after typing the subpattern.

@vicuna
Copy link
Author

vicuna commented Jun 4, 2014

Comment author: @lpw25

(Ext (x : type a. a))

To me this looks too much like x : 'a. 'a -- i.e. a completely universal type.

My preference would still be for:

(type a) (Ext (x : a))

@vicuna
Copy link
Author

vicuna commented Jun 4, 2014

Comment author: @garrigue

As I explained in my previous answer
(type a) (Ext (x : a))
is incorrect from a scoping point of view: it tries to bind an
existential that does not exist yet.

If the similarity with 'a.'a is your problem, then maybe
(Ext (x (type a) : a))
is better.

@vicuna
Copy link
Author

vicuna commented Jun 4, 2014

Comment author: @gasche

What about (Ext (x : type a)) ?

@vicuna
Copy link
Author

vicuna commented Jun 4, 2014

Comment author: @lpw25

As I explained in my previous answer
(type a) (Ext (x : a))
is incorrect from a scoping point of view: it tries to bind an
existential that does not exist yet.

Only if you think in terms of closed existential types. If we consider these to be open existential types (see "Modeling Abstract Types in Modules with Open Existential Types" by Montagu and Rémy) then:

let (type a) (Ext (x : a)) = ext in ...

is simply

&nu; a. let (Ext y) = ext in let x = open(a) y in ...

So (type a) is reduction and (Ext (x : a)) is opening in the terminology of Montagu et al.

@vicuna
Copy link
Author

vicuna commented Jun 4, 2014

Comment author: @garrigue

What about (Ext (x : type a)) ?

type _ lam = ... | Abs : ('a -> 'b) lam * 'a lam -> 'b lam

? a. let (Ext y) = ext in let x = open(a) y in ...
So (type a) is reduction and (Ext (x : a)) is opening in the terminology of Montagu et al.

Currently (type a) both introduces a new type and binds it (implicitly) at the point it is introduced.
So this would clearly be a change in semantics.

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @garrigue

Actually, the binder of the existential is the constructor, not the variable, so
Ext((type a) x : a)
would be ok too.

However, the syntax for type annotations in patterns is devilish. That's what
you would have to write for Abs:
Abs (((type a) f : a -> _), x)
Ugly isn't it?

I'm starting to think that Gabriel's idea might end up being the cleanest.
Namely, the main drawback was when there are several existential variables.
But in most cases, there is only one:
Abs (type a) (f,x)
is non-ambiguous.
When there are several variables, as suggested before, we can use a syntax
to give explicitly the existentials in the declaration.
Also, contrary to what I though first, this approach makes it easier to
handle private rows, without complex hacks involved.

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @alainfrisch

What about my proposal:

let f = function
  | type b. D ((x : b), f) ->
       (module (struct type t = b let x = x let f = f end) : S)

seen as an universal quantification on the whole branch (pattern + expression).

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @garrigue

Alain, I do not see the difference between your proposal and my original patch.

The problem is that

let f = function
  | type b. D ((x : b), f) ->
       (module (struct type t = b let x = x let f = f end) : S)

is essentially different from

let f = function
  | D (arg0, arg1) ->
    (fun (type b) (x : b) f ->
       (module (struct type t = b let x = x let f = f end) : S)
    )
    arg0 arg1

In the workaround code, you first pattern match on D, which introduces the existential variable in the environment, before it is matched against (type b).
In the proposal version, type b is introduced before the pattern matching, so the original scoping rules do not allow to redefine an already introduced abstract type.
I do not see how we can reconciliate the two different scoping approaches, short of introducing a new keyword.
Otherwise, (type b) has to appear after D for coherence.

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @gasche

Re. explicit ordering of existential variables in GADT constructor declarations, we may use either syntaxes:

(1a) Prod : 'b 'c . 'b expr * 'c expr -> ('b * 'c) expr

and

(1b) Prod : type b c . b expr * c expr -> (b * c) expr

And conversely in pattern position,

(2a) Prod b c (e1, e2) -> ...

or (if we decide that we really want the (type a b ...) syntax)

(2b) Prod (type b c) (e1, e2) -> ...

I personally like (2b) best (it would also be the least confusing for users, in the sense that someone that doesn't know about existential type would understand that the (type b c) part can be ignored in (2b), and be confused about (2a)), but for constructor declaration I have no preference between (1a) and (1b) ((1a) is the more incremental choice though), and I think we might want to allow both.

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @alainfrisch

I'm also in favor of explicit ordering in declarations. We should probably allow 1a anyway, and I'm undecided about 1b. In pattern position, 2a would be very confusing; 2b is much better.

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @lpw25

Currently (type a) both introduces a new type and binds it (implicitly) at the point it is introduced.
So this would clearly be a change in semantics.

Yes, but it seems a reasonable one, and I think it is more important to avoid adding yet another type variable syntax.

The semantics seem simple enough; a newtype is either used in a (single) existential pattern which binds it, or it may escape its scope in which case it is bound when it is introduced. If someone tries to do both they get an escaping scope error, which is what they would get even if they didn't name the existential.

In a way it is not even a change in semantics. If you consider

let f (type a) x = x in ...

to introduce a type a and not bind it, and

let f (type a) (x : a) = x in ...

to introduce a type a and bind it to the type of the first argument then we already have a semantics where (type a) introduces a type and (... : a) binds it.

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @lpw25

Prod : 'b 'c . 'b expr * 'c expr -> ('b * 'c) expr

This syntax should be allowed anyway (same goes for val x : 'a . 'a -> 'a).

Prod (type b c) (e1, e2) -> ...

Personally I don't particularly like adding a form of explicit type argument, which is what this proposal amounts to. I also dislike giving the ordering of type variables a meaning. Does this means that

type t = P : 'a 'b. 'a * 'b -> t

will be incompatible with

type t = P : 'foo 'bar. 'bar * 'foo -> t

?

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @alainfrisch

Does this means that
type t = P : 'a 'b. 'a * 'b -> t
will be incompatible with
type t = P : 'foo 'bar. 'bar * 'foo -> t

I guess so, but why would this be a problem?

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @lpw25

Note that separating introduction from binding could also naturally support or-patterns:

type t =
  A : 'a * ('a -> int) -> t
| B : 'a * ('a -> int) -> t

let f (type a) (A((x : a), g) | B((x : a) , g)) = g x

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @lpw25

I guess so, but why would this be a problem?

It's just unexpected. In a language without explicit type specialisation the order of type arguments should be irrelevant.

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @gasche

I agree that the ordering-meaningfulness is slightly unpleasant, but you can always annotate the value arguments of the pattern with the bound names, so that ordering mistakes are immediately caught.

The fact that this annotation is not mandatory means that this syntax is in general lighter.

Finally, you can also support or-patterns (but in a more verbose way):
let f (A (type a) (x, g) | B (type a) (x, g)) = g x
The usual scoping requirement that both sides must define the same variables applies to type "variables" as well.

@vicuna
Copy link
Author

vicuna commented Jun 5, 2014

Comment author: @garrigue

Concerning the syntax for explicit arguments, (1a) (2b) are clearly the least surprising ones, and it should always be our goal.
Note by the way that it should logically be
Prod (type a) (type b) (x,y)
the eventual (type a b) being only a shortcut.

However, I would strongly prefer writing only existential parameters.
The idea is that the non-existential parameters are already bound by
the return type (and using a constraint is sufficient to be explicit).
So actually in prod, no parameters would be explicit, for Abs since there
is only one existential one could use the explicit syntax in the pattern without
adding anything to the definition, or write it if desired. I.e.
Abs : ('a -> 'b) lam * 'a lam -> 'b lam
and
Abs : 'a. ('a -> 'b) lam * 'a lam -> 'b lam
would be equivalent, and
Abs : 'a 'b. ('a -> 'b) lam * 'a lam -> 'b lam
would be refused.
And for Abs2,
Abs : ('a1 -> 'a2 -> 'b) * 'a1 lam * 'a2 lam -> 'b lam
would be ok but not allow explicit application,
Abs : 'a1 'a2. ('a1 -> 'a2 -> 'b) * 'a1 lam * 'a2 lam -> 'b lam
Abs : 'a2 'a1. ('a1 -> 'a2 -> 'b) * 'a1 lam * 'a2 lam -> 'b lam
would be distinct since they allow explicit application, and both
Abs : 'a1. ('a1 -> 'a2 -> 'b) * 'a1 lam * 'a2 lam -> 'b lam
Abs : 'a1 'a2 'b. ('a1 -> 'a2 -> 'b) * 'a1 lam * 'a2 lam -> 'b lam
would be refused.
We would probably also need a new optional warning for
implicit existentials in definition (might also apply to implicit
polymorphic methods).

I know that it is not like Coq, but in ocaml type variables
can be kind of hidden (row variables for instance), and having
a way to say explicitly what is existential and what is not would
be helpful, aside of the fact that having to write all variables when
only few of them are existential would be a pain.

Having a look at CamlinternalFormatBasics is good to give you an
idea of what concrete GADT constructors can be.
There are indeed a number of cases with either one or two existentials
in the definition of fmt. Note that in each case there are also at least
6 non-existential type variables, and in some cases up to 12.
So requiring to write the non-existential variables looks like a real burden.

Of course we must be future proof, so it is worth thinking about potential
situations where we would need to be explicit about non-existential
variables. I see none, but it may just be a lack of insight.

@vicuna
Copy link
Author

vicuna commented Jun 6, 2014

Comment author: @lpw25

I think that

Ext(type a)(x : a)

is better than

Ext (type a)  x

because it is much more clear that a is being both introduced and bound.

It is also consistent with current uses of (type a).

@vicuna
Copy link
Author

vicuna commented Jun 6, 2014

Comment author: @garrigue

I kind of agree with you in principle.
However, concrete examples look more like
Abs (type a) ((f : a -> b),(x : b))
vs
Abs (type a) (f, x)
Note that you need both annotations, because we cannot propagate the expected type.
I wish we could at least omit the inner parentheses like in SML.
Of course, you would still be free to write the extra annotations (this is precisely the goal of introducing names).

Also, at some point it might be a good thing to recognize the specificity of existential type variables.

@vicuna
Copy link
Author

vicuna commented Jun 6, 2014

Comment author: @lpw25

However, concrete examples look more like
Abs (type a) ((f : a -> b),(x : b))
vs
Abs (type a) (f, x)
Note that you need both annotations, because we cannot propagate the expected type.

Out of interest, why can't we propagate expected type?

Even if we do not propagate the expected type, surely we only need to set a to be equal to ex#1? Can't this be inferred from a single annotation, similar to how GADT matching adds equations?

Assuming we do need both annotations, I still think it is worth being explicit. In this case I think that clarity is more important than concision, especially since the syntax will probably not be used very often.

I wish we could at least omit the inner parentheses like in SML.

That would indeed be nice. Is there a grammar conflict, or could it be allowed?

@vicuna
Copy link
Author

vicuna commented Jun 7, 2014

Comment author: @garrigue

Out of interest, why can't we propagate expected type?

Even if we do not propagate the expected type, surely we only need to set a to be equal to ex#1?
Can't this be inferred from a single annotation, similar to how GADT matching adds equations?

The problem is that the semantics of locally abstract types is that they hide their original definition, rather than being equal to it.
If you look at how Pexp_newtype is typed, you will see that a fresh abstract type is introduced in the environment, then the body is typed without any expected type, and ultimately the internal and external types are made equal.
The reason it works so well is because type inference tells us all occurrences of the abstract type, so it is then sufficient to connect the two.
However doing that during pattern matching would be much more complex, since there is not only the body to type, but also the remainder of the patterns.
So the only solution I see for now would be to be fully explicit, otherwise you end up risking to mix external and internal types in the internal context, where they are incompatible.

The advantage of the positional notation is that you are no longer introducing a locally abstract type hiding something, but just naming the existential variable.

Assuming we do need both annotations, I still think it is worth being explicit. In this case I think that clarity is more important than concision, especially since the syntax will probably not be used very often.

To me this just sounds like saying that you should annotate all your functions with their complete types.
Sure this may help catch some type errors earlier.
But it doesn't make your type system any safer (at least in this case).
And debugging types also works by adding the annotations later: once you get a type error, you may start annotating, and the more you annotate the surer you get. The fact that introducing annotations is cheap is an advantage.

@vicuna
Copy link
Author

vicuna commented Jun 7, 2014

Comment author: @lpw25

The problem is that the semantics of locally abstract types is that they hide their original definition, rather than being equal to it.
...
However doing that during pattern matching would be much more complex, since there is not only the body to type, but also the remainder of the patterns.

I can see how that would make connecting the inner and outer types during pattern matching difficult. However, I was rather suggesting connecting the inner type of a with the inner type of ex#0 by adding a GADT equation (as in add_gadt_equation in ctype.ml).

To me this just sounds like saying that you should annotate all your functions with their complete types.
Sure this may help catch some type errors earlier.
But it doesn't make your type system any safer (at least in this case).

I don't think this is the same thing. Annotating your function types is about adding clarity to a particular piece of code by being specific about what type it has. I'm talking about adding consistency to a piece of syntax by having (type a) mean the same thing in all contexts. With the positional approach, in some contexts it introduces a type and in others it binds a type. Sharing a syntax for these different operations is confusing.

If you know about newtypes then it is immediately clear what:

Abs (type a) ((f : a -> _), (x : a))

means, whereas this

Abs (type a) (f, x)

looks like an unused newtype declaration.

@vicuna
Copy link
Author

vicuna commented Jun 9, 2014

Comment author: @yallop

I wish we could at least omit the inner parentheses like in SML.

That would indeed be nice. Is there a grammar conflict, or could it be allowed?

The difficulty is that

(x, y : t)

currently means

((x, y) : t)

@vicuna
Copy link
Author

vicuna commented Jun 9, 2014

Comment author: @garrigue

I can see how that would make connecting the inner and outer types during pattern matching difficult.
However, I was rather suggesting connecting the inner type of a with the inner type of ex#0 by adding a GADT equation (as in add_gadt_equation in ctype.ml).

This amounts to changing the semantics. add_gadt_equation is only used for equations enforced by the GADT itself. This is not the case here.

I'm talking about adding consistency to a piece of syntax by having (type a) mean the same thing in all contexts.

It does. This is why I am so careful about the positioning of (type a).
The positional approach makes clear which existential variable is related to "a".
Then you can perfectly choose to follow the hiding semantics, since all occurrences are "hidden" together.

My point is that, to completely keep the same semantics, you need either to use a positional approach, or to be extremely verbose (try annotating camlinternalFormatBasics for fun...)

Abs (type a) (f, x) looks like an unused newtype declaration.

Until you learn that (type a) immediately after a non-constant constructor in a pattern is a binding occurrence.
The important part is that this behavior refines the usual rule, without introducing a contradiction.

My proposal of making explicit only existentials in the declaration might be more controversial. My justification is that distinguishing existentials will actually help the user, even if this is superficially incoherent with other uses of the universal notation.

@vicuna
Copy link
Author

vicuna commented Jun 9, 2014

Comment author: @gasche

My proposal of making explicit only existentials
in the declaration might be more controversial.

Indeed, I think this should be done very carefully.

One idea that I think would be reasonable is to allow people to reuse the parameter names without quantification:

type 't expr =
| ...
| App : type a . (a -> 't) expr * a expr -> 't expr

Note however that this means Prod's parameters would still have to be quantified:

| Prod : type a b . a expr * b expr -> (a * b) expr

I don't understand why we could do without introducing those explicitly. Jacques says:

The idea is that the non-existential parameters are already bound by
the return type (and using a constraint is sufficient to be explicit).

but consider for example

type 'a expr =
| Val of 'a
| Prod : 'b expr * 'c expr -> ('b * 'c) expr

I can indeed mention the types 'b and 'c in the case where the input type is restricted to the ('b * 'c) expr case:

(* works *)
let fst = function
| Prod (b, c) -> b
| Val (x, y) -> Val x

(* also works *)
let fst : type b c . (b * c) expr -> b expr = function
| Prod (b, c) -> b
| Val (x, y) -> Val x

(* also works, though I'm not fully sure why *)
let fst : ('b * 'c) expr -> 'b expr = function
| Prod (b, c) -> b
| Val (x, y) -> Val x

but in the general case, I cannot get hold of the 'b, 'c variables:

(* works *)
let rec eval : type a . a expr -> a = function
| Val v -> v
| Prod (b, c) -> (eval b, eval c)

(* fails *)
let rec eval : type a . a expr -> a = function
| Val v -> v
| Prod ((b : 'b expr), (c : 'c expr)) -> (eval b, eval c)

@vicuna
Copy link
Author

vicuna commented Jun 9, 2014

Comment author: @garrigue

Ok, I see where the confusion comes form.
Using the equational approach, this is actually

type 'a expr =
| Val of 'a
| Prod of 'b 'c. 'b expr * 'c expr constraint 'a = 'b * 'c

assuming that here the constraint is local to Prod.

So 'b and 'c are existentials too.

Allowing to reuse parameter names introduces strange scoping rules, so I'm not too enthusiastic about it.
Have to think more.

@vicuna
Copy link
Author

vicuna commented Jun 13, 2014

Comment author: @lpw25

I can see how that would make connecting the inner and outer types during pattern matching difficult.
However, I was rather suggesting connecting the inner type of a with the inner type of ex#0 by adding a GADT equation (as in add_gadt_equation in ctype.ml).

This amounts to changing the semantics. add_gadt_equation is only used for equations enforced by the GADT itself. This is not the case here.

I'm only suggesting it as a means of type-checking Abs (type a) (f, (x : a)) without having to annotate all uses of a.

From this perspective, there is no observational difference between adding a GADT equation and making ex#0 the external type of a, because any use of a which escapes the scope of the equation will result in an error anyway.

Until you learn that (type a) immediately after a non-constant constructor in a pattern is a binding occurrence.

But that's my point, it is one more thing users must know in order to read OCaml.

@vicuna
Copy link
Author

vicuna commented Sep 9, 2015

Comment author: @garrigue

Partial fix committed in trunk at revision 16411.
Only made the names are more informative for now, following an idea by gasche.

Names starting with $ are existentials.
$Constr_'a denotes an existential introduced for the type variable 'a of the GADT constructor Constr.
$Constr denotes an existential introduced for an anonymous variable in the GADT constructor Constr.
$'a is an existential generated by reification of type variable 'a during unification.
$n (n a number) is an internally generated existential.

Not that I didn't use #, because it is already used for object types.

This does not replace a way to explicitly give names, but should be a bit clearer.

@vicuna
Copy link
Author

vicuna commented Dec 2, 2015

Comment author: @alainfrisch

The original problem (error message) has been fixed. The more ambitious proposal to give a way to name existential variables should be discussed in a different ticket.

@vicuna
Copy link
Author

vicuna commented Dec 4, 2015

Comment author: @garrigue

The original problem is now resolved.
How to write annotations should move to #7074.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug typing typing-GADTS GADT typing and exhaustiveness bugs
Projects
None yet
Development

No branches or pull requests

3 participants