Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005780OCamlOCaml typingpublic2012-10-09 18:072013-08-19 15:40
ReporterSebastien Furic 
Assigned Togarrigue 
PlatformOSOS Version
Product Version4.00.1 
Target Version4.02.0+devFixed in Version 
Summary0005780: Strange type variable names in error messages (GADTs)
DescriptionOcaml sometimes uses strange type variable names in error messages when type-checking GADTs (maybe to avoid name capture?).
Steps To ReproduceEnter the following code:

type _ t =
  | B: bool -> bool t
  | I: int -> int t
  | A: ('a -> 'b) t * 'a t -> 'b t
  | F: ('a -> 'b) -> ('a -> 'b) t

let rec eval: type s. s t -> s = function
  | B b -> b
  | I i -> i
  | A (e, e') -> (eval e) e'
  | F f -> f

Ocaml reports the following error:
          Characters 94-96:
    | A (e, e') -> (eval e) e'
Error: This expression has type ex#0 t but an expression was expected of type
TagsNo tags attached.
Attached Files

- Relationships
child of 0005998assignedgarrigue GADT typing and exhaustiveness bugs 

-  Notes
gasche (developer)
2012-10-10 10:32

This is a known problem that I've discussed with Jacques and I think could and should be improved before the next version.

The fundamental issue is that the OCaml syntax has no way to *name* the existential variables corresponding to a GADT constructor. The name generation it uses now ("ex#" + fresh integer) is particularly obnoxious, but in any case there is a type corresponding to your variable 'a in the type of the constructor A that has no name.

My personal opinion is that not having explicit binding of those variables in the syntax (in the constructors: (A : 'a. ('a -> 'b) * ... -> 'b t), and in the matching (| A a (e, e') -> ...)) is a defect of the GADT syntax -- but syntax is about hard compromises, and here the more concise syntax was chosen.

I believe the current behavior could be improved by:
- choosing names for existential that are more natural; I would like them to include either the constructor they come from, or the original user-chosen name of the corresponding type variable in the type definition, or both
- improving the wording of the error message to avoid the annoying "has type 'foo' but was expected at type 'foo'" and possible display something GADT-specific

In the meantime, this is not a bug (the behavior is perfectly correct) but a rather shortcoming on the user interface side. And it is not "sometimes", all existential variables have those unreadable names, and all escape situations generate those unreadable error messages.

Thanks for your report anyway: it's helpful to see that people care about this.
garrigue (manager)
2012-10-10 11:46

This is indeed a well known problem.
Actually part of it was a bug, as there was already some code to reuse variable names from the definition, but it didn't work properly.
This part is now fixed in trunk, at revision 12998.
The new error message is:
Error: This expression has type a#0 t but an expression was expected of type
Unfortunately, this is only a small improvement, as origin of the name is not so clear, and many existential types do not come directly from constructors.
As for the "#0" part, this is indeed to avoid name collisions.

As Gabriel mentioned, there has always been a plan to support naming of existential types in patterns.
However this requires new syntax, and this was delayed waiting for more experience.
This should be in the next release.

I'm open to improvements of the wording, but clever error reporting is very difficult.
garrigue (manager)
2012-10-12 03:17

I have added a patch in trunk/experimental/garrigue/pattern-local-types.diffs which allows to define types in patterns.
They are only allowed at the beginning of match cases, and the syntax is:
    | type a b c. <pattern> -> <body>
They are _different_ from local abstract types, as they are not abstract, but inferred from their uses as type annotations inside <pattern>. If the inferred type is a fresh existential type, then some special manipulation is done to rename it to the given name, so that it will appear in error messages.

Here are some examples:
(* alias *)
function type a. (() : a) -> ([] : a list);;
(function type a. (() : a) -> ([] : a list)) ();;
function type a. () -> ();; (* fail *)

(* existential *)
type t = D : 'a * ('a -> int) -> t;;
let f = function type b. D ((x:b), f) -> (f:b->int) x;;
let f = function type b. D ((x:b), f) -> (f:t->int) x;; (* fail *)
gasche (developer)
2012-10-12 10:19
edited on: 2012-10-12 10:21

To be clear to other potential readers of this bug item, the problem is that as there is currently no explicit binding syntax on GADT type *definitions*, the existential types are introduced in no particular order. If you define a constructor type K with two existential types 'a and 'b, we cannot use, say, the syntax (K (type a) (type b) (x, f) -> ...) because there is no notion of "first" and "second" existential type. So Jacques' choice here is to allow to introduce names for some types, that are later unified with the corresponding existential variable by additional annotations on the pattern arguments (in his example, (x:b) is necessary to link the fresh name b to the actual existential variable 'a of D).

I'm not really satisfied with this compromise, because the notations are still heavier than necessary (though arguably it helps readability to force to annotate use of the types) and we have yet another semantic distinction around type variable binding.

What about the following other compromise:
- allow the optional use of a type binding syntax at the GADT declaration site
  type t = D : 'a . 'a * ('a -> int) -> t
  (I'm not hung up on this particular syntax, for example it would be ok to have "type a . a * (a -> int) -> t" instead)
- use the "(type a)" syntax to locally introduce the type after the constructor name and before the value parameters, using the order in the definition
  function D (type a) (x, f) -> f x
- for GADTs that have not used an explicit order in their definition, reject this syntax (or possibly use the left-to-right order of occurence of the variables, but this would be a hack)

I believe this gives us a reasonable transitional path to a more principle syntax, with ultimately lighter annotations at the matching site than what is possible with your suggestion.

(Of course I'm no expert on the grammatical ambiguities here so one of the syntax proposed may actually not be an option. The important idea is having explicit type binding syntax at both the definition and use site of the constructors.)

I'm actually not so sure that this use of the "(type a)" syntax is consistent with its current use (which is important: I don't want to introduce new semantic notion). Is it the case? Or should we prefer another syntax for existential types in GADT patterns?

(Where are we on the "(type a b)" sugar? Is it currently implemented?)

garrigue (manager)
2012-10-12 10:56

Well, I was also a promoter of explicit syntax for existentials in GADT definitions, so I cannot say I don't agree with Gabriel for the first part.

OK, your proposal makes sense, but it has the major drawback of requiring changes in type definitions, while we still allow the old format.
So if say, you are using a library with implicit existentials, you would have no way to use them.
Well, maybe using left-to-right order is okay, but we must be very careful about avoiding bugs (the internal representation must keep the syntactic order, which is not usually true for objects and polymorphic variants for instance.)
Also there is the question of how to handle row variables.
I think your use of (type a) is consistent (you do not really abstract an external type, since the existential is already abstract, and cannot leak anyway, but otherwise this is the same thing), so at least this part is okay.
There is currently not (type a b) sugar, but this would cost nothing.

As for a transition path for a more principled syntax, I would say either this is too late, or we must decide now to change the syntax and maybe break some existing programs, because otherwise it will be too late...

If I would give one argument for my approach, I would say that while it can be in some situations lightly more verbose, it is in general pretty light-weight, thanks to the inference-based approach.
Sebastien Furic (reporter)
2012-10-16 11:50

And why not simply improving the error message by printing (for instance):

Error: This expression has type (type a) a t but an expression was expected of type
         (type a) a

This would explicitly tell the user that there is some existential around. After all, my surprise essentially came from the strange format of the variable name and the fact that its abstract nature was not explicit in the message.
gasche (developer)
2013-06-16 16:27

I'm not sure "(type a) a" more clearly indicates an existential type than "ex#23", and besides I'm not sure it's meaningful to close on all existentials (that may also be involved in some other part of the context) when printing the type.
gasche (developer)
2013-06-16 16:29

However, I think the error messages of the form "This expression has type <foo> but an expression was expected of type <foo>", with the same type printed at both places, are always harmful for non-experts and could be systematically skipped (or turned into a less confusing "There is a typing conflict at type <foo>", not very informative but also prints the type). I don't know whether that would be easy to implement, though.
frisch (developer)
2013-07-05 14:17

A generic (?) work-around for the lack of naming facility is to introduce a local function with a locally abstract type:

type t = D : 'a * ('a -> int) -> t

module type S = sig
  type t
  val x: t
  val f: t -> int

let f = function
  | D (arg0, arg1) ->
    (fun (type b) (x : b) f ->
       (module (struct type t = b let x = x let f = f end) : S)
    arg0 arg1

What about considering that the introduced type names behave exactly like that, so that the code above could be written:

let f = function
  | type b. D ((x : b), f) ->
       (module (struct type t = b let x = x let f = f end) : S)

The syntax is the same as in Jacques' proposal, but the static semantics is different. I think it would be less confusing to the user, since locally named types currently always enforce genericity.
lpw25 (developer)
2013-07-05 15:45

Since, in Alain's proposal the b in

> | type b. D ((x : b), f) ->

is a regular locally abstract type (I think), could we have the syntax:

  | (type b) D ((x : b), f) ->

so that we can also allow:

  let f (type b) (D ((x: b), f)) = ...

because I often want to use a single-case GADT for existentials, and let feels more natural for them than match.
frisch (developer)
2013-07-05 16:51

> let f (type b) (D ((x: b), f)) = ...

This would be a change to the current interpretation of locally abstract type. Here the "body" of the "new type" construction is not generic w.r.t. the new type b.

> | type b. D ((x : b), f) ->
> is a regular locally abstract type (I think)

It is a locally abstract type "inside" the constructor (but its scope includes the right-hand side of the match clause).
lpw25 (developer)
2013-07-05 17:35

> This would be a change to the current interpretation of locally abstract type.

I'm not sure that it is a fundamental change to the interpretation.

    type ext = Ext : 'a -> ext

    let f (type a) (Ext(x : a)) = Ext([x] : a list)

feels very similar to:

    module type S = sig type t val x : t end;;

    let f (type a) (module X : S with type t = a) =
      (module struct type t = a list let x = [X.x] end : S)

with the addition of some check that (type a) doesn't escape the scope of the existential.
frisch (developer)
2013-07-05 17:53

The type of:

   let f (type a) (module X : S with type t = a) = ...

has the form: (module S with type t = 'a) -> ...

This is plain polymorphic function, there is no existential involved here. This is quite different from:

  let f (type a) (Ext(x : a)) = ...

which is monomorphic, and would be more similar to:

 let f (module X : S) = .... X.t ...

(i.e. when unpacking a module, we already get a way to refer to the existential type).
lpw25 (developer)
2013-07-05 18:14

> The type of:
> let f (type a) (module X : S with type t = a) = ...
> has the form: (module S with type t = 'a) -> ...

I understand that, but

  'a. (module S with type t = 'a) -> (module S)

is really equivalent to

  (module S) -> (module S)

the type system just isn't able to deduce this equality. But it could, if it
checked whether 'a appeared elsewhere in the type and then safely removed it.
This check is very similar to checking that the existential doesn't escape in:

  let f (type a) (Ext(x : a)) = ...
frisch (developer)
2013-07-05 18:48

> I understand that, but
> 'a. (module S with type t = 'a) -> (module S)
> is really equivalent to
> (module S) -> (module S)

The difference between the two is quite important; the existential is opened in one case by the function and in the other case by the caller. Since we are talking about naming the existential, this distinction is rather important.

With your proposal, what would the following mean?

 fun (type a) (z : a) (Ext(x : a)) -> ...
lpw25 (developer)
2013-07-05 21:18

> With your proposal, what would the following mean?
> fun (type a) (z : a) (Ext(x : a)) -> ...

That would be an error because the existential would escape its scope. It is equivalent to:

  # let f z (Ext x) =
      ignore [z; x];
  Error: This expression has type ex#0 but an expression was expected of type
         The type constructor ex#0 would escape its scope

The point is

  forall A. A -> B

is equivalent to

  (exists A. A) -> B

under the condition that A is not in B.

So it is perfectly reasonable to type

  fun (exists t. (x : t)) -> ...


  forall t. (fun (x : t) -> ... )

as long as we check that t does not escape its scope (i.e. is not mentioned in the rest of the type). This is precisely what

  let f (type a) (Ext(x : a)) = ...

frisch (developer)
2013-07-09 10:01

I still feel uncomfortable with the proposal, because there is an explicit binder for the existential (the GADT constructor), and the equivalence breaks this scope.
gasche (developer)
2013-07-09 15:07

One way to understand Leo's proposal is to say that

  fun (type a) (Ext(x : a)) -> ...

first defines a rigid variable (a), and that the patterns (Ext(x : a)) first introduces an existential variable ex#N, and then upon processing the pattern (x : a) adds the local equation (a = ex#N) in the branch of the pattern clause. We are not naming (a) the actual existential variable, but working in a context where both are equivalent, just as

  let cast (type a) (Refl : (a, int) eq) = fun (x : a) -> (x : int)

works in a local context enriched with the equation (a = int).

Currently such GADT equations are only added by an annotation on the type of the whole GADT value. This does not allow to name existentials as they precisely don't appear in the outer GADT type.

It seems this is a different mechanism from the one Jacques described for (type a . Ext (x : a)). Similar subtleties/hacks would probably be needed to get error messages to mention (a) rather than the existential names.

PS: I'm copying below the message from Oleg on caml-list about error messages for GADTs, which could be an interesting inspiration for useful error messages orthogonally to the naming of existentials discussed here.

> Just in case, here is the error message produced by GHC for the
> equivalent example.
> Code:
> {-# LANGUAGE GADTs #-}
> data E a where
> B :: Bool -> E Bool
> A :: E (a -> b) -> E a -> E b
> eval :: E a -> a
> eval (B x) = x
> eval (A f x) = (eval f) x
> The error message:
> /tmp/ga.hs:9:25:
> Couldn't match type `a1' with `E a1'
> `a1' is a rigid type variable bound by
> a pattern with constructor
> A :: forall b a. E (a -> b) -> E a -> E b,
> in an equation for `eval'
> at /tmp/ga.hs:9:7
> In the second argument of `eval', namely `x'
> In the expression: (eval f) x
> In an equation for `eval': eval (A f x) = (eval f) x
> The gist is using the type variable name 'a' from the definition of
> GADT, and attaching the numeric suffix to disambiguate. (GHC does that all
> the time -- and I believe OCaml does something similar in many cases,
> only not for GADTs.) What helps in particular I believe is quoting the
> clause from the GADT declaration, so we can see where the name 'a'
> comes from. So, perhaps if the error message can be improved with
> giving more information about the error, the problem will be solved
> without additional syntax?

- Issue History
Date Modified Username Field Change
2012-10-09 18:07 Sebastien Furic New Issue
2012-10-10 10:32 gasche Note Added: 0008232
2012-10-10 10:33 gasche Severity minor => text
2012-10-10 10:33 gasche Status new => acknowledged
2012-10-10 11:46 garrigue Note Added: 0008237
2012-10-10 11:46 garrigue Assigned To => garrigue
2012-10-10 11:46 garrigue Status acknowledged => confirmed
2012-10-12 03:17 garrigue Note Added: 0008247
2012-10-12 10:19 gasche Note Added: 0008249
2012-10-12 10:21 gasche Note Edited: 0008249 View Revisions
2012-10-12 10:56 garrigue Note Added: 0008250
2012-10-16 11:37 gasche Target Version => 4.01.0+dev
2012-10-16 11:50 Sebastien Furic Note Added: 0008262
2013-04-23 02:44 garrigue Relationship added child of 0005998
2013-06-16 16:27 gasche Note Added: 0009511
2013-06-16 16:29 gasche Note Added: 0009512
2013-07-05 14:17 frisch Note Added: 0009698
2013-07-05 15:45 lpw25 Note Added: 0009700
2013-07-05 16:51 frisch Note Added: 0009701
2013-07-05 17:35 lpw25 Note Added: 0009703
2013-07-05 17:53 frisch Note Added: 0009704
2013-07-05 18:14 lpw25 Note Added: 0009706
2013-07-05 18:48 frisch Note Added: 0009707
2013-07-05 21:18 lpw25 Note Added: 0009708
2013-07-09 10:01 frisch Note Added: 0009725
2013-07-09 15:07 gasche Note Added: 0009731
2013-08-19 15:40 doligez Target Version 4.01.0+dev => 4.02.0+dev

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker