Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GADT pattern exhaustiveness checking and abstract types #7028

Closed
vicuna opened this issue Oct 22, 2015 · 21 comments
Closed

GADT pattern exhaustiveness checking and abstract types #7028

vicuna opened this issue Oct 22, 2015 · 21 comments

Comments

@vicuna
Copy link

vicuna commented Oct 22, 2015

Original bug ID: 7028
Reporter: @alainfrisch
Status: acknowledged (set by @xavierleroy on 2015-11-15T17:17:53Z)
Resolution: open
Priority: normal
Severity: feature
Target version: later
Category: typing
Has duplicate: #7360
Monitored by: @gasche @yallop @hcarty

Bug description

It is well known that GADT and abstract types don't interact well w.r.t. exhaustiveness checking. A typical example:

module X : sig
  type abstr
end = struct
  type abstr = ...
end

type _ t =
  | Int: int t
  | Float: float t
  | Abstr: X.abstr t

let f Int = ()

Since the type checker has no way to know that X.t (which could come from an external unit) is different from int in all contexts, it cannot prove that the pattern matching is exhaustive.

This seriously limits the usefulness of GADT. Possible workarounds: (i) break abstraction; (ii) add impossible pattern matching cases. Note that (i) does not work for instance when the type is really implemented as an abstract type (to represent external data structures). I'd like to propose a pragmatic solution based on a softer version of abstraction breaking. Instead of exposing the concrete definition of the type, one could allow to attach an arbitrary string (or identifier) to any type definition without a manifest type (i.e. "fresh" concrete or abstract types, not re-exported types or pure abbreviations). This string could be forgotten in the interface but not changed or added. This provides the guarantee that if two types have such an attached string, and the two strings are different, it is safe to assume that the two types are non-compatible (i.e. different in any context). Built-in types would come with such internal strings.

I don't have a concrete proposal for the syntax, but conceptually, this would give:

module X : sig
  type abstr "blabla"
end = struct
  type abstr "blabla"
end

type _ t =
  | Int: int t
  | Float: float t
  | Abstr: X.abstr t

let f Int = ()  (* type checker can now prove exhaustivity *)

The programmer would be responsible to use unique enough strings if they need to. Using the same string for two types does not break safety, it only prevents the type-checker from proving exhaustivity of some GADT patterns.

A functor could even specify that some abstract types in its argument must have a specific string, allowing to use a GADT as above in the body of the functor (exposing the concrete structure is not an option here).

@vicuna
Copy link
Author

vicuna commented Oct 22, 2015

Comment author: @gasche

When the abstract types are used as phantom type-level tags, the feature is unnecessary as

type abstr = Blabla

can be used itself (it is not optimal as it is not self-discoverable, but neither is your proposal). The proposal is interesting for types that are not phantom, as they are inhabited by values (produced by a FFI interface).

Would it make sense, then, to add the "external" keyword as a prefix to the proposed syntax?

external type abstr "blabla"

this is consistent with external value declarations and highlights the intended use-case (FFI types, so to speak).

@vicuna
Copy link
Author

vicuna commented Oct 22, 2015

Comment author: @yallop

Perhaps something similar could be achieved for external-abstract (FFI) types by adding an injective type constructor to the initial environment that is known-unequal to other builtin types

type _ abstract

at which point we could write things like this

type abstr = [`blabla] abstract

and the type checker would know that [a] abstract and [b] abstract were distinct.

@vicuna
Copy link
Author

vicuna commented Oct 22, 2015

Comment author: @alainfrisch

This is not only useful for "external" types, also for concrete types (implemented by a sum or a record) which you don't want to expose to preserve value abstraction. In the example above, replace X.abstr by Date.t for a more realistic case.

@vicuna
Copy link
Author

vicuna commented Oct 22, 2015

Comment author: @lpw25

I think the idea of adding information to the representation of the type, and being able to reveal that aspect of the representation in the interface, is a reasonable approach to this problem.

I would prefer the information to take the form of an attribute (e.g. [@@Label foo]) rather than a string.

This also fits with the [@Immediate] attribute proposed here:

#188

except in Alain's proposal the extra information is a "phantom" piece of information that is not actual used for anything.

@vicuna
Copy link
Author

vicuna commented Oct 22, 2015

Comment author: @yallop

I'm in favour of the idea behind this proposal. It would be nice to be able to pick exactly which properties of a type are exposed through an interface. At the moment we have facilities for hiding the constructors but leaving the constructors exposed (but not vice versa), facilities for exposing co- or contra-variance (but not injectivity or bivariance), facilities for exposing equalities (but not inequalities).

I agree with Leo that using a string for the type identity isn't ideal, but I don't think it's a good idea to expose fundamental properties of the language, such as type equality or inequality, using attributes either.

This also fits with the [@Immediate] attribute proposed here:

#188 [^]

I'm not sure. Over there you made the following observation:

I think this proposal is a good use of attributes precisely because it is implementation-specific rather than a property of the language itself.

However, type inequality is very much a property of the language itself.

@vicuna
Copy link
Author

vicuna commented Oct 22, 2015

Comment author: @lpw25

I'm not sure. Over there you made the following observation:

I think this proposal is a good use of attributes precisely because it is implementation-specific rather than a property of the language itself.

However, type inequality is very much a property of the language itself.

But it is not type inequality it is type incompatibility, which I consider as "extra-lingustic" as the representation is "extra-linguistic". I tend to think that things related to warnings and runtime efficiency as opposed to soundness are aspects of the implementation rather than the language, and so should be handled using attributes.

For example, I would not consider attributes for the other properties that you listed (bi-variance, the dual of private constructors etc.) as they affect the soundness of programs.

@vicuna
Copy link
Author

vicuna commented Oct 22, 2015

Comment author: @yallop

But it is not type inequality it is type incompatibility.

True.

I tend to think that things related to warnings and runtime efficiency as opposed to soundness are aspects of the implementation rather than the language, and so should be handled using attributes.

I think that's a reasonable view. However, introducing incompatibility information can cause fatal errors, not just warnings, as in the following program, which is rejected if s and t are incompatible:

 type (_,_) eql = Refl : ('a, 'a) eql
                   
 let f : (s, t) eql -> unit = fun Refl -> ()

@vicuna
Copy link
Author

vicuna commented Oct 22, 2015

Comment author: @alainfrisch

The fact that a non exhaustive pattern matching is reported only as a warning is an accident of history, I would say. Exhaustivity checking is arguably one aspect of static type safety.

@vicuna
Copy link
Author

vicuna commented Oct 22, 2015

Comment author: @gasche

I feel that the consideration of (in)compatibility rather than (in)equality is a sort of coincidence here: Alain is pragmatic in his proposal and knows that trying to classify (in)equality by non-unique strings would be problematic. But this looks like a pragmatic approximation of a type-theoretic mechanism that, deep down, wants to talk about type identity (thus equality). We are not going to explain it to our users in term of compatibility.

@vicuna
Copy link
Author

vicuna commented Oct 23, 2015

Comment author: @alainfrisch

I'm not sure this would be useful, but one could generalize the idea a little bit. The extra annotation could be an element of a partially ordered set (X, <=) -- the ordering constrains subtyping -- with the following properties:

  • There is a top element T (forall x in X, x <= T). This is for backward compatibility (T represents unannotated types).

  • One can decide for two elements x, y in X if they are incompatible (i.e. they don't have a common lower bound z, z <= x, z <= y).

Possible generalizations:

  • Non-empty finite set of strings instead or T = all strings, <= being set inclusion. This would allow to write a functor that accept abstract types annotated with several possible strings.

  • More generally, non-empty regular languages (concretely, regexps).

  • Finite maps from labels to something as above (with pointwise comparison for <=, missing ids being understood as T).

@vicuna
Copy link
Author

vicuna commented Oct 23, 2015

Comment author: @lpw25

However, introducing incompatibility information can cause fatal errors, not just warnings, as in the following program, which is rejected if s and t are incompatible:

I know. I think I've said it before but that error should really be a warning. At the moment it breaks abstraction.

We are not going to explain it to our users in term of compatibility.

I think if we want our users to understand abstraction we should explain things properly. A lot of the misconceptions about how OCaml works and why certain features are not possible comes from not appreciating this point.

@vicuna
Copy link
Author

vicuna commented Oct 23, 2015

Comment author: @alainfrisch

We are not going to explain it to our users in term of compatibility.

I agree with Leo here. It is worth specifying how compatibility is defined and explaining its purpose.

@vicuna
Copy link
Author

vicuna commented Oct 24, 2015

Comment author: @garrigue

Indeed, we may need more details about the compatibility relation, as it may be surprising at times.

And I see no problem in adding some extra information to types to allow to distinguish them once abstracted. For soundness I suppose that adding this information should only be allowed when defining a type (i.e. in a .ml, without a manifest type), and checked by module subtyping. Since this name is supposed to be global, some kind of path would make sense, but of course it must be clear that it is not interpreted in any way (so a string makes sense too, but those not feel very much part of the language)

However, I wanted to remind all that while not knowing of an incompatibility may cause spurious warnings, and force to add inefficient extra cases as a result, the other problem with abstraction, i.e. the loss of injectivity, is much worse, as it means you are also loosing positive equations, that you may need for typing. If we attempt to solve compatibility, we should solve injectivity too.

@vicuna
Copy link
Author

vicuna commented Nov 12, 2015

Comment author: @alainfrisch

the loss of injectivity, is much worse

This depends of course on the typical use case. I consider warnings on non-exhaustive matches as actual type errors, and having to add useless extra cases to get rid of them somehow reduces the interest of using GADTs in some cases.

Anyway, do you have ideas that would solve both problems at the same time? If not, would you be willing to accept a concrete proposal along the line of what is discussed here?

@vicuna
Copy link
Author

vicuna commented Nov 13, 2015

Comment author: @garrigue

The simplest would be to use attributes, but I see here some arguments against.
However, there seems to be so many properties we may want to mention about types that providing a specific syntax for each of them is going to be very hard to understand.
(uniqueness, injectivity, contractiveness, exact variance for each parameter, ...)

@vicuna
Copy link
Author

vicuna commented Feb 3, 2016

Comment author: @damiendoligez

I don't know about the other properties, but AFAICT the original problem can be solved from within the language with the addition of one small magic ingredient. Just write:

module X : sig
  type rep
  type abstr = Blabla of rep
end = struct
  type rep = ...
  type abstr = Blabla of rep
end

type _ t =
  | Int: int t
  | Float: float t
  | Abstr: X.abstr t

let f Int = ()

But, you will say, this entails more allocations and indirections. That's where the magic ingredient comes in: an optimization to remove the constructor for single-constructor types, which lets users create new incompatible types with the same representation as existing types. The

Blabla
constructor then exists only at compile-time, and matching it is a no-op.

@vicuna
Copy link
Author

vicuna commented Feb 3, 2016

Comment author: @alainfrisch

an optimization to remove the constructor for single-constructor types

This optimization would (in theory at least) break the FFI if it is applied automatically. It could be controlled by an attribute, but in this case we pay the "price" of the proposal (an extra attribute on the declaration), without the getting the full benefits (the "client" code is made more complex because of the constructor). Or we make the optimization automatic and we document it in the FFI (the impact on existing bindings is probably minor ).

@vicuna
Copy link
Author

vicuna commented Sep 28, 2016

Comment author: @damiendoligez

The optimization in question was added to 4.04 because it is also useful for other things.

It gives a workaround for the present problem, but of course not a perfect solution.

@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

@github-actions github-actions bot added the Stale label May 11, 2020
@garrigue
Copy link
Contributor

See ocaml/RFCs#4 and #9042 for a potential solution.

@github-actions github-actions bot removed the Stale label Jun 15, 2020
@github-actions
Copy link

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants