Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline records for constructor arguments #5528

Closed
vicuna opened this issue Mar 8, 2012 · 219 comments
Closed

Inline records for constructor arguments #5528

vicuna opened this issue Mar 8, 2012 · 219 comments
Assignees

Comments

@vicuna
Copy link

vicuna commented Mar 8, 2012

Original bug ID: 5528
Reporter: @alainfrisch
Assigned to: @alainfrisch
Status: closed (set by @xavierleroy on 2017-09-24T15:31:52Z)
Resolution: fixed
Priority: normal
Severity: feature
Fixed in version: 4.03.0+dev / +beta1
Category: ~DO NOT USE (was: OCaml general)
Related to: #5525 #6374
Monitored by: pierpa @bobzhang @gasche @rixed @diml pveber bobot Bardou Camarade_Tux lavi @jmeber @yallop @hcarty yminsky @Chris00 @yakobowski

Bug description

OCaml allows n-ary constructors for sum types. Instead of relying on position, it would be convenient to name the fields. Of course, one can use records, but this requires an extra type declaration and has some runtime overhead.

I've started a new branch (constructors_with_record) in the SVN to allow naming arguments of constructors, with the same syntax as records.

Example:

type t =
| A of {msg: string; foo:int}
| B of string
| C

let f = function
| A {msg; _} | B msg -> msg
| C -> ""

GADTs and exceptions are supported. It is possible to define mutable fields, but there is currently no way to mutate them. Polymorphic fields are not supported.

Note that this proposal also gives a way to disambiguate record field names:

type a = A of {foo: string; bar: int}
type b = B of {foo: int; baz: bool}
let f (A{foo; _}) = foo
let g (B{foo; _}) = foo

To support mutation and field overriding, I was thinking of a syntax like:

type t = A of {mutable l: int} | B of {x:int; y:int}

match ... with
| A r -> r.l <- r.l + 10
| ...

match ... with
| B r -> B {r with x = r.y; y = r.x}
| ...

Binding (directly like above or with an alias pattern) the "record" argument creates a special value (special in the same way that "self" or "ancestor" variables in objects are special) which can only be used in the following context:

  • field projection: r.l
  • field assignment: r.l <- e
  • overidding: B {r with ...}

File attachments

@vicuna
Copy link
Author

vicuna commented Mar 8, 2012

Comment author: @alainfrisch

Some motivation. I think that most of the time, using "records" instead of "tuples" for constructor arguments is better (and so it's a good idea to make it as easy as possible to do it in the language):

  • Field names document the meaning of each field.

  • Thanks to punning, code is often not longer than with tuples, and this encourages to use uniform names to bind fields throughout the code base.

  • Ignoring fields is easier than with tuples (just { ... ; _ }).

  • Extending the constructor with more fields does not break existing patterns of the form { ... ; _ } (or when the warning 9 is disabled).

@vicuna
Copy link
Author

vicuna commented Mar 8, 2012

Comment author: @alainfrisch

(Note: Camlp4 has not been updated; it does not compile. OCamldoc compiles but fails on the new feature.)

@vicuna
Copy link
Author

vicuna commented Mar 8, 2012

Comment author: @gasche

You surely know that Caml Light had mutable sum constructors. Your mutation syntax reminds me of it -- but better behaved, as the syntactic lvalue is still a field projection rather an arbitrary variable.

http://caml.inria.fr/pub/docs/manual-caml-light/node4.6.html

This is also reminding of SML's anonymous record types (see eg. http://adam.chlipala.net/mlcomp/ ); those are not unboxed with the constructor (but MLton can optimize it) but allow field disambiguation -- in sometimes treacherous ways.

This extension prompts a question (say, for a teacher): do we have a duplication of concepts here (two slightly different ways to define records, just as tuples and the * used in sum constructors are slightly different), or is there a reasonable explanation of the first-class record concept in terms of this feature? I mean, maybe it's possible to say once and for all that the general data type is a labelled sum of labelled products, and explain current records as sums with only one case, where the constructor is left implicit. Could something like that reliably explain the semantics?

The restrictions on the "special" variables seem a bit icky. From a language point of view, I'd rather have general unboxed datatypes, with a kind system guaranteeing that they can't instantiate polymorphic parameters. Haskell has those and it's nice -- not sure how that works with the GC, however, and I understand changing the type checker is probably not an option. The pragmatism of your solution makes it simpler, and OCaml already has special cases for unboxed float anyway.

You included no example where two constructors of a single sum types would share some field names. Is that possible/reasonable? That looks like a potential use case -- having a second constructor that has the same fields than the first, plus some, as it is more memory-efficient than a record with an option type at the end.

This construction would allow to implement Queue efficiently without magic.
type 'a cell =
| Null
| Cell of { content : 'a; mutable next : 'a cell }

@vicuna
Copy link
Author

vicuna commented Mar 8, 2012

Comment author: @jberdine

For what it's worth, I have several times wanted such a feature. I think that the value of flexible extension and consistent naming, in particular, would be significant for larger code bases.

@vicuna
Copy link
Author

vicuna commented Mar 8, 2012

Comment author: @alainfrisch

I've added my extra proposal, allowing field projection and mutation on pseudo record-argument capture variables in patterns. Example:

type t = A of {x:int; mutable y:int}

let get_x (A r) = r.x
let set_x (A r) x = r.x <- x

let bad (A r) = r (* rejected *)

When used in a or-pattern, these pseudo-variable must have the same kind on both sides; the kind includes the constructor, so the following is rejected:

type t = A of {x:int} | B of {x:int}
let get_x = function (A r | B r) -> r.x (* rejected *)

@vicuna
Copy link
Author

vicuna commented Mar 8, 2012

Comment author: @alainfrisch

Gabriel, thanks for your note.

I believe that real records are indeed mostly a special case of this new feature
(using a sum type with one constructor), in the same way that n-tuples could be encoded with a sum type with one n-ary constructor. Of course, the static semantics is quite different: with records, the type is inferred from the labels, but with "record constructors", the type is derived from the constructor, and we know directly the type for the fields.

Currently, two features of records are not supported: polymorphic fields, and overriding ({e with x = ...}), although I don't see any reason why they couldn't be.


You have a question about two constructors in a given sum type sharing some field names. Yes, this is possible:

type t = A of {x:int} | B of {foo: int; x:int} | C of {x:string}

let get = function
| A {x} | B {x} -> x
| C {x} -> int_of_string x

It is even feasible to make the following work:

let enrich foo = function
| A r -> B {r with foo}
| z -> z

i.e. use a record-argument capture variable as the starting point for overriding, with a different constructor (having more fields). I'm not sure if this is really desirable.

@vicuna
Copy link
Author

vicuna commented Mar 9, 2012

Comment author: @garrigue

I almost proposed that one (including mutable fields) almost ten years ago.
I'm not going to oppose it, but this indeed raises the question of feature overloading.
Inline records have many advantages over normal records, their only limitation being that one can only get a handle through pattern-matching, but this can be a pain for some uses.
So the question for users is naturally, which one to use, knowing that both have advantages and disadvantages.
This is the only reason I finally didn't propose it.
But if there is a good rationale, this might be fine.

Note that the use of special variables is not a big problem in my mind: this is already the case for instance variables in objects.

Also an extra potential feature would be the ability to use type information to extract without pattern-matching.

type t = A of {x:int} | B of {x:int;y:int}

let get_x (r:t) = r.x

@vicuna
Copy link
Author

vicuna commented Mar 10, 2012

Comment author: @alainfrisch

type t = A of {x:int} | B of {x:int;y:int}
let get_x (r:t) = r.x

I'm not entirely convinced by this extension. How would the code above behave if some constructor in t don't have an "x" field? (Also, we'd need to be careful with type inference and principality here, but you know that better than I do!)

Pattern-matching with punning is not so bad:

let get_x (A{x}|B{x}) = x

(yes it becomes tedious with more constructors)

@vicuna
Copy link
Author

vicuna commented Mar 11, 2012

Comment author: @garrigue

Actually I was not thinking of generating a pattern-matching, but rather to restrict this feature to types for which it can be implemented trivially, as a direct field access without any test. I.e. x would have to be present, with the same type, and at the same position in all cases of the type. This of course includes the trivial case where the type has only one case.
This makes it easier to represent a syntax tree including location information in all nodes for instance.
I think lots of people have wished this behavior for a long time, and combining variants and records makes it easy.
Concerning principality, this is a typical case where -principal can recover it.

But do not take this suggestion as meaning that I support variants of records unconditionally.
I still think that we first need a good rationale. Being useful is not sufficient when there is a large overlap with an already existing feature.

@vicuna
Copy link
Author

vicuna commented Mar 12, 2012

Comment author: @alainfrisch

This makes it easier to represent a syntax tree including location information in all nodes for instance.

I see, but I'm not sure it's worth introducing an extra feature, whose meaning is not completely obvious. It just avoids the need to define (once for each common field) functions like:

let get_loc (Var{loc}|App{loc}|Lambda{loc}) = loc

I still think that we first need a good rationale.

A first reason is that it makes it makes it possible to define sum types with mutable fields, without space overhead. For some low-level data structures (for instance mutable binary trees), it can give some performance boost without requiring Obj.magic.

But the most important reason, in my opinion, is that it encourages to name arguments of constructors instead of relying on position, and it is often a good idea. In particular, it removes the following counter-arguments to naming constructor arguments with explicit record declarations:

  1. Syntactic burden of defining extra records.

  2. "Bad" support for field name overloading.

  3. Runtime overhead associated to extra record nesting.

In the past, for instance, I've proposed to replace some n-ary constructors in OCaml sources with records. This would have simplified maintaining on LexiFi side some of our local patches (avoiding the need to add wildcard patterns at many places). This was rejected, based on the fact that it would have increased memory usage and the size of cmi files (argument 3 above).

@vicuna
Copy link
Author

vicuna commented Mar 13, 2012

Comment author: @damiendoligez

it makes it possible to define sum types with mutable fields

This is a red flag. Are you sure your implementation is safe with pattern-matching and "when" clauses?

@vicuna
Copy link
Author

vicuna commented Mar 13, 2012

Comment author: @alainfrisch

Are you sure your implementation is safe with pattern-matching and "when" clauses?

I'm not sure at all, this would need to be checked.

Note that the semantics is weird anyway (and I'm not absolutely convinced it is safe) even with regular records:

type t = A of int | B of string;;
type u = R of t ref;;
let f = function
| R{contents=A _} -> ""
| R({contents=B _} as r) when (r.contents <- A 10; r.contents = A 20) -> ""
| R({contents=B _} as r) -> (match !r with A _ -> "BAD" | _ -> "OK");;

It's really weird that f can return "BAD" (and it does, with argument (R (ref (B""))) ).

@vicuna
Copy link
Author

vicuna commented Mar 15, 2012

Comment author: lavi

This is perhaps to early to speak about optimization, but for records with float only fields, it would be better to de-inline them internally, or even better to treat the first constructor with float only fields as a float array and de-inline the others.

@vicuna
Copy link
Author

vicuna commented Mar 26, 2012

Comment author: Bardou

Hello,

I have been wishing for this for a long time. Mainly because it's just tedious to declare a record for each constructor. Too often do I start with constructors with a low argument count (say 1 to 3) and find that I have to add new arguments. At some point I have so many arguments that I really have to declare the record, and thus rewrite all the relevant parts of the code.

I would add that it would be convenient for me to be able to name only some of the arguments. For instance :

type t = Sum of pos: Lexing.position * t * t

I would access the anonymous arguments using pattern-matching as usual, and use ".pos" as a shortcut sometimes.

Unifying records and sums is great, unifying tuples at the same time seems even better to me. The OPA language (of Mlstate) does this, if I'm not mistaken.

@vicuna
Copy link
Author

vicuna commented Mar 27, 2012

Comment author: bobot

For a semantic, syntax, typing perspective, can't we consider that:

type t =
| A of string
| B of {msg: string; mutable foo:int}
| C

is exactly the same thing than:

type __t = {msg: string; mutable foo:int}

and t =
| A of string
| B of __t
| C

Of course the "when" clause can lead to strange behaviors, but not different than the one we have with such a type t and __t.

Moreover we can consider the record nearly like an actual record. The nearly mean that we create it with the tag of B instead of the default one (the first, if I remember well) used for record.

So:
type t =
| A of string
| B of ({msg: string; mutable foo:int} as t2)
| C

is exactly for semantic, syntax and typing the same thing than:

type t2 = {msg: string; mutable foo:int}

and t =
| A of string
| B of t2
| C

eg:

match x with
| A s ->
| B ({ msg = ""} as r) -> f (r : t2)

For compilation: B r is just the identity function, the matching just doesn't extract. You can't make the difference between the two definitions, except if you use Obj.magic to see that (Obj.magic (B r)) == (Obj.magic r), but Obj.magic is not in the semantic right?

(It's in fact not the same thing for typing in regard of module subtyping, if t is made abstract, t2 must be done private. But that can be quite useful)

My 2 cents,
I'm very happy that someone try to implement mutable sum type.

@vicuna
Copy link
Author

vicuna commented May 23, 2013

Comment author: bobot

With #5759 (Using well-disciplined type-propagation to disambiguate label and constructor names) the last view (#5528#c7199) should be able to work with such a type (#5528#c7028):

type t = A of {x:int} | B of {foo: int; x:int} | C of {x:string}

let get = function
| A {x} | B {x} -> x
| C {x} -> int_of_string x

@vicuna
Copy link
Author

vicuna commented May 23, 2013

Comment author: @gasche

I'll also note that the problem of side-effects on mutable fields in "when" clause has now been fixed by Luc in #5992

@vicuna
Copy link
Author

vicuna commented Oct 10, 2013

@vicuna
Copy link
Author

vicuna commented Oct 11, 2013

Comment author: bobot

The implementation done by Alain already work well:

type t =
| A of {msg: string; mutable foo:int}
| B of string
| C

let x = A {msg = "toto"; foo = 1}

let foo =
match x with
| A r -> r.foo <- 1; r.foo
| _ -> 1

Alain, do you think that allowing the use of {msg = "toto"; foo = 1} as an actual record can be interesting? Will you accept, at least in your branch, a patch that does that?

What else can be done for progressing on this feature?

@vicuna
Copy link
Author

vicuna commented Oct 11, 2013

Comment author: @alainfrisch

Alain, do you think that allowing the use of {msg = "toto"; foo = 1} as an actual record can be interesting?

Indeed, it could be interesting to consider that:

===
type t =
| A of string
| B of {msg: string; mutable foo:int}
| C

also creates a special record type (let's call it t.B). It is special because it must be created with tag 1 instead of 0. One would then accept:

let f = function B r -> r | _ -> assert false
val f: t -> t.B

let g r = B r
val g: t.B -> t

I don't see immediately anything wrong this is approach. It might actually clean things up a little bit, since the variable corresponding to the record argument in "f" above is given a proper core type. Of course, we introduce some complexity on paths for type constructors.

@vicuna
Copy link
Author

vicuna commented Oct 11, 2013

Comment author: bobot

In order to simplify the implementation and reduce the impact on the compiler code, we can avoid to create a new kind of type constructors by requiring that:

  • if you want to refer to this type, you must name it:
    | B of {msg: string; mutable foo:int} as tB

  • when the type is not named we use the type named "t.B" for pretty-printing

The problem is for the ocamlc -i option since in that case we need to print a type for eg. the f, g function that can be parsed by ocaml.

@vicuna
Copy link
Author

vicuna commented Oct 11, 2013

Comment author: @alainfrisch

I'm wondering if we can reuse the current internal definition of Path.t, interpreting a lowercase component before a dot as a type (and not a module).

So t.B would be represented as Pdot(Pident {name="t";..}, "B", 0) and M.t.B as Pdot(Pdot(Pident {name="M";..}, "t"), "B").

@vicuna
Copy link
Author

vicuna commented Oct 11, 2013

Comment author: @lpw25

Could we possibly separate this into two separate branches? A simple one adding constructors with named fields (no mutability, no special record values), and a more complex one with the other features under discussion.

This might help get the simple (and very useful) feature merged much more quickly (i.e. before the next release).

@vicuna
Copy link
Author

vicuna commented Oct 12, 2013

Comment author: @alainfrisch

Before committing anything, we should have a clearer picture of where to go. For instance, now that we have type-based disambiguation of names, it seems most benefits would be achievable with an "inline" annotation within the type declaration:

type t = A of foo inline | B of bar inline

and foo = {x: int; z: string}
and bar = {x: int; y: int}

("inline" can be used only in such context: reference to a record type as argument of a constructor defined in the same group).

The effect is to inform the compiler about the tag to use for the inlined records.

It seems the impact on the type-system and on the language syntax is rather limited, while providing most of the benefits and no restriction compared to regular records (polymorphic fields, etc).

@vicuna
Copy link
Author

vicuna commented Oct 14, 2013

Comment author: bobot

I agree that we should have a clear view of all the proposal, after that we can cut in little pieces for easier integration.

Whichever way we choose, if we require the user to name the type of the record the impact on the type-system is small (only type signature inclusion).

The inline tag changes the syntax in a very small way, which is nice.

  • But do we accept to use the record more than once?

type t = A of foo inline | B of bar inline
and internal = Ai of foo inline | Bi of bar inline | Internal_case of ...

and foo = {x: int; z: string}
and bar = {x: int; y: int}

All the uses must be with the same tag.

  • Could inline be used again for already defined record?

  • What is the rule for the type signature inclusion? foo can't appear in the signature of the module without the type t with the inline tag except if foo is private?

  • we can use include instead of inline, in order to avoid the non-technical difficulty of the addition of a keyword.

@vicuna
Copy link
Author

vicuna commented Oct 14, 2013

Comment author: @alainfrisch

But do we accept to use the record more than once?

I'd say: at most one "inline" reference to a given record. Other (non-inline) references are allowed, of course. I don't like the idea that multiple inline references to a given record type are allowed as long as the tags are identical: this is too fragile and expose the tag assignment scheme as a constraint in the type-checker. Moreover, I'm not convinced this is actually useful.

Could inline be used again for already defined record?

No, the "inline" marker assumes the target record type is defined in the same group.

When the definition group is type-checked, the compiler chooses how to represents datatypes: tags for constructors, and, in this proposal, also tags for records. Any client code of the record type has access to this representation choice. This is necessary so that any code creating a record value can pick to correct tag.

Note that the compiler already chooses the runtime representation of a record type when its definition is type-checked (Regular/Float records). The only difference would be that in the "Regular" case, the compiler also remembers the corresponding tag (and from which constructors it comes). And one must also keep the information in the internal constructor description, to adapt code generation.

What is the rule for the type signature inclusion? foo can't appear in the signature of the module without the type t with the inline tag except if foo is private?

Yes, globally, the sum type and the inlined record(s) types would form a monolithic group, which cannot be split in the signature. This would come from free from the inclusion check, which ensures that the representation of records is equivalent. This is why the following is rejected today:

module X :
sig type t type s = {x: t} end =
struct type t = float type s = {x: t} end

(error message: the first declaration uses unboxed float representation)

One can decide to allow or not the case where the inlined record type is made abstract in the signature.

we can use include instead of inline, in order to avoid the non-technical difficulty of the addition of a keyword.

Indeed. We could also use an attribute (from the "extension points" work).

@vicuna
Copy link
Author

vicuna commented Oct 14, 2013

Comment author: bobot

But do we accept to use the record more than once?

I'd say: at most one "inline" reference to a given record. Other (non-inline) >references are allowed, of course. I don't like the idea that multiple inline >references to a given record type are allowed as long as the tags are identical:
this is too fragile and expose the tag assignment scheme as a constraint in the
type-checker. Moreover, I'm not convinced this is actually useful.

I don't want to push it to much, but to be able to have an internal view with more constructor than the external view and a constant-time conversion from one to the other is very interesting. But I agree that we shouldn't complicate heavily this proposal for this possibility.

So one use of the "inlined" record.

Thank you for the float records example, I forgot about it. But contrary to this example you can read the record even if you don't know the number of the tag.

We could also use an attribute (from the "extension points" work).
That is a possibility.

@vicuna
Copy link
Author

vicuna commented Oct 14, 2013

Comment author: @gasche

If we have the constraint that a given record may only be used for one constructor, I prefer bobot's "as" syntax:

type t =
| A of {x: int; z: string} as foo
| B of {x: int; y: int} as bar

It is lighter (some people have pointed out that, even if they don't care about the memory representation, they don't use a record per constructor now because it's too painful to write), and it naturally enforces this restriction (but then people are going to ask about how to share sets of fields between those...).

@vicuna
Copy link
Author

vicuna commented Oct 14, 2013

Comment author: @alainfrisch

Arguments in favor of the "as" syntax:

  • more compact
  • better when the records fits on one line
  • enforces the invariant more naturally
  • makes it easier to adapt code currently using tuples

Arguments in favor of the "inline" syntax:

  • avoid breaking the flow of constructors when the record is long enough
  • makes it easier to adapt code currently using records
  • minimizing the impact on the language grammar and Parsetre
  • benefit from attributes on type definitions (e.g. for documentation)

@vicuna
Copy link
Author

vicuna commented Oct 14, 2013

Comment author: @hcarty

Would the constraint of one record to one constructor allow for:

type t =
| A of { x : int; y : int } as foo
| B of { x : int; y : int } as bar

?

@vicuna
Copy link
Author

vicuna commented Oct 1, 2014

Comment author: @alainfrisch

Some news from the caml-devel front:

  • The story can be simplified if we reject the declaration of several extension constructors with the same name in the same module (at least if they have an record argument). This allow to drop the heavy last form (4) from the previous note.

  • It was noted that the implementation strategy based on generating internal type names was not optimal. I've thus created a new branch constructors_with_record4 which follows a more direct approach by keeping type paths !M.X / !M.t.X internally (corresponding record type declarations are generated on the fly, and not inserted in the environment). This indeed simplifies quite a lot the implementation.

  • I've also added to that branch support for referencing those !-types inside the type declaration that defines them, as in:

    type t = A of {x:int; mutable y: !A option} | B of !A

(we could write !t.A instead of !A)

@vicuna
Copy link
Author

vicuna commented Oct 6, 2014

Comment author: lavi

Sorry to rise an old subject: I mean type parameters declaration.

Current situation is probably enought for most cases, but still fragile.
As the type is now more related to the type constructor, it would be nice to be able to specify (optionaly) the type parameters with this constructor :

Taking an old exemple, this may look likes:

type (_, _) funchain =
| Id : ('a, 'a) funchain
| ('a, 'b, 'c) Comp : { head: ('a -> 'b); tail: ('b, 'c) funchain } -> ('a, 'c) funchain

@vicuna
Copy link
Author

vicuna commented Oct 6, 2014

Comment author: @alainfrisch

There was indeed some resistance against the use of the syntactic ordering (first occurrence) of free variables in the inlined record to define the type parameters for that inlined record. The last branch (constructors_with_record4) has been adapted as follows:

  • For non-GADT constructors, we use the list of type parameters from the sum type (even if some of them are not used in the record).

  • For GADT constructors, we only support the case where the inlined record has 0 (resp. 1 variable) and this yields an inlined record type with 0 (resp. 1) type parameter. Note that one can uses a GADT constructor even when there is no existential variable, which gives a way to avoid introducing phantom type variables when there is at most 1 free variables in the inlined record.

The case of a GADT constructor with more than one free variable requires an explicit ordering between the variables. There are plans to allow specifying the ordering of such variables anyway (in order to provide a way to pattern match on them), and we will wait for such a feature to be available.

@vicuna
Copy link
Author

vicuna commented Oct 6, 2014

Comment author: @lpw25

  • For GADT constructors, we only support the case where the inlined record has 0 (resp. 1 variable) and this yields an inlined record type with 0 (resp. 1) type parameter.

It might be better to just leave out GADT record constructors entirely. This is a much simpler restriction for people to understand than "their can be at most one free type variable if you use GADT syntax".

Note that one can uses a GADT constructor even when there is no existential variable, which gives a way to avoid introducing phantom type variables when there is at most 1 free variables in the inlined record.

It is not really clear to me why we do not use the version without phantom type variables for constructors with regular syntax. The rule would be quite simple: constructors have the same type variables in the same order as their type, with unused type variables removed.

@vicuna
Copy link
Author

vicuna commented Oct 6, 2014

Comment author: @alainfrisch

It might be better to just leave out GADT record constructors entirely.

The case of GADT constructors with one free variable is quite common, and there is no ambiguity for that case, so why should we arbitrarily reject it?

The rule would be quite simple: constructors have the same type variables in the same order as their type, with unused type variables removed.

This is also an option, but is it significantly more difficult to define the list of free variables ordered by their first syntactic appearance than the set of free variables (ordered by their rank in the list of parameters of the sum type) ?

@vicuna
Copy link
Author

vicuna commented Oct 6, 2014

Comment author: @lpw25

The case of GADT constructors with one free variable is quite common, and there is no ambiguity for that case, so why should we arbitrarily reject it?

Because it is more important that the reason for rejecting things is easy to understand than to accept as many things as possible.

Whilst GADTs with a single argument might be common, I don't think that many of them would benefit much from using record syntax.

This is also an option, but is it significantly more difficult to define the list of free variables ordered by their first syntactic appearance than the set of free variables (ordered by their rank in the list of parameters of the sum type) ?

It is for users, especially with any kind of object or polymorphic variant involved.

I'm also not convinced that "first syntactic appearance" rule is stable: do all equal (according to Includecore.type_decl) type declarations result in the same ordering of free variables in their constructors?

@vicuna
Copy link
Author

vicuna commented Oct 7, 2014

Comment author: @garrigue

I'm also not convinced that "first syntactic appearance" rule is stable: do all equal (according to Includecore.type_decl) type declarations result in the same ordering of free variables in their constructors?

The answer is clearly no: you would have to normalize the type in some way first.

type ('a, 'b) bpair = 'b * 'a
type _ t = Pair : ('a, 'b) bpair -> ('a * 'b) pair

Not only that, but you cannot even now for sure what are the free variables of a type without normalizing it.

type 'a s = string
type _ t = S : 'a s t -> string t

Does the constructor t have one parameter ? zero parameter ?
This would depend on whether the definition of s is accessible or not.

For these reasons I would be careful about any implicit definition of the parameter list, including removing absent variables.

In the 0-parameter case, GADT constructors with no parameters could be seen as a special case of parameterized constructors, but does it justify having a different behavior for apparently similar cases:

type 'a t = Int of {v: int}
type _ t = Int : {v: int} -> 'a t

@vicuna
Copy link
Author

vicuna commented Oct 7, 2014

Comment author: @lpw25

For these reasons I would be careful about any implicit definition of the parameter list, including removing absent variables.

Yes it does seem like using the same parameter list as the type is the only way to guarantee that the list remains stable under changes of type environment.

@vicuna
Copy link
Author

vicuna commented Oct 7, 2014

Comment author: @alainfrisch

Thanks Jacques, this is very helpful and shows clearly that even the notion of free variables is not so clear (and context dependent). So let's drop support for inlined record on GADT constructors for now (committed to the *4 branch, as a type-checking level error; it would be easy to reject the case during parsing) and settle for the simple and stable definition for type parameters on non-GADT constructors (the same as the sum type).

Any opinion on the support for extension records? This support represents a non-negligible fraction of the patch and of the conceptual complexity of the current proposal.

@vicuna
Copy link
Author

vicuna commented Oct 7, 2014

Comment author: @lpw25

Any opinion on the support for extension records? This support represents a non-negligible fraction of the patch and of the conceptual complexity of the current proposal.

I thought it no longer affected the "conceptual complexity" of the proposal, as the same things work on both regular and extension constructors. Is there something I've missed?

@vicuna
Copy link
Author

vicuna commented Oct 7, 2014

Comment author: @alainfrisch

Consequences of supporting extension constructors:

  • Some extra constraints need to be defined so that a constructor-path refers to a well-defined extension constructor. At least we should reject multiple definitions of extension constructors with inlined records in the same module, which might seem a little bit ad hoc. If we reject all duplicated extension constructors, this breaks backward compatibility (esp. for exceptions).

  • Similarly, extension constructors with inlined records must now be assumed to create fresh types (cf Mtype.contains_type; to exclude unpacking packaged modules in functor bodies).

  • Internally, we need two different code paths to deal with local extension constructors (whose path is a Pident with the same ident as the constructor added to the environment) and with non-local ones (whose path has the form MOD.X -- no type component before the constructor as for regular constructors). This is mostly an implementation issue, but it can yield to situations as:

# module X = struct type t = ..   type t += A of {x:int}  let x : !A = {x=0}   type s = A end;;
module X :
  sig type t = .. type t += A of { x : int; } val x : !A type s = A end
# X.x;;
- : !X.A = {x = 0}
# (X.x : !X.A);;
Error: Constructor X.A does not have an inline record argument

Here, !X.A is a type path with no type component; internally, it refers to the extension constructor A in module X (there can be at most one). But for the surface language, !X.A can also be used to refer to !X.s.A. So the displayed type is incompatible with the interpretation of syntactic !-types. There are plenty of other such cases already in the type system, but this is slightly unsatisfactory. We will somehow need to explain that a type printed as !X.A necessarily refers to the argument of an extension constructor. Or we could decide to use the syntax !X.A only for such constructors, and force users to write !X.s.A to refer to the argument of a regular constructor (in which case the syntax !X.A becomes specific to extension constructors). (The same example can be done with local constructors.)

  • Again internally, there is extra plumbing required to pass the "manifest" extension constructor in case of rebinding in Env. (Currently, when an extension constructor is added to the environment, one doesn't need to know whether it is a rebinding or not.)

@vicuna
Copy link
Author

vicuna commented Oct 7, 2014

Comment author: @alainfrisch

I forgot to mention that since extension constructor rebinding cannot be specified in signature (contrary to sum types with a manifest), it is not possible to expose equality of the inlined record type attached to two identical extension constructors (one being bound to the other).

@vicuna
Copy link
Author

vicuna commented Oct 7, 2014

Comment author: @lpw25

If we reject all duplicated extension constructors, this breaks backward compatibility (esp. for exceptions).

Personally, I think this is backwards compatibility that it would be good to break if possible. Giving two exceptions in the same module the same name is more likely to be a bug than deliberate, and it is quite likely to be a run-time bug on the error path -- so quite a nasty bug to find. It also doesn't allow any extra functionality, since you can just rename one of the exceptions (they must be internal to the module after all).

Obviously, as with all breaks in backwards compatibility we should be cautious. Perhaps a check through OPAM would indicate how often this occurs in practise.

  • Similarly, extension constructors with inlined records must now be assumed to create fresh types (cf Mtype.contains_type; to exclude unpacking packaged modules in functor bodies).

I had forgotten about this, but as you pointed out earlier, it is unlikely to cause any backwards compatibility issues since this feature was only introduced in 4.02 and is quite obscure.

module X = struct type t = .. type t += A of {x:int} let x : !A = {x=0} type s = A end;;

module X :
sig type t = .. type t += A of { x : int; } val x : !A type s = A end

X.x;;

  • : !X.A = {x = 0}

(X.x : !X.A);;

Error: Constructor X.A does not have an inline record argument

I had also forgotten about this issue. It is slightly annoying, but not too different from other type scoping issues. If the error message included the type of the constructor (X.s) in it's output then I don't think it will cause much confusion.

  • Again internally, there is extra plumbing required to pass the "manifest" extension constructor in case of rebinding in Env. (Currently, when an extension constructor is added to the environment, one doesn't need to know whether it is a rebinding or not.)

Do you mean that

exception Foo of { x : int }

exception Bar = Foo

let f (x : !Foo) : !Bar = x

works?

I'm not sure whether this is the correct behaviour or not. Rebinding an exception is currently an implementation issue (i.e. it cannot be exposed in the signature), so it is not clear that it should effect the typing environment.

It has been previously suggested (or at least it has been suggested to me) that rebinding could be exposed in signatures, which would make your behaviour definitely correct. Interestingly, the current restriction is purely syntactic: the type-checker is perfectly happy with rebindings in signatures.

@vicuna
Copy link
Author

vicuna commented Oct 7, 2014

Comment author: @alainfrisch

Giving two exceptions in the same module the same name is more likely to be a bug than deliberate

We don't need to go very far to find such a situation: typing/ctype.ml has two instances of duplicated exception names. The fix is indeed trivial (rename one of them), but we've seen users complaining about the new {|...|} syntax breaking their comments :-)

Let's see on caml-devel if anyone is against this move to a stricter semantics.

it is unlikely to cause any backwards compatibility issues since this feature was only introduced in 4.02 and is quite obscure.

I did not say there is anything wrong with it, just that, considering how simple the entire stuff is, such tiny details contribute to a significant fraction of the overall "complexity". And it shows that it's not completely unlikely that we forgot something subtle.

Do you mean that
...
works?

Yes. The (small) implementation overhead (need to pass the information on the side, since the extension_constructor doesn't hold the information) is directly related to the fact that the rebinding information is lost in the signature. It's good that you commented on that. I'll thus simplify the patch to stop propagating the manifest type in such rebinding of extension constructors. If we ever track rebinding in signatures, getting the proper manifest where we need in (in Datarepr) will become straightforward. Otherwise, there is indeed no strong reason to make the example above work. (Interestingly, tracking rebinding in signatures will also require to add the constraint on non-duplication of extension constructor names, and to support substitution of constructor paths in Subst as we do in the branch.)

the type-checker is perfectly happy with rebindings in signatures.

I wouldn't go as far: rebinding is currently not represented in signatures (and thus not supported in Subst nor checked in signature inclusion). More code will be needed than just allowing to specify rebinding in syntactic signatures.

@vicuna
Copy link
Author

vicuna commented Oct 7, 2014

Comment author: @lpw25

the type-checker is perfectly happy with rebindings in signatures.

I wouldn't go as far: rebinding is currently not represented in signatures (and thus not supported in Subst nor checked in signature inclusion). More code will be needed than just allowing to specify rebinding in syntactic signatures.

Good point, the type-checker would currently basically desugar them to regular exception definitions, which is probably not what you want for things like functor parameters. Still they would not be hard to support if we wanted to.

@vicuna
Copy link
Author

vicuna commented Oct 7, 2014

Comment author: @alainfrisch

If the error message included the type of the constructor (X.s) in it's output then I don't think it will cause much confusion.

Good idea! We now get:

# module X = struct type t = .. type t += A of {x:int} let x : !A = {x=0} type s = A end;;
module X :
  sig type t = .. type t += A of { x : int; } val x : !A type s = A end
# (X.x : !X.A);;
Error: Constructor X.s.A does not have an inline record argument

@vicuna
Copy link
Author

vicuna commented Oct 10, 2014

Comment author: @alainfrisch

Yet another branch... constructors_with_record5. This one drops support for referring to inline record types, and adds some syntactic restrictions to ensure that these types never escape. The only constructions that are allowed on pattern variables r bound to inline records (with a pattern of the form "A r" or "A (... as r)") are:

r.x
r.x <- ...
A r
A {r with ...}

Inline record types are now only used as an internal type-checking device allowing us to reuse all the existing machinery for type-checking and compiling records (including mutation, polymorphic fields, etc), but they are not exposed to users any more. Some work is done so that error messages don't mention them (not sure if this can still happen).

There was some concerns raised amongst developers about the specific naming mechanism, and this latest proposal, inspired by a patch from Jacques Garrigue, aims at minimality to facilitate inclusion, while keeping the key benefits of inlined records. Future versions might re-allow some kind of naming ability for inline records, and drop the syntactic constraints. Or not.

@vicuna
Copy link
Author

vicuna commented Oct 10, 2014

Comment author: ybarnoy

This seems like the way to go for now. How soon can we integrate it?

@vicuna
Copy link
Author

vicuna commented Oct 14, 2014

Comment author: @alainfrisch

The latest branch (*5) has been integrated in the trunk!

@vicuna
Copy link
Author

vicuna commented Oct 18, 2014

Comment author: @bobzhang

thanks for the nice work.

type t = A of {x : int} | B of {x :int};;

type t = A of { x : int; } | B of { x : int; }

let f = function (A x | B x) -> x.x;;

Error: The variable x on the left-hand side of this or-pattern has type
t.A but on the right-hand side it has type t.B

The error message still exposes 't.B' which might not be friendly for new users

@vicuna
Copy link
Author

vicuna commented Oct 19, 2014

Comment author: @garrigue

Actually I would argue that using named types in error messages makes them more understandable, in particular when the error is a conflict.
What about just explaining it in the manual?

@vicuna
Copy link
Author

vicuna commented Oct 20, 2014

Comment author: ybarnoy

I think it's very clear in error messages and should be kept as is. The feature as a whole works very well, from what I've seen so far.

@vicuna
Copy link
Author

vicuna commented Nov 25, 2014

Comment author: @alainfrisch

Let's mark this is resolved! The current solution addresses quite exactly what was in the original description. Further discussion about extending the system to make the inline record types visible in the type system can be done in another ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants