Mantis Bug Tracker

ID: 0005528
Project: OCaml
Category: OCaml general
View Status: public
Date Submitted: 2012-03-08 17:36
Last Update: 2014-06-11 16:44
Reporter: frisch
Assigned To: frisch
Priority: normal
Severity: feature
Reproducibility: have not tried
Status: assigned
Resolution: open
Summary: 0005528: Inline records for constructor arguments

Description: OCaml allows n-ary constructors for sum types. Instead of relying on position, it would be convenient to name the fields. Of course, one can use records, but this requires an extra type declaration and has some runtime overhead.

I've started a new branch (constructors_with_record) in the SVN to allow naming arguments of constructors, with the same syntax as records.

Example:

 type t =
    | A of {msg: string; foo:int}
    | B of string
    | C

 let f = function
    | A {msg; _} | B msg -> msg
    | C -> ""

GADTs and exceptions are supported. It is possible to define mutable fields, but there is currently no way to mutate them. Polymorphic fields are not supported.
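
A minimal sketch of the exception case, assuming a compiler with inline record support (the feature as eventually shipped in OCaml 4.03, not necessarily the state of the branch described here). The exception name `Parse_error` and its fields are hypothetical, chosen only for illustration.

```ocaml
(* Assumption: compiler with inline records (OCaml 4.03 or later). *)
(* Parse_error, line and msg are hypothetical names. *)
exception Parse_error of { line : int; msg : string }

(* Fields are bound by name with punning, exactly as for a regular record. *)
let describe = function
  | Parse_error { line; msg } -> Printf.sprintf "line %d: %s" line msg
  | _ -> "unknown error"

let result =
  try raise (Parse_error { line = 3; msg = "unexpected token" })
  with e -> describe e
```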

Note that this proposal also gives a way to disambiguate record field names:

 type a = A of {foo: string; bar: int}
 type b = B of {foo: int; baz: bool}
 let f (A{foo; _}) = foo
 let g (B{foo; _}) = foo

To support mutation and field overriding, I was thinking of a syntax like:

 type t = A of {mutable l: int} | B of {x:int; y:int}

 match ... with
 | A r -> r.l <- r.l + 10
 | ...

 match ... with
 | B r -> B {r with x = r.y; y = r.x}
 | ...

Binding (directly like above or with an alias pattern) the "record" argument creates a special value (special in the same way that "self" or "ancestor" variables in objects are special) which can only be used in the following context:
   - field projection: r.l
   - field assignment: r.l <- e
   - overriding: B {r with ...}
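
The three permitted uses can be sketched as follows, assuming a compiler with inline record support (OCaml 4.03 or later). The function names `bump` and `reset_x` are illustrative and not part of the proposal.

```ocaml
type t =
  | A of { mutable l : int }
  | B of { x : int; y : int }

(* Field projection (r.l) and field assignment (r.l <- e) through r. *)
let bump = function
  | A r -> r.l <- r.l + 10
  | B _ -> ()

(* Overriding: r is used only as the base of a {r with ...} expression
   under the same constructor. *)
let reset_x = function
  | B r -> B { r with x = 0 }
  | a -> a

(* Any other use of r, such as returning it directly
   (function A r -> r), is expected to be rejected by the type-checker. *)

let a = A { l = 1 }
let () = bump a
let l_after = match a with A r -> r.l | B _ -> 0
let b_after = reset_x (B { x = 1; y = 2 })
```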

Tags: No tags attached.
Attached Files: patch_encoding.diff (27,530 bytes) 2014-04-01 10:33

- Relationships
related to 0006374 (assigned, frisch): A single wildcard for n-ary type constructors
related to 0005525 (resolved, garrigue): Resolving record fields using all specified fields

-  Notes
(0007022)
frisch (developer)
2012-03-08 17:46

Some motivation. I think that most of the time, using "records" instead of "tuples" for constructor arguments is better (and so it's a good idea to make it as easy as possible to do it in the language):

  - Field names document the meaning of each field.

  - Thanks to punning, code is often not longer than with tuples, and this encourages using uniform names to bind fields throughout the code base.

  - Ignoring fields is easier than with tuples (just { ... ; _ }).

  - Extending the constructor with more fields does not break existing patterns of the form { ... ; _ } (or when the warning 9 is disabled).
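
As an illustration of the last point, here is a small sketch assuming a compiler with inline record support (OCaml 4.03 or later). The `event` type is hypothetical: imagine that `Click` first carried only `x` and `y`, and that `button` was added afterwards.

```ocaml
(* Hypothetical type; the button field is assumed to be a later addition. *)
type event =
  | Click of { x : int; y : int; button : int }
  | Key of char

(* Written against the two-field version of Click. The trailing "_"
   means the pattern still compiles after the constructor is extended. *)
let x_of = function
  | Click { x; _ } -> Some x
  | Key _ -> None
```

With a positional constructor `Click of int * int`, adding a third argument would require touching every pattern that mentions `Click`.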
(0007023)
frisch (developer)
2012-03-08 17:48
edited on: 2012-03-08 17:54

(Note: Camlp4 has not been updated; it does not compile. OCamldoc compiles but fails on the new feature.)

(0007025)
gasche (developer)
2012-03-08 19:22

You surely know that Caml Light had mutable sum constructors. Your mutation syntax reminds me of it -- but better behaved, as the syntactic lvalue is still a field projection rather than an arbitrary variable.

  http://caml.inria.fr/pub/docs/manual-caml-light/node4.6.html

This is also reminiscent of SML's anonymous record types (see e.g. http://adam.chlipala.net/mlcomp/ ); those are not unboxed with the constructor (but MLton can optimize it) but allow field disambiguation -- in sometimes treacherous ways.


This extension prompts a question (say, for a teacher): do we have a duplication of concepts here (two slightly different ways to define records, just as tuples and the * used in sum constructors are slightly different), or is there a reasonable explanation of the first-class record concept in terms of this feature? I mean, maybe it's possible to say once and for all that the general data type is a labelled sum of labelled products, and explain current records as sums with only one case, where the constructor is left implicit. Could something like that reliably explain the semantics?


The restrictions on the "special" variables seem a bit icky. From a language point of view, I'd rather have general unboxed datatypes, with a kind system guaranteeing that they can't instantiate polymorphic parameters. Haskell has those and it's nice -- not sure how that works with the GC, however, and I understand changing the type checker is probably not an option. The pragmatism of your solution makes it simpler, and OCaml already has special cases for unboxed float anyway.


You included no example where two constructors of a single sum type would share some field names. Is that possible/reasonable? That looks like a potential use case -- having a second constructor that has the same fields as the first, plus some, as it is more memory-efficient than a record with an option type at the end.


This construction would allow implementing Queue efficiently without magic.
  type 'a cell =
    | Null
    | Cell of { content : 'a; mutable next : 'a cell }
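
A minimal sketch of such a queue, assuming a compiler with inline record support (OCaml 4.03 or later). The `queue` record and the functions `create`, `push` and `pop` are illustrative names, not a copy of the standard library's Queue module.

```ocaml
type 'a cell =
  | Null
  | Cell of { content : 'a; mutable next : 'a cell }

(* Illustrative wrapper holding both ends of the linked list. *)
type 'a queue = { mutable first : 'a cell; mutable last : 'a cell }

let create () = { first = Null; last = Null }

(* Append at the tail by mutating the next field of the last cell. *)
let push x q =
  let cell = Cell { content = x; next = Null } in
  match q.last with
  | Null -> q.first <- cell; q.last <- cell
  | Cell last -> last.next <- cell; q.last <- cell

(* Remove from the head; reset last when the queue becomes empty. *)
let pop q =
  match q.first with
  | Null -> None
  | Cell { content; next } ->
    (match next with Null -> q.last <- Null | Cell _ -> ());
    q.first <- next;
    Some content
```

No `Obj.magic` is needed because the empty link is an ordinary constant constructor and the mutable link lives directly in the `Cell` block.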
(0007026)
jjb (reporter)
2012-03-08 20:20

For what it's worth, I have several times wanted such a feature. I think that the value of flexible extension and consistent naming, in particular, would be significant for larger code bases.
(0007027)
frisch (developer)
2012-03-08 20:43

I've added my extra proposal, allowing field projection and mutation on pseudo record-argument capture variables in patterns. Example:

  type t = A of {x:int; mutable y:int}

  let get_x (A r) = r.x
  let set_y (A r) y = r.y <- y

  let bad (A r) = r (* rejected *)

When used in an or-pattern, these pseudo-variables must have the same kind on both sides; the kind includes the constructor, so the following is rejected:

  type t = A of {x:int} | B of {x:int}
  let get_x = function (A r | B r) -> r.x (* rejected *)
(0007028)
frisch (developer)
2012-03-08 20:52

Gabriel, thanks for your note.

I believe that real records are indeed mostly a special case of this new feature (using a sum type with one constructor), in the same way that n-tuples could be encoded with a sum type with one n-ary constructor. Of course, the static semantics is quite different: with records, the type is inferred from the labels, but with "record constructors", the type is derived from the constructor, and we know directly the type for the fields.

Currently, two features of records are not supported: polymorphic fields, and overriding ({e with x = ...}), although I don't see any reason why they couldn't be.

---

You have a question about two constructors in a given sum type sharing some field names. Yes, this is possible:

 type t = A of {x:int} | B of {foo: int; x:int} | C of {x:string}

let get = function
  | A {x} | B {x} -> x
  | C {x} -> int_of_string x


It is even feasible to make the following work:

 let enrich foo = function
 | A r -> B {r with foo}
 | z -> z

i.e. use a record-argument capture variable as the starting point for overriding, with a different constructor (having more fields). I'm not sure if this is really desirable.
(0007029)
garrigue (manager)
2012-03-09 02:37

I almost proposed that one (including mutable fields) almost ten years ago.
I'm not going to oppose it, but this indeed raises the question of feature overloading.
Inline records have many advantages over normal records, their only limitation being that one can only get a handle through pattern-matching, but this can be a pain for some uses.
So the question for users is naturally, which one to use, knowing that both have advantages and disadvantages.
This is the only reason I finally didn't propose it.
But if there is a good rationale, this might be fine.

Note that the use of special variables is not a big problem in my mind: this is already the case for instance variables in objects.

Also an extra potential feature would be the ability to use type information to extract without pattern-matching.

type t = A of {x:int} | B of {x:int;y:int}

let get_x (r:t) = r.x
(0007039)
frisch (developer)
2012-03-10 12:58

> type t = A of {x:int} | B of {x:int;y:int}
> let get_x (r:t) = r.x

I'm not entirely convinced by this extension. How would the code above behave if some constructor in t doesn't have an "x" field? (Also, we'd need to be careful with type inference and principality here, but you know that better than I do!)

Pattern-matching with punning is not so bad:

let get_x (A{x}|B{x}) = x

(yes it becomes tedious with more constructors)
(0007041)
garrigue (manager)
2012-03-12 00:44

Actually I was not thinking of generating a pattern-matching, but rather to restrict this feature to types for which it can be implemented trivially, as a direct field access without any test. I.e. x would have to be present, with the same type, and at the same position in all cases of the type. This of course includes the trivial case where the type has only one case.
This makes it easier to represent a syntax tree including location information in all nodes for instance.
I think lots of people have wished this behavior for a long time, and combining variants and records makes it easy.
Concerning principality, this is a typical case where -principal can recover it.

But do not take this suggestion as meaning that I support variants of records unconditionally.
I still think that we first need a good rationale. Being useful is not sufficient when there is a large overlap with an already existing feature.
(0007052)
frisch (developer)
2012-03-12 20:45

> This makes it easier to represent a syntax tree including location information in all nodes for instance.

I see, but I'm not sure it's worth introducing an extra feature, whose meaning is not completely obvious. It just avoids the need to define (once for each common field) functions like:

 let get_loc (Var{loc}|App{loc}|Lambda{loc}) = loc

> I still think that we first need a good rationale.

A first reason is that it makes it possible to define sum types with mutable fields, without space overhead. For some low-level data structures (for instance mutable binary trees), it can give some performance boost without requiring Obj.magic.
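
A minimal sketch of such a mutable binary tree, assuming a compiler with inline record support (OCaml 4.03 or later). The type `tree` and the functions `insert` and `to_list` are illustrative names only.

```ocaml
(* Each Node is a single block holding its key and two mutable children. *)
type tree =
  | Leaf
  | Node of { mutable left : tree; key : int; mutable right : tree }

(* Insert k, updating child links in place; returns the root, which is
   new only when the tree was empty. Duplicate keys are ignored. *)
let rec insert k t =
  match t with
  | Leaf -> Node { left = Leaf; key = k; right = Leaf }
  | Node n ->
    if k < n.key then n.left <- insert k n.left
    else if k > n.key then n.right <- insert k n.right;
    t

(* In-order traversal, used here only to observe the result. *)
let rec to_list = function
  | Leaf -> []
  | Node { left; key; right } -> to_list left @ (key :: to_list right)
```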

But the most important reason, in my opinion, is that it encourages naming arguments of constructors instead of relying on position, and it is often a good idea. In particular, it removes the following counter-arguments to naming constructor arguments with explicit record declarations:

 1. Syntactic burden of defining extra records.

 2. "Bad" support for field name overloading.
 
 3. Runtime overhead associated with extra record nesting.
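
Point 3 can be observed with a small sketch, assuming a compiler with inline record support (OCaml 4.03 or later) and the standard block representation. It uses the low-level `Obj` module from the standard library purely to inspect block sizes; the type names `point`, `nested` and `inlined` are hypothetical.

```ocaml
(* Separate record: the constructor block points to a second block. *)
type point = { x : int; y : int }
type nested = Nested of point

(* Inline record: the fields live directly in the constructor block. *)
type inlined = Inlined of { x : int; y : int }

let nested = Obj.repr (Nested { x = 1; y = 2 })
let inlined = Obj.repr (Inlined { x = 1; y = 2 })

(* Number of fields in each block (headers not counted). *)
let nested_outer_size = Obj.size nested
let nested_inner_size = Obj.size (Obj.field nested 0)
let inlined_size = Obj.size inlined
```

Under these assumptions the nested form uses two blocks (one field, then two fields) where the inline form uses a single two-field block, which is the extra allocation and indirection the argument refers to.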


In the past, for instance, I've proposed to replace some n-ary constructors in OCaml sources with records. This would have simplified maintaining on LexiFi side some of our local patches (avoiding the need to add wildcard patterns at many places). This was rejected, based on the fact that it would have increased memory usage and the size of cmi files (argument 3 above).
(0007059)
doligez (administrator)
2012-03-13 17:54

> it makes it possible to define sum types with mutable fields

This is a red flag. Are you sure your implementation is safe with pattern-matching and "when" clauses?
(0007060)
frisch (developer)
2012-03-13 18:29

> Are you sure your implementation is safe with pattern-matching and "when" clauses?

I'm not sure at all, this would need to be checked.

Note that the semantics is weird anyway (and I'm not absolutely convinced it is safe) even with regular records:

type t = A of int | B of string;;
type u = R of t ref;;
let f = function
 | R{contents=A _} -> ""
 | R({contents=B _} as r) when (r.contents <- A 10; r.contents = A 20) -> ""
 | R({contents=B _} as r) -> (match !r with A _ -> "BAD" | _ -> "OK");;

It's really weird that f can return "BAD" (and it does, with argument (R (ref (B""))) ).
(0007087)
lavi (reporter)
2012-03-15 11:02

This is perhaps too early to speak about optimization, but for records with float-only fields, it would be better to de-inline them internally, or even better to treat the first constructor with float-only fields as a float array and de-inline the others.
(0007157)
Bardou (reporter)
2012-03-26 10:41

Hello,

I have been wishing for this for a long time. Mainly because it's just tedious to declare a record for each constructor. Too often do I start with constructors with a low argument count (say 1 to 3) and find that I have to add new arguments. At some point I have so many arguments that I really have to declare the record, and thus rewrite all the relevant parts of the code.

I would add that it would be convenient for me to be able to name only *some* of the arguments. For instance :

type t = Sum of pos: Lexing.position * t * t

I would access the anonymous arguments using pattern-matching as usual, and use ".pos" as a shortcut sometimes.

Unifying records and sums is great, unifying tuples at the same time seems even better to me. The OPA language (of Mlstate) does this, if I'm not mistaken.
(0007199)
bobot (reporter)
2012-03-27 17:08

From a semantics, syntax and typing perspective, can't we consider that:

type t =
   | A of string
   | B of {msg: string; mutable foo:int}
   | C

is exactly the same thing as:

type __t = {msg: string; mutable foo:int}

and t =
   | A of string
   | B of __t
   | C

Of course the "when" clause can lead to strange behaviors, but no different from those we already have with such types t and __t.

Moreover we can treat the record almost like an actual record. "Almost" means that we create it with the tag of B instead of the default one (the first, if I remember well) used for records.

So:
type t =
   | A of string
   | B of ({msg: string; mutable foo:int} as t2)
   | C

is, for semantics, syntax and typing, exactly the same thing as:

type t2 = {msg: string; mutable foo:int}

and t =
   | A of string
   | B of t2
   | C

eg:

match x with
  | A s ->
  | B ({ msg = ""} as r) -> f (r : t2)

For compilation: B r is just the identity function, the matching just doesn't extract. You can't make the difference between the two definitions, except if you use Obj.magic to see that (Obj.magic (B r)) == (Obj.magic r), but Obj.magic is not in the semantic right?

(It's in fact not the same thing for typing in regard of module subtyping, if t is made abstract, t2 must be done private. But that can be quite useful)

My 2 cents,
I'm very happy that someone is trying to implement mutable sum types.
(0009325)
bobot (reporter)
2013-05-23 16:07

With 0005759 (Using well-disciplined type-propagation to disambiguate label and constructor names) the last view (http://caml.inria.fr/mantis/view.php?id=5528#c7199) should be able to work with such a type (http://caml.inria.fr/mantis/view.php?id=5528#c7028):

type t = A of {x:int} | B of {foo: int; x:int} | C of {x:string}

let get = function
  | A {x} | B {x} -> x
  | C {x} -> int_of_string x
(0009327)
gasche (developer)
2013-05-23 16:52

I'll also note that the problem of side-effects on mutable fields in "when" clauses has now been fixed by Luc in http://caml.inria.fr/mantis/view.php?id=5992
(0010450)
hongboz (developer)
2013-10-10 17:01

F# 3.1 has added named tuples
http://blogs.msdn.com/b/fsharpteam/archive/2013/06/27/announcing-a-pre-release-of-f-3-1-and-the-visual-f-tools-in-visual-studio-2013.aspx
(0010453)
bobot (reporter)
2013-10-11 10:37

The implementation done by Alain already works well:
===
type t =
  | A of {msg: string; mutable foo:int}
  | B of string
  | C

let x = A {msg = "toto"; foo = 1}

let foo =
  match x with
  | A r -> r.foo <- 1; r.foo
  | _ -> 1
===

Alain, do you think that allowing the use of {msg = "toto"; foo = 1} as an actual record can be interesting? Will you accept, at least in your branch, a patch that does that?

What else can be done for progressing on this feature?
(0010454)
frisch (developer)
2013-10-11 11:05

> Alain, do you think that allowing the use of {msg = "toto"; foo = 1} as an actual record can be interesting?

Indeed, it could be interesting to consider that:

===
type t =
  | A of string
  | B of {msg: string; mutable foo:int}
  | C
===

also creates a special record type (let's call it t.B). It is special because it must be created with tag 1 instead of 0. One would then accept:

 let f = function B r -> r | _ -> assert false
 val f: t -> t.B

 let g r = B r
 val g: t.B -> t


I don't immediately see anything wrong with this approach. It might actually clean things up a little bit, since the variable corresponding to the record argument in "f" above is given a proper core type. Of course, we introduce some complexity on paths for type constructors.
(0010455)
bobot (reporter)
2013-10-11 11:24

In order to simplify the implementation and reduce the impact on the compiler code, we can avoid creating a new kind of type constructor by requiring that:

- if you want to refer to this type, you must name it:
  | B of {msg: string; mutable foo:int} as tB

- when the type is not named we use the type named "t.B" for pretty-printing

The problem is with the ocamlc -i option, since in that case we need to print a type for e.g. the f and g functions that can be parsed by ocaml.
(0010456)
frisch (developer)
2013-10-11 15:38

I'm wondering if we can reuse the current internal definition of Path.t, interpreting a lowercase component before a dot as a type (and not a module).

So t.B would be represented as Pdot(Pident {name="t";..}, "B", 0) and M.t.B as Pdot(Pdot(Pident {name="M";..}, "t"), "B").
(0010458)
lpw25 (developer)
2013-10-11 17:43

Could we possibly separate this into two separate branches? A simple one adding constructors with named fields (no mutability, no special record values), and a more complex one with the other features under discussion.

This might help get the simple (and very useful) feature merged much more quickly (i.e. before the next release).
(0010460)
frisch (developer)
2013-10-12 08:03

Before committing anything, we should have a clearer picture of where to go. For instance, now that we have type-based disambiguation of names, it seems most benefits would be achievable with an "inline" annotation within the type declaration:

  type t = A of foo inline | B of bar inline

  and foo = {x: int; z: string}
  and bar = {x: int; y: int}

("inline" can be used only in such context: reference to a record type as argument of a constructor defined in the same group).

The effect is to inform the compiler about the tag to use for the inlined records.

It seems the impact on the type-system and on the language syntax is rather limited, while providing most of the benefits and no restriction compared to regular records (polymorphic fields, etc).
(0010462)
bobot (reporter)
2013-10-14 11:27

I agree that we should have a clear view of the whole proposal; after that we can cut it into small pieces for easier integration.

Whichever way we choose, if we require the user to name the type of the record the impact on the type-system is small (only type signature inclusion).

The inline tag changes the syntax in a very small way, which is nice.
- But do we accept to use the record more than once?

type t = A of foo inline | B of bar inline
and internal = Ai of foo inline | Bi of bar inline | Internal_case of ...

and foo = {x: int; z: string}
and bar = {x: int; y: int}

All the uses must be with the same tag.

- Could inline be used again for already defined record?

- What is the rule for the type signature inclusion? foo can't appear in the signature of the module without the type t with the inline tag except if foo is private?

- we can use include instead of inline, in order to avoid the non-technical difficulty of the addition of a keyword.
(0010463)
frisch (developer)
2013-10-14 12:01

> But do we accept to use the record more than once?

I'd say: at most one "inline" reference to a given record. Other (non-inline) references are allowed, of course. I don't like the idea that multiple inline references to a given record type are allowed as long as the tags are identical: this is too fragile and exposes the tag assignment scheme as a constraint in the type-checker. Moreover, I'm not convinced this is actually useful.

> Could inline be used again for already defined record?

No, the "inline" marker assumes the target record type is defined in the same group.

When the definition group is type-checked, the compiler chooses how to represent datatypes: tags for constructors, and, in this proposal, also tags for records. Any client code of the record type has access to this representation choice. This is necessary so that any code creating a record value can pick the correct tag.

Note that the compiler already chooses the runtime representation of a record type when its definition is type-checked (Regular/Float records). The only difference would be that in the "Regular" case, the compiler also remembers the corresponding tag (and from which constructors it comes). And one must also keep the information in the internal constructor description, to adapt code generation.

> What is the rule for the type signature inclusion? foo can't appear in the signature of the module without the type t with the inline tag except if foo is private?

Yes, globally, the sum type and the inlined record type(s) would form a monolithic group, which cannot be split in the signature. This would come for free from the inclusion check, which ensures that the representation of records is equivalent. This is why the following is rejected today:

  module X :
  sig type t type s = {x: t} end =
  struct type t = float type s = {x: t} end

(error message: the first declaration uses unboxed float representation)

One can decide to allow or not the case where the inlined record type is made abstract in the signature.

> we can use include instead of inline, in order to avoid the non-technical difficulty of the addition of a keyword.

Indeed. We could also use an attribute (from the "extension points" work).
(0010464)
bobot (reporter)
2013-10-14 12:25

>> But do we accept to use the record more than once?

> I'd say: at most one "inline" reference to a given record. Other (non-inline)
> references are allowed, of course. I don't like the idea that multiple inline
> references to a given record type are allowed as long as the tags are identical:
> this is too fragile and exposes the tag assignment scheme as a constraint in the
> type-checker. Moreover, I'm not convinced this is actually useful.

I don't want to push it too much, but being able to have an internal view with more constructors than the external view, and a constant-time conversion from one to the other, is very interesting. But I agree that we shouldn't heavily complicate this proposal for this possibility.

So one use of the "inlined" record.

Thank you for the float records example, I forgot about it. But contrary to this example you can read the record even if you don't know the number of the tag.

>We could also use an attribute (from the "extension points" work).
That is a possibility.
(0010465)
gasche (developer)
2013-10-14 15:26

If we have the constraint that a given record may only be used for one constructor, I prefer bobot's "as" syntax:


  type t =
    | A of {x: int; z: string} as foo
    | B of {x: int; y: int} as bar

It is lighter (some people have pointed out that, even if they don't care about the memory representation, they don't use a record per constructor now because it's too painful to write), and it naturally enforces this restriction (but then people are going to ask about how to share sets of fields between those...).
(0010466)
frisch (developer)
2013-10-14 15:31
edited on: 2013-10-14 15:37

Arguments in favor of the "as" syntax:

 - more compact
 - better when the record fits on one line
 - enforces the invariant more naturally
 - makes it easier to adapt code currently using tuples

Arguments in favor of the "inline" syntax:

 - avoids breaking the flow of constructors when the record is long enough
 - makes it easier to adapt code currently using records
 - minimizes the impact on the language grammar and Parsetree
 - benefits from attributes on type definitions (e.g. for documentation)

(0010467)
hcarty (reporter)
2013-10-14 15:58

Would the constraint of one record to one constructor allow for:

  type t =
    | A of { x : int; y : int } as foo
    | B of { x : int; y : int } as bar

?
(0010468)
gasche (developer)
2013-10-14 16:07

hcarty: yes, the important point is that the type names are distinct. Common field names are handled well by type-directed disambiguation.

frisch: also the "as" syntax introduces no new keyword. I don't think you would get away with an extension for this as it introduces a rather deep change to the semantics (eg. in the FFI layer).

If you rework the AST structure, could you make it so that both the whole constructor and also the record itself can be attached attributes?

| K of { ... }[% ...] as foo
  [% ...]
(0010469)
frisch (developer)
2013-10-14 16:18

gasche: as bobot suggested, we could use the existing "include" keyword (probably in prefix position).

> could you make it so that both the whole constructor and also the record itself can be attached attributes?

Of course, one could allow [@@...] attributes on the inlined record definition (before the "as" keyword). It's more an aesthetic question on the complexity of the Parsetree structure and on the resulting user code.
(0010470)
gasche (developer)
2013-10-14 16:26

Remark: the "as" syntax also fully encodes the constraint that one type cannot be exported without the other: removing the "as foo" makes the type declaration syntactically incorrect, and removing the top sum type removes the "as" as well.
(0010471)
frisch (developer)
2013-10-14 16:27
edited on: 2013-10-14 16:27

gasche: this is exactly what I meant by "enforces the invariant more naturally".

(0010473)
lpw25 (developer)
2013-10-14 16:50

> Remark: the "as" syntax also fully encodes the constraint that one type cannot be exported without the other: removing the "as foo" makes the type declaration syntactically incorrect

If we did go this way, I think that not having an `as foo` should be allowed: you probably don't want one most of the time.

Also a definition

    type t = Foo of { x : int } as foo

should be exportable as:

    type t = Foo of { x : int }
    type foo

so that users can hide the relationship if they want.
(0010475)
frisch (developer)
2013-10-14 16:57

> I think that not having an `as foo` should be allowed: you probably don't want one most of the time.

If we do that, "item" attributes become ambiguous:

 type t = Foo of {x : int} [@@doc]

Also, how should the type-checker behave on:

  function Foo r -> r

?
(0010476)
lpw25 (developer)
2013-10-14 17:29

Is there any particular reason to prefer the `as foo` syntax to Alain's earlier suggestion of creating a `t.Foo` type?

It seems preferable to avoid adding syntax (especially binding constructs) where possible.

Note that `t.Foo` was previously suggested as a simple means of distinguishing the `Foo` constructor of `t` from other `Foo` constructors. If that syntax was added then we could use something like:

  #Foo

to represent the type of the argument to `Foo`, and only use:

  #t.Foo

when we need to distinguish one `Foo` from another.
(0010477)
lpw25 (developer)
2013-10-14 17:40

>If we do that, "item" attributes become ambiguous:
>
> type t = Foo of {x : int} [@@doc]

Why is that ambiguous? It should be an attribute on the type t, because that is the only structure item present.

> Also, how should the type-checker behaves on:
>
> function Foo r -> r
>
>?

Similar to the way it does at the moment:

    # type t = Foo of int * float;;
    type t = Foo of int * float
    # function Foo f -> f;;
    Characters 9-14:
      function Foo f -> f;;
               ^^^^^
    Error: The constructor Foo expects 2 argument(s),
           but is applied here to 1 argument(s)
(0010479)
frisch (developer)
2013-10-14 17:53

> Why is that ambiguous?

Because one probably wants to be able to attach "item attributes" on the inlined record definition as well (e.g. some documentation).

> Similar to the way it does at the moment

So having "type t = Foo of {x: int} as foo" allows writing "function (Foo r) -> r", but defining "type t = Foo of {x: int}" would have this function rejected?
(0010480)
lpw25 (developer)
2013-10-14 18:11

> Because one probably wants to be able to attach "item attributes" on the inlined record definition as well (e.g. some documentation).

I would have thought that the documentation would be better attached to the variant constructor as a whole:

    type t = Foo of {x : int} [@doc]

> So having as "type t = Foo of {x: int} as foo" allows to write "function (Foo r) -> r", but defining "type t = Foo of {x: int}" would have this function rejected?

Yes, although I actually prefer some version of your `t.Foo` idea.
(0010483)
gasche (developer)
2013-10-15 10:24

"as foo" is better than t.Foo because it is much easier to explain to beginners. To a solid approximation, "as foo" behaves like syntactic sugar for declaring the record on the side -- the details that are glossed over are representational and non-leaking.

On the other hand, t.Foo enters a whole new world of types that have a weird name and don't look like types. I don't want to be in the room when a beginner will get this in an error message.

PS: Besides, t.Foo was the syntax suggested for disambiguating sum constructors if type-directed-disambiguation was rejected. I'm slightly wary of syntaxes that have been suggested as solutions for so many different problems -- with incompatible semantics. What will you do for the next problem if t.Foo is already taken?
(0010487)
dim (developer)
2013-10-15 11:23

Another problem I see with "t.Foo": how do we specify the type variables of "t.Foo"?

With "as foo" we can imagine writing "as ('a, 'b) foo". It seems even simpler with the inline solution.
(0010488)
bobot (reporter)
2013-10-15 14:08

Thanks dim, it is a nice point against "t.Foo" or more generally for generating automatically a name for the record. So if the name can appear somewhere the user must give explicitly a name for the record. Since the user can make a typing-error in any program, the user must always give a name for the record.

Does everyone agree?
(0010489)
frisch (developer)
2013-10-15 14:24

Leo seems to disagree:
> If we did go this way, I think that not having an `as foo` should be allowed:
> you probably don't want one most of the time.


Personally, I think I'd indeed prefer to require a name for the record type. This would simplify the explanation of the feature.

Concerning the choice of syntax between:

 type t = A of include foo
 and foo = {....}

and:

 type t = A of {....} as foo

It is true that the second form naturally enforces the invariants required to allow inlining the record, but it looks more like a different notion compared to normal records, and it creates a new syntax to bind types (which might make the life a tiny bit harder for external tools). The first form makes it clearer that this is only about a choice of runtime representation, and that type checking is not really affected (as long as the required invariant is satisfied).

*If* the argument in favor of "... as foo" is that it is syntactically lighter, this would apply equally to other cases; one could indeed allow it as syntactic sugar for any nested record/sum type declaration. E.g.

 type t = {x : (A | B) as foo; y : int}

would be equivalent to:

 type t = {x : foo; y : int}
 and foo = A | B

and one would still use a "include" keyword for the case of record inlining within a constructor:

 type t = A of include {....} as foo


But I don't think this is a very good idea. I prefer to remain as close as possible to the current syntax (i.e. without the "as").
(0010490)
gasche (developer)
2013-10-15 14:40

I think the feature would be very useful for expert users that are wary of memory representation issues. The hardest question, I believe, is whether it is worth it in terms of gain versus added complexity/irregularity, and whether language maintainers are convinced. I think both "include" and "as" are reasonable choices, and would be fine with any of them.
(0010491)
hcarty (reporter)
2013-10-15 14:40

What is the benefit of:

  type t = A of include foo
  and foo = { ... }

over

  type t = A of foo
  and foo = { ... }

? The 'include' form has one less level of indirection? If that is the only benefit then from an end-user perspective this seems like a complicated change for little benefit.

The original proposal:

  type t = A of { ... }

was appealing for all of the reasons listed in the first comment (http://caml.inria.fr/mantis/view.php?id=5528#c7022). I don't see any improvements to those conditions in the 'include' case over what is currently possible in 4.01.0 and earlier.

The 'as' proposal keeps more of the benefits listed in c7022 but requiring any explicit naming on embedded records feels less usable than dropping the requirement for an explicit name.
(0010492)
frisch (developer)
2013-10-15 14:55

> The 'include' form has one less level of indirection? If that is the only benefit then from an end-user perspective this seems like a complicated change for little benefit.

The change for the end-user is just adding one keyword, this is not really complicated.

> I don't see any improvements to those conditions in the 'include' case over what is currently possible in 4.01.0 and earlier.

The point is that using records for arguments of constructors has already become a reasonable approach thanks to the introduction of type-based disambiguation of labels. There remain two problems with using records instead of n-ary constructors:

  1. A runtime overhead.

  2. A syntactic overhead on the declaration.


In my opinion, addressing 2 only would not justify an important modification to the language grammar (if it would, the same argument would also call for allowing arbitrary embedded type definitions, as I suggest in my previous note).

Eliminating the runtime overhead might not be so important in most cases, but for the definition of critical data structures it is not negligible. It is not only that it allows using records instead of n-ary constructors; it is also about exposing a possibility of the runtime system that is currently unexploited (namely, tagged unions of blocks with mutable fields).
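
The runtime overhead in point 1 can be observed with the unsafe `Obj` module. This is only an inspection sketch: the names `flat`, `point`, `boxed` and `outer_size` are invented for illustration, and `Obj` is not meant for ordinary code.

```ocaml
(* n-ary constructor: both arguments are stored directly in the
   constructor's block. *)
type flat = Flat of int * string

(* Record argument: the constructor's block holds a single pointer
   to a separately allocated record block. *)
type point = { x : int; y : string }
type boxed = Boxed of point

let flat_v = Flat (1, "s")
let boxed_v = Boxed { x = 1; y = "s" }

(* Number of fields in the outermost block of a value. *)
let outer_size v = Obj.size (Obj.repr v)

let () =
  Printf.printf "Flat:  outer block has %d fields\n" (outer_size flat_v);
  Printf.printf "Boxed: outer block has %d field, inner block has %d fields\n"
    (outer_size boxed_v)
    (Obj.size (Obj.field (Obj.repr boxed_v) 0))
```

The extra block in the `Boxed` case is the allocation and indirection that the proposal aims to remove.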
(0010493)
hcarty (reporter)
2013-10-15 15:59

Thank you for the explanation. As the proposal currently stands, am I correct that aa and bb will have the same run-time representation?

  type a = A of int * string

  type b = A of include foo
  and foo = { x : int; y : string }

  let aa = A (1, "s")
  let bb = A { x = 1; y = "s" }

Can foo come from anywhere? For example, is this valid:

  module Bar = struct type t = { x : int; y : string } end

  type c = A of include Bar.t

If c is a valid type:

If Bar.t is abstract does that make type c invalid? Or would the 'include' be silently ignored/warned against. Does the same answer hold if Bar.t is something other than a record (sum type, tuple, etc)?
(0010494)
frisch (developer)
2013-10-15 16:17

> Can foo come from anywhere?

No, the target of the "include" must be a record type (or maybe an abstract one) defined in the same group. And there must be at most one "include" reference to such a record type. This is really the same constraint as for the "as" syntax. Just a different syntax.
(0010495)
bobot (reporter)
2013-10-15 16:23

@hcarty
>As the proposal currently stands, am I correct that aa and bb will have the
>same run-time representation?
Yes, that's true.

> Can foo come from anywhere?
We currently agree for this restriction as defined by @frisch:
>> the "inline" marker assumes the target record type is defined in the same group and used in an "inline" only once

The reasons are:
 1) same group: during the definition of the type of the record we must know the tag to use
 2) uniqueness: it is dangerous to make the typer dependent on the runtime property that the first non-constant constructors of datatypes have the same tag.
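
The tag issue behind both reasons can be seen by inspecting block tags with the unsafe `Obj` module. A small sketch, with type and function names (`r`, `t`, `tag`) chosen only for illustration: an ordinary record is allocated with tag 0, which coincides with the first non-constant constructor of a sum type but not with later ones.

```ocaml
type r = { a : int; b : int }   (* ordinary record *)

type t =
  | K                           (* constant constructor: immediate value, no block *)
  | A of int * int              (* first non-constant constructor *)
  | B of int * int              (* second non-constant constructor *)

(* Tag of the heap block backing a value; only meaningful for blocks. *)
let tag v = Obj.tag (Obj.repr v)

let () =
  Printf.printf "record tag: %d\n" (tag { a = 1; b = 2 });
  Printf.printf "A tag:      %d\n" (tag (A (1, 2)));
  Printf.printf "B tag:      %d\n" (tag (B (1, 2)));
  Printf.printf "K is a block: %b\n" (Obj.is_block (Obj.repr K))
```

A record inlined under `B` would have to be built with B's tag from the start, which is why the record and the constructor need to be defined together and paired only once.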
(0010496)
hcarty (reporter)
2013-10-15 16:53

Could you use different names for the same type to get around the 'only once' restriction? For a somewhat contrived example:

  type layout_t =
    | C of include c_t
    | Fortran of include fortran_t
  and c_t = { offset : int; index : int }
  and fortran_t = c_t

If the record definition is large then the copy-and-paste repetition could be tedious and error-prone.
(0010497)
bobot (reporter)
2013-10-15 17:05

In my opinion we agree on:
 - all records must be named
 - records are used once, in the same group in which they are defined

and the last choices are:
 - as or inline or include (same as inline but already a keyword):
   * as: a lot nicer to read, clear invariant
   * inline: parser/syntax highlighter doesn't need a lot of modification
   * include: same as inline but already a keyword

  I don't know how we can make a choice among them.

 - signature inclusion:
  type t =
    | A of {x : int; y : int; mutable z : int} as foo
    | ...

  Is this signature accepted?
   type foo = private {x : int; y : int; mutable z : int}

 - as syntactic sugar, can we keep the usual application?
  With the previous type t, [A(1,2,3)] would be desugared, using the field order in
  the definition, into [A{x=1;y=2;z=3}]
(0010498)
bobot (reporter)
2013-10-15 17:13

@hcarty. Do you want values of type c_t and fortran_t to be of the same type? If so, the runtime doesn't allow that. Ask yourself which tag {offset=1;index=2} must have.

In the end, "as" seems a lot nicer for explanations.
(0010499)
Bardou (reporter)
2013-10-15 17:14

FWIW, the main reason the original proposal was appealing to me is that I find it annoying to have to move around to define the records. The "inline" approach is thus much less appealing than the "as" approach.

Also, the restriction about having to inline a record at most once seems much less intuitive to me than the "as" syntax. I was originally going to comment that I did not understand this restriction, until I re-read this whole thread and thought about it a little longer. I can't believe it would be easier to explain to a beginner that a record can only be inlined once, than to explain what "as" does.
(0010500)
dario (reporter)
2013-10-15 20:30

Count me in among the group that prefers the "as" syntax. I could really see myself using it in practice, whereas the more cumbersome alternatives ("include" or "inline") not so much.

About breaking editors and editor support in general (this issue was mentioned higher up in the thread): I really don't think that should be a consideration in this decision. Syntax highlighting rules can always be tweaked to accommodate language evolution. In contrast, once we commit to a certain syntax it must be supported forever. Moreover, minor or temporary breakage in editor support is an annoyance but not a show-stopper.
(0010501)
lpw25 (developer)
2013-10-15 20:35
edited on: 2013-10-15 21:03

I think this proposal has somewhat lost sight of its initial aims, which were for a simple mechanism for giving names to variant arguments.

For this the following syntax is obviously sufficient:

    type t = Foo of { x: int; y: float }

There is no need for `as ..` or `include ..` to achieve this.

Then Alain suggested that mutable variant arguments might be useful, with a syntax like:

    match foo with
      Foo r -> r.x <- 4

Now *if* we want to support this, then the issue becomes what type does `r` have (unless `r` is some special binding like `self` as Alain initially proposed).

The simplest and cleanest solution to this, as I suggested in an earlier message, is to add a syntax like `#Foo` that means the type of the argument to Foo.

Now, since constructors don't have unique names, that would also require adding a disambiguating syntax like M.t.Foo to give a unique name to every constructor. This is actually useful anyway, so I don't think it is a problem.

This `#Foo` construct is simple to explain, and knowledge of it is not required to write the vast majority of code using variants with named arguments. It would also work with traditional variants, allowing their arguments to be handled as a single block.

To answer the earlier question about parameters. I think the simplest system is to have `#Foo` have the same type parameters as `t`.

I also think that both:

    type t = Foo of include s
    and s = { x: int }

and

    type t = Foo of { x : int } as foo

are non-intuitive syntaxes. It will not be at all clear to most programmers the idea behind them, or why `as ...` is for some reason required. I'm not sure the gain of mutable variant arguments is worth it.
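
For readers on later compilers: inline records were eventually released in OCaml 4.03, using the plain `Foo of { ... }` syntax, and the bound argument there is restricted to field access and update much like the special binding described in the issue. A minimal sketch, assuming OCaml 4.03 or later; `set_x` and `get_x` are illustrative names.

```ocaml
(* Assumes OCaml >= 4.03, where inline records are available. *)
type t =
  | Foo of { mutable x : int; y : float }
  | Bar

(* The bound argument r can be projected and mutated in place. *)
let set_x v n =
  match v with
  | Foo r -> r.x <- n
  | Bar -> ()

let get_x = function
  | Foo { x; _ } -> Some x
  | Bar -> None

(* Returning r itself, as in
     let leak = function Foo r -> Some r | Bar -> None
   is rejected by the type checker: the inlined record type
   cannot escape its constructor. *)
```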

(0010502)
Bardou (reporter)
2013-10-16 09:44

I agree with lpw25. I would prefer to enrich the type-system with a way to represent "the record type of a constructor". It seems to me that both "inline" and "as" have been proposed to avoid adding a new syntax to refer to the type of constructors, but both "inline" and "as" are themselves new syntax anyway.

Maybe they are simpler than #Foo or t.Foo or whatever to implement, as t.Foo is not just new syntax, it implies new typing rules, and more than "as" or "inline". But in the long run, what will you wish you had implemented? As Dario says, the syntax will stay forever. Better choose the one which improves the language the most and which can be the basis for more improvements, not the one which is a hack to circumvent the need to implement typing rules.

And in the long run I think that t.Foo is better than "inline" or even "as". It makes it simpler to define constructors (no need to name the record type manually). We will have to write t.Foo sometimes though, mostly in interfaces. So it does not mean we shouldn't pick a good syntax for t.Foo, and think about the implications of adding it to the type system. But I think it mostly implies good things, such as the future possibility to write t.Foo in expressions for disambiguations.
(0010503)
frisch (developer)
2013-10-16 10:18

> I think this proposal has somewhat lost sight of its initial aims, which were for a simple mechanism for giving names to variant arguments.

For me at least, the aim has indeed changed, since type-based disambiguation now makes it already more reasonable to use records to name constructor arguments. Making it easier to define anonymous records for constructor arguments would encourage people to do so. But the same argument would justify allowing any kind of "inline anonymous type definitions". Is there really a reason to allow:

  type t = A of {x: int; y: int} | B

but not:

  type t = {x: (A of int | B); y: int}

or:

  type t = A of (B | C of int) | D

?

After all, if we provide a syntax to refer to sub-parts of a type definition, this would not be too difficult.

I'm not convinced that the syntactic convenience of allowing such inline data type definitions deserves a change to the language and an extension to the type algebra (to name those sub-parts). And if it did, I don't see why we should only support "records inside sums".

The real inconvenience (at least for me) with using records to name constructor arguments is now (after type-based disambiguation) the runtime overhead.

The syntax I proposed ("type t = A of include foo and foo = {....}", or a version based on an attribute such as "type t = A [@inline] of foo and foo = {....}") naturally supports the case where "foo" is turned into either an abstract or a private type in the module interface. More importantly, it would more naturally be extended to other cases where we can support flattening the runtime representation, such as:

  type t = A [@inline] of string | B

  type s = A [@inline] of u | B [@inline] of v
  and u = X | Y of string
  and v = Z | T

(The exact algorithm used to assign tags would need to be defined.)
(0010504)
gasche (developer)
2013-10-16 10:32
edited on: 2013-10-16 10:34

For t.Foo to behave well (in a regular / non-surprising way), it should also work for tuples (unnamed parameters). Besides, as pointed out by Jérémie, there is the question of what the type parameters would be -- probably all those of the type declaration itself, and not only those appearing in the constructor's type.

Do those constructions (t.Foo, as and inline) make sense in presence of GADTs?

  type (_, _) funchain =
    | Id : ('a, 'a) funchain
    | Comp : { head: ('a -> 'b); tail: ('b, 'c) funchain } -> ('a, 'c) funchain
  (* what's the type funchain.Comp ? *)

  type (_, _) funchain =
    | Id : ('a, 'a) funchain
    | Comp : { head: ('a -> 'b); tail: ('b, 'c) funchain } as ('a, 'b, 'c) funcomp -> ('a, 'c) funchain

  type (_, _) funchain =
    | Id : ('a, 'a) funchain
    | Comp : inline ('a, 'b, 'c) funcomp -> ('a, 'c) funchain
  and ('a, 'b, 'c) funcomp = { head: ('a -> 'b); tail: ('b, 'c) funchain }

(Note that in the "as" and "inline" cases I took the easy way out of assuming the existential definition and the type equalities live *outside* the record itself, which is just there for storing constructor arguments. It is not clear to me that t.Foo should behave in the same way; intuitively, I would expect it to tell me all there is to know, type-wise, about the Foo constructor.)
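
The third form can be approximated in OCaml as it stands by dropping the hypothetical `inline` keyword, at the cost of the extra indirection. In this sketch the `apply` function and the `length_of_decimal` value are additions for illustration; the existential `'b` is introduced by the `Comp` constructor, outside the record, and is brought into scope by matching on `Comp`.

```ocaml
type (_, _) funchain =
  | Id : ('a, 'a) funchain
  | Comp : ('a, 'b, 'c) funcomp -> ('a, 'c) funchain
and ('a, 'b, 'c) funcomp = { head : 'a -> 'b; tail : ('b, 'c) funchain }

(* Run a chain of functions from left to right.  The explicit
   polymorphic annotation is needed because the recursive call is
   made at a different type (the existential middle type). *)
let rec apply : type a c. (a, c) funchain -> a -> c =
  fun chain x ->
    match chain with
    | Id -> x
    | Comp { head; tail } -> apply tail (head x)

(* Example chain: int -> string -> int. *)
let length_of_decimal : (int, int) funchain =
  Comp { head = string_of_int;
         tail = Comp { head = String.length; tail = Id } }
```

Here the record itself carries three ordinary type parameters and knows nothing about the existential, which matches the "easy way out" described in the note.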

(0010505)
garrigue (manager)
2013-10-16 11:06

I have not followed this discussion for a long time.
I do agree with Leo: it moved away from the original goal, which was to provide a compact syntax to define more readable types, and also a more efficient representation.
They can already be achieved by the original proposal, where either we do not allow binding the record argument (it would not allow mutable fields, but are they that important in this context?), or make this binding second class. All the rest seems to make it more complex for little benefit.

Note that the new disambiguation of record labels does not change much to this problem: it relies on type inference, and as many have already pointed out, its failures may be hard to explain to beginners. Forcing people to rely on it when defining sums of records seems to be a bad policy. In particular, polluting the name space with new record definitions does not look like a good idea to me.

As I pointed out before, since this feature is not strictly necessary to start with, we need a good rationale.
However I would say that people in this discussion have already made enough arguments that adding it seems worthwhile. We still need to keep it as small and clear as possible.
The only additions I think would be fine are about making it smoother to use, such as allowing to use a tuple instead of a record when constructing/destructing, the way F# does.
(0010506)
Bardou (reporter)
2013-10-16 11:12

In other languages such as OPA, tuples are records with default names for fields. Basically, (int * int) is the same as {int; int} which is the same as {1: int; 2: int}. With this in mind we could imagine naming only a few of the parameters: {f: int; int} would be {f: int; 2: int}.

This is much more than what was initially requested here but it is an answer to what t.Foo is in the case of tuples - just a possibility to keep in mind.

Alain makes a good point about type abstraction. It would be nice if the extension allowed to abstract the arguments of a constructor. It is already possible to abstract each argument, but not the number of arguments.
(0010507)
frisch (developer)
2013-10-16 11:31

> it would not allow mutable fields, but are they that important in this context?

Yes, I'd say this is very important. In particular, it allows to define data structures which could not be defined otherwise without some extra runtime overhead. This would be very useful for cases such as mutable lists/trees, or to use the record to "cache" some data.

> make this binding second class

There are cases where it is really useful to consider the argument as a stand-alone record value, for instance to expose a "builder" function with some magic in it (default arguments, enforcing invariants, etc), or to split a big "map" function over the sum type into smaller functions (on each argument record).

Supporting those cases really means that we consider the argument as a full-fledged record type, which strongly argues, in my opinion, in favor of using the normal syntax for defining record types (i.e. not the "as" syntax). Having two syntaxes to define records seems a little bit weird.

If we decide these cases are not so important in practice, I would indeed prefer using "second-class binding" over extending the type algebra with a new construction only to refer to the argument of a specific constructor.

> the new disambiguation of record labels does not change much to this problem: it relies on type inference

True, but for cases where the constructor is used explicitly around its argument (as an expression or a pattern), this would always work.

> allowing to use a tuple instead of a record when constructing/destructing

I don't like it: if we provide a syntax which looks like records on the type definition, it would add confusion to support expressions/patterns which look like tuples.



My order of preference (preferred one first):

 1. Use the normal syntax for record types and some "inlining" annotation on the constructor.

 2. Use the compact syntax, do not support "as ...", do not extend the type algebra (and thus do not consider labels of arguments as regular labels). (i.e. what is currently implemented)

 3. Use the compact syntax with a mandatory "as ..." annotation, do not extend the type algebra, and consider the labels of arguments as regular labels.

 4. Use the compact syntax, with an optional "as ..." annotation.


Note that 1 could serve as the internal implementation technique for 2 (generating internal names), thus supporting cases where the "argument constructor" is used locally as a standalone record (we would need to consider labels of argument as regular labels, though). There would be no syntax to name this type, in the same way that monomorphic variables or variables introduced for GADT existentials are displayed but cannot be written by the user. It also means we could easily support both 1 and 2.
(0010508)
dim (developer)
2013-10-16 11:51

BTW, if there is nothing special to have the argument as a stand-alone record value, we can still do it with something like this:

  type 'a ty =
    | A : {...} -> [> `A ] ty
    | B : {...} -> [> `B ] ty
    | C : {...} -> [> `C ] ty

  type t = [ `A | `B | `C ] ty
  type a = [ `A ] ty
  type b = [ `B ] ty

  let f (A x : a) = ...

  (* Compiled as the identity *)
  let t_of_a : a -> t = fun (A _ as x) -> (x :> t)
(0010509)
garrigue (manager)
2013-10-16 12:12

> > it would not allow mutable fields, but are they that important in this context?
> Yes, I'd say this is very important. In particular, it allows to define data structures which could not be defined otherwise without some extra runtime overhead. This would be very useful for cases such as mutable lists/trees, or to use the record to "cache" some data.

A reason I don't like it much is that you end up with mutable data structures with arbitrary tags.
Actually I'm convinced that there should have been a specific tag for mutable blocks.
Of course this is too late to change it, so one could argue that this doesn't matter.

> > make this binding second class
> There are cases where it is really useful to consider the argument as a stand-alone record value, for instance to expose a "builder" function with some magic in it (default arguments, enforcing invariants, etc), or to split a big "map" function over the sum type into smaller functions (on each argument record).

For the builder function, I'm not sure I see the point if you only allow inlining in one type.
For the splitting, one can use a GADT to remove the pattern-matching overhead, if this is really a problem.


> > the new disambiguation of record labels does not change much to this problem: it relies on type inference
> True, but for cases where the constructor is used explicitly around its argument (as an expression or a pattern), this would always work.

I was more concerned about polluting the name space of normal records.

> > allowing to use a tuple instead of a record when constructing/destructing
> I don't like it: if we provide a syntax which looks like records on the type definition, it would add confusion to support expressions/patterns which look like tuples.

Well, we already allow omitting labels in function applications :)
Honestly, a lot of languages allow that, so lots of people would like it.
But I agree this should be discussed.

> My order of preference (preferred one first):

I clearly prefer your second best at this point.
If inlining is introduced I believe it should be a much more general mechanism.
In particular allowing only to inline in one context looks like a show-stopper to me.
(0010510)
gasche (developer)
2013-10-16 14:37
edited on: 2013-10-16 14:39

I just had a chat with Arthur Charguéraud, who had (surprise surprise) yet another proposal! (And he somehow tricked me into mentioning it here instead of waiting for him to comment.)

His proposal is general inlining of types in other types, basically Alain's "include" proposal, generalized as Jacques seems interested in in his last message.

  type t = A | B of include r * int
  type r = { mutable a : int; p : include (int * int); }

His idea of memory representation would be that an "include" includes the whole memory representation of the included value, *including* its header word. This allows to refer to the included component without having to repack the value.

(Allegedly there is a runtime tag to make the GC happy in case of full-fledged values included in one another, which is used for the closure representation.)

When I pointed out that having the header tag included would break some optimization opportunities when several all-float records are included in a common record, he suggested that we also have an include@noheader variant that removes the header part, but requires (preferably explicit) repacking to mention the value as a whole.

(When I pointed out that the inclusion discussed here, with the record-with-a-nonzero-constructor trick, is more memory efficient than his include without repacking, he suggested that this one case could be optimized specifically. After all, the FFI will be a pain anyway.)

Jacques, is it something like this that you had in mind?

(When Alain said that the "{ ... } as foo" should really be "include { ... } as foo", maybe he didn't plan for it to become a serious suggestion, but it's starting to look good now.)

(0010511)
frisch (developer)
2013-10-16 14:42

> For the splitting, one can use a GADT to remove the pattern-matching overhead, if this is really a problem.

What do you mean?

What I had in mind:

  type t =
    | A of t_a
    | ...

  and t_a = { (* many fields *) }


let rec map f = function
   | A r -> A (map_a r)
   | ...

and map_a {x; y; ...} =
   (* a long piece of code *)


> I clearly prefer your second best at this point.

Would you consider the variant I suggested: generate internal names for the inner records, so as to allow code as above, even if the user cannot write the corresponding signature?
(0010512)
garrigue (manager)
2013-10-16 17:56

> > For the splitting, one can use a GADT to remove the pattern-matching overhead, if this is really a problem.

> What do you mean?

This is what Jeremie described, and it already works.
You just need a type annotation to make the pattern-matching complete.
Note that it doesn't rely on GADT unification, only on the improved exhaustiveness check, so that using polymorphic variants here works fine.

type _ t =
  | A : int -> [> `A] t
  | B : bool -> [> `B] t

let fA (A x : [`A] t) = A (x + 1)
let fB (B x : [`B] t) = B (not x)

let map_t = function
  | A _ as x -> fA x
  | B _ as x -> fB x
(0010513)
frisch (developer)
2013-10-16 18:16

> This is what Jeremie described, and it already works.

You need to change the type definition and use a combination of advanced techniques (GADT + polymorphic variants). This is nice, but I cannot imagine encouraging this style for beginners (or even large code bases).
(0010514)
garrigue (manager)
2013-10-16 18:57

> You need to change the type definition and use a combination of advanced techniques (GADT + polymorphic variants). This is nice, but I cannot imagine encouraging this style for beginners (or even large code bases).

But you don't have to use it in general: you can already either write your function in a monolithic way, or explicitly pass the arguments, just like everybody was doing until now. The point is only that if the extra flexibility you are talking about really matters, you can do it, and in a uniform way. Anyway, I would not expect beginners to make a distinction between record cases and tuple cases to start with, so it would be strange to them that you can do some things with record cases that you cannot do with tuple cases.

By the way, this pattern of using GADTs for refinement types could have lots of applications: it not only allows to talk about one specific branch, but also about a subset of the cases. So it may well be that you end up using it naturally when your type contains lots of cases. But I'm diverting.
(0010662)
gasche (developer)
2013-11-25 11:37

We discussed it again just now. Xavier's opinion is that there is not enough evidence that the performance improvements would be worth a language change. In particular, allocation of the constructor and records would be combined so the construction part shouldn't be that expensive. The extra access indirection may be expensive, but concrete examples would be important before going further with complex proposals.
(0010663)
frisch (developer)
2013-11-25 22:36

Some micro-benchmarks for the allocation only (Win32 MSVC port):

type t =
  | A of int * int * int * int * int      (* unboxed *)
(* or
  | A of (int * int * int * int * int)    (* boxed *)
*)

let () =
  for i = 1 to 100000 do
    for j = 1 to 10000 do
      ignore (A (i, j, 0, 0, 0))
    done
  done


Boxed version:     6.165s
Unboxed version:   5.135s


Unboxing gives a 15% speedup. With only two fields, I observe a 25% speedup.
(0011101)
yminsky (reporter)
2014-03-26 11:29
edited on: 2014-03-26 11:31

I did a slightly bigger benchmark by going into Core_kernel's Map and Set data structures and adding boxing, in the same way that Alain did above. Note that we add more boxing in the Map case, since we add boxing to the leaf case as well as intermediate nodes.

Here's the benchmark itself:

open Core_bench.Std
open Core.Std

let convert_list n =
  let l = List.init n ~f:Fn.id in
  Bench.Test.create ~name:"convert_list"
    (fun () -> ignore (Int.Set.of_list l))

let convert_map n =
  let l = List.init n ~f:(fun x -> (x,x)) in
  Bench.Test.create ~name:"convert_map"
    (fun () -> ignore (Int.Map.of_alist_exn l))

let sizes = [ 100
            ; 1000
            ; 10000
            ; 100000
            ]

let tests =
  List.map ~f:convert_list sizes @ List.map ~f:convert_map sizes

let () = Command.run (Bench.make_command tests)

And here are the results.

[06:18:51 (set_benchmark) benchmarks]$ ./bench_set.exe  -ascii -clear-columns time cycles alloc
Estimated testing time 1.33333m (8 benchmarks x 10s). Change using -quota SECS.
                                                                                       
  Name              Time/Run      Cycls/Run       mWd/Run      mjWd/Run      Prom/Run  
 -------------- ------------- -------------- ------------- ------------- ------------- 
  convert_list       14.53us        39.23kc        5.05kw         4.57w         4.57w  
  convert_list      212.77us       574.48kc       70.74kw       579.08w       579.08w  
  convert_list    3_266.06us     8_818.32kc      911.54kw    35_073.47w    35_073.47w  
  convert_list   41_167.54us   111_145.96kc   11_113.46kw   401_424.59w   401_424.59w  
  convert_map        16.61us        44.85kc        4.73kw         4.42w         4.42w  
  convert_map       250.51us       676.32kc       67.27kw       609.05w       609.05w  
  convert_map     4_018.38us    10_849.16kc      876.57kw    38_958.09w    38_958.09w  
  convert_map    50_576.73us   136_558.34kc   10_763.47kw   446_488.76w   446_488.76w  
                                                                                       
[06:21:06 (set_benchmark) benchmarks]$ ./bench_set.exe  -ascii -clear-columns time cycles alloc
Estimated testing time 1.33333m (8 benchmarks x 10s). Change using -quota SECS.
                                                                                       
  Name              Time/Run      Cycls/Run       mWd/Run      mjWd/Run      Prom/Run  
 -------------- ------------- -------------- ------------- ------------- ------------- 
  convert_list       14.24us        38.44kc        6.47kw         7.41w         7.41w  
  convert_list      215.57us       582.00kc       91.66kw       938.61w       938.61w  
  convert_list    3_443.43us     9_297.12kc    1_188.72kw    45_585.56w    45_585.56w  
  convert_list   42_880.74us   115_777.48kc   14_551.28kw   505_826.54w   505_826.54w  
  convert_map        16.99us        45.86kc        6.44kw         8.55w         8.55w  
  convert_map       266.23us       718.73kc       91.18kw     1_196.31w     1_196.31w  
  convert_map     4_390.57us    11_854.39kc    1_183.75kw    58_614.39w    58_614.39w  
  convert_map    55_615.78us   150_158.85kc   14_501.37kw   647_744.59w   647_744.59w  


As you can see, the cost gets higher as you try this on bigger maps and sets. It goes up to about 10% in the map case. That said, the amount of promotion goes up by 47%. Promotion has serious non-linear costs, and I think that in a program that churns through a lot of memory, this gets even more important.

If you want a more real world example, perhaps try doing this on every variant in the OCaml AST and see how it affects overall compilation time.

Are these performance results compelling yet? I think we can generate more of them.

I'm actually surprised that the performance implications are controversial. Isn't that why OCaml bothered to mint multi-argument variant constructors in the first place? Otherwise, why not just use tuples? Indeed, most (all?) performance sensitive OCaml code I've seen that uses variants, within INRIA or outside of it, is careful to not create extra indirections when unnecessary.

(0011103)
lpw25 (developer)
2014-03-26 16:08
edited on: 2014-03-26 16:11

There seem to be three aspects to the proposals on this issue:

1. Allowing named variant constructor fields

   The original point was that it is frustrating that giving names to the
   parameters of a constructor has to involve an extra layer of
   indirection. The record syntax is much nicer than the tuple syntax for
   various reasons, and it would seem a good idea to allow it to be used for
   variant constructors.

   The simplest solution to this is to allow definitions like:

     Foo of {x : int; y : int}

   to be used as you would expect.

   This is clearly the simplest solution to this problem and seems pretty
   unobjectionable.

2. Allowing mutation of variant constructor fields

   The second aspect is that it is also frustrating to introduce a layer of
   indirection in order to have mutable fields in a variant constructor.

   To me, it seems that the simplest solution to this is just to bring back
   the support for mutable constructor fields from caml-light
   (http://caml.inria.fr/pub/docs/manual-caml-light/node4.6.html).

   This seems much simpler than the other proposals and directly addresses the
   problem. In particular, it maintains the distinction between a constructor
   with multiple arguments and a constructor with a single tuple/record
   argument. I think beginners have enough trouble with this distinction
   without us adding features which blur the line (like `Foo {x;y} as r`).

3. Supporting functions operating on individual variant constructors

   Alain wants to be able to create functions which operate over individual
   constructors of a variant. I can see that this would be useful but I find
   the proposals to be a bit unintuitive and potentially confusing. GADTs
   already provide a natural generalisation of this, supporting subsets of
   variant constructors.

Thoughts? It would be nice to resolve this issue before 4.02.

(0011111)
ybarnoy (reporter)
2014-03-27 16:37
edited on: 2014-03-27 16:38

Thanks for the (second) summary Leo. I'd like to suggest that part 1 is worth implementing right away. OCaml prides itself on speed and simplicity, and I think this contributes on both fronts, despite the duplication of functionality. I also think most OCaml programmers overwhelmingly want this feature for the reasons that have been mentioned, the main one being that it clarifies code. Keeping track of the meaning of tuples is hard and error-prone, but currently the language steers people in that direction.

After this is integrated, it might be worth looking at cases where

type t = {x:int}
type u = Foo of t

can be optimized to type u = Foo of {x:int} behind the scenes (for example, if we can be sure that type t is never used individually), so that the two representations are really identical.
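
To make the indirection concrete, here is a small sketch that inspects block sizes with the unsafe `Obj` module. It assumes a compiler that accepts inline record arguments (released OCaml versions from 4.03 on do); the type and constructor names are made up for the illustration.

```ocaml
(* Separate record type: the constructor block holds a pointer to the record. *)
type point = { x : int; y : int }
type boxed = Boxed of point | Boxed_none

(* Inline record: the fields live directly in the constructor block. *)
type inlined = Inlined of { x : int; y : int } | Inlined_none

(* Number of fields in the heap block that represents a value. *)
let block_size v = Obj.size (Obj.repr v)

let boxed_size = block_size (Boxed { x = 1; y = 2 })
let inlined_size = block_size (Inlined { x = 1; y = 2 })

let () =
  Printf.printf "boxed: %d field(s), inlined: %d field(s)\n"
    boxed_size inlined_size
```

The boxed form needs one block for the constructor plus one for the record, which is the extra allocation and pointer dereference discussed in this thread.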

Part 2 can be left for later, although it seems to me that mutable constructor fields do make a lot of sense and are a clean solution to the problem. I can understand the desire to confine mutability in the language, but at the same time, confining mutability to records is entirely arbitrary. Not to mention the fact that currently, refs are overused for this purpose, causing a lot of extra indirections and littering programs with confusing syntax for state assignment. Having both <- and := is one of the things that really trips up beginners, and relegating := to a smaller domain is a good thing IMO.

Part 3 doesn't interest me personally, and I think won't be used nearly as widely as part 1 and 2 will.

These are features that the majority of OCaml programmers will be very enthusiastic about (again, IMO). Let's not let them rot away.

(0011113)
frisch (developer)
2014-03-27 22:25

Concerning 2, I'm not a big fan of mutable constructor arguments as in caml-light, for several reasons:

 - the syntax x<-e is already used for something else (object field assignment);

 - the binding site for such a variable does not make it explicit that the variable is mutable (one needs to look at the type definition);

 - would the following be accepted:

     type t = A of mutable int * mutable int
 
     function A (x, 0) | A (0, x) -> x<- 100

   ?

 - a variable bound to a mutable argument needs to keep a reference to the entire constructor block (if the variable is actually assigned to), which might be non-intuitive

 - if a syntax identical to records is supported for constructor arguments, it seems very natural to allow mutable fields in them as well, using the "as" binding syntax to refer to the arguments as a record:

   type t = A of {mutable x: int; y: int}

   function A ({x; y} as r) -> r.x <- r.y

And if this syntax is accepted, I don't think we really need mutable constructor argument (when the "tuple" syntax is used).

Note that we can very well support the syntax above without giving a proper type to 'r' ('r' could only appear in expressions r.x, r.x <- e and maybe A r and A {r with ...}).
(0011115)
lpw25 (developer)
2014-03-28 00:10
edited on: 2014-03-28 00:31

> - the syntax x<-e is already used for something else (object field assignment);

Personally, I don't think there is a problem with conflating object field assignment with record/variant field assignment

> - the binding site for such a variable does not make it explicit that the variable is mutable (one needs to look at the type definition);

Surely this is true of record fields in general.

> - would the following be accepted:
>
> type t = A of mutable int * mutable int
>
> function A (x, 0) | A (0, x) -> x<- 100

It could be, although I think I would prefer it not to be. Does anyone know what the behaviour was in caml-light?

> - if a syntax identical to records is supported for constructor arguments, it seems very natural to allow mutable fields in them as well, using the "as" binding syntax to refer to the arguments as a record:
>
> type t = A of {mutable x: int; y: int}
>
> function A ({x; y} as r) -> r.x <- r.y

I don't particularly like this syntax as it blurs the distinction between constructors with multiple arguments and constructors with a single record argument. To me it seems unsatisfactory to pretend that there is some record `r` when no such record exists.

I'm not sure I see the benefit of this fake record notation, rather than just directly assigning to `x`. If we want to express assignment to variant constructor fields why not represent this literally with a syntax for assigning to variant constructor fields.

(0011116)
lpw25 (developer)
2014-03-28 00:19

I can see one undesirable aspect to caml-light's approach:

    type t = Foo of mutable int

    let v = Foo 4 in
    let f () =
      match v with
        Foo x -> x <- 7
    in
      match v with
        Foo x ->
          print_int x;
          x <- 6;
          print_int x;
          f ();
          print_int x

It is not necessarily clear what the output of this code would be. Does anyone know what this would output with caml-light? Or would it have given some kind of error?
(0011117)
gasche (developer)
2014-03-28 00:27

Under Caml Light 0.76 the output of this code is...
  444
(0011118)
ybarnoy (reporter)
2014-03-28 01:56

OK yeah that's very confusing. I'd say this is a serious negative for mutable tuple-based variants. We need the record projection syntax to separate binding from write-access. The same issue would crop up with record variants:

type t = Foo of {mutable x:int}

let v = Foo {x=4} in
match v with
Foo {x} ->
  x <- 6;
  print_int x
  ...

This suggests that Alain's 'as r' syntax would be good to have here.
(0011119)
gasche (developer)
2014-03-28 06:46

I feel that we are re-hashing things that have already been said. My first post in this thread began with:

> You surely know that Caml Light had mutable sum constructors.
> Your mutation syntax reminds me of it -- but better behaved,
> as the syntactic lvalue is still a field projection
> rather than an arbitrary variable.

There is a reason why it is better to avoid "mutable variable" and instead prefer all mutables to be marked by a record field. It makes design simpler and more consistent.

Consider that

  let v = Foo {x=4} in
  match v with
  Foo {x} ->

is already possible with a record parameter, and that

  Foo {x} -> (x <- 6)

is already rejected today: problem solved!
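
As a sketch of how the earlier `Foo of mutable int` puzzle reads once mutation has to go through a field projection: the following assumes inline record arguments as they exist in OCaml 4.03 and later, and the helper names (`trace`, `result`) are hypothetical.

```ocaml
type t = Foo of { mutable x : int }

let v = Foo { x = 4 }

(* Mutation goes through a field projection on the bound "record". *)
let f () = match v with Foo r -> r.x <- 7

let trace () =
  match v with
  | Foo ({ x } as r) ->
    let before = x in          (* value copied out by the pattern *)
    r.x <- 6;
    let after_local = r.x in   (* re-reads the field *)
    f ();
    let after_f = r.x in       (* sees the update made by [f] *)
    (* [x] is an ordinary immutable binding and never changes. *)
    (before, x, after_local, after_f)

let result = trace ()

let () =
  let a, b, c, d = result in
  Printf.printf "%d %d %d %d\n" a b c d
```

Reads of the pattern variable `x` and reads of `r.x` are syntactically distinct, so there is no ambiguity about which one observes the mutation.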
(0011121)
frisch (developer)
2014-03-28 09:38
edited on: 2014-03-28 10:39

>> - the binding site for such a variable does not make it explicit that the variable is mutable (one needs to look at the type definition);
>
> Surely this is true of record fields in general.

A variable bound to a record field is never an l-value, even if the field is mutable. In "r.x <- e", the binding site for x is the type definition. In "x <- e" (for object field update), the binding site for x is the "val mutable" definition. With mutable constructor arguments, whether a bound variable is an l-value depends on information that is not available at the binding site.

> I'm not sure I see the benefit of this fake record notation, rather than just directly assigning to `x`.

The main benefit, as Gabriel remarks, is that it fits more naturally in the existing language.

The notation also supports:

  A r -> A {r with ...}
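
A minimal sketch of that override form, assuming inline record arguments as in OCaml 4.03 and later; the `shape` type and `move` function are invented for the example.

```ocaml
type shape =
  | Rect of { x : int; y : int; w : int; h : int }
  | Dot

(* Rebuild the constructor from the bound "record", overriding one field. *)
let move dx = function
  | Rect r -> Rect { r with x = r.x + dx }
  | Dot -> Dot

let moved_x =
  match move 3 (Rect { x = 1; y = 2; w = 10; h = 20 }) with
  | Rect { x; _ } -> x
  | Dot -> 0
```

Only one field is overridden here, so the remaining fields are copied from `r` into the new block.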


An argument of authority (which has some value in discussions of language design, considering how subjective they are) is that mutable fields and arguments were part of caml-light and were removed in OCaml; one can safely assume that our beloved historical core developers were rather unhappy with the feature.


> function A (x, 0) | A (0, x) -> x<- 100
> It could be, although I think I would prefer it not to be. Does anyone know what the behaviour was in caml-light?

There was no or-pattern in caml-light.


----

Anyway, I think I now stand behind my original proposal (and what is implemented in the branch): staying as close as possible to existing syntax for record type, allowing to bind a special variable to the "record" argument, but without giving it a type.

(0011122)
bobot (reporter)
2014-03-28 10:31

I agree with Alain Frisch. His proposal is the saner first step. Moreover, his proposal can be extended *later* to a mutable or "typed" proposal.
(0011123)
frisch (developer)
2014-03-28 10:38

Just to be clear: I include in my original proposal the mutable part (which is in the branch).
(0011126)
lpw25 (developer)
2014-03-28 11:26

> Anyway, I think I now stand behind my original proposal (and what is implemented in the branch): staying as close as possible to existing syntax for record type, allowing to bind a special variable to the "record" argument, but without giving it a type.

I still don't like these fake record bindings, they are not exactly a clean design. I suppose it would be too awkward to support:

    let f r =
      match r with
        Foo _ -> r.x <- r.x + 1

in a robust way. So I can't think of any real alternative to using the fake record syntax.

Out of interest, you have referred to using `as` binding for mutable variant fields, does that mean you do not allow:

    type t = Foo of { mutable x: int }

    let f = function Foo r -> r.x <- r.x + 1
(0011127)
lpw25 (developer)
2014-03-28 11:47
edited on: 2014-03-28 11:50

Actually, I can think of one alternative (or at least variation) to the fake record syntax:

   type t = Foo of {mutable x: int}

   let f (x : t) : t =
     match x with
       Foo _ as r -> r.x <- r.x + 1; r

Here `r` is `as` bound to the whole variant (and can be used just as the variant), but is also allowed to be used with the labels that `Foo` has.

This is a bit like:

  let f (x : [`Foo of 'a | `Bar of 'a]) : [`Foo of 'a] =
    match x with
      `Foo _ as y -> y
    | `Bar x -> `Foo x

where the `as` bound `y` variable is given a smaller type ([`Foo of 'a]) than the original scrutinee `x` ([`Foo of 'a | `Bar of 'a]).

I think this is an improvement on the fake record syntax as it does not conflate multiple argument constructors with single argument constructors.

(0011128)
frisch (developer)
2014-03-28 13:10

> does that mean you do not allow:
> type t = Foo of { mutable x: int }
> let f = function Foo r -> r.x <- r.x + 1

No, of course, this would be allowed as well.

> Foo _ as r -> r.x <- r.x + 1; r

It is true that internally, the pseudo variable bound to the "record argument" actually points to the whole variant (of course), but I believe it would be less confusing for the user if we distinguish the variant from its argument (and more coherent with the idea of making things look as if it was really a record argument, with some restrictions).

In my proposal, the above would read:

  Foo r -> r.x <- r.x + 1; Foo r

(where "Foo r" is the identity at runtime -- no allocation).
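
A sketch of this form, assuming inline record arguments as in OCaml 4.03 and later. The names `bump`, `x_after` and `same_block` are hypothetical, and the physical-equality check only observes what a given compiler does; it is not a language guarantee.

```ocaml
type t = Foo of { mutable x : int } | Bar

let bump = function
  | Foo r as v ->
    r.x <- r.x + 1;
    let rebuilt = Foo r in
    (* Physical equality: true if [Foo r] reuses the matched block. *)
    (rebuilt, rebuilt == v)
  | Bar -> (Bar, true)

let x_after, same_block =
  match bump (Foo { x = 1 }) with
  | Foo { x }, same -> (x, same)
  | Bar, same -> (0, same)

let () = Printf.printf "x = %d, same block: %b\n" x_after same_block
```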
(0011129)
lpw25 (developer)
2014-03-28 14:02
edited on: 2014-03-28 14:09

> and more coherent with the idea of making things look as if it was really a record argument, with some restrictions

I think that is the bit that I am trying to avoid, I would prefer it not to look like it was a record argument. I do think that it is potentially confusing for beginners, who are already confused by the distinction between:

    Foo of int * int

and

    Foo of (int * int)

To be honest, either:

    Foo r -> r.x <- r.x + 1; Foo r

or

    Foo _ as r -> r.x <- r.x + 1; r

is probably fine, although I have a strong preference for the second one.
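
For readers unfamiliar with the `Foo of int * int` versus `Foo of (int * int)` distinction mentioned above, here is a small sketch in plain OCaml. The type names are made up, and the layout check relies on the unsafe `Obj` module.

```ocaml
type two_args = Two of int * int     (* constructor with two arguments *)
type one_pair = One of (int * int)   (* constructor with one tuple argument *)

let pair = (1, 2)

(* A tuple value can be passed to [One] directly... *)
let one = One pair

(* ...but [Two pair] is rejected by the type-checker; the components
   must be spelled out at the construction site. *)
let two = Two (fst pair, snd pair)

(* Only [One] lets a pattern bind the pair as a single value. *)
let sum_one (One p) = fst p + snd p
let sum_two (Two (a, b)) = a + b

(* The runtime layouts differ as well: number of fields in each block. *)
let fields_two = Obj.size (Obj.repr two)
let fields_one = Obj.size (Obj.repr one)
```

`Two` stores both integers in its own block, while `One` stores a single pointer to a separate tuple block.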

(0011130)
ybarnoy (reporter)
2014-03-28 14:04

OK, I think Alain's reasoning makes sense. Alain -- does your branch also contain part 3 of the proposal as outlined by Leo, or is that part just speculative?

Also, I'm going to ask for the 'unspeakable' here. As far as I can tell, the current best practice (and by best I mean easiest) for getting people to test out a branch is to create a PR on github, at which point it's available directly from opam. Would it be possible to create a PR on github with the contents of this branch? That would allow people to play around with the new features more easily, which would allow for maximum feedback.
(0011131)
gasche (developer)
2014-03-28 14:11
edited on: 2014-03-28 14:13

It would be a bad idea to make a PR just for the purpose of having an opam branch. Anil's "switches out of PRs" is meant to streamline the process of patch review, not to re-purpose github PRs into "upload me to get a switch".

Given that the branch is on the official SVN repository, it is already mirrored by the github mirror:
  https://github.com/ocaml/ocaml/tree/constructors_with_record

I don't know the details of how to easily build a switch from a branch; I suppose it involves creating a private repository overlay with a new switch pointing to the git address. The process is possibly less streamlined than building private OPAM packages (rather than switches) out of git, as fewer people have had a use for it. Do try that, and contribute feedback to the OPAM developers on how to make it easier if necessary. Don't use pull requests for that.

PS: the reason I don't know about this is that I use a different approach to get experimental switches, which is my own script https://github.com/gasche/opam-compiler-conf to build switches out of *local* git branches. You're welcome to use that as well and to improve it if you see an opportunity.

(0011132)
frisch (developer)
2014-03-28 14:33

@ybarnoy: no, part 3 is not included in the branch (the pseudo variable bound to a record-argument can only be used in a limited number of expression kinds).

Note that the branch is more than two years old. The AST has changed a lot since then, and it will take some effort to update it. I'll wait for some kind of consensus to emerge on the feature set before working on the code again.
(0011133)
ybarnoy (reporter)
2014-03-28 14:49

Part 1 (variants with records) looks good to me, and part 2 (variants with records + mutability, mutated using either 'as' or pseudo-record binding) does too after having realized what some of the issues were with allowing all variants to be mutable.

Leo seems to be pushing for the 'as' binding to bind the whole variant type and project out of that (for mutation), while the original proposal uses pseudo-record binding and projection from that. Personally, it doesn't matter to me either way, but avoiding the 'as' seems a little cleaner. Either way, this is not a strong objection to the implementation or the idea.

Would anybody else like to voice their objection? If not, I think it's worthwhile updating this branch and getting it into people's hands to play around with.
(0011139)
frisch (developer)
2014-03-31 14:58
edited on: 2014-03-31 14:59

I've created a new branch (constructors_with_record2) in the SVN. In the first commit there (14505), I adapted the parser and "AST" (Parsetree, Typedtree, Types) and basically stopped at the type-checker. In the second commit (14508), I tried a different approach: encoding record arguments as extra pseudo record type declarations, with some special internal markers to specify the special representation. If we follow this approach, most of the first commit (except in the parser and Parsetree) can be undone. Concretely, this rewrites:

  type t = A of {x : int}

into:

  type t = A of t#A
   and t#A = {x : int}

with some markers so that values of type t#A are created with the proper tag (2), and the A constructor behaves as the identity.

This greatly simplifies the type-checker, as we can readily re-use all the machinery for records (including polymorphic fields, mutable fields, etc). This is made possible by the recent type-based disambiguation for record labels. Compared to the recent discussion, the "downside" of this approach is that it allows writing functions such as:

  function A r -> r

which receives a type (t -> t#A) that cannot be written by the programmer.

Considering the gain in simplicity (and some useful cases allowed by this more liberal approach), I'm wondering what other people think about it. Currently, the pretty-printing of type declarations (in the toplevel, with "ocamlc -i", in error messages) is a little bit rough (it should not mention those t#A names), but this is purely cosmetic and should be easily fixable.
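
The shape of this rewriting can be approximated by hand in plain OCaml with two mutually recursive declarations. This is only a sketch: `t_A` is a hypothetical stand-in for the internal name `t#A`, and unlike the branch, the hand-written version allocates a separate record block and `A` is not the identity.

```ocaml
(* Hand-written analogue of the encoding: the record gets its own type. *)
type t = A of t_A
and t_A = { x : int }

(* With an ordinary record type the argument can escape its pattern,
   and the function has a type the programmer can write down. *)
let payload : t -> t_A = function A r -> r

let extracted_x = (payload (A { x = 3 })).x
```

In the branch the corresponding function gets the type `t -> t#A`, which is the type that cannot be written in the surface syntax.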

(0011140)
gasche (developer)
2014-03-31 15:09

The immediate temptation would be to make this accessible in the surface language (let users write t#A (or any better syntax)).

In this case, it is important to give this type a nominal identity, by which I mean that with (type t = A of int * int), t#A must *not* be compatible with the structural tuple (int * int), as that would break the runtime invariant that those elements have tag 0.

Note that this is strongly similar to bobot's "as" proposal, except that the name is generated instead of being chosen by the user.
(0011141)
frisch (developer)
2014-03-31 15:36

One point to be careful is that, because of GADTs, t#A can have more type parameters than t. For instance:

 type t = A: {x : 'a; f : 'a -> unit} -> t

is mapped to:

 type t = A: 'a t#A -> t
  and 'a t#A = {x : 'a; f : 'a -> unit}

Currently, the parameters for t#A are all free variables in the record definition (i.e. existentials are added, and unused parameters for t are removed), sorted alphabetically. *If* we expose t#A to the user, one should probably keep unused parameters (with their original ordering) and maybe find a syntax to materialize the ordering between existential variables.
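
A small usage sketch of such a GADT constructor with a record argument, assuming inline records as in OCaml 4.03 and later; `run`, `log` and `items` are names invented for the example.

```ocaml
type t = A : { x : 'a; f : 'a -> unit } -> t

(* The existential ['a] is only known inside the match, which is enough
   to apply [f] to [x]. *)
let run = function A { x; f } -> f x

(* Most recent entry first. *)
let log : string list ref = ref []

let items =
  [ A { x = 1; f = (fun n -> log := string_of_int n :: !log) };
    A { x = "s"; f = (fun s -> log := s :: !log) } ]

let () = List.iter run items
```

The list holds values whose `x` fields have different types, which is why the synthesized record type needs a parameter that `t` itself does not have.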
(0011142)
ybarnoy (reporter)
2014-03-31 17:03

What does exposing the t#A type to the user give us aside from added complexity? (And overloading of the # symbol?) I see this as a potential source of many future headaches and delays in implementation, as opposed to the simplicity desired by the original proposal.

I would personally rather have the r in function A r -> r be a clear pseudo-record, meaning that it cannot be returned, but only projected from or used to reassemble A.

Actually, this makes me lean much more strongly in the direction of requiring
function A _ as r -> r, where the 'as' binds to the whole variant, and where trying to substitute anything other than a proper record construction for the _ will not be a legal binding.
(0011143)
frisch (developer)
2014-03-31 17:22

I'm *not* proposing to let the users refer to t#A (it can only be generated by type-inference).

It just turns out that the implementation strategy I'm playing with is simple and non-invasive in the compiler, while allowing some more useful cases and making the use of the variable bound to a record argument less context dependent (for instance, a syntactic restriction on where such a variable would occur would probably reject "function A r -> A (print_endline "X"; r)" which could very well be produced by a generic code instrumentation tool).

(Of course, I understand that the simplicity of implementation is not a very strong argument for language design.)
(0011144)
ybarnoy (reporter)
2014-03-31 17:35

I'm completely ok with this being an implementation strategy. My concern is just to keep it simple on the user's end. We're already adding a small layer of complexity, and I think it's being done for good reasons -- let's not make things more complex than they need to be from the user's perspective. Any further features, such as exposing t#A externally for some other purpose, should probably be discussed under another feature request, while not derailing this one. Syntactically, the scope of this particular feature almost certainly requires 'function A r -> r' to be illegal (IMO).
(0011145)
lpw25 (developer)
2014-03-31 21:15
edited on: 2014-03-31 21:16

> I'm *not* proposing to let the users refer to t#A (it can only be generated by type-inference).

You should not be able to infer a type that cannot be expressed as a type expression. I do not really like this proposal (despite the fact I proposed something similar early in this discussion).

I am also not convinced that it will simplify implementation, `t#A` should never escape the scope in which the variant is matched, and ensuring it doesn't seems like a bit of work. Currently, you are just allowing `t#A` to escape, which means that types are inferred which cannot be written as type expressions.

It seems just as simple to annotate variables in the environment with a list of additional labels and then allow those labels on uses of that variable. Which is all that is required to implement:

  fun Foo _ as r -> r.x <- r.x + 2; r

(0011146)
bobot (reporter)
2014-03-31 22:50

lpw25, your proposal does not seem future-proof. In this discussion, all the *hypothetical* ways of naming the type of the inlined record have in common that in `r.x`, r is the record.

Moreover the type system of Alain's second branch is very conservative, nothing new except an extension of the rule for algebraic datatypes definition. In yours additionally, the typing environment is extended, a rule is added for `r.x` and for Foo {f1=...; ...}.
(0011148)
lpw25 (developer)
2014-04-01 09:58

> lpw25, your proposal does not seem future-proof. In this discussion, all the *hypothetical* ways of naming the type of the inlined record have in common that in `r.x`, r is the record.

Only if you think it is desirable to provide specific support for giving individual constructor arguments their own types. But given that GADTs already provide a more general mechanism than this, I don't think that is desirable.

We appear to be going around in circles here, I thought there was general agreement that aspect 3 (as numbered in an earlier message) was not worth pursuing further.

> Moreover the type system of Alain's second branch is very conservative, nothing new except an extension of the rule for algebraic datatypes definition. In yours additionally, the typing environment is extended, a rule is added for `r.x` and for Foo {f1=...; ...}.

Extending the typing environment is no more complicated than extending the translation of algebraic datatype definitions.

Alain's current proposal allows the type checker to infer a type that cannot be written as a type expression, which I find very distasteful. Fixing it to not do that would either require an extension to the type algebra (which seems an unnecessary complexity for a questionable feature) or some specific handling of `Foo {...}` patterns to ensure that `t#Foo` types do not escape. So it is probably no simpler or more conservative.
(0011150)
lpw25 (developer)
2014-04-01 10:24

One issue with Alain's proposal is that it prevents code like the following:

    module type S = sig exception E of { x : int } end

    let m = (module struct exception E of { x : int } end : S)

    module F (X : S) = struct
      module M = (val m : S)
      ...
    end

because the definition of `E` would include a type definition `exn#E`, so `S` could not be unpacked in a functor definition. This seems undesirable.

Code like this would become more likely with the addition of "open types".
(0011152)
frisch (developer)
2014-04-01 11:13

I've attached the current diff. Apart from the parsing and parsetree changes, the encoding logic is contained in typedecl.ml, plus very light internal support for "records with a non-zero tag" and "constructors which behave as the identity". In particular, there is no change in typecore or includecore.

If we want the full power of records (including polymorphic fields, mutable fields, record override), I doubt we can achieve something even remotely as simple as that with a more direct treatment that implements exactly the latest proposal. Actually, all the proposals supporting mutation (but not treating record arguments as first-class values) require introducing a new kind of special variable in the environment, with some syntactic restriction on where they can appear.

Leo: can you clarify whether your main objection is that this approach (i) allows expressions to be inferred with a type that cannot be written by the user or (ii) allows to manipulate the record arguments as first-class values that can escape the scope of their binding pattern?

For (i), a possible fix is to let the users refer to those t#A types. Since I don't find it particularly useful and I'm not overly shocked by the lack of syntax to refer to them, I don't push for it, but at least as long as there are no existential type variables, this would be very easy to add.

For (ii), there are cases when letting the record arguments escape the scope of their binding pattern can be useful for code sharing. I would not add extra complexity to support those cases, but I'm not very enthusiastic at the idea of adding complexity just to get rid of them.


---

Concerning exceptions: in the current branch, record arguments are not accepted for exceptions, as they would require a slightly different compilation scheme. It's indeed weird that an exception declaration could create a type declaration, but I'm not sure how bad it would be in practice (your example relies on a corner case, allowing unpacking within a functor body as long as the unpacked module does not create any type -- the main use of first-class modules is precisely to encapsulate types).
(0011154)
lpw25 (developer)
2014-04-01 13:38
edited on: 2014-04-01 13:41

> Leo: can you clarify whether your main objection is that this approach (i) allows expressions to be inferred with a type that cannot be written by the user or (ii) allows to manipulate the record arguments as first-class values that can escape the scope of their binding pattern?

It is i) that I object to. You really should not infer a type that cannot be written as a type expression. It is simply poor design: you should not be able to write functions that you cannot then provide a signature for. ii) is simply a special case of i), since if an individual constructor can escape its scope then it can appear in a function's type.

> For (i), a possible fix is to let the users refer to those t#A types. Since I don't find it particularly useful and I'm not overly shocked by the lack of syntax to refer to them, I don't push for it, but at least as long as there are no existential type variables, this would be very easy to add.

This means adding a new kind of type expression, that is only used to allow an individual constructor to be referred to. I really do not see the benefit of adding more complexity to the language in order to support a use case that is already supported by GADTs. It is also still unclear how to handle existentials properly.

> Concerning exceptions: in the current branch, record arguments are not accepted for exceptions, as they would require a slightly different compilation scheme. It's indeed weird that an exception declaration could create a type declaration, but I'm not sure how bad it would be in practice (your example relies on a corner case, allowing unpacking within a functor body as long as the unpacked module does not create any type -- the main use of first-class modules is precisely to encapsulate types).

I haven't explored the issue properly, but I also suspect there will be problems with `(module type of ..)` and `S with ...` expressions. It would certainly require some additional complexity. In general, it seems a bad idea to add a hidden type definition to something that is not a type definition.

Even without exceptions and open types `S with ...` has to be careful about how it handles these hidden types.

(0011155)
frisch (developer)
2014-04-01 14:26

Recent changes on the branch:

  - support for record arguments on exceptions (commit 14515)

  - switch to the syntax t.A for synthesized record declarations (instead of t#A), and allow the user to refer to such types (commit 14516)

  - in order to make type parameters for synthesized record declarations predictable: always keep the original type parameters (with ordering) and append required existential variables, sorted alphabetically (commit 14518)

Note that t.A is still seen as a type constructor (it is parsed as three tokens, but kept as a single string in the Parsetree), there is no new construction in the type algebra.
(0011156)
frisch (developer)
2014-04-01 14:29

> In general, it seems a bad idea to add a hidden type definition to something that is not a type definition.

It is not something new: it's already the case for class type declarations, and even class declarations (which also define a value component). (That said, I agree that it is more difficult to justify it for exception declarations.)
(0011157)
lpw25 (developer)
2014-04-01 14:39

> It is not something new: it's already the case for class type declarations, and even class declarations

In those cases the type isn't hidden, it is an ordinary type with an ordinary type name.
(0011158)
frisch (developer)
2014-04-01 17:59
edited on: 2014-04-01 18:00

> In those cases the type isn't hidden, it is an ordinary type with an ordinary type name.

A class declaration for 'c' also generates a type whose internal name is '#c' which is not an ordinary type name.

Anyway, in the branch, it is now possible to refer to synthesized record declarations (t.A). Does it address your concern about inferred types not being expressible in the syntax?

(0011159)
frisch (developer)
2014-04-01 18:19

Ok, at this point, let me just recognize that I'm leaning towards bobot's original proposal. I've just added (commit 14519) the remaining piece: the ability to give an explicit name (and list of parameters) to the synthesized record.

So one can write:

  type t =
     | A of { x : 'a; y : 'b } as ('a, 'b) ta
     | B of { z : int } as tb
     | C : { c : 'a } as 'a tc -> t

The 'as' clause is of course optional.
(0011160)
lpw25 (developer)
2014-04-01 18:29
edited on: 2014-04-01 18:31

> Does it address your concern about inferred types not being expressible in the syntax?

It probably does. Now the question remains, is it worth the additional complexity (and potential confusion) of defining a type for every constructor when GADTs already support the use case which you are targeting?

Also, what does your branch do for:

    module type S = sig type t = T of {x : int} end

    module M : S = struct type t = T of {x : int} end

    module N : S with type t = M.t = M

do you instead need to write:

    module N : S with type t.T = M.t.T and type t = M.t = M

If you include the complexity to get things like this right, is this suggested implementation really simpler than treating `Foo r` or `Foo _ as r` as special bindings?

(0011161)
ybarnoy (reporter)
2014-04-01 18:37

I really don't see the need for this last step, Alain. If we're going to have t.A as a type, which I'm not too crazy about to begin with, I don't see the need to have an 'as' syntax inside the type definition. At this point, you can probably define

type t2 = t.A

right? So why do we need this extra 'as' syntax making it even harder to keep track of type names? While it seems like an innocuous feature, it increases the cognitive burden of reading the language.

The more features are tacked onto this branch, the less of a chance it has of being accepted by everyone IMO, and the greater the chance someone will think of a use-case that's broken by the new features, which will require even more engineering and time to resolve.

I've submitted a PR to opam-repositories so hopefully this branch will be available from opam and we'll get more eyeballs on it.
(0011162)
lpw25 (developer)
2014-04-01 18:37

> The 'as' clause is of course optional.

Out of interest, if the 'as' clause isn't present, is a hidden type still created?
(0011163)
frisch (developer)
2014-04-01 19:02
edited on: 2014-04-01 19:08

> do you instead need to write:
> module N : S with type t.T = M.t.T and type t = M.t = M

Yes, this is currently the case (well, it's rejected by the parser, but it'd be nothing to have it accepted).

It doesn't seem to add a lot of complexity to automatically add constraints on "inner record types". I have to admit that I never use with-constraints to add a type equality on a concrete type. Since you seem more familiar with this feature, do you know if the following is normal?

 module type S = sig type t = A end;;
 module M : S = struct type t = A end;;
 module N1 : S with type t = M.t = M;; (* ok *)
 type s = M.t;;
 module N2 : S with type t = s = M;; (* rejected *)

(This is because Typedecl.check_coherence does not try to expand type aliases.)

Another limitation of this feature seems to be that you cannot use it with mutually recursive type declarations:

 module type S = sig type t = A of s and s = B of t end;;
 module M : S = struct type t = A of s and s = B of t end;;
 module N : S with type t = M.t and type s = M.s = M;; (* rejected *)
 module N : S with type s = M.s and type t = M.t = M;; (* rejected *)

(0011164)
frisch (developer)
2014-04-01 19:11

> I really don't see the need for this last step, Alain. If we're going to have t.A as a type, which I'm not too crazy about to begin with, I don't see the need to have an 'as' syntax inside the type definition.

This is mostly to allow specifying an explicit list of type parameters, instead of relying on a necessary ad hoc choice in case of existential variables.
(0011165)
frisch (developer)
2014-04-01 19:13

> I've submitted a PR to opam-repositories so hopefully this branch will be available from opam and we'll get more eyeballs on it.

Thanks! Is OPAM able to track SVN branches now?
(0011166)
frisch (developer)
2014-04-01 19:15

> Out of interest, if the 'as' clause isn't present is a hidden type still created.

Yes, except that it is not really hidden: it can be accessed with the t.A syntax and its type parameters are those of the original sum type, to which existential variables are added, in alphabetic ordering.
(0011167)
ybarnoy (reporter)
2014-04-01 20:40

> Thanks! Is OPAM able to track SVN branches now?

Nope. Opam can track pull requests on the github mirror automatically, or github branches if they're added individually. The reason they have to be added individually is probably because there are just too many of them to add them automatically.
(0011168)
lpw25 (developer)
2014-04-01 20:53
edited on: 2014-04-01 20:54

> do you know if the following is normal?
>
> module type S = sig type t = A end;;
> module M : S = struct type t = A end;;
> module N1 : S with type t = M.t = M;; (* ok *)
> type s = M.t;;
> module N2 : S with type t = s = M;; (* rejected *)
>
>(This is because Typedecl.check_coherence does not try to expand type aliases.)

This is normal, it follows the same policy (from Typedecl.check_coherence) as:

    type s = t = T

which in turn follows the same policy as:

    module M : sig
      type s = T
    end = struct
       type t = T
       type s = t
    end

I guess they could probably all be changed to expand aliases, but there might be some corner cases which require care.

(0011170)
garrigue (manager)
2014-04-02 04:45

> I guess they could probably all be changed to expand aliases, but there might be some corner cases which require care.

While this would be possible, there is a semantical problem: the expression "type s = t = T" is a re-export, and it assumes that t is a concrete datatype definition. You cannot re-export something that is not there in the first place.
Of course your example should work with
  type s = M.t = A
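
A minimal sketch of this re-export form, with hypothetical module and type names. The manifest type must itself be a concrete datatype, and its constructors are repeated verbatim:

```ocaml
module M = struct
  type t = A | B of int
end

(* Re-export: s is equal to M.t and exposes the same constructors. *)
type s = M.t = A | B of int

(* Values flow between the two names without any conversion. *)
let to_m (x : s) : M.t = x

let describe : s -> string = function
  | A -> "A"
  | B n -> "B " ^ string_of_int n
```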
(0011176)
frisch (developer)
2014-04-02 17:53
edited on: 2014-04-02 17:54

Thanks! What about the other point ("Another limitation of this feature seems to be that you cannot use it with mutually recursive type declarations")?

I'm concerned this will indeed make it difficult to fully support that feature with the current encoding approach ("type t = A of {x:t} | B" creates 2 mutually recursive types). I don't see it as a big problem, though.

(0011177)
ybarnoy (reporter)
2014-04-02 18:05

While I was very enthusiastic about this feature when it involved simply adding records to variants, I have to admit that I'm ambivalent about what I see as 'polluting' the type namespace with t.A, t.B etc.

Other than the fact that it's easier to implement things this way and that it's novel, can anyone sell me on the applicability of these type changes from a practical perspective?
(0011178)
frisch (developer)
2014-04-02 18:25

Well, while certainly not the most important part of the general proposal, being able to refer to the "record argument" as a first-class value can be useful at times. It could let you factorize code a little bit, as in:

 type t = A of {...} | B of {...}

 let map_a (x : t.A) = ...
 let map_b (x : t.B) = ...

 let map = function
    | A r -> A (map_a r)
    | B r -> B (map_b r)


I would not push for allowing such things if it complicated the proposal, but I really have the feeling that it simplifies it, both in terms of implementation and of regularity of the system (otherwise we need to specify a new kind of variable, which can only be used in restricted contexts).
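
For comparison, here is a sketch of the same factoring written against separately declared record types, which is the encoding that t.A types would make unnecessary. The record names `ta` and `tb` and their fields are placeholders, since the example above elides them:

```ocaml
(* Hypothetical stand-ins for the elided record contents. *)
type ta = { count : int }
type tb = { flag : bool }

type t = A of ta | B of tb

(* Helpers that operate on the record arguments as first-class values. *)
let map_a (x : ta) = { count = x.count + 1 }
let map_b (x : tb) = { flag = not x.flag }

let map = function
  | A r -> A (map_a r)
  | B r -> B (map_b r)
```

This version compiles today, at the cost of one extra block per constructor application and two extra type declarations.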
(0011179)
gasche (developer)
2014-04-02 18:37
edited on: 2014-04-02 18:38

These types, or bobot's "as" proposal, try to avoid two defects in language design:

1) (if there are no t.A at all) having expressions that do not have a type ("r" on the right-hand-side of the arrow in "fun (A r) -> r.x")
2) (if t.A can only be produced by the type-checker) having well-typed expressions that do have a type, but whose type cannot be written down by the user

Note that (2) somehow already exists in OCaml (the bewildering polymorphic variant types [? ... ], or (t & u) but I believe those only appear in incorrect programs).

Both issues would be relatively serious design problems for a language with a strong type system that is supposed to be elegant. Being as elegant as possible may or may not be important from a "practical" perspective (I think it is), but it does make life easier when you teach the language or make it evolve.

The reason I rather prefer bobot's "as" is precisely that, by forcing the user to put a name, it doesn't introduce a new way to form type expressions (and removes any potential problems with type parameters).

One way to think about this would be the following. Consider a user who doesn't know about "variant records", encounters code that uses them and plays with it (in various random ways we can't quite predict). As she forms a mental model of how the feature works, will she be surprised? That's bad. Will she post a question on the mailing-list, Stackoverflow or whatever, because it is so surprising, or even a report on the bugtracker, because there is obviously a mistake in the behavior she observes? That's even worse, as it means more work for us.
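
To make point (1) concrete, here is a sketch in the inline-record syntax as it exists in released OCaml (4.03 and later), which may differ from the branch under discussion. There, `r` can be projected, assigned through, and copied with `with`, but returning `r` itself is rejected because its type cannot be written. The extra `label` field is an assumption added so that the `with` clause is not redundant:

```ocaml
type t =
  | A of { mutable l : int }
  | B of { x : int; y : int; label : string }

(* Field assignment through the bound record. *)
let bump = function
  | A r -> r.l <- r.l + 10
  | B _ -> ()

(* Functional update: rebuild B from r, swapping x and y. *)
let swap = function
  | B r -> B { r with x = r.y; y = r.x }
  | A _ as v -> v

(* Field projection. *)
let get_x = function
  | B r -> r.x
  | A _ -> 0

(* Returning the record itself, as in "function A r -> r", is rejected. *)
```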

(0011180)
ybarnoy (reporter)
2014-04-02 18:52

Thank you gasche -- that was a well-worded explanation, and exactly what I was looking for.

1. How would applying bobot's proposal as a hard constraint work though? Would it mean that if I don't have an 'as' clause in my type, I can't bind it using 'fun (A r) -> ...' and will get a type error for a missing type?

2. Raising some other concerns: Our current tag limit is 256, but I can easily see that expanding to 10 bits or more (in fact, I submitted a proposal on the list to do something like this). A generated program that makes use of this space could then be producing 1000 or more types from just one type. Could this be a problem performance-wise for the compiler?

3. Does type t.A also currently apply for the 'type t = A of int' case?

4. One of the next foreseeable steps is extending this mechanism to polymorphic variants. Would there be any obvious issue there? (I don't believe the branch supports them now. Does it?)
(0011183)
frisch (developer)
2014-04-02 22:58

> 3. Does type t.A also currently apply for the 'type t = A of int' case?

No: t.A is only introduced for record arguments.

> 4. One of the next foreseeable steps is extending this mechanism to polymorphic variants. Would there be any obvious issue there?

To keep the structural nature of polymorphic variants, one would need some kind of structural records, which don't currently exist in OCaml (and which would probably not have the same features as nominal records, in particular polymorphic fields). The record syntax would not work, because one would need to distinguish between a polymorphic variant whose argument is an actual record value from a polymorphic variant with "named" arguments (for the same reason, currently, polymorphic variants can only have one single argument). So, contrary to the current proposal, extending polymorphic variants with multiple and named arguments would require a change to the syntax of expressions, patterns, type expressions.
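
A small sketch of the single-argument point, with hypothetical names. A polymorphic variant can already carry a value of a nominal record type, and that is what record syntax in argument position denotes, so the same syntax could not also mean "named arguments":

```ocaml
type point = { x : int; y : int }

(* The argument of `Point is one value of the nominal record type point. *)
let norm1 (`Point (p : point)) = abs p.x + abs p.y

(* Several components are passed as one tuple argument. *)
let sum (`Pair (a, b)) = a + b
```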
(0011185)
ybarnoy (reporter)
2014-04-02 23:07

> No: t.A is only introduced for record arguments.

Well, isn't that confusing and 'inelegant'? Think of a beginner implementing a sum type:

type t = A of {x:int; y:int} | B of int * int

now said beginner has learned that ocaml supports shipping variant types around individually. Great! She'll write a function

let f (r:t.A) = A r

and a corresponding function

let g (tup:t.B) = B tup

and get a type error because for some reason this latter type doesn't exist.
(0011187)
ybarnoy (reporter)
2014-04-02 23:30

Going back to that same example, I still feel that the best solution (even if it's harder on the typechecker code) is to use

let f = function
   A _ as r -> r.x
 | B (a,b) -> b

Can I bind the tuple of B with one variable? No, because it's not really a tuple. Likewise, I should not be able to bind the record of A with one variable. Otherwise why can't I have

let f = function
   A r -> A r
 | B t -> B t

?
(0011188)
frisch (developer)
2014-04-02 23:32

In your example, if t.B meant something, it should be int * int (the normal product type); a pattern "B tup" would need to create a fresh tuple; and an expression "B tup" would copy the tuple content to a fresh block. Why not, but this is another story, let's not open this can of worms.

Even in the case of a single argument, I don't see the point of allowing to refer to its type as t.B.

But maybe you're right, and one should not allow the user to refer to t.B. As gasche reports, there are already cases of inferred types that cannot be written. If we display t.B in inferred types (and error messages), the user could always add an explicit "as" to the inner record declaration if he/she needs to refer to it (assuming he/she's in control of the type declaration). I'm undecided.
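
The distinction between an n-ary constructor and a constructor taking a tuple can be seen with existing syntax. In this sketch (type and function names are illustrative), `B` is a binary constructor, while `P` takes one argument that is an ordinary tuple:

```ocaml
type t = B of int * int      (* two arguments, no tuple value involved *)
type u = P of (int * int)    (* one argument, which is a real tuple *)

(* The arguments of B must be matched positionally. *)
let snd_b = function B (_, b) -> b

(* "function B tup -> tup" is rejected: B expects 2 arguments. *)

(* The argument of P can be bound as a whole. *)
let get_p = function P tup -> tup
```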
(0011190)
lpw25 (developer)
2014-04-03 01:19
edited on: 2014-04-03 01:19

> As gasche reports, there are already cases of inferred types that cannot be written.

Actually, the [? ...] types only appear in error messages, and the [< `Foo of int & float] types can be written, so there are no types which are inferred but cannot be written.

(0011191)
lpw25 (developer)
2014-04-03 01:25
edited on: 2014-04-03 01:49

I'm afraid I still do not understand why it is worth extending the syntax twice (t.A and Foo of { x : int } as foo) in order to support something which GADTs already support.

These proposals seem inelegant and potentially confusing, and, like all features, they add more things which people must understand in order to use OCaml. This might be fine if they were genuinely increasing the expressivity of the language, but in this case they are simply adding another way to express something which can already be expressed.

(0011193)
garrigue (manager)
2014-04-03 04:04

Hello, I always need a lot of time to get back into this discussion...
(To make it easier, I was on holidays when it restarted :)

For once, I think I mostly agree with Leo.
My rationale is that, at least as the proposal was originally presented, the goal was to allow using record syntax for datatype arguments, not to turn them into real records.
From this point of view anything that makes them look like real records is just going to be confusing.
So I would even say that being able to return the argument as a record is a problem too.

Also, in general building a feature reusing existing ones is attractive, but can be very tricky.
For instance, with GADTs the use of locally abstract types simplified the implementation, but created some kind of "implementation leakage", meaning that the details of the implementation are not properly hidden.
(It was turned into a choice: some things are defined as syntactic sugar, to clarify the situation.)
My assessment is that it is acceptable because GADTs are an advanced feature, but I'm not proud of it.

Of course, we could also turn record-constructors into a language feature.
I.e. make them a real extension of records, allow polymorphic fields, etc...
However, is it the real goal?

So I would prefer going back to the situation before the introduction of as-declarations, making sure that the pseudo-record-type is _never_ seen. Or switch to an implementation where there are no pseudo-record-types, just using constructor descriptions instead; the implementation should not matter if the difference is not visible.
(0011198)
frisch (developer)
2014-04-03 09:56

> support something which GADTs already support.

I guess you're referring to:

type _ t =
  | A : int -> [> `A] t
  | B : bool -> [> `B] t

let fA (A x : [`A] t) = A (x + 1)
let fB (B x : [`B] t) = B (not x)

let map_t = function
  | A _ as x -> fA x
  | B _ as x -> fB x 



But this is a very advanced use of the type system, relying on both GADTs and polymorphic variants, requiring more explicit type annotations, and turning a monomorphic type into a parametric one. And there are some more drawbacks I can think of:

 - If you add a case, you're no longer notified of a partial pattern matching for the definition of map_t (unless you add a type annotation).

 - Probably difficult error messages.

 - Adding a recursive case is cumbersome (and now each new constructor needs to be written three times), and you need more type annotations to please the type-checker:

type _ t =
  | A : int -> [> `A] t
  | B : bool -> [> `B] t
  | C : s t -> [> `C] t

and s = [`A | `B | `C]

..

let rec map_t : s t -> s t = function
  | A _ as x -> fA x
  | B _ as x -> fB x
  | C c -> C (map_t c)
(0011199)
frisch (developer)
2014-04-03 10:14

> My rationale is that, at least as the proposal was originally presented, the
> goal was to allow using record syntax for datatype arguments, not to turn them
> into real records. From this point of view anything that makes them look like
> real records is just going to be confusing.

I agree this is different from the original proposal. I now believe it is a better design if something that looks like a record and supports most of its common features (r.x, r.x <- e, {r with ...}) is really a record.

Current n-ary constructors are confusing to beginners for this reason (they look like tuples in type declarations, patterns and expressions), and we have the opportunity to do better here at little cost.


It would be a shame having to tell users that they can start with:

 type t = A of { x : int } | ...

but need to turn back to:

 type t = A of t_a | ...
  and t_a = { x : int; y : 'a. .... }

when they need to add a polymorphic field.
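
A minimal sketch of the first form extended with a polymorphic field, assuming a compiler where inline records accept them (released OCaml 4.03 and later do). The field names are made up:

```ocaml
type t =
  | A of { x : int; id : 'a. 'a -> 'a }
  | B

(* r.id is instantiated at int and at string within the same branch. *)
let use = function
  | A r -> (r.id r.x, r.id "ok")
  | B -> (0, "")
```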



> Of course, we could also turn record-constructors into a language feature.
> I.e. make them a real extension of records, allow polymorphic fields, etc...
> However, is it the real goal?

I'm not arguing this is goal number 1, but I believe it makes the system simpler to explain if all features of records are supported by the record syntax on constructor arguments. Moreover, it supports more code refactoring and tighter encodings of some data structures.

Concretely, I have a specific use case for the feature in our code base: replacing proper record types (used as constructor arguments in some AST) with those new inline records. (This is a different situation from people starting with n-ary constructors and wanting to name their arguments.) The reason to do so is both to clean up the type definition, by bringing argument definitions closer to where they are used (and thus allowing the reader to make sense of each constructor more easily), and also to benefit from the more compact representation (yes, those ASTs can be huge, and reducing the number of allocations when mapping over them will be a big performance boost). However, there are a few internal helper functions to build and map those records, and it would really be a big plus if I could still use them. (And no, I won't switch the whole AST definition to GADT + polymorphic variants!)
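
The representation difference can be observed with the unsafe `Obj` module. This sketch assumes the inline-record layout of released OCaml (4.03 and later), where the fields are stored directly in the constructor's block; the type names are illustrative:

```ocaml
type boxed_arg = { x : int; y : int }

(* Constructor block holding a pointer to a separate record block. *)
type boxed = Boxed of boxed_arg

(* A single block holding x and y directly. *)
type inlined = Inlined of { x : int; y : int }

(* Number of fields in the heap block of a value. *)
let size v = Obj.size (Obj.repr v)

let boxed_size = size (Boxed { x = 1; y = 2 })
let inlined_size = size (Inlined { x = 1; y = 2 })
```

Here `boxed_size` counts the single pointer field, while `inlined_size` counts the two record fields, so the inline form saves one allocation per constructor application.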
(0011200)
garrigue (manager)
2014-04-03 10:51

OK, I misunderstood some part of the discussion, where it was explicit that polymorphic fields were not supported.

What you are proposing now is to support full records.
(With all the bells and whistles...)

While this does make sense in its own way, one still has to be careful of a number of things.
One is to avoid the "syntactic sugar" trap.
Currently one gets the following behavior:
  # type id = Id of {f: 'a. 'a -> 'a};;
  type id = Id of id.Id
  and id.Id = { f : 'a. 'a -> 'a; }

I think this is not acceptable: the definition should be printed just as you have input it.
That is id.Id is part of the definition of id, not something independent.
If for some technical reason you want to add them as record definitions to the environment, this is perfectly fine with me, but they should not appear in the signature itself. They can be handled like constructor and label definitions in datarepr.ml.
Considering this, I would also suggest to add id.Id to the syntax of paths, rather than allow the awkward "as" syntax.
And of course you need to support mutually recursive definitions (which should be easier once the structure is clearer).
(0011201)
frisch (developer)
2014-04-03 10:59

> I think this is not acceptable: the definition should be printed just as you have input it.

Yes, of course, some more cosmetic work is needed in the printer.

> but they should not appear in the signature itself

The current strategy is to have them part of the signature (as "#row" pseudo declarations) to avoid having to change the representation of sum types in Types. This is mostly an internal implementation choice (assuming the printer is adapted).

> Considering this, I would also suggest to add id.Id to the syntax of paths, rather than allow the awkward "as" syntax.

The only technical trouble with id.Id is the choice of its type parameters, in particular in the presence of GADT existentials. There is a well-defined choice (type parameters of the sum type, to which one appends all existentials in the constructor, sorted alphabetically), but it is a little bit ad hoc. Maybe this ad hoc-ness is not a good enough justification for "as".

> And of course you need to support mutually recursive definitions

I think they are already supported properly in the branch. Or do you see any problem?
(0011202)
bobot (reporter)
2014-04-03 11:09

Alain, what do you want to do for the `type t2 = t1 =` case?

```
type t1 =
  | A of {a:'a. ('a -> int) -> 'a list -> int; b:bool}
  | B of {a:int; b:bool};;



module A = struct
  type t2 = t1 =
    | A of {a:'a. ('a -> int) -> 'a list -> int; b:bool}
    | B of {a:int; b:bool}

end;;
```

Doesn't type check in the branch:
```
Error: This variant or record definition does not match that of type t1
       The types for field A are not equal.
```
(0011203)
garrigue (manager)
2014-04-03 11:31

The problem with "type t2 = t1 = ..." is probably related to the addition of records in the signature.
Really I think you would be much better off not putting the records in the signature.

The case with #row is different.
There the type itself is structural, but we need the row to be nominal.
So there is a need to have it in the signature.
And this turns out to be a pain: have to handle it everywhere, and lots of bugs to fix.

Concerning id.Id, I suppose this can be done in two steps:
add them in the syntax, but turn "id.Id" into an identifier when converting from syntactic paths to internal paths. A bit like how the #-types for objects are handled: the "#c" identifier is built internally.

> > And of course you need to support mutually recursive definitions
> I think there are already supported properly in the branch. Or do you see any problem?
Again I was basing myself on past messages, which may be inaccurate now.
Just replace this by "and support equations / private types / constraints / etc..."
(0011204)
lpw25 (developer)
2014-04-03 11:51

>> support something which GADTs already support.
>
> I guess you're referring to:
>
> ...

I actually prefer to avoid polymorphic variants for such a simple case, so I was thinking of something more like:

   type a = private A
   type b = private B
   type c = private C

   type _ t =
   | A : int -> a t
   | B : bool -> b t
   | C : 'a t -> c t

   let fA (A x : a t) = A (x + 1)
   let fB (B x : b t) = B (not x)

   let rec map_t : type k. k t -> k t = function
     | A _ as x -> fA x
     | B _ as x -> fB x
     | C c -> C (map_t c)

While this does make a monomorphic type polymorphic, I don't think it suffers from the other drawbacks you mention.

Of course the type could be made monomorphic again using an existential, but currently all existentials introduce a layer of boxing, which is what we are trying to avoid.
(0011205)
frisch (developer)
2014-04-03 11:56

> Alain, what do you want to do for the `type t2 = t1 =` case?

Indeed, this is currently not supported.

Your comment with "#row" is precious, and I trust your judgment about not representing t.Id in the internal signature, but rather to see it as a "sub-component" of the type declaration, similar to labels and constructors. This will indeed be a cleaner strategy (but requiring much more effort, and in deeper parts of the type-checker). As far as I can tell, the user experience would be quite similar (in the sense that t.Id would be a proper full-fledged record type), except for corner cases such as re-exporting.

Do you have an opinion on the ad hoc ordering of type parameters for t.Id, when the case has existential variables?


> Concerning the of id.Id, I suppose this can be done in two steps

Currently, this is done directly in the parser, representing id.Id as a single Lident. Apart from support in external tools (which might need to re-parse), do you see any problem with this approach?
(0011207)
lpw25 (developer)
2014-04-03 12:10
edited on: 2014-04-03 12:21

Alain, if what you are trying to do is create something completely identical to:

   type fooA = { x: int; y: float }
   type fooB = { z : int }

   type foo =
     A of fooA
   | B of fooB

but with a different representation, then why not add a couple of attributes for controlling data representation. For example:

    type fooA = { x : int; y : float }
    type fooB = { z : int } [@@tag 1]

    type foo =
      A [@unboxed] of fooA
    | B [@unboxed] of fooB

where `[@@tag n]` lets you control the tag of a record, and `[@unboxed]` lets you have a single-argument constructor that is identity. `[@unboxed]` should only work if it can see the definition of the argument's type (which cannot be float), and can see that there will be no collisions between different unboxed constructors.

I think that if the only difference between what you want and what you get currently with records is the layout then that should be fixed using attributes, which I believe are well suited to controlling "implementation details" like data layout.

The `[@unboxed]` attribute would also be very useful for solving other problems with layout (for example needing a layer of boxing for existentials).

Then, the `Foo of { x: int; y: float }` syntax could be used only for simple variant constructors with named parameters: no mutable fields, no referring to them individually, etc.
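
As a point of comparison, released OCaml (4.04 and later) has a narrower `[@@unboxed]` attribute, placed on the type declaration and limited to types with a single constructor carrying a single field. It is not the per-constructor `[@unboxed]` or the `[@@tag n]` sketched above, but it shows the same "constructor as identity" idea. The type name `wrapped` is hypothetical:

```ocaml
type fooB = { z : int }

(* W adds no block of its own: a wrapped value is the record itself. *)
type wrapped = W of fooB [@@unboxed]

let r = { z = 42 }

(* Physical equality holds because no wrapper block is allocated. *)
let same_block = Obj.repr (W r) == Obj.repr r
```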

(0011208)
garrigue (manager)
2014-04-03 12:37

> Do you have an opinion on the ad hoc ordering of type parameters for t.Id, when the case has existential variables?

I suppose this is not going to matter much in practice, but this should be well defined.
First thing, you of course want to put them at the end of the variable list.
Concerning their own ordering, there are two possibilities:
* occurrence order (from left to right): simple, but fragile
* alphabetical order (since variables names are now kept): cleaner, but some variables may be anonymous

As a conclusion, the most stable approach seems to be alphabetical order, with occurrence order for anonymous variables, probably with a warning in that case.

> Currently, this is done directly in the parser, representing id.Id as a single Lident. Apart from support in external tools (which might need to re-parse), do you see any problem with this approach?

I have no specific problem in mind, so this might work just as well.
The only drawback might be for preprocessors, which need some specific knowledge of the string's meaning.
(0011209)
lpw25 (developer)
2014-04-03 13:51
edited on: 2014-04-03 13:51

> Concerning their own ordering, there are two possibilities:
> * occurrence order (from left to right): simple, but fragile
> * alphabetical order (since variables names are now kept): cleaner, but some variables may be anonymous

Does using alphabetical order mean that the following would be an error?

    module M : sig
      type foo = Foo : { foo: 'foo; bar: 'bar } -> foo
    end = struct
      type foo = Foo : { foo: 'a; bar: 'b } -> foo
    end

That seems pretty undesirable.

(0011210)
frisch (developer)
2014-04-03 14:56

> then why not add a couple of attributes for controlling data representation

I actually proposed that during the discussion, but it seems we are so close to solving the original problem (naming constructor arguments + mutable arguments) with the same bullet that it would be a shame not to do it, and instead introduce a syntax-record-but-not-really-record stuff which would only add complexity and confusion for no good reason.

But yes, internally, this is exactly how the branch is implemented and it would be rather simple to support explicit attributes as well.

> Does using alphabetical order mean that the following would be an error?

Yes, probably. It would of course be nicer to have an explicit introduction of variables in GADTs constructors, not only for that, but also to simplify naming them in patterns.
(0011211)
lpw25 (developer)
2014-04-03 15:37

If you are going to go down the route of being able to refer to the type of individual constructors, I think that constructors with tuple/single arguments should be supported as well. Otherwise the extension feels very ad-hoc.

For single argument constructors this is obviously very simple.

For tuple argument constructors it is more difficult. The main change would be allowing these types to be used with the tuple syntax.

This would mean having the Ppat_tuple and Pexp_tuple cases in typecore.ml check if their expected type is a tuple-argument type before assuming that the expression/pattern has a regular tuple type.

The other thing in typecore which seems to assume Texp_tuple produces a regular tuple type is type_approx, but I don't really know what that's for so maybe it's easy to fix.
(0011212)
ybarnoy (reporter)
2014-04-03 16:19

Alain, I understand that you're excited about solving several problems at once, but the t.A solution 'smells' like a bit of a hack to me. I think it may make more sense to take things one step at a time. We have a much stronger incentive to solve things gradually than to do it all at once, since we'll be stuck with whatever is decided for good. 5 years down the road, do we think that going with t.A will be seen as having been the right decision?

I like Leo's suggestion that as a first step, immutable constructor records should be supported and nothing more. This will clarify that at this point, these are little more than the same variants we have now, but with labels. ie. they're *not* records. If you want record-style features, for now you'll have to use records.

We can then consider the next steps, especially after gauging the community's reactions.

My thinking is that, if we have the ability to split up a feature into gradual steps (as we do in this case), it's in our interests to do so.

Also, my PR was committed, so everyone can now access this branch by typing
'opam switch install 4.02.0dev+record_constructors'
(0011214)
frisch (developer)
2014-04-03 16:38

Jacques: can I have your opinion about the following implementation strategy?

The idea would be to extend Types.type_declarations with a new field, say:

  type_subdecls: (Ident.t * type_declaration) list

and arrange so that Env.store_type adds those sub-declarations to the environment together with the main one. This would replace the notion of "fake" type declarations in signatures, and require adjustments to substitution, inclusion check, etc. It seems one could use this both for the inner record types, but also for #row types, and potentially more to come.
(0011224)
frisch (developer)
2014-04-04 11:09

- Replying to myself: I quickly tried that approach of keeping sub type declarations out of signatures, but it really complicates many parts of the code (where the best strategy seems to locally "flattenize" the declarations anyway).

- Commit 14529 removes the 'as' clause and supports re-exporting a sum type with inline records, as in:

module M = struct
  type 'a t =
    | A of {x : 'a}
    | B: {u : 'b} -> unit t

end;;

module N = struct
  type 'b t = 'b M.t =
    | A of {x : 'b}
    | B: {u : 'z} -> unit t
end;;

- Exception re-export is still not supported (it should re-export the inlined record as well).

- I still don't see how to fully support with-constraints on such types. The only problem is that mutually recursive declarations are not supported currently (which I've reported in another ticket: 0006360). If with-constraints are extended to support mutually recursive types, it will be very easy to support inline records.
(0011226)
ybarnoy (reporter)
2014-04-04 19:00

Alain, can you merge trunk into your branch? The OSX build bug (-fno-defer-pop) is still present in your branch, making the opam switch useless on OSX.
(0011228)
garrigue (manager)
2014-04-05 05:41

> Replying to myself: I quickly tried that approach of keeping sub type declarations out of signatures, but it really complicates many parts of the code (where the best strategy seems to locally "flattenize" the declarations anyway).

What I was suggesting was rather rebuilding the declarations every time you add the sum type to the environment, exactly like for constructor descriptions.
This may mean more work upfront, but you're going to avoid lots of dark corners.
In particular this solves the problem with with-constraints, since the records are no longer involved.
(0011234)
frisch (developer)
2014-04-07 13:32

> What I was suggesting was rather rebuilding the declarations every time you add the sum type to the environment, exactly like for constructor descriptions.

I've started an attempt with this approach, replacing the Types.cd_args field by:

...
    cd_args: constructor_arguments;
...

and constructor_arguments =
  | Cstr_tuple of type_expr list
  | Cstr_record of Ident.t * label_declaration list


(The Ident.t seems necessary to ensure the proper nominal aspect of t.A types.) Then, in most places, the simplest approach seems to rewrite signatures to extract inline records into new proper type declarations (and switching from Cstr_record to Cstr_tuple). Otherwise, functions such as Env.prefix_idents_and_substs become much more complex. Is that what you had in mind?
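
The rewriting step described above can be sketched on a toy model. Everything here is a simplified, hypothetical stand-in for the compiler's `Types` module: `type_expr` is reduced to a type name, a plain string plays the role of `Ident.t`, and `flatten` is an illustrative name, not a function of the branch:

```ocaml
(* Toy stand-ins for the compiler's types. *)
type type_expr = Tconstr of string
type label_declaration = { ld_name : string; ld_type : type_expr }

type constructor_arguments =
  | Cstr_tuple of type_expr list
  | Cstr_record of string * label_declaration list

type constructor_declaration =
  { cd_name : string; cd_args : constructor_arguments }

(* Extract each inline record into a standalone declaration named "t.A"
   and rewrite its constructor to take that type as a single argument. *)
let flatten type_name cds =
  let step (cds, recs) cd =
    match cd.cd_args with
    | Cstr_tuple _ -> (cd :: cds, recs)
    | Cstr_record (_id, labels) ->
        let name = type_name ^ "." ^ cd.cd_name in
        let cd' = { cd with cd_args = Cstr_tuple [ Tconstr name ] } in
        (cd' :: cds, (name, labels) :: recs)
  in
  let cds, recs = List.fold_left step ([], []) cds in
  (List.rev cds, List.rev recs)
```

The result pairs the rewritten constructors with the extracted record declarations, both in source order.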
(0011235)
garrigue (manager)
2014-04-07 14:50

This is indeed what I had in mind.

> (The Ident.t seems necessary to ensure the proper nominal aspect of t.A types.)

The need for the Ident.t seems strange.
The only case you would need it is when rebinding a type from the current module...
Should be solved first.

> Then, in most places, the simplest approach seems to rewrite signatures to extract inline records into new proper type declarations (and switching from Cstr_record to Cstr_tuple).

Why? I'm sure you can share code without rewriting.

> Otherwise, functions such as Env.prefix_idents_and_substs become much more complex.

Is it due to the extra Ident.t ?
I really think it shouldn't be there, but there may be some inadequacy somewhere.
(0011236)
frisch (developer)
2014-04-07 14:57

Every time a variant type containing inlined records is inserted in the environment, would you create one fresh Ident.t to represent each inlined record? I thought this would lose the nominal identity of those record types.
(0011238)
garrigue (manager)
2014-04-07 16:07

When the type is inside a module, the last component of the path is a string, so there is no problem of identity.
And the first time you insert a type into the environment, there is no problem of identity either.
So the only problematic case is when you rebind a type whose name is a Pident.
(But I may be overlooking something)

Anyway, I agree that to handle rebinding you need this Ident.t, so maybe it's impossible to get rid of it.

And, looking again, the extra complexity in prefix_idents seems unavoidable as long as these record types are visible in value types. It shouldn't be that much more complex, though.

> Then, in most places, the simplest approach seems to rewrite signatures to extract inline records into new proper type declarations (and switching from Cstr_record to Cstr_tuple).

You certainly need to do this when you insert declarations in the environment. But in which other cases would you need that?
(0011239)
frisch (developer)
2014-04-07 17:02
edited on: 2014-04-07 17:15

Ok, I've committed my current state (where I removed the Ident.t on Cstr_record) in branch constructors_with_record3. Many features are not supported yet: existentials, polymorphic fields, record on exceptions.

The following now works fine:

module M = struct type t = A of {x:int} end;;
module type S = sig type t = A of {x:int} end;;
module N : S with type t = M.t = M;;

Globally, the addition of Cstr_tuple to Types and Typedtree is indeed quite mechanical (now that the Ident.t is gone; hopefully we won't have to add it back), and it simplifies the printer, as expected. I agree that the extra impact on the code is probably justified (at least if the missing features can be added easily).


The translation is done in Datarepr.constructor_descrs, which now also returns extra type declarations. A manifest is propagated to those synthesized types, when the re-exported type is non-local.

The following works:

 module M = struct type t = A of {x:int} end;;
 module N = struct open M type s = t = A of {x:int} end;;
 fun (A r : M.t) -> (A r : N.s);;

while, as expected, the following doesn't:

 module M = struct type t = A of {x:int} type s = t = A of {x:int} end;;
 fun (A r : M.t) -> (A r : M.s);;


To fix that, I'm wondering whether one could always retrieve the id of the synthesized type in the environment, at least when the rebound type is a local one (as above).

> And, looking again, the extra complexity in prefix_idents seems unavoidable as long as these record types are visible in value types.

Can you elaborate?

(0011240)
frisch (developer)
2014-04-07 17:18

> I'm wondering whether one could always retrieve the id of the synthesized type in the environment

Attempt to do that: commit 14557, which fixes the reported case. Doing a textual lookup seems a little bit fragile, but I don't see anything trivially wrong with it. Jacques?
(0011243)
garrigue (manager)
2014-04-08 08:46

> Can you elaborate?

Using constructors_with_record3:

# module M = struct type t = A of {x:int} let f = function A r -> r end;;
module M : sig type t = A of { x : int; } val f : t -> t.A end
# M.f;;
- : M.t -> t.A = <fun>

> Doing a textual lookup seems a little bit fragile, but I don't see anything trivially wrong with it.

I thought so at first, but then I was concerned with the following situation.
However, this seems to work, so maybe it's ok. Need to check the details.

# type t = A of {x:int};;
type t = A of { x : int; }
# let A r = A {x=3};;
val r : t.A = {x = 3}
# module M = struct type u = t = A of {x:int} end;;
module M : sig type u = t = A of { x : int; } end
# type t = A of {y:bool};;
type t = A of { y : bool; }
# let M.A r' = M.A {x=3};;
val r' : M.u.A = {M.x = 3}
# r = r';;
- : bool = true
(0011245)
frisch (developer)
2014-04-08 15:33

Commit 14560 adds the Ident.t to Cstr_record, and uses it both for adding the manifest on synthesized local types (instead of a textual lookup in the environment) and to fix the bug on lack of prefixing you reported in the previous note.

It was also necessary to extend Includemod.signatures to add equalities on synthesized sub-types together with equality on the main sum type. This is required for:

module A : sig
  type t = A of {x:int}
  val f: t -> t.A
end = struct
  type t = A of {x:int}
  let f (A r) = r
end;;

I'm pretty sure that at least Subst.signature should also be extended to rename sub-ids, but I cannot find a counter example for now.


Jacques: does this approach seem correct to you? Do you think we should do something about other instances of Subst.add_type:

   Typedecl.check_coherence
   Typemod.merge_constraint
(0011248)
frisch (developer)
2014-04-08 18:16
edited on: 2014-04-08 18:20

Update: for GADT constructors with record arguments, I don't know what I had in mind, but it does not really make sense to use the original type parameters for the synthesized types. Consider:

  type _ t = A: {x : 'a} -> ('a * 'a) t

It would morally be encoded to:

  type _ t = A: 'a t.A -> ('a * 'a) t
   and 'a t.A = {x : 'a}

In the current branch, I simply take all free variables of the record arguments to serve as type parameters for t.A (currently with no well-defined ordering).
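
For illustration, here is how the GADT declaration above can be used in released OCaml versions that support inline records (4.03 and later). This is a small sketch rather than the branch's behaviour: to my knowledge the released compiler does not let user code name the inner record type (the t.A notation is specific to the branch), so the record can only be accessed through the constructor. The function name get is hypothetical.

```ocaml
(* The only free type variable of the record argument is 'a. *)
type _ t = A : { x : 'a } -> ('a * 'a) t

(* Matching on [A] refines the index type, so [r.x] has type [a]. *)
let get : type a. (a * a) t -> a = function
  | A r -> r.x

(* Building a value fixes the index: this has type (int * int) t. *)
let v = A { x = 3 }
```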

(0011249)
Bardou (reporter)
2014-04-08 18:22

In this case, for consistency, shouldn't regular sum types follow the same rule?
(0011255)
frisch (developer)
2014-04-09 19:47

> In this case, for consistency, shouldn't regular sum types follow the same rule?

Maybe, I'm a little bit undecided. Do others have an opinion?


I've made some progress on the constructors_with_record3 branch. It should now support all features I have in mind, including GADT constructors, polymorphic fields, and more recently exception constructors with record arguments, and proper rebinding (which needs to add a manifest to the synthesized record type).

I guess that it should be disallowed to have two exception constructor declarations in the same module with the same name if they have a record argument (because it would mean two identically named inner types t.A), but I need to check that. (And if this is indeed the case, it will not be a big limitation.)
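
As an illustration of exception constructors with record arguments, here is a minimal sketch in the syntax that released OCaml (4.03 and later) accepts. The exception Parse_error and the function describe are hypothetical names chosen for the example.

```ocaml
(* An exception constructor whose argument is an inline record. *)
exception Parse_error of { line : int; msg : string }

(* Fields are accessed by name through the usual record patterns. *)
let describe = function
  | Parse_error { line; msg } -> Printf.sprintf "line %d: %s" line msg
  | e -> Printexc.to_string e

let result =
  try raise (Parse_error { line = 3; msg = "unexpected token" })
  with e -> describe e
```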

Some testing and more eyes on the code would now be useful!

ybarnoy: do you think you could arrange to have this branch available on OPAM as well?
(0011256)
ybarnoy (reporter)
2014-04-09 19:51

>ybarnoy: do you think you could arrange to have this branch available on OPAM as well?

Sure I could.

Would it be possible to merge the master branch into this branch so we can get the building on OSX fixed?

Also, do you think it's worthwhile keeping OPAM access to the constructors_with_record2 branch, or should we just replace it with constructors_with_record3?
(0011257)
frisch (developer)
2014-04-09 19:53

> Would it be possible to merge the master branch into this branch so we can get the building on OSX fixed?

Yes, I will try to find time to do it tomorrow (if not, early next week).

> Also, do you think it's worthwhile keeping OPAM access to the constructors_with_record2 branch, or should we just replace it with constructors_with_record3?

I'm now convinced that the extra implementation burden on the *3 branch is worth the effort, so yes, one could drop the *2 branch.
(0011258)
frisch (developer)
2014-04-10 14:01

I've written a blog post on this topic to describe my current proposal:

https://www.lexifi.com/blog/inlined-records-constructors
(0011263)
ybarnoy (reporter)
2014-04-11 17:31

Great post! Small correction: I believe line 3 in the 4th code box should have t.A rather than A.t.

The way to use this branch with opam is to run 'opam switch 4.02.0dev+record_constructors'. However, I noticed your instruction to build without ocamldoc and submitted another pull request to add this to configure. This pull request hasn't been committed yet.

Also, if you don't want to bother merging trunk, only a single change is needed in configure: line 325 (under darwin) should read bytecccompopts="$gcc_warnings", ie. -fno-defer-pop needs to be removed. Maybe you can cherry pick this one change from trunk.
(0011266)
ybarnoy (reporter)
2014-04-13 21:09

OK the opam switch should be working now (without ocamldoc).
(0011267)
frisch (developer)
2014-04-14 13:52

Thanks Yotam! I've synchronized the branch with trunk.
(0011304)
frisch (developer)
2014-04-18 16:11

Note: the branch has been updated so that type parameters of inner records are now derived from the free type variables of the record definition (in order of appearance, using the first occurrence of each variable). 0006374 would then be quite useful, since it frees users from thinking about the number of these variables when they only need to refer to the type constructor, as in:

  let f (r : _ t.A) = {r with x = r.y}
(0011307)
ybarnoy (reporter)
2014-04-20 03:56

I've tested the branch a little bit using the opam switch and it looks pretty good. Perhaps you want to add the opam switch to your blog post, and you may also want to write a quick note to the list to get people's attention. I think the barrier for testing with opam is low enough that we may get some feedback.
(0011311)
frisch (developer)
2014-04-22 18:13

The blog now mentions the opam switch. I'll think about announcing the branch to the caml-list.
(0011320)
ybarnoy (reporter)
2014-04-24 17:15

Please do. There are heavyweight contributors on the list who should really be made aware of this branch.
(0011416)
whitequark (reporter)
2014-05-10 16:55

I really hope this will be merged. It adds a lot more convenience to the language, especially for complex code, with comparatively little cost. And there is a lot of support from the users of the language.
(0011702)
lpw25 (developer)
2014-06-07 15:26

Coming back to this issue, I think that I have come to terms with most of the potential downsides to Alain's proposal (although I still disapprove of the bundling of orthogonal features into a single proposal).

The two issues which still concern me are:

1. The handling of existentials in GADT constructors.

2. The disparity between record-argument constructors and tuple-argument constructors.

I don't think there are any really good solutions to issue 1. I'm quite averse to adding a semantically meaningful ordering to existential quantifiers, and without it the ordering must be determined by the type. However, I think this issue is likely to come up very rarely in practice, and Alain's related proposal to allow `_ foo` for a type `foo` with any number of arguments should avoid even more cases where this is an issue.

For issue 2 I think the best solution might be to support "constructor types" for tuple-argument constructors as well as record-argument constructors. This would work by supporting type-based disambiguation for the tuple syntax (much like is done for other constructors). Uses of tuple syntax would give a tuple type by default, but in the presence of a type annotation for a tuple "constructor type" would instead give an element of that type. This is a bit ugly, but it shouldn't be too difficult for people to understand given the existence of type-based record disambiguation: tuple syntax is just a special form of constructor after all.

What are people's thoughts on this approach to issue 2?
(0011704)
frisch (developer)
2014-06-08 14:01

I'm not sure your proposal about 2 goes in the right direction. Users are used to the fact that a "record" in a type definition creates a nominal type, and they are thus likely to expect the "record argument" to behave as an actual nominal record type. But they are unlikely to expect a similar behavior for a "tuple argument", which is usually interpreted in a structural way. To address the user's confusion with tuple arguments, if any, one should rather strive to make them behave as proper tuples rather than to overload the tuple "constructor" with a nominal behavior. If one defines:

  type t = A of int * int

one should be able to write:

 let f (x : int * int) = A x
 let g = function A x -> (x : int * int)

I'm pretty sure that this behavior would be more useful than introducing a nominal t.A type, sharing the same syntax as the structural 2-tuple but incompatible with it.

Yes, this means that deconstructing the constructor can allocate, but this is not different from accessing the fields of a "float record". The special support for "tuple argument" would just become an optimized representation, with some effect on the type system (as for float records). But this seems very independent from the discussion about record arguments.
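
For comparison, here is what the language accepts today, as a sketch of the status quo rather than of the proposal. With a two-argument constructor the tuple has to be taken apart and rebuilt by hand, whereas a constructor with a single tuple argument (note the parentheses) takes the tuple as a first-class value at the cost of an extra box. The names u, B, f' and g' are hypothetical additions for the example.

```ocaml
type t = A of int * int      (* two arguments, stored directly in the block *)
type u = B of (int * int)    (* one argument: a pointer to a separate tuple *)

(* With [A], `A x` and `function A x -> x` are rejected when x is a tuple,
   so the components must be spelled out. *)
let f ((a, b) : int * int) = A (a, b)
let g = function A (a, b) -> ((a, b) : int * int)  (* builds a fresh tuple *)

(* With [B], the tuple is passed and returned as is. *)
let f' (x : int * int) = B x
let g' = function B x -> x
```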

(Another remark is that one could consider the tuple notation for n-ary arguments as a purely syntactic coincidence, which is not shared by the revised syntax from camlp4/camlp5. With this perspective, it would seem even weirder to overload the tuple constructor.)
(0011705)
lpw25 (developer)
2014-06-08 18:51

> I'm not sure your proposal about 2 goes into the right direction. Users are used to the fact that a "record" in a type definition creates a nominal type, and they are thus likely to expect the "record argument" to behave as an actual nominal record type. But they are unlikely to expect a similar behavior for a "tuple argument", which is usually interpreted in a structural way.

I don't think tuples really have a structural aspect. `'a * 'b` is basically just a type `('a, 'b) tuple2` with a single constructor `_ of 'a * 'b` that does not need to be written. However, I agree that disambiguation may be a bit unexpected. On the other hand, I don't think people will be writing much code where they actually notice the disambiguation: functions directly using constructor types will probably be quite rare.

I suppose that it could even be useful to allow users to define types with tuple-style constructors:

    type ipair = _ of int * int

    let x : ipair = (3, 4)

but I can only think of a few use cases, and it seems likely to cause more harm than good.

> To address the user's confusion with tuple arguments, if any, one should rather strive to make them behave as proper tuples rather than to overload the tuple "constructor" with a nominal behavior. If one defines:
>
> type t = A of int * int
>
> one should be able to write:
>
> let f (x : int * int) = A x
> let g = function A x -> (x : int * int)

This also seems a reasonable approach. It would need some unboxing "optimisation" to ensure that `Foo(x, y)` did not first allocate `(x, y)`. However, this is trivial to implement, and better unboxing of tuples would be useful anyway.

I suppose the disadvantage is that it hides some cost because something that looks like destruction also includes allocation. However, as you say, it is no worse than float records. Similarly, currying also hides allocations.
(0011706)
lpw25 (developer)
2014-06-08 19:04

> one should be able to write:
>
> let f (x : int * int) = A x
> let g = function A x -> (x : int * int)

Thinking about it some more, I think you are probably right that this is a better approach. I suspect that in most real world cases the extra allocation could easily be optimised away, especially after Pierre's inlining work is merged.

Rather than thinking of:

   type foo = Foo of int * int

   let f (Foo x) = x

as having an additional allocation cost, you can instead think of

   type foo = Foo of {a : int; b: int}

   let f (Foo x) = x

as being a case where this cost is easy to optimise away.
(0011721)
frisch (developer)
2014-06-10 15:55

> I don't think tuples really have a structural aspect.

When users type:

type t1 = {x: int; y: int}
type t2 = int * int

they expect {...} to create one nominal type, but ... * ... to reuse the built-in product type. In that sense, the syntax for tuple types has a structural aspect, contrary to the syntax for record types. It would create confusion to have ...*... create a nominal type, but only when used as the argument of a constructor.
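
A small sketch of that difference in current OCaml: two tuple abbreviations with the same definition are interchangeable, while two record declarations with the same fields are distinct types. The primed type names and the values p, q, r and s are hypothetical.

```ocaml
type t1 = { x : int; y : int }
type t1' = { x : int; y : int }   (* same fields, but a different type *)

type t2 = int * int
type t2' = int * int              (* just another name for int * int *)

let p : t2 = (1, 2)
let q : t2' = p                   (* accepted: both are int * int *)

let r : t1 = { x = 1; y = 2 }
(* let s : t1' = r                   rejected: t1 and t1' are incompatible *)
let s : t1' = { x = r.x; y = r.y } (* the record must be rebuilt field by field *)
```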
(0011723)
gasche (developer)
2014-06-10 16:12
edited on: 2014-06-10 16:13

Remark: I privately suggested a solution to 2. to Alain, which is to redefine (a * b * c) as syntactic sugar for a built-in record type with fields {0 : 'a; 1: 'b; 2 : 'c}, and (a, b, c) as sugar for {0=a; 1=b; 2=c}.

With this proposal, (type t = A of b * c) then t.A would give a distinct nominal type with fields 0 and 1, but distinct from (b * c) (the meaning of field '0' would then be disambiguated by type information propagation).

(I'm not quite sure what the difference is between a structural type, and a built-in nominal type. I suspect they are the same thing.)

I'm fine with the proposal to drop (A of b * c) altogether and always assume a tuple there, if we accept the performance regressions on existing programs. I would also be fine with leaving the "of b * c" issue unaddressed: we could simply answer Leo's criticism by saying that not-really-tuples were an existing defect that could be fixed independently -- but of course improving several birds is always nice.

(0011725)
frisch (developer)
2014-06-10 16:21

Gabriel: what you suggest(ed) seems to be a way to implement Leo's proposal, i.e. to create new nominal types to represent the values under n-ary constructors.

> I'm fine with the proposal to drop (A of b * c) altogether and always assume a tuple there[...]. I would also be fine with leaving the "of b * c" issue unadressed.

What about my proposal (i.e. continue with the current compact representation, and create/deconstruct the inner tuple on demand)?
(0011727)
lpw25 (developer)
2014-06-10 16:27

> if we accept the performance regressions on existing programs

There should be no issue for existing programs. We are talking about adding support for patterns and expressions `Foo x` when `Foo` has multiple arguments.

The only cases where existing code uses this syntax is code of the form `Foo(a, b)` where the tuple is expressed directly. Since it is trivial to detect this case and avoid allocating the intermediate tuple, there should be no performance regression.
(0011728)
gasche (developer)
2014-06-10 18:21

I misunderstood the proposal as simply changing the memory representation. In retrospect, it's obviously not that, if only for the FFI backward-compatibility implications (though existing types could be expressed using records).

I would feel bad about introducing yet another different mechanism to the system. Alain's proposal currently has no boxing on pattern variable extraction. We could understand (Foo x) as syntactic sugar for something like a kind of view pattern, however (Foo (_a,_b) -> (_a,_b) as x).
(0011730)
frisch (developer)
2014-06-11 10:48

> I would feel bad about introducing yet another different mechanism to the system.

I've opened a new ticket 0006455 to track this other topic. For me, it is only remotely related to "inlined records".
(0011734)
ybarnoy (reporter)
2014-06-11 15:27
edited on: 2014-06-11 15:28

Since this discussion has been long, and probably won't end that soon, I'll bring up a point that's been bugging me as I've been using Haskell. The ability to have automatic constructor functions that can be curried is very cool, and while adding a comment to Alain's new ticket, I realized that this could help records as well. If we're considering doing things the 'right way', maybe it's worth discussing the merits of this idea. Copying from the other ticket, how about this?

type t = A of int | B of int * int

(A is of type int -> t, B is of type int -> int -> t)

which we could then curry, by writing for example let x = B 4

The cool thing is that, unlike Haskell, where they couldn't generalize this to records properly, we can, using labels:

type t = A of {x:int} | B of {x:int; y:int}

(A is of type x:int -> t, B is of type x:int -> y:int -> t)

So maybe the proper way to do inline records in constructors is B ~x:3 ~y:2? This unifies the language between labels and records very nicely.

Of course we'd have to be backwards compatible, so the form B(3, 4) would feed the whole 'tuple' into B for the tuple example, but it would really just be applying all the arguments.
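
The labelled, curried view of constructors suggested here can already be approximated by hand in released OCaml (4.03 and later) with wrapper functions. This is only a sketch of the idea, not a proposal for the actual feature; the function names a, b and with_x3 are hypothetical.

```ocaml
type t = A of { x : int } | B of { x : int; y : int }

(* Hand-written constructor functions with labelled arguments. *)
let a ~x = A { x }            (* x:int -> t *)
let b ~x ~y = B { x; y }      (* x:int -> y:int -> t *)

(* Partial application then comes for free. *)
let with_x3 = b ~x:3          (* y:int -> t *)
let v = with_x3 ~y:2
```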

(0011738)
Bardou (reporter)
2014-06-11 16:31

Unifying records and labels is something I have been wishing for a few years already and I even wrote prototype languages which had them. We can even have default values for fields.
(0011740)
frisch (developer)
2014-06-11 16:40

It could make sense to bring records and labels closer to each other, but this seems again quite unrelated to the current discussion (i.e. this should apply to normal records as well). A different point could be that instead of relying on records to name the argument of constructors, one should rather rely on labels. But then one would need to invent much more new syntax, including new syntax in patterns, and syntax to support field mutation, "with" overriding, and so on. And this would not address the ability to refer to the arguments as a single value.

I don't want to block the discussion, but I thought there was some consensus on the use of record syntax. Unless someone strongly feels the need to discuss a completely different direction, I'd suggest not to make the current discussion diverge even more, since it seems we were slowly reaching some form of agreement on the remaining details.
(0011741)
ybarnoy (reporter)
2014-06-11 16:44

OK sorry Alain. I do think there's a consensus on the current implementation for record syntax. It's just what happens -- once one person brings up a tangent, it seems to inspire other people (myself included) to go in tangential directions. Also, I think that since we've missed the 4.02 boat, the urgency of this issue seems reduced (even if it just appears that way).

- Issue History
Date Modified Username Field Change
2012-03-08 17:36 frisch New Issue
2012-03-08 17:46 frisch Note Added: 0007022
2012-03-08 17:48 frisch Note Added: 0007023
2012-03-08 17:54 frisch Note Edited: 0007023 View Revisions
2012-03-08 18:47 gasche Relationship added related to 0005525
2012-03-08 19:22 gasche Note Added: 0007025
2012-03-08 20:20 jjb Note Added: 0007026
2012-03-08 20:43 frisch Note Added: 0007027
2012-03-08 20:52 frisch Note Added: 0007028
2012-03-09 02:37 garrigue Note Added: 0007029
2012-03-10 12:58 frisch Note Added: 0007039
2012-03-12 00:44 garrigue Note Added: 0007041
2012-03-12 20:45 frisch Note Added: 0007052
2012-03-13 17:54 doligez Note Added: 0007059
2012-03-13 18:29 frisch Note Added: 0007060
2012-03-15 11:02 lavi Note Added: 0007087
2012-03-26 10:41 Bardou Note Added: 0007157
2012-03-26 14:17 lefessan Assigned To => lefessan
2012-03-26 14:17 lefessan Status new => acknowledged
2012-03-26 14:18 lefessan Assigned To lefessan =>
2012-03-27 17:08 bobot Note Added: 0007199
2013-05-23 16:07 bobot Note Added: 0009325
2013-05-23 16:52 gasche Note Added: 0009327
2013-10-10 17:01 hongboz Note Added: 0010450
2013-10-11 10:37 bobot Note Added: 0010453
2013-10-11 11:05 frisch Note Added: 0010454
2013-10-11 11:24 bobot Note Added: 0010455
2013-10-11 15:38 frisch Note Added: 0010456
2013-10-11 17:43 lpw25 Note Added: 0010458
2013-10-12 08:03 frisch Note Added: 0010460
2013-10-14 11:27 bobot Note Added: 0010462
2013-10-14 12:01 frisch Note Added: 0010463
2013-10-14 12:25 bobot Note Added: 0010464
2013-10-14 15:26 gasche Note Added: 0010465
2013-10-14 15:31 frisch Note Added: 0010466
2013-10-14 15:36 frisch Note Edited: 0010466 View Revisions
2013-10-14 15:37 frisch Note Edited: 0010466 View Revisions
2013-10-14 15:58 hcarty Note Added: 0010467
2013-10-14 16:07 gasche Note Added: 0010468
2013-10-14 16:18 frisch Note Added: 0010469
2013-10-14 16:26 gasche Note Added: 0010470
2013-10-14 16:27 frisch Note Added: 0010471
2013-10-14 16:27 frisch Note Edited: 0010471 View Revisions
2013-10-14 16:50 lpw25 Note Added: 0010473
2013-10-14 16:57 frisch Note Added: 0010475
2013-10-14 17:29 lpw25 Note Added: 0010476
2013-10-14 17:40 lpw25 Note Added: 0010477
2013-10-14 17:53 frisch Note Added: 0010479
2013-10-14 18:11 lpw25 Note Added: 0010480
2013-10-15 10:24 gasche Note Added: 0010483
2013-10-15 11:23 dim Note Added: 0010487
2013-10-15 14:08 bobot Note Added: 0010488
2013-10-15 14:24 frisch Note Added: 0010489
2013-10-15 14:40 gasche Note Added: 0010490
2013-10-15 14:40 hcarty Note Added: 0010491
2013-10-15 14:55 frisch Note Added: 0010492
2013-10-15 15:59 hcarty Note Added: 0010493
2013-10-15 16:17 frisch Note Added: 0010494
2013-10-15 16:23 bobot Note Added: 0010495
2013-10-15 16:53 hcarty Note Added: 0010496
2013-10-15 17:05 bobot Note Added: 0010497
2013-10-15 17:13 bobot Note Added: 0010498
2013-10-15 17:14 Bardou Note Added: 0010499
2013-10-15 20:30 dario Note Added: 0010500
2013-10-15 20:35 lpw25 Note Added: 0010501
2013-10-15 21:03 lpw25 Note Edited: 0010501 View Revisions
2013-10-16 09:44 Bardou Note Added: 0010502
2013-10-16 10:18 frisch Note Added: 0010503
2013-10-16 10:32 gasche Note Added: 0010504
2013-10-16 10:34 gasche Note Edited: 0010504 View Revisions
2013-10-16 11:06 garrigue Note Added: 0010505
2013-10-16 11:12 Bardou Note Added: 0010506
2013-10-16 11:31 frisch Note Added: 0010507
2013-10-16 11:51 dim Note Added: 0010508
2013-10-16 12:12 garrigue Note Added: 0010509
2013-10-16 14:37 gasche Note Added: 0010510
2013-10-16 14:39 gasche Note Edited: 0010510 View Revisions
2013-10-16 14:42 frisch Note Added: 0010511
2013-10-16 17:56 garrigue Note Added: 0010512
2013-10-16 18:16 frisch Note Added: 0010513
2013-10-16 18:57 garrigue Note Added: 0010514
2013-11-25 11:37 gasche Note Added: 0010662
2013-11-25 22:36 frisch Note Added: 0010663
2014-03-26 11:29 yminsky Note Added: 0011101
2014-03-26 11:31 yminsky Note Edited: 0011101 View Revisions
2014-03-26 16:08 lpw25 Note Added: 0011103
2014-03-26 16:11 lpw25 Note Edited: 0011103 View Revisions
2014-03-27 16:37 ybarnoy Note Added: 0011111
2014-03-27 16:38 ybarnoy Note Edited: 0011111 View Revisions
2014-03-27 22:25 frisch Note Added: 0011113
2014-03-28 00:10 lpw25 Note Added: 0011115
2014-03-28 00:19 lpw25 Note Added: 0011116
2014-03-28 00:27 gasche Note Added: 0011117
2014-03-28 00:31 lpw25 Note Edited: 0011115 View Revisions
2014-03-28 01:56 ybarnoy Note Added: 0011118
2014-03-28 06:46 gasche Note Added: 0011119
2014-03-28 09:38 frisch Note Added: 0011121
2014-03-28 10:31 bobot Note Added: 0011122
2014-03-28 10:38 frisch Note Added: 0011123
2014-03-28 10:39 frisch Note Edited: 0011121 View Revisions
2014-03-28 11:26 lpw25 Note Added: 0011126
2014-03-28 11:47 lpw25 Note Added: 0011127
2014-03-28 11:50 lpw25 Note Edited: 0011127 View Revisions
2014-03-28 13:10 frisch Note Added: 0011128
2014-03-28 14:02 lpw25 Note Added: 0011129
2014-03-28 14:02 lpw25 Note Edited: 0011129 View Revisions
2014-03-28 14:03 lpw25 Note Edited: 0011129 View Revisions
2014-03-28 14:04 ybarnoy Note Added: 0011130
2014-03-28 14:09 lpw25 Note Edited: 0011129 View Revisions
2014-03-28 14:11 gasche Note Added: 0011131
2014-03-28 14:13 gasche Note Edited: 0011131 View Revisions
2014-03-28 14:33 frisch Note Added: 0011132
2014-03-28 14:49 ybarnoy Note Added: 0011133
2014-03-31 14:58 frisch Note Added: 0011139
2014-03-31 14:59 frisch Note Edited: 0011139 View Revisions
2014-03-31 14:59 frisch Note Edited: 0011139 View Revisions
2014-03-31 15:09 gasche Note Added: 0011140
2014-03-31 15:36 frisch Note Added: 0011141
2014-03-31 17:03 ybarnoy Note Added: 0011142
2014-03-31 17:22 frisch Note Added: 0011143
2014-03-31 17:35 ybarnoy Note Added: 0011144
2014-03-31 21:15 lpw25 Note Added: 0011145
2014-03-31 21:16 lpw25 Note Edited: 0011145 View Revisions
2014-03-31 22:50 bobot Note Added: 0011146
2014-04-01 09:58 lpw25 Note Added: 0011148
2014-04-01 10:24 lpw25 Note Added: 0011150
2014-04-01 10:33 frisch File Added: patch_encoding.diff
2014-04-01 11:13 frisch Note Added: 0011152
2014-04-01 13:38 lpw25 Note Added: 0011154
2014-04-01 13:41 lpw25 Note Edited: 0011154 View Revisions
2014-04-01 13:41 lpw25 Note Edited: 0011154 View Revisions
2014-04-01 14:26 frisch Note Added: 0011155
2014-04-01 14:29 frisch Note Added: 0011156
2014-04-01 14:39 lpw25 Note Added: 0011157
2014-04-01 17:59 frisch Note Added: 0011158
2014-04-01 18:00 frisch Note Edited: 0011158 View Revisions
2014-04-01 18:19 frisch Note Added: 0011159
2014-04-01 18:29 lpw25 Note Added: 0011160
2014-04-01 18:31 lpw25 Note Edited: 0011160 View Revisions
2014-04-01 18:37 ybarnoy Note Added: 0011161
2014-04-01 18:37 lpw25 Note Added: 0011162
2014-04-01 19:02 frisch Note Added: 0011163
2014-04-01 19:08 frisch Note Edited: 0011163 View Revisions
2014-04-01 19:11 frisch Note Added: 0011164
2014-04-01 19:13 frisch Note Added: 0011165
2014-04-01 19:15 frisch Note Added: 0011166
2014-04-01 20:40 ybarnoy Note Added: 0011167
2014-04-01 20:53 lpw25 Note Added: 0011168
2014-04-01 20:54 lpw25 Note Edited: 0011168 View Revisions
2014-04-02 04:45 garrigue Note Added: 0011170
2014-04-02 17:53 frisch Note Added: 0011176
2014-04-02 17:54 frisch Note Edited: 0011176 View Revisions
2014-04-02 18:05 ybarnoy Note Added: 0011177
2014-04-02 18:25 frisch Note Added: 0011178
2014-04-02 18:37 gasche Note Added: 0011179
2014-04-02 18:38 gasche Note Edited: 0011179 View Revisions
2014-04-02 18:52 ybarnoy Note Added: 0011180
2014-04-02 22:58 frisch Note Added: 0011183
2014-04-02 23:07 ybarnoy Note Added: 0011185
2014-04-02 23:30 ybarnoy Note Added: 0011187
2014-04-02 23:32 frisch Note Added: 0011188
2014-04-03 01:19 lpw25 Note Added: 0011190
2014-04-03 01:19 lpw25 Note Edited: 0011190 View Revisions
2014-04-03 01:25 lpw25 Note Added: 0011191
2014-04-03 01:31 lpw25 Note Edited: 0011191 View Revisions
2014-04-03 01:49 lpw25 Note Edited: 0011191 View Revisions
2014-04-03 04:04 garrigue Note Added: 0011193
2014-04-03 09:56 frisch Note Added: 0011198
2014-04-03 10:14 frisch Note Added: 0011199
2014-04-03 10:51 garrigue Note Added: 0011200
2014-04-03 10:59 frisch Note Added: 0011201
2014-04-03 11:09 bobot Note Added: 0011202
2014-04-03 11:31 garrigue Note Added: 0011203
2014-04-03 11:51 lpw25 Note Added: 0011204
2014-04-03 11:56 frisch Note Added: 0011205
2014-04-03 12:10 lpw25 Note Added: 0011207
2014-04-03 12:12 lpw25 Note Edited: 0011207 View Revisions
2014-04-03 12:21 lpw25 Note Edited: 0011207 View Revisions
2014-04-03 12:37 garrigue Note Added: 0011208
2014-04-03 13:51 lpw25 Note Added: 0011209
2014-04-03 13:51 lpw25 Note Edited: 0011209 View Revisions
2014-04-03 14:56 frisch Note Added: 0011210
2014-04-03 15:37 lpw25 Note Added: 0011211
2014-04-03 16:19 ybarnoy Note Added: 0011212
2014-04-03 16:38 frisch Note Added: 0011214
2014-04-04 11:09 frisch Note Added: 0011224
2014-04-04 19:00 ybarnoy Note Added: 0011226
2014-04-05 05:41 garrigue Note Added: 0011228
2014-04-07 13:32 frisch Note Added: 0011234
2014-04-07 14:50 garrigue Note Added: 0011235
2014-04-07 14:57 frisch Note Added: 0011236
2014-04-07 16:07 garrigue Note Added: 0011238
2014-04-07 17:02 frisch Note Added: 0011239
2014-04-07 17:06 frisch Note Edited: 0011239 View Revisions
2014-04-07 17:15 frisch Note Edited: 0011239 View Revisions
2014-04-07 17:18 frisch Note Added: 0011240
2014-04-08 08:46 garrigue Note Added: 0011243
2014-04-08 15:33 frisch Note Added: 0011245
2014-04-08 18:16 frisch Note Added: 0011248
2014-04-08 18:20 frisch Note Edited: 0011248 View Revisions
2014-04-08 18:22 Bardou Note Added: 0011249
2014-04-09 19:47 frisch Note Added: 0011255
2014-04-09 19:51 ybarnoy Note Added: 0011256
2014-04-09 19:53 frisch Note Added: 0011257
2014-04-10 14:01 frisch Note Added: 0011258
2014-04-10 14:04 frisch Assigned To => frisch
2014-04-10 14:04 frisch Status acknowledged => assigned
2014-04-11 17:31 ybarnoy Note Added: 0011263
2014-04-13 21:09 ybarnoy Note Added: 0011266
2014-04-14 13:52 frisch Note Added: 0011267
2014-04-18 16:11 frisch Note Added: 0011304
2014-04-18 16:11 frisch Relationship added related to 0006374
2014-04-20 03:56 ybarnoy Note Added: 0011307
2014-04-22 18:13 frisch Note Added: 0011311
2014-04-24 17:15 ybarnoy Note Added: 0011320
2014-05-10 16:55 whitequark Note Added: 0011416
2014-06-07 15:26 lpw25 Note Added: 0011702
2014-06-08 14:01 frisch Note Added: 0011704
2014-06-08 18:51 lpw25 Note Added: 0011705
2014-06-08 19:04 lpw25 Note Added: 0011706
2014-06-10 15:55 frisch Note Added: 0011721
2014-06-10 16:12 gasche Note Added: 0011723
2014-06-10 16:13 gasche Note Edited: 0011723 View Revisions
2014-06-10 16:21 frisch Note Added: 0011725
2014-06-10 16:27 lpw25 Note Added: 0011727
2014-06-10 18:21 gasche Note Added: 0011728
2014-06-11 10:48 frisch Note Added: 0011730
2014-06-11 15:27 ybarnoy Note Added: 0011734
2014-06-11 15:28 ybarnoy Note Edited: 0011734 View Revisions
2014-06-11 16:31 Bardou Note Added: 0011738
2014-06-11 16:40 frisch Note Added: 0011740
2014-06-11 16:44 ybarnoy Note Added: 0011741

