Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005528OCamlOCaml generalpublic2012-03-08 17:362013-05-23 16:52
Reporterfrisch 
Assigned To 
PrioritynormalSeverityfeatureReproducibilityhave not tried
StatusacknowledgedResolutionopen 
PlatformOSOS Version
Product Version 
Target VersionFixed in Version 
Summary0005528: Inline records for constructor arguments
DescriptionOCaml allows n-ary constructors for sum types. Instead of relying on position, it would be convenient to name the fields. Of course, one can use records, but this requires an extra type declaration and has some runtime overhead.

I've started a new branch (constructors_with_record) in the SVN to allow naming arguments of constructors, with the same syntax as records.

Example:

 type t =
    | A of {msg: string; foo:int}
    | B of string
    | C

 let f = function
    | A {msg; _} | B msg -> msg
    | C -> ""

GADTs and exceptions are supported. It is possible to define mutable fields, but there is currently no way to mutate them. Polymorphic fields are not supported.

Note that this proposal also gives a way to disambiguate record field names:

 type a = A of {foo: string; bar: int}
 type b = B of {foo: int; baz: bool}
 let f (A{foo; _}) = foo
 let g (B{foo; _}) = foo

To support mutation and field overriding, I was thinking of a syntax like:

 type t = A of {mutable l: int} | B of {x:int; y:int}

 match ... with
 | A r -> r.l <- r.l + 10
 | ...

 match ... with
 | B r -> B {r with x = r.y; y = r.x}
 | ...

Binding (directly like above or with an alias pattern) the "record" argument creates a special value (special in the same way that "self" or "ancestor" variables in objects are special) which can only be used in the following context:
   - field projection: r.l
   - field assignment: r.l <- e
   - overidding: B {r with ...}

TagsNo tags attached.
Attached Files

- Relationships
related to 0005525resolvedgarrigue Resolving record fields using all specified fields 

-  Notes
(0007022)
frisch (developer)
2012-03-08 17:46

Some motivation. I think that most of the time, using "records" instead of "tuples" for constructor arguments is better (and so it's a good idea to make it as easy as possible to do it in the language):

  - Field names document the meaning of each field.

  - Thanks to punning, code is often not longer than with tuples, and this encourages to use uniform names to bind fields throughout the code base.

  - Ignoring fields is easier than with tuples (just { ... ; _ }).

  - Extending the constructor with more fields does not break existing patterns of the form { ... ; _ } (or when the warning 9 is disabled).
(0007023)
frisch (developer)
2012-03-08 17:48
edited on: 2012-03-08 17:54

(Note: Camlp4 has not been updated; it does not compile. OCamldoc compiles but fails on the new feature.)

(0007025)
gasche (developer)
2012-03-08 19:22

You surely know that Caml Light had mutable sum constructors. Your mutation syntax reminds me of it -- but better behaved, as the syntactic lvalue is still a field projection rather an arbitrary variable.

  http://caml.inria.fr/pub/docs/manual-caml-light/node4.6.html [^]

This is also reminding of SML's anonymous record types (see eg. http://adam.chlipala.net/mlcomp/ [^] ); those are not unboxed with the constructor (but MLton can optimize it) but allow field disambiguation -- in sometimes treacherous ways.


This extension prompts a question (say, for a teacher): do we have a duplication of concepts here (two slightly different ways to define records, just as tuples and the * used in sum constructors are slightly different), or is there a reasonable explanation of the first-class record concept in terms of this feature? I mean, maybe it's possible to say once and for all that the general data type is a labelled sum of labelled products, and explain current records as sums with only one case, where the constructor is left implicit. Could something like that reliably explain the semantics?


The restrictions on the "special" variables seem a bit icky. From a language point of view, I'd rather have general unboxed datatypes, with a kind system guaranteeing that they can't instantiate polymorphic parameters. Haskell has those and it's nice -- not sure how that works with the GC, however, and I understand changing the type checker is probably not an option. The pragmatism of your solution makes it simpler, and OCaml already has special cases for unboxed float anyway.


You included no example where two constructors of a single sum types would share some field names. Is that possible/reasonable? That looks like a potential use case -- having a second constructor that has the same fields than the first, plus some, as it is more memory-efficient than a record with an option type at the end.


This construction would allow to implement Queue efficiently without magic.
  type 'a cell =
    | Null
    | Cell of { content : 'a; mutable next : 'a cell }
(0007026)
jjb (reporter)
2012-03-08 20:20

For what it's worth, I have several times wanted such a feature. I think that the value of flexible extension and consistent naming, in particular, would be significant for larger code bases.
(0007027)
frisch (developer)
2012-03-08 20:43

I've added my extra proposal, allowing field projection and mutation on pseudo record-argument capture variables in patterns. Example:

  type t = A of {x:int; mutable y:int}

  let get_x (A r) = r.x
  let set_x (A r) x = r.x <- x

  let bad (A r) = r (* rejected *)

When used in a or-pattern, these pseudo-variable must have the same kind on both sides; the kind includes the constructor, so the following is rejected:

  type t = A of {x:int} | B of {x:int}
  let get_x = function (A r | B r) -> r.x (* rejected *)
(0007028)
frisch (developer)
2012-03-08 20:52

Gabriel, thanks for your note.

I believe that real records are indeed mostly a special case of this new feature
(using a sum type with one constructor), in the same way that n-tuples could be encoded with a sum type with one n-ary constructor. Of course, the static semantics is quite different: with records, the type is inferred from the labels, but with "record constructors", the type is derived from the constructor, and we know directly the type for the fields.

Currently, two features of records are not supported: polymorphic fields, and overriding ({e with x = ...}), although I don't see any reason why they couldn't be.

---

You have a question about two constructors in a given sum type sharing some field names. Yes, this is possible:

 type t = A of {x:int} | B of {foo: int; x:int} | C of {x:string}

let get = function
  | A {x} | B {x} -> x
  | C {x} -> int_of_string x


It is even feasible to make the following work:

 let enrich foo = function
 | A r -> B {r with foo}
 | z -> z

i.e. use a record-argument capture variable as the starting point for overriding, with a different constructor (having more fields). I'm not sure if this is really desirable.
(0007029)
garrigue (manager)
2012-03-09 02:37

I almost proposed that one (including mutable fields) almost ten years ago.
I'm not going to oppose it, but this indeed raises the question of feature overloading.
Inline records have many advantages over normal records, their only limitation being that one can only get a handle through pattern-matching, but this can be a pain for some uses.
So the question for users is naturally, which one to use, knowing that both have advantages and disadvantages.
This is the only reason I finally didn't propose it.
But if there is a good rationale, this might be fine.

Note that the use of special variables is not a big problem in my mind: this is already the case for instance variables in objects.

Also an extra potential feature would be the ability to use type information to extract without pattern-matching.

type t = A of {x:int} | B of {x:int;y:int}

let get_x (r:t) = r.x
(0007039)
frisch (developer)
2012-03-10 12:58

> type t = A of {x:int} | B of {x:int;y:int}
> let get_x (r:t) = r.x

I'm not entirely convinced by this extension. How would the code above behave if some constructor in t don't have an "x" field? (Also, we'd need to be careful with type inference and principality here, but you know that better than I do!)

Pattern-matching with punning is not so bad:

let get_x (A{x}|B{x}) = x

(yes it becomes tedious with more constructors)
(0007041)
garrigue (manager)
2012-03-12 00:44

Actually I was not thinking of generating a pattern-matching, but rather to restrict this feature to types for which it can be implemented trivially, as a direct field access without any test. I.e. x would have to be present, with the same type, and at the same position in all cases of the type. This of course includes the trivial case where the type has only one case.
This makes it easier to represent a syntax tree including location information in all nodes for instance.
I think lots of people have wished this behavior for a long time, and combining variants and records makes it easy.
Concerning principality, this is a typical case where -principal can recover it.

But do not take this suggestion as meaning that I support variants of records unconditionally.
I still think that we first need a good rationale. Being useful is not sufficient when there is a large overlap with an already existing feature.
(0007052)
frisch (developer)
2012-03-12 20:45

> This makes it easier to represent a syntax tree including location information in all nodes for instance.

I see, but I'm not sure it's worth introducing an extra feature, whose meaning is not completely obvious. It just avoids the need to define (once for each common field) functions like:

 let get_loc (Var{loc}|App{loc}|Lambda{loc}) = loc

> I still think that we first need a good rationale.

A first reason is that it makes it makes it possible to define sum types with mutable fields, without space overhead. For some low-level data structures (for instance mutable binary trees), it can give some performance boost without requiring Obj.magic.

But the most important reason, in my opinion, is that it encourages to name arguments of constructors instead of relying on position, and it is often a good idea. In particular, it removes the following counter-arguments to naming constructor arguments with explicit record declarations:

 1. Syntactic burden of defining extra records.

 2. "Bad" support for field name overloading.
 
 3. Runtime overhead associated to extra record nesting.


In the past, for instance, I've proposed to replace some n-ary constructors in OCaml sources with records. This would have simplified maintaining on LexiFi side some of our local patches (avoiding the need to add wildcard patterns at many places). This was rejected, based on the fact that it would have increased memory usage and the size of cmi files (argument 3 above).
(0007059)
doligez (manager)
2012-03-13 17:54

> it makes it possible to define sum types with mutable fields

This is a red flag. Are you sure your implementation is safe with pattern-matching and "when" clauses?
(0007060)
frisch (developer)
2012-03-13 18:29

> Are you sure your implementation is safe with pattern-matching and "when" clauses?

I'm not sure at all, this would need to be checked.

Note that the semantics is weird anyway (and I'm not absolutely convinced it is safe) even with regular records:

type t = A of int | B of string;;
type u = R of t ref;;
let f = function
 | R{contents=A _} -> ""
 | R({contents=B _} as r) when (r.contents <- A 10; r.contents = A 20) -> ""
 | R({contents=B _} as r) -> (match !r with A _ -> "BAD" | _ -> "OK");;

It's really weird that f can return "BAD" (and it does, with argument (R (ref (B""))) ).
(0007087)
lavi (reporter)
2012-03-15 11:02

This is perhaps to early to speak about optimization, but for records with float only fields, it would be better to de-inline them internally, or even better to treat the first constructor with float only fields as a float array and de-inline the others.
(0007157)
Bardou (reporter)
2012-03-26 10:41

Hello,

I have been wishing for this for a long time. Mainly because it's just tedious to declare a record for each constructor. Too often do I start with constructors with a low argument count (say 1 to 3) and find that I have to add new arguments. At some point I have so many arguments that I really have to declare the record, and thus rewrite all the relevant parts of the code.

I would add that it would be convenient for me to be able to name only *some* of the arguments. For instance :

type t = Sum of pos: Lexing.position * t * t

I would access the anonymous arguments using pattern-matching as usual, and use ".pos" as a shortcut sometimes.

Unifying records and sums is great, unifying tuples at the same time seems even better to me. The OPA language (of Mlstate) does this, if I'm not mistaken.
(0007199)
bobot (reporter)
2012-03-27 17:08

For a semantic, syntax, typing perspective, can't we consider that:

type t =
   | A of string
   | B of {msg: string; mutable foo:int}
   | C

is exactly the same thing than:

type __t = {msg: string; mutable foo:int}

and t =
   | A of string
   | B of __t
   | C

Of course the "when" clause can lead to strange behaviors, but not different than the one we have with such a type t and __t.

Moreover we can consider the record nearly like an actual record. The nearly mean that we create it with the tag of B instead of the default one (the first, if I remember well) used for record.

So:
type t =
   | A of string
   | B of ({msg: string; mutable foo:int} as t2)
   | C

is exactly for semantic, syntax and typing the same thing than:

type t2 = {msg: string; mutable foo:int}

and t =
   | A of string
   | B of t2
   | C

eg:

match x with
  | A s ->
  | B ({ msg = ""} as r) -> f (r : t2)

For compilation: B r is just the identity function, the matching just doesn't extract. You can't make the difference between the two definitions, except if you use Obj.magic to see that (Obj.magic (B r)) == (Obj.magic r), but Obj.magic is not in the semantic right?

(It's in fact not the same thing for typing in regard of module subtyping, if t is made abstract, t2 must be done private. But that can be quite useful)

My 2 cents,
I'm very happy that someone try to implement mutable sum type.
(0009325)
bobot (reporter)
2013-05-23 16:07

With 0005759 (Using well-disciplined type-propagation to disambiguate label and constructor names) the last view (http://caml.inria.fr/mantis/view.php?id=5528#c7199 [^]) should be able to work with such a type (http://caml.inria.fr/mantis/view.php?id=5528#c7028 [^]):

type t = A of {x:int} | B of {foo: int; x:int} | C of {x:string}

let get = function
  | A {x} | B {x} -> x
  | C {x} -> int_of_string x
(0009327)
gasche (developer)
2013-05-23 16:52

I'll also note that the problem of side-effects on mutable fields in "when" clause has now been fixed by Luc in http://caml.inria.fr/mantis/view.php?id=5992 [^]

- Issue History
Date Modified Username Field Change
2012-03-08 17:36 frisch New Issue
2012-03-08 17:46 frisch Note Added: 0007022
2012-03-08 17:48 frisch Note Added: 0007023
2012-03-08 17:54 frisch Note Edited: 0007023 View Revisions
2012-03-08 18:47 gasche Relationship added related to 0005525
2012-03-08 19:22 gasche Note Added: 0007025
2012-03-08 20:20 jjb Note Added: 0007026
2012-03-08 20:43 frisch Note Added: 0007027
2012-03-08 20:52 frisch Note Added: 0007028
2012-03-09 02:37 garrigue Note Added: 0007029
2012-03-10 12:58 frisch Note Added: 0007039
2012-03-12 00:44 garrigue Note Added: 0007041
2012-03-12 20:45 frisch Note Added: 0007052
2012-03-13 17:54 doligez Note Added: 0007059
2012-03-13 18:29 frisch Note Added: 0007060
2012-03-15 11:02 lavi Note Added: 0007087
2012-03-26 10:41 Bardou Note Added: 0007157
2012-03-26 14:17 lefessan Assigned To => lefessan
2012-03-26 14:17 lefessan Status new => acknowledged
2012-03-26 14:18 lefessan Assigned To lefessan =>
2012-03-27 17:08 bobot Note Added: 0007199
2013-05-23 16:07 bobot Note Added: 0009325
2013-05-23 16:52 gasche Note Added: 0009327


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker