About the order in which record fields are typed #7816

vicuna · 2018-07-02T14:18:32Z

Original bug ID: 7816
Reporter: @trefis
Status: new
Resolution: open
Priority: normal
Severity: minor
Category: typing
Monitored by: @nojb @yallop

Bug description

Original starting point of the discussion: #1675

N.B. I will not discuss principality concerns here, they remain the same regardless of which order is actually chosen.

Current state of things (up-to OCaml 4.07)

For patterns:

when typing a record pattern, fields are typed in the order following that of the record declaration
parmatch currently relies on this sorting having been done, cf.

ocaml/typing/parmatch.ml

Line 261 in 41a432f

let records_args l1 l2 =
matching doesn't appear to rely on this having been done, but does use that same (declaration) order, cf.
- ocaml/bytecomp/matching.ml
  
  Line 556 in 41a432f
  
  - records are expanded so that they possess all fields
- ocaml/bytecomp/matching.ml
  
  Line 64 in 41a432f
  
  let all_record_args lbls = match lbls with

For expressions fields are typed in the order following that of the record declaration
Alain pointed out this is coherent with the way labeled arguments are typechecked.

Suggested change: follow the source order instead

If I'm not mistaken, the proposition has been made by both Gabriel and Leo.

I'm not sure whether they propose this change to be limited to patterns or to apply to expression as well.
Although it seems like if we change the order in patterns we should change the order in expressions as well.

Potential issues

If we change the order in which typing is done, then we also need to update the evaluation order (to match the new typing order).

As Jeremy said:

The equation propagation behaviour is closely connected with
evaluation order, too. In a match like this:
match e with
| {x = lazy A, y = lazy B}
equality information needs to flow in the direction first-evaluated
to last-evaluated because the equalities are only available when
pattern matching succeeds.

So changing the direction in which equalities flow (which will cause
a noisy type-checking failure in some currently-valid programs) also
involves changing the order in which patterns are evaluated (which
will cause a silent evaluation-order change in some currently-valid
programs).

Although I'm not sure lazyness really matters here?

Following that, Gabriel added:

I thought that the evaluation-order change would be simple, but we
have a problem if a matched column contains two record patterns with
different field orders.

To give a concrete example of a problematic pattern:

type _ t = Pair : ('a * 'b) t | Unit : unit t
type 'a r = { x : 'a t * 'a; y : 'a t * 'a; }
let foo (type a) (r : a r) =
  match r with
  | { x = (Pair, _); y = (_, (a, b)) } -> ignore (a, b)
  | { y = (Pair, _); x = (_, (c, d)) } -> ignore (c, d)
  | _ -> ()

The text was updated successfully, but these errors were encountered:

vicuna · 2018-07-02T14:45:45Z

Comment author: @stedolan

To expand on Jeremy's comment about laziness, here's an example using laziness that would segfault if the typing order were changed but the matching order were not.

type 'a is_str = Is_string : string is_str
type 'a box = { x : 'a; tag : 'a is_str Lazy.t }

let foo (type a) : a box -> bool = function
  | { tag = lazy Is_string; x = "foo" } -> true
  | _ -> false

let bang = foo { x = 42; tag = lazy (assert false) }

The field 'x' is only known to be of type 'string' after the match on field 'tag' has succeeded. Since it is a lazy pattern-match, this match could fail with an exception, in which case 'x' might not be a string.

So, if the equations gleaned from 'tag = lazy Is_string' are available when type-checking 'x = "foo"', then 'tag' must be evaluated before 'x' is matched. 4.06.1 types and matches 'x = "foo"' first, so this function gives a type error:

This pattern matches values of type string
but a pattern was expected which matches values of type a

It would also be fine to type and match 'tag = lazy Is_string' first, in which case this code would be type correct and fail with an assertion error, but soundness demands that the typing order and evaluation order do not contradict each other.

vicuna · 2018-07-02T14:58:02Z

Comment author: @trefis

Thanks for the example.

What I meant was: we can get the same segfault without needing lazyness at all, if you have:

type 'a typ = Is_string : string typ | Is_int : int typ
type 'a box = { x : 'a; tag : 'a typ }

let foo (type a) : a box -> bool = function
  | { tag = Is_string; x = "foo" } -> true
  | _ -> false

you'll get into trouble just the same if you look at x first and you're passed { tag = Is_int, x = 42 }.

vicuna · 2018-07-02T16:48:51Z

Comment author: @gasche

A comment on my different-field-orders-in-the-same-column point: (as Thomas already knows) we know how to compile this correctly, by splitting the matrix into several submatrices, one for different row orders. (In general "split more" is the magical solution to compilation correctness issue.) But this solution is a bit frustrating as it may degrade the performance of existing programs.

We could have a conservative criterion on when it is not needed to split, that is when it is valid to reorder field. It's clearly sound if the fields are all variables, and I think we can extend to the case where the fields are all exhaustive value patterns (value patterns as opposed to effectful patterns like "lazy"). In particular, it is non-trivial but I think checking a one-constructor GADT can be delayed to after accessing type-dependent patterns -- this is similar the logic used to shortcut the case switch when all type-valid GADT constructors return a field of interest in the same position.

With this improvement (a conservative criterion on when reordering is sound and splitting is thus not necessary), the impact on performance would probably be negligible, so it would be all dandy. But this is yet more complicated logic to implement and maintain, and nobody is positively excited by the idea.

vicuna · 2018-07-03T08:25:19Z

Comment author: @trefis

I agree with this on principle, though I would probably use a slightly different criterion.

But indeed, I'm not excited by this either: keeping the current ordering seems "fine" for now; we can reconsider after an hypothetical cleanup/simplification of matching.ml

github-actions · 2020-05-07T04:16:31Z

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

vicuna added the typing label Mar 14, 2019

github-actions bot added the Stale label May 7, 2020

trefis closed this as completed May 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

About the order in which record fields are typed #7816

About the order in which record fields are typed #7816

vicuna commented Jul 2, 2018

vicuna commented Jul 2, 2018

vicuna commented Jul 2, 2018

vicuna commented Jul 2, 2018

vicuna commented Jul 3, 2018

github-actions bot commented May 7, 2020