MantisBT - OCaml
View Issue Details
0007816OCamltypingpublic2018-07-02 16:182018-07-03 10:25
normalminorhave not tried
0007816: About the order in which record fields are typed
Original starting point of the discussion: [^]

N.B. I will not discuss principality concerns here, they remain the same regardless of which order is actually chosen.

# Current state of things (up-to OCaml 4.07)

For patterns:
- when typing a record pattern, fields are typed in the order following that of the record declaration
- parmatch currently relies on this sorting having been done, cf. [^]
- matching doesn't appear to rely on this having been done, but does use that same (declaration) order, cf.
  + [^]
  + [^]

For expressions fields are typed in the order following that of the record declaration
Alain pointed out this is coherent with the way labeled arguments are typechecked.

# Suggested change: follow the source order instead

If I'm not mistaken, the proposition has been made by both Gabriel and Leo.

I'm not sure whether they propose this change to be limited to patterns or to apply to expression as well.
Although it seems like if we change the order in patterns we should change the order in expressions as well.

## Potential issues

If we change the order in which typing is done, then we also need to update the evaluation order (to match the new typing order).

As Jeremy said:
> The equation propagation behaviour is closely connected with
> evaluation order, too. In a match like this:
> match e with
> | {x = lazy A, y = lazy B}
> equality information needs to flow in the direction first-evaluated
> to last-evaluated because the equalities are only available when
> pattern matching succeeds.
> So changing the direction in which equalities flow (which will cause
> a noisy type-checking failure in some currently-valid programs) also
> involves changing the order in which patterns are evaluated (which
> will cause a silent evaluation-order change in some currently-valid
> programs).

Although I'm not sure lazyness really matters here?

Following that, Gabriel added:
> I thought that the evaluation-order change would be simple, but we
> have a problem if a matched column contains two record patterns with
> different field orders.

To give a concrete example of a problematic pattern:

    type _ t = Pair : ('a * 'b) t | Unit : unit t
    type 'a r = { x : 'a t * 'a; y : 'a t * 'a; }
    let foo (type a) (r : a r) =
      match r with
      | { x = (Pair, _); y = (_, (a, b)) } -> ignore (a, b)
      | { y = (Pair, _); x = (_, (c, d)) } -> ignore (c, d)
      | _ -> ()
No tags attached.
Issue History
2018-07-02 16:18trefisNew Issue
2018-07-02 16:30trefisDescription Updatedbug_revision_view_page.php?rev_id=3305#r3305
2018-07-02 16:45stedolanNote Added: 0019219
2018-07-02 16:58trefisNote Added: 0019220
2018-07-02 18:48gascheNote Added: 0019221
2018-07-03 10:25trefisNote Added: 0019222

2018-07-02 16:45   
To expand on Jeremy's comment about laziness, here's an example using laziness that would segfault if the typing order were changed but the matching order were not.

    type 'a is_str = Is_string : string is_str
    type 'a box = { x : 'a; tag : 'a is_str Lazy.t }

    let foo (type a) : a box -> bool = function
      | { tag = lazy Is_string; x = "foo" } -> true
      | _ -> false

    let bang = foo { x = 42; tag = lazy (assert false) }

The field 'x' is only known to be of type 'string' after the match on field 'tag' has succeeded. Since it is a lazy pattern-match, this match could fail with an exception, in which case 'x' might not be a string.

So, if the equations gleaned from 'tag = lazy Is_string' are available when type-checking 'x = "foo"', then 'tag' must be evaluated before 'x' is matched. 4.06.1 types and matches 'x = "foo"' first, so this function gives a type error:

    This pattern matches values of type string
    but a pattern was expected which matches values of type a

It would also be fine to type and match 'tag = lazy Is_string' first, in which case this code would be type correct and fail with an assertion error, but soundness demands that the typing order and evaluation order do not contradict each other.
2018-07-02 16:58   
Thanks for the example.

What I meant was: we can get the same segfault without needing lazyness at all, if you have:

    type 'a typ = Is_string : string typ | Is_int : int typ
    type 'a box = { x : 'a; tag : 'a typ }

    let foo (type a) : a box -> bool = function
      | { tag = Is_string; x = "foo" } -> true
      | _ -> false

you'll get into trouble just the same if you look at x first and you're passed { tag = Is_int, x = 42 }.
2018-07-02 18:48   
A comment on my different-field-orders-in-the-same-column point: (as Thomas already knows) we know how to compile this correctly, by splitting the matrix into several submatrices, one for different row orders. (In general "split more" is the magical solution to compilation correctness issue.) But this solution is a bit frustrating as it may degrade the performance of existing programs.

We could have a conservative criterion on when it is not needed to split, that is when it is valid to reorder field. It's clearly sound if the fields are all variables, and I think we can extend to the case where the fields are all exhaustive value patterns (value patterns as opposed to effectful patterns like "lazy"). In particular, it is non-trivial but I think checking a one-constructor GADT can be delayed to after accessing type-dependent patterns -- this is similar the logic used to shortcut the case switch when all type-valid GADT constructors return a field of interest in the same position.

With this improvement (a conservative criterion on when reordering is sound and splitting is thus not necessary), the impact on performance would probably be negligible, so it would be all dandy. But this is yet more complicated logic to implement and maintain, and nobody is positively excited by the idea.
2018-07-03 10:25   
I agree with this on principle, though I would probably use a slightly different criterion.

But indeed, I'm not excited by this either: keeping the current ordering seems "fine" for now; we can reconsider after an hypothetical cleanup/simplification of