Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0005667OCamlOCaml typingpublic2012-06-29 14:012013-04-08 13:33
Assigned To 
PrioritynormalSeverityfeatureReproducibilityhave not tried
PlatformOSOS Version
Product Version 
Target VersionlaterFixed in Version 
Summary0005667: Autofocusing pattern on record field to simplify matching trees with meta-data
DescriptionA common idiom in OCaml when defining an AST is:

  type expr = {
       loc: ...;
       desc: expr_desc;
  } and desc =
    | Foo of ...
    | Bar of ...

i.e. wrap each node in a record type to attach some meta-data (like the location). With such a type definition, it becomes quickly cumbersome to write non-trivial patterns (e.g. to implement optimizations, or AST rewriting). This has even been used as an argument against extending some of the intermediate languages in OCaml itself with location information (to make locations available to gdb, IIRC).

One might want to look at "views" or "active patterns" as generic solutions to such problems. Here, I'd like to propose a little patch which implements the following ad hoc strategy: when a constructor pattern [p] is type-checked in a context where a record type is expected (this information is now propagated, as it is required by GADTs), and the record type has a single field [l] whose type is a variant, then the pattern is automatically wrapped into a pattern {l = p}. There are several ways to improve the cases covered by the strategy (e.g. consider only fields which are variant types that contain the specified constructor), and one could prefer to mark explicitly the record type declaration so as to enable automatically this "auto-focusing" hack (by specifying the field explicitly).

As an example, here is how a pattern matching can be simplified by this hack:

(* before *)

      | {pmod_desc = Pmod_apply(
         {pmod_desc = Pmod_apply(
          {pmod_desc = Pmod_apply(
           {pmod_desc = Pmod_ident {txt = Lident "IFDEF"}},
           {pmod_desc = Pmod_ident {txt = Lident sym}}

(* after *)

      | Pmod_apply
              (Pmod_apply(Pmod_ident (Lident "IFDEF"),Pmod_ident (Lident sym)),
               body_def), body_not_def)

The patch is quite ad hoc, but I believe it covers a lot of concrete cases. I don't expect the patch to be integrated in its current form, but hopefully this proposal will trigger some interesting discussions.
TagsNo tags attached.
Attached Filesdiff file icon autofocus.diff [^] (2,497 bytes) 2012-06-29 14:01 [Show Content]

- Relationships

-  Notes
garrigue (manager)
2012-06-30 03:28

This makes sense (I suppose that many people already had this idea).
However, this would be a policy switch for ocaml.
Namely, in ocaml dynamic semantics never depend on the type system.
This is true for GADTs too: they only change the typing, not the dynamic behavior.
What you propose means that you automatically insert a field access, with the choice
of the field depending on its type.
This is a major change.

In some way, a mechanism of pattern macros would be less intrusive.
gasche (developer)
2012-06-30 20:21

I was just going to mention the mechanism of pattern synonyms. When Conor McBride came to Paris a few months ago, he mentioned it as an improvement he suggested for Haskell, but that would also make sense for OCaml. [^]

The syntax he demonstrated was of the form:

  pattern p x y z = Foo (x, [y], z)

(In his mind, `p` could then be used in both patterns and expressions, but using it only for patterns make it simpler.)

In this case, one could write:

  pattern apply (e1, e2) = {pmod_desc = Pmod_apply(e1, e2); ...}

I think there is also value in Alain's proposition of type-directed code inference (here pattern inference), but maybe it should be a more syntactically explicit.
protz (manager)
2012-06-30 20:37

FWIW, I often wished I was able to abstract on patterns. The code I'm writing right often performs the same set of patterns:

match t with
| A (B t')

I'd be happy to abstract that as:

pattern p (t) = A (B t)

and write

match t with
| p t' ->
frisch (developer)
2012-07-02 18:54

Jacques: the semantics of the language already depend on the type system. For instance in

  module A = struct include (M : sig end) let () = print_endline "AAA" end

the semantics depends on the signature constraint. One can argue that there is some kind of elaboration semantics, but then it's hard to conclude that the semantics doesn't depend on the type system.

If we mark explicitly the record declaration with the field on which "auto-focusing" is enabled, and the elaboration keeps this mark "at runtime", the dynamic semantics is defined without further reference to type inference (a constructor pattern applied to a record value "reduces" to the same pattern applied to the marked field).
garrigue (manager)
2012-07-09 11:11
edited on: 2012-07-09 11:14

Oops, I wrote an answer that was never sent.
At the module level it is clear that signature have a semantical meaning.
But in the core language type constraints are not supposed to change the dynamical behaviour.
Of course, you might come up with a proper dynamic semantics for focusing, not depending on typing :-)
And one could argue that if we add type witnesses this will change this a lot.

At the very least, I would suggest an explicit syntax:

     | Pmod_apply
              ({Pmod_apply({Pmod_ident {Lident "IFDEF"}},{Pmod_ident {Lident sym}}),
               body_def)}, body_not_def)

I.e., this should be an extension of the record syntax, not of pattern matching in general.
And I agree that in this case, it does not depend on type inference (at least if all the record
fields have a concrete type, not just a type variable).

doligez (administrator)
2012-07-17 14:33

Alain, the common idiom that you describe is probably not the best way to add location annotations.
It is probably better to add a location field to every constructor, just to avoid doubling the depth of every pattern.

Then you have to define a "location_of" function for every type, but we have good syntactic tools to do this kind of thing.
frisch (developer)
2012-07-17 17:26
edited on: 2012-07-17 17:26

Damien: the solution with intermediate record nodes to store extra information is wide-spread, and is used everywhere in the OCaml code base itself, as you know well. Adding extra arguments to all constructors requires to add extra wildcard patterns to many places, and is not very nice conceptually (if a concept applies to all kinds of nodes, it should be placed at the level of the node, not under all constructors).

Jacques: (I think you miss a closing } before body_def. And {...} around the whole pattern.) Yes, an explicit syntax might be worth the little syntactic overhead, I'm not sure (the extra {...} make the pattern somehow less readable). More important to me would be to make it explicit on the record definition site which field is subject to this trick.

gasche (developer)
2012-07-17 18:41

Note that the "patterns synonyms" feature would be able to handle both style (factor our the record layer, or factor out the additional location field); if applied consistently they would even allow to change from one representation to the other without having to change the pattern-matching sites.

(I've mentioned this feature idea to Luc, and he seemed quite interested in it. I hope we can come up with a reasonable proposal some time after the release.)
frisch (developer)
2012-07-17 18:54

Gabriel: I'm eager to hear more about a proposal on pattern synonyms! Could you give an idea on how a declaration like:

 pattern apply (e1, e2) = {pmod_desc = Pmod_apply(e1, e2); _}

would be used?

Something like

 match e with
 | apply (e1, e2) -> ...
 | ...


This looks very nice. It'd be a little bit tedious to define an alias for each constructor, but it works. One also needs to be careful about constant alias (with no capture variable), in order to distinguish them from capture variables. (Maybe aliases should be uppercase identifiers?)

One could also allow calling arbitrary function and matching the result. This is a much more complex approach, of course. (The idea would be that a pattern like:

   f p -> ...

would call function f and match its result with p.)
gasche (developer)
2012-07-17 19:45

Yes, something along those lines -- I was hoping to smuggle a currified syntax for patterns, and the use site may get some syntactic noise to avoid ambiguities, but that's the idea.

Your latter proposal of applying a function (looks like view patterns) makes me suspicious. The problem with such proposals (also internalized "when" for example) is that the interaction between the pattern matching compilation and the fact of executing arbitrary user code are a very messy problem. For once, it's hard to specify if the function will be called at all, in which order and how many times. Even more problematic, the compiler may check the presence of some head constructor, then execute your function that would mutate the data by changing the head constructor, and then the optimized parameter access code would provoke a segfault.
I am planning for a minimal proposal rather than an ambitious and possibly problematic proposal.

You are right that using the pattern synonyms in the present use case (locations) involve some boilerplate. You have to choose between defining patterns for each use case, as in your "apply" example, or defining a more generic pattern

  pattern desc p = {pmod_desc = p; pmod_loc = _loc}

  match e with
  | desc (Apply(e1, e2)) -> ...

which provides a much lesser syntactic gain (we could have essentially the same thing by choosing a shorter field name) in exchange for less boilerplate. Note, as an independent side remark, the implicit introduction of a "_loc" identifier that captures the innermost location, in the spirit of Camlp4 quotations.

I think having some boilerplate is a reasonable price to pay for the use of a relatively low-tech solution; it's certainly less sophisticated than type-based pattern inference as you suggest. Hopefully that mean we have less ways to get it wrong, an maybe it could be useful in more different situations -- but I readily admit that this nested record example is so far my only practical example to motivate the feature.

(The difficulties will be, I suppose, with whether we choose to expose it through module boundaries; conceptually we'd like to be able both to keep them abstract and publish only their "type" (matched type plus environment effect) or to publish their code to allow for more performant compilation. In practice I'm not sure the current pattern compilation techniques would handle abstract patterns, so they may be always-concrete for now.)
lpw25 (developer)
2012-07-17 21:31

As mentioned in the comments on feature request 0005528, "inline records for constructor arguments" could also be a solution to the specific problem of location fields. If the inline record constructors were allowed to share fields, perhaps with an explicit syntax like:

type expr =
    | Foo of { external loc: ...; ... }
    | Bar of { external loc: ...; ... }

These could be pattern matched as Foo {...} and Bar {...}, but also have their loc fields accessed directly as e.loc.
frisch (developer)
2012-07-18 09:29

Gabriel: your arguments about side effects while running code during pattern matching already apply to lazy patterns. But it's certainly to be taken into account.

Patterns aliases are like syntactic macros; they could almost be implemented as a syntax extension (e.g. using the new -ppx flag :-)). For instance, an AST filter could detect (and remove) a declaration like:

  let alias {pmod_desc = Pmod_apply(e1, e2); _} = Apply (e1, e2)

and use it to rewrite any pattern:

  Apply (p1, p2)

into {pmod_desc = Pmod_apply(p1, p2); _}

A deeper support in the compiler is required to get better scoping, exporting to interfaces, etc. If we do that, this is no longer a really low-tech approach; at least, it's going to be more intrusive in the language definition than my proposal.
frisch (developer)
2012-07-18 14:57

Another variant of my proposal, committed to branch autofocus. It introduces an explicit syntax on the record declaration:

  type t = {
      match x : s;

(only one field can be marked as 'match', not checked for now)

When the type-checker encounters a pattern, with an expected type referencing such a record declaration with a 'match' field, and the pattern under consideration cannot be a record pattern (not a record, variable, alias, constraint, or-pattern), it focuses on the marked field.

This is more explicit than the original proposal, and also slightly more general since this is not restricted to sum types.
lefessan (developer)
2012-07-20 00:23

@Alain: Does the order of unification matters for your patch ?

If yes, shouldn't you add the restriction that 's' is not a polymorphic variable ?

Otherwise, does your patch correctly computes the correct type for both 'f1' and 'f2' in this example ?

type 'a ref = { match mutable content : 'a }

let f1 x =
   if true then
     match x with None | Some _ -> None
   else x.content
let f2 x =
   if true then x.content
     match x with None | Some _ -> None
frisch (developer)
2012-07-20 09:46

@Fabrice: yes, it does, unless you compile with -principal. The patch relies on the same type propagation technology as used for GADTs.

When type-checking without -principal, f1 is rejected and f2 is accepted. In -principal mode, both as rejected, as expected. Moreover, if you add the constraint (x : 'a option ref) on the argument, all four cases are accepted.
xleroy (administrator)
2012-08-02 09:26

Changing target version, as this is a new language feature with deep implications.
frisch (developer)
2012-09-15 07:18

Jacques: one more point about your proposal of an explicit syntax. I guess that the proposed syntax will cause conflicts in the yacc grammar, or will require significant effort. In particular, something starting with "{A1.A2.A3." could be either (1) a normal record pattern with a qualified first label, (2) a "focused" record pattern with a sub-pattern being qualified constructor name, or (3) a "record override" construction ("A1.A2.A3." being also a prefix of several kinds of valid expressions).
garrigue (manager)
2012-09-15 09:22

Alain: Yes, I know that there is already a lot of things in record syntax.
However, this can be solved in a later phase.
The real question should be semantics.
Could there be a conflict with punning?

By the way, my understanding was that focusing was only for patterns,
so I would think that record override is irrelevant here.
frisch (developer)
2012-09-15 12:22

> The real question should be semantics.

I don't see any real issue with semantics. For instance, in the variant of the proposal with the explicit 'match' annotation on the type declaration: when a record value is matched against a pattern of one of the authorized forms, matching continues with the designated field and the same pattern.

> record override is irrelevant here.

frisch (developer)
2013-04-04 15:33

Making it easier to pattern match on Parsetree fragments will become more and more important as we move towards -ppx based preprocessors instead of Camlp4. Of course, the proposal is not specific to the OCaml Parsetree itself and would also benefit to other AST encodings using a similar representation.

Solutions which have been discussed in the thread:

 1. Original proposal: automatically focus on a record field (many possible variants: with/without an explicit annotation on the target field in the record type declaration; only for constructor patterns; etc).

 2. Jacques: use a new syntax on the record pattern to make the field implicit while keeping it explicit that we rely on this new feature.

 3. Gabriel: use pattern synonyms.

 4. Leo: use inline records (with one location field in each constructor).

I still believe that 1 is the less intrusive change to the compiler and syntax, and the most direct way to provide a good experience to people writing patterns on Parsetree fragments. I don't think it is a very deep philosophical switch on the semantical side, especially if we pick the variant with an explicit annotation on target field on the record type declaration and if we restrict focusing to the case where the pattern is a constructor. I also ok with Jacques' variant, but I'd then prefer a very light syntax. 3 and 4 require much more pervasive changes to the language definition and compiler and affect significantly the way the ASTs have to be defined (for 3, we need to define synonyms for each constructor, which mean picking new names, etc). 4 in particular does not scale well if many "satellite" (locations + attributes + ...) fields have to be added to each constructor. Moreover, doing 1 or 2 does not prevent from adding 3 or 4 to the language later.

For 2, I'd suggest to use a single-character prefix operator (for instance, "&") in front of the constructor pattern, as in:

      | &Pmod_apply
              (&Pmod_apply(&Pmod_ident (&Lident "IFDEF"),&Pmod_ident (&Lident sym)),
               body_def), body_not_def)

Since we're making good progress on the proposal "-ppx + extension_points", it's a good time to resurrect this discussion... Please comment!
lpw25 (developer)
2013-04-04 17:16
edited on: 2013-04-04 17:22

> 4. Leo: use inline records (with one location field in each constructor).

> 4 in particular does not scale well if many "satellite" (locations + attributes + ...) fields have to be added to each constructor

I don't see how it would not scale:

type expr =
    | Foo of { foo_arg: t;
               other_foo_arg: s;
               external loc: Loc.t;
               external attribute: Atrr.t }
    | Bar of { bar_arg: r;
               external loc: Loc.t;
               external attribute: Attr.t; }

I think that these "tagged records" are a much more direct solution to the
problem, with a much better data layout than using "auto-focusing patterns"
with ordinary records.

The problem with data types like Parsetree.module_expr, which have different
"variants" that share some fields, is not just that pattern matching is a bit
verbose, but that they are not in the most appropriate data layout. Ordinary
variants would provide the best data layout, but then the shared fields can
only be accessed through unnecessarily expensive pattern matching.

"Tagged records" solve this problem by combining the best data layout (like
variants) with quick access to shared fields (like records).

gasche (developer)
2013-04-04 17:22

Just a remark (I'm not weighting in for inline records, altough I do like this approach): I'm not sure those "external" qualifiers would be necessary for runtime performance. With GADTs I've already seen cases where access to fields went without runtime tag test. I wouldn't be surprised to hear that this case (access to the first argument present in all constructors) already generates optimal code. Otherwise it could probably be recognized and optimized, provided there is a convincing argument that this is actually important for performances.
frisch (developer)
2013-04-04 19:03

Leo: are you concerned with the runtime performance and memory usage? Because otherwise I don't see why "tagged record" is a more appropriate data layout. I'd argue quite the contrary, as it didn't make it explicit that all constructors share some common fields. Splitting the common part in a dedicated record and restricting the variable part to a inner sum type is a better separation of concerns than flattening everything so that it fits in a single memory block.

What's the "external" qualifier in your example? If it is supposed to tell the compiler that all constructors share the common part, it's kind of weird having to duplicate the field definition in all constructors and have the compiler check that they all match. If we start adding annotations to type definitions in order to help the compiler pick a good runtime representation, I'd rather do the opposite and mark the "description" field within the enclosing record as being "inlined".

But I'm not concerned with performance at all here, and the current Parsetree representation definition just seems more logical to me and it allows us to write cleaner code (in my opinion). For instance, we can expose a "maker" function which only takes the "description" and optional arguments for the other fields. And a generic "map" function does not have to copy all satellite fields for each constructors. The same technique also allows better code factorization of type definitions (as in "'a loc"). How would you deal with 'a loc with inline records? You'd have to extract its location field in the enclosing data structure, turning something like:

 Record of (Longident.t loc * expression) list


 Record of (Longident.t * Location.t * expression) list

or rather introduce an extra record:

 Record of record_field list
and record_field =
    label: Longident.t;
    label_loc: Location.t;
    content: expression;

But then pattern matching is not really made simpler, and you cannot pass the "Longident.t loc" as a single object, thus polluting a lot of code. (Applying the same inlining trick to "Longident.t loc" does not work because inner nodes don't have location; and this would not work for "string loc".)

Don't get me wrong: I'd love to get inline records (that's why I've proposed them!), including in the definition of OCaml's parsetree, but only to name constructor arguments, not to merge location and attribute fields in all constructors. I also agree that in some cases tweaking the type definition for performance reasons can be useful and that inline records can be a good tool, but this is a different topic.
lpw25 (developer)
2013-04-05 19:10
edited on: 2013-04-06 13:08

> Leo: are you concerned with the runtime performance and memory usage?

It annoys me that I have to choose between easy access to shared fields and the data layout that seems most suitable. This probably comes under the category of "premature optimisation", but it is frustrating nonetheless.

> What's the "external" qualifier in your example? If it is supposed to tell the compiler that all constructors share the common part, it's kind of weird having to duplicate the field definition in all constructors and have the compiler check that they all match.

I admit that the duplication of field names is not ideal. I chose that syntax as it was the smallest change from ordinary variant constructors. Maybe something like:

  type t =
    Foo of { foo: int }
  | Bar of { bar: int }
  with { shared: int }

would be better (since we are going to move all type-conv extensions over to ppx :)).

> If we start adding annotations to type definitions in order to help the compiler pick a good runtime representation, I'd rather do the opposite and mark the "description" field within the enclosing record as being "inlined".

This is probably a matter of taste.

> For instance, we can expose a "maker" function which only takes the "description" and optional arguments for the other fields

Indeed, there is always a choice between combining data into a single data structure and being able to represent a "partial" data structure. This is probably orthogonal to the issue of tagged records: if some shared fields have no default do you put them in a separate sub-record or leave them inlined in a single data structure?

> How would you deal with 'a loc with inline records?

I'm afraid I don't really understand your point here. Why would I not just use

  Record of (Longident.t loc * expression) list

lpw25 (developer)
2013-04-06 13:07
edited on: 2013-04-06 13:08

> For instance, we can expose a "maker" function which only takes the "description" and optional arguments for the other fields

It occurs to me that rather than providing both of:

    mkExpr ?loc ?attr descr

    mkLet other args

we probably want to provide:

    mkLet ?loc ?attr other args

which can easily be implemented with tagged records.

frisch (developer)
2013-04-08 10:46

> This probably comes under the category of "premature optimisation"

I believe this is indeed the case!

> Why would I not just use
> Record of (Longident.t loc * expression) list

The whole discussion is about simplifying patterns like:

 Record [{txt=Lident "x"},ex; {txt=Lident "y"},ey] -> ...


 Record [Lident "x",ex; Lident "y",ex] -> ...

(or maybe even: Record ["x",ex; "y",ex] -> ... if we also support auto-focusing on a constructor)


 Record l when List.exists (function {txt=Lident "x"} -> true | _ -> false) -> ...


 Record l when List.exists (function Lident "x" -> true | _ -> false) -> ...

Worst: I tend to enable warning 9 (labels not bound in a record pattern), which makes the version with records even more tedious:

 Record [{txt=Lident "x";_},ex; {txt=Lident "y";_},ey] -> ...

> we probably want to provide:
> mkLet ?loc ?attr other args

It really depends. While it is nice to have such builder function (and this is precisely what the new Ast_helper module in the extension_points branch does), it is also useful to have access to mkExpr. Consider for instance OCaml's parser.mly: it defines a standard "mkexp" function which sets the location (using symbol_rloc()), which is then reused directly in all rules creating expressions. In extension_points, there is also a mkexp_attrs which processes "inline" attributes and extension node notions. How would that look if the expression type was defined directly as a sum type with shared arguments?

Another example would be the implementation of a term algebra with hash-consing. Each node would have a unique id which has to be assigned automatically if the "term description" cannot be found in a memoization table. We need a way to manipulate the term description (without the unique id), if only to pass it to the builder function is charge of the hash-consing.

- Issue History
Date Modified Username Field Change
2012-06-29 14:01 frisch New Issue
2012-06-29 14:01 frisch File Added: autofocus.diff
2012-06-29 14:02 frisch Description Updated View Revisions
2012-06-30 03:28 garrigue Note Added: 0007626
2012-06-30 20:21 gasche Note Added: 0007628
2012-06-30 20:37 protz Note Added: 0007629
2012-07-02 18:54 frisch Note Added: 0007630
2012-07-09 11:11 garrigue Note Added: 0007663
2012-07-09 11:14 garrigue Note Edited: 0007663 View Revisions
2012-07-17 14:33 doligez Note Added: 0007763
2012-07-17 14:35 doligez Status new => acknowledged
2012-07-17 17:26 frisch Note Added: 0007766
2012-07-17 17:26 frisch Note Edited: 0007766 View Revisions
2012-07-17 17:26 frisch Note Edited: 0007766 View Revisions
2012-07-17 18:41 gasche Note Added: 0007767
2012-07-17 18:54 frisch Note Added: 0007768
2012-07-17 19:45 gasche Note Added: 0007769
2012-07-17 21:31 lpw25 Note Added: 0007771
2012-07-18 09:29 frisch Note Added: 0007774
2012-07-18 13:16 frisch Target Version => 4.01.0+dev
2012-07-18 14:57 frisch Note Added: 0007776
2012-07-20 00:23 lefessan Note Added: 0007781
2012-07-20 09:46 frisch Note Added: 0007783
2012-07-31 13:36 doligez Target Version 4.01.0+dev => 4.00.1+dev
2012-08-02 09:26 xleroy Note Added: 0007866
2012-08-02 09:26 xleroy Target Version 4.00.1+dev => later
2012-09-15 07:18 frisch Note Added: 0008075
2012-09-15 09:22 garrigue Note Added: 0008076
2012-09-15 12:22 frisch Note Added: 0008080
2013-01-24 18:03 didc Note Added: 0008794
2013-01-25 10:43 didc Note View State: 0008794: private
2013-01-25 23:46 didc Note Deleted: 0008794
2013-04-04 15:33 frisch Note Added: 0009033
2013-04-04 17:16 lpw25 Note Added: 0009034
2013-04-04 17:22 gasche Note Added: 0009035
2013-04-04 17:22 lpw25 Note Edited: 0009034 View Revisions
2013-04-04 19:03 frisch Note Added: 0009036
2013-04-05 19:10 lpw25 Note Added: 0009040
2013-04-06 13:07 lpw25 Note Added: 0009041
2013-04-06 13:08 lpw25 Note Edited: 0009041 View Revisions
2013-04-06 13:08 lpw25 Note Edited: 0009040 View Revisions
2013-04-08 10:46 frisch Note Added: 0009044

Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker