New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Autofocusing pattern on record field to simplify matching trees with meta-data #5667
Comments
Comment author: @garrigue This makes sense (I suppose that many people already had this idea). In some way, a mechanism of pattern macros would be less intrusive. |
Comment author: @gasche I was just going to mention the mechanism of pattern synonyms. When Conor McBride came to Paris a few months ago, he mentioned it as an improvement he suggested for Haskell, but that would also make sense for OCaml. http://hackage.haskell.org/trac/haskell-prime/wiki/PatternSynonyms The syntax he demonstrated was of the form: pattern p x y z = Foo (x, [y], z) (In his mind, In this case, one could write: pattern apply (e1, e2) = {pmod_desc = Pmod_apply(e1, e2); ...} I think there is also value in Alain's proposition of type-directed code inference (here pattern inference), but maybe it should be a more syntactically explicit. |
Comment author: @protz FWIW, I often wished I was able to abstract on patterns. The code I'm writing right often performs the same set of patterns: match t with I'd be happy to abstract that as: pattern p (t) = A (B t) and write match t with |
Comment author: @alainfrisch Jacques: the semantics of the language already depend on the type system. For instance in module A = struct include (M : sig end) let () = print_endline "AAA" end the semantics depends on the signature constraint. One can argue that there is some kind of elaboration semantics, but then it's hard to conclude that the semantics doesn't depend on the type system. If we mark explicitly the record declaration with the field on which "auto-focusing" is enabled, and the elaboration keeps this mark "at runtime", the dynamic semantics is defined without further reference to type inference (a constructor pattern applied to a record value "reduces" to the same pattern applied to the marked field). |
Comment author: @garrigue Oops, I wrote an answer that was never sent. At the very least, I would suggest an explicit syntax:
I.e., this should be an extension of the record syntax, not of pattern matching in general. |
Comment author: @damiendoligez Alain, the common idiom that you describe is probably not the best way to add location annotations. Then you have to define a "location_of" function for every type, but we have good syntactic tools to do this kind of thing. |
Comment author: @alainfrisch Damien: the solution with intermediate record nodes to store extra information is wide-spread, and is used everywhere in the OCaml code base itself, as you know well. Adding extra arguments to all constructors requires to add extra wildcard patterns to many places, and is not very nice conceptually (if a concept applies to all kinds of nodes, it should be placed at the level of the node, not under all constructors). Jacques: (I think you miss a closing } before body_def. And {...} around the whole pattern.) Yes, an explicit syntax might be worth the little syntactic overhead, I'm not sure (the extra {...} make the pattern somehow less readable). More important to me would be to make it explicit on the record definition site which field is subject to this trick. |
Comment author: @gasche Note that the "patterns synonyms" feature would be able to handle both style (factor our the record layer, or factor out the additional location field); if applied consistently they would even allow to change from one representation to the other without having to change the pattern-matching sites. (I've mentioned this feature idea to Luc, and he seemed quite interested in it. I hope we can come up with a reasonable proposal some time after the release.) |
Comment author: @alainfrisch Gabriel: I'm eager to hear more about a proposal on pattern synonyms! Could you give an idea on how a declaration like: pattern apply (e1, e2) = {pmod_desc = Pmod_apply(e1, e2); _} would be used? Something like match e with ? This looks very nice. It'd be a little bit tedious to define an alias for each constructor, but it works. One also needs to be careful about constant alias (with no capture variable), in order to distinguish them from capture variables. (Maybe aliases should be uppercase identifiers?) One could also allow calling arbitrary function and matching the result. This is a much more complex approach, of course. (The idea would be that a pattern like: f p -> ... would call function f and match its result with p.) |
Comment author: @gasche Yes, something along those lines -- I was hoping to smuggle a currified syntax for patterns, and the use site may get some syntactic noise to avoid ambiguities, but that's the idea. Your latter proposal of applying a function (looks like view patterns) makes me suspicious. The problem with such proposals (also internalized "when" for example) is that the interaction between the pattern matching compilation and the fact of executing arbitrary user code are a very messy problem. For once, it's hard to specify if the function will be called at all, in which order and how many times. Even more problematic, the compiler may check the presence of some head constructor, then execute your function that would mutate the data by changing the head constructor, and then the optimized parameter access code would provoke a segfault. You are right that using the pattern synonyms in the present use case (locations) involve some boilerplate. You have to choose between defining patterns for each use case, as in your "apply" example, or defining a more generic pattern pattern desc p = {pmod_desc = p; pmod_loc = _loc} match e with which provides a much lesser syntactic gain (we could have essentially the same thing by choosing a shorter field name) in exchange for less boilerplate. Note, as an independent side remark, the implicit introduction of a "_loc" identifier that captures the innermost location, in the spirit of Camlp4 quotations. I think having some boilerplate is a reasonable price to pay for the use of a relatively low-tech solution; it's certainly less sophisticated than type-based pattern inference as you suggest. Hopefully that mean we have less ways to get it wrong, an maybe it could be useful in more different situations -- but I readily admit that this nested record example is so far my only practical example to motivate the feature. (The difficulties will be, I suppose, with whether we choose to expose it through module boundaries; conceptually we'd like to be able both to keep them abstract and publish only their "type" (matched type plus environment effect) or to publish their code to allow for more performant compilation. In practice I'm not sure the current pattern compilation techniques would handle abstract patterns, so they may be always-concrete for now.) |
Comment author: @lpw25 As mentioned in the comments on feature request #5528, "inline records for constructor arguments" could also be a solution to the specific problem of location fields. If the inline record constructors were allowed to share fields, perhaps with an explicit syntax like: type expr = These could be pattern matched as Foo {...} and Bar {...}, but also have their loc fields accessed directly as e.loc. |
Comment author: @alainfrisch Gabriel: your arguments about side effects while running code during pattern matching already apply to lazy patterns. But it's certainly to be taken into account. Patterns aliases are like syntactic macros; they could almost be implemented as a syntax extension (e.g. using the new -ppx flag :-)). For instance, an AST filter could detect (and remove) a declaration like: let alias {pmod_desc = Pmod_apply(e1, e2); _} = Apply (e1, e2) and use it to rewrite any pattern: Apply (p1, p2) into {pmod_desc = Pmod_apply(p1, p2); _} A deeper support in the compiler is required to get better scoping, exporting to interfaces, etc. If we do that, this is no longer a really low-tech approach; at least, it's going to be more intrusive in the language definition than my proposal. |
Comment author: @alainfrisch Another variant of my proposal, committed to branch autofocus. It introduces an explicit syntax on the record declaration: type t = { (only one field can be marked as 'match', not checked for now) When the type-checker encounters a pattern, with an expected type referencing such a record declaration with a 'match' field, and the pattern under consideration cannot be a record pattern (not a record, variable, alias, constraint, or-pattern), it focuses on the marked field. This is more explicit than the original proposal, and also slightly more general since this is not restricted to sum types. |
Comment author: @lefessan @alain: Does the order of unification matters for your patch ? If yes, shouldn't you add the restriction that 's' is not a polymorphic variable ? Otherwise, does your patch correctly computes the correct type for both 'f1' and 'f2' in this example ? type 'a ref = { match mutable content : 'a } let f1 x = |
Comment author: @alainfrisch @fabrice: yes, it does, unless you compile with -principal. The patch relies on the same type propagation technology as used for GADTs. When type-checking without -principal, f1 is rejected and f2 is accepted. In -principal mode, both as rejected, as expected. Moreover, if you add the constraint (x : 'a option ref) on the argument, all four cases are accepted. |
Comment author: @xavierleroy Changing target version, as this is a new language feature with deep implications. |
Comment author: @alainfrisch Jacques: one more point about your proposal of an explicit syntax. I guess that the proposed syntax will cause conflicts in the yacc grammar, or will require significant effort. In particular, something starting with "{A1.A2.A3." could be either (1) a normal record pattern with a qualified first label, (2) a "focused" record pattern with a sub-pattern being qualified constructor name, or (3) a "record override" construction ("A1.A2.A3." being also a prefix of several kinds of valid expressions). |
Comment author: @garrigue Alain: Yes, I know that there is already a lot of things in record syntax. By the way, my understanding was that focusing was only for patterns, |
Comment author: @alainfrisch
I don't see any real issue with semantics. For instance, in the variant of the proposal with the explicit 'match' annotation on the type declaration: when a record value is matched against a pattern of one of the authorized forms, matching continues with the designated field and the same pattern.
Indeed! |
Comment author: @alainfrisch Making it easier to pattern match on Parsetree fragments will become more and more important as we move towards -ppx based preprocessors instead of Camlp4. Of course, the proposal is not specific to the OCaml Parsetree itself and would also benefit to other AST encodings using a similar representation. Solutions which have been discussed in the thread:
I still believe that 1 is the less intrusive change to the compiler and syntax, and the most direct way to provide a good experience to people writing patterns on Parsetree fragments. I don't think it is a very deep philosophical switch on the semantical side, especially if we pick the variant with an explicit annotation on target field on the record type declaration and if we restrict focusing to the case where the pattern is a constructor. I also ok with Jacques' variant, but I'd then prefer a very light syntax. 3 and 4 require much more pervasive changes to the language definition and compiler and affect significantly the way the ASTs have to be defined (for 3, we need to define synonyms for each constructor, which mean picking new names, etc). 4 in particular does not scale well if many "satellite" (locations + attributes + ...) fields have to be added to each constructor. Moreover, doing 1 or 2 does not prevent from adding 3 or 4 to the language later. For 2, I'd suggest to use a single-character prefix operator (for instance, "&") in front of the constructor pattern, as in:
Since we're making good progress on the proposal "-ppx + extension_points", it's a good time to resurrect this discussion... Please comment! |
Comment author: @lpw25
I don't see how it would not scale: type expr = I think that these "tagged records" are a much more direct solution to the The problem with data types like Parsetree.module_expr, which have different "Tagged records" solve this problem by combining the best data layout (like |
Comment author: @gasche Just a remark (I'm not weighting in for inline records, altough I do like this approach): I'm not sure those "external" qualifiers would be necessary for runtime performance. With GADTs I've already seen cases where access to fields went without runtime tag test. I wouldn't be surprised to hear that this case (access to the first argument present in all constructors) already generates optimal code. Otherwise it could probably be recognized and optimized, provided there is a convincing argument that this is actually important for performances. |
Comment author: @alainfrisch Leo: are you concerned with the runtime performance and memory usage? Because otherwise I don't see why "tagged record" is a more appropriate data layout. I'd argue quite the contrary, as it didn't make it explicit that all constructors share some common fields. Splitting the common part in a dedicated record and restricting the variable part to a inner sum type is a better separation of concerns than flattening everything so that it fits in a single memory block. What's the "external" qualifier in your example? If it is supposed to tell the compiler that all constructors share the common part, it's kind of weird having to duplicate the field definition in all constructors and have the compiler check that they all match. If we start adding annotations to type definitions in order to help the compiler pick a good runtime representation, I'd rather do the opposite and mark the "description" field within the enclosing record as being "inlined". But I'm not concerned with performance at all here, and the current Parsetree representation definition just seems more logical to me and it allows us to write cleaner code (in my opinion). For instance, we can expose a "maker" function which only takes the "description" and optional arguments for the other fields. And a generic "map" function does not have to copy all satellite fields for each constructors. The same technique also allows better code factorization of type definitions (as in "'a loc"). How would you deal with 'a loc with inline records? You'd have to extract its location field in the enclosing data structure, turning something like: Record of (Longident.t loc * expression) list into Record of (Longident.t * Location.t * expression) list or rather introduce an extra record: Record of record_field list But then pattern matching is not really made simpler, and you cannot pass the "Longident.t loc" as a single object, thus polluting a lot of code. (Applying the same inlining trick to "Longident.t loc" does not work because inner nodes don't have location; and this would not work for "string loc".) Don't get me wrong: I'd love to get inline records (that's why I've proposed them!), including in the definition of OCaml's parsetree, but only to name constructor arguments, not to merge location and attribute fields in all constructors. I also agree that in some cases tweaking the type definition for performance reasons can be useful and that inline records can be a good tool, but this is a different topic. |
Comment author: @lpw25
It annoys me that I have to choose between easy access to shared fields and the data layout that seems most suitable. This probably comes under the category of "premature optimisation", but it is frustrating nonetheless.
I admit that the duplication of field names is not ideal. I chose that syntax as it was the smallest change from ordinary variant constructors. Maybe something like: type t = would be better (since we are going to move all type-conv extensions over to ppx :)).
This is probably a matter of taste.
Indeed, there is always a choice between combining data into a single data structure and being able to represent a "partial" data structure. This is probably orthogonal to the issue of tagged records: if some shared fields have no default do you put them in a separate sub-record or leave them inlined in a single data structure?
I'm afraid I don't really understand your point here. Why would I not just use Record of (Longident.t loc * expression) list |
Comment author: @lpw25
It occurs to me that rather than providing both of:
we probably want to provide:
which can easily be implemented with tagged records. |
Comment author: @alainfrisch
I believe this is indeed the case!
The whole discussion is about simplifying patterns like: Record [{txt=Lident "x"},ex; {txt=Lident "y"},ey] -> ... into: Record [Lident "x",ex; Lident "y",ex] -> ... (or maybe even: Record ["x",ex; "y",ex] -> ... if we also support auto-focusing on a constructor) or: Record l when List.exists (function {txt=Lident "x"} -> true | _ -> false) -> ... into: Record l when List.exists (function Lident "x" -> true | _ -> false) -> ... Worst: I tend to enable warning 9 (labels not bound in a record pattern), which makes the version with records even more tedious: Record [{txt=Lident "x";},ex; {txt=Lident "y";},ey] -> ...
It really depends. While it is nice to have such builder function (and this is precisely what the new Ast_helper module in the extension_points branch does), it is also useful to have access to mkExpr. Consider for instance OCaml's parser.mly: it defines a standard "mkexp" function which sets the location (using symbol_rloc()), which is then reused directly in all rules creating expressions. In extension_points, there is also a mkexp_attrs which processes "inline" attributes and extension node notions. How would that look if the expression type was defined directly as a sum type with shared arguments? Another example would be the implementation of a term algebra with hash-consing. Each node would have a unique id which has to be assigned automatically if the "term description" cannot be found in a memoization table. We need a way to manipulate the term description (without the unique id), if only to pass it to the builder function is charge of the hash-consing. |
Comment author: @alainfrisch I've attached a new patch (autofocus2.diff) against the current trunk, making use of attributes to mark fields that support autofocusing. type t = {desc [@Focus] : [
|
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc. |
We had this discussion almost ten years ago, but I still believe that pattern synonyms would be nicer than the proposed feature of using type information to guess an implicit (runtime-relevant) coercion. Since then I have worked more on the implementation of pattern-matching in OCaml, so maybe it would be easier to try to implement it. (But then we all still have many things to do!) |
Original bug ID: 5667
Reporter: @alainfrisch
Status: acknowledged (set by @damiendoligez on 2012-07-17T12:35:57Z)
Resolution: open
Priority: normal
Severity: feature
Target version: later
Category: typing
Monitored by: @bobzhang tgazagna @lefessan @diml @ygrek lavi @jmeber "Julien Signoles" @hcarty @Chris00 @yakobowski
Bug description
A common idiom in OCaml when defining an AST is:
i.e. wrap each node in a record type to attach some meta-data (like the location). With such a type definition, it becomes quickly cumbersome to write non-trivial patterns (e.g. to implement optimizations, or AST rewriting). This has even been used as an argument against extending some of the intermediate languages in OCaml itself with location information (to make locations available to gdb, IIRC).
One might want to look at "views" or "active patterns" as generic solutions to such problems. Here, I'd like to propose a little patch which implements the following ad hoc strategy: when a constructor pattern [p] is type-checked in a context where a record type is expected (this information is now propagated, as it is required by GADTs), and the record type has a single field [l] whose type is a variant, then the pattern is automatically wrapped into a pattern {l = p}. There are several ways to improve the cases covered by the strategy (e.g. consider only fields which are variant types that contain the specified constructor), and one could prefer to mark explicitly the record type declaration so as to enable automatically this "auto-focusing" hack (by specifying the field explicitly).
As an example, here is how a pattern matching can be simplified by this hack:
The patch is quite ad hoc, but I believe it covers a lot of concrete cases. I don't expect the patch to be integrated in its current form, but hopefully this proposal will trigger some interesting discussions.
File attachments
The text was updated successfully, but these errors were encountered: