New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using well-disciplined type-propagation to disambiguate label and constructor names #5759
Comments
Comment author: @garrigue As I told before, I have some 5 lines of code somewhere to make this work for records. |
Comment author: @alainfrisch Our code base has many instances of declarations with identical names (we currently disambiguate syntactically by prefixing with the type name, using a local hack), so it might be a good test case for a better solution. I volunteer for testing your patch! |
Comment author: @garrigue OK, I created a branch record-disambiguation, which allows field access based on types. |
Comment author: @alainfrisch
We could apply Datarepr.label_descrs. Not very clean (nor efficient), of course! It would make a lot of sense to keep the label descriptions directly in the Type_record constructor. Instead of: | Type_record of one would have: | Type_record of label_description list (and we would keep the Ident.t in lbl_name). The code structure needs to be adapted a little bit, to create the type declaration and the labels (/constructors) at the same time, but I don't see any big problem to do so.
It is a common case in our code base... I was actually thinking about issuing a warning when an "ambiguous" label or constructor is used without enough type information flowing in to disambiguate it. (Depending on the order of type declarations is a bad idea.) |
Comment author: @gasche I'm not fond of Jacques' type-based implicit module qualification. I don't really see myself explaining to a surprised beginner reading some code "yeah, in this case you don't need to give the module path, because it was somehow inferred". The nice thing with Alain's proposed semantics is that we're not adding yet another different way to do the same thing, but only using type information to enable a new feature that requires it. I can see myself explaining to a beginner "yeah, you should really avoid having duplicate label names available in the same environment, but if you do you can put annotations to disambiguate" (aside: Agda has type-based disambiguation of constructor names and its users are quite fond of it). I wouldn't personally use that feature much, but if it is considered I strongly support the use of a warning in all ambiguous cases. Finally, I would like insist on the fact that I don't expect beginners to really understand how type information is propagated, and to adopt an optimistic model of "expect it to work and, if not, add type annotations". This means that what you think of as "if type information is available and is M.t, I can prefix the constructor with the path M" is likely to become, in the mind of the beginner, "I can access constructors under any path M implicitly". |
Comment author: @garrigue Gabriel, did you read my note? Concerning the semantics, I have given it some thought over many years, and the catch is of course that we must keep backward compatibility. |
Comment author: @gasche I did indeed read your note, but understood it as "I implemented a first desirable feature, whose more ambitious refinement will have to wait for a larger code change". Sorry for this misunderstanding.
Indeed, this is yet another issue. Writing M.(r.f) feels equivalent to r.M.f to the user, but the first is ambiguous and not the second. Or you could choose a different visibility behavior, where opening a module still shadows the previous definition(s) in scope, and the only ambiguity is between two labels defined in the same module. Users would then to have module-qualification to distinguish between modules, and type-annotation between in-module label reuses. Not sure if this is really intuitive, and doesn't cover Alain's use case of allowing name reuse between parsedtree and typedtree. |
Comment author: @alainfrisch Jacques: I had to add a call to Btype.repr in Env.ty_path, to have 'make world' succeed in the branch (Env.ty_path gets called with a "Tlink(Tconstr("Odoc_types.info",[],[]))"). I thought that lbl_res would always be a Tconstr node. Do you understand why the Btype.repr is now needed? |
Comment author: @garrigue I have now committed a version that does "the right thing", by keeping the required information in the environment. Hopefully there is still no need to bootstrap. The fix to the Tlink problem is the last commit. --- typing/typecore.ml (revision 12943)
|
Comment author: @garrigue I have just committed in record-disambiguation@12945 a complete solution, which supports field access, field mutation, record construction and pattern-matching. Note that combining two strategies under the same syntax can result in a slightly complex specification.
See examples in tests/typing-misc/records.ml. Variant constructors are not yet supported. |
Comment author: @alainfrisch Great! I'll have a look at it soon (not next week, though).
I think it would be better to look for the "last" defined type which contains all the labels (and give a warning if there are several such types). This is what we do in our version, and it only accepts more programs. It's already a good (syntactic) disambiguation scheme (for records; this doesn't apply so easily to sum types). |
Comment author: @garrigue Actually the other option I was thinking of was just to revert to the original behavior when there was no type or qualifier. I do not like much looking up types according to multiple labels. This looks robust, but one needs a completely new notion of multi-key lookup. Moreover, if we start having it for record creation and pattern-matching, why not have it for field access (with delayed resolution) too ? This is doable, but this is a real language change, as we need polymorphic records during typing. |
Comment author: @alainfrisch
The can be implemented with the current environment, using a find_all on the first label to get a list of all matching record types, and then filtering (and issuing a warning if several record types remain). The extension to field access seems much more complex, I'm not proposing such a language change (especially because disambiguation based on type information gives a simple solution). FWIW, we rarely use field accesses. Since the introduction of record punning and the warning on "non exhaustive" record patterns, we have good incentives to rely almost exclusively on record patterns. |
Comment author: @alainfrisch There is a bad interaction with private types. The following piece of code: module rec X : sig let () = ignore {x = 2} fails with: Error: Cannot create values of the private type X.t |
Comment author: @gasche tests/typing-private has no testcase with recursive module (typing-recmod/t16ok.ml has a private type but it's not used in a recursive module either). Maybe you could add this one to the test suite? |
Comment author: @alainfrisch One way to fix this issue is to restrict Mtype.enrich_typedecl: ===================================================================
=================================================================== I'm not totally convinced by this patch, because:
|
Comment author: @alainfrisch Dual problem: while in trunk the following is rejected: type t = {x: int} this is now accepted in the branch. Actually, one could argue that the definition of s should be rejected (it is accepted in the trunk and in the branch), because the manifest type is more public than the declaration itself. |
Comment author: @garrigue The original code looks strange... but maybe this is for recursive modules. |
Comment author: @alainfrisch Another problem: unbound labels raise an uncaught Not_found error (e.g. compiling the program "{x=0}"). What about calling Typetexp.find_label instead of Env.lookup_label in the case 2) of type_label_a_list? |
Comment author: @alainfrisch I've uploaded patch_use_all_fields.diff, which use all fields in record expressions and patterns to look for a type (for expressions, it looks for a record type with exactly the listed labels; for patterns, it looks for a record type with at least the listed labels). Note that we don't need to change the environment to implement this. The multi-label lookup loops over all types declaring the first label, and then finds the first one with the expected remaining labels. It might be slightly inefficient if users creates hundreds of types sharing the same labels. At LexiFi, we have been using this multi-label lookup for several years and we are quite happy about it. It is not clear to me how important it remains in presence of type-based disambiguation, but it might well be a useful addition (especially effective when using record patterns instead of the dot notation, as we do most of the time). An interesting variation could be to use the subset semantics for record pattern, only when the pattern explicitly allows for more fields ("; _"). Unfortunately, this breaks backward compatibility. |
Comment author: @alainfrisch One could even argue that the multi-label lookup is better behaved. Using the type of the first label and then type-checking the whole record with this expected type makes type-checking dependent of the order of fields in the source code (even in -principal mode), which is not very nice. |
Comment author: @garrigue Fixed the uncaught Not_found. Concerning the problem with recursive modules, the situation is more complex. Now, if we start using the expanded type, we run into problems, because its labels may have a different status. Also, your modification to enrich_typedecl is wrong, because it would result in removing equations between internal and external types, which are needed to solve (partially) the double vision problem. |
Comment author: @alainfrisch Shouldn't we always apply Ctype.repr when matching lbl_res (or the application of instance to lbl_res)? I've got an assertion failure in type_label_a_list after merging your branch on our local version (so I cannot directly reproduce it), because ty_res.desc was not a Tconstr (calling Ctype.repr solved it). |
Comment author: @alainfrisch I believe the call to Env.find_type in Typecore.expand_path should be protected. It can raise a Not_found exception which would be left uncaught, if a .cmi file is missing. Example: bug.ml: type t = {x:int} now compile bug.ml, bug2.ml, remove bug.cmi, and compile bug3.ml --> Not_found. |
Comment author: @alainfrisch After adding a call to Ctype.repr in type_label_a_list and protecting the call to Env.find_type in expand_path, I'm able to compile all our code base with a local merge of this branch. Hurrah! |
Comment author: @garrigue OK, I protected the call to Env.find_type. Concerning lbl_res, I think this is an instance of my comment 8133: there is an invariant that lbl_res should be a Tconstr, not a Tlink, but it can be easily broken if you do anything on lbl_res without taking an instance, in particular calling expand_head. Finally, the discussion about how to resolve labels in the absence of type annotation. My idea was just a small improvement over the current approach: use the first label Your idea of using the labels is more powerful, but it is a change of vision compared with |
Comment author: @alainfrisch
FWIW, we use this ability to compile with missing cmi's quite a lot in our code base, to protect internal data structures of libraries (and make them transparent only in selected parts of the code). We did not find any problem (like uncaught Not_found exceptions) with this approach. Most of calls to Env.find_type are directly protected with a Not_found handler. Maybe it would be better to have it return an optional type.
This is indeed backward compatible, but it breaks the property that (at least in -principal mode) type-checking does not depend on the order of fields in the record expressions/patterns. Multi-label lookup is also backward compatible and it preserves this property. Also, it fits nicely with the new overall strategy to type check records (see my diff for type_label_a_list: just a single line change).
Yes, indeed. (Although I believe that qualified labels are much less useful in presence of type-based resolution.)
The main change of vision is to resolve labels from types rather than from the environment. Now the question is how the type is deduced from the labels if no explicit type information is available. Switching from the rule "Look for the last type declaration in the current scope which defines the first label listed in the record" to "Look for the last type declaration in the current scope which defines all the labels listed in the record" does not seem such a big change of vision.
What would this simple analysis be? I don't see how to do something simple, predictible, and robust to minor code changes.
Maybe. I won't argue too much for inclusion in OCaml, but I guess we'll keep the multi-label lookup internally, even in presence of type-based selection (not only for backward compatibility of our code, but also to preserve the property that the order of fields does not matter). |
Comment author: @alainfrisch Added a patch to add type-based disambiguation for constructors. |
Comment author: @lpw25 Personally, I find the proposed semantics a bit awkward, and potentially confusing. The problem is that it attempts to find the label from the type information (ignoring the environment) and then falls back on using the environment. This means that removing a type annotation can lead to an "Unknown label/constructor" error, which I think most programmers would find very confusing. An alternative semantics would be to allow the environment to contain multiple labels/constructors with the same name, and try to use the type information to disambiguate them. If they can't be disambiguated by type information then the most recently declared is chosen. The main practical difference between the two semantics is that the alternative semantics would not allow an unqualified reference to a label or constructor that is not present in the environment. This means that changing type information can only lead to a type error, not an "Unknown label" error. I have attached a patch which implements the alternative semantics (including variant constructors, and disambiguation using the other fields in a record). Alain, do these semantics cover the use cases in your code base? And can your code base be compiled using my patch? |
Comment author: @garrigue The original patch from years ago was only about disambiguating the dot notation for record access. Now, when we start combining type information with other kinds of disambiguation mechanisms I agree this becomes hairy, and I'm not really convinced. |
Comment author: @bobzhang alain, if this is pushed to trunk, will anyone to rename parsetree, typetree to make the name more consistent? I am glad to see the changes ;-) |
Comment author: @garrigue Alain, I'm just waiting a bit to be sure that everybody is satisfied. |
Comment author: @gasche I would personally appreciate if the behavior respected the symmetry between record construction and pattern-matching on variant constructors; currently, both the {Foo.Bar.l1 = ...; l2 = ...} and the multi-name lookup are reserved to labels and could be extended to variant matching. I'm planning to consider writing patches for those and propose them to submission. I don't think you should delay trunk-merging of the current patchset because of this. Besides, because of the implementation difficulties, it may be best in any case if those features land in a second step, rather than bundled with the current patchset. |
Comment author: @garrigue Gabriel, I'm very reticent about doing multi-label lookup for sums. |
Comment author: @alainfrisch Does anyone have an opinion about allowing type-information to flow from the bound expression to the pattern (instead of the other way around) in a construction like "let {x; y} = foo in ..."? |
Comment author: @gasche
Indeed, but I would feel better doing some work on my side first before drawing you into an argument. Alain > given that the type-information propagation is a part of the observable (and therefore which-should-be-specified) static semantics of the language, I'd rather have a specification as clear as possible. Can you come up with a good spec that is good for you, that Jacques is ready to implement? I would personally be ok with a backtracking semantics if it allows to restore the symmetry, but... |
Comment author: @alainfrisch
My hope was that Jacques would come with such a proposal :-) I would actually be rather comfortable with type-checking: let p = e in ... as match e with p -> ... plus the expected generalization. The only case where I really want type-information to flow from the lhs to the rhs is the explicit type annotation: let p : t = e in ... but it's rather straightforward to translate that to: let p = (e : t) in ... and IIRC, this is already what the parser or type-checker does. I did not think about recursive bindings yet, maybe there are cases where it's really important to type patterns first. |
Comment author: @garrigue Actually, there is no theory yet about propagating type information into patterns... |
Comment author: @garrigue Thinking a bit more about the handling of let.
|
Comment author: @alainfrisch
There is some complexity in type_let, but only to support recursive definitions. Otherwise, this is quite simple. I think that different warnings are raised for unused bindings arising from let and from match, but the behavior is the same otherwise, and the code is prepared to be parametrized by the kind of warning to raise. Anyway, if you do the rest of the work, I promise to adapt the warnings :-) |
Comment author: @lpw25
This seems reasonable for unqualified labels, however I disagree for qualified labels. I think that the examples below both show errors that are easier to understand when the error message includes the type of the label. I also think that the Wrong_name error implies that the label was looked for in the type itself, which is not true for qualified labels. I would rather the error better reflected what the typechecker is actually doing for qualified labels. It seems to me that we should probably add a new error, for cases where a qualified label/constructor does not have the expected type. Here is an example of the new error in practice: module N = struct module M = struct type foo = Foo end end;;module N : sig module M : sig type foo = Foo end end module M = struct type foo = Foo end;;module M : sig type foo = Foo end type foo2 = M.foo;;type foo2 = M.foo open N;;let r : foo2 = M.Foo;;Characters 15-20: Here is another example, this time with an ambiguous constructor: module M = struct type foo = Foo end;;module M : sig type foo = Foo end type foo2 = M.foo;;type foo2 = M.foo module M = struct type foo = Foo type bar = Foo end;;module M : sig type foo = Foo type bar = Foo end let r : foo2 = M.Foo;;Characters 15-20: I think that it is much clearer from these error messages what is going on. It is clear that the type checker looked up the constructor, found one or two types that it might have belonged to, but they didn't match the type that was expected. In the better-errors.diff patch I also made some changes to the wording of errors/warnings. In particular, I improved the new warning messages and I replaced all uses of "label" with "field" since that is what they are called in the OCaml manual. I think these changes are probably worth including. Also, since Alain fixed the typo in env.ml, I don't think we still need the call to mark_type_used in typecore.ml that your comment describes as strange. |
Comment author: @gasche While you're fixing typos in passing, there is a minuscule one at line 658 of typecore.ml: 655: let labels' = List.filter check_ids labels in The "&" should be "&&". |
Comment author: @garrigue OK, I've merged new-error.diff, and switched from label to field (also removing other uses of label for records). (I was not aware of the wording in the manual, and the source code only uses "label"...) I also removed the useless line (I remember checking that it was no longer needed, but clearly I forgot to commit that). And changed & into && (which is strictly equivalent...) I didn't change the warning. Do you really think it makes a difference? |
Comment author: @lpw25
Not particularly. The warnings can always be reworded later if people don't understand them. I think all the warnings and errors are in good shape now. I look forward to seeing this extension in the trunk. |
Comment author: @garrigue The branch record-disambiguation was merged into trunk on 2012-12-06, at revision 13112. |
Comment author: @bobzhang Shall we add an explicit warning here when such fancy label resolving technique is applied? My concern is that it's hard to write code which is compatible with 4.00 or elder compilers. Thanks |
Comment author: @alainfrisch We generally aim at preserving backward compatibility (not breaking code accepted by a previous version of the compiler, except for good reasons). If you want to write code which can be compiled with both an old and a recent version of OCaml, you should thus use the old version during development. I don't think we want to add new warning for each new language feature or improvement to the type-inference introduced in newer releases. |
Comment author: @bobzhang Using the old compiler seems to reasonable, but not practical. I TA a compiler class using ocaml, but some students used a very old compiler 3.11. The problem with this new feature is that it is easy to be used unintentionally, unlike other features (let open XX in...). Thanks |
Comment author: @garrigue Added warning 42 for disambiguated labels and constructors, as requested by hongboz. |
Original bug ID: 5759
Reporter: @alainfrisch
Assigned to: @garrigue
Status: resolved (set by @garrigue on 2012-12-07T14:31:22Z)
Resolution: fixed
Priority: normal
Severity: minor
Target version: 4.01.0+dev
Fixed in version: 4.01.0+dev
Category: typing
Related to: #6000 #7386
Child of: #6951
Parent of: #5525 #5848 #6784
Monitored by: @bobzhang @gasche @diml @ygrek @jmeber @hcarty @alainfrisch
Bug description
Now that the type-checker has a well-disciplined way to propagate type information (with a way to ensure principality), one should allow explicit type information to disambiguate constructor and label names. This seems to be a low-hanging fruit.
Not being to use in a given context two types sharing some name is an annoying issue with the language, with several workarounds but no satisfactory solution. Being able to share labels has become more important with the introduction of punning. Moreover, having nice short names for AST constructors (in the compiler itself) has become more important with -bin-annot and -ppx, since people are actually going to pattern match and build AST fragments. Later, if we add some "runtime type" features, concrete names used in the type declaration will affect the semantics of programs (e.g. by being directly visible in textual representation of values), making the "prefixing" work-around even less appealing.
File attachments
The text was updated successfully, but these errors were encountered: