Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using well-disciplined type-propagation to disambiguate label and constructor names #5759

Closed
vicuna opened this issue Sep 15, 2012 · 114 comments
Closed
Assignees
Milestone

Comments

@vicuna
Copy link

vicuna commented Sep 15, 2012

Original bug ID: 5759
Reporter: @alainfrisch
Assigned to: @garrigue
Status: resolved (set by @garrigue on 2012-12-07T14:31:22Z)
Resolution: fixed
Priority: normal
Severity: minor
Target version: 4.01.0+dev
Fixed in version: 4.01.0+dev
Category: typing
Related to: #6000 #7386
Child of: #6951
Parent of: #5525 #5848 #6784
Monitored by: @bobzhang @gasche @diml @ygrek @jmeber @hcarty @alainfrisch

Bug description

Now that the type-checker has a well-disciplined way to propagate type information (with a way to ensure principality), one should allow explicit type information to disambiguate constructor and label names. This seems to be a low-hanging fruit.

Not being to use in a given context two types sharing some name is an annoying issue with the language, with several workarounds but no satisfactory solution. Being able to share labels has become more important with the introduction of punning. Moreover, having nice short names for AST constructors (in the compiler itself) has become more important with -bin-annot and -ppx, since people are actually going to pattern match and build AST fragments. Later, if we add some "runtime type" features, concrete names used in the type declaration will affect the semantics of programs (e.g. by being directly visible in textual representation of values), making the "prefixing" work-around even less appealing.

File attachments

@vicuna
Copy link
Author

vicuna commented Sep 15, 2012

Comment author: @garrigue

As I told before, I have some 5 lines of code somewhere to make this work for records.
So this is only a political decision :-)
For pattern-matching, this is a little bit more complicated, due to the interaction
with polymorphic variant inference.
I need to find a solution for that, as this causes trouble with GADTs too.

@vicuna
Copy link
Author

vicuna commented Sep 17, 2012

Comment author: @alainfrisch

Our code base has many instances of declarations with identical names (we currently disambiguate syntactically by prefixing with the type name, using a local hack), so it might be a good test case for a better solution. I volunteer for testing your patch!

@vicuna
Copy link
Author

vicuna commented Sep 19, 2012

Comment author: @garrigue

OK, I created a branch record-disambiguation, which allows field access based on types.
The patch is attached to this PR.
There is an important limitation: this patch only uses the type to determine the module in which the type was defined, and prefix the (unqualified) label with this module path. If the label is defined in several types in the same module, this may well fail.
Probably the cleanest behavior would be to obtain the label information from the type itself. However, it is not currently available in the environment (we would need a labels_by_path, similar to constrs_by_path). Additionally, one may question the wisdom of allowing to define labels such that only explicit type annotations can select them.

@vicuna
Copy link
Author

vicuna commented Sep 19, 2012

Comment author: @alainfrisch

obtain the label information from the type itself

We could apply Datarepr.label_descrs. Not very clean (nor efficient), of course! It would make a lot of sense to keep the label descriptions directly in the Type_record constructor. Instead of:

| Type_record of
(Ident.t * mutable_flag * type_expr) list * record_representation

one would have:

| Type_record of label_description list

(and we would keep the Ident.t in lbl_name). The code structure needs to be adapted a little bit, to create the type declaration and the labels (/constructors) at the same time, but I don't see any big problem to do so.

allowing to define labels such that only explicit type annotations can select them

It is a common case in our code base...

I was actually thinking about issuing a warning when an "ambiguous" label or constructor is used without enough type information flowing in to disambiguate it. (Depending on the order of type declarations is a bad idea.)

@vicuna
Copy link
Author

vicuna commented Sep 19, 2012

Comment author: @gasche

I'm not fond of Jacques' type-based implicit module qualification. I don't really see myself explaining to a surprised beginner reading some code "yeah, in this case you don't need to give the module path, because it was somehow inferred".
Besides, we already have good features to alleviate module access, namely local open, with a compact syntax, so I think you could do something equally convenient, less surprising and more explicit. It may be worth it to look at situations where current local open is not satisfying: does it work in patterns, for example? Or maybe the current state of "declaring a module alias has a heavy semantics with unsavory consequences" is a problem for conveniently short module names, and could be improved.

The nice thing with Alain's proposed semantics is that we're not adding yet another different way to do the same thing, but only using type information to enable a new feature that requires it. I can see myself explaining to a beginner "yeah, you should really avoid having duplicate label names available in the same environment, but if you do you can put annotations to disambiguate" (aside: Agda has type-based disambiguation of constructor names and its users are quite fond of it). I wouldn't personally use that feature much, but if it is considered I strongly support the use of a warning in all ambiguous cases.

Finally, I would like insist on the fact that I don't expect beginners to really understand how type information is propagated, and to adopt an optimistic model of "expect it to work and, if not, add type annotations". This means that what you think of as "if type information is available and is M.t, I can prefix the constructor with the path M" is likely to become, in the mind of the beginner, "I can access constructors under any path M implicitly".
This suggests a very different model of ambiguity in the case of Jacque's feature: if you write the constructor/field K without a path, the implementor point of view is "that can denote any constructor K currently in scope or, if type information is available and shows module M, M.K", but the user point of view is likely to be "it can denote any constructor K available in the environment under any module path". Under this view, I think it is unreasonable to combine both Alain's and Jacques' suggested behavior, because there should be an ambiguity (which ought to result in a warning) as soon as two arbitrarily deep submodules share the same constructor name.

@vicuna
Copy link
Author

vicuna commented Sep 20, 2012

Comment author: @garrigue

Gabriel, did you read my note?
I was talking about a quick implementation explicitly stating that the behavior was not as thorough as expected.
The reason is that using directly type information in a clean way requires changing the environment data structure and/or type representation, which in turn requires bootstrapping, which is heavy for a very first prototype.
But it is clearly the way to go, as label shadowing would make understanding the behavior complicated (note that the current module qualification mechanism is already broken in this respect, it would be better to fix it too).

Concerning the semantics, I have given it some thought over many years, and the catch is of course that we must keep backward compatibility.
When type information is available, we can use it and completely ignore where the label is defined.
However, when it is not, or when there are qualified labels, we need to look them up in the environment, using the current rule.
As you point out, combining the two mechanism may not be very intuitive, and this makes error reporting hard (we don't know the programmer's intent).
Fortunately, the worse think that may happen is a type error.
Concerning ambiguity detection, this is another problem.
This requires a way to keep track of all unqualified labels defined in the environment, but Ident.keys seems to provide that already, so this is actually not too difficult.
This would probably have to be yet another optional warning, since people using the old approach intentionally open modules to use a specific label definition, hiding previous ones.

@vicuna
Copy link
Author

vicuna commented Sep 20, 2012

Comment author: @gasche

I did indeed read your note, but understood it as "I implemented a first desirable feature, whose more ambitious refinement will have to wait for a larger code change". Sorry for this misunderstanding.

Concerning ambiguity detection, this is another problem.
This would probably have to be yet another optional warning,
since people using the old approach intentionally open modules
to use a specific label definition, hiding previous ones.

Indeed, this is yet another issue. Writing M.(r.f) feels equivalent to r.M.f to the user, but the first is ambiguous and not the second. Or you could choose a different visibility behavior, where opening a module still shadows the previous definition(s) in scope, and the only ambiguity is between two labels defined in the same module. Users would then to have module-qualification to distinguish between modules, and type-annotation between in-module label reuses. Not sure if this is really intuitive, and doesn't cover Alain's use case of allowing name reuse between parsedtree and typedtree.

@vicuna
Copy link
Author

vicuna commented Sep 20, 2012

Comment author: @alainfrisch

Jacques: I had to add a call to Btype.repr in Env.ty_path, to have 'make world' succeed in the branch (Env.ty_path gets called with a "Tlink(Tconstr("Odoc_types.info",[],[]))"). I thought that lbl_res would always be a Tconstr node. Do you understand why the Btype.repr is now needed?

@vicuna
Copy link
Author

vicuna commented Sep 21, 2012

Comment author: @garrigue

I have now committed a version that does "the right thing", by keeping the required information in the environment. Hopefully there is still no need to bootstrap.

The fix to the Tlink problem is the last commit.
One should never call expand_head directly on type definitions, I suppose.
It should also apply to the previous patch (but at a different location).

--- typing/typecore.ml (revision 12943)
+++ typing/typecore.ml (working copy)
@@ -2360,7 +2360,8 @@
let record = {record with exp_type = instance env record.exp_type} in
begin try
let (label_path,label) = Env.lookup_label lid.txt env in

  •  match ty_exp.desc, (expand_head env label.lbl_res).desc with
    
  •  let ty_res = instance Env.empty label.lbl_res in
    
  •  match ty_exp.desc, (expand_head env ty_res).desc with
       Tconstr(p1,_,_), Tconstr(p2,_,_) when not (Path.same p1 p2) ->
         raise Exit
     | _ -> (record, label_path, label)
    

@vicuna
Copy link
Author

vicuna commented Sep 21, 2012

Comment author: @garrigue

I have just committed in record-disambiguation@12945 a complete solution, which supports field access, field mutation, record construction and pattern-matching.

Note that combining two strategies under the same syntax can result in a slightly complex specification.
The worst case is record construction:

  • we first check if there is a qualified field name, and use all the labels in its type to handle unqualified fields
  • otherwise we use the labels from the type propagated from the context, either by a type annotation or a source expression (with construct)
  • if there is no type from the context, we use finally the type from the first unqualified label
    The last case could be discussed, as it means that label order has an impact on typability.
    Add to that principality checks, where the warning should be disabled if there is another way to reach the same result...

See examples in tests/typing-misc/records.ml.
Hopefully error messages are clear enough that the semantics is not too confusing.

Variant constructors are not yet supported.
There is also an interaction with polymorphic variants for pattern-matching, which results in types from the context being ignored if a polymorphic variant appears in one of the patterns.

@vicuna
Copy link
Author

vicuna commented Sep 22, 2012

Comment author: @alainfrisch

Great! I'll have a look at it soon (not next week, though).

  • if there is no type from the context, we use finally the type from the first unqualified label

I think it would be better to look for the "last" defined type which contains all the labels (and give a warning if there are several such types). This is what we do in our version, and it only accepts more programs. It's already a good (syntactic) disambiguation scheme (for records; this doesn't apply so easily to sum types).

@vicuna
Copy link
Author

vicuna commented Sep 24, 2012

Comment author: @garrigue

Actually the other option I was thinking of was just to revert to the original behavior when there was no type or qualifier.
I only implemented lookup of the first label because it was simpler...

I do not like much looking up types according to multiple labels. This looks robust, but one needs a completely new notion of multi-key lookup. Moreover, if we start having it for record creation and pattern-matching, why not have it for field access (with delayed resolution) too ? This is doable, but this is a real language change, as we need polymorphic records during typing.

@vicuna
Copy link
Author

vicuna commented Sep 24, 2012

Comment author: @alainfrisch

but one needs a completely new notion of multi-key lookup

The can be implemented with the current environment, using a find_all on the first label to get a list of all matching record types, and then filtering (and issuing a warning if several record types remain).

The extension to field access seems much more complex, I'm not proposing such a language change (especially because disambiguation based on type information gives a simple solution). FWIW, we rarely use field accesses. Since the introduction of record punning and the warning on "non exhaustive" record patterns, we have good incentives to rely almost exclusively on record patterns.

@vicuna
Copy link
Author

vicuna commented Oct 16, 2012

Comment author: @alainfrisch

There is a bad interaction with private types. The following piece of code:

module rec X : sig
type t = private {x: int}
end = struct
type t = {x: int}

let () = ignore {x = 2}
end

fails with:

Error: Cannot create values of the private type X.t

@vicuna
Copy link
Author

vicuna commented Oct 16, 2012

Comment author: @gasche

tests/typing-private has no testcase with recursive module (typing-recmod/t16ok.ml has a private type but it's not used in a recursive module either). Maybe you could add this one to the test suite?

@vicuna
Copy link
Author

vicuna commented Oct 16, 2012

Comment author: @alainfrisch

One way to fix this issue is to restrict Mtype.enrich_typedecl:

===================================================================
--- typing/mtype.ml (revision 13019)
+++ typing/mtype.ml (working copy)
@@ -152,6 +152,7 @@
try
let orig_decl = Env.find_type p env in
if orig_decl.type_arity <> decl.type_arity

  •  || orig_decl.type_private <> decl.type_private
       then decl
       else {decl with type_manifest =
               Some(Btype.newgenty(Tconstr(p, decl.type_params, ref Mnil)))}
    

===================================================================

I'm not totally convinced by this patch, because:

  1. I don't understand why the problem did not appear before, e.g. for mutable field assignments.

  2. I don't know if the fix would break existing code.

@vicuna
Copy link
Author

vicuna commented Oct 16, 2012

Comment author: @alainfrisch

Dual problem: while in trunk the following is rejected:

type t = {x: int}
type s = t = private {x: int}
let () = ignore {x = 2}

this is now accepted in the branch. Actually, one could argue that the definition of s should be rejected (it is accepted in the trunk and in the branch), because the manifest type is more public than the declaration itself.

@vicuna
Copy link
Author

vicuna commented Oct 17, 2012

Comment author: @garrigue

The original code looks strange... but maybe this is for recursive modules.
Give me one more day before I look into it.

@vicuna
Copy link
Author

vicuna commented Oct 17, 2012

Comment author: @alainfrisch

Another problem: unbound labels raise an uncaught Not_found error (e.g. compiling the program "{x=0}"). What about calling Typetexp.find_label instead of Env.lookup_label in the case 2) of type_label_a_list?

@vicuna
Copy link
Author

vicuna commented Oct 17, 2012

Comment author: @alainfrisch

I've uploaded patch_use_all_fields.diff, which use all fields in record expressions and patterns to look for a type (for expressions, it looks for a record type with exactly the listed labels; for patterns, it looks for a record type with at least the listed labels).

Note that we don't need to change the environment to implement this. The multi-label lookup loops over all types declaring the first label, and then finds the first one with the expected remaining labels. It might be slightly inefficient if users creates hundreds of types sharing the same labels.

At LexiFi, we have been using this multi-label lookup for several years and we are quite happy about it. It is not clear to me how important it remains in presence of type-based disambiguation, but it might well be a useful addition (especially effective when using record patterns instead of the dot notation, as we do most of the time). An interesting variation could be to use the subset semantics for record pattern, only when the pattern explicitly allows for more fields ("; _"). Unfortunately, this breaks backward compatibility.

@vicuna
Copy link
Author

vicuna commented Oct 17, 2012

Comment author: @alainfrisch

One could even argue that the multi-label lookup is better behaved. Using the type of the first label and then type-checking the whole record with this expected type makes type-checking dependent of the order of fields in the source code (even in -principal mode), which is not very nice.

@vicuna
Copy link
Author

vicuna commented Oct 18, 2012

Comment author: @garrigue

Fixed the uncaught Not_found.

Concerning the problem with recursive modules, the situation is more complex.
The reason the problem didn't appear before is that we didn't try to expand the type of a label before using it for lookup. So the status private/public depended on the label itself, not the type it belongs to.

Now, if we start using the expanded type, we run into problems, because its labels may have a different status.
The right thing to do is to expand just until we can see the labels, and no further.
I implemented that in the branch.

Also, your modification to enrich_typedecl is wrong, because it would result in removing equations between internal and external types, which are needed to solve (partially) the double vision problem.
(The arity case is probably for bootstrapping recursive modules, starting with parameterless types.)

@vicuna
Copy link
Author

vicuna commented Oct 18, 2012

Comment author: @alainfrisch

Shouldn't we always apply Ctype.repr when matching lbl_res (or the application of instance to lbl_res)? I've got an assertion failure in type_label_a_list after merging your branch on our local version (so I cannot directly reproduce it), because ty_res.desc was not a Tconstr (calling Ctype.repr solved it).

@vicuna
Copy link
Author

vicuna commented Oct 18, 2012

Comment author: @alainfrisch

I believe the call to Env.find_type in Typecore.expand_path should be protected. It can raise a Not_found exception which would be left uncaught, if a .cmi file is missing. Example:

bug.ml: type t = {x:int}
bug2.ml: type t = Bug.t = {x:int}
bug3.ml: open Bug2;; let f ({x} : t) = x

now compile bug.ml, bug2.ml, remove bug.cmi, and compile bug3.ml --> Not_found.

@vicuna
Copy link
Author

vicuna commented Oct 18, 2012

Comment author: @alainfrisch

After adding a call to Ctype.repr in type_label_a_list and protecting the call to Env.find_type in expand_path, I'm able to compile all our code base with a local merge of this branch. Hurrah!

@vicuna
Copy link
Author

vicuna commented Oct 19, 2012

Comment author: @garrigue

OK, I protected the call to Env.find_type.
There are probably other places where missing cmi's break the compiler...

Concerning lbl_res, I think this is an instance of my comment 8133: there is an invariant that lbl_res should be a Tconstr, not a Tlink, but it can be easily broken if you do anything on lbl_res without taking an instance, in particular calling expand_head.
I think this is important to keep this invariant, otherwise lbl_res could end up pointing to another (equivalent) path, which may have an impact with recursive modules.

Finally, the discussion about how to resolve labels in the absence of type annotation.
This is problematic, because we want something simple, and backward compatible, knowing
that there are already two ways to lookup labels: direct, or using a module path qualifier.

My idea was just a small improvement over the current approach: use the first label
(or qualified label if there is one) to determine the record type, and then lookup the labels
in this record. The idea was just to have a more homogeneous behavior with the
annotated case: always lookup individual labels from the type rather than from
the environment.

Your idea of using the labels is more powerful, but it is a change of vision compared with
the current approach. It practice it may be nicer, but then why stop there, and not do like
in Caml heavy, choosing the record type according to all label uses? (Basing it on type inference
would be problematic, but a simple analysis might be sufficient.)
Also, it is important to support the qualified case in your approach too, which means
also doing lookup for qualified labels.
In the end don't we end up being too clever, while there is an easy way to disambiguate,
using a type annotation?

@vicuna
Copy link
Author

vicuna commented Oct 19, 2012

Comment author: @alainfrisch

There are probably other places where missing cmi's break the compiler...

FWIW, we use this ability to compile with missing cmi's quite a lot in our code base, to protect internal data structures of libraries (and make them transparent only in selected parts of the code). We did not find any problem (like uncaught Not_found exceptions) with this approach.

Most of calls to Env.find_type are directly protected with a Not_found handler. Maybe it would be better to have it return an optional type.

My idea was just a small improvement over the current approach: use the first label (or qualified label if there is one) to determine the record type, and then lookup the labels in this record.

This is indeed backward compatible, but it breaks the property that (at least in -principal mode) type-checking does not depend on the order of fields in the record expressions/patterns. Multi-label lookup is also backward compatible and it preserves this property. Also, it fits nicely with the new overall strategy to type check records (see my diff for type_label_a_list: just a single line change).

Also, it is important to support the qualified case in your approach too, which means also doing lookup for qualified labels.

Yes, indeed. (Although I believe that qualified labels are much less useful in presence of type-based resolution.)

Your idea of using the labels is more powerful, but it is a change of vision compared with the current approach.

The main change of vision is to resolve labels from types rather than from the environment. Now the question is how the type is deduced from the labels if no explicit type information is available. Switching from the rule "Look for the last type declaration in the current scope which defines the first label listed in the record" to "Look for the last type declaration in the current scope which defines all the labels listed in the record" does not seem such a big change of vision.

It practice it may be nicer, but then why stop there, and not do like
in Caml heavy, choosing the record type according to all label uses? (Basing it on type inference would be problematic, but a simple analysis might be sufficient.)

What would this simple analysis be? I don't see how to do something simple, predictible, and robust to minor code changes.

In the end don't we end up being too clever, while there is an easy way to disambiguate, using a type annotation?

Maybe. I won't argue too much for inclusion in OCaml, but I guess we'll keep the multi-label lookup internally, even in presence of type-based selection (not only for backward compatibility of our code, but also to preserve the property that the order of fields does not matter).

@vicuna
Copy link
Author

vicuna commented Oct 19, 2012

Comment author: @alainfrisch

Added a patch to add type-based disambiguation for constructors.

@vicuna
Copy link
Author

vicuna commented Oct 22, 2012

Comment author: @lpw25

Personally, I find the proposed semantics a bit awkward, and potentially confusing.

The problem is that it attempts to find the label from the type information (ignoring the environment) and then falls back on using the environment. This means that removing a type annotation can lead to an "Unknown label/constructor" error, which I think most programmers would find very confusing.

An alternative semantics would be to allow the environment to contain multiple labels/constructors with the same name, and try to use the type information to disambiguate them. If they can't be disambiguated by type information then the most recently declared is chosen.

The main practical difference between the two semantics is that the alternative semantics would not allow an unqualified reference to a label or constructor that is not present in the environment. This means that changing type information can only lead to a type error, not an "Unknown label" error.

I have attached a patch which implements the alternative semantics (including variant constructors, and disambiguation using the other fields in a record).

Alain, do these semantics cover the use cases in your code base? And can your code base be compiled using my patch?

@vicuna
Copy link
Author

vicuna commented Oct 23, 2012

Comment author: @garrigue

The original patch from years ago was only about disambiguating the dot notation for record access.
In that case there is no real alternative that I can think of, and I think it makes sense.
The question about error messages is only about error messages: the message could say "there was no type information and no label definition was found in the environment".
The reason we need to look up the environment is of course backward compatibility.

Now, when we start combining type information with other kinds of disambiguation mechanisms I agree this becomes hairy, and I'm not really convinced.
In some way, Pierre's proposal of writing "t^.C" for constructor C of type t is cleaner, since it avoids trying to propagate type information in more complex situations. In particular backward propagation (needed for variant constructor) and propagation inside patterns are much less well understood than the direct propagation used for the dot notation.

@vicuna
Copy link
Author

vicuna commented Nov 8, 2012

Comment author: @bobzhang

alain, if this is pushed to trunk, will anyone to rename parsetree, typetree to make the name more consistent? I am glad to see the changes ;-)

@vicuna
Copy link
Author

vicuna commented Nov 9, 2012

Comment author: @garrigue

Alain, I'm just waiting a bit to be sure that everybody is satisfied.
Then I'll send a mail to caml-devel (probably next week).

@vicuna
Copy link
Author

vicuna commented Nov 9, 2012

Comment author: @gasche

I would personally appreciate if the behavior respected the symmetry between record construction and pattern-matching on variant constructors; currently, both the {Foo.Bar.l1 = ...; l2 = ...} and the multi-name lookup are reserved to labels and could be extended to variant matching.

I'm planning to consider writing patches for those and propose them to submission. I don't think you should delay trunk-merging of the current patchset because of this. Besides, because of the implementation difficulties, it may be best in any case if those features land in a second step, rather than bundled with the current patchset.

@vicuna
Copy link
Author

vicuna commented Nov 9, 2012

Comment author: @garrigue

Gabriel, I'm very reticent about doing multi-label lookup for sums.
I was already hesitating for records, but eventually accepted because record construction/destruction(in patterns) use multiple labels in a single syntactic construct.
However, there is no such syntactic construct for variants: constructors are always taken individually.
To do multi-label lookup, you would need to change the interpretation of pattern-matching as a whole.
(And a patch does not replace a discussion.)

@vicuna
Copy link
Author

vicuna commented Nov 9, 2012

Comment author: @alainfrisch

Does anyone have an opinion about allowing type-information to flow from the bound expression to the pattern (instead of the other way around) in a construction like "let {x; y} = foo in ..."?

@vicuna
Copy link
Author

vicuna commented Nov 9, 2012

Comment author: @gasche

(And a patch does not replace a discussion.)

Indeed, but I would feel better doing some work on my side first before drawing you into an argument.

Alain > given that the type-information propagation is a part of the observable (and therefore which-should-be-specified) static semantics of the language, I'd rather have a specification as clear as possible. Can you come up with a good spec that is good for you, that Jacques is ready to implement? I would personally be ok with a backtracking semantics if it allows to restore the symmetry, but...

@vicuna
Copy link
Author

vicuna commented Nov 9, 2012

Comment author: @alainfrisch

given that the type-information propagation is a part of the observable (and therefore which-should-be-specified) static semantics of the language, I'd rather have a specification as clear as possible. Can you come up with a good spec that is good for you, that Jacques is ready to implement? I would personally be ok with a backtracking semantics if it allows to restore the symmetry, but...

My hope was that Jacques would come with such a proposal :-)

I would actually be rather comfortable with type-checking:

let p = e in ...

as

match e with p -> ...

plus the expected generalization. The only case where I really want type-information to flow from the lhs to the rhs is the explicit type annotation:

let p : t = e in ...

but it's rather straightforward to translate that to:

let p = (e : t) in ...

and IIRC, this is already what the parser or type-checker does.

I did not think about recursive bindings yet, maybe there are cases where it's really important to type patterns first.

@vicuna
Copy link
Author

vicuna commented Nov 9, 2012

Comment author: @garrigue

Actually, there is no theory yet about propagating type information into patterns...
So we will have to come up with something reasonable.
But I agree that propagation from expression is easier to formalize that the other way round, so this may be possible. The only problem is that the typing of let expressions is very complicated (and unused-detection warnings do not help there :-)

@vicuna
Copy link
Author

vicuna commented Nov 9, 2012

Comment author: @garrigue

Thinking a bit more about the handling of let.
Of course, if this is a "let rec", we have to type the pattern first, so we cannot propagate from the expression to the pattern.
But fortunately, in ocaml, the two are explicitly distinguished.
And since type_cases now generalizes type variables, we could completely delegate non-recursive lets to type_cases (patterns with GADTs are already delegated).
I see just two problems:

  • explicit polymorphic types. Currently they are handled by let, but maybe we could handle them as normal type annotations.
  • unused identifier warnings. I'm afraid they work differently with type_let and type_cases. Or am I wrong.

@vicuna
Copy link
Author

vicuna commented Nov 9, 2012

Comment author: @alainfrisch

  • unused identifier warnings. I'm afraid they work differently with type_let and type_cases. Or am I wrong.

There is some complexity in type_let, but only to support recursive definitions. Otherwise, this is quite simple. I think that different warnings are raised for unused bindings arising from let and from match, but the behavior is the same otherwise, and the code is prepared to be parametrized by the kind of warning to raise. Anyway, if you do the rest of the work, I promise to adapt the warnings :-)

@vicuna
Copy link
Author

vicuna commented Nov 10, 2012

Comment author: @lpw25

Namely, if you get such an error, your problem is with the wrong type, and you are going to search where it is coming from. Knowing the "right" type doesn't help solving the problem.
Worse, if the label was ambiguous, this may not even be the right type.
So my conclusion is that it is not really useful to have a special case for non-ambiguous qualified labels or constructors.

This seems reasonable for unqualified labels, however I disagree for qualified labels. I think that the examples below both show errors that are easier to understand when the error message includes the type of the label.

I also think that the Wrong_name error implies that the label was looked for in the type itself, which is not true for qualified labels. I would rather the error better reflected what the typechecker is actually doing for qualified labels.
However, your point about ambiguity is a good one, and it is clear that Label_mismatch is not an appropriate error either.

It seems to me that we should probably add a new error, for cases where a qualified label/constructor does not have the expected type.
I have implemented such an error in the new-error.diff patch.

Here is an example of the new error in practice:

module N = struct module M = struct type foo = Foo end end;;

module N : sig module M : sig type foo = Foo end end

module M = struct type foo = Foo end;;

module M : sig type foo = Foo end

type foo2 = M.foo;;

type foo2 = M.foo

open N;;

let r : foo2 = M.Foo;;

Characters 15-20:
let r : foo2 = M.Foo;;
^^^^^
Error: The constructor M.Foo belongs to the variant type N.M.foo
but a constructor was expected belonging to the variant type
foo2 = M.foo

Here is another example, this time with an ambiguous constructor:

module M = struct type foo = Foo end;;

module M : sig type foo = Foo end

type foo2 = M.foo;;

type foo2 = M.foo

module M = struct type foo = Foo type bar = Foo end;;

module M : sig type foo = Foo type bar = Foo end

let r : foo2 = M.Foo;;

Characters 15-20:
let r : foo2 = M.Foo;;
^^^^^
Error: The constructor M.Foo belongs to one of the following variant types:
M/1024.bar M/1024.foo
but a constructor was expected belonging to the variant type
foo2 = M/1018.foo

I think that it is much clearer from these error messages what is going on. It is clear that the type checker looked up the constructor, found one or two types that it might have belonged to, but they didn't match the type that was expected.

In the better-errors.diff patch I also made some changes to the wording of errors/warnings. In particular, I improved the new warning messages and I replaced all uses of "label" with "field" since that is what they are called in the OCaml manual. I think these changes are probably worth including.

Also, since Alain fixed the typo in env.ml, I don't think we still need the call to mark_type_used in typecore.ml that your comment describes as strange.

@vicuna
Copy link
Author

vicuna commented Nov 10, 2012

Comment author: @gasche

While you're fixing typos in passing, there is a minuscule one at line 658 of typecore.ml:

655: let labels' = List.filter check_ids labels in
656: if keep && labels' = [] then (false, labels) else
657: let labels'' = List.filter check_closed labels' in
658: if keep & labels'' = [] then (false, labels') else (true, labels'')

The "&" should be "&&".

@vicuna
Copy link
Author

vicuna commented Nov 11, 2012

Comment author: @garrigue

OK, I've merged new-error.diff, and switched from label to field (also removing other uses of label for records). (I was not aware of the wording in the manual, and the source code only uses "label"...)
I agree that error messages should reflect what the compiler does, and this is the case now.

I also removed the useless line (I remember checking that it was no longer needed, but clearly I forgot to commit that).

And changed & into && (which is strictly equivalent...)

I didn't change the warning. Do you really think it makes a difference?

@vicuna
Copy link
Author

vicuna commented Nov 11, 2012

Comment author: @lpw25

I didn't change the warning. Do you really think it makes a difference?

Not particularly. The warnings can always be reworded later if people don't understand them.

I think all the warnings and errors are in good shape now.

I look forward to seeing this extension in the trunk.

@vicuna
Copy link
Author

vicuna commented Dec 7, 2012

Comment author: @garrigue

The branch record-disambiguation was merged into trunk on 2012-12-06, at revision 13112.

@vicuna vicuna closed this as completed Dec 7, 2012
@vicuna
Copy link
Author

vicuna commented Feb 5, 2013

Comment author: @bobzhang

Shall we add an explicit warning here when such fancy label resolving technique is applied? My concern is that it's hard to write code which is compatible with 4.00 or elder compilers. Thanks

@vicuna
Copy link
Author

vicuna commented Feb 5, 2013

Comment author: @alainfrisch

We generally aim at preserving backward compatibility (not breaking code accepted by a previous version of the compiler, except for good reasons). If you want to write code which can be compiled with both an old and a recent version of OCaml, you should thus use the old version during development. I don't think we want to add new warning for each new language feature or improvement to the type-inference introduced in newer releases.

@vicuna
Copy link
Author

vicuna commented Feb 5, 2013

Comment author: @bobzhang

Using the old compiler seems to reasonable, but not practical. I TA a compiler class using ocaml, but some students used a very old compiler 3.11. The problem with this new feature is that it is easy to be used unintentionally, unlike other features (let open XX in...). Thanks

@vicuna
Copy link
Author

vicuna commented Feb 19, 2013

Comment author: @garrigue

Added warning 42 for disambiguated labels and constructors, as requested by hongboz.
Committed in trunk, revision 13297.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants