New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GADT exhaustiveness check incompleteness #6437
Comments
Comment author: mqtthiqs A more concise version of the same bug (I guess): type 'a t = T1 : ('a * 'b) t | T2 : 'a t -> ('a * 'b) t let bug : type a. a t * a u -> unit = function reports a warning for non-exhaustive pattern-matching; but whatever the value of t is, the second argument cannot be U1 (this branch 4 is impossible). Note that eta-expanding the first argument (ie. uncommenting branches 2 and 3) the warning disappears. Hope it helps. |
Comment author: @garrigue This is a well known limitation of the way the exhaustiveness check works for GADTs requires a constructor to "seed" counterexamples. Doing it on the basis of the type alone could cause infinite loops. By the way, this is not a bug, since we know that the exhaustiveness check cannot be complete, and do not provide any specification for it other than it should be sound (i.e. detect all non-exhaustive cases). |
Comment author: mqtthiqs Hello Jacques. Is the first piece of code (in the bug report, not in the comment) of the same kind? I don't see how it could be helped by "seeding" counterexamples. |
Comment author: @garrigue Indeed the example in the bug report is a simpler case. let rec f : type g1 g2. g1 var * (g1,g2) ctx -> g2 var = function But the cause of the warning is the same: using the original order, Progress report: I've tried to solve these problem (including 6220) |
Comment author: @garrigue The patch attached to 6403 does not solve this case: not clear where to hook the expansion here. |
Comment author: @garrigue After looking more carefully, this case is really interesting. The solution seems to be to do the splitting when typing the pattern (to prove that it cannot be typed). The tricky part is that we need to do backtracking in the typing function Typecore.type_pat. |
Comment author: @garrigue The patch type_pat_cps.diff implements a new approach, which can fully exploit typing information. |
Comment author: @garrigue Brittle indeed: this doesn't bootstrap yet, due to an interaction with objects. |
Comment author: @garrigue Cleaner patch. Could simplify parmatch more, as there seems to be no need to do 2 passes. |
Comment author: @garrigue type_pat_cps3.diff additionally simplifies parmatch, merging the two passes. |
Comment author: @garrigue Created a branch in the subversion server. |
Comment author: @garrigue Still thinking about what to do without too much cost.
Exh. (current) | 3.1s | 11.4s | 100s The cost is not big, but still about 3 percents. An alternative solution would be to be more explicit about which pattern-matching to analyze.
to mean that pattern pat is not reached.
to mean that we want explicitly to check that the last case is not needed. let harder : would force exploding the subterms of PlusS. My only concern is that it is more verbose, and feels like an overkill. (*) http://www.math.nagoya-u.ac.jp/~garrigue/papers/index.html |
Comment author: @garrigue Here are a few more times, for make library on another machine. Note that most of the cost is GADT related. And there seems to be no clear bottleneck (except that depths higher than 1 do not seem to be a good idea...) |
Comment author: @gasche As discussed during ICFP, I would be very happy to have an explicit syntax to claim that some patterns are unreachable. I think that would solve many problems with this proposal (by letting users express their intent in a way that can direct the choice of depth), and I suspect that code author could also find it a useful help to reason about the correctness of their GADT assumptions. I'm not very happy with the syntax you suggest, as "-> _" to me reads exactly like "the right-hand-side is to be inferred or elaborated later". I think a nice syntax for an unreachable pattern should not have a "->" symbol in the first place; if we have to keep it for conflicts-avoidance reason, I would rather piggy-back on something like "(assert false)[@unreachable]", or plain "[%unreachable]". |
Comment author: @garrigue Ok, my syntax proposal was just trying not to confuse the tools trying to parse ocaml code, by making it look like a real case. And while _ is used in Coq for the refine tactic, it is not used in ocaml for anything like that, and even if it were used for implicit arguments this would not be just after an arrow. This said, I don't care as long as this is not going to take more characters: I don't think we want the user to have to write a lot just to say that a pattern-matching is finished. match e with I find it less explicit, but it indeed doesn't have an arrow... But please, this is enough work to implement that, do not compose it with a flame war on syntax. |
Comment author: @yallop It's useful to be able to write unmatchable patterns at any point in a match rather than just at the end. From that perspective the rhs-underscore proposal | pat -> _ is better than the arrow-free syntax | pat which interacts badly with or-patterns anywhere except at the end. (I hope this doesn't count as a syntax flame.) |
Comment author: @trefis
I had missed the part between parentheses at the time. |
Comment author: @garrigue IIRC recent versions of the branch fix the change for warnings (only explode or-patterns when they contain gadts). |
Comment author: @lpw25 One thing that is not entirely clear to me about these proposals is whether they mean this case is absurd (i.e. it cannot happen due to the types of indices) or this case is redundant/unreachable (i.e. it cannot happen due to some combination of the types of indices and the cases above it in the match statement). They both seem potentially useful, and I'm not sure which one is being proposed. |
Comment author: @garrigue This is the second one (previous cases are taken into account). This is the whole point of combining this with the exhaustiveness/redundancy check. Thanks to that, in most cases one would only have to add a wild pattern clause, the remaining cases to check being automatically inferred. |
Comment author: @gasche what about | done or | false This would give for example: let rec f : type g1 g2. (g1, g2) ctx * g1 var -> g2 var = function let rec f : type g1 g2. (g1, g2) ctx * g1 var -> g2 var = function |
Comment author: @garrigue I've added refuted cases to gadt-warnings (revision 16500). A small explanation of how it works:
Here are some concrete examples from the testsuite: type zero = Zero |
Comment author: @alainfrisch If the syntax "pat -> " is retained, I'd prefer to consider "" as a new kind of expression that can appear anywhere, syntactically (instead of making the rhs field of "case" optional). This minimizes the impact on the Parsetree (and thus the breakage of existing code), and would allow to use this "_" in other contexts (e.g. for attribute/extension payloads). It could even make sense to have such "unreachable" expression in other contexts, in the future (with the intent of letting the compiler check unreachability). For instance: let _ = f ... in that would work only if the compiler can prove that f cannot return a value (based on its type). (That said, and without entering the syntax flame, I find this choice of "" unfortunate, as it prevents to use it to future extensions where "" would be a more natural syntactic candidate.) |
Comment author: @gasche
This is precisely my point. We already have an expression marker for "unreachable code", it is (assert false). If you want to have a "->" (which I think in this particular case is wrong, but then I agree finding a satisfying syntax is difficult), I would rather have | p [@empty] -> assert false Note using an attribute to express intent here is backward-compatible and upward-compatible:
I think that a good longer syntax is better than a bad shorter syntax in this instance. We are talking about complex code (the user is doing non-trivial reasoning here), so the code author is not in a hurry. Your (Jacques') idea of implicitly assuming "_ [@empty] -> assert false" after each one-pattern construction is excellent and will solve the one case that really needs concision. |
Comment author: @alainfrisch I agree with most of Gabriel's points, except that I'm not a big fan of the idea of using attributes to affect the way type-checker works, except w.r.t. warnings (technically, we could argue that the current discussion is about the exhaustiveness check, which only triggers a warning, but this one should really be a strong error). |
Comment author: @garrigue Well, the real point is that this is not about empty patterns, but unreachable right-hand side. |
Comment author: @garrigue Also, the reason it should be compact: The idea is that the programmer should just add the line
to a pattern matching involving gadts, and expect it to work. Another reason you want to put it is that it enables exploding in the redundancy check, which is more symmetric. So, the worse thing would be that to be seen as an advanced feature you only use to optimize your code. It allows you to write better code, not just faster code. |
Comment author: @lpw25
I'm slightly unclear on this part. If you don't enable exploding in the redundancy check will you still get a warning, or is the default no warning. It is also not clear why adding this case should improve the redundancy check. Asserting that a case is unreachable says nothing about the redundancy of cases above it, and it seems pretty arbitrary to treat it as if it did. |
Comment author: @lpw25 I haven't checked the grammar properly but how about: | _ -> * I think it '' slightly better than '' because '' feels like it is saying "there is something here but I'm not specifying it" whereas '' feels like it is saying "anything could go here" (taking its traditional wildcard meaning from globs). It is also unlikely to be given special meaning elsewhere since it is a binary operator in both expressions and types -- so any future use would already be quite restricted. |
Comment author: @trefis I just noticed that I also get:
both before and after f4f858. I assume this is because of the "&& q.pat_desc = Tpat_any" here https://github.com/ocaml/ocaml/blob/trunk/typing/parmatch.ml#L1929 , which I think shouldn't be there. |
Comment author: @trefis Indeed if I remove that part of the condition the example works as expected and the testsuite still passes. |
Comment author: @garrigue No, this is the intended behavior. |
Comment author: @gasche Indeed, I would expect to get some sort of warning if I write an unnecessary absurd clause (do we agree to call this clauses "absurd clauses", or what is the right vocabulary?). It may be useful, to avoid Thomas' confusion, to use a different warning message when the unnecessary clause is an absurd one. I would suggest (with the de rigueur apology for my verbosity): You may drop the part after the comma and still get a decent warning message. |
Comment author: @trefis
Yes that's what I expected, but this is probably a result of me not reading this thread carefully enough.
Fair enough. The warning specialization for "absurd clauses" might be nice then, but I don't have a strong opinion. |
Comment author: @gasche My understanding is that using a final "| _ -> ." is to be understood as a statement of intention of exhaustivity: I declare the pattern-matching above to be complete, with all cases covered. And in particular that means: compiler, please work harder unfolding cases to check that I'm not mistaken. |
Comment author: @trefis Thank you, sorry for the noise. |
Comment author: @gasche Hopefully performing code reviews that clarify features and fix type-checking bugs are not yet considered "noise" around here! |
Comment author: @lpw25
But in this case the warning is only being raised for cases that are trivially covered (i.e. without recourse to type information). I don't see, if the None deserves a warning, why the _ doesn't. |
Comment author: @garrigue
I rather call them "unreachable clauses", because what one would call "absurd" is the intersection of the pattern with the complement of the previous patterns Changing the warning message is easy. The reason "| _ -> ." is handled differently is that intuitively it means: "try harder to prove exhaustiveness". So you may end up adding it in places where it is not really needed, as it cannot harm anyway. |
Comment author: @lpw25 I think we need to clarify what ".... -> ." means. I can see five possible meanings: a) This case is unreachable. In other words, all the possible matches of this case are covered by earlier cases in the match. b) This case is absurd. In other words, all possible matches of this case are disallowed by the types. c) The non-absurd matches of this case are unreachable. In other words all the non-absurd matches of this case (i.e. the matches not ruled out by typing) are covered by earlier cases in the match. d) The reachable matches of this case are absurd. In other words all the reachable matches of this case (i.e. the matches not covered by earlier cases) are disallowed by the types. e) This matches of this case are either absurd or unreachable. a) is clearly not what we want as it does not allow you to specify absurd cases to force the exhaustivity checker deeper. b) allows you to specify absurd cases and force the checking deeper, but it forces you to always write the absurd cases by hand even if there is only one reachable case remaining. c), d) and e) are very similar and give an error on the same cases. However, they differ on when a warning should be given (a warning should indicate that an implication is trivially true because no matches meet its precondition). For c) a warning should be given if there are no non-absurd matches for the case, whereas for d) a warning should be given if there are no reachable matches for the case. Finally, e) would never give a warning. In practice this means that c) would warn for: | None -> foo for type | () -> foo What we currently have is part way between d) and e). It behaves as d) except for To me it seems that d) would be the most useful behaviour. We can already indicate that all remaining cases are unreachable by simply writing no more cases, so This means that we should remove the special exception for |
Comment author: @garrigue After this discussion I would argue for (e):
If you can think of an example where warning for an unnecessary unreachable clause would detect a bug, please tell me. |
Comment author: @trefis As you know my initial guess for the meaning of "... -> ." was (e), as such I have a slight preference for it. |
Comment author: @gasche After thinking about this more, I think that semantically it is best if we avoid distinctions between "unreachable" and "absurd". This distinction is, in a sense, algorithmically motivated: we know what the exhaustiveness checker can reason about easily (the type-less constructor trees), and what needs refined type-directed reasoning. But the underlying explanation for pattern-matching is a capture of all the (well-typed) values that may flow into a clause, with no distinguo between those two settings. This would suggest that (e) is the best choice -- in terms of mental model and explanation for beginners. I'm not satisfied by the name "unreachable clause", because to me any clause whose right-hand-side is dead code (not necessarily ".") is an unreachable clause. (The warning "this match case is unused" could just as well say "this clause in unreachable".) We could pun on the "dead code" idea and call them "killed clauses", but I'm not sure this is of good taste. I still like the name "absurd clause" better; in the Agda world they are called "absurd patterns", but they have no "->" in their syntax, you would write | Right () where "()" is the marker for absurdity, not unlike our ".". We write | Right _ -> . so we are writing a clause rather than a pattern. Or what about "void clause" or "empty clause", maybe? |
Comment author: @lpw25
Unless we were to go with (b), "absurd" is a bad name for them because absurd means that the case does not make sense at all (i.e. the types disallow it). "empty" or "void" seem best if we go with (e) because they do not differentiate between absurd and unreachable. |
Comment author: @gasche A nice thing with "empty" or "void" clause is that they can also be understood as saying that we leave the right-hand-side empty (or void), and this understanding is consistent with the syntax used. |
Comment author: @garrigue Pushed change to (e) at commit 61c2dd2 in trunk. For the name, I still think that unreachable is clearer.
I'm not opposed to other names, but then they should show that we're trying to prove/check something. Maybe something like "refutation clause", but it sounds pedantic. |
Comment author: @gasche I like "refutation clause". Maybe "refuted clause" would sound simpler and be equally precise? (I thought of turning it into "refused clause" to use simpler language but that suggests something else, namely indicating an intent to be partial and not handle some cases.) I'm not sure being pedantic is an issue, but we need a clear name to communicate to our users. (I understand that the documentation for this feature is not yet written.) "higher-order polymorphism" or "relaxed value restriction" or "principal inference" are all pedantic names, yet they are precise and useful way to name concepts that we can and do explain to our users. We don't need a name that can be guessed when seeing the feature, but rather a name that can be remembered once it has been encountered in the Changelog, a blog post, or the documentation -- and for this precision is key. Thanks a lot for your work on the issue! |
Original bug ID: 6437
Reporter: mqtthiqs
Assigned to: @garrigue
Status: closed (set by @xavierleroy on 2017-02-16T14:16:34Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 4.01.0
Target version: 4.03.0+dev / +beta1
Fixed in version: 4.03.0+dev / +beta1
Category: typing
Related to: #6403 #6801
Child of: #5998
Parent of: #6220
Monitored by: @gasche @yallop @alainfrisch
Bug description
This code:
type ('a, 'b) ctx =
| Nil : (unit, unit) ctx
| Cons : ('a, 'b) ctx -> ('a * unit, 'b * unit) ctx
type 'a var =
| O : ('a * unit) var
| S : 'a var -> ('a * unit) var
let rec f : type g1 g2. (g1, g2) ctx * g1 var -> g2 var = function
| Cons g, O -> O
| Cons g, S n -> S (f (g, n))
(* | Nil, _ -> (assert false) *)
reports a warning for non-exhaustive pattern-matching, whereas (I think) it should not, since the last commented case is impossible.
Or am I missing something?
File attachments
The text was updated successfully, but these errors were encountered: