Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0006437OCamlOCaml typingpublic2014-05-21 20:472015-11-26 10:11
Reportermqtthiqs 
Assigned Togarrigue 
PrioritynormalSeverityminorReproducibilityalways
StatusresolvedResolutionfixed 
PlatformOSOS Version
Product Version4.01.0 
Target Version4.03.0+devFixed in Version4.03.0+dev 
Summary0006437: GADT exhaustiveness check incompleteness
DescriptionThis code:

type ('a, 'b) ctx =
  | Nil : (unit, unit) ctx
  | Cons : ('a, 'b) ctx -> ('a * unit, 'b * unit) ctx

type 'a var =
  | O : ('a * unit) var
  | S : 'a var -> ('a * unit) var

let rec f : type g1 g2. (g1, g2) ctx * g1 var -> g2 var = function
  | Cons g, O -> O
  | Cons g, S n -> S (f (g, n))
  (* | Nil, _ -> (assert false) *)

reports a warning for non-exhaustive pattern-matching, whereas (I think) it should not, since the last commented case is impossible.
Or am I missing something?
TagsNo tags attached.
Attached Filesdiff file icon type_pat_cps.diff [^] (35,978 bytes) 2015-05-07 09:47 [Show Content]
diff file icon type_pat_cps2.diff [^] (39,997 bytes) 2015-05-08 07:16 [Show Content]
diff file icon type_pat_cps3.diff [^] (45,336 bytes) 2015-05-18 06:04 [Show Content]

- Relationships
related to 0006403resolvedgarrigue another broken GADT exhaustiveness check 
parent of 0006220closedgarrigue GADT type information is not used to detect unused match cases 
related to 0006801resolvedgarrigue Exhaustiveness messages for GADTs suggest patterns that will not type check 
child of 0005998assignedgarrigue GADT typing and exhaustiveness bugs 

-  Notes
(0011560)
mqtthiqs (reporter)
2014-05-23 21:53

A more concise version of the same bug (I guess):

  type 'a t = T1 : ('a * 'b) t | T2 : 'a t -> ('a * 'b) t
  type 'a u = U1 : unit u | U2 : ('a * 'b) u

  let bug : type a. a t * a u -> unit = function
    | t, U2 -> ()
    (* | T1, U2 -> () *)
    (* | T2 _, U2 -> () *)
    (* | _, U1 -> () *) (* impossible *)

reports a warning for non-exhaustive pattern-matching; but whatever the value of t is, the second argument cannot be U1 (this branch 4 is impossible). Note that eta-expanding the first argument (ie. uncommenting branches 2 and 3) the warning disappears.

Hope it helps.
(0011561)
garrigue (manager)
2014-05-24 09:06

This is a well known limitation of the way the exhaustiveness check works for GADTs requires a constructor to "seed" counterexamples. Doing it on the basis of the type alone could cause infinite loops.
We should eventually do something about that, but this is not high-priority.

By the way, this is not a bug, since we know that the exhaustiveness check cannot be complete, and do not provide any specification for it other than it should be sound (i.e. detect all non-exhaustive cases).
(0011609)
mqtthiqs (reporter)
2014-05-30 17:05

Hello Jacques. Is the first piece of code (in the bug report, not in the comment) of the same kind? I don't see how it could be helped by "seeding" counterexamples.
(0011610)
garrigue (manager)
2014-05-31 15:46

Indeed the example in the bug report is a simpler case.
The warning can be avoided by reversing the order in the tuple:

let rec f : type g1 g2. g1 var * (g1,g2) ctx -> g2 var = function
  | O, Cons g -> O
  | S n, Cons g -> S(f (n,g));;

But the cause of the warning is the same: using the original order,
one tries to build a pattern for the Nil context, but then the var part
is not seeded, and the exhaustiveness checked does not try to split
the var into O and S. The O and S of the Cons case do no propagate
to the Nil (and they should not, since GADT matching being left to
right, they might actually not be valid there).

Progress report: I've tried to solve these problem (including 6220)
by using typing information to split a wildcard pattern when checking
exhaustiveness. However, there is a big problem: the current
of exhaustiveness checking does not keep types... Keeping them is
going to be a lot of work, and we should be extremely careful about
not propagating wrong typing information in presence of GADTs.
Compared to these risks, a few false positives does not seem to be
a big problem, particularly since you can really add the reported
pattern to remove the warning.
(0013739)
garrigue (manager)
2015-04-27 12:22

The patch attached to 6403 does not solve this case: not clear where to hook the expansion here.
(0013767)
garrigue (manager)
2015-04-30 06:44

After looking more carefully, this case is really interesting.
To prove that (Nil, _) doesn't accept any value, we need to do a case analysis on the right part, using its type. However, we cannot use the type coming from the other patterns, as it depends on the left part.

The solution seems to be to do the splitting when typing the pattern (to prove that it cannot be typed). The tricky part is that we need to do backtracking in the typing function Typecore.type_pat.
(0013768)
garrigue (manager)
2015-04-30 14:01

The patch type_pat_cps.diff implements a new approach, which can fully exploit typing information.
The idea is to rewrite Typecore.type_pat in continuation passing style, so that it becomes possible to backtrack, looking for a counter-example.
This is rather heavy-weight, and potentially brittle, but I think this is the only way to have accurate warnings. Looking at the testsuite, it indeed removes a number of false positives.
(0013770)
garrigue (manager)
2015-04-30 14:16

Brittle indeed: this doesn't bootstrap yet, due to an interaction with objects.
(0013859)
garrigue (manager)
2015-05-07 09:51

Updated patch. Now it bootstraps.
I see no slowdown when compiling ocamlc, so I'm thinking of merging this patch to trunk.
This fixes both this example, the one in PR#6403, and one in typing-gadts/omega07.ml.
Tell me if you see problems.
(0013871)
garrigue (manager)
2015-05-08 07:17

Cleaner patch. Could simplify parmatch more, as there seems to be no need to do 2 passes.
(0013884)
garrigue (manager)
2015-05-10 03:57

type_pat_cps3.diff additionally simplifies parmatch, merging the two passes.
Also changed the logic, so that the warning will preferably show ground patterns (they are checked first).
Note that all these patches change the output of exhaustiveness warnings, as or-patterns do not appear in the output (they are always exploded into their components, and only one of them is shown).
(0013951)
garrigue (manager)
2015-05-20 01:15

Created a branch in the subversion server.
The code can be checked out with
  svn checkout http://caml.inria.fr/svn/branches/gadt-warnings [^]
(0014508)
garrigue (manager)
2015-10-13 10:51

Still thinking about what to do without too much cost.
Handling of unused case warnings was added to the branch during ICFP (beginning of September).
Here are the costs

                       | camlinternalFormat | stdlib | make all
Exh. (current) | 3.1s | 11.4s | 100s
Exh. & Unus. | 3.4s | 11.8s | 102s
Exh. + explode | 3.9s | 12.2s | 101s
Exh. & Un. + ex.| 4.5s | 13.0s | 103s

The cost is not big, but still about 3 percents.
And half of it comes from the unused case analysis.
A partial solution would be to avoid exploding in the unused case analysis,
even if it loses symmetry.

An alternative solution would be to be more explicit about which pattern-matching to analyze.
A potential syntax is

      | pat -> _

to mean that pattern pat is not reached.
That would mean writing

     match e with
     | p1 -> e1
       ...
     | pn -> en
     | _ -> _

to mean that we want explicitly to check that the last case is not needed.
Advanced unused case analysis would only be done in that case, preserving the symmetry.
An extra advantage is that this can also be used to give explicitly deep patterns that are not possible. For instance, in the case "harder" of the ML workshop talk (*),

let harder :
  (zero succ, zero succ, zero succ) plus option
    -> bool
  = function None -> false
     | Some (PlusS _) -> _

would force exploding the subterms of PlusS.

My only concern is that it is more verbose, and feels like an overkill.

(*) http://www.math.nagoya-u.ac.jp/~garrigue/papers/index.html [^]
(0014509)
garrigue (manager)
2015-10-13 12:09

Here are a few more times, for make library on another machine.
The number indicates the "exploding depth", i.e. the number of
nested GADTs one may explode for analysis.
Exh(2)+Unus(2): 77.0s
Exh(1)+Unus(1): 6.55s
Exh(1)+Unus(0): 6.35s
Exh(0)+Unus(0): 5.95s
Exh(1): 6.19s
Exh(0): 5.75s

Note that most of the cost is GADT related. And there seems to be no clear bottleneck (except that depths higher than 1 do not seem to be a good idea...)
(0014511)
gasche (developer)
2015-10-14 07:50

As discussed during ICFP, I would be very happy to have an explicit syntax to claim that some patterns are unreachable. I think that would solve many problems with this proposal (by letting users express their intent in a way that can direct the choice of depth), and I suspect that code author could also find it a useful help to reason about the correctness of their GADT assumptions.

I'm not very happy with the syntax you suggest, as "-> _" to me reads exactly like "the right-hand-side is to be inferred or elaborated later". I think a nice syntax for an unreachable pattern should *not* have a "->" symbol in the first place; if we have to keep it for conflicts-avoidance reason, I would rather piggy-back on something like "(assert false)[@unreachable]", or plain "[%unreachable]".
(0014512)
garrigue (manager)
2015-10-14 13:30

Ok, my syntax proposal was just trying not to confuse the tools trying to parse ocaml code, by making it look like a real case. And while _ is used in Coq for the refine tactic, it is not used in ocaml for anything like that, and even if it were used for implicit arguments this would not be just after an arrow.

This said, I don't care as long as this is not going to take more characters: I don't think we want the user to have to write a lot just to say that a pattern-matching is finished.
So, an alternative syntax would be

   match e with
   | pat1 -> e1
    ...
   | patn -> en
   | _

I find it less explicit, but it indeed doesn't have an arrow...

But please, this is enough work to implement that, do not compose it with a flame war on syntax.
(0014513)
yallop (developer)
2015-10-14 13:47

It's useful to be able to write unmatchable patterns at any point in a match rather than just at the end. From that perspective the rhs-underscore proposal

   | pat -> _

is better than the arrow-free syntax

   | pat

which interacts badly with or-patterns anywhere except at the end. (I hope this doesn't count as a syntax flame.)
(0014514)
trefis (reporter)
2015-10-14 15:23

> Note that all these patches change the output of exhaustiveness warnings, as or-patterns do not appear in the output (they are always exploded into their components, and only one of them is shown).

I had missed the part between parentheses at the time.
May I ask why that change was introduced?
(0014515)
garrigue (manager)
2015-10-14 15:35

IIRC recent versions of the branch fix the change for warnings (only explode or-patterns when they contain gadts).
(0014516)
lpw25 (developer)
2015-10-14 16:02

One thing that is not entirely clear to me about these proposals is whether they mean this case is absurd (i.e. it cannot happen due to the types of indices) or this case is redundant/unreachable (i.e. it cannot happen due to some combination of the types of indices and the cases above it in the match statement).

They both seem potentially useful, and I'm not sure which one is being proposed.
(0014520)
garrigue (manager)
2015-10-14 20:30

This is the second one (previous cases are taken into account). This is the whole point of combining this with the exhaustiveness/redundancy check. Thanks to that, in most cases one would only have to add a wild pattern clause, the remaining cases to check being automatically inferred.
(0014521)
gasche (developer)
2015-10-14 21:56

what about

  | <pattern> done

or

  | <pattern> false
?

This would give for example:

let rec f : type g1 g2. (g1, g2) ctx * g1 var -> g2 var = function
  | Cons g, O -> O
  | Cons g, S n -> S (f (g, n))
  | Nil, _ done

let rec f : type g1 g2. (g1, g2) ctx * g1 var -> g2 var = function
  | Cons g, O -> O
  | Cons g, S n -> S (f (g, n))
  | Nil, _ false
(0014524)
garrigue (manager)
2015-10-15 04:38

I've added refuted cases to gadt-warnings (revision 16500).
I've kept the syntax "pat -> _" because this is for me the only one clear enough at this point.
(The arrow makes clear this is a full-fledge case, and the _ looks like undefined to me, hence unreachable...; I also tried "pat ->" but this creates conflicts in the grammar.)

A small explanation of how it works:
* by default, both exhaustiveness and redundancy checks try typing generated patterns, but without exploding wild-cards
* adding "_ -> _" to a pattern-matching will force exploding once for both exhaustiveness and redundancy (NB, this also means exploding tuples, records, and non-gadt datatypes with only one case)
* if the pattern matching has only one branch (such as a function parameter or a let-binding),
  the we automatically explode once for redundancy (i.e. a bit like an implicit "_ -> _" automatically added). The rationale is that you cannot add it without modifying the structure of the program, and since there is only one case the cost is small anyway. Also, redundancy is not relevant in that case.
* "pat -> _" are included in both exhaustiveness and redundancy checks. For exhaustiveness, this means that pat is supposed to be handled. For redundancy, one checks that there is no value that can reach this case and be accepted by pat (i.e. based on both previous patterns and pat). The check again can explode wildcards just once, but since you can give an explicit pattern, it becomes possible to build an arbitrary finite proof. Failing to be unreachable is an error (not a warning). Note that technically speaking, "_ -> _" is also handled this way (i.e. the pattern-matching is made exhaustive, and reachability is checked during unused cases analysis).

Here are some concrete examples from the testsuite:

type zero = Zero
type _ succ = Succ
;;
type (_,_,_) plus =
  | Plus0 : (zero, 'a, 'a) plus
  | PlusS : ('a, 'b, 'c) plus ->
       ('a succ, 'b, 'c succ) plus
;;
let easy : (zero, zero succ, zero) plus option -> bool =
  function None -> false (* ok since we explode once for single cases *)
;;
let harder : (zero succ, zero succ, zero succ) plus option -> bool =
  function None -> false (* warns *)
;;
let harder : (zero succ, zero succ, zero succ) plus option -> bool =
  function None -> false | Some (PlusS _) -> _ (* does not warn *)
;;
(0014525)
frisch (developer)
2015-10-15 10:41
edited on: 2015-10-15 10:41

If the syntax "pat -> _" is retained, I'd prefer to consider "_" as a new kind of expression that can appear anywhere, syntactically (instead of making the rhs field of "case" optional). This minimizes the impact on the Parsetree (and thus the breakage of existing code), and would allow to use this "_" in other contexts (e.g. for attribute/extension payloads). It could even make sense to have such "unreachable" expression in other contexts, in the future (with the intent of letting the compiler check unreachability). For instance:

   let _ = f ... in
   _

that would work only if the compiler can prove that f cannot return a value (based on its type).

(That said, and without entering the syntax flame, I find this choice of "_" unfortunate, as it prevents to use it to future extensions where "_" would be a more natural syntactic candidate.)

(0014526)
gasche (developer)
2015-10-15 10:53

> it prevents to use it to future extensions where "_" would be a more natural syntactic candidate

This is precisely my point.

We already have an expression marker for "unreachable code", it is (assert false). If you want to have a "->" (which I think in this particular case is wrong, but then I agree finding a satisfying syntax is difficult), I would rather have

 | p [@empty] -> assert false

Note using an attribute to express intent here is backward-compatible and upward-compatible:
- Backward-compatible: it will not create warnings/errors for people that use (| p -> assert false) for code that is unreachable for *other* reasons, for example a user-specified partial domain for the function.
- Upward-compatible: it can evolve in the more general idea that we use static reasoning to verify that "assert false" indeed is unreachable, with (several) attributes used to justify why a specific occurence is unreachable. Here it is unreachable because the pattern cannot match a value.

I think that a good longer syntax is better than a bad shorter syntax in this instance. We are talking about complex code (the user is doing non-trivial reasoning here), so the code author is not in a hurry. Your (Jacques') idea of implicitly assuming "_ [@empty] -> assert false" after each one-pattern construction is excellent and will solve the one case that really needs concision.
(0014527)
frisch (developer)
2015-10-15 11:13

I agree with most of Gabriel's points, except that I'm not a big fan of the idea of using attributes to affect the way type-checker works, except w.r.t. warnings (technically, we could argue that the current discussion is about the exhaustiveness check, which only triggers a warning, but this one should really be a strong error).
(0014528)
garrigue (manager)
2015-10-15 12:47

Well, the real point is that this is not about empty patterns, but unreachable right-hand side.
assert false does not denote a statically unreachable point, but something that shouldn't be reached at runtime.
Maybe having a category of static assertions (checked by the compiler) would solve the problem, but then we need a new keyword...
(0014529)
garrigue (manager)
2015-10-15 13:09

Also, the reason it should be compact:

The idea is that the programmer should just add the line

    | _ -> _

to a pattern matching involving gadts, and expect it to work.
If it returns a message about some value being reachable, just add it again (before this line), and it will be refined.
Incrementally, you can ensure that your pattern-matching is complete, without doing any advanced reasoning about refutation.

Another reason you want to put it is that it enables exploding in the redundancy check, which is more symmetric.

So, the worse thing would be that to be seen as an advanced feature you only use to optimize your code. It allows you to write better code, not just faster code.
(0014530)
lpw25 (developer)
2015-10-15 13:26

> Another reason you want to put it is that it enables exploding in the redundancy check, which is more symmetric.

I'm slightly unclear on this part. If you don't enable exploding in the redundancy check will you still get a warning, or is the default no warning.

It is also not clear why adding this case should improve the redundancy check. Asserting that a case is unreachable says nothing about the redundancy of cases above it, and it seems pretty arbitrary to treat it as if it did.
(0014531)
lpw25 (developer)
2015-10-15 13:31

I haven't checked the grammar properly but how about:

  | _ -> *

I think it '*' slightly better than '_' because '_' feels like it is saying "there is something here but I'm not specifying it" whereas '*' feels like it is saying "anything could go here" (taking its traditional wildcard meaning from globs). It is also unlikely to be given special meaning elsewhere since it is a binary operator in both expressions and types -- so any future use would already be quite restricted.
(0014532)
garrigue (manager)
2015-10-15 14:00

Concerning redundancy, the idea is to be symmetrical.
If a case could be removed while keeping the matching complete (and of course not changing its semantics), then it should be detected as redundant.
Without "_ -> _", no exploding is done, so you only detect as redundant cases for which the non-covered part (i.e. not covered by previous cases) is not typable.
With "_ -> _", the check is more agressive, and you're going to detect as redundant any case for which you could replace the right hand side by "_".
Thinking of it, I'm not sure this is 100% symmetric, as removing the case altogether might not be inferred as complete, but I'm not sure what other kind of trigger one could use (assuming we do not want to have the agressive check always on).

Ok, formally this looks like:
* by default, redundant cases are cases you could remove
* with "_ -> _", redundant cases are cases which can be proved unreachable

Concerning "_ -> *", I have no opinion.
It feels a bit more ad hoc to me, with "*" looking more like the unit constructor than bottom, but this kind of reading is quite arbitrary. And ocaml is a programming language, not a logic.
If people agree that this is better, I'm perfectly ok with it.
This said, there is one drawback: you must be careful when you use it before a parenthesis:

  (function None -> 1 | _ -> *)

If you're talking of regular expression, actually the wild card is ".", so this could also be
    _ -> .
I believe it would work too, but this may be confusing.
(0014534)
gasche (developer)
2015-10-15 15:18

I find your interaction use-case of ending with an absurd pattern as a discovery technique, rather than just after-the-fact expression of intent, rather interesting. Fabrice mentioned that he does the same kind of things already, generating the next pattern to handle from the warnings, and the Merlin folks do related things (but by interacting directly with the type-checker) with the "destruct" command (the demo is short and worth looking at)

  demo: http://the-lambda-church.github.io/merlin/destruct.ogv [^]
  documentation: https://github.com/the-lambda-church/merlin/blob/master/vim/merlin/doc/merlin.txt#L102 [^]
  implementation: https://github.com/the-lambda-church/merlin/blob/master/src/analysis/destruct.ml [^]

I would prefer "." over "_". (The problem with "*" and parentheses is serious.) Sorry for dragging the discussion into the syntactic swamp, but this is also a part that matters.

I'm not convinced by the distinction between "statically unreachable" and "dynamically unreachable". I would not (and I suspect many people follow this convention, although not everyone does) use "assert false" in a place that "can" be reached if the users didn't respect an invariant (this is what exceptions are for), only for code that I am sure will never be reached by *anyone* (if my mental model of the code is right), so it is rather a "statically unreachable" situation (and I fact I often write comment to explain why this code is unreachable). But "assert false" would indeed require an attribute if we wanted to use it here.

Are we sure that

  | p ->

introduces conflicts? I tried adding a

  | pattern MINUSGREATER (* nothing *) { Exp.case $1 (assert false) }

case in the yacc grammar (and my experimental menhir grammar), and I am not seeing any new conflict (whereas, say, replacing (* nothing *) by INT does introduce new conflicts).

I would also support the addition of (-/>) or (-!>) as a new keyword to use instead of (->) in this case, although this would risk breaking code as those are valid infix operators. (-#>) and (-#) are reserved, though.

let rec f : type g1 g2. (g1, g2) ctx * g1 var -> g2 var = function
  | Cons g, O -> O
  | Cons g, S n -> S (f (g, n))
  | Nil, _ -#
(0014535)
garrigue (manager)
2015-10-15 15:23

> Are we sure that "| p ->" introduces conflicts?

Yes it does. Try "make core". There is a conflict on let (it cannot distinguish anymore between let in and toplevel let).
(0014536)
lpw25 (developer)
2015-10-15 15:33

> Without "_ -> _", no exploding is done, so you only detect as redundant cases for which the non-covered part (i.e. not covered by previous cases) is not typable.

This part worries me a little. I feel like most of the value of the redundancy check comes from it triggering when you don't expect it to, so needing to ask the compiler to check for redundancy on a case-by-case basis seems to make it much less useful.

Would it not be possible to keep the warning conservative somehow, treating any wildcard as potentially redundant until proved non-redundant? Then the wildcard could be replaced with something more specific to force the search deeper and remove the warning.
(0014537)
garrigue (manager)
2015-10-15 15:43

By the way, "_ -> ." seems to work ok.
Like for "*", the risk of conflict is limited, as "." is already a keyword, and not really usable as prefix.

My only problem is that visually this is rather weak (doesn't really look like a syntactic object).
Another option is to create a new symbolic keyword (aargh), which really looks like the intended concept:

   pat -> _._

Then there is no problem in allowing it anywhere (eventhough it would not be very meaningful).

It's clear that it doesn't mean anything else, at least.
(0014538)
gasche (developer)
2015-10-15 15:54

No love for (pat -#)? That's my best proposal so far -- although I find (pat -> .) acceptable as well.
(0014539)
garrigue (manager)
2015-10-15 15:55

> This part worries me a little. I feel like most of the value of the redundancy check comes from it triggering when you don't expect it to, so needing to ask the compiler to check for redundancy on a case-by-case basis seems to make it much less useful.

I was also thinking about a way to enable it on a per-file basis, with a flag. For redundancy this would make sense.

> Would it not be possible to keep the warning conservative somehow, treating any wildcard as potentially redundant until proved non-redundant? Then the wildcard could be replaced with something more specific to force the search deeper and remove the warning.

The only thing you could do is see whether there is something that could be exploded, and suppose you're potentially redundant when you don't explode it. However, this is going to happen all the time, so the value of the warning is going to be killed by false positives, I'm afraid.
(0014540)
lpw25 (developer)
2015-10-15 16:01

> However, this is going to happen all the time, so the value of the warning is going to be killed by false positives, I'm afraid.

Could the false positives be reduced by detecting when types are potentially empty based on whether they "contain" GADTs? Something like you look at the top-level constructor of the type, if it is a GADT then the type may be empty, if it is abstract then you could not have exploded it anyway, and if it is a non-GADT datatype then you descend to its parameters.
(0014541)
garrigue (manager)
2015-10-15 16:02

> No love for (pat -#)?

No, I'm no adept of strange characters.

> I find (pat -> .) acceptable as well.

So what about (pat -> _._)
(0014543)
gasche (developer)
2015-10-15 16:20
edited on: 2015-10-15 16:22

"_._" is still problematic because, to me, it says "oh, the user wants me to infer both a record expression and the field projection!". And I'm afraid we are not yet ready to use ? as a reserved keyword.

P.S: ? is the unicode mathematical symbol "up tack", U+22A5.
  http://unicode.org/cldr/utility/character.jsp?a=22A5 [^]
Mantis seems unable to display it.

(0014544)
garrigue (manager)
2015-10-15 16:21

Of course, I was assuming that we only explode GADTs to start with. However, any GADT whose cases are all guarded will trigger it, so this still means lots of false positives. To be precise, there are GADTs for which you can never know for sure whether they are inhabited or not (the undecidability problem), so what would we do then.

Or do you mean that the warning should just encourage the user to add the "_ -> _", so that he gets a more precise warning? This could make sense too. Should probably be a distinct warning though. But then this warning could rather do the real work, exploding things, and you would disable it if you think it's too slow.
(0014545)
garrigue (manager)
2015-10-15 16:33

> it says "oh, the user wants me to infer both a record expression and the field projection!"

Seriously? Field projections are not first-class, so I don't expect we're going to infer them at any time, and I wouldn't call this a conflict. Honestly, I can only read this as bottom, which is the meaning I intended...

Of course, if you're Agda-intoxicated, it looks like something wholly different...

This said, 1_._3 is valid ocaml :-) (but not _._3)
(0014546)
gasche (developer)
2015-10-15 16:42

Bottom is _|_, and using this would be risky in a place full of | for other purposes.

I'm rather satisfied with "-> ." (or "-#"), so I'm going to be happy for the improvement brought by this exchange of proposals and step down from the discussion for a while -- so you can sort out the semantics with Leo.
(0014548)
garrigue (manager)
2015-10-15 17:45

I have added warning 51, for unreachable cases, on by default.
It does exploding, so that it detects cases which could be marked as unreachable.
In most cases, it would be sufficient to omit them, but we cannot tell it for sure (the inferred remaining cases may require some refinement).
The cost seems to be small enough, and you can disable it if there is a problem.

Adding "_ -> _" does not change the behavior for unused cases anymore.
(0014549)
lpw25 (developer)
2015-10-15 17:49

> Or do you mean that the warning should just encourage the user to add the "_ -> _", so that he gets a more precise warning?

I was more thinking that the warning would encourage the user to do one level of expansion themselves. So:

match foo with
| Foo -> ....
| _ -> .....

might warn, which would make you do:

match foo with
| Foo -> ....
| Bar _ | Baz -> ....

which would then expand one level deeper and perhaps demonstrate that the case is needed.
(0014550)
garrigue (manager)
2015-10-16 02:31

Ok, I've switched to "_ -> ." (not sure this is ideal, but I'm tired of arguing), and, more importantly, made it an expression internally (Texp_unreachable).
This means having to remove it in the backend, but this seems to work.

Anything else before I propose merging?

(About suggesting to explode for potentially unused cases, I think this is just too speculative to make a good warning. Being sure that all cases are used is not required for soundness, so just suggesting to remove cases that we could prove unreachable seems the best we can do.)
(0014607)
garrigue (manager)
2015-10-23 10:34

Branch gadt-warnings was merged in trunk at revision 16532.
(0014620)
gasche (developer)
2015-10-24 17:33

The compiler testsuite was broken by the merge of branch gadt-warnings. Besides some minor changes in error-messages that require adjusting some reference files (cd testsuite/tests/typing-gadts; make promote), the new branch seems to introduce a bug: it fails to compile the file testsuite/tests/tool-ocamldoc/odoc_test.ml, with the following error message:

Fatal error: exception File "typing/parmatch.ml", line 1959, characters 48-54: Assertion failed
(0014621)
garrigue (manager)
2015-10-25 13:52

This is fixed now. Thanks for the reminder.
(0014622)
gasche (developer)
2015-10-25 13:54

Thanks to you for the quick fix! I restarted the CI builds, which should be back to their happy state quickly.
(0014744)
trefis (reporter)
2015-11-20 16:05

The "pref = []" check here https://github.com/ocaml/ocaml/blob/trunk/typing/parmatch.ml#L1933 [^] looks dubious.

I don't know why it was added so I don't dare just remove it, but it implies that all of the following examples are accepted by the compiler:

    let f () = match None with _ -> .;;
    let g () = match None with _ -> () | exception _ -> .;;
    let h () = match None with _ -> . | exception _ -> .;;
(0014767)
garrigue (manager)
2015-11-21 06:42

@trefis:
Well-spotted. With the introduction of unreachable cases, the "pref = []" condition must be corrected.
Fixed in trunk, commit f4f858.
(0014776)
trefis (reporter)
2015-11-23 11:05

I just noticed that I also get:

    # let f x = match x with _ -> () | None -> .;;
    Characters 33-37:
      let f x = match x with _ -> () | None -> .;;
                                   ^^^^
    Warning 11: this match case is unused.
    val f : 'a option -> unit = <fun>

both before and after f4f858.

I assume this is because of the "&& q.pat_desc = Tpat_any" here https://github.com/ocaml/ocaml/blob/trunk/typing/parmatch.ml#L1929 [^] , which I think shouldn't be there.
(0014777)
trefis (reporter)
2015-11-23 11:11

Indeed if I remove that part of the condition the example works as expected and the testsuite still passes.
(0014778)
garrigue (manager)
2015-11-23 13:38

No, this is the intended behavior.
The idea is that this clause is covered by previous cases, which is different from being unreachable because of typing. If you write it, you expect it to be needed for the match to be complete.
Removing the condition would never warn for unreachable clauses which are already covered.
There is a choice here to do about the semantics.
(0014779)
gasche (developer)
2015-11-23 13:45

Indeed, I would expect to get some sort of warning if I write an unnecessary absurd clause (do we agree to call this clauses "absurd clauses", or what is the right vocabulary?). It may be useful, to avoid Thomas' confusion, to use a different warning message when the unnecessary clause is an absurd one.

I would suggest (with the de rigueur apology for my verbosity):
"Warning 11: this absurd clause is unnecessary, its pattern is covered by previous cases."

You may drop the part after the comma and still get a decent warning message.
(0014780)
trefis (reporter)
2015-11-23 13:57

> Removing the condition would never warn for unreachable clauses which are already covered.

Yes that's what I expected, but this is probably a result of me not reading this thread carefully enough.
Would you mind explaining then (and sorry if you already explained, I must have missed it) why you do warn in such cases except if the pattern is a wildcard, i.e. why [let f x = match x with _ -> () | _ -> .] doesn't trigger the warning?

> Indeed, I would expect to get some sort of warning if I write an unnecessary absurd clause (do we agree to call this clauses "absurd clauses", or what is the right vocabulary?).

Fair enough. The warning specialization for "absurd clauses" might be nice then, but I don't have a strong opinion.
(0014781)
gasche (developer)
2015-11-23 14:00
edited on: 2015-11-23 14:01

My understanding is that using a final "| _ -> ." is to be understood as a statement of intention of exhaustivity: I declare the pattern-matching above to be complete, with all cases covered. And in particular that means: compiler, please work harder unfolding cases to check that I'm not mistaken.

(0014782)
trefis (reporter)
2015-11-23 14:02

Thank you, sorry for the noise.
(0014783)
gasche (developer)
2015-11-23 14:03

Hopefully performing code reviews that clarify features and fix type-checking bugs are not yet considered "noise" around here!
(0014787)
lpw25 (developer)
2015-11-23 14:47
edited on: 2015-11-23 15:04

> My understanding is that using a final "| _ -> ." is to be understood as a statement of intention of exhaustivity: I declare the pattern-matching above to be complete, with all cases covered.

But in this case the warning is only being raised for cases that are trivially covered (i.e. without recourse to type information). I don't see, if the None deserves a warning, why the _ doesn't.

(0014824)
garrigue (manager)
2015-11-24 14:49

> Indeed, I would expect to get some sort of warning if I write an unnecessary absurd clause (do we agree to call this clauses "absurd clauses", or what is the right vocabulary?). It may be useful, to avoid Thomas' confusion, to use a different warning message when the unnecessary clause is an absurd one.

I rather call them "unreachable clauses", because what one would call "absurd" is the intersection of the pattern with the complement of the previous patterns

Changing the warning message is easy.
The question is whether this should be actually a different warning (or whether we really need a warning for that).
Indeed, one could also use this feature to ensure that a clause is already covered by the previous case (independently of typing). For instance, doing this before adding a catch all clause could make sense, to clarify the semantics of the program.

The reason "| _ -> ." is handled differently is that intuitively it means: "try harder to prove exhaustiveness". So you may end up adding it in places where it is not really needed, as it cannot harm anyway.
I we have a different warning for "already covered unreachable clauses", maybe this doesn't make sense to make an exception for it.
(0014826)
lpw25 (developer)
2015-11-24 15:36
edited on: 2015-11-24 15:43

I think we need to clarify what ".... -> ." means. I can see five possible meanings:

  a) This case is unreachable. In other words, all the possible matches of this case are covered by earlier cases in the match.

  b) This case is absurd. In other words, all possible matches of this case are disallowed by the types.

  c) The non-absurd matches of this case are unreachable. In other words all the non-absurd matches of this case (i.e. the matches not ruled out by typing) are covered by earlier cases in the match.

  d) The reachable matches of this case are absurd. In other words all the reachable matches of this case (i.e. the matches not covered by earlier cases) are disallowed by the types.

  e) This matches of this case are either absurd or unreachable.

a) is clearly not what we want as it does not allow you to specify absurd cases to force the exhaustivity checker deeper.

b) allows you to specify absurd cases and force the checking deeper, but it forces you to always write the absurd cases by hand even if there is only one reachable case remaining.

c), d) and e) are very similar and give an error on the same cases. However, they differ on when a warning should be given (a warning should indicate that an implication is trivially true because no matches meet its precondition). For c) a warning should be given if there are no non-absurd matches for the case, whereas for d) a warning should be given if there are no reachable matches for the case. Finally, e) would never give a warning.

In practice this means that c) would warn for:

  | None -> foo
  | Some _ -> .

for type `empty option` (where `empty` is some type with no valid constructors). Whilst d) would warn for:

   | () -> foo
   | _ -> .

What we currently have is part way between d) and e). It behaves as d) except for `_` patterns where it behaves as e).

To me it seems that d) would be the most useful behaviour. We can already indicate that all remaining cases are unreachable by simply writing no more cases, so `.... -> .` should be reserved for indicating the presence of an absurd case.

This means that we should remove the special exception for `_`.

(0014830)
garrigue (manager)
2015-11-24 22:02

After this discussion I would argue for (e):
* this is more economical, I.e. no need to warn for something which is not dangerous
* this actually avoids introducing extra distinctions: we just require that the . should be unreachable
* there are actually applications for this

If you can think of an example where warning for an unnecessary unreachable clause would detect a bug, please tell me.
(0014833)
lpw25 (developer)
2015-11-25 09:21

I have a slight preference for d), but e) is also fine. This means that the original example reported by @trefis should be fixed to not give a warning.
(0014834)
trefis (reporter)
2015-11-25 10:27

As you know my initial guess for the meaning of "... -> ." was (e), as such I have a slight preference for it.
(0014835)
gasche (developer)
2015-11-25 10:43
edited on: 2015-11-25 10:44

After thinking about this more, I think that semantically it is best if we avoid distinctions between "unreachable" and "absurd". This distinction is, in a sense, algorithmically motivated: we know what the exhaustiveness checker can reason about *easily* (the type-less constructor trees), and what needs refined type-directed reasoning. But the underlying explanation for pattern-matching is a capture of all the (well-typed) values that may flow into a clause, with no distinguo between those two settings.

This would suggest that (e) is the best choice -- in terms of mental model and explanation for beginners.

I'm not satisfied by the name "unreachable clause", because to me any clause whose right-hand-side is dead code (not necessarily ".") is an unreachable clause. (The warning "this match case is unused" could just as well say "this clause in unreachable".) We could pun on the "dead code" idea and call them "killed clauses", but I'm not sure this is of good taste. I still like the name "absurd clause" better; in the Agda world they are called "absurd patterns", but they have no "->" in their syntax, you would write

  | Right ()

where "()" is the marker for absurdity, not unlike our ".". We write

  | Right _ -> .

so we are writing a clause rather than a pattern.

Or what about "void clause" or "empty clause", maybe?

(0014836)
lpw25 (developer)
2015-11-25 11:14

> I still like the name "absurd clause" better

Unless we were to go with (b), "absurd" is a bad name for them because absurd means that the case does not make sense at all (i.e. the types disallow it).

"empty" or "void" seem best if we go with (e) because they do not differentiate between absurd and unreachable.
(0014837)
gasche (developer)
2015-11-25 11:37
edited on: 2015-11-25 11:37

A nice thing with "empty" or "void" clause is that they can also be understood as saying that we leave the right-hand-side empty (or void), and this understanding is consistent with the syntax used.

(0014843)
garrigue (manager)
2015-11-26 07:11

Pushed change to (e) at commit 61c2dd2 in trunk.

For the name, I still think that unreachable is clearer.
Namely we have
* unused clauses: for clauses with a real implementation that are never reached
* unreachable clauses: for clauses "pat -> ."
Of course, for an unreachable clause to be reachable is an error.
And it is always ok to turn an unused clause into an unreachable one.

I'm not opposed to other names, but then they should show that we're trying to prove/check something. Maybe something like "refutation clause", but it sounds pedantic.
(0014844)
gasche (developer)
2015-11-26 10:11
edited on: 2015-11-26 10:11

I like "refutation clause". Maybe "refuted clause" would sound simpler and be equally precise? (I thought of turning it into "refused clause" to use simpler language but that suggests something else, namely indicating an intent to be partial and *not* handle some cases.)

I'm not sure being pedantic is an issue, but we need a clear name to communicate to our users. (I understand that the documentation for this feature is not yet written.) "higher-order polymorphism" or "relaxed value restriction" or "principal inference" are all pedantic names, yet they are precise and useful way to name concepts that we can and do explain to our users. We don't need a name that can be guessed when seeing the feature, but rather a name that can be remembered once it has been encountered in the Changelog, a blog post, or the documentation -- and for this precision is key.

Thanks a lot for your work on the issue!


- Issue History
Date Modified Username Field Change
2014-05-21 20:47 mqtthiqs New Issue
2014-05-21 22:10 gasche Relationship added child of 0005998
2014-05-23 21:53 mqtthiqs Note Added: 0011560
2014-05-24 09:06 garrigue Note Added: 0011561
2014-05-30 14:02 shinwell Assigned To => garrigue
2014-05-30 14:02 shinwell Status new => confirmed
2014-05-30 17:05 mqtthiqs Note Added: 0011609
2014-05-31 15:33 garrigue Relationship added parent of 0006220
2014-05-31 15:46 garrigue Note Added: 0011610
2014-07-16 17:38 doligez Target Version => 4.03.0+dev
2015-04-27 12:21 garrigue Relationship added related to 0006403
2015-04-27 12:22 garrigue Note Added: 0013739
2015-04-30 06:44 garrigue Note Added: 0013767
2015-04-30 13:57 garrigue File Added: type_pat_cps.diff
2015-04-30 14:01 garrigue Note Added: 0013768
2015-04-30 14:16 garrigue Note Added: 0013770
2015-05-07 09:46 garrigue File Deleted: type_pat_cps.diff
2015-05-07 09:47 garrigue File Added: type_pat_cps.diff
2015-05-07 09:51 garrigue Note Added: 0013859
2015-05-08 07:16 garrigue File Added: type_pat_cps2.diff
2015-05-08 07:17 garrigue Note Added: 0013871
2015-05-10 03:52 garrigue File Added: type_pat_cps3.diff
2015-05-10 03:57 garrigue Note Added: 0013884
2015-05-18 04:29 garrigue Relationship added related to 0006801
2015-05-18 05:40 garrigue File Deleted: type_pat_cps3.diff
2015-05-18 05:40 garrigue File Added: type_pat_cps3.diff
2015-05-18 06:04 garrigue File Deleted: type_pat_cps3.diff
2015-05-18 06:04 garrigue File Added: type_pat_cps3.diff
2015-05-20 01:15 garrigue Note Added: 0013951
2015-10-13 10:51 garrigue Note Added: 0014508
2015-10-13 12:09 garrigue Note Added: 0014509
2015-10-14 07:50 gasche Note Added: 0014511
2015-10-14 13:30 garrigue Note Added: 0014512
2015-10-14 13:47 yallop Note Added: 0014513
2015-10-14 15:23 trefis Note Added: 0014514
2015-10-14 15:35 garrigue Note Added: 0014515
2015-10-14 16:02 lpw25 Note Added: 0014516
2015-10-14 20:30 garrigue Note Added: 0014520
2015-10-14 21:56 gasche Note Added: 0014521
2015-10-15 04:38 garrigue Note Added: 0014524
2015-10-15 10:41 frisch Note Added: 0014525
2015-10-15 10:41 frisch Note Edited: 0014525 View Revisions
2015-10-15 10:53 gasche Note Added: 0014526
2015-10-15 11:13 frisch Note Added: 0014527
2015-10-15 12:47 garrigue Note Added: 0014528
2015-10-15 13:09 garrigue Note Added: 0014529
2015-10-15 13:26 lpw25 Note Added: 0014530
2015-10-15 13:31 lpw25 Note Added: 0014531
2015-10-15 14:00 garrigue Note Added: 0014532
2015-10-15 15:18 gasche Note Added: 0014534
2015-10-15 15:23 garrigue Note Added: 0014535
2015-10-15 15:33 lpw25 Note Added: 0014536
2015-10-15 15:43 garrigue Note Added: 0014537
2015-10-15 15:54 gasche Note Added: 0014538
2015-10-15 15:55 garrigue Note Added: 0014539
2015-10-15 16:01 lpw25 Note Added: 0014540
2015-10-15 16:02 garrigue Note Added: 0014541
2015-10-15 16:20 gasche Note Added: 0014543
2015-10-15 16:21 garrigue Note Added: 0014544
2015-10-15 16:22 gasche Note Edited: 0014543 View Revisions
2015-10-15 16:33 garrigue Note Added: 0014545
2015-10-15 16:42 gasche Note Added: 0014546
2015-10-15 17:45 garrigue Note Added: 0014548
2015-10-15 17:49 lpw25 Note Added: 0014549
2015-10-16 02:31 garrigue Note Added: 0014550
2015-10-23 10:34 garrigue Note Added: 0014607
2015-10-23 10:34 garrigue Status confirmed => resolved
2015-10-23 10:34 garrigue Fixed in Version => 4.03.0+dev
2015-10-23 10:34 garrigue Resolution open => fixed
2015-10-24 17:33 gasche Note Added: 0014620
2015-10-25 13:52 garrigue Note Added: 0014621
2015-10-25 13:54 gasche Note Added: 0014622
2015-11-20 16:05 trefis Note Added: 0014744
2015-11-21 06:42 garrigue Note Added: 0014767
2015-11-23 11:05 trefis Note Added: 0014776
2015-11-23 11:11 trefis Note Added: 0014777
2015-11-23 13:38 garrigue Note Added: 0014778
2015-11-23 13:45 gasche Note Added: 0014779
2015-11-23 13:57 trefis Note Added: 0014780
2015-11-23 14:00 gasche Note Added: 0014781
2015-11-23 14:01 gasche Note Edited: 0014781 View Revisions
2015-11-23 14:02 trefis Note Added: 0014782
2015-11-23 14:03 gasche Note Added: 0014783
2015-11-23 14:47 lpw25 Note Added: 0014787
2015-11-23 15:04 lpw25 Note Edited: 0014787 View Revisions
2015-11-24 14:49 garrigue Note Added: 0014824
2015-11-24 15:36 lpw25 Note Added: 0014826
2015-11-24 15:43 lpw25 Note Edited: 0014826 View Revisions
2015-11-24 22:02 garrigue Note Added: 0014830
2015-11-25 09:21 lpw25 Note Added: 0014833
2015-11-25 10:27 trefis Note Added: 0014834
2015-11-25 10:43 gasche Note Added: 0014835
2015-11-25 10:44 gasche Note Edited: 0014835 View Revisions
2015-11-25 11:14 lpw25 Note Added: 0014836
2015-11-25 11:37 gasche Note Added: 0014837
2015-11-25 11:37 gasche Note Edited: 0014837 View Revisions
2015-11-26 07:11 garrigue Note Added: 0014843
2015-11-26 10:11 gasche Note Added: 0014844
2015-11-26 10:11 gasche Note Edited: 0014844 View Revisions


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker