New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Warn on ambiguous variables used in a when clause #7031
Comments
Comment author: @gasche I was just thinking of this bug yesterday, and had a small idea for a neat implementation of the [ambiguous variable] criterion -- because it is actually not trivial to decide it. We are trying to know whether the shared variables of (p | q) appear in the same place in both patterns. The "natural" idea is to recurse on p and q, collect the path from the root along the way, and return a map from variables to a list of reaching paths. But that seems a bit costly, and defining the type of paths will be a pain. The idea is as follows: implement a function (intersection : pattern -> pattern -> pattern), such as (intersection p q) is the most precise pattern matching all values of both p and q. This is rather easy to define: intersection x x = x Then it suffices to check that all variables of (p | q) appear in (intersection p q). Those that do not appear there are ambiguous in the sense of [ambiguous variables]. |
Comment author: @gasche In fact (intersection) is also useful to define the [overlap] criterion: we need only warn if (intersection p q) is not _ (if it is _, then [overlap] does not hold) yet misses some shared variables. |
Comment author: @alainfrisch
I guess you meant something else here.
I'm not sure to understand the definition. Cannot we define a function that satisfy this property as: intersection p q = p | q ? |
Comment author: @gasche Indeed, I meant a disjunction-free pattern. It returns the shared constructor tree prefix. For our original example, intersection (Var x, t) (t, Var x) = (_, _) (note that if we knew that Var is the only constructor, then an equally valid result would be (Var _, Var _), but I don't think that affects the ambiguity criterion derived from computing the intersection.) |
Comment author: @damiendoligez Faut-il fermer cette PR ? |
Original bug ID: 7031
Reporter: @gasche
Assigned to: @gasche
Status: resolved (set by @gasche on 2017-02-28T16:20:26Z)
Resolution: fixed
Priority: normal
Severity: feature
Version: 4.02.3
Fixed in version: 4.03.0
Category: middle end (typedtree to clambda)
Monitored by: @diml @yallop @hcarty @alainfrisch
Bug description
Martin Clochard, through the intermediary of Arthur Charguéraud,
reports a confusion arising from the following pattern-matching
clause:
( (Var x, t) | (t, Var x) ) when x = my_var -> ...
The problem is that the code author (and many readers of this code,
I checked) expect the value (Var not_my_var, Var my_var) to be matched
by this clause, while it is not.
Not matching is the expected semantics of "when" guards: they are
specified as first we match the pattern and then we check the
guard, and go to the next clause if the guard fails. On this example,
the pattern succeeds (in the first or-branch) and then the guard
fails. (The fact that or-pattern match left-to-right is explicitly
specified.)
This is however confusing, so we would like to warn in this situation
so that users are forced to think about this potential problem. In
a more general setting, we want to warn whenever the semantics of
| C[p | q] when test -> foo
is not equivalent to the distributive semantics, that is
| C[p] when test -> foo
| C[q] when test -> foo
(I'm using a pattern-context C[..] to account for the general case
where the or-pattern (p | q) may be deep inside the clause pattern.)
After discussing it with Luc and Damien, I propose the following
approximation criterion: fire the warning where both conditions below
are met:
[overlap]: there is a value that can match both sides of the
or-pattern (C[p] and C[q]). For example, if p and q start with
distinct head constructors, no confusion is possible.
[ambiguous variables]: the variables of (p | q) used by the guard
appear under different paths in p and q. For example,
((p1, Var x) | (Var x, q1)) when test x
should warn but
((p1, Var x, p2) | (q1, Var x, q2)) when test x
should not: in all values matching both p and q, the captured value
of x will be the same.
Additional information
Failed proposals:
ambiguous variables (as defined above) is used. This is much too
strong, as there is plenty of code that uses clauses like
| (Var x, t) | (t, Var x) -> subst x t prog
which is perfectly fine if an overlapping value (Var x1, Var x2) is
passed: the semantics is correct whichever branch is chosen.
Similarly, warning in all cases where or-pattern and when-guards are
combined was perceived as too strong.
the time or working example for why this criterion should be added was
that the following code should not warn:
((Var x, p) | (Var x, q)) when test x
and we noted that it would be fine to warn in this case, as this would
encourage the user to make this path-sharing explicit by rewriting the
code as:
(Var x, (p | q)) when test x
Unfortunately, this transformation cannot be applied in general, see
the above example
((p1, Var x, p2) | (q1, Var x, q2)) when test x
which cannot be factorized in a semantics-preserving way.
The text was updated successfully, but these errors were encountered: