|Anonymous | Login | Signup for a new account||2018-08-15 18:54 CEST|
|Main | My View | View Issues | Change Log | Roadmap|
|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0007835||OCaml||typing||public||2018-08-03 13:00||2018-08-04 20:10|
|Priority||normal||Severity||minor||Reproducibility||have not tried|
|Target Version||Fixed in Version|
|Summary||0007835: Incorrect (existential) type scope tracking|
|Description||Since the introduction of GADTs, we've been reserving a slot for up to a|
thousand existentials to be introduced during typing of pattern matches.
This is done by: https://github.com/ocaml/ocaml/blob/4.00/typing/typecore.ml#L2741 [^]
The reason for this is that scoping for type constructors is tracked using their
If that line wasn't present, then if we did introduce new types, their binding
time would be above the current level. Which means that any use of such a type
would be out of the scope of the type constructor.
https://github.com/ocaml/ocaml/pull/1951 [^] proposes to explicitely check that we
do not introduce more than a thousand type constructors, instead of letting them
through and then potentially reporting weird (i.e. incorrect) scope errors.
The hole in that reasonning (and in the GPR linked to above) is that type
constructors are not the only identifiers created during pattern matches:
pattern variables will also imply creating some idents, for example.
But pattern variables are not the only thing implying the creation of idents
though, otherwise we could almost dismiss the problem with "meh, no reasonable
hand written code will ever contain more than X many pattern variables, so it's
alright" (which I assume was the rational behind the 1K limit on existential per
An example as trivial as https://github.com/trefis/existentialism/blob/master/wrong-escape/main.ml [^]
can also get us over the limit if [Sub] contains a few too many idents. Indeed
in that example, thanks to the 1st line introducing a module alias, a
substitution on the signature of [Sub] happens during the typing of the
[M.Sub.Pack] pattern, which will generate as many fresh idents as there are
idents in [Other_module.Sub].
If you clone that repository and run "make failure OCAMLC=/path/to/trunk/ocamlc"
in the wrong-escape directory, then you'll get:
File "main.ml", line 6, characters 11-12:
Error: This expression has type $Pack_'a
but an expression was expected of type 'a
The type constructor $Pack_'a would escape its scope
The error also happens at the tip of the 4.07 branch.
This example was encountered when trying to compile janestreet's code base with
the tip of the 4.07 branch.
The reason why it wasn't a problem on 4.07.0 is that the slot happened to be
made much bigger (as was already discussed), which makes it much less likely
to create so many idents in real code.
The reason why it wasn't a problem before 4.07 is that the 1k limit was actually
just a big lie. The scope of type constructors introduced for existential
variables wasn't tracked using the ident binding time: we were looking up the
type declaration in the environment, and checking at which level it was
introduced: https://github.com/ocaml/ocaml/blob/4.00/typing/ctype.ml#L689 [^]
That is, you could have added 10k existential, and the scoping would have been
correct for each of them.
https://github.com/trefis/existentialism/blob/master/over_999/main.ml [^] is an
example of this.
However, that piece of code disappeared with https://github.com/ocaml/ocaml/pull/1609 [^]
There are a few possible ways to fix (or minimize the issue):
1. Let's remove the double meaning associed to stamps in identifiers. Introduce
a dedicated "scope" field and rely on that.
2. Reintroduce the piece of code which disappeared. That is basically a worse
version of (1): it removes the double meaning associed to identifiers stamps but
only for existentials, and the "scope" field is not a proper field but we go and
fetch the information in the environment
3. Leave a bigger slot when typing a match. 1k is clearly too small for real
code, 100k causes an other kind of issues. How about 10k? I have a good feeling
about 10k ( https://youtu.be/8Xjr2hnOHiM?t=1m34s [^] ).
I have started working on (1), but I am disappearing for my vacations next
Thursday, and it's unlikely that I will have a GPR opened before then. The
change is nothing too complicated but it is fairly invasive, so it takes a bit
That is: I hope (1) will be the fix chosen for 4.08, but if we want to release
4.07.1 it probably won't be ready in time, and it's also too much for a bugfix
If we want to release a 4.07.1 compiler soonish, then (2) and (3) should be
(2) is "more correct", in the sense that we're back to a world where we can have
as many existential as we want. Though it will most likely have a negative
impact on typing time (because we have to do a lookup in the environment
whenever we call [update_level]).
Personnaly I think I'd go with (3): it's the simplest change, it's unlikely to
be problematic in practice, and won't incur any performance regression compared
That's the choice we've made internally at janestreet.
|Tags||No tags attached.|
|2018-08-03 13:00||trefis||New Issue|
|2018-08-04 20:10||hhugo||Note Added: 0019292|
|Copyright © 2000 - 2011 MantisBT Group|