nikomatsakis/mcp.md Secret

## mcp.md

      
    Raw
  

              mcp.md
            
          
    move leak-check to during coherence, candidate eval

Summary


Alter the leak-check to be more precise and to work by examining all placeholders created during a snapshot
Remove the leak check so it occurs only during candidate evaluation + coherence

Background and motivation

Goal: unblock lazy normalization and resolve universe transition

The goals of this MCP are two-fold:

to unblock further progress towards lazy normalization (and by
extension const generics),
to work towards resolving the "universe transition" future
compatibility warning #56105, although (as noted below) the
changes described here are not by themselves sufficient.

What is the leak check?

The leak check is a way to manage higher-ranked obligations. It takes
place as part of a snapshot operation S on the inference context. It
examines all the 'a: 'b constraints created during that operation S,
and (on master today) it looks for any of the following patterns:

'!P1: '!P2 -- i.e., one placeholder that must outlive another
'!P1: 'R, 'R: '!P1 -- if the universe of 'R cannot name the universe of P1

How the leak check works today

The leak check today is meant to be used in conjunction with checking
some specific for<'a, ..., 'z> quantifier. The idea is that you

start a snapshot S
create a fresh universe U
substitute placeholders !Pa...!Pz (in the universe U) for the bound regions 'a...'z
perform various operations that may create region constraints like !Pa: 'b
consider the region constraints created during the snapshot as a graph

each !Pa: 'b constraint is an edge !Pa -> 'b


then search the graph starting from each placeholder (!Pa, !Pb,
etc), both forwards and backwards.
The leak check fails if we find any reachable node N

N is another placeholder
N is a region in some universe that cannot name U


If the leak check fails, then that typically produces some kind of
Err result -- e.g., the subtyping operation fails today. If it
succeeds, then Ok is produced. The resulting constraints may well be
added into the final set of the region constraints, and the region
solver (which is now "universe aware") will solve them.
Disparity between the leak check and the universe-aware region solver

As mentioned in the previous section, in master today, if the leak
check succeeds, then the resulting constraints are added to the solver
which ultimately performs much the same check, but in a different
way. However, the solver is more precise than the leak-check, and as a
result it accepts some things that the leak check rejects. This is why
if you run rustc with -Zskip-leak-check, some programs will start to
behave differently (e.g., compiling when they would have gotten an
error, or compiling differently).
To see how the leak-check and the solver differ, consider this
example.  Can we consider these two types to be subtypes?
for<'a> fn(&'a u32) -> &'static u32 
<:
for<'b> fn(&'b u32) -> &'b u32 
To solve this, we would first replace 'b with a placeholder !B in
a fresh universe U1. We would then replace 'a with a fresh
variable ?A in the universe U1. Then we relate the types,
which ultimately yields two region constraints:

fn(&'?A u32) -> &'static u32 <: fn(&'!B u32) -> &'!B u32

&'!B u32 <: &'?A u32

'!B: '?A, the first constraint


&'static u32 <: &'?!B u32

'static: '?!B, the second constraint


The resulting constraints are:

'!B: '?A
'static: '?!B

The current leak-check on master will thus reject this
subtyping. However, this can be somewhat difficult to observe. This
program, for example, compiles -- that is because edges like
'static: 'R are hard-coded to be ignored, so the leak-check never
sees them. Bu we can produce failures using inference variables, so
that the 'static doesn't show up as readily. Changing the return
type to _ gives a subtyping error, for example, even though we
just saw that &'static u32 would have worked. This is true even if
we give stronger hints.
Interaction between the leak check and lazy normalization

There is another problem with the leak-check as it operates today: it
interacts poorly with lazy normalization. The problem is that (a) the
leak-check operates during subtyping and (b) it requires examining the
full resulting set of region obliations that are produced by that
subtyping check. But the way that lazy normalization is meant to work
is that, during the subtyping check, we sometimes produce new
obligations that defer part of that check for later.
Consider an example like
for<'a> fn(<&'a as Mirror>::Item) =
  fn(&'b u8)

where <T as Mirror>::Item = T for all T. If we used a lazy
normalization scheme, we would want to produce a new subobligation
like
<'!1 as Mirror>::Item = &'b u8

This will, after being solved, ultimately yield a constraint that '!1 = 'b which will fail. But with the leak-check being performed on
subtyping, there is no opportunity to normalize <'!1 as Mirror>::Item (unless we invoke that normalization directly from
within subtyping, and I would prefer that subtyping and unification
are distinct operations rather than part of the trait solving stack).
Note that if we removed the leak check, we'd have no problem here. We
would produce this new obligation from subtyping, which would get
solved.  It would ultimately produce various region obligations that
contain variables in various universes, corresponding to the for<..>
binders we had to traverse, and the region solver can accommodate that
and determine if there is an appropriate solution.
Why not remove the leak check (part 1)? Coherence.

Since the existing solver can handle universes, and the leak check is
imprecise and causing problems for lazy normalization, the question
you might reasonably ask then is "why not remove the leak check?" And
indeed, we have tried in #55517, but we hit backwards compatibility
concerns, which prompted us to introduce a future-compatibility
warning while we figure out the ultimate solution. This MCP
does not resolve that problem completely, but it does suggest the
direction I think we ultimately want to take (more on that later).
The problem specifically centers around coherence. Today you can
have impls like
impl<T> Trait for fn(T) { }
impl<U> Trait for fn(&U) { }
and they are not considered to overlap. This is true even though the
coherence rules ignore region constraints (assume that they are
solveable).  This occurs because the leak-check produces a "subtyping
error" result before the region constraints ever come into play, as
the placeholder introduced for fn(&U) is equated with a region in
the root universe (introduced for T).
Some crates rely on this pattern. The crate that relies on it in the
most "troubling" way is wasm-bindgen. This is both because it's a
prominent crate but also because the use case is reasonble and
difficult to solve another way. As noted here, it has something
two impls and in the &U case it uses distint trait bounds and the like.
// Owneship (from a macro):
impl<'a, 'b, A, R> IntoWasmAbi for &'a (dyn Fn(A) -> R + 'b)
where 
    A: FromWasmAbi,
    R: ReturnWasmAbi
{ /* ... */ }

// Explicitly writing the bound lifetime.
impl<'a, 'b, A, R> IntoWasmAbi for &'a (dyn for<'x> Fn(&'x A) -> R + 'b)
where
    A: RefFromWasmAbi,
    R: ReturnWasmAbi
{ /* ... */ }
It's not clear how to preserve this pattern without accepting two distinct
impls.
Why not remove the leak check (part 2)? Subtle interactions.

One other observation I will note is that changing the subtyping rules
can affect things beyond coherence, such as method dispatch. This is
because we will decide that a method is "applicable" if the subtyping
check succeeds (no matter what region constraints it generates).  We
haven't observed any real problems due to this in the wild, but as
part of the PR implementing this proposal, some tests did mildly
change where they reported errors, which might indicate an
interaction. A crater run could help.
Proposed change in detail

OK, now that we've laid out the background, let's go over the proposed
changes.
First change: make the leak check match the behavior of region solver

The first change is to alter how the leak check works to make it more
precise. The newer leak-check is still invoked in the context of some
snapshot S, but it no longer takes a specific set of placeholders to
examine. Instead, it looks for placeholders in any universe created
since the snapshot began. This allows it to be invoked after "some
number" of higher-ranked for bindings have been performed, and it
effectively "leak checks" all of them at once.
The algorithm starts as follows:

start a snapshot S
perform some number of "higher-ranked operations, each of which:

create a fresh universe U
substitute placeholders !Pa...!Pz (in the universe U) for the bound regions 'a...'z
perform various operations that may create region constraints like !Pa: 'b


consider the region constraints created during those operations as a graph

each R1: R2 constraint is an edge R1 -> R2


So far it is similar to the old leak check. We then:

Search the graph forward from each placeholder P that was created in some
universe U created during the snapshot S.
The leak check fails if we find any reachable node N

N is another placeholder
N is a region in some universe that cannot name U


The main difference here is that we only search forward -- in
other words, for each placeholder, we consider the regions that the
placeholder must (transitively) outlive, but we don't consider those
regions that must outlive the placeholder.
This means that we accept this example subtyping given earlier without
an error:
for<'a> fn(&'a u32) -> &'static u32 
<:
for<'b> fn(&'b u32) -> &'b u32 
The reason is that the edge 'static: '!B that results is going in
the wrong direction.
With this change, I believe the leak check now only rejects programs
that would later fail the current region solver. However, there is a
catch, as detailed below: the current region solver itself generates
errors in some cases we would prefer to accept!
Perform leak check later

The next change is that we remove the leak check from the subtyping check.
This alone would break the wasm-bindgen crate, however, so we add it back
in two other locations:

during the coherence check
during the "candidate evaluation" step of trait solving

Adding the leak check during the coherence check allows us to accept
wasm-bindgen, though it works somewhat differently. Before, we would
attempt to unify the parameters of the two impls and fail. Now, that
unification succeeds, but it generates region constriants that later
fail the leak-check which coherence performs.
Adding the leak check into the "candidate evaluation" step is needed
to prevent coherence from generating ambiguities in trait resolution.
In particular, consider the two impls from wasm-bindgen, and assume
that we are testing them against a type F = fn(&u32). Before, only
one impl would be able to unify its types against the type F. Now,
both are able to do so, producing two candidates, but one of those
candidates is rejected during evaluation because it ultimately fails
the leak-check.
Observations on the proposed change

Full set of places we perform a leak-check

After this change, we perform the leak-check at the following places:

coherence -- added here in my branch
evaluation_probe -- added here in my branch

The current region solver (and proposed leak check) reject some programs they should accept

As I noted earlier, I believe that the leak check as described above
only rejects programs that would eventually have been rejected by the
current region solver. However, the current region solver itself is
overly conservative, and generates errors in cases that arguably we
should accept -- and the leak check shares this flaw.
The reasons for this come down to "implied bounds" and are somewhat
speculative -- i.e, te problems arise when we move the language in
directions that it likely has to go, but not so much now.
Specifically, consider a type like
for<'a, 'b> fn(&'a &'b u32)
This type suggests that the type fn applies to any two regions
'a and 'b, but that's not entirely true. In order for &'a &'b u32 to be a "well-formed" type, we must know that 'b: 'a. So this actually
constraints the set of regions that can be used with this function.
One might argue that, as a result, the fn type above should not be
considered well-formed. Instead, perhaps it should look more like this
(obviously not real syntax):
for<'a, 'b>
  where 'b: 'a
    fn(&'a &'b u32)
Here we are using a "where clause" to constrain the choices of 'a
and 'b that we can use. In fact, it turns out that having some
notion of something like where clauses is needed to fix a
long-standing soundness hole in Rust.
However, we clearly aren't going to go rejecting all the Rust programs
in existence by requiring that people write exotic where clauses like
the one above. Instead, the way we ahd planned to fix this was by using
the notion of "implied bounds". The fn type above in other words
would be considered to be "shorthand" for a type like
for<'a, 'b> 
  where WellFormed(&'a &'b u32)
    fn(&'a &'b u32)
This comment explains this scheme in more detail, and also
explains how it can fix the soundness hole.
Implied bounds mean we can sometimes know about relationships between placeholders

The key point for our purposes is that the implied bounds described
in the previous section may sometimes make it possible to prove
relationships between placeholders. Consider this type:
for<'a, 'b> fn(
  &'a &'b u32,
  &'b &'a u32,
)
As a result of the two argument, we know that 'b: 'a (first
argument) and 'a: 'b (second argument). This means that in fact 'a = 'b.
The current region checker and leak check, however, would give an
error if you have a constraint like !P1: !P2. This is because they
are ignoring the "environment" (what where clauses are in scope) when
they try to solve region constraints.
Being overly conservative is a problem for the leak check, but not region checker

Even though the leak checker and region solver are both overly
conservative ("incomplete", in the technical term), this is more of a
problem for the leak checker than the region solver. The reason is
that the role of the leak checker and the region solver are different:

The leak check is used to decide "which path to take" during
type-checking.  For example, it guides the trait solver when it
chooses which impls might apply and whether there is ambiguity, and
decides whether coherence considers two impls as overlapping.

By moving the leak check out from subtyping, this current MCP
minimizes this distinction, but doesn't eliminate it. This is
because subtyping is sometimes used to guide choices like "which
coercions apply", and the leak check would no longer be in that
path.


The region solver, in contrast, is invoked at the end and it simply must find
a solution or else compilation will abort.

In other words, if the region solver finds an error, compilation
always aborts. But if the leak check finds an error, compilation may
succeed in a different way.
An example where the leak check causes coherence to accept overlapping impls

Putting the previous two sections together, if we try to adapt the leak-check/solver
so that they account for implied bounds, then we will wind up making them reject
fewer sets of constraints. For the region solver, that's fine, it just means more programs
start to compile. But for the leak check, it can make some programs that used to compile
stop compiling, or it can cause programs to compile differently.
As an example, the following impls are still accepted on my branch
(although they do get the #56105 future compatibility warning):
trait Trait { }

impl Trait for for<'a, 'b> fn(
  &'a &'b u32,
  &'b &'a u32,
) { }

impl Trait for for<'c> fn(
  &'c &'c u32,
  &'c &'c u32,
) { }
This is true even though those two types are equivalent, if you take
implied bounds into account.
If we removed the leak check entirely, those two impls would be
rejected.
Lack of DRY

One final observation is that the leak check and the region solver
have lot of overlap in their functionality, which is clearly
suboptimal. The leak check is sort of a simpler version of the latter.
Future directions: let the trait solver solve higher-ranked things

Looking to the future, as well as possible developments like polonius,
I believe that there is a model that both fully satisfies #56105 and
preserves the wasm-bindgen pattern.
In short, the idea is to replace the 'leak check' with a model where
the trait solver has the job of "breaking down" higher-ranked region
relationships like exists<'a> { forall<'b> { Outlives('a: 'b) } } into simpler
ones that only occur in the root universe (in that particular example,
the result would be Outlives('a: 'static)).
(I made a bit of a leap here: previously I've been talking about
subtyping relationships and so forth. Right now I'm cutting to the
chase of describing the region constraints that arise out of those
subtyping obligations and just talking directly about those. These are
effectively chalk-like goals, which is the syntax I've adopted.)
The intuition here is that the trait solver knows all the possible
bounds on the placeholders. It also knows the environment and implied
bounds that may be in scope. Moreover, it already has to do reasoning
about implied bounds, so it is well-positioned to figure things out
like "which region obligations are implied by a type being
well-formed" (since that is not so different from "which trait bounds are implied
by a type being well formed").
To give a more complex example of how we can simplify constraints,
consider the following goal, where 'b and 'b two regions in the
root universe:
forall<'a> {
  if (Outlives('a: 'b)) {
    Outlives('a: 'c)
  }
}
We can convert the above into a goal like Outlives('b: 'c). This is
because there would be no other way we could possibly know that
Outlives('a: 'c) is true.
This fits quite well with a Polonius-like model, since it is not
entirely obvious how to gracefully extend that model to cover
higher-ranked obligations (note to say it can't be done, but it just
doesn't fit with the data-log-like execution model very
well). Similarly, the existing NLL solver integrates universes, but
the integration is awkward -- the analysis would otherwise be using
simple set union, but now it must make complex checks.
Complication: no best simplification

One complication is that there will sometimes be "no best way" to
resolve various possible region obligations. Consider something like:
forall<'a> {
  if (Outlives('a: 'b), Outlives('a: 'b2)) {
    Outlives('a: 'c)
  }
}
Whereas before we could simplify to Outlives('b: 'c), we now have two
possible simplifications:

Outlives('b: 'c)
Outlives('b2: 'c)

and we have no way to know which is better. Depending which one we
choose, we might generate borrow check errors down the line.
Actually, this problem already arises from the trait solver even
before we consider the question of higher-ranked solutions, as is
discussed in #21974.
Certainly the trait solver could produce a complex constraint that
involves ||, such as Outlives('b: 'c) || Outlives('b2: 'c). There
are two problems with this, though. First, the trait solver must
always explore all solutions fully to find the constraints to ||
together, and -- more importantly -- the region solver can't really
deal with || constraints, and solving them requires exploring all
possible combinations, which could be slow and expensive. However,
we can opt to be conservative in the region solver, and simply (for
example) convert all || into && constraints, or try to pick other
conservative approximations (we already do similar things today in
some cases). Because of the fact that any error prodcued by the region
solver ultimately halts compilation, it is safe for us to start simple
and get smarter.
We can also have the trait checker simply refuse to pick the best
solution in cases like these. This is what we do today in the case of
#21974, for example. That is also an ok solution.