Skip to content

Instantly share code, notes, and snippets.

@nikomatsakis
Last active May 23, 2020 09:50
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nikomatsakis/8bfda6c1119727e13ec6e98f33d2b696 to your computer and use it in GitHub Desktop.
Save nikomatsakis/8bfda6c1119727e13ec6e98f33d2b696 to your computer and use it in GitHub Desktop.

move leak-check to during coherence, candidate eval

Summary

  • Alter the leak-check to be more precise and to work by examining all placeholders created during a snapshot
  • Remove the leak check so it occurs only during candidate evaluation + coherence

Background and motivation

Goal: unblock lazy normalization and resolve universe transition

The goals of this MCP are two-fold:

  • to unblock further progress towards lazy normalization (and by extension const generics),
  • to work towards resolving the "universe transition" future compatibility warning #56105, although (as noted below) the changes described here are not by themselves sufficient.

What is the leak check?

The leak check is a way to manage higher-ranked obligations. It takes place as part of a snapshot operation S on the inference context. It examines all the 'a: 'b constraints created during that operation S, and (on master today) it looks for any of the following patterns:

  • '!P1: '!P2 -- i.e., one placeholder that must outlive another
  • '!P1: 'R, 'R: '!P1 -- if the universe of 'R cannot name the universe of P1

How the leak check works today

The leak check today is meant to be used in conjunction with checking some specific for<'a, ..., 'z> quantifier. The idea is that you

  • start a snapshot S
  • create a fresh universe U
  • substitute placeholders !Pa...!Pz (in the universe U) for the bound regions 'a...'z
  • perform various operations that may create region constraints like !Pa: 'b
  • consider the region constraints created during the snapshot as a graph
    • each !Pa: 'b constraint is an edge !Pa -> 'b
  • then search the graph starting from each placeholder (!Pa, !Pb, etc), both forwards and backwards.
  • The leak check fails if we find any reachable node N
    • N is another placeholder
    • N is a region in some universe that cannot name U

If the leak check fails, then that typically produces some kind of Err result -- e.g., the subtyping operation fails today. If it succeeds, then Ok is produced. The resulting constraints may well be added into the final set of the region constraints, and the region solver (which is now "universe aware") will solve them.

Disparity between the leak check and the universe-aware region solver

As mentioned in the previous section, in master today, if the leak check succeeds, then the resulting constraints are added to the solver which ultimately performs much the same check, but in a different way. However, the solver is more precise than the leak-check, and as a result it accepts some things that the leak check rejects. This is why if you run rustc with -Zskip-leak-check, some programs will start to behave differently (e.g., compiling when they would have gotten an error, or compiling differently).

To see how the leak-check and the solver differ, consider this example. Can we consider these two types to be subtypes?

for<'a> fn(&'a u32) -> &'static u32 
<:
for<'b> fn(&'b u32) -> &'b u32 

To solve this, we would first replace 'b with a placeholder !B in a fresh universe U1. We would then replace 'a with a fresh variable ?A in the universe U1. Then we relate the types, which ultimately yields two region constraints:

  • fn(&'?A u32) -> &'static u32 <: fn(&'!B u32) -> &'!B u32
    • &'!B u32 <: &'?A u32
      • '!B: '?A, the first constraint
    • &'static u32 <: &'?!B u32
      • 'static: '?!B, the second constraint

The resulting constraints are:

  • '!B: '?A
  • 'static: '?!B

The current leak-check on master will thus reject this subtyping. However, this can be somewhat difficult to observe. This program, for example, compiles -- that is because edges like 'static: 'R are hard-coded to be ignored, so the leak-check never sees them. Bu we can produce failures using inference variables, so that the 'static doesn't show up as readily. Changing the return type to _ gives a subtyping error, for example, even though we just saw that &'static u32 would have worked. This is true even if we give stronger hints.

Interaction between the leak check and lazy normalization

There is another problem with the leak-check as it operates today: it interacts poorly with lazy normalization. The problem is that (a) the leak-check operates during subtyping and (b) it requires examining the full resulting set of region obliations that are produced by that subtyping check. But the way that lazy normalization is meant to work is that, during the subtyping check, we sometimes produce new obligations that defer part of that check for later.

Consider an example like

for<'a> fn(<&'a as Mirror>::Item) =
  fn(&'b u8)

where <T as Mirror>::Item = T for all T. If we used a lazy normalization scheme, we would want to produce a new subobligation like

<'!1 as Mirror>::Item = &'b u8

This will, after being solved, ultimately yield a constraint that '!1 = 'b which will fail. But with the leak-check being performed on subtyping, there is no opportunity to normalize <'!1 as Mirror>::Item (unless we invoke that normalization directly from within subtyping, and I would prefer that subtyping and unification are distinct operations rather than part of the trait solving stack).

Note that if we removed the leak check, we'd have no problem here. We would produce this new obligation from subtyping, which would get solved. It would ultimately produce various region obligations that contain variables in various universes, corresponding to the for<..> binders we had to traverse, and the region solver can accommodate that and determine if there is an appropriate solution.

Why not remove the leak check (part 1)? Coherence.

Since the existing solver can handle universes, and the leak check is imprecise and causing problems for lazy normalization, the question you might reasonably ask then is "why not remove the leak check?" And indeed, we have tried in #55517, but we hit backwards compatibility concerns, which prompted us to introduce a future-compatibility warning while we figure out the ultimate solution. This MCP does not resolve that problem completely, but it does suggest the direction I think we ultimately want to take (more on that later).

The problem specifically centers around coherence. Today you can have impls like

impl<T> Trait for fn(T) { }
impl<U> Trait for fn(&U) { }

and they are not considered to overlap. This is true even though the coherence rules ignore region constraints (assume that they are solveable). This occurs because the leak-check produces a "subtyping error" result before the region constraints ever come into play, as the placeholder introduced for fn(&U) is equated with a region in the root universe (introduced for T).

Some crates rely on this pattern. The crate that relies on it in the most "troubling" way is wasm-bindgen. This is both because it's a prominent crate but also because the use case is reasonble and difficult to solve another way. As noted here, it has something two impls and in the &U case it uses distint trait bounds and the like.

// Owneship (from a macro):
impl<'a, 'b, A, R> IntoWasmAbi for &'a (dyn Fn(A) -> R + 'b)
where 
    A: FromWasmAbi,
    R: ReturnWasmAbi
{ /* ... */ }

// Explicitly writing the bound lifetime.
impl<'a, 'b, A, R> IntoWasmAbi for &'a (dyn for<'x> Fn(&'x A) -> R + 'b)
where
    A: RefFromWasmAbi,
    R: ReturnWasmAbi
{ /* ... */ }

It's not clear how to preserve this pattern without accepting two distinct impls.

Why not remove the leak check (part 2)? Subtle interactions.

One other observation I will note is that changing the subtyping rules can affect things beyond coherence, such as method dispatch. This is because we will decide that a method is "applicable" if the subtyping check succeeds (no matter what region constraints it generates). We haven't observed any real problems due to this in the wild, but as part of the PR implementing this proposal, some tests did mildly change where they reported errors, which might indicate an interaction. A crater run could help.

Proposed change in detail

OK, now that we've laid out the background, let's go over the proposed changes.

First change: make the leak check match the behavior of region solver

The first change is to alter how the leak check works to make it more precise. The newer leak-check is still invoked in the context of some snapshot S, but it no longer takes a specific set of placeholders to examine. Instead, it looks for placeholders in any universe created since the snapshot began. This allows it to be invoked after "some number" of higher-ranked for bindings have been performed, and it effectively "leak checks" all of them at once.

The algorithm starts as follows:

  • start a snapshot S
  • perform some number of "higher-ranked operations, each of which:
    • create a fresh universe U
    • substitute placeholders !Pa...!Pz (in the universe U) for the bound regions 'a...'z
    • perform various operations that may create region constraints like !Pa: 'b
  • consider the region constraints created during those operations as a graph
    • each R1: R2 constraint is an edge R1 -> R2

So far it is similar to the old leak check. We then:

  • Search the graph forward from each placeholder P that was created in some universe U created during the snapshot S.
  • The leak check fails if we find any reachable node N
    • N is another placeholder
    • N is a region in some universe that cannot name U

The main difference here is that we only search forward -- in other words, for each placeholder, we consider the regions that the placeholder must (transitively) outlive, but we don't consider those regions that must outlive the placeholder.

This means that we accept this example subtyping given earlier without an error:

for<'a> fn(&'a u32) -> &'static u32 
<:
for<'b> fn(&'b u32) -> &'b u32 

The reason is that the edge 'static: '!B that results is going in the wrong direction.

With this change, I believe the leak check now only rejects programs that would later fail the current region solver. However, there is a catch, as detailed below: the current region solver itself generates errors in some cases we would prefer to accept!

Perform leak check later

The next change is that we remove the leak check from the subtyping check. This alone would break the wasm-bindgen crate, however, so we add it back in two other locations:

  • during the coherence check
  • during the "candidate evaluation" step of trait solving

Adding the leak check during the coherence check allows us to accept wasm-bindgen, though it works somewhat differently. Before, we would attempt to unify the parameters of the two impls and fail. Now, that unification succeeds, but it generates region constriants that later fail the leak-check which coherence performs.

Adding the leak check into the "candidate evaluation" step is needed to prevent coherence from generating ambiguities in trait resolution. In particular, consider the two impls from wasm-bindgen, and assume that we are testing them against a type F = fn(&u32). Before, only one impl would be able to unify its types against the type F. Now, both are able to do so, producing two candidates, but one of those candidates is rejected during evaluation because it ultimately fails the leak-check.

Observations on the proposed change

Full set of places we perform a leak-check

After this change, we perform the leak-check at the following places:

  • coherence -- added here in my branch
  • evaluation_probe -- added here in my branch

The current region solver (and proposed leak check) reject some programs they should accept

As I noted earlier, I believe that the leak check as described above only rejects programs that would eventually have been rejected by the current region solver. However, the current region solver itself is overly conservative, and generates errors in cases that arguably we should accept -- and the leak check shares this flaw.

The reasons for this come down to "implied bounds" and are somewhat speculative -- i.e, te problems arise when we move the language in directions that it likely has to go, but not so much now.

Specifically, consider a type like

for<'a, 'b> fn(&'a &'b u32)

This type suggests that the type fn applies to any two regions 'a and 'b, but that's not entirely true. In order for &'a &'b u32 to be a "well-formed" type, we must know that 'b: 'a. So this actually constraints the set of regions that can be used with this function.

One might argue that, as a result, the fn type above should not be considered well-formed. Instead, perhaps it should look more like this (obviously not real syntax):

for<'a, 'b>
  where 'b: 'a
    fn(&'a &'b u32)

Here we are using a "where clause" to constrain the choices of 'a and 'b that we can use. In fact, it turns out that having some notion of something like where clauses is needed to fix a long-standing soundness hole in Rust.

However, we clearly aren't going to go rejecting all the Rust programs in existence by requiring that people write exotic where clauses like the one above. Instead, the way we ahd planned to fix this was by using the notion of "implied bounds". The fn type above in other words would be considered to be "shorthand" for a type like

for<'a, 'b> 
  where WellFormed(&'a &'b u32)
    fn(&'a &'b u32)

This comment explains this scheme in more detail, and also explains how it can fix the soundness hole.

Implied bounds mean we can sometimes know about relationships between placeholders

The key point for our purposes is that the implied bounds described in the previous section may sometimes make it possible to prove relationships between placeholders. Consider this type:

for<'a, 'b> fn(
  &'a &'b u32,
  &'b &'a u32,
)

As a result of the two argument, we know that 'b: 'a (first argument) and 'a: 'b (second argument). This means that in fact 'a = 'b.

The current region checker and leak check, however, would give an error if you have a constraint like !P1: !P2. This is because they are ignoring the "environment" (what where clauses are in scope) when they try to solve region constraints.

Being overly conservative is a problem for the leak check, but not region checker

Even though the leak checker and region solver are both overly conservative ("incomplete", in the technical term), this is more of a problem for the leak checker than the region solver. The reason is that the role of the leak checker and the region solver are different:

  • The leak check is used to decide "which path to take" during type-checking. For example, it guides the trait solver when it chooses which impls might apply and whether there is ambiguity, and decides whether coherence considers two impls as overlapping.
    • By moving the leak check out from subtyping, this current MCP minimizes this distinction, but doesn't eliminate it. This is because subtyping is sometimes used to guide choices like "which coercions apply", and the leak check would no longer be in that path.
  • The region solver, in contrast, is invoked at the end and it simply must find a solution or else compilation will abort.

In other words, if the region solver finds an error, compilation always aborts. But if the leak check finds an error, compilation may succeed in a different way.

An example where the leak check causes coherence to accept overlapping impls

Putting the previous two sections together, if we try to adapt the leak-check/solver so that they account for implied bounds, then we will wind up making them reject fewer sets of constraints. For the region solver, that's fine, it just means more programs start to compile. But for the leak check, it can make some programs that used to compile stop compiling, or it can cause programs to compile differently.

As an example, the following impls are still accepted on my branch (although they do get the #56105 future compatibility warning):

trait Trait { }

impl Trait for for<'a, 'b> fn(
  &'a &'b u32,
  &'b &'a u32,
) { }

impl Trait for for<'c> fn(
  &'c &'c u32,
  &'c &'c u32,
) { }

This is true even though those two types are equivalent, if you take implied bounds into account.

If we removed the leak check entirely, those two impls would be rejected.

Lack of DRY

One final observation is that the leak check and the region solver have lot of overlap in their functionality, which is clearly suboptimal. The leak check is sort of a simpler version of the latter.

Future directions: let the trait solver solve higher-ranked things

Looking to the future, as well as possible developments like polonius, I believe that there is a model that both fully satisfies #56105 and preserves the wasm-bindgen pattern.

In short, the idea is to replace the 'leak check' with a model where the trait solver has the job of "breaking down" higher-ranked region relationships like exists<'a> { forall<'b> { Outlives('a: 'b) } } into simpler ones that only occur in the root universe (in that particular example, the result would be Outlives('a: 'static)).

(I made a bit of a leap here: previously I've been talking about subtyping relationships and so forth. Right now I'm cutting to the chase of describing the region constraints that arise out of those subtyping obligations and just talking directly about those. These are effectively chalk-like goals, which is the syntax I've adopted.)

The intuition here is that the trait solver knows all the possible bounds on the placeholders. It also knows the environment and implied bounds that may be in scope. Moreover, it already has to do reasoning about implied bounds, so it is well-positioned to figure things out like "which region obligations are implied by a type being well-formed" (since that is not so different from "which trait bounds are implied by a type being well formed").

To give a more complex example of how we can simplify constraints, consider the following goal, where 'b and 'b two regions in the root universe:

forall<'a> {
  if (Outlives('a: 'b)) {
    Outlives('a: 'c)
  }
}

We can convert the above into a goal like Outlives('b: 'c). This is because there would be no other way we could possibly know that Outlives('a: 'c) is true.

This fits quite well with a Polonius-like model, since it is not entirely obvious how to gracefully extend that model to cover higher-ranked obligations (note to say it can't be done, but it just doesn't fit with the data-log-like execution model very well). Similarly, the existing NLL solver integrates universes, but the integration is awkward -- the analysis would otherwise be using simple set union, but now it must make complex checks.

Complication: no best simplification

One complication is that there will sometimes be "no best way" to resolve various possible region obligations. Consider something like:

forall<'a> {
  if (Outlives('a: 'b), Outlives('a: 'b2)) {
    Outlives('a: 'c)
  }
}

Whereas before we could simplify to Outlives('b: 'c), we now have two possible simplifications:

  • Outlives('b: 'c)
  • Outlives('b2: 'c)

and we have no way to know which is better. Depending which one we choose, we might generate borrow check errors down the line.

Actually, this problem already arises from the trait solver even before we consider the question of higher-ranked solutions, as is discussed in #21974.

Certainly the trait solver could produce a complex constraint that involves ||, such as Outlives('b: 'c) || Outlives('b2: 'c). There are two problems with this, though. First, the trait solver must always explore all solutions fully to find the constraints to || together, and -- more importantly -- the region solver can't really deal with || constraints, and solving them requires exploring all possible combinations, which could be slow and expensive. However, we can opt to be conservative in the region solver, and simply (for example) convert all || into && constraints, or try to pick other conservative approximations (we already do similar things today in some cases). Because of the fact that any error prodcued by the region solver ultimately halts compilation, it is safe for us to start simple and get smarter.

We can also have the trait checker simply refuse to pick the best solution in cases like these. This is what we do today in the case of #21974, for example. That is also an ok solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment