nikomatsakis/hr.md Secret

## hr.md

      
    Raw
  

              hr.md
            
          
    In my previous post about Polonius and subregion obligations, I
mentioned that there needs to be a follow-up to deal with
higher-ranked subregions. This post doesn't really provide a
solution. Instead, it just kinds of digs a bit into what the problem
is in the first place.
The subset relation in Polonius is not enough

In my original post on Polonius, I assumed that when we computed
a subtype relation T1 <: T2 between two types, the result was either
a hard error or a set of subset relations between various regions.
So, for example, if we had a subtype relation between two references:
&'a u32 <: &'b u32

the result would be a subset relation 'a: 'b (or, "'a contains a
subset of the loans in 'b").
For a more complex case, consider the relationship of two fn types:
fn(&'a u32) <: fn(&'b u32)
// ^^^^^^^     ^^^^^^^^^^
// |           A fn expecting a `&'b u32` as argument.
// |
// A fn expecting a `&'a u32` as argument.

If we imagine that we have some variable f of type fn(&'a u32) --
that is, a fn that can be called with a 'a reference -- then this
subtype relation is saying that f can be given the type fn(&'b u32) -- that is, a fn that can be called with a 'b reference.  That
is fine so long as that 'b reference can be used as a 'a
reference: that is, &'b u32 <: &'a u32. So, we can say that the two
fn types are subtypes so long as 'b: 'a (note that the order is
reversed from the first example; this is because fn types are
contravariant in their argument type).
Unfortunately, this structure isn't flexible enough to accommodate a
subtyping question involving higher-ranked types. Consider a subtype
relation like this:
fn(&'a u32) <: for<'b> fn(&'b u32)
//             ^^^^^^^
//             Unlike before, the supertype
//             expects a reference with *any*
//             lifetime as argument.

What subtype relation should come from this? We can't say 'b: 'a as
before, because the lifetime 'b isn't some specific region --
rather, the supertype says that the function has to accept a reference
with any lifetime 'b. (In fact, this subtyping relation should
ultimately yield an error -- but we would like that error to arise not
during subtyping, but rather later, when )
To express a constraint like this, we need a richer set of constraints
than just subset. In fact, what rustc implements is something more
like this:
Constraint = Subset
           | Constraint, Constraint
           | forall<R1> { Constraint }
           | exists<R1> { Constraint }

Subset = R1: R2  

Now we can say that
fn(&'a u32) <: for<'b> fn(&'b u32)

holds if the constraint forall<'b> { 'b: 'a } holds, which implies
that 'a has to contain all possible loans. (This would of course be
possible if 'a were an "infinite set" covering the whole program or
some such thing, but we can treat it as an error.)
Background: universes

In fact, rustc has recently changed how it handles type checks that
involve forall-style constraints. In PR
#55517, we adopted a
system based on the concept of universes, which is something I
first learned about when studying up on how Lambda Prolog is
implemented. If you're interested in some background material, I gave
a talk at a recent Papers We Love meetup where I covered the
paper that explains how it works. I also wrote a section
in the rustc-guide on the topic some time back.
The basic idea of a universe is that it represents a set of names.
There is a notion of the "root universe" U0 that contains the "global"
names in the program. It's easiest to explain in terms of types: the
global names might be the structs from the program. When we want to
reason about a "forall" constraint, though, we do that by introducing
a fresh, placeholder name that is distinct from all other names in the
program. To do this, we create a new universe U1 that is a child of
the root univese -- the idea is that U1 contains all the global names from
U0, but also this additional placeholder name.
Next, we associate each inference variable with a universea. The
inference variable is only allowed to be inferred to values that
contain names from that universe.
So, as a simple example, imagine that we have a constraint like the
following (this is not a region constraint, just a general logic
equation):
forall<A> { exists<B> { A = B } }

Here we are saying, for all types A, there exists some type B where A
= B. Well, that's clearly true, we can just infer B to be A. In terms
of universes, we would say that the name A is bound to a placeholder
in U1, and the inference variable B is created in that same universe
U1.
But if we reverse the order of "forall" and "exists", we want to get an error:
exists<B> { forall<A> { A = B } }

In this case, the variable B would wind up in the root universe U0,
but the placeholder A is still in the child universe U1. Therefore,
the value for B cannot be A, because A is not nameable from U0.
Conclusion (for now)

So now we have this problem. To encode the "solutions" to
higher-ranked subtyping and trait matching, we need to use this richer
notion of constraints that include forall and exists quantifiers.  In
fact, I think we will eventually want something even stronger, that
includes implication and maybe even "or", so we might as well "scale
up" to hereditary harrop constraints (also a notion I learned
about through Lambda Prolog):
Constraint = Subset
           | Constraint, Constraint // and
           | Constraint; Constraint // or
           | forall<R1> { Constraint }
           | exists<R1> { Constraint }
           | if (Assumption) { Constraint }
           
Assumption = Subset
           | forall<R1> { Assumption }
           | if (Constraint) { Assumption }

Subset = R1: R2  

But now we have to map these complex constraints down to the simple
subset relation that Polonius uses. I'll talk about how to do that in
the next post.

  
## hr2.md

      
    Raw
  

              hr2.md
            
          
    In my previous post, I introduced this notion of more complex region
constraints, which corresponded to hereditary harrop
constraints:
Constraint = Subset
           | Constraint, Constraint // and
           | Constraint; Constraint // or
           | forall<R1> { Constraint }
           | exists<R1> { Constraint }
           | if (Assumption) { Constraint }
           
Assumption = Subset
           | forall<R1> { Assumption }
           | if (Constraint) { Assumption }

Subset = R1: R2  

But we need a way to translate these constraints into the simpler
subset relation of Polonius.
My proposal is to do this by leveraging the logic programming
solver. Essentially, we will write some Prolog-like rules that dictate
when a Subset is provable -- and those rules will generate a set of
simpler, first-order constraints that can be fed to polonius.
(Actually, the rules will generate multiple sets of first-order
constraints, but we'll get to that particular problem later.)
Notation

I'm going to use a kind of special goal in my notation:
GenerateConstraint[R1: R2] :-
  Universe(R1, U0), 
  Universe(R2, U0).

As the prolog rule states, this rule is only applicable if R1 and R2
are in the root univese. What makes this rule special is that it is
(in some sense) "side-effecting" -- basically, each time we prove this
rule, we generate a first-order constraint that must be fed to
polonius.
From a different perspective, once we've found a proof tree for the
constraint, we can create the Polonius subset constraints by
extracting all GenerateConstraint[R1: R2] nodes from the proof tree.
(An alternative would be to formulate the "output constraints" as an
explicit logic variable, but that notation is a lot more tedious.)
Reflexive rule and root universe subsets

The first rule is the simplest. Every region outlives itself.
Therefore:
R1: R1.

Next, Any time we have a relation between two root universe variables,
we can prove it by "recording" the constraint:
(R1: R2) :-
  GenerateConstraint[R1: R2].

So, as a simple example, if we had to solve the subtyping constraint
&'a u32 <: &'b u32

this would give rise to the facts a region constraint 'a: 'b, where
'a and 'b are in the root universe U0. We can then treat this as a
goal to prove, resulting in the following proof tree:
- 'a: 'b
  - GenerateConstraint['a: 'b]
    - Universe('a, U0).
    - Universe('b, U0).

So now we know that we need the polonius constraint of 'a: 'b. Wow.
Outside of the root universe

Where things get more interesting is when either R1 or R2 is a
placeholder and hence outside the root universe. In that case, we
can't relate them directly -- what we need to do instead is to relate
them through intermediaries.
(R1: R3) :-
  (not { RootUniverse(R1) }; not { RootUniverse(R3) }),
  (R1: R3),
  (R2: R3).

mentioned that there needs to be a follow-up to deal with
higher-ranked subregions. This post kinds of digs into one approach
that I've been thinking about. The idea is actually not to deal with
higher-ranked problems within Polonius, but rather to solve such
problems before-hand, basically reducing them to the sorts of subset
relations that Polonius can solve with ease. Along the way, we'll
discuss some of what makes how we deal with region constraints in
Chalk, and some of the challenges they pose to an efficient
implementation (plus the workarounds we presently use in the
compiler).
The subset relation in Polonius is not enough

In my original post on Polonius, I assumed that when we computed
a subtype relation T1 <: T2 between two types, the result was either
a hard error or a set of subset relations between various regions.
So, for example, if we had a subtype relation between two references:
&'a u32 <: &'b u32

the result would be a subset relation 'a: 'b (or, "'a contains a
subset of the loans in 'b").
For a more complex case, consider the relationship of two fn types:
fn(&'a u32) <: fn(&'b u32)
// ^^^^^^^     ^^^^^^^^^^
// |           A fn expecting a `&'b u32` as argument.
// |
// A fn expecting a `&'a u32` as argument.

If we imagine that we have some variable f of type fn(&'a u32) --
that is, a fn that can be called with a 'a reference -- then this
subtype relation is saying that f can be given the type fn(&'b u32) -- that is, a fn that can be called with a 'b reference.  That
is fine so long as that 'b reference can be used as a 'a
reference: that is, &'b u32 <: &'a u32. So, we can say that the two
fn types are subtypes so long as 'b: 'a (note that the order is
reversed from the first example; this is because fn types are
contravariant in their argument type).
Unfortunately, this structure isn't flexible enough to accommodate a
subtyping question involving higher-ranked types. Consider a subtype
relation like this:
fn(&'a u32) <: for<'b> fn(&'b u32)
//             ^^^^^^^
//             Unlike before, the supertype
//             expects a reference with *any*
//             lifetime as argument.

What subtype relation should come from this? We can't say 'b: 'a as
before, because the lifetime 'b isn't some specific region --
rather, the supertype says that the function has to accept a reference
with any lifetime 'b. (In fact, this subtyping relation should
ultimately yield an error -- but we would like that error to arise not
during subtyping, but rather later, when )
To express a constraint like this, we need a richer set of constraints
than just subset. In fact, what rustc implements is something more
like this:
Constraint = Subset
           | Constraint, Constraint
           | forall<R1> { Constraint }
           | exists<R1> { Constraint }

Subset = R1: R2  

Now we can say that
fn(&'a u32) <: for<'b> fn(&'b u32)

holds if the constraint forall<'b> { 'b: 'a } holds, which implies
that 'a has to contain all possible loans. (This would of course be
possible if 'a were an "infinite set" covering the whole program or
some such thing, but we can treat it as an error.)
Side note: universes

In fact, rustc has recently changed how it handles type checks that
involve forall-style constraints. In PR
#55517, we adopted a
system based on the concept of universes, which is something I
first learned about when studying up on how Lambda Prolog is
implemented. If you're interested in some background material, I gave
a talk at a recent Papers We Love meetup where I covered the
paper that explains how it works. I also wrote a section
in the rustc-guide on the topic some time back.
The basic idea of a universe is that it represents a set of names.
There is a notion of the "root universe" U0 that contains the "global"
names in the program. It's easiest to explain in terms of types: the
global names might be the structs from the program. When we want to
reason about a "forall" constraint, though, we do that by introducing
a fresh, placeholder name that is distinct from all other names in the
program. To do this, we create a new universe U1 that is a child of
the root univese -- the idea is that U1 contains all the global names from
U0, but also this additional placeholder name.
Next, we associate each inference variable with a universea. The
inference variable is only allowed to be inferred to values that
contain names from that universe.
So, as a simple example, imagine that we have a constraint like the
following (this is not a region constraint, just a general logic
equation):
forall<A> { exists<B> { A = B } }

Here we are saying, for all types A, there exists some type B where A
= B. Well, that's clearly true, we can just infer B to be A. In terms
of universes, we would say that the name A is bound to a placeholder
in U1, and the inference variable B is created in that same universe
U1.
But if we reverse the order of "forall" and "exists", we want to get an error:
exists<B> { forall<A> { A = B } }

In this case, the variable B would wind up in the root universe U0,
but the placeholder A is still in the child universe U1. Therefore,
the value for B cannot be A, because A is not nameable from U0.
A richer notion of region constraints

So now we have this problem. We have this richer notion of
constraints, that include forall and exists quantifiers, and we need
to map this to polonius constraints, which are just sets of subset
edges. In fact, I think we actually want a somewhat richer notion of
constraints, one that includes implication as well (I'll use the Chalk
notation of "if" for that):
Constraint = Subset
           | Constraint, Constraint
           | forall<R1> { Constraint }
           | exists<R1> { Constraint }
           | if (Constraint) { Constraint }

Subset = R1: R2  

This maps pretty closely to the notion of hereditary harrop
clauses from Lambda Prolog.
Including implication is somewhat interesting. It implies we might have
a constraint like this, for example:
exists<'a> {
  forall<'b> {
    if ('a: 'b) {
      'a: 'b
    }
  }
}

Here, even though the variable 'a is in a universe which cannot name
'b, we still know that 'a: 'b is true (because we assumed it), and
so 'a: 'b is provable.
Let's look at another example:
exists<'a> {
  forall<'b> {
    if ('static: 'b) {
      'a: 'b
    }
  }
}

Here, 'a cannot name 'b, but it can name 'static. And so for
'a to outlive
As I wrote, rustc adopted a universe-based approach in #55517.a
`'static, bbu
"forall" things by introducing a fresh univese
it
encodes a set of loans. Each region (like 'a) is assigned to a
specific universe and that encodes the loans that it can contain.
Regions are structured into a tree, where a child universe U2
extends its parent U1 with additional loans that are not included
in U1. In other words, the child U2 is a strict superset of the
parent U1.
For our purposes, we will always have a root universe that
contains all the loans that appear in the current function being
checked. We'll create new universes to reason about higher-ranked
functions and trait matching. Just as in rustc, we can represent
universes as a simple integer. We call the root universe U0, and thus
a child universe might be U1 -- because U0 < U1, we know that U1 is a
child of U0.
So, let's go back to that example of
fn(&'a u32) <: for<'b> fn(&'b u32)

If you recall, that generated a constraint forall<'b> { 'b: 'a }.
We might encode that by saying that the region variable 'a is in the
universe U0, but the region variable 'b is in a child universe U1.
The idea is that because 'b corresponds to "any set of loans", we
model that as potentially containing some (as yet unknown) loans that
couldn't be named by 'a.

  
## subregion-contains.md

      
    Raw
  

              subregion-contains.md
            
          
    Now that NLL has been shipped, I've been doing some work revisiting
the Polonius project. Polonius is the project that implements
[the "alias-based formulation" described in my older
blogpost][pp]. Polonius has come a long way since that post; it's now
quite fast and also experimentally integrated into rustc, where it
passes the full test suite.
[pp]: {{ site.baseurl }}/blog/2018/04/27/an-alias-based-formulation-of-the-borrow-checker/
However, polonius as described is not complete. It describes the core
"borrow check" analysis, but there are a number of other checks that
the current implementation checks which polonius ignores:

Polonius does not account for moves and initialization.
Polonius does not check for relations between named lifetimes.

This blog post is focused on the second of those bullet points. It
covers the simple cases; hopefully I will soon post a follow-up that
targets some of the more complex cases that can arise.
Brief Polonius review

If you've never read the [the original Polonius post][pp], you should probably
do so now. But if you have, let me briefly review some of the key details
that are relevant to this post:

Instead of interpreting the 'a notation as the lifetime of a
reference (i.e., a set of points), we interpret 'a as a set of
loans. We refer to 'a as a "region"¹ in order to
emphasize this distinction.
We call 'a: 'b a subset relation; it means that the loans in
'a must be a subset of the loans in 'b. We track the required
subset relations at each point in the program.
A loan comes from some borrow expression like &foo. A loan L0
is "live" if some live variable contains a region 'a whose value
includes L0. When a loan is live, the "terms of the loan" must be
respected: for a shared borrow like &foo, that means the path that
was borrowed (foo) cannot be mutated. For a mutable borrow, it
means that the path that was borrowed cannot be accessed at all.

If an access occurs that violates the terms of a loan, that is an
error.


Running Example 1

Let's give a quick example of some code that should result in an
error, but which would not if we only considered the errors that
polonius reports today:
fn foo<'a, 'b>(x: &'a [u32], y: &'b [u32]) -> &'a u32 {
    &y[0]
}
Here, we declared that we are returning a &u32 with lifetime 'a
(i.e., borrowed from x) but in fact we are returning data with
lifetime 'b (i.e., borrowed from y).
Slightly simplified, the MIR for this function looks something like
this:
fn foo(_1: &'a [u32], _2: &'b [u32]) -> &'a [u32] {
  _0 = &'X (*_2)[const 0usize]; // S0
  return;                       // S1
}  

As you can see, there's only really one interesting statement; it
borrows from _2 and stores the result into _0, which is the
special "return slot" in MIR.
In the case of the parameters _1 and _2, the regions come directly
from the method signature. For regions appearing in the function body,
we create fresh region variables -- in this case, only one, 'X. 'X
represents the region assigned to the borrow.
The relevant polonius facts for this function are as follows:

base_subset('b, 'X, mid(S0)) -- as described in the NLL
RFC, "re-borrowing" the referent of a reference (i.e.,
*_2) creates a subset relation between the region of the region
(here, 'b) and the region of the borrow (here, 'X). Written in
the notation of the [NLL RFC], this would be the relation 'X: 'b @ mid(S0).
base_subset('X, 'a, mid(S0)) -- the borrow expression in S0
produces a result of type &'X u32. This is then assigned to _0,
which has the type &'a [u32].  The subtyping rules
require that 'X: 'a.

Combining the two base_subset relations allows us to conclude that
the full subset relation includes subset('b, 'a, mid(S0)) -- that
is, for the function to be valid, the region 'b must be a subset of
the region 'a. This is an error because the regions 'a and 'b
are actually parameters to foo; in other words, foo must be valid
for any set of regions 'a and 'b, and hence we cannot know if
there is a subset relationship between them. This is a different
sort of error than the "illegal access" errors that Polonius reported
in the past: there is no access at all, in fact, simply subset
relations.
Placeholder regions and loans

There is an important distinction between named regions like 'a and
'b and the region 'X we created for a borrow. The definition of
foo has to be true for all regions 'a and 'b, but for a
region like 'X there only has to be some valid value. This
difference is often called being universally quantified (true for
all regions) versus existentially quantified (true for some
region).
In this post, I will call universally quantified regions like 'a and
'b "placeholder" regions. This is because they don't really
represent a known quantity of loans, but rather a kind of
"placeholder" for some unknown set of loans.
Next, we'll introduce the idea of a placeholder loan. Whereas
previously all the Polonius loans represented some restrictions
created by borrow expressions that appeared within the source
function, placeholder loans represent restrictions created by borrow
expressions we can't see. There is one placeholder loan created for
each of the placeholder regions.
To represent these placeholders, we'll add a relation placeholder_region
that relates the region and its corresponding loan:
.decl placeholder_region(R: region, L: loan)
.input placeholder_region

This fact is true for any placeholder region. So in our example we might have
placeholder_region('a, loan_a).
placeholder_region('b, loan_b).

Here loan_a and loan_b represent the placeholder loans for 'a and 'b
respectively.
(This placeholder_region is similar but different to an existing
relation in the polonius implementation, called
universal_region. The main difference is that we now include the
placeholder loan -- these loans will be particularly useful in the
next post, when we go beyond named parameters like 'a or 'b to
talk about the higher-ranked regions that appear in types like
for<'a> fn(&'a u32).)
Renaming the "requires" relation from Polonius

The [original polonius formulation][pp] included a relation requires:
.decl requires(R:region, L:loan, P:point)

which can be read as "the region R requires the terms of the loan L to
be enforced at the point P".
In retrospect, though, I think I should have called this relation
contains, as it basically indicates set membership -- that is, it
indicates that -- at the point P -- the loan L is a member of
the region R (L ∈ R). So I'm going to retroactively rename it to
contains:
.decl contains(R:region, L:loan, P:point)

Extending the "contains" relation with placeholder

We can extend this to include placeholders like so:
contains(R, L, P) :- placeholder_region(R, L).

Note that P here from the header is actually unbound in the
body. This is not in fact valid datalog. We can make it valid datalog
by having an input relation for CFG nodes:
.decl cfg_node(P: point)

Then we can rewrite our requires rule as to use cfg_node:
contains(R, L, P) :- placeholder_region(R, L), cfg_node(P).

If we go back to our running example function, we can derive the following
clauses:
contains('a, loan_a, mid(S0))
contains('b, loan_b, mid(S0))

If you recall, we were also able to derive a subset relation between 'b and 'a:
subset('b, 'a, mid(S0))

Using the rules from the [previous polonius post][pp], we can combine
the subset and contains relations to conclude that loan_b is a
member of 'a (as 'a is a superset of 'b):
contains('a, loan_b, mid(S0))

And this precisely brings us to the point of our error: based on the
signature of our function, we do not expect 'a to have loan_b as a
member. But how do we express that in datalog?
Representing known relations

In practice, the contents of placeholder regions are not always
totally unknown. The function signature will often include where
clauses (or implied bounds) that indicate some known relationships
between placeholder regions. For example, if our example foo had
included a where clause like where 'b: 'a, then there would be no
error.
We can represent the known relationships using an input:
.decl known_subset(R1: region, R2: region)
.input known_subset

Then we can define a known_contains rules to use these "known
subset" relations to figure out which placeholder regions are known to
contain which loans:
.decl known_contains(R: region, L: region)

known_contains(R, L) :- placeholder_region(R, L).
known_contains(R2, L) :- known_contains(R1, L), known_subset(R1, R2).

In our example of foo, there are no where clauses nor implied
bounds, so these relations are empty. If there were a where clause
like where 'b: 'a, however, then we would have a known_subset('b, 'a) fact, which would allow us to conclude the following
known_contains relation:
known_contains('a, loan_a) // from `placeholder_region`
known_contains('b, loan_b) // from `placeholder_region`
known_contains('a, loan_b) // derived from `known_subset`

Detecting illegal subset relations

We can now extend the polonius rules to report errors for cases like
our running example. The basic idea is this: if the function requires
a subset relationship 'r1: 'r2 between two placeholder regions 'r1
and 'r2, then it must be a "known subset", or else we have an error.
We can encode this like so:
.decl placeholder_error(R1: region, R2: region, P:point)

placeholder_error(R1, R2, P) :-
  contains(R1, L2, P),        // `L1` is an element of `R1`
  placeholder_region(R1, _),  // `R1` is a placeholder
  placeholder_region(R2, L2), // `L2` is the placeholder loan for `R2`
  !known_contains(R1, L2).    // `R1` is not known to contain `L2`.

In our example program, we can clearly derive placeholder_error('a, 'b, mid(S0)),
and hence we have an error:

we saw earlier that contains('a, loan_b, mid(S0)) holds
as 'a is a placeholder region, placeholder_region('a, _) will
appear in the input
similarly, placeholder_region('b, loan_b) appears in the input,
giving us the connection to 'b
finally, the known_contains relation in our example is empty
(since known_subset is empty), so !known_contains('a, loan_b) is
true.

Sidenote on negative reasoning and stratification. This rule makes
use of negative reasoning in the form of the !known_contains(R1, R2)
predicate. Negative reasoning is fine in datalog so long as the
program is "stratified" -- in particular, we must be able to compute
the entire known_contains relation without having to compute
subset_error. In this case, the program is trivialy stratified --
known_contains depends only on the input relation
known_base_subset.)
Conclusion

This post describes a simple extension to the polonius rules that
covers errors arising from subset relations. Unlike the prior rules,
these errors are not triggered by any "access", but rather simply the
creation of a (transitive) subset relation between two placeholder
regions.
Unfortunately, this is not the complete story around region checking
errors. In particular, this post ignored subset relations that can
arise from "higher-ranked" types like for<'a> fn(&'a u32). Handling
these properly requires us to introduce a bit more logic and will be
covered in a follow-up.
Comments, if any, should be posted in the internals thread dedicated
to my previous polonius post
Footnotes

Footnotes


The term "region" is not an especially good fit, but it's common in academia. ↩


## subregion1.md

      
    Raw
  

              subregion1.md
            
          
    Now that NLL has been shipped, I've been doing some work revisiting
the Polonius project. Polonius is the project that implements
[the "alias-based formulation" described in my older
blogpost][pp]. Polonius has come a long way since that post; it's now
quite fast and also experimentally integrated into rustc, where it
passes the full test suite.
[pp]: {{ site.baseurl }}/blog/2018/04/27/an-alias-based-formulation-of-the-borrow-checker/
However, polonius as described is not complete. It describes the core
"borrow check" analysis, but there are a number of other checks that
the current implementation checks which polonius ignores:

Polonius does not account for moves and initialization.
Polonius does not check for relations between named lifetimes.

This blog post is focused on the second of those bullet points. It
covers the simple cases; hopefully I will soon post a follow-up that
targets some of the more complex cases that can arise (specifically,
dealing with higher-ranked things).
Brief Polonius review

If you've never read the [the original Polonius post][pp], you should probably
do so now. But if you have, let me briefly review some of the key details
that are relevant to this post:

Instead of interpreting the 'a notation as the lifetime of a
reference (i.e., a set of points), we interpret 'a as a set of
loans. We refer to 'a as a "region"¹ in order to
emphasize this distinction.
We call 'a: 'b a subset relation; it means that the loans in
'a must be a subset of the loans in 'b. We track the required
subset relations at each point in the program.
A loan comes from some borrow expression like &foo. A loan L0
is "live" if some live variable contains a region 'a whose value
includes L0. When a loan is live, the "terms of the loan" must be
respected: for a shared borrow like &foo, that means the path that
was borrowed (foo) cannot be mutated. For a mutable borrow, it
means that the path that was borrowed cannot be accessed at all.

If an access occurs that violates the terms of a loan, that is an
error.


Running Example 1

Let's give a quick example of some code that should result in an
error, but which would not if we only considered the errors that
polonius reports today:
fn foo<'a, 'b>(x: &'a [u32], y: &'b [u32]) -> &'a u32 {
    &y[0]
}
Here, we declared that we are returning a &u32 with lifetime 'a
(i.e., borrowed from x) but in fact we are returning data with
lifetime 'b (i.e., borrowed from y).
Slightly simplified, the MIR for this function looks something like
this.
fn foo(_1: &'a [u32], _2: &'b [u32]) -> &'a [u32] {
  _0 = &'X (*_2)[const 0usize]; // S0
  return;                       // S1
}  

As you can see, there's only really one interesting statement; it
borrows from _2 and stores the result into _0, which is the
special "return slot" in MIR.
In the case of the parameters _1 and _2, the regions come directly
from the method signature. For regions appearing in the function body,
we create fresh region variables -- in this case, only one, 'X. 'X
represents the region assigned to the borrow.
The relevant polonius facts for this function are as follows:

base_subset('b, 'X, mid(S0)) -- as described in the NLL
RFC, "re-borrowing" the referent of a reference (i.e.,
*_2) creates a subset relation between the region of the region
(here, 'b) and the region of the borrow (here, 'X). Written in
the notation of the [NLL RFC], this would be the relation 'X: 'b @ mid(S0).
base_subset('X, 'a, mid(S0)) -- the borrow expression in S0
produces a result of type &'X u32. This is then assigned to _0,
which has the type &'a [u32].  The subtyping rules
require that 'X: 'a.

Combining the two base_subset relations allows us to conclude that
the full subset relation includes subset('b, 'a, mid(S0)) -- that
is, for the function to be valid, the region 'b must be a subset of
the region 'a. This is an error because the regions 'a and 'b
are actually parameters to foo; in other words, foo must be valid
for any set of regions 'a and 'b, and hence we cannot know if
there is a subset relationship between them. This is a different
sort of error than the "illegal access" errors that Polonius reported
in the past: there is no access at all, in fact, simply subset
relations.
Placeholder regions

There is an important distinction between named regions like 'a and
'b and the region 'X we created for a borrow. The definition of
foo has to be true for all regions 'a and 'b, but for a
region like 'X there only has to be some valid value. This
difference is often called being universally quantified (true for
all regions) versus existentially quantified (true for some
region).
In this post, I will call universally quantified regions like 'a and
'b "placeholder" regions. This is because they don't really
represent a known quantity of loans, but rather a kind of
"placeholder" for some unknown set of loans.
We will include a base fact that helps us to identify placeholder regions:
.decl placeholder_region(R1: region)
.input placeholder_region

This fact is true for any placeholder region. So in our example we might have
placeholder_region('a).
placeholder_region('b).

Note that the actual polonius impl already includes a relation like
this², because we need to account for the fact that
placeholder regions are "live" at all points in the control-flow
graph, as we always assume there may be future uses of them that we
cannot see.
Representing known relations

Even placeholder regions are not totally unknown though. The
function signature will often include where clauses (or implied
bounds) that indicate some known relationships between placeholder
regions. For example, if foo included a where clause like where 'b: 'a, then it would be perfectly legal.
We can represent the known relationships using an input:
.decl known_base_subset(R1: region, R2: region)
.input known_base_subset

Naturally these known relations are transitive, so we can define a
known_subset rule to encode that:
.decl known_subset(R1: region, R2: region)

known_subset(R1, R2) :- known_base_subset(R1, R2).
known_subset(R1, R3) :- known_base_subset(R1, R2), known_subset(R2, R3).

In our example of foo, there are no where clauses nor implied
bounds, so these relations are empty. If there were a where clause
like where 'b: 'a, however, then we would have a
known_base_subset('b, 'a) fact. Similarly, per out implied bounds
rules, such an input fact might be derived from an argument with a
type like &'a &'b u32, where there are 'nested' regions.
Detecting illegal subset relations

We can now extend the polonius rules to report errors for cases like
our running example. The basic idea is this: if the function requires
a subset relationship 'r1: 'r2 between two placeholder regions 'r1
and 'r2, then it must be a "known subset", or else we have an error.
We can encode this like so:
.decl subset_error(R1: region, R2: region, P:point)

subset_error(R1, R2, P) :-
  subset(R1, R2, P),      // `R1: R2` required at `P`
  placeholder_region(R1), // `R1` is a placeholder
  placeholder_region(R2), // `R2` is also a placeholder
  !known_subset(R1, R2).  // `R1: R2` is not a "known subset" relation.

In our example program, we can clearly derive subset_error('b, 'a, mid(S0)),
and hence we have an error:

we saw earlier that subset('a, 'b, mid(S0)) holds
as 'a is a placeholder region, placeholder_region('a) will
appear in the input (same for 'b)
finally, the known_base_subset (and hence known_subset) relation
in our example is empty

Sidenote on negative reasoning and stratification. This rule makes
use of negative reasoning in the form of the !known_subset(R1, R2)
predicate. Negative reasoning is fine in datalog so long as the
program is "stratified" -- in particular, we must be able to compute
the entire known_subset relation without having to compute
subset_error. In this case, the program is trivialy stratified --
known_subset depends only on the input relation
known_base_subset.)
Observation about borrowing local data

It is interesting to walk through a different example. This is another
case where we expect an error, but in this case the error arises
because we are returning a reference to the stack:
fn bar<'a>(x: &'a [u32]) -> &'a u32 {
    let stack_slot = x[0];
    &stack_slot
}
Polonius will report an error for this case, but not because of the
mechanisms in this blog post. What happens instead is that we create a
loan for the borrow expression &stack_slot, we'll call it L0. When
the borrow is returned, this loan L0 winds up being a member of the
'a region.  It is therefore "live" when the storage for stack_slot
is popped from the stack, which is an error: you can't pop the storage
for a stack slot where there are live loans that have reference it.
Conclusion

This post describes a simple extension to the polonius rules that
covers errors arising from subset relations. Unlike the prior rules,
these errors are not triggered by any "access", but rather simply the
creation of a (transitive) subset relation between two placeholder
regions.
Unfortunately, this is not the complete story around region checking
errors. In particular, this post ignored subset relations that can
arise from "higher-ranked" types like for<'a> fn(&'a u32). Handling
these properly requires us to introduce a bit more logic and will be
covered in a follow-up.
Comments, if any, should be posted in the internals thread dedicated to my previous
polonius post
Appendix: A (potentially) more efficient formulation

The subset_error formulation above relied on the transitive subset
relation to work, because we wanted to report errors any time that one
placeholder wound up being forced to be a subset of another. In the
more optimized polonius implementations, we don't compute the full
transitive relation, so it might be useful to create a new relation
subset_placeholder that is specific to placeholder regions:
.decl subset_placeholder(R1: region, R2: region, P:point)

The idea is that subset_placeholder(R1, R2, P) means that, at the
point P, we know that R1: R2 must hold, where R1 is a placeholder.
You can express this via a "base" rule:
subset_placeholder(R1, R2, P) :-
  subset(R1, R2, P),      // `R1: R2` required at `P`
  placeholder_region(R1). // `R1` is a placeholder

and a transitive rule:
subset_placeholder(R1, R3, P) :-
  subset_placeholder(R1, R2, P), // `R1: R2` at P where `R1` is a placeholder
  subset(R2, R3, P).      // `R2: R3` required at `P`

Then we reformulate the subset_error rule to be based on subset_placeholder:
.decl subset_error(R1: region, R2: region, P:point)

subset_error(R1, R2, P) :-
  subset_placeholder(R1, R2, P), // `R1: R2` required at `P`
  placeholder_region(R2), // `R2` is also a placeholder
  !known_subset(R1, R2).  // `R1: R2` is not a "known subset" relation.

Footnotes

Footnotes


The term "region" is not an especially good fit, but it's common in academia. ↩


Currently called universal_region, though I plan to rename it. ↩


## subregion2.md

      
    Raw
  

              subregion2.md
            
          
    Following up on my previous post, let's take a look at checking
subregion relationships with higher-ranked types.
The subset relation in Polonius is not enough

In my original post on Polonius, I assumed that when we computed
a subtype relation T1 <: T2 between two types, the result was either
a hard error or a set of subset relations between various regions.
So, for example, if we had a subtype relation between two references:
&'a u32 <: &'b u32

the result would be a subset relation 'a: 'b (or, "'a contains a
subset of the loans in 'b").
For a more complex case, consider the relationship of two fn types:
fn(&'a u32) <: fn(&'b u32)
// ^^^^^^^     ^^^^^^^^^^
// |           A fn expecting a `&'b u32` as argument.
// |
// A fn expecting a `&'a u32` as argument.

If we imagine that we have some variable f of type fn(&'a u32) --
that is, a fn that can be called with a 'a reference -- then this
subtype relation is saying that f can be given the type fn(&'b u32) -- that is, a fn that can be called with a 'b reference.  That
is fine so long as that 'b reference can be used as a 'a
reference: that is, &'b u32 <: &'a u32. So, we can say that the two
fn types are subtypes so long as 'b: 'a (note that the order is
reversed from the first example; this is because fn types are
contravariant in their argument type).
Unfortunately, this structure isn't flexible enough to accommodate a
subtyping question involving higher-ranked types. Consider a subtype
relation like this:
fn(&'a u32) <: for<'b> fn(&'b u32)
//             ^^^^^^^
//             Unlike before, the supertype
//             expects a reference with *any*
//             lifetime as argument.

What subtype relation should come from this? We can't say 'b: 'a as
before, because the lifetime 'b isn't some specific region --
rather, the supertype says that the function has to accept a reference
with any lifetime 'b. (In fact, this subtyping relation should
ultimately yield an error -- but we would like that error to arise not
during subtyping, but rather later, when )
To express a constraint like this, we need a richer set of constraints
than just subset. In fact, what rustc implements is something more
like this:
Constraint = Subset
           | Constraint, Constraint
           | forall<R1> { Constraint }
           | exists<R1> { Constraint }

Subset = R1: R2  

Now we can say that
fn(&'a u32) <: for<'b> fn(&'b u32)

holds if the constraint forall<'b> { 'b: 'a } holds, which implies
that 'a has to contain all possible loans. (This would of course be
possible if 'a were an "infinite set" covering the whole program or
some such thing, but we can treat it as an error.)
Encoding these constraints with universes

OK, so we want this richer family of constraints, but how can we
encode those sorts of constraints into the Polonius subset relation?
Actually, the way we check these things in rustc has been changing
too. We've recently (in
#55517) adopted a
system based on the concept of universes, which is something I
first learned about when studying up on how Lambda Prolog is
implemented. If you're interested in some background material, I gave
a talk at a recent Papers We Love meetup where I covered the
paper that explains how it works. I also wrote a section
in the rustc-guide on the topic some time back.
The basic idea of a universe, in this context at least, is that it
encodes a set of loans. Each region (like 'a) is assigned to a
specific universe and that encodes the loans that it can contain.
Regions are structured into a tree, where a child universe U2
extends its parent U1 with additional loans that are not included
in U1. In other words, the child U2 is a strict superset of the
parent U1.
For our purposes, we will always have a root universe that
contains all the loans that appear in the current function being
checked. We'll create new universes to reason about higher-ranked
functions and trait matching. Just as in rustc, we can represent
universes as a simple integer. We call the root universe U0, and thus
a child universe might be U1 -- because U0 < U1, we know that U1 is a
child of U0.
So, let's go back to that example of
fn(&'a u32) <: for<'b> fn(&'b u32)

If you recall, that generated a constraint forall<'b> { 'b: 'a }.
We might encode that by saying that the region variable 'a is in the
universe U0, but the region variable 'b is in a child universe U1.
The idea is that because 'b corresponds to "any set of loans", we
model that as potentially containing some (as yet unknown) loans that
couldn't be named by 'a.
To encode these relationships into Polonius facts, I propose to extend
the subset relation. Whereas before we had a subset(R1, R2, P)
relation, we will now include a universe:
.decl subset_base(R1: region, U: universe, R2: region, P:point)
.input subset_base
The meaning is something like this:

For every loan L in R1,

there must be some loan L' in R2, where

L' is nameable in U, and
L is a subset of L'.


For subset_base, this universe will always be the universe
associated with the "superset" region -- so to encode forall<'b> { 'b: 'a }, we would use the universe U0, since 'a is in U0:
subset_base('b, U0, 'a, P)

We can carry the universe into the full subset relation unchanged:
subset(R1, U, R2, P) :-
  subset_base(R1, U, R2, P).

And hence we get the relation subset('b, U0, 'a, P).
Transitive rule with universes

The transitive subset rule is where things get interesting. The idea
here is that we can combine two subset relations, but we need to
also combine their universes by taking the "least upper bound" (which
is just the minimum of the two indices in practice):
subset(R1, U3, R3, P) :-
  subset(R1, U1, R2, P),
  subset(R2, U2, R3, P),
  least_upper_bound(U1, U2, U3).

Let's work through an example to show how this works. Imagine
that we have a region constraint like this:
exists<R1> {
  forall<R2> { R2: R1 }
  forall<R3> { R1: R3 }
}

We will see that this constraint is unsolveable. No single region R1
can be both a subset and a superset of all other
regions. (Nonetheless, we can still construct a transitive subset
relation using the rules above.)
Let's say that the universe of R1 is U1 and the universes for R2
and R3 are U2 and U3 respectively. These universes form a tree,
where U2 and U3 are siblings and U1 is their parent:

U1

U2
U3


We can translate this to two subset_base facts (here I am ignoring the position,
which doesn't play a role):
subset_base(R2, U1, R1, P)
subset_base(R1, U3, R3, P)

Remember that we always use the univese of the "superregion" -- in the
first relation, R2: R1, the superregion is R1 so we use U1. In
the second, we use the universe U3 for the region R3.
Now we can create a transitive subset relation, but we have to
combine U1 and U3 -- the minimum of which is U1. The resulting
subset relation is then:
subset(R2, U1, R3, P)

Accommodating approximation

subset(R1b, E, R2, P) :-
  subset(R1a, E, R2, P),
  placeholder_region(R1),
  approximate(R1a, E1, R1b).

This is a key point. If we are going to approximate, do we need to
generate a true subset relation? It's awfully unfortunate, going to
complicate optimization. My key fear is that we establish some subset
that can force a loan to be extended. Trying to come up with some
realistic code where that is the case. Trait matching is an obvious
candidate, but it's tricky, because it is invariant. But I feel like
there must be a way to "force" a region to 'static.
Well, there is a semi-obvious one:
fn foo<'a>(x: &'a u32)
where for<'b> 'a: 'b { }

fn main() {
  let data: u32 = 22;
  let p: &'1 u32 = &'0 data;
  foo::<'2>(data); // requires:
  // &'1 u32 <: &'2 u32
  // '1: '2
  // forall<'3> { '2: '3 }
}

Hmm but this isn't the best example. What happens here is that we
get an error because we push the loan of data into '3, and that
fails the "known contains" check (this assumes that we don't generate
loans of static data). Or... alternatively... we get an error because
'3 (as a placeholder..) is live, and that makes the loan live until
the slot is freed?
(What are the liveness rules around subuniverse placeholders? I
assume they are not live anywhere, actually.)
Maybe if we add if?
fn foo<'a>(x: &'a u32)
where for<'b> where ('a: 'b) { 'a: 'b } { }

fn main() {
  let data: u32 = 22;
  let p: &'1 u32 = &'0 data;
  foo::<'2>(data); // requires:
  // &'1 u32 <: &'2 u32
  // '1: '2
  // forall<'3> { '2: '3 }
}

Errors

placeholder_error(R1, E, R2, P) :-
  subset(R1, E, R2, P),
  placeholder_region(R1),
  inexpressible(R1, E1).