In my previous post about Polonius and subregion obligations, I mentioned that there needs to be a follow-up to deal with higher-ranked subregions. This post doesn't really provide a solution. Instead, it just kinds of digs a bit into what the problem is in the first place.
In my original post on Polonius, I assumed that when we computed
a subtype relation T1 <: T2
between two types, the result was either
a hard error or a set of subset
relations between various regions.
So, for example, if we had a subtype relation between two references:
&'a u32 <: &'b u32
the result would be a subset
relation 'a: 'b
(or, "'a
contains a
subset of the loans in 'b
").
For a more complex case, consider the relationship of two fn types:
fn(&'a u32) <: fn(&'b u32)
// ^^^^^^^ ^^^^^^^^^^
// | A fn expecting a `&'b u32` as argument.
// |
// A fn expecting a `&'a u32` as argument.
If we imagine that we have some variable f
of type fn(&'a u32)
--
that is, a fn that can be called with a 'a
reference -- then this
subtype relation is saying that f
can be given the type fn(&'b u32)
-- that is, a fn that can be called with a 'b
reference. That
is fine so long as that 'b
reference can be used as a 'a
reference: that is, &'b u32 <: &'a u32
. So, we can say that the two
fn types are subtypes so long as 'b: 'a
(note that the order is
reversed from the first example; this is because fn types are
contravariant in their argument type).
Unfortunately, this structure isn't flexible enough to accommodate a subtyping question involving higher-ranked types. Consider a subtype relation like this:
fn(&'a u32) <: for<'b> fn(&'b u32)
// ^^^^^^^
// Unlike before, the supertype
// expects a reference with *any*
// lifetime as argument.
What subtype relation should come from this? We can't say 'b: 'a
as
before, because the lifetime 'b
isn't some specific region --
rather, the supertype says that the function has to accept a reference
with any lifetime 'b
. (In fact, this subtyping relation should
ultimately yield an error -- but we would like that error to arise not
during subtyping, but rather later, when )
To express a constraint like this, we need a richer set of constraints than just subset. In fact, what rustc implements is something more like this:
Constraint = Subset
| Constraint, Constraint
| forall<R1> { Constraint }
| exists<R1> { Constraint }
Subset = R1: R2
Now we can say that
fn(&'a u32) <: for<'b> fn(&'b u32)
holds if the constraint forall<'b> { 'b: 'a }
holds, which implies
that 'a
has to contain all possible loans. (This would of course be
possible if 'a
were an "infinite set" covering the whole program or
some such thing, but we can treat it as an error.)
In fact, rustc has recently changed how it handles type checks that
involve forall
-style constraints. In PR
#55517, we adopted a
system based on the concept of universes, which is something I
first learned about when studying up on how Lambda Prolog is
implemented. If you're interested in some background material, I gave
a talk at a recent Papers We Love meetup where I covered the
paper that explains how it works. I also wrote a section
in the rustc-guide on the topic some time back.
The basic idea of a universe is that it represents a set of names. There is a notion of the "root universe" U0 that contains the "global" names in the program. It's easiest to explain in terms of types: the global names might be the structs from the program. When we want to reason about a "forall" constraint, though, we do that by introducing a fresh, placeholder name that is distinct from all other names in the program. To do this, we create a new universe U1 that is a child of the root univese -- the idea is that U1 contains all the global names from U0, but also this additional placeholder name.
Next, we associate each inference variable with a universea. The inference variable is only allowed to be inferred to values that contain names from that universe.
So, as a simple example, imagine that we have a constraint like the following (this is not a region constraint, just a general logic equation):
forall<A> { exists<B> { A = B } }
Here we are saying, for all types A, there exists some type B where A = B. Well, that's clearly true, we can just infer B to be A. In terms of universes, we would say that the name A is bound to a placeholder in U1, and the inference variable B is created in that same universe U1.
But if we reverse the order of "forall" and "exists", we want to get an error:
exists<B> { forall<A> { A = B } }
In this case, the variable B would wind up in the root universe U0, but the placeholder A is still in the child universe U1. Therefore, the value for B cannot be A, because A is not nameable from U0.
So now we have this problem. To encode the "solutions" to higher-ranked subtyping and trait matching, we need to use this richer notion of constraints that include forall and exists quantifiers. In fact, I think we will eventually want something even stronger, that includes implication and maybe even "or", so we might as well "scale up" to hereditary harrop constraints (also a notion I learned about through Lambda Prolog):
Constraint = Subset
| Constraint, Constraint // and
| Constraint; Constraint // or
| forall<R1> { Constraint }
| exists<R1> { Constraint }
| if (Assumption) { Constraint }
Assumption = Subset
| forall<R1> { Assumption }
| if (Constraint) { Assumption }
Subset = R1: R2
But now we have to map these complex constraints down to the simple subset relation that Polonius uses. I'll talk about how to do that in the next post.
Super nice and enjoyable to read!
A comment: if I understand correctly, it seems we'd have a
subset_error
for each point wherer1: r2
is required ? (I thought we'd be having only one error per "missing"r1: r2
subset (A.K.A!known_subset(r1, r2)
) ifr1: r2
is required at any point, à lasubset_error(R1, R2)
)(there's a tiny typo: "per out implied bounds rules")