nikomatsakis/dyn-trait-coherence.md Secret

## dyn-trait-coherence.md

      
    Raw
  

              dyn-trait-coherence.md
            
          
    Coherence can be bypassed by an indirect impl for a trait object #57893

Links


#57893

Background

dyn Trait represents "some type" that implements the trait

The dyn Trait type in Rust represents "some type T that implements Trait". Every dyn Trait value comes associated with a vtable, generated by the compiler, that contains information about what this type T is, and in particular pointers to the definitions of the various methods in Trait as defined for type T. As we all know, if you have a value like x: dyn Debug, then invoking a method like x.fmt(...) will use a virtual call, where the method is obtained by fetching a function pointer out of the vtable.
Rust encapsulates virtual tables in a special impl

Interestingly, at the MIR level, Rust does not have a concept of "virtual calls" today. Instead, those calls are encoded as ordinary trait calls, where the "self type" is dyn Debug. So the x.fmt(..) call above is equivalent to <dyn Debug as Debug>::fmt(&x).
In general, whenever we generate the code for a <T as Trait>::method(..) call, we resolve <T as Trait>::method to a specific function by finding the applicable impl for T and then finding the method definition within that impl.
The same is conceptually true for <dyn Debug as Debug>::fmt, except that the impl in this case is generated by the compiler. It contains code that conceptually looks something like this:
impl Debug for dyn Debug {
  fn fmt(
    &self, 
    formatter: &mut Formatter<'_>,
  ) -> std::fmt::Result<()> {
    let (data_pointer, vtable_pointer) = /* split fat pointer `self` */;
    let fmt_function: fn(..) = vtable_pointer[0]; // where 0 is index of `fmt`, say
    fmt_function(data_pointer, formatter)
  }
}
Here, the self: &dyn Debug reference will be a "fat pointer" that bundles together a (thin) "data pointer" and a (thin) "vtable pointer". The first line conceptually splits the fat pointer into the two thin pointers. The next line then extracts the method from the vtable and the third line invokes it.
The bug: Users can prove impls that overlap with the compiler-generated impl

The bug arises now: it turns out that users can provide impls that overlap with the compiler-generated impl. The compiler prevents you from doing this in simple cases, but the check is insufficient. What's worse, we rely on our behavior for the Any trait.
Consider the Any trait definition
trait Any {
  fn type_id(&self) -> TypeId;
}
The type_id method is meant to give a unique type that identifies the self type. Moreover, if you have a x: &dyn Any, then x.type_id() is meant to give the type-id of the underlying type. That is, you don't want the type-id for the type dyn Any, but rather for the dynamic type that was hidden.
You are supposed to be able to get the type_id of any type (well, any 'static type) without any form of opt-in. This is enabled by the one impl of Any, which is as follows:
impl<T> Any for T where
    T: 'static + ?Sized, 
{
  fn type_id(&self) -> TypeId {
    TypeId::of::<T>()
  }
}
Now consider what happens when you invoke <dyn Any as Any>::type_id -- which impl do we use? If we were to use the user-supplied impl, you would get back the type-id of the dyn Any type, which is unlikely to be what you wanted.
The compiler resolves the ambiguity by choosing the dyn Any impl, and we rely on this

We've already discussed how invoking <dyn Any as Any>::type_id will result in virtual dispatch, which is the behavior we want. But why does that happen?
This is in fact not an obvious question: in general, the trait solver will error out on ambiguity. In this case, it does indeed find that there are two possible impls it could use -- the user-given one or the auto-generated one. However, there is some logic in the trait solver that arbitrarily chooses the compiler-generated dyn  impl in cases like this. Therefore, invoking <dyn Any as Any>::type_id ultimately uses virtual dispatch.
Observe the implications here: The design of the Any trait is effectively relying on this current behavior. If we were to correctly report ambiguity, the Any trait would not work at all.
We can rationalize the current behavior as a form of specialization

As it happens, we can rationalize the current behavior as a form of specialization. Imagine that we had the following impls (and no compiler-generated impls):
trait Any { }

impl<T: ?Sized + 'static> Any for T {
  // Note the `default` keyword here, not present in the original code
  default fn type_id(&self) -> TypeId { .. }  
}

impl Any for dyn Any {
  fn type_id(&self) -> TypeId { /* use virtual dispatch*/ }
}
This would effectively behave the same as the compiler's current code. This is quite reasonable.
Enter associated types

There is one final wrinkle to explore here. dyn types in Rust also contain values for all associated types. For example, a dyn Iterator type is not complete, you need to write dyn Iterator<Item = I>, which includes a value (I) for the associated type Item.
If we expand our compiler generated impl, then, it would look something like this:
impl<I> Iterator for dyn Iterator<Item = I> {
  type Item = I;
 
  fn next(&mut self) -> I {
    /* invoke `next` method via virtual vtable */
  }
}
The bug: user supplied impls can overlap the compiler supplied one

Now we come to the problem. User supplied impls can, in some cases, wind up overlapping the compiler supplied impl. This can lead to inconsistencies, particularly when normalized associated types.
Consider, for example, the following trait (hat tip to Centril):
trait Object<U> {
    type Output;
}

impl<T: ?Sized, U> Object<U> for T {
    type Output = U;
}
Crucially, the impl here can apply when T = dyn Object<U, Output = _>. In other words, there is an overlap with the compiler supplied impl:
impl<T: ?Sized, U, V> Object<U> for dyn Object<U, Output = V> {
    type Output = V;
}
This conflict goes undetected by our coherence rules because the this impl, being compiler generated, goes undetected. The problem now is that different parts of the compiler could select different impls, resulting in distinct values for Output. For example, if there is a function foo where all we know is that T: ?Sized, it will select the first, user-supplied impl, because we do not know that T = dyn Object:
fn foo<T: ?Sized, U>(x: <T as Object<U>>::Output) -> U {
    //                  ^^^^^^^^^^^^^^^^^^^^^^^^^
    // Resolves to `U`, so `foo` thinks it has an argument of
    // type `U` -- therefore, it can be returned.
    x
}
On the other hand, the transmute function invokes foo with the type dyn Object<U, Output = T>:
fn transmute<T, U>(x: T) -> U {
    foo::<dyn Object<U, Output = T>, U>(x)
    //                           ^
    // Output is normalized to `T`
}
This winds up resolving using the "auto-generated" impl, and hence Output is normalized to T -- therefore, we can give an x: T, and we get back a U. This happens because the trait solver (for better or worse) resolves the ambiguity between the two impls by preferring the dyn impl, as mentioned previously.
Interaction: RFC 2027, "object safe for dispatch"

Under current Rust, a dyn type is only legal to use if the trait is
dyn safe. RFC 2027 proposed that we make dyn types legal all the
time. Instead, it is an error to coerce a value into a dyn type
unless the trait is dyn safe. The motivation was to enable more
ergonomic APIs, and in particular to permit methods to be defined on
traits and then accessed via notation like Trait::method().
Specifically the rules adopted by RFC 2027 are as follows:

If the trait (e.g. Foo) is not object safe:

The object type for the trait does not implement the trait;
Foo: Foo does not hold.
Implementations of the trait cannot be cast to the object type,
T as Foo is not valid
However, the object type is still a valid type. It just does not meet
the self-trait bound, and it cannot be instantiated in safe Rust.


Interaction: multi-trait dyn types

Currently, we support dyn types with multiple traits like dyn Write + Send, but we only allow at most one "non-auto-trait" in the
list. It has long been assumed we will generalize this to arbitrary
combinations of traits. This introduces certain complications into the
model described above.
In particular, the compiler doesn't just generate just one impl like
impl Trait for dyn Trait. It also generates impls like impl Trait for dyn Trait + X, where X is some other trait. These impls are
generated "on demand" and, conceptually, they simply "upcast" their
receiver from dyn Trait + X to dyn Trait and then delegate to the
core impl Trait for dyn Trait impl. (Note that this upcasting is not
yet implemented either, though there is a pending PR.)
How to fix for Rust 2018

In the short-term, we have to close the soundness hole with minimal disruption to the existing ecosystem.
We start by identifying "potentially dyn-overlapping" impls. We say an impl is "potentially dyn-overlapping" if:

the trait being implemented is dyn safe;
the impl has a self-type that is a type parameter T where T: ?Sized.

Note that this is a simple, "syntactic" check that doesn't require
deep analysis or trait solving. This is important because we have to
be able to do this check at various points in the trait solver without
creating cyclic dependencies.
The change we make is therefore:

All items in a "potentially dyn-overlapping" impl are considered default, even if they were not declared as default.

This is of course a breaking change, as it must be, but it is a fairly
narrow one. It affects only code where you have a dyn-safe trait that
contains associated types and a blanket impl. From the crater runs
that were done on prior proposals (which were more restrictive), we
did not identify any existing code that would be affected. It would
however affect the unsound impl from our example earlier:
impl<T: ?Sized, U, V> Object<U> for dyn Object<U, Output = V> {
    // This associated type would be **implicitly considered default**:
    type Output = V;
}
This change would suffice to prevent the transmute logic from compiling.
Warnings

It would be nice if we emitted a warning on a "potentially
dyn-applicable" impl, to inform the user that default annotations
were being added.  However, we would need a good way to silence the
warning. The natural way would be for the user to write default
explicitly, but that is not currently permitted on stable Rust.
Therefore, we should avoid the warning until it is, although of course
it remains an option for users to use #[allow] annotations as well.
An alternative would be to issue the warning only for impls which
contain associated types. This is because a default label on fn
annotations has basically no user-observable effect, at least until
specialization lands on stable. This is because there is no way for
users to obtain the type of a function from an impl.
(Once specialization lands on stable, it would be possible for users
to observe the enforced default annotations, since one could now
write specializing impls that wouldn't have been possible
before. However, we could make it so that the default annotations that
are automatically added for soundness do not permit users to write
specializing impls, only the compiler.)
Proposal for Rust 2021 Edition

Inferring dyn safety considered harmful

One of the challenges with the current design is that the compiler is
automatically inferring whether a trait is dyn safe. This is not
somethign that the user actively declares. This has a few downsides:

It is an implicit semver promise. Users who publish a trait that
happens to be dyn safe are then unable to evolve that trait in ways
that would compromise dyn safety without risking breaking users.

As an example, in investigating this bug, we uncovered users
using dyn Borrow. It so happens that Borrow is dyn safe, but
it is unclear whether this was an original goal of the trait.


It creates limitations without explicit opt-in. Currently, the
compiler is choosing on its own whether to create the dyn impl
based only on the trait defintiion, but creating this impl can cause
other impls to be invalid, or to (implicitly) have default
declarations.

Traits that are not meant to be used as dyn values will
therefore have artificial limitations placed on them.


It is a fragile setup. Going a bit further, the current setup
where the compiler must infer whether a trait is dyn-safe is a bit
fragile and prone to creating cyclic dependencies in the
compiler.The conditions for dyn safety were initially simple, but
have grown more complex over time. This leads to an awkward
situation because we do not want checking the conditions for dyn
safety to require using the trait system, since the trait system in
turn would like to know if traits are dyn safe, and this creates a
cyclic dependency. If we can move to a system where the user
declares whether a trait is dyn-safe or not, this cuts the
dependency, and it is then relatively easy for us to enforce
dyn-safety conditions as a result.

Alternative: explicit declaration

I would propose that, for Rust 2021, we require that traits be
explicitly declared as dyn safe:
dyn trait Foo { ... }
This dyn declaration is only legal if the trait is dyn safe, otherwise
an error is generated. Presuming there is no error, it has the following effects:

The compiler will permit values to be coerced to dyn Foo values (per RFC 2027);
The compiler would generate an impl impl Foo for dyn Foo that uses
virtual dispatch, as today, along with impls for dyn Foo + Bar
like types.
Any "potentially dyn-overlapping" impls would be required to have their
methods declared as default.

Essentially, the rules are the same as in Rust 2018, however the
restrictions that methods be declared as default are opt-in (and the
methods must be declared as default explicitly, rather than having
it be automatically enforced).
Ability to migrate

Migration here is doable with some caveats. One observation first:
first, thanks to the orphan rule, "potentially dyn-overlapping" impls
must be located within the same crate where the trait is declared.
Given this, the migration rule would be that we find all dyn-safe
traits and we suggest adding the dyn keyword in front of them.
Further, if there is a potentially dyn-overlapping impl, we insert the
default keyword.
If we wished to be "less correct", we could choose to only add dyn
to a trait declaration if we locally observe some code that creates a
dyn value for that trait.
Dependency: specialization

Making this change would require that some form of specialization is
stable. Notably, it only really requires the ability to write the default
keyword. It doesn't actually require us to stabilize the ability for users
to write specializing impls of their own. Moreover, the compiler generated
impl is a "always applicable" impl and hence soundness is guaranteed.