The `dyn Trait` type in Rust represents "some type `T` that implements `Trait`". Every `dyn Trait` value comes associated with a vtable, generated by the compiler, that contains information about what this type `T` is, and in particular pointers to the definitions of the various methods in `Trait` as defined for type `T`. As we all know, if you have a value like `x: dyn Debug`, then invoking a method like `x.fmt(...)` will use a virtual call, where the method is obtained by fetching a function pointer out of the vtable.
Interestingly, at the MIR level, Rust does not have a concept of "virtual calls" today. Instead, those calls are encoded as ordinary trait calls, where the "self type" is `dyn Debug`. So the `x.fmt(..)` call above is equivalent to `<dyn Debug as Debug>::fmt(&x)`.
In general, whenever we generate the code for a `<T as Trait>::method(..)` call, we resolve `<T as Trait>::method` to a specific function by finding the applicable impl for `T` and then finding the method definition within that impl. The same is conceptually true for `<dyn Debug as Debug>::fmt`, except that the impl in this case is generated by the compiler. It contains code that conceptually looks something like this:
```rust
impl Debug for dyn Debug {
    fn fmt(
        &self,
        formatter: &mut Formatter<'_>,
    ) -> std::fmt::Result {
        let (data_pointer, vtable_pointer) = /* split fat pointer `self` */;
        let fmt_function: fn(..) = vtable_pointer[0]; // where 0 is the index of `fmt`, say
        fmt_function(data_pointer, formatter)
    }
}
```
Here, the `self: &dyn Debug` reference will be a "fat pointer" that bundles together a (thin) "data pointer" and a (thin) "vtable pointer". The first line conceptually splits the fat pointer into the two thin pointers. The next line then extracts the method from the vtable and the third line invokes it.
The bug arises now: it turns out that users can provide impls that overlap with the compiler-generated impl. The compiler prevents you from doing this in simple cases, but the check is insufficient. What's worse, we rely on this behavior for the `Any` trait.

Consider the `Any` trait definition:
```rust
trait Any {
    fn type_id(&self) -> TypeId;
}
```
The `type_id` method is meant to give a unique identifier for the self type. Moreover, if you have an `x: &dyn Any`, then `x.type_id()` is meant to give the type-id of the underlying type. That is, you don't want the type-id for the type `dyn Any`, but rather for the dynamic type that was hidden.
You are supposed to be able to get the `type_id` of any type (well, any `'static` type) without any form of opt-in. This is enabled by the one impl of `Any`, which is as follows:
```rust
impl<T> Any for T
where
    T: 'static + ?Sized,
{
    fn type_id(&self) -> TypeId {
        TypeId::of::<T>()
    }
}
```
Now consider what happens when you invoke `<dyn Any as Any>::type_id` -- which impl do we use? If we were to use the user-supplied impl, you would get back the type-id of the `dyn Any` type itself, which is unlikely to be what you wanted.
We've already discussed how invoking `<dyn Any as Any>::type_id` will result in virtual dispatch, which is the behavior we want. But why does that happen?

This is in fact not an obvious question: in general, the trait solver will error out on ambiguity. In this case, it does indeed find that there are two possible impls it could use -- the user-given one or the auto-generated one. However, there is some logic in the trait solver that arbitrarily chooses the compiler-generated `dyn` impl in cases like this. Therefore, invoking `<dyn Any as Any>::type_id` ultimately uses virtual dispatch.
Observe the implications here: the design of the `Any` trait is effectively relying on this current behavior. If we were to correctly report ambiguity, the `Any` trait would not work at all.
As it happens, we can rationalize the current behavior as a form of specialization. Imagine that we had the following impls (and no compiler-generated impls):
```rust
trait Any {
    fn type_id(&self) -> TypeId;
}

impl<T: ?Sized + 'static> Any for T {
    // Note the `default` keyword here, not present in the original code
    default fn type_id(&self) -> TypeId { .. }
}

impl Any for dyn Any {
    fn type_id(&self) -> TypeId { /* use virtual dispatch */ }
}
```
This would effectively behave the same as the compiler's current code. This is quite reasonable.
There is one final wrinkle to explore here. `dyn` types in Rust also contain values for all associated types. For example, a `dyn Iterator` type is not complete; you need to write `dyn Iterator<Item = I>`, which includes a value (`I`) for the associated type `Item`.
If we expand our compiler generated impl, then, it would look something like this:
```rust
impl<I> Iterator for dyn Iterator<Item = I> {
    type Item = I;
    fn next(&mut self) -> Option<I> {
        /* invoke `next` method via the vtable */
    }
}
```
Now we come to the problem. User-supplied impls can, in some cases, wind up overlapping the compiler-supplied impl. This can lead to inconsistencies, particularly when normalizing associated types.
Consider, for example, the following trait (hat tip to Centril):
```rust
trait Object<U> {
    type Output;
}

impl<T: ?Sized, U> Object<U> for T {
    type Output = U;
}
```
Crucially, the impl here can apply when `T = dyn Object<U, Output = _>`. In other words, there is an overlap with the compiler-supplied impl:
```rust
impl<T: ?Sized, U, V> Object<U> for dyn Object<U, Output = V> {
    type Output = V;
}
```
This conflict goes undetected by our coherence rules because the compiler-generated impl is invisible to them. The problem now is that different parts of the compiler could select different impls, resulting in distinct values for `Output`. For example, if there is a function `foo` where all we know is that `T: ?Sized`, it will select the first, user-supplied impl, because we do not know that `T = dyn Object`:
```rust
fn foo<T: ?Sized, U>(x: <T as Object<U>>::Output) -> U {
    //                ^^^^^^^^^^^^^^^^^^^^^^^^^
    // Resolves to `U`, so `foo` thinks it has an argument of
    // type `U` -- therefore, it can be returned.
    x
}
```
On the other hand, the `transmute` function invokes `foo` with the type `dyn Object<U, Output = T>`:
```rust
fn transmute<T, U>(x: T) -> U {
    foo::<dyn Object<U, Output = T>, U>(x)
    //                           ^
    // `Output` is normalized to `T`
}
```
This winds up resolving using the "auto-generated" impl, and hence `Output` is normalized to `T` -- therefore, we can give an `x: T`, and we get back a `U`. This happens because the trait solver (for better or worse) resolves the ambiguity between the two impls by preferring the `dyn` impl, as mentioned previously.
Under current Rust, a `dyn` type is only legal to use if the trait is dyn safe. RFC 2027 proposed that we make `dyn` types legal all the time. Instead, it is an error to coerce a value into a `dyn` type unless the trait is dyn safe. The motivation was to enable more ergonomic APIs, and in particular to permit methods to be defined on traits and then accessed via notation like `Trait::method()`.
Specifically, the rules adopted by RFC 2027 are as follows:
- If the trait (e.g. `Foo`) is not object safe:
  - The object type for the trait does not implement the trait; `Foo: Foo` does not hold.
  - Implementations of the trait cannot be cast to the object type; `T as Foo` is not valid.
  - However, the object type is still a valid type. It just does not meet the self-trait bound, and it cannot be instantiated in safe Rust.
Currently, we support `dyn` types with multiple traits, like `dyn Write + Send`, but we only allow at most one "non-auto-trait" in the list. It has long been assumed we will generalize this to arbitrary combinations of traits. This introduces certain complications into the model described above.
In particular, the compiler doesn't generate just one impl like `impl Trait for dyn Trait`. It also generates impls like `impl Trait for dyn Trait + X`, where `X` is some other trait. These impls are generated "on demand" and, conceptually, they simply "upcast" their receiver from `dyn Trait + X` to `dyn Trait` and then delegate to the core `impl Trait for dyn Trait` impl. (Note that this upcasting is not yet implemented either, though there is a pending PR.)
In the short-term, we have to close the soundness hole with minimal disruption to the existing ecosystem.
We start by identifying "potentially dyn-overlapping" impls. We say an impl is "potentially dyn-overlapping" if:

- the trait being implemented is dyn safe;
- the impl has a self type that is a type parameter `T` where `T: ?Sized`.
Note that this is a simple, "syntactic" check that doesn't require deep analysis or trait solving. This is important because we have to be able to do this check at various points in the trait solver without creating cyclic dependencies.
The change we make is therefore:
- All items in a "potentially dyn-overlapping" impl are considered `default`, even if they were not declared as `default`.
This is of course a breaking change, as it must be, but it is a fairly narrow one. It affects only code where you have a dyn-safe trait that contains associated types and a blanket impl. From the crater runs that were done on prior proposals (which were more restrictive), we did not identify any existing code that would be affected. It would, however, affect the unsound impl from our example earlier:
```rust
impl<T: ?Sized, U, V> Object<U> for dyn Object<U, Output = V> {
    // This associated type would be **implicitly considered default**:
    type Output = V;
}
```
This change would suffice to prevent the `transmute` logic from compiling.
It would be nice if we emitted a warning on a "potentially dyn-overlapping" impl, to inform the user that `default` annotations were being added. However, we would need a good way to silence the warning. The natural way would be for the user to write `default` explicitly, but that is not currently permitted on stable Rust. Therefore, we should avoid the warning until it is, although of course it remains an option for users to use `#[allow]` annotations as well.
An alternative would be to issue the warning only for impls which contain associated types. This is because a `default` label on `fn` items has basically no user-observable effect, at least until specialization lands on stable: there is no way for users to obtain the type of a function from an impl.
(Once specialization lands on stable, it would be possible for users to observe the enforced `default` annotations, since one could then write specializing impls that wouldn't have been possible before. However, we could make it so that the `default` annotations that are automatically added for soundness permit only the compiler, not users, to write specializing impls.)
One of the challenges with the current design is that the compiler automatically infers whether a trait is dyn safe. This is not something that the user actively declares. This has a few downsides:
- It is an implicit semver promise. Users who publish a trait that happens to be dyn safe are then unable to evolve that trait in ways that would compromise dyn safety without risking breaking users.
  - As an example, in investigating this bug, we uncovered users using `dyn Borrow`. It so happens that `Borrow` is dyn safe, but it is unclear whether this was an original goal of the trait.
- It creates limitations without explicit opt-in. Currently, the compiler is choosing on its own whether to create the `dyn` impl based only on the trait definition, but creating this impl can cause other impls to be invalid, or to (implicitly) have `default` declarations.
  - Traits that are not meant to be used as `dyn` values will therefore have artificial limitations placed on them.
- It is a fragile setup. Going a bit further, the current setup, where the compiler must infer whether a trait is dyn safe, is a bit fragile and prone to creating cyclic dependencies in the compiler. The conditions for dyn safety were initially simple, but have grown more complex over time. This leads to an awkward situation: we do not want checking the conditions for dyn safety to require using the trait system, since the trait system in turn would like to know if traits are dyn safe, and this creates a cyclic dependency. If we can move to a system where the user declares whether a trait is dyn-safe or not, this cuts the dependency, and it is then relatively easy for us to enforce dyn-safety conditions as a result.
I would propose that, for Rust 2021, we require that traits be explicitly declared as dyn safe:
```rust
dyn trait Foo { ... }
```
This `dyn` declaration is only legal if the trait is dyn safe; otherwise, an error is generated. Presuming there is no error, it has the following effects:

- The compiler will permit values to be coerced to `dyn Foo` values (per RFC 2027);
- The compiler will generate an impl `impl Foo for dyn Foo` that uses virtual dispatch, as today, along with impls for `dyn Foo + Bar`-like types;
- Any "potentially dyn-overlapping" impls would be required to have their methods declared as `default`.
Essentially, the rules are the same as in Rust 2018; however, the restriction that methods be declared as `default` is opt-in (and the methods must be declared as `default` explicitly, rather than having it be automatically enforced).
Migration here is doable, with some caveats. One observation first: thanks to the orphan rule, "potentially dyn-overlapping" impls must be located within the same crate where the trait is declared.
Given this, the migration rule would be that we find all dyn-safe traits and suggest adding the `dyn` keyword in front of them. Further, if there is a potentially dyn-overlapping impl, we insert the `default` keyword.
If we wished to be "less correct", we could choose to only add `dyn` to a trait declaration if we locally observe some code that creates a `dyn` value for that trait.
Making this change would require that some form of specialization be stable. Notably, it only really requires the ability to write the `default` keyword; it doesn't actually require us to stabilize the ability for users to write specializing impls of their own. Moreover, the compiler-generated impl is an "always applicable" impl, and hence soundness is guaranteed.