Skip to content

Instantly share code, notes, and snippets.

@nikomatsakis
Last active January 10, 2020 23:22
Show Gist options
  • Save nikomatsakis/0cf84ac05ce7751b5759cbf335c4d327 to your computer and use it in GitHub Desktop.
Save nikomatsakis/0cf84ac05ce7751b5759cbf335c4d327 to your computer and use it in GitHub Desktop.

Coherence can be bypassed by an indirect impl for a trait object #57893

Links

Background

dyn Trait represents "some type" that implements the trait

The dyn Trait type in Rust represents "some type T that implements Trait". Every dyn Trait value comes associated with a vtable, generated by the compiler, that contains information about what this type T is, and in particular pointers to the definitions of the various methods in Trait as defined for type T. As we all know, if you have a value like x: dyn Debug, then invoking a method like x.fmt(...) will use a virtual call, where the method is obtained by fetching a function pointer out of the vtable.

Rust encapsulates virtual tables in a special impl

Interestingly, at the MIR level, Rust does not have a concept of "virtual calls" today. Instead, those calls are encoded as ordinary trait calls, where the "self type" is dyn Debug. So the x.fmt(..) call above is equivalent to <dyn Debug as Debug>::fmt(&x).

In general, whenever we generate the code for a <T as Trait>::method(..) call, we resolve <T as Trait>::method to a specific function by finding the applicable impl for T and then finding the method definition within that impl.

The same is conceptually true for <dyn Debug as Debug>::fmt, except that the impl in this case is generated by the compiler. It contains code that conceptually looks something like this:

impl Debug for dyn Debug {
  fn fmt(
    &self, 
    formatter: &mut Formatter<'_>,
  ) -> std::fmt::Result<()> {
    let (data_pointer, vtable_pointer) = /* split fat pointer `self` */;
    let fmt_function: fn(..) = vtable_pointer[0]; // where 0 is index of `fmt`, say
    fmt_function(data_pointer, formatter)
  }
}

Here, the self: &dyn Debug reference will be a "fat pointer" that bundles together a (thin) "data pointer" and a (thin) "vtable pointer". The first line conceptually splits the fat pointer into the two thin pointers. The next line then extracts the method from the vtable and the third line invokes it.

The bug: Users can prove impls that overlap with the compiler-generated impl

The bug arises now: it turns out that users can provide impls that overlap with the compiler-generated impl. The compiler prevents you from doing this in simple cases, but the check is insufficient. What's worse, we rely on our behavior for the Any trait.

Consider the Any trait definition

trait Any {
  fn type_id(&self) -> TypeId;
}

The type_id method is meant to give a unique type that identifies the self type. Moreover, if you have a x: &dyn Any, then x.type_id() is meant to give the type-id of the underlying type. That is, you don't want the type-id for the type dyn Any, but rather for the dynamic type that was hidden.

You are supposed to be able to get the type_id of any type (well, any 'static type) without any form of opt-in. This is enabled by the one impl of Any, which is as follows:

impl<T> Any for T where
    T: 'static + ?Sized, 
{
  fn type_id(&self) -> TypeId {
    TypeId::of::<T>()
  }
}

Now consider what happens when you invoke <dyn Any as Any>::type_id -- which impl do we use? If we were to use the user-supplied impl, you would get back the type-id of the dyn Any type, which is unlikely to be what you wanted.

The compiler resolves the ambiguity by choosing the dyn Any impl, and we rely on this

We've already discussed how invoking <dyn Any as Any>::type_id will result in virtual dispatch, which is the behavior we want. But why does that happen?

This is in fact not an obvious question: in general, the trait solver will error out on ambiguity. In this case, it does indeed find that there are two possible impls it could use -- the user-given one or the auto-generated one. However, there is some logic in the trait solver that arbitrarily chooses the compiler-generated dyn impl in cases like this. Therefore, invoking <dyn Any as Any>::type_id ultimately uses virtual dispatch.

Observe the implications here: The design of the Any trait is effectively relying on this current behavior. If we were to correctly report ambiguity, the Any trait would not work at all.

We can rationalize the current behavior as a form of specialization

As it happens, we can rationalize the current behavior as a form of specialization. Imagine that we had the following impls (and no compiler-generated impls):

trait Any { }

impl<T: ?Sized + 'static> Any for T {
  // Note the `default` keyword here, not present in the original code
  default fn type_id(&self) -> TypeId { .. }  
}

impl Any for dyn Any {
  fn type_id(&self) -> TypeId { /* use virtual dispatch*/ }
}

This would effectively behave the same as the compiler's current code. This is quite reasonable.

Enter associated types

There is one final wrinkle to explore here. dyn types in Rust also contain values for all associated types. For example, a dyn Iterator type is not complete, you need to write dyn Iterator<Item = I>, which includes a value (I) for the associated type Item.

If we expand our compiler generated impl, then, it would look something like this:

impl<I> Iterator for dyn Iterator<Item = I> {
  type Item = I;
 
  fn next(&mut self) -> I {
    /* invoke `next` method via virtual vtable */
  }
}

The bug: user supplied impls can overlap the compiler supplied one

Now we come to the problem. User supplied impls can, in some cases, wind up overlapping the compiler supplied impl. This can lead to inconsistencies, particularly when normalized associated types.

Consider, for example, the following trait (hat tip to Centril):

trait Object<U> {
    type Output;
}

impl<T: ?Sized, U> Object<U> for T {
    type Output = U;
}

Crucially, the impl here can apply when T = dyn Object<U, Output = _>. In other words, there is an overlap with the compiler supplied impl:

impl<T: ?Sized, U, V> Object<U> for dyn Object<U, Output = V> {
    type Output = V;
}

This conflict goes undetected by our coherence rules because the this impl, being compiler generated, goes undetected. The problem now is that different parts of the compiler could select different impls, resulting in distinct values for Output. For example, if there is a function foo where all we know is that T: ?Sized, it will select the first, user-supplied impl, because we do not know that T = dyn Object:

fn foo<T: ?Sized, U>(x: <T as Object<U>>::Output) -> U {
    //                  ^^^^^^^^^^^^^^^^^^^^^^^^^
    // Resolves to `U`, so `foo` thinks it has an argument of
    // type `U` -- therefore, it can be returned.
    x
}

On the other hand, the transmute function invokes foo with the type dyn Object<U, Output = T>:

fn transmute<T, U>(x: T) -> U {
    foo::<dyn Object<U, Output = T>, U>(x)
    //                           ^
    // Output is normalized to `T`
}

This winds up resolving using the "auto-generated" impl, and hence Output is normalized to T -- therefore, we can give an x: T, and we get back a U. This happens because the trait solver (for better or worse) resolves the ambiguity between the two impls by preferring the dyn impl, as mentioned previously.

Interaction: RFC 2027, "object safe for dispatch"

Under current Rust, a dyn type is only legal to use if the trait is dyn safe. RFC 2027 proposed that we make dyn types legal all the time. Instead, it is an error to coerce a value into a dyn type unless the trait is dyn safe. The motivation was to enable more ergonomic APIs, and in particular to permit methods to be defined on traits and then accessed via notation like Trait::method().

Specifically the rules adopted by RFC 2027 are as follows:

  • If the trait (e.g. Foo) is not object safe:
    • The object type for the trait does not implement the trait; Foo: Foo does not hold.
    • Implementations of the trait cannot be cast to the object type, T as Foo is not valid
    • However, the object type is still a valid type. It just does not meet the self-trait bound, and it cannot be instantiated in safe Rust.

Interaction: multi-trait dyn types

Currently, we support dyn types with multiple traits like dyn Write + Send, but we only allow at most one "non-auto-trait" in the list. It has long been assumed we will generalize this to arbitrary combinations of traits. This introduces certain complications into the model described above.

In particular, the compiler doesn't just generate just one impl like impl Trait for dyn Trait. It also generates impls like impl Trait for dyn Trait + X, where X is some other trait. These impls are generated "on demand" and, conceptually, they simply "upcast" their receiver from dyn Trait + X to dyn Trait and then delegate to the core impl Trait for dyn Trait impl. (Note that this upcasting is not yet implemented either, though there is a pending PR.)

How to fix for Rust 2018

In the short-term, we have to close the soundness hole with minimal disruption to the existing ecosystem.

We start by identifying "potentially dyn-overlapping" impls. We say an impl is "potentially dyn-overlapping" if:

  • the trait being implemented is dyn safe;
  • the impl has a self-type that is a type parameter T where T: ?Sized.

Note that this is a simple, "syntactic" check that doesn't require deep analysis or trait solving. This is important because we have to be able to do this check at various points in the trait solver without creating cyclic dependencies.

The change we make is therefore:

  • All items in a "potentially dyn-overlapping" impl are considered default, even if they were not declared as default.

This is of course a breaking change, as it must be, but it is a fairly narrow one. It affects only code where you have a dyn-safe trait that contains associated types and a blanket impl. From the crater runs that were done on prior proposals (which were more restrictive), we did not identify any existing code that would be affected. It would however affect the unsound impl from our example earlier:

impl<T: ?Sized, U, V> Object<U> for dyn Object<U, Output = V> {
    // This associated type would be **implicitly considered default**:
    type Output = V;
}

This change would suffice to prevent the transmute logic from compiling.

Warnings

It would be nice if we emitted a warning on a "potentially dyn-applicable" impl, to inform the user that default annotations were being added. However, we would need a good way to silence the warning. The natural way would be for the user to write default explicitly, but that is not currently permitted on stable Rust. Therefore, we should avoid the warning until it is, although of course it remains an option for users to use #[allow] annotations as well.

An alternative would be to issue the warning only for impls which contain associated types. This is because a default label on fn annotations has basically no user-observable effect, at least until specialization lands on stable. This is because there is no way for users to obtain the type of a function from an impl.

(Once specialization lands on stable, it would be possible for users to observe the enforced default annotations, since one could now write specializing impls that wouldn't have been possible before. However, we could make it so that the default annotations that are automatically added for soundness do not permit users to write specializing impls, only the compiler.)

Proposal for Rust 2021 Edition

Inferring dyn safety considered harmful

One of the challenges with the current design is that the compiler is automatically inferring whether a trait is dyn safe. This is not somethign that the user actively declares. This has a few downsides:

  • It is an implicit semver promise. Users who publish a trait that happens to be dyn safe are then unable to evolve that trait in ways that would compromise dyn safety without risking breaking users.
    • As an example, in investigating this bug, we uncovered users using dyn Borrow. It so happens that Borrow is dyn safe, but it is unclear whether this was an original goal of the trait.
  • It creates limitations without explicit opt-in. Currently, the compiler is choosing on its own whether to create the dyn impl based only on the trait defintiion, but creating this impl can cause other impls to be invalid, or to (implicitly) have default declarations.
    • Traits that are not meant to be used as dyn values will therefore have artificial limitations placed on them.
  • It is a fragile setup. Going a bit further, the current setup where the compiler must infer whether a trait is dyn-safe is a bit fragile and prone to creating cyclic dependencies in the compiler.The conditions for dyn safety were initially simple, but have grown more complex over time. This leads to an awkward situation because we do not want checking the conditions for dyn safety to require using the trait system, since the trait system in turn would like to know if traits are dyn safe, and this creates a cyclic dependency. If we can move to a system where the user declares whether a trait is dyn-safe or not, this cuts the dependency, and it is then relatively easy for us to enforce dyn-safety conditions as a result.

Alternative: explicit declaration

I would propose that, for Rust 2021, we require that traits be explicitly declared as dyn safe:

dyn trait Foo { ... }

This dyn declaration is only legal if the trait is dyn safe, otherwise an error is generated. Presuming there is no error, it has the following effects:

  • The compiler will permit values to be coerced to dyn Foo values (per RFC 2027);
  • The compiler would generate an impl impl Foo for dyn Foo that uses virtual dispatch, as today, along with impls for dyn Foo + Bar like types.
  • Any "potentially dyn-overlapping" impls would be required to have their methods declared as default.

Essentially, the rules are the same as in Rust 2018, however the restrictions that methods be declared as default are opt-in (and the methods must be declared as default explicitly, rather than having it be automatically enforced).

Ability to migrate

Migration here is doable with some caveats. One observation first: first, thanks to the orphan rule, "potentially dyn-overlapping" impls must be located within the same crate where the trait is declared.

Given this, the migration rule would be that we find all dyn-safe traits and we suggest adding the dyn keyword in front of them. Further, if there is a potentially dyn-overlapping impl, we insert the default keyword.

If we wished to be "less correct", we could choose to only add dyn to a trait declaration if we locally observe some code that creates a dyn value for that trait.

Dependency: specialization

Making this change would require that some form of specialization is stable. Notably, it only really requires the ability to write the default keyword. It doesn't actually require us to stabilize the ability for users to write specializing impls of their own. Moreover, the compiler generated impl is a "always applicable" impl and hence soundness is guaranteed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment