
@WorldSEnder
Last active July 31, 2022 20:39

Some languages, such as Java, perform type erasure to implement generics: generic functions are compiled once, in their type-erased form. Rust, on the other hand, performs monomorphization: polymorphic functions and structs are replaced by many monomorphic instantiations. A sort of type erasure is provided by the Any trait, but it also shifts type safety from compile time to run time. This blog post proposes an alternative form of type erasure that can be fully checked at compile time and allows for a single instantiation of (a special form of) generic functions.

In the post, we will dissect the type-erasure available with dyn objects, and then re-assemble the pieces to gain a more powerful abstraction.

Monomorphization and dyn objects

The compiler has to instantiate generic functions for every type they are invoked with, creating a specialized function for each. It's no secret that this process, called monomorphization, can lead to code bloat. In some cases, monomorphization is necessary and useful. Take for example:

// mod std::any
fn type_name<T: ?Sized>() -> &'static str;

// Exemplary instantiation would create monomorphizations such as
fn type_name::<String>() -> &'static str {
    "String"
}
fn type_name::<u32>() -> &'static str {
    "u32"
}

The above code is similar to what the compiler synthesizes for this built-in when T is instantiated with String and u32 respectively. It has to know each instantiating type fully to know which string to return. But for other functions, this monomorphization is technically unnecessary. For example, (thin) pointers have a uniform representation. For functions such as

fn std::ptr::null<T>() -> *const T;

the exact type T does not have to be known, but monomorphization is technically performed nonetheless. Often, a concrete instantiation is then immediately inlined, so users don't see much of monomorphization.1 But there are places where this inlining cannot be done: traits and dyn objects.2

trait Bark {
    fn bark(&self);
}

struct Dog;

impl Bark for Dog {
    fn bark(&self) { println!("Woof! Woof!") }
}

fn pet_dog(dog: &dyn Bark) {
    dog.bark();
}

The impl of Bark::bark cannot be inlined into pet_dog, since it is dynamically looked up in the vtable that is part of the &dyn Bark fat pointer.
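To make the vtable lookup concrete, here is a rough hand-written emulation of what rustc does for &dyn Bark, in today's Rust. The names BarkVTable, bark_shim and FatBark are made up for illustration. bark returns a String instead of printing so the result can be checked, and the vtable is stored inline rather than behind a pointer to a static, as the compiler does. Treat it as a sketch, not rustc's actual layout.

```rust
use std::marker::PhantomData;

trait Bark {
    fn bark(&self) -> String;
}

struct Dog;
struct Seal;

impl Bark for Dog {
    fn bark(&self) -> String {
        "Woof! Woof!".to_string()
    }
}

impl Bark for Seal {
    fn bark(&self) -> String {
        "Ow! Ow!".to_string()
    }
}

/// Hypothetical hand-written vtable: one function pointer per trait method.
struct BarkVTable {
    bark: unsafe fn(*const ()) -> String,
}

/// Vtable entry for a concrete `T`: cast the erased pointer back, then call.
unsafe fn bark_shim<T: Bark>(data: *const ()) -> String {
    unsafe { (*(data as *const T)).bark() }
}

/// Emulated fat pointer: erased data pointer plus vtable, like `&dyn Bark`.
struct FatBark<'a> {
    data: *const (),
    vtable: BarkVTable,
    _borrow: PhantomData<&'a ()>,
}

impl<'a> FatBark<'a> {
    /// Plays the role of the unsizing coercion: the last point where `T` is known.
    fn new<T: Bark>(value: &'a T) -> Self {
        FatBark {
            data: value as *const T as *const (),
            vtable: BarkVTable { bark: bark_shim::<T> },
            _borrow: PhantomData,
        }
    }
}

/// Not generic, so it is compiled exactly once; the callee is found at runtime.
fn pet_dog(dog: &FatBark<'_>) -> String {
    // SAFETY: `FatBark::new` paired `data` with the shim for its actual type.
    unsafe { (dog.vtable.bark)(dog.data) }
}
```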

What is a trait object?

To create a dyn Trait object, one has to use an unsizing coercion. This is Rust's way of saying that we start with a (reference to a) value of some known, sized type and end up with a reference to a dyn Trait, which is not Sized. In this sense, a reference to a trait object is a reference to a value of some hidden type implementing the Trait. The vtable, describing the implementation of the Trait for the hidden type, is also part of the reference, which is why such references are sometimes called "fat" pointers.
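A quick way to see the extra vtable pointer on current rustc is to compare pointer sizes. In this small sketch, the two-word layout of &dyn Trait reflects how rustc does it today, not something the language spec promises. bark returns a &'static str here only so the output can be checked.

```rust
use std::mem::size_of;

trait Bark {
    fn bark(&self) -> &'static str;
}

struct Dog;

impl Bark for Dog {
    fn bark(&self) -> &'static str {
        "Woof! Woof!"
    }
}

/// Returns (size of a thin `&Dog`, size of a fat `&dyn Bark`).
fn pointer_widths() -> (usize, usize) {
    (size_of::<&Dog>(), size_of::<&dyn Bark>())
}

fn coerce_and_call(dog: &Dog) -> &'static str {
    // Unsizing coercion: from here on, the concrete type `Dog` is forgotten...
    let erased: &dyn Bark = dog;
    // ...and only the vtable inside the fat pointer knows how to `bark`.
    erased.bark()
}
```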

But there is no undoing an unsizing coercion. The reason we can't go back to a concrete type and a value is that it would be too late for the compiler to monomorphize on the uncovered type! After all, the type would not be known until the value of a dyn object is observed at runtime, and at that point type information is preserved only in a limited way, if at all.

fn type_name_of_dyn(_: &dyn Bark) -> &'static str {
    std::any::type_name::< *what to put here?* >()
}

The Any trait

You may have heard of the Any trait, which is special and grants you (limited) superpowers: you can test a &dyn Any and check whether the hidden type is equal to some concrete type:

// mod std::any
trait Any: 'static {
    fn type_id(&self) -> TypeId;
}
// Some of the provided methods
impl dyn Any {
    pub fn is<T: Any>(&self) -> bool;
    pub fn downcast_ref<T: Any>(&self) -> Option<&T>;
}
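For illustration, here is what probing looks like with the real downcast_ref method. The describe function is a made-up example. Every branch has to spell out a concrete type, and there is no way to give the hidden type a name.

```rust
use std::any::Any;

/// Probe a `&dyn Any` against a fixed list of concrete types.
fn describe(value: &dyn Any) -> String {
    if let Some(s) = value.downcast_ref::<String>() {
        format!("a String: {s}")
    } else if let Some(n) = value.downcast_ref::<u32>() {
        format!("a u32: {n}")
    } else {
        // The hidden type is not one we asked about; we can't name it.
        "something else".to_string()
    }
}
```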

To get a reference to a "normal" object back, you have to probe the &dyn Any object with a concrete type. You still can't refer to the contained type. This matters when you consider functions that take multiple arguments of the same type.3

trait MyPartialEq: Any {
    fn eq(&self, other: &Self) -> bool;
}

fn compare_eq(left: &dyn MyPartialEq, right: &dyn MyPartialEq) -> bool {
    todo!("Oops, this doesn't compile due to object safety");
    // First check that the hidden types are equal
    if left.type_id() != right.type_id() {
        false
    } else {
       // Something ?
    }
}

In fact, the above fails to compile even before you consider a possible implementation of compare_eq: the trait MyPartialEq is not object safe.4 "Object safety" is jargon for the rules that make the type dyn MyPartialEq currently illegal to write.

Object safety exists to prevent you from calling the methods of MyPartialEq on its dyn objects. The reasoning goes that the method MyPartialEq::eq(l, r) only makes sense when the underlying types of l and r are the same. Since this can't be ensured statically when l: &dyn MyPartialEq, the trait is not object safe, and dyn MyPartialEq is illegal. But in this case this is too conservative! By using Any as a supertrait of MyPartialEq, we can test whether the two hidden types are the same.

Writing a working compare_eq will serve as a running example, and I will later show how to complete the implementation with the proposed extension of the type system.
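For comparison, one common workaround in today's Rust is to route the comparison through a helper trait whose second argument is itself a trait object, and to recover the concrete type with downcast_ref. DynEq, as_any and dyn_eq are hypothetical names for this sketch. Note that dyn_eq is still monomorphized for every type T that gets compared, which is exactly the cost the proposal aims to avoid.

```rust
use std::any::Any;

/// Hypothetical helper trait: object safe, because `other` is itself a trait object.
trait DynEq {
    fn as_any(&self) -> &dyn Any;
    fn dyn_eq(&self, other: &dyn DynEq) -> bool;
}

impl<T: PartialEq + 'static> DynEq for T {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn dyn_eq(&self, other: &dyn DynEq) -> bool {
        // Probe the other hidden type with our own concrete type `T`.
        match other.as_any().downcast_ref::<T>() {
            Some(other) => self == other,
            None => false,
        }
    }
}

/// Values of different hidden types are considered unequal.
fn compare_eq(left: &dyn DynEq, right: &dyn DynEq) -> bool {
    left.dyn_eq(right)
}
```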

There is another scenario where dyn Trait is insufficient, related to associated types. Consider

trait Actor {
    type Message;
    fn handle_msg(&self, cmd: &Self::Message);
}

//                vvv --- has to be generic over the associated type?
fn handle_msg_dyn<Msg>(actor: &dyn Actor<Message = Msg>, cmd: &Msg) {
//                                       ^^^^^^^^^^^^^ ---+
//                                                        |
// +------------------------------------------------------+
// |
// +-----> all associated types have to be fixed when using a `dyn Actor`!
    actor.handle_msg(cmd);
}

Note that a single instantiation of handle_msg_dyn should again be sufficient, even after tackling problems with the type signature. What should happen is that the vtable contained in the dyn Actor fat-pointer is used to look up the function that implements handle_msg and the arguments are simply passed on.
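As a check on the status quo, the signature above does compile in today's Rust as long as Msg stays an ordinary type parameter. In the sketch below, handle_msg returns a String so the result can be checked, and Printer and Counter are made-up actors. In principle, every message type gets its own monomorphized copy of handle_msg_dyn, even though each copy only forwards the call through the vtable.

```rust
trait Actor {
    type Message;
    fn handle_msg(&self, cmd: &Self::Message) -> String;
}

struct Printer;
struct Counter;

impl Actor for Printer {
    type Message = String;
    fn handle_msg(&self, cmd: &String) -> String {
        format!("print {cmd}")
    }
}

impl Actor for Counter {
    type Message = u32;
    fn handle_msg(&self, cmd: &u32) -> String {
        format!("count {}", cmd + 1)
    }
}

// `Msg` is a physical type parameter: one copy per message type, although
// every copy just looks up `handle_msg` in the vtable and forwards `cmd`.
fn handle_msg_dyn<Msg>(actor: &dyn Actor<Message = Msg>, cmd: &Msg) -> String {
    actor.handle_msg(cmd)
}
```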

Ghost types

Let me now introduce the new concept, which I want to name "Ghost types"5. It's a qualification on generic type variables. Such type variables, in contrast to the "physical" type variables we currently have, are not monomorphized on. Instead, any construct using a ghost type must make sure to generate the same code and ABI for all possible instantiations of that variable with Sized types. Syntactically, I will prefix a ghost type variable with dyn in the binder introducing it. So for example:

fn std::ptr::null<dyn T>() -> *const T;
//                +++ This part is new.
// The method behaves the same for any T: Sized

For the most part, you can think of this as a generalization of dyn Trait, except that the hidden type has been given an explicit name. Every usage of dyn Trait in a signature could be replaced by introducing exactly one ghost type per occurrence. The literal translation of compare_eq from above into the new syntax would be:

fn compare_eq<dyn T: MyPartialEq, dyn U: MyPartialEq>(left: &T, right: &U) -> bool {
    todo!("At least no object safety problem now")
}

One difference would be the way the compiler inserts the vtable argument. With the former syntax of &dyn MyPartialEq, the vtable is part of the fat pointer. With the new syntax, the vtable would be passed as a separate argument. I will get to ABI details in a minute. We gain some power, because we can write new functions that use a ghost type multiple times, or even refer to associated types:

fn compare_shim<dyn T: MyPartialEq>(left: &T, right: &T) -> bool {
    MyPartialEq::eq(left, right)
    // Allowed, since the type *must be* the same
    // The vtable only has to be passed once
}
trait BarkWithCmd {
    type Command;
    fn bark(&self, command: &Self::Command);
}
fn bark_with_cmd<dyn T: BarkWithCmd>(barker: &T, command: &T::Command) {
    BarkWithCmd::bark(barker, command);
    // Allowed. The caller must ensure that the types match.
}
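One way to picture the proposed calling convention in today's Rust is to split each reference into an erased data pointer and pass the vtable as a separate argument. In this sketch, EqVTable, eq_impl and compare_shim_erased are hypothetical names. compare_shim_erased is not generic, so it is compiled once, and the vtable is passed only once for both arguments. Unlike the proposal, nothing here statically checks that both pointers have the vtable's type, so the erased function is unsafe and relies on its caller.

```rust
/// Hypothetical hand-written vtable for a `PartialEq`-like trait.
struct EqVTable {
    eq: unsafe fn(*const (), *const ()) -> bool,
}

/// Vtable entry for a concrete `T`.
/// Safety: both pointers must point to valid values of type `T`.
unsafe fn eq_impl<T: PartialEq>(left: *const (), right: *const ()) -> bool {
    unsafe { *(left as *const T) == *(right as *const T) }
}

/// Stand-in for the sized-generic-instantiation: not generic, compiled once.
/// Safety: both pointers must point to values of the type `vtable` was built for.
unsafe fn compare_shim_erased(left: *const (), right: *const (), vtable: &EqVTable) -> bool {
    unsafe { (vtable.eq)(left, right) }
}

/// Thin generic front end: builds the vtable for `T` and erases both pointers.
fn compare_shim<T: PartialEq>(left: &T, right: &T) -> bool {
    let vtable = EqVTable { eq: eq_impl::<T> };
    // SAFETY: both references have type `&T`, matching the vtable.
    unsafe {
        compare_shim_erased(
            left as *const T as *const (),
            right as *const T as *const (),
            &vtable,
        )
    }
}
```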

One thing you would not be allowed to do is instantiate non-ghost type variables with ghost types. As mentioned before, it's not possible to generate monomorphizations ahead of time and this doesn't change under the proposed language extension.

fn try_to_reflect_type_name<dyn T>() {
    std::any::type_name::<T>(); // Oops, can't monomorphize :/ Still not allowed
}

Instead of monomorphizing, the compiler can generate a single function, even before the generic parameters are instantiated, and call it for all instantiations. For this blog post, I will call it the sized-generic-instantiation: thanks to the guarantees mentioned above, one instantiation can be reused for all Sized types. Note that the compiler could still inline the method if the type were known concretely and the passed vtable were a known constant.

When mixing ghost types and other generic types, monomorphization would still happen, but only taking into account the other types, not the ghost types.

Observing ghost types at runtime

In case you are familiar with Haskell: trait object types mirror existential types in Haskell, and ghost types can make this explicit (existential types in Rust are something different; I'm already sorry for any confusion...). Ghost types can also be brought into context by "pattern matching". That is, we can go back and forth between dyn and ghost types:

trait Zap {
    fn zap(&self);
}

fn call_dyn(zapper: &dyn Zap) {
    zapper.zap();
    // For the following call, invent a unique, newly introduced, ghost type
    // for this dyn reference ("skolem variable")
    // Operationally, this would call the sized-generic-instantiation of `call_ghost`.
    // The vtable is part of the fat pointer, unpacked, and passed on.
    call_ghost(zapper);
}

fn call_ghost<dyn T: Zap>(zapper: &T) {
    zapper.zap();
    // Can also convert back to a fat-pointer and re-attach the vtable
    call_dyn(zapper);
    // Yeah, this leads to infinite calls-back-and-forth. Sue me.
}

Great, we can go back and forth between ghost types and trait objects!
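In today's Rust, the closest analogue is a generic function with a T: ?Sized bound, instantiated once with T = dyn Zap. This is only an approximation. zap returns a String so it can be checked, and Laser and Taser are made-up types. call_generic::<dyn Zap> is a single instantiation shared by all dyn callers. However, the vtable still travels inside the fat pointer rather than as a separate argument, and calling call_generic directly with a concrete type still monomorphizes.

```rust
trait Zap {
    fn zap(&self) -> String;
}

struct Laser;
struct Taser;

impl Zap for Laser {
    fn zap(&self) -> String {
        "pew".to_string()
    }
}

impl Zap for Taser {
    fn zap(&self) -> String {
        "bzzt".to_string()
    }
}

/// Accepts sized and unsized `T`; `dyn Zap` is itself a valid choice of `T`.
fn call_generic<T: Zap + ?Sized>(zapper: &T) -> String {
    zapper.zap()
}

fn call_dyn(zapper: &dyn Zap) -> String {
    // One instantiation, `call_generic::<dyn Zap>`, serves every caller of
    // `call_dyn`; the vtable rides along in the fat pointer.
    call_generic::<dyn Zap>(zapper)
}
```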

Let's get back to implementing that generic compare_eq function from above. Without further additions to the language, we'll have to resort to unsafe to convince the compiler that we checked for type equality, but that's to be expected. See Appendix C for a possible way of getting rid of the unsafe cast. In any case, it would immediately be possible to write this:

fn compare_eq<dyn T: MyPartialEq, dyn U: MyPartialEq>(left: &T, right: &U) -> bool {
    if left.type_id() == right.type_id() {
        // SAFETY: we just checked that the type ids are equal, so T = U
        let right_as_t: &T = unsafe { &*(right as *const U as *const T) };
        compare_shim::<T>(left, right_as_t)
    } else {
        false
    }
}

Getting more precise with ABI

Like dyn Trait, ghost types need some object safety rules. When writing dyn T: SomeTrait, we still have to construct and pass that vtable somehow. These rules are strictly more permissive, though: some traits are not object safe and can't be used with dyn Trait, but can be used as bounds on ghost types. The reason is that, unlike with dyn Trait, you can't provide possibly confusing implementations, since there is no dyn object to be confused about. The relaxed rules are:

  • Sized may be a supertrait, since it doesn't hurt to have this bound on the ghost type. In fact, any ghost type introduced by converting a dyn Trait is Sized, since unsized types can't be further unsized into a dyn object (and I'm unaware of any plans to relax this rule).
  • associated constants and associated types are allowed, as long as they have object safe bounds themselves
    • With the upcoming GATs feature, these associated types may not be generic, except over lifetime and ghost type variables.
    • The compiler needs to include all of their (and the closure under associated types) vtables in a combined vtable.
  • ghost type variables are, like lifetimes, also allowed in generic trait method arguments. The vtable contains the sized-generic-instantiation for that method.
  • dispatchable functions can use Self and arbitrary receiver types.

So all of the below are fine:

trait EnlargedSafe: Sized {
    type Command: Debug; // vtable includes vtable of Self::Command: Debug;
    // It is okay to mention the associated type in method signatures
    fn call(&self, cmd: &Self::Command);
    // It is okay to have ghost type variables in method signatures.
    // The single sized-generic-instantiation is included in the vtable
    fn rechoice<dyn U>(&self, other: &U);
    // Self can be named multiple times.
    fn add_assign(&mut self, other: &Self);
    // It's okay for the following method to *exist* on the trait. One
    // still wouldn't be able to *call* it on values of a ghost type, since
    // the ABI of how parameters are passed differs between different
    // concrete instantiations. As such, a ghost type variable can not
    // exist by-value
    fn add(self, other: Self) -> Self;
}

Operationally, the vtable will include all associated items and the sized-generic-instantiation of every method.
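Today, the closest tool is a where Self: Sized bound. It keeps a method out of the vtable so that the rest of the trait stays usable as dyn. Here is a small sketch with a made-up Accumulate trait. total can be called through &dyn Accumulate, while add_assign, which mentions Self in an argument, can only be called on concrete types, much like add in the example above.

```rust
trait Accumulate {
    fn total(&self) -> u32;
    // `Self` appears in an argument, so this method is excluded from the
    // vtable via `where Self: Sized`; the trait stays usable as `dyn`.
    fn add_assign(&mut self, other: &Self)
    where
        Self: Sized;
}

struct Tally(u32);

impl Accumulate for Tally {
    fn total(&self) -> u32 {
        self.0
    }

    fn add_assign(&mut self, other: &Self) {
        self.0 += other.0;
    }
}

/// Only dispatchable methods can be called through the trait object.
fn total_of(items: &[&dyn Accumulate]) -> u32 {
    items.iter().map(|item| item.total()).sum()
}
```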

Ghost types in structs and enums, and unsized ghost types

We have so far not seen how to store ghost types in structs (and enums). As with functions, ghost type variables must be "erasable". Having a ghost type variable on a struct is an extra guarantee that all possible substitutions by Sized types lead to the same behaviour, i.e. the same struct layout. A difference from functions is that no vtable is implicitly passed or stored. Let's have a look at some standard structs that seem to fulfill this requirement, starting with Box.

//         vvv --- newly added
struct Box<dyn T: ?Sized> {
//                +----+ what about this, aren't ghost types Sized?
    ptr: Unique<T>,
    _phantom_owned: PhantomData<T>,
}

We can expect that e.g. Box<u32> has the same layout as Box<String> and the parameter can be a ghost type. But Box<u32> surely does not have the same layout as Box<dyn Debug> and neither would I expect it to. This is the reason that the extra guarantees have to be given only for substitutions of Sized types.

To make this more precise, one could say that the behaviour of some construct generic over a dyn T: ?Sized depends only on what's currently called <T as std::ptr::Pointee>::Metadata, which is () for any sized type. For this blog post though, I will only focus on this special case, allowing instantiations of a ghost type with an unsized parameter under the existing monomorphization rules.

The rules for a struct S<dyn T> generic parameter are:

  • there is exactly one layout for all substitutions of T with a Sized type. The standard pointer types, *const T, *mut T, &T, &mut T, Box<T>, Rc<T>, etc., form the basis from which other structs (and enums) obtain this guarantee. Even if the parameter is declared dyn T: ?Sized, the layout guarantee need only hold for sized T.
  • instantiations S<Dog> of a ghost parameter with a non-skolem Sized type most likely agree with current behaviour. The same is true for instantiations with an unsized type, like S::<dyn Trait> or S::<[T]>, where no extra guarantees have to be given compared to the current state of affairs. That is, if the struct wants to allow it, e.g. by declaring struct S<dyn T: ?Sized>.

To reiterate, the struct does not store the vtable or other metadata. Passing from a specific instantiation to a ghost type should not incur any cost. For example, you can pass a Box<u32> to a method expecting a fn(arg: Box<dyn Debug>) via an unsizing coercion, and you could now pass it to a method fn<dyn T>(arg: Box<T>) via erasure. The conversion erasing a type argument is always operationally a no-op, unsizing is not.6
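The layout claims above can be spot-checked on current rustc with size_of. Holder is a hypothetical struct for this sketch. Since exact sizes are implementation details, the checks only compare sizes with each other instead of asserting particular numbers.

```rust
use std::fmt::Debug;
use std::mem::size_of;
use std::rc::Rc;

/// Hypothetical struct whose layout does not depend on `T` for sized `T`.
#[allow(dead_code)]
struct Holder<T: ?Sized> {
    id: u32,
    inner: Box<T>,
}

/// For sized types, the pointee type does not affect the layout.
fn layouts_agree_for_sized() -> bool {
    size_of::<Box<u32>>() == size_of::<Rc<String>>()
        && size_of::<Holder<u32>>() == size_of::<Holder<String>>()
}

/// Unsized pointees add metadata (here: a vtable pointer) to every pointer.
fn unsized_adds_metadata() -> bool {
    size_of::<Box<dyn Debug>>() == 2 * size_of::<Box<u32>>()
        && size_of::<Holder<dyn Debug>>() > size_of::<Holder<u32>>()
}
```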

The last point could require some elaboration to explain potential trait impls. Consider the two very similar structs:

struct DynObjectBark {
    inner: Box<dyn Bark>,
}

struct GhostBark<dyn T: Bark> {
    inner: Box<T>,
}

Implementing traits for types with ghost parameters

Feel free to skip this subsection on your first read; it gets rather technical (as if it wasn't already!). DynObjectBark stores the underlying vtable of some T: Bark as part of the fat pointer in inner. GhostBark<T> would not store that vtable.

// Business as usual, just call the inner barker
impl Bark for DynObjectBark {
    fn bark(&self) {
        // The vtable for the Bark impl of the object in `inner` is part
        // of the fat pointer and passed as part of the `self` argument
        self.inner.bark()
    }
}
// vs.
impl<dyn T: Bark> Bark for GhostBark<T> {
    //      ^^^^
    // An appropriate vtable for the inner `T: Bark` must be in scope when
    // this impl is needed anywhere
    fn bark(&self) { // <-- the vtable is *not* contained in `&self`.
        self.inner.bark()
    }
    // This means we need to capture the vtable of Bark dynamically, at runtime!
    // Exciting!!!!!
}
// GOTCHA: to guarantee consistency of impls, an impl instantiating a ghost type
// probably has to rule out more specific impls with concrete types
impl Bark for GhostBark<Dog> {
    // Oh-oh! impl Bark for Dog exists. Potentially `T = Dog` in the previous impl
    // in some erased context, but we can't know that.
    // That'd lead to two different vtables for GhostBark<Dog> floating around!
}

Unsizing with ghost-types

What if you do want to store the trait impl in a struct with a ghost type parameter? In this case, you are still able to unsize coerce. To do so, adapt the type signature of GhostBark above to allow for ?Sized parameters:

struct GhostBark<dyn T: ?Sized + Bark> {
//                      +------+ this part is new
}
fn unsize_ghost<dyn T: Bark>(
    barker: GhostBark<T>,
) -> GhostBark<dyn Bark> {
//   +---------^^^^^^^^+ this type is equivalent to DynObjectBark
    barker // <-- unsizing coercion here
}
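Today's Rust already supports a limited form of this. A struct whose last field is the ?Sized parameter can be unsized when it sits behind a pointer, e.g. Box<GhostBark<Dog>> to Box<GhostBark<dyn Bark>>. This sketch stores T directly instead of Box<T>, because coercing a struct that merely contains a Box<T> field would require implementing the unstable CoerceUnsized trait. The volume field is made up so that the struct has more than one field.

```rust
trait Bark {
    fn bark(&self) -> String;
}

struct Dog;

impl Bark for Dog {
    fn bark(&self) -> String {
        "Woof! Woof!".to_string()
    }
}

/// Struct unsizing only works through the *last* field, which must mention `T`.
struct GhostBark<T: ?Sized + Bark> {
    volume: u8,
    inner: T,
}

fn unsize(barker: Box<GhostBark<Dog>>) -> Box<GhostBark<dyn Bark>> {
    // Unsizing coercion: the `Dog: Bark` vtable is attached to the `Box` pointer.
    barker
}
```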

A realistic application or: What do we need this for?

All this has been very abstract so far. If you didn't already know about trait objects and Any you perhaps have already learned a lot. But here is a concrete application of the proposal. Suppose you are writing a GUI library like yew. The core trait looks akin to this:

trait Component: 'static {
    type Message;
    // skipped some lifecycle methods
    /// Handle events, like onclick from a button or the user tapping impatiently
    fn update(&mut self, msg_queue: &mut Vec<Self::Message>);
    /// Show the state (as a number)
    fn view(&self) -> Html;
}
/// A handle to a component
struct Scope<Comp: Component>(_);
impl<Comp: Component> Scope<Comp> {
    /// *Schedule* an update in the future, then re-render
    fn send_message(&self, msg: Comp::Message);
}

Okay, so we have components, they can return some Html to display and we can send them messages. The hard problem to tackle is implementing backend code that does not explode in code size with the number of components in existence. You see, the exact component and message type should not change the way this is handled in the backend, the message just needs to be passed through somehow.

Let's design the backend and naively wrap a component with some additional state so that we can schedule it and call view whenever we need to.

mod scheduler {
    trait Runnable { fn run(&self); }
    // Kept abstract here. Put it on a queue.
    // The important point is, this is indiscriminate to the Comp type!
    fn schedule(task: impl Runnable);
}

struct CompStateInner<Comp: Component> {
    msg_queue: Vec<Comp::Message>,
    component: Comp,
}
struct ComponentState<Comp: Component>(
    Rc<RefCell<CompStateInner<Comp>>>
);

struct Scope<Comp: Component>(ComponentState<Comp>);
impl<Comp: Component> Scope<Comp> {
    fn send_message(&self, msg: Comp::Message) {
        self.0.0.borrow_mut().msg_queue.push(msg);
        scheduler::schedule(self.0.clone());
    }
}
// Disaster! One impl per Comp?
impl<Comp: Component> scheduler::Runnable for ComponentState<Comp> {
    fn run(&self) {
        let mut inner = self.0.borrow_mut();
        let inner = &mut *inner; // reborrow, so both fields can be borrowed mutably
        inner.component.update(&mut inner.msg_queue);
        let next_view = inner.component.view();
        todo!("reconcile {next_view} somehow, omitted for brevity");
        todo!("in reality, it is this part that leads to code-bloat");
        todo!("since it is repeated for every Comp");
    }
}

With this design, we end up with one impl of scheduler::Runnable per component type! And in reality, there is more than one lifecycle event to schedule. The current work-around involves a heavy-handed use of dyn Any, downcasting and unwrapping where necessary. If you're interested, here's the relevant PR that fixed this, thanks @futursolo.
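For readers who want to experiment, here is a runnable, simplified version of the naive design. This sketch makes a few assumptions: String stands in for Html, a plain Vec<Box<dyn Runnable>> stands in for the scheduler's queue, and Counter is a made-up component. It deliberately keeps the problem described above, since impl Runnable for ComponentState<Comp> is still instantiated once per component type.

```rust
use std::cell::RefCell;
use std::rc::Rc;

trait Component: 'static {
    type Message;
    fn update(&mut self, msg_queue: &mut Vec<Self::Message>);
    fn view(&self) -> String; // `String` stands in for `Html`
}

trait Runnable {
    fn run(&self) -> String;
}

struct CompStateInner<Comp: Component> {
    msg_queue: Vec<Comp::Message>,
    component: Comp,
}

struct ComponentState<Comp: Component>(Rc<RefCell<CompStateInner<Comp>>>);

// Manual impl: `#[derive(Clone)]` would needlessly require `Comp: Clone`.
impl<Comp: Component> Clone for ComponentState<Comp> {
    fn clone(&self) -> Self {
        ComponentState(Rc::clone(&self.0))
    }
}

// Still one impl, and one copy of `run`, per component type.
impl<Comp: Component> Runnable for ComponentState<Comp> {
    fn run(&self) -> String {
        let mut guard = self.0.borrow_mut();
        let inner = &mut *guard; // reborrow so both fields can be borrowed at once
        inner.component.update(&mut inner.msg_queue);
        inner.component.view()
    }
}

struct Scope<Comp: Component>(ComponentState<Comp>);

impl<Comp: Component> Scope<Comp> {
    /// Queue the message, then schedule the component on a simple run queue.
    fn send_message(&self, msg: Comp::Message, run_queue: &mut Vec<Box<dyn Runnable>>) {
        self.0.0.borrow_mut().msg_queue.push(msg);
        run_queue.push(Box::new(self.0.clone()));
    }
}

struct Counter {
    count: i32,
}

impl Component for Counter {
    type Message = i32;

    fn update(&mut self, msg_queue: &mut Vec<i32>) {
        for delta in msg_queue.drain(..) {
            self.count += delta;
        }
    }

    fn view(&self) -> String {
        format!("count = {}", self.count)
    }
}

/// Run every scheduled task in order and collect the rendered views.
fn run_all(run_queue: Vec<Box<dyn Runnable>>) -> Vec<String> {
    run_queue.iter().map(|task| task.run()).collect()
}
```

The first run drains both queued messages, so the second run renders the same state again.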

With this proposal, these work-arounds would not be necessary. Instead the necessary changes to fix the above are summarized in a few lines:

- struct ComponentState<Comp: Component>(
+ struct ComponentState<dyn Comp: Component>(

- impl<Comp: Component> scheduler::Runnable for ComponentState<Comp> {
+ impl<dyn Comp: Component> scheduler::Runnable for ComponentState<Comp> {

First of all, the ComponentState struct is representationally independent of the actual component type. Under the hood, it puts the actual component in an Rc as before. Additionally, there is only one sized-generic implementation of scheduler::Runnable. In the places where this impl is needed (in Scope::send_message, where the user interacts with a known component type), the vtable for impl Component for Comp is captured and put into the scheduler as part of the Runnable object. But let me reiterate: everything in fn run exists only once and then dispatches on that captured vtable.

Conclusion

This got a bit long and detailed. There are probably a lot of nooks and crannies to work out before this could be fleshed out into a full RFC. Nevertheless, I expect that most currently considered extensions to the type system, in particular those concerning unsized types and associated types, would be compatible with ghost types.

I don't expect this type of generic programming to come up often. Even as a more experienced Rustacean, I'm still struggling to grasp exactly what it is that I outlined in this post. Maybe next time you're battling code bloat in your library or program7, you can think back and see if this would help you.


Appendix A: arrays and slices

Since ghost type variables subsume dyn objects, could an extension of the syntax and idea to other const arguments lead to a subsumption of slices? Can we make sense of something like

// "old" syntax, can't refer to length
fn use_slice(arr: &[u32]) {
    unimplemented!()
}
// "new" syntax, explicitly refer to length
fn use_slice<dyn const N>(arr: &[u32; N]) {
    unimplemented!()
}

? I think we could, but exactly how this would work remains to be worked out. The current post covers types with pointee metadata (), while the above would target metadata usize.
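Today's const generics give a taste of this split. In the sketch below, use_array and as_array are made-up helpers. Passing &[u32; N] to a slice parameter is an unsizing coercion that moves N from the type into the pointer's length metadata, while going back requires a runtime length check. Unlike the dyn const N idea, use_array is still monomorphized for every N.

```rust
/// Slice version: the length lives in the fat pointer's metadata.
fn use_slice(arr: &[u32]) -> u32 {
    arr.iter().sum()
}

/// Array version: the length is part of the type, so this is monomorphized per `N`.
fn use_array<const N: usize>(arr: &[u32; N]) -> u32 {
    // Unsizing coercion: `N` moves from the type into the pointer's metadata.
    use_slice(arr)
}

/// The reverse direction needs a runtime check of the length.
fn as_array<const N: usize>(slice: &[u32]) -> Option<&[u32; N]> {
    <&[u32; N] as std::convert::TryFrom<&[u32]>>::try_from(slice).ok()
}
```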

Appendix B: more parts of std/core

In the implementation of compare_eq we used unsafe pointer casting to get from &U to &T. We could also adapt the signature of std::mem::transmute to allow ghost-types and transmute those.

// mod std::mem
const unsafe extern "rust-intrinsic" fn transmute<dyn T, dyn U>(e : T) -> U;
// ^ Operationally a no-op, so can accept ghost-types just fine

Although this is already stretching the size of an RFC, I propose adapting the experimental DynMetadata (under ptr_metadata) as follows. I won't give a detailed explanation here, but the signature change would roughly be:

// current proposed version
struct DynMetadata<Dyn: ?Sized> { /* omitted */ }
//                 ^^^^^^^^^^^ identify the trait of the contained vtable
// e.g. `DynMetadata<dyn Bark>`

// to new version
struct DynMetadata<Dyn: ?Sized, dyn T> { /* omitted */ }
//                              ^^^^^ --------+
//                                            |
// +------------------------------------------+
// |
// + identify a ghost type this vtable belongs to

pub fn std::ptr::from_raw_parts<T: ?Sized, dyn Ghost>(
//                                         ^^^^^^^^^^ --- the erased type. Sized!
    data_address: *const Ghost, // <--- pointer to a ghostly typed value,
                                // more explicit than the current *const ()
    metadata: <T as Pointee<Ghost>>::Metadata
                        //  ^^^^^ Pointee mentions "who" the vtable belongs to
) -> *const T;

std::ptr::to_raw_parts is a bit harder to type, since it has to generate a new erased type that carries the relationship between the thin pointer to the object and the typed DynMetadata. I know generativity in this form is probably a bit much, but I feel it's the correct way forward. Excuse the syntax for a bit; generativity could be written as follows (feel free to bike-shed).

impl<dyn T: ?Sized> *const T {
    fn std::ptr::to_raw_parts(
        self,
    ) -> impl<dyn Erased> (*const Erased, <T as Pointee<Erased>>::Metadata)
//            ^^^^^^^^^^  +-----------------------------------------------+
// this part says that a new ghost type is invented at the callsite    |
// similar to when passing a &dyn T variable to a for<dyn T> fn(&T)    |
//                                                                     |
//               It can be used in ghost/dyn generic positions in the return type.
}

The syntactic use of impl<dyn T> here is akin to the use of for<'a>, except that it introduces a new type at the callsite instead of passing one to a callee.

Appendix C: type checked type equalities

Even more outlandish and definitely not in the scope of the proposed language addition, one could imagine a builtin type modelling the proposition that two types are equal. This could look something like

/// The compiler uses that whenever a value of `TypeEq<T, U>` is in scope, we have `T = U`.
enum TypeEq<dyn T, dyn U> { /* builtin, representationally a unit type */ }
impl<dyn T> TypeEq<T, T> {
    /// Can freely construct a TypeEq when the arguments are provably equal.
    pub const fn refl() -> Self;
}
impl<dyn T, dyn U> TypeEq<T, U> {
    /// Allow unsafe construction of type equalities discovered at runtime
    pub const unsafe fn trust_me() -> Self;
}

fn std::any::is_equal<dyn T: Any, dyn U: Any>() -> Option<TypeEq<T, U>> {
    if T::type_id() == U::type_id() {
        // `T::type_id() == U::type_id()` proves the types are equal.
        Some(unsafe { TypeEq::trust_me() })
    } else {
        None
    }
}

// Then compare_eq could be written without directly using unsafe
fn compare_eq<dyn T: MyPartialEq, dyn U: MyPartialEq>(left: &T, right: &U) -> bool {
    match std::any::is_equal::<T, U>() {
        Some(_) => compare_shim(left, right), // Since we match on Some(_), the types are actually equal, and this call type-checks
        None => false, // types don't match, so consider them unequal
    }
}

I'm open to thoughts on this part, since "a value of TypeEq<T, U> is in scope" could possibly be hard to prove and track. Note though that the above TypeEq is guaranteed to be uniformly represented as an enum with one variant, since both its generic arguments are ghost types, so it should optimize (almost?) like the unit type. Of course the compiler can't be allowed to just invent values whenever it wants to, but it would be a builtin type anyway.
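Something in this spirit can be approximated as a library today, with one important difference: the compiler learns nothing from the witness. Instead of letting compare_shim type-check, the witness offers an explicit cast. TypeEq, refl, cast_ref and is_equal below are hypothetical names for this sketch, built on std::any::TypeId. Unlike in the proposal, compare_eq is still monomorphized for every pair of types.

```rust
mod type_eq {
    use std::any::TypeId;
    use std::marker::PhantomData;

    /// Hypothetical witness that `T` and `U` are the same type. The field is
    /// private, so outside this module it can only come from `refl` or `is_equal`.
    /// `fn(T, U) -> (T, U)` makes the witness invariant in both parameters.
    pub struct TypeEq<T, U>(PhantomData<fn(T, U) -> (T, U)>);

    impl<T> TypeEq<T, T> {
        /// Equal by construction.
        pub fn refl() -> Self {
            TypeEq(PhantomData)
        }
    }

    impl<T, U> TypeEq<T, U> {
        /// Use the witness to view a `&T` as a `&U`.
        pub fn cast_ref<'a>(&self, value: &'a T) -> &'a U {
            // SAFETY: a `TypeEq<T, U>` only exists if `T` and `U` are the same type.
            unsafe { &*(value as *const T as *const U) }
        }
    }

    /// Runtime check comparing `TypeId`s, in the spirit of `is_equal` above.
    pub fn is_equal<T: 'static, U: 'static>() -> Option<TypeEq<T, U>> {
        if TypeId::of::<T>() == TypeId::of::<U>() {
            Some(TypeEq(PhantomData))
        } else {
            None
        }
    }
}

/// No `unsafe` at the use site: the witness carries the proof.
fn compare_eq<T: PartialEq + 'static, U: 'static>(left: &T, right: &U) -> bool {
    match type_eq::is_equal::<U, T>() {
        Some(eq) => *left == *eq.cast_ref(right),
        None => false, // types don't match, so consider them unequal
    }
}
```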

Footnotes

  1. Another oddity: an instantiation with the same type parameters can even show up multiple times, in different compilation units. This is why you shouldn't use std::ptr::eq on &dyn Trait references: the vtable for the dyn part can exist multiple times, even for the same underlying object and type, if the dyn cast happens in separate compilation units. Clippy lints for this via the vtable_address_comparisons lint.

  2. It should be noted that LLVM has a deduplication pass for code blocks that can merge such instantiations. One way to read this post is as a proposal for syntax under which such deduplication is guaranteed to occur.

  3. Also note that Any as-is suffers slightly from the method instantiation I outlined above. For every type that is unsized to dyn Any, per compilation unit, a (marginally small, but non-zero) method is compiled in. This method returns a constant: the TypeId of the type that was unsized. Wouldn't it be better if that TypeId was part of the fat pointer, without the extra indirection? But I digress, since this is a minor de-optimization.

  4. See also https://rust-lang.github.io/rfcs/0255-object-safety.html

  5. Since I first named this, I became aware of another issue using "Ghost type" as a name. I think that one should rather be referred to as "ghost code". Here, a different thing is proposed, which is about type erasure rather than code erasure.

  6. Sneakily, while the erasure is a no-op, the function using ghost-types might expect further hidden arguments where vtables are passed.

  7. Shoutout to Twiggy that has helped a lot of times when it came to tracking down bloat.

@HeroicKatora

  • trait BarkWithCmd could also be the (presumably more practically relevant) example of trait Actor { type Message; }

  • Usage of 'Ghost type' in literature:

  • Such type variables, contrary to "Physical" type variables that we currently have, can't be monomorphized on. Phrased in a more operational […]

    Monomorphization as part of codegen is as operational as it gets. Proposing: "For physical type variables, the implementation must not depend on the exact instance of the type; for Ghost Type variables, the compiler-generated symbol must not depend on it either." That's a subtle hint towards some subspace where no type dispatch is allowed, which is required for the generativity argument I'd try to build on top.

  • The kicker is now, that most (all?)

    I don't see how the example shows an immediate, amazing advantage. Just leave out the qualification. "This is a generalization of dyn traits in arguments as current usage of …".

@HeroicKatora

HeroicKatora commented May 17, 2022

Re: generativity. I would really like this to soundly work. And if possible, be essentially zero-cost.

type WithCertifiedLen<U> = for<dyn T> FnOnce(Len<T>, &Slice<U, T>);

struct Len<dyn T>(usize);
struct Idx<dyn T>(usize);
struct Slice<U, dyn T>([U]);

fn scoped_certification<U>(slice: &[U], with: WithCertifiedLen<U>) {
    // Some internal magic
    let len = Len(slice.len());
    let slice = unsafe { std::mem::transmute(slice) };
    with(len, slice);
}

impl<dyn T> Len<T> {
    // Perform the index check here…
    fn index(&self, idx: usize) -> Option<Idx<T>> {}
}

impl<U, dyn T> Slice<U, T> {
    // Yes, *safe* index unchecked.
    fn index_unchecked(&self, idx: Idx<T>) -> &U {}
}

@WorldSEnder
Author

    // Some internal magic
    let len = Len(slice.len());

No magic needed. By assumption, it's okay to instantiate T with any sized type and get the same behaviour, so just choosing () or some other dummy must be sufficient. It'd get more problematic if WithCertifiedLen had trait bounds on T, but those aren't needed here.1

    let len = Len::<()>(slice.len());

Footnotes

  1. Refer to Haskell Data.Reflection.reify where the generative type s has a Reifies s a bound carrying the value.

@HeroicKatora

HeroicKatora commented May 17, 2022

It should, but that's kind of the point I was trying to make regarding Any. By some reasoning, it must be clear that you cannot be allowed to even observe Len<T>::type_id() == Len<U>::type_id() unless T and U are definitionally equal. This seemingly conflicts with the reasoning in your compare_eq call to transmute. Sure, the difference is that Any isn't mentioned explicitly in the bounds.

But you'd need to explain how Len<dyn T> interacts with traits. Otherwise, one might assume that we can infer the impl Len<dyn T>: Any. Indeed, why shouldn't one be able to infer that? It is covered by a blanket impl:

  • impl<T: 'static> Any for T { /* auto-generated compiler magic */ }

But by 'definition', the compiler shouldn't depend on the concrete type when generating this impl? And if it does generate it anyway, somehow returning the id of Len<()>, then we'd have Len<dyn T> == Len<dyn U> under type_id comparison. This would allow a safe cast, which circumvents the intent by allowing one Idx instance to be used with an unrelated slice.
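To make the concern concrete with today's `Any`: a downcast succeeds exactly when the `TypeId`s match, so if `Len<A>` and `Len<B>` ever shared an id, `downcast_ref` would hand out a `Len<B>` for a `Len<A>`. A minimal sketch, using ordinary `PhantomData` parameters and hypothetical marker types `BrandA`/`BrandB` as stand-ins for two ghost instantiations:

```rust
use std::any::Any;
use std::marker::PhantomData;

// Hypothetical stand-ins for two distinct ghost instantiations.
pub struct BrandA;
pub struct BrandB;

pub struct Len<T>(pub usize, PhantomData<T>);

impl<T> Len<T> {
    pub fn new(n: usize) -> Self {
        Len(n, PhantomData)
    }
}

pub fn erase<T: 'static>(len: Len<T>) -> Box<dyn Any> {
    Box::new(len)
}

// Succeeds only if the erased value's `TypeId` equals that of `Len<U>`.
pub fn recover<U: 'static>(erased: &dyn Any) -> Option<&Len<U>> {
    erased.downcast_ref::<Len<U>>()
}
```

Today the ids differ, so recovering a `Len<BrandA>` as `Len<BrandB>` returns `None`; collapsing them would turn this into exactly the safe cast described above.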

@WorldSEnder
Author

WorldSEnder commented May 17, 2022

I should definitely explain this. The difference is in the impl:

impl<T: 'static> Any for T { /* auto-generated compiler magic */ }
// is *not*
impl<dyn T: 'static> Any for T { /* auto-generated compiler magic */ }

So no, you cannot invent an impl of Any for a ghost type out of thin air; it must already be in the context.

But by 'definition' the compiler shouldn't depend on the concrete type for generating this impl?

The built-in impl Any for T depends very heavily on knowing the exact T, since it has to generate the type id that gets put in the vtable.
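As a small illustration of that dependency in current Rust: the id returned through `&dyn Any` is whatever was recorded when the concrete type was erased, and `TypeId::of::<T>()` yields a different constant per instantiation. The helper names are made up for the example.

```rust
use std::any::{Any, TypeId};

// The `TypeId` behind a `&dyn Any` is fixed at the coercion site, where `T` is known.
pub fn erased_id(value: &dyn Any) -> TypeId {
    value.type_id()
}

// Each monomorphization of this function returns a different constant.
pub fn static_id<T: 'static>() -> TypeId {
    TypeId::of::<T>()
}
```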

@HeroicKatora

HeroicKatora commented May 17, 2022

I intend to use <Len<dyn T> as Any>. Are you suggesting Len<dyn T> is not a type either, or are standard types now a subclass, and if so what do we name that subclass and with what notation? That's confusing at best. I'd veto using that syntax then, quite clearly.

@WorldSEnder
Author

WorldSEnder commented Jul 29, 2022

@HeroicKatora updated again. It now includes a real-world example at the bottom, and also mentions that "monomorphization" for ghost types is basically one monomorphization per <_ as Pointee>::Metadata type (or just () for _: Sized). I hope that clears up a few missing links.
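One way to see the "one monomorphization per metadata type" point on stable Rust is that the size of a raw pointer depends only on the kind of metadata the pointee needs. The sketch assumes a typical current target: these fat-pointer sizes hold in practice but are not a formal layout guarantee, and `ptr_size` is a hypothetical helper.

```rust
use std::fmt::Debug;
use std::mem::size_of;

// Size of a raw pointer to `T`. It depends only on the kind of metadata
// `T` needs (none, a length, or a vtable pointer), not on `T` itself.
pub fn ptr_size<T: ?Sized>() -> usize {
    size_of::<*const T>()
}

// Keeps the `Debug` import used outside of tests.
pub fn dyn_debug_ptr_size() -> usize {
    ptr_size::<dyn Debug>()
}
```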

@HeroicKatora

"Object safety" is jargon for the rules dictating that the type dyn MyPartialEq is currently illegal to write. This is some arbitrary limitation though.

It's certainly not arbitrary, but a consequence of needing to under-approximate which semantics can be represented in a dyn-trait vtable. I'd just drop the second half of the statement entirely.
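For context, the usual stable workaround for a `PartialEq`-like trait shows why the rule tracks what a vtable can express: "the other value has the same concrete type" cannot be checked through the vtable, so it is moved to a run-time `TypeId` check. A sketch with hypothetical names `DynEq`, `as_any` and `dyn_eq`:

```rust
use std::any::Any;

// `fn eq(&self, other: &Self) -> bool` mentions `Self` outside the receiver,
// so a trait containing it cannot be used as `dyn Trait` today.
// This workaround replaces `&Self` with a type-erased argument.
pub trait DynEq {
    fn as_any(&self) -> &dyn Any;
    fn dyn_eq(&self, other: &dyn DynEq) -> bool;
}

impl<T: PartialEq + 'static> DynEq for T {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn dyn_eq(&self, other: &dyn DynEq) -> bool {
        // Only succeeds if `other` was erased from the same concrete type `T`.
        match other.as_any().downcast_ref::<T>() {
            Some(other) => self == other,
            None => false,
        }
    }
}
```

This is precisely the compile-time to run-time shift that the ghost-type proposal tries to avoid.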

Wait did we just reference to a ghost type as a parameter to transmute? I guess that's okay, since that is operationally a no-op, so we also adapt its signature to

It fails to explain why this is a relaxation of the existing signature (or how to keep code compatible). I'm sure it's enough to state the demand for this to work, and to work out the mechanism by which it works later. It's not entirely necessary to use transmute. After all, we have a pointer that can/could be cast with as or coercion.
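As an analogy for the alternative to `transmute`: a raw-pointer `as` cast between slice types keeps the length metadata intact, so no `transmute` is needed. This only uses `u8`/`i8` as an example; `as_signed` is a hypothetical helper.

```rust
// Reinterpret a byte slice as signed bytes with a pointer cast instead of
// `transmute`; the `as` cast carries the slice length over unchanged.
pub fn as_signed(bytes: &[u8]) -> &[i8] {
    let ptr = bytes as *const [u8] as *const [i8];
    // SAFETY: `u8` and `i8` have the same size and alignment, every bit
    // pattern is valid for both, and the lifetime is taken from `bytes`.
    unsafe { &*ptr }
}
```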

Getting precise with ABI and wrestling unsized types

… doesn't actually get precise. I'm missing a case-by-case treatment of all the items allowed in such a trait, and of how each is translated into ABI. In particular, it's unclear to me how (and why) fn add(self) should be allowed since I see no way to make 'resizing' casts from dyn Trait to dyn T: Trait possible under such a model. In fact, I don't see how the resizing cast with a skolem variable is supposed to work at all when the vtable is not a subset of the dyn Trait vtable.

the layout guarantee must only be true for sized

should read 'the layout guarantee need only be true for sized'.

and you could now pass it to a method fn(arg: Box) via erasure. The conversion erasing a type argument is always operationally a no-op, unsizing is not.

The no-op claim isn't strictly true: one needs to find (maybe at compile time) the dyn-vtable for its Sized requirement (please find another name than vtable, btw, so that there is a semantic difference between the two tables). This table includes size(&self), align(&self) and drop_in_place(&mut self).
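The three entries named above are observable on stable Rust: `size_of_val` and `align_of_val` read them from the vtable of a `dyn` object, and dropping a `Box<dyn Any>` runs the erased destructor. A sketch with hypothetical helpers `erased_layout` and `drop_erased` and a hypothetical `Counted` type:

```rust
use std::any::Any;
use std::mem::{align_of_val, size_of_val};
use std::sync::atomic::{AtomicUsize, Ordering};

// For a `dyn` object, these read the size and alignment entries of its vtable.
pub fn erased_layout(value: &dyn Any) -> (usize, usize) {
    (size_of_val(value), align_of_val(value))
}

pub static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type that counts how often its destructor runs.
pub struct Counted;

impl Drop for Counted {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Dropping the box runs the destructor found through the vtable, then frees
// the allocation using the vtable's size and alignment.
pub fn drop_erased(value: Box<dyn Any>) {
    drop(value);
}
```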

Appendix A: arrays and slices

I feel like, with the right syntax, this should just be a natural extension where the reader sees: 'ah, of course, we can do this for other families of Ptr::Metadata as well'.

Appending B: more parts of std/core

'Appendix B', and the wording is how you get the proposal killed for being too large in scope. Certainly, you don't want to make the introduction of these new semantics in the language dependent on support in std. Ideally, these are not dependent on each other in any way, but we can still explain that they are no problem to implement in the future. Again, I've not seen enough discussion on how these signatures are relaxations to feel comfortable with the claim, and without backwards compatibility this extension won't happen.

Appendix C: type checked type equalities

Feels a bit out of place now. I didn't find any insight here. It's certainly not motivated enough to find its way into core.

@HeroicKatora

@WorldSEnder Anyway, regardless of outstanding criticism, I'd advise publishing it to internals or the compiler Zulip. There's certainly feedback I'm missing, and the core idea and the needs it addresses seem clear now.

@WorldSEnder
Author

it's unclear to me how (and why) fn add(self) should be allowed since I see no way to make 'resizing' casts from dyn Trait to dyn T: Trait possible

Such a method on a trait dyn T: Trait cannot be called, but it's still safe for the method to merely exist on the trait. The current object safety rules also allow this, albeit only if the receiver type is Self and Self is not mentioned in other arguments. The point here is that Self can be mentioned for the object safety of traits on ghost types. Calling such methods is still forbidden.
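A minimal illustration of the current rule, with hypothetical names: a by-value `self` method does not stop a trait from being used as `dyn Trait`, it just cannot be called through the trait object.

```rust
pub trait Consume {
    fn name(&self) -> String;
    // By-value receiver: allowed in a trait used as `dyn Consume`, but it
    // cannot be called through the trait object since `Self` is unsized there.
    fn into_len(self) -> usize;
}

pub struct Word(pub String);

impl Consume for Word {
    fn name(&self) -> String {
        self.0.clone()
    }

    fn into_len(self) -> usize {
        self.0.len()
    }
}

pub fn erased_name(value: Box<dyn Consume>) -> String {
    // `value.into_len()` would be rejected here: the value of type
    // `dyn Consume` cannot be moved out because its size is unknown.
    value.name()
}
```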

@WorldSEnder
Author

Link back to the relevant Rust-lang Internals discussion
