james7132/bevy_gats.md

## bevy_gats.md

      
    Raw
  

              bevy_gats.md
            
          
    GATs Case Study: Bevy Queries

Bevy is a game engine that uses Rust for both it's internal implemenation and user-facing programming language. One of the biggest selling points is its ease of use and ergonomics. It encodes the game world as entities. These entities can have components, which are just normal Rust types: structs, enums, etc. Game logic is encoded as systems which query for a specific set of components to apply the game logic to. For example, a simple "game" that just moves all entities upward may look like this:
#[derive(Component)]
pub struct Position(Vec3);

fn move_up_system(
  // The Query for all Position components as a *mutable* reference.
  mut positions: Query<&mut Position>,
) {
  // The query creates an iterator over all found results.
  for mut position in &mut positions {
    position.0 += Vec3::UP;
  }
}

fn main() {
  App::new()
      .add_system(move_up_system)
      .run()
}
This write-up will focus on that &mut Position, the internals that enable the ergonomics of that DSL, and how stablized GATs can both help simplify our implementation and ease learning for new contributors.
Current Implementation

NOTE: The APIs presented here are gross oversimplifcations of the actual API and omits much of the finer implementation details.
The core of the Query API is the WorldQuery trait. It's implemented on user facing types, most notably &T: Component  and &mut T: Component. Entangled with are a small web of associated traits and types, as defined as follows:
// Primary user facing type. This is what the user specifies.
unsafe trait WorldQuery: for<'a> WorldQueryGats<'a> {
   type ReadOnly: ReadOnlyWorldQuery;
}

// Marker trait. Denotes that the corresponding Fetch impl never
// mutates the fetched items.
unsafe trait ReadOnlyWorldQuery: WorldQuery {}

// nougat-like GATs workaround
trait WorldQueryGats<'a> {
   type Fetch: Fetch<'a>;
}

// The actual type fetching the results
// Note that T::Fetch::Item may not always be T
trait Fetch<'a> {
   type Item;
   fn fetch(&self, entity: Entity) -> Self::Item;
}
For additional context, the Component trait is primarily a marker trait. It also demarks, via associated type/constant where the items of a particular type are stored:
trait Component: Send + Sync + 'static {
  type Storage: ComponentStorage;
}

trait ComponentStorage {
  const STORAGE_TYPE: StorageType;
}

enum StorageType {
  Table,
  SparseSet,
  ...
}
Let's look at the most commonly used concrete implementations of these traits: &T and &mut T.
pub struct ReadFetch<'a, T: Component> {
   // Used only if T::Storage::STORAGE_TYPE == StorageTyype::Table
   table: Option<&'a [UnsafeCell<T>]>,
   // Used only if T::Storage::STORAGE_TYPE == StorageTyype::SparseSet
   sparse_set: Option<&'a SparseSet<T>>,
}

unsafe impl<'a, T: Component> WorldQuery for &'a T {
   type ReadOnly = Self;
}

unsafe impl<'a, T: Component> ReadOnlyWorldQuery for &'a T {}

impl<'a, T: Component> WorldQueryGats<'a> for &'a T {
   type Fetch = ReadFetch<'a, T>;
}

impl<'a, T: Component> Fetch<'a> for ReadFetch<'a, T> {
   type Item = &'a T;
   fn fetch(&self, entity: Entity) -> Self::Item {
      // Branching as a constant, the unused branch is optimized out at compile time.
      match T::Storage::STORAGE_TYPE {
          StorageType::Table => { ... },
          StorageType::SparseSet => { ... },
      }
   }
}

// ---

pub struct WriteFetch<'a, T: Component> {
   table: Option<&'a [UnsafeCell<T>]>,
   ...
   sparse_set: Option<&'a SparseSet<T>>,
   ...
}

unsafe impl<'w, T: Component> WorldQuery for &'w mut T {
   type ReadOnly = &'w T;
}

impl<'a, T: Component> WorldQueryGats<'a> for &mut T {
   type Fetch = WriteFetch<'a, T>;
}

impl<'a, T: Component> Fetch<'a> for WriteFetch<'a, T> {
   type Item = Mut<'a, T>;
   fn fetch(&self, entity: Entity) -> Self::Item {
      match T::Storage::STORAGE_TYPE {
          StorageType::Table => { ... },
          StorageType::SparseSet => { ... },
      }
   }
}
Every WorldQuery impl has up to four trait definitions (more if you're looking at the full implementation) and there's plenty of boilerplate to connect them all up.
One big downside to this approach is that we need very hefty type aliases to make the code readable. Getting the fetched Item from a WorldQuery requires type QueryItem<Q> = <<Q as WorldQueryGats<'_>>::Fetch as Fetch<'_>>::Item.  This gets even worse when using these in macros where we must use the fully qualfiied type name.
Where GATs can Help

The initial, most obvious, change is that the WorldQueryGats trait is no longer necessary:
unsafe trait WorldQuery {
   type Fetch<'a>: Fetch<'a>;
   type ReadOnly: ReadOnlyWorldQuery;
}

unsafe trait ReadOnlyWorldQuery: WorldQuery {}

trait Fetch<'a> {
   type Item;
   fn fetch(&self, entity: Entity) -> Self::Item;
}
One of the biggest wins here is the fact that the previously very verbose type aliases aren't as needed anymore: Q::Fetch::Item can be used in most contexts.
The second is that the internal representation of ReadFetch and WriteFetch can have the redundant Options removed.
trait ComponentStorage {
   type ReadFetch<'a, T>: Fetch<'a, T>;
   type WriteFetch<'a, T>: Fetch<'a, T>;
}

impl ComponentStorage for Table {
   type ReadFetch<'a, T> = TableReadFetch<'a, T>;
   type WriteFetch<'a, T> = TableWriteFetch<'a, T>;
}

impl ComponentStorage for SparseSet {
   type ReadFetch<'a, T> = SparseSetReadFetch<'a, T>;
   type WriteFetch<'a, T> = SparseSetWriteFetch<'a, T>;
}

unsafe impl<T: Component> WorldQuery for &T {
   type Fetch<'a> = T::Storage::ReadFetch<'a, T>;
   type ReadOnly = Self;
}

unsafe impl<'w, T: Component> WorldQuery for &'w mut T {
   type Fetch<'a> = T::Storage::WriteFetch<'a, T>;
   type ReadOnly = &'w T;
}

struct TableReadFetch<'a, T> {
   components: Option<&'a [UnsafeCell<T>]>,
}

struct SparseSetReadFetch<'a, T> {
   components: &'a SparseSet<T>,
}

...
Without GATs, the unused part of each Fetch type is marked as None, which still uses memory and still must be zeroed out during initialization (suboptimal) or intentionally left uninitialized (unsafe, probably unsound). One potential alternatie to implement this is to use the same const generics to drive compile-time discriminated unions:
union StorageSwitch<T, A, B> {
    table: ManuallyDrop<A>,
    sparse_set: ManuallyDrop<B>,
    marker: PhantomData<T>,
}

impl<T: Component, A, B> StorageSwitch<T, A, B> {
    pub const fn new_table(table: A) -> Self {
        Self {
            table: ManuallyDrop::new(table),
        }
    }

    pub const fn new_sparse_set(sparse_set: B) -> Self {
        Self {
            sparse_set: ManuallyDrop::new(sparse_set),
        }
    }
}

impl<T: Component, A: Copy, B: Copy> StorageSwitch<T, A, B> {
    pub fn table(&self) -> A {
        unsafe {
            match T::Storage::STORAGE_TYPE {
                StorageType::Table => *self.table,
                _ => debug_checked_unreachable(),
            }
        }
    }

    pub fn sparse_set(&self) -> B {
        unsafe {
            match T::Storage::STORAGE_TYPE {
                StorageType::SparseSet => *self.sparse_set,
                _ => debug_checked_unreachable(),
            }
        }
    }
}
This is is more unsafe, harder to prove correct without lots of boilerplate or debug build checks, harder to extend and maintain long term, still has sharp and unexpected API edges, still not an optimal solution, and harder for new contributors to understand.
"Why not?": Common Answers to Alternatives

Why not type erase the pointers?

We could do this. In fact, the underlying storage is type erased already. However, this both sacrifices type safety and
erases the lifetimes involved, which can make this API notably more unsafe to use in the general case. If we were OK doing
this, we might as well have written this entire API in C.
Why not use a Box<dyn Trait> for the underlying representation?

As shown in the overview, one of the primary use cases for this API is high performance iteration over in-memory data.
Adding dynamic dispatch in the hotpath for iteration prevents inlining, adds unavoidable overhead, and prevents
optimizations like autovectorization. Trait objects are not a viable solution here.
Why make ReadFetch/WriteFetch an enum mapped to StorageType?

As shown in the overview, one of the primary use cases for this API is high performance iteration over in-memory data.
Making the Fetch types enums forces the branch to be evaluated at runtime instead of at compile time. Even with branch prediction, the costs of processing unused jump instructions still adds unnecessary overhead.