Skip to content

Instantly share code, notes, and snippets.

@nikomatsakis
Last active December 20, 2019 18:53
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save nikomatsakis/a1e17c7d8f8647d7124cfdf9fec5b3bc to your computer and use it in GitHub Desktop.
Save nikomatsakis/a1e17c7d8f8647d7124cfdf9fec5b3bc to your computer and use it in GitHub Desktop.

Proposal for a shared type library

Goal

To have an ergonomic, flexible library that can abstractly represent Rust types. The library should be expose a "minimal" set of types that is nonetheless able to capture the full range of Rust types. "Minimal" here means that some of the surface differences in Rust types -- e.g., the distinction between built-in types like u32 and user-defined types like a struct -- ought to be minimized, so that code that works with these types (e.g., trait solving) can focus on the most important differences.

At various points in this document you see Question notes, which are places which note some specific uncertainties.

Specific points in the design

Generic over a "type family". Virtually all types in the library are made generic over a trait, TypeFamily. This allows embedders of the library to customize how the types themselves are represented, and to map the types.

The role of the TypeFamily

Most everything in the IR is parameterized by the TypeFamily trait:

trait TypeFamily: Copy + Clone + Debug + Eq + Ord { 
    ..
}

We'll go over the details later, but for now it suffices to say that the type family is defined by the embedded and can be used to control (to a certain extent) the actual representation of types, goals, and other things in memory. For example, the TypeFamily trait could be used to intern all the types, as rustc does, or it could be used to Box them instead, as the chalk testing harness currently does.

Question: What to name the TypeFamily? Since instances of it must be passed around, I've been considering TypeContext, as well, and to call those instances tcx: TypeContext, like in the compiler...?

Representing types

The Ty and TyData type and interning

Types are represented by a Ty<TF> type and the TyData<TF> enum. There is no direct connection between them. The link is rather made by the TypeFamily trait, via the InternedTy associated type:

struct Ty<TF: TypeFamily>(TF::InternedTy);
enum TyData<TF: TypeFamily> { .. }

The way this works is that the InternedTy trait has an associated type and two related methods:

trait InternedTy {
    type InternedTy;
    
    fn intern_ty(&self, data: &TyData<Self>) -> Self::InternedTy;
    fn lookup_ty(data: &Self::InternedTy) -> &TyData<Self>;
}

However, as a user you are not meant to use these directly. Rather, they are encapsulated in methods on the Ty and TyData trait:

impl<TF: TypeFamily> Ty<TF> {
  fn data(&self) -> &TyData<TF> {
    TF::lookup_ty(self)
  }
}

and

impl<TF: TypeFamily> TyData<TF> {
  fn intern(&self, tf: &TF) -> Ty<TF> {
    Ty(tf.intern_ty(self))
  }
}

Note that there is an assumption here that TF::lookup_ty needs no context. This effectively constrains the InternedTy representation to be a Box or & type. To be more general, at the cost of some convenience, we could make that a method as well, so that one would invoke ty.data(tf) instead of just ty.data(). This would permit us to use (for example) integers to represent interned types, which might be nice (e.g., to permit using generational indices).

Question: Should we make lookup_ty a proper method?

The TyData variants

This section covers the variants we use to categorize types. We have endeavored to create a breakdown that simplifies the Rust "surface syntax" of types to their "essence". In particular, the goal is to group together types that are largely treated identically by the system and to separate types when there are important semantic differences in how they are handled.

Variants and their equivalents in Rust syntax

Chalk variant Example Rust types
Apply Vec<u32>, f32
Placeholder how we represent T when type checking fn foo<T>() { .. }
Dyn dyn Trait
Fn fn(&u8)
Alias <T as Iterator>::Item, or the Foo in type Foo = impl Trait and type Foo = u32
BoundVariable an uninstantiated generic parameter like the T in struct Foo<T>

Justification for each variant

Each variant of TyData generally wraps a single struct, which represents a type known to be of that particular variant. This section goes through the variants in a bit more detail, and in particular describes why each variant exists.

  • Apply -- an ApplicationTy<TF> is kind of a "normal Rust type", like Vec<u32> or f32. These consist of a "type name" (in our examples, Vec and f32 respecively) and zero or more generic arguments (respectively, [u32] and []).
    • They are equal to other types (modulo aliases, see below) iff they have the same "type name" and the generic arguments are recursively equal
  • Placeholder -- a PlaceholderIndex type represents a generic type that is being treated abstractly or -- more generally -- the result of a "type function" that cannot be evaluated. For example, when typing the body of a generic function like fn foo<T: Iterator>, the type T would be represented with a placeholder. Similarly, in that same function, the associated type T::Item might be represented with a placeholder.
    • Like application types, placeholder types are only known to be equal
    • When proving negative goals, e.g., not { Implemented(T: Trait) }, placeholders are replaced with inference variables -- the idea is that this goal is only true if there is no type T that implements Trait. Therefore, if we can find no answeres for exists<T> { Implemented(T: Trait) }, then we know that the negation is true.
      • Note that this means that e.g. forall<X> { X = i32 } is false but so is forall<X> { not { X = i32 } }.
  • InferenceVar -- an InferenceVar type represents a type whose value is being inferred. Inference variables may be "bound" or "not bound", but that state is stored externally, in the inference context (see the section on inference below).
    • When equating, inference variables are treated specially in that they become bound (or, if they have already been bound, they are replaced with their value).
    • Inference variables are also integral to canonicalization and other types.
  • Dyn -- a DynTy<TF> represents a dyn Trait type. In chalk, these are represented as an existential type where we store the predicates that are known to be true. So a type like dyn Write would be represented as, effectively, an exists<T> { T: Write } type.
    • Two dyn P and dyn Q types are equal if P => Q and Q => P, in short.
    • There are "automatic" rules for proving that dyn P: P and so forth.
      • (We have to reconcile this with some of the soundness issues associated with dyn trait)
    • Question: in rustc, dyn Foo has a lifetime bound, I guess we need that in chalk?
  • Fn -- a FnTy<TF> represents a fn() type. These are almost just application types, except that they can also contain a forall quantifier over lifetimes (e.g., forall<'a> fn(&'a u32)).
    • Two Fn types A, B are equal A = B if A <: B and B <: A
    • Two Fn types A, B are subtypes A <: B if
      • After instantiating the lifetime parameters on B universally...
        • You can instantiate the lifetime parameters on A existentially...
          • And then you find that P_B <: P_A for every parameter type P on A and B and R_A <: R_B for the return type R of A and B.
    • We currently handle type inference with a bit of a hack (same as rustc); when relating a Fn type F to an unbounded type variable V, we instantiate V with F. But in practice because of the above subtyping rules there are actually a range of values that V could have and still be equal with F. This may or may not be someting to consider revisiting.
  • GeneratorWitness -- a GeneratorWitness<TF> represents the types that may be part of a generator state. Unlike other types, these types can include bound, existential lifetimes, which refer to lifetimes within the suspended stack frame. You can think of it as a type like exists<'a> { (T...) }.
    • This is very similar to an Apply type, but it has a binder for the erased lifetime(s), which must be handled specifically in equating and so forth.
    • It is also similar to a Dyn type, in that it represents an existential type, but in contrast to Dyn, what we know here is not a predicate but rather some upper bound on the set of types contained within.
  • Alias -- an AliasTy<TF> represents some form of type alias. These correspond to Rust types like <T as Iterator>::Item but also impl Trait types and type aliases like Foo<X> where type Foo<X> = Vec<X>. Each alias has an alias id as well as parameters. Aliases effectively represent a type function.
    • In general, an alias type A can also be equal to any other type T if evaluating the alias A yields T (this is currently handled in Chalk via a ProjectionEq goal, but it would be renamed to AliasEq under this proposal).
      • However, some alias types can also be instantiated as "alias placeholders". This occurs when the precise type of the alias is not known, but we know that there is some type that it evaluates to (for example, <T as Iterator>::Item might be treated opaquely as T::Item; similarly impl Trait types are treated opaquely until the latter phases of the compiler).
  • BoundVariable -- a bound variable represents some variable that is bound in an outer term. For example, given a term like forall<X> { Implemented(X: Trait) }, the X is bound. Bound variables in chalk (like rustc) use de bruijin indices (See below).
    • Bound variables are never directly equated, as any bound variables would have been instantiated with either inference variables or placeholders.
    • They do appear in canonical forms and other terms that contain binders.
  • Error -- the Error type represents a type that resulted from some erroneous expression. Error types generally propagate eagerly in an attempt to suppress nonsense errors that are derived by interactions with buggy code.
    • Error should be its own variant because most bits of code will want to handle it somewhat specially -- e.g., maybe it can "unify" with any other type without any effect, and so forth.

Mapping to rustc types

The rustc TyKind enum has a lot more variants than chalk. This section describes how the rustc types can be mapped to chalk types. The intention is that, at least when transitioning, rustc would implement the TypeFamily trait and would map from the TyKind enum to chalk's TyData on the fly, when data() is invoked.

This section describes how a "representative sample" of rustc types can be mapped to Chalk variants.

rustc type chalk variant (and some notes)
Bool Apply
Char Apply
Int(_) Apply
Uint(_) Apply
Float(_) Apply
Adt(_, _) Apply
Foreign(_) Apply
Str Apply
Array(_, _) Apply
Slice(_) Apply
RawPtr(_) Apply
Ref(_, _, _) Apply
FnDef(_, _) Apply
FnPtr(_, _) Fn
Dynamic(_, _) Dyn
Closure(_, _) Apply
Generator(_, _) Apply
GeneratorWitness(_) GeneratorWitness
Never Apply
Tuple(_) Apply
Projection(_) Alias
UnnormalizedProjection(_) (see below)
Opaque(_, _) Alias
Param(_) XXX Placeholder?
Bound(_, _) BoundVariable
Placeholder(_) Placeholder
Infer(_) InferenceVar
Error Error

Application types

An ApplicationTy<TF> is kind of a "normal Rust type", like Vec<u32> or f32. Such types are only "equal" to themselves (modulo aliases, see below), and they may take type arguments. Note that we group together both user-defined structs/enums/unions (like Vec) as well as built-in types like f32, which effectively behave the same.

An ApplicationTy<TF> contains two fields:

  • a "type name" (of type TypeName<TF>); and,
  • a list of generic arguments (of type GenericArguments<TF>).

The TypeName<TF> itself is largely opaque to chalk. We discuss it in more detail elsewhere. The point is that it represents, semantically, either the name of some user-defined type (like Vec) or builin-types like i32. It may also represent types like "tuple of arity 2" ((_, _)) or "fixed-length array" [_; _]. Note that the precise set of these built-in types is defined by the TypeFamily and is unknown to chalk-ir.

Delta vs chalk of today

This section is tracking things that have to be changed in chalk to match what is described in this document.

  • Parameter<TF> needs to be renamed to GenericArgument
  • Vec<Parameter<TF>> needs to be replaced with GenericArguments
  • Extract TypeName into something opaque to chalk-ir.
  • Dyn type equality should probably be driven by entailment.
  • Projections need to be renamed to aliases.
  • The variant we use for impl traits should be removed and folded into type aliases.
  • Remove placeholders and projection placeholders from apply and create placeholder types.
  • Move Error from a TypeName to its own variant.
  • Introduce GeneratorWitness into chalk
  • Complete transition from ForAll to Fn in chalk
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment