- Feature Name:
pattern_types
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: rust-lang/rfcs#0000
- Rust Issue: rust-lang/rust#0000
This RFC introduces pattern types, which are subtypes of match
able types that
are statically restricted to a subset of the variants of the original type.
Alongside this, this RFC also makes some changes to the Any
trait to ensure
that this new form of subtyping does not lead to unsoundness or breakage.
- There are a number of "magic" types like
NonZeroI32
andOwnedFd
incore
andstd
that expose their validity invariants (non-zero / not-minus-one) to the compiler, allowing them to have extra niches, enabling guaranteed layout optimizations. It would be very useful to be able to create these kinds of types in user code. While currently implemented via a special perma-unstable attribute, past proposals to stabilise this where not succesful. - With
const
generics stabilized, the type system lacks a practical method to constrain these parameters. This would enable stabilizing theas_chunks
method on slices (whereN
may not be zero) and implementingDefault
for arrays of all sizes. - Expressing API invariants like providing a power-of-two alignment value in
Layout::from_size_align
currently necessitates creating extra wrapper types likeAlignment
or adding extra runtime checks to the function. A convenient, pay-as-you-go language-side solution would also eliminate checks around constant values likeNonZeroI32::new(1).unwrap
.
Pattern types are used to add compiler-checked invariants to types. They are
written by adding a pattern to the underlying type in the form type is pattern
.
A few examples:
Option<i32> is Some(_)
i32 is 1..16
&[Ordering] is [Ordering::Less, ..]
The compiler guarantees that values of these types always match their associated pattern.
These types allow moving runtime checks, documentation remarks or safety guarantees into the API of a function. Take for instance a function finding the minimum element of a slice of integers. Ideally, we would want its signature to be:
fn minimum(&[i32]) -> i32
But what if the slice is empty? We could make the function panic in that case,
but that introduces a source for bugs, as the function no longer works for all
inputs (it is not total). To make it total, we could make the return type an
Option<i32>
and just return None
for empty slices. Or, we can use pattern
types! To always return i32
, we just need to ensure that the function is never
called with empty slices and with pattern types, we can do exactly that!
The pattern [_, ..]
matches all non-empty slices, so [i32] is [_, ..]
is the
type of a non-empty slice. With this, we can write our total, i32
-returning
minimum function:
fn minimum(slice: &[i32] is [_, ..]) -> i32 {
// This works, because the compiler knows that there is at least one element!
let [first, rest @ ..] = slice;
let mut minimum = first;
for element in rest {
minimum = i32::min(minimum, element);
}
minimum
}
But how do we call this function? Simple! By just using a slice that the compiler knows is not empty, like an array:
let min = minimum(&[0, 1, 2, 3]);
assert_eq!(min, 0);
For dynamically-sized slices, this is a little more complicated. For instance,
let slice = /* some slice */;
if !slice.is_empty() {
let min = minimum(slice);
}
unfortunately does not work, as the compiler does not know the meaning of the
is_empty
method. What if it always returned false
, even for empty slices?
match
coming to the rescue! If a value is bound by a pattern, the compiler
remembers this, so the following works:
fn print_minimum(slice: &[i32]) {
match slice {
// s matches [_, ..], so it can be used as an &[i32] is [_, ..].
s @ [_, ..] => print!("{}", minimum(s)),
[] => {}
}
}
It doesn't even have to be a match
, even. Any binding works, so you could for
instance use a let-else statement like
let month @ 1..=12 = input else { todo!("handle invalid input") };
to get a value that can be used as an i32 is 1..=12
.
Or you can just pass along the requirement to the caller by using another pattern
type. Note that the patterns don't need to be the same, so you could for instance
use a pattern matching only slices with length two or three and still use the new
type as an &[i32] is [_, ..]
or even as a normal slice (&[i32]
):
fn minimum_of_two_or_three(slice: &[i32] is [_, _] | [_, _, _]) -> i32 {
minimum(slice)
}
In const generics, pattern types constrain the impl
to the matched values of
the type:
impl<const N: usize is 1..=5> MyTrait for [i32; N] {}
The compiler treats this impl
just as if you had written a separate impl
for each matched value:
impl MyTrait for [i32; 1] {}
impl MyTrait for [i32; 2] {}
impl MyTrait for [i32; 3] {}
impl MyTrait for [i32; 4] {}
impl MyTrait for [i32; 5] {}
N.B.: this way, traits like Default
can be implemented for arrays of all sizes:
use core::array;
impl<T> Default for [T; 0] {
fn default() -> [T; 0] {
[]
}
}
impl<T: Default, const N: usize is 1..> Default for [T; N] {
fn default() -> [T; N] {
array::from_fn(|_| T::default())
}
}
Inside the standard library, the as_chunks
method on slices (unstable and
implemented differently at the time of writing)
fn as_chunks<const N: usize is 1..>(self: &[T]) -> (&[[T; N]], &[])
uses pattern types to ensure at compile time that division by zero does not occur.
This method can be easily used with a constant value of N
fn in_twos(slice: &[i32]) -> &[[i32; 2]] {
let (chunks, _) = slice.as_chunks::<2>();
chunks
}
or by using a const-generic with a subpattern:
fn power_of_two_chunks<const N: usize is 1 | 2 | 4 | 8 | 16>(slice: &[i32]) -> &[[i32; N]] {
let (chunks, _) = slice.as_chunks::<N>();
chunks
}
For unconstrained N
s, using it is more complicated, as there is (currently) no
form of match
that only selects one arm for compilation. To work around this,
define a private trait and implement it for all values of N
, using pattern
types to "specialize" for certain cases (Note: this depends on
rust-lang/project-const-generics#26). E.g.:
fn chunks_or_empty<const N: usize>(slice: &[i32]) -> &[[i32; N]] {
trait ChunksOrEmpty {
fn divide(slice: &[i32]) -> &[Self];
}
impl ChunksOrEmpty for [i32; 0] {
fn divide(slice: &[i32]) -> &[Self] {
&[]
}
}
impl<const LEN: usize is 1..> for [i32; LEN] {
fn divide(slice: &[i32]) -> &[Self] {
slice.as_chunks() // works because LEN is not zero
}
}
<[i32; N]>::divide(slice)
}
By using a pattern type, you can provide additional niche information to the compiler. For instance,
pub struct Positive(i32 is 1..);
can only hold positive numbers, so other values can be used as niches for enums, reducing the size of the enum:
size_of::<Positive>() == 4
size_of::<Option<Positive>>() == 4
This optimization can only be applied when the pattern type is not a generic, so
e.g. Option<i32 is 1..>
has size 5. This is necessary because an
Option<i32 is 1..>
may be used as an Option<i32>
and therefore needs to have
the same layout.
Pattern types are a simple form of refinement types, where types are constrained by set relations.
A pattern type is a subtype of another type if all variants in its pattern-set
are contained in the pattern-set of the other type. This allows using e.g. an
i32 is 0..4
as an i32 is 0..42
. It also implies that e.g. i32 is 0..4
and
i32 is 0..=3
are equivalent.
The compiler checks that, when a value is used where a pattern type is expected, the value can only be one of the values matched by the pattern. Values satisfying the pattern can be obtained either by direct construction:
let val = 4;
let bounded: i32 is 2..=42 = val; // Works, because val can only be 4.
by using a value with a type that is a subtype of the expected type:
let val: &[i32] is [_, _, ..] = /* ... */;
let bounded: &[i32] is [_, ..] = val;
or by using a match
expression to get a value bounded by a specific pattern
match val {
// v can be used where a `Ordering is Ordering::Less | Ordering::Greater`
// is expected.
v @ Ordering::Less | Ordering::Greater => /* ... */,
_ => /* ... */,
}
Because it is impossible for a pattern type to hold a value not matched by its
pattern, these values are available as niches for layout optimization purposes.
Enums, however, are in most cases covariant over their generic types, and so
the layout may not change with the pattern. E.g. Option<i32 is 1..=42>
is a
subtype of Option<i32>
and so must have the same layout. Only when the enum
is invariant over the pattern type can the niches be used:
// Can be just a single i32.
enum VariantOrNum {
Variant,
Num(i32 is 0..=23),
}
// Has niches, so Option<Positive> will be smaller that Option<isize>
struct Positive(isize is 1..);
// Is invariant, so niches in the pattern of T can be used
enum Invariant<T> {
Function(fn(T) -> T),
Value(T),
}
Giving precise guarantees about where niche optimization is applied is left for future RFCs.
Traits can only be implemented once per base type. While this trait
implementation might be constrained by using a pattern type, allowing multiple
impl
s for the same type is a form of specialization not currently allowed:
trait SomeTrait {}
trait OtherTrait {}
impl SomeTrait for i32 is 0 | 1 {}
impl SomeTrait for i32 is 0 {} // Error: Conflicting trait implementations.
Furthermore, defining an associated item twice for the same base type is also forbidden:
enum MyEnum {
This,
That,
}
impl MyEnum is This {
fn method(self) {}
}
impl MyEnum is That {
fn method(self) {} // Error: Multiple definitions of the same item.
}
These restrictions are not applicable to the usage of pattern types in const generics, as the resulting types are completely distict in the type system. Therefore, you could write code like the following:
trait SomeTrait {}
struct WithConst<const V: i32> { .. }
impl<const V: i32 is -2..5> SomeTrait for WithConst<V> {}
impl<const V: i32 is 42..48> SomeTrait for WithConst<V> {}
The compiler treats this as if you had written a separate impl
for each
possible variant. This however only works as long as the patterns do not
intersect (that would result in multiple impl
s for the same type).
As (at the moment) pattern types cannot implement Clone
, they also do not
implement Copy
. To avoid inconveniencing users, however, non-generic
pattern types can still be copied around freely if their base type is Copy
.
E.g.:
fn generic<T: Copy>(t: T) {}
generic::<i32 is 0>(0); // ERROR: type does not implement `Copy`
generic(0); // Works as T is inferred as (i32 is _).
does not work, whereas
fn copy(v: i32 is 0) -> (i32 is 0, i32 is 0) {
(v, v)
}
works just fine. In the future, implementations generic over the pattern could be used to relax this restriction.
Alternatively, compiler magic could be used to implement Clone
for pattern
types with a primitive base type.
In previous versions of Rust, the Any
trait was implemented for all 'static
types. With pattern types however, this does hold anymore, as pattern types are
'static
if their base type is, but cannot implement Any
, since doing so
would either cause subtle breakage in existing code or lead to unsoundness.
For example,
let x = match 42 { x @ 42 => x, _ => panic!() };
let y: i32 = *(&x as &dyn Any).downcast_ref().unwrap();
must remain working, even though x
can be inferred to be i32 is 42
.
Therefore, if i32 is 42
implemented Any
, it would need to have the same
TypeId
as i32
, which would result in unsoundness:
// This should not work.
let constrained: i32 is 42 = *(&0 as &dyn Any).downcast_ref().unwrap();
For this reason, the Any
trait is implemented by the compiler only for
'static
, unconstrained types. This works as expected for generic code
with an explicit T: Any
bound. However, there exists a lot of code relying on
the assumption that T: 'static
leads to T: Any
. To ensure that this code
continues to work, the compiler automatically inserts an Any
bound whereever
T: 'static
can be proven. In most cases, like thread::spawn
for instance,
this is needlessly restrictive, so this bound can be manually relaxed by adding
T: ?Any
.
Care should be exercised when adding this bound in libraries, as it can be a breaking change.
It is undefined behaviour to bind a value with a pattern that does not include
the value, or to perform a match
on a value that is not covered by any arm.
The folling examples all exhibit this kind of undefined behaviour.
let x: i32 is 0 = unsafe { transmute(1) }; // UB.
fn take(x: i32 is 0) {}
take(unsafe { transmute(1) }); // UB.
match unsafe { transmute(1) } { // UB.
0 => println!("Ha, I tricked Rust");
}
This example, on the other hand, does not exhibit undefined behaviour, and remains sound, just as it was in previous versions of Rust.
let mut x = 0;
unsafe {
ptr::from_mut(&mut x).cast::<i32>().write(1);
}
match x {
0 => unreachable!(),
1 => println!("reached"),
}
This very limited definition for UB ensures that previously sound code is not
made unsound by the introduction of pattern types or by upgrades in the
analysis. However, it also limits the optimizations the compiler may perform, as
an inferred pattern may not be used to prove that a match
arm cannot be
reached. It also complicates the implementation of lints that recommend the
removal of such unreachable arms.
This RFC proposes to run the analysis in two steps, so just like with lifetimes, type inference generates pattern constraints in the form
pattern_a: pattern_b
(spoken as "pattern A contains pattern B") from subtyping relationships, and leaves these constrains to be solved by the pattern checker. The individual patterns can be either named (a set of variants) or pattern variables.
Method resolution and inference of associated types should not take patterns into account. This is aided by the restrictions that
- associated items can only be defined once per base type
- traits can only be implemented once per base type
With this rule in place, the type system gains the property that changing the return type of a function to a subtype of the original type is not a breaking change and cannot lead to inference failures. So changing
fn get() -> i32;
to
fn get() -> i32 is 0;
is a completely backwards-compatible change.
Therefore, the following changes can be made without breaking programs:
- The type of literal expressions and variant constructors is changed to a pattern type constraining the base type to the specific value being constructed.
- The return type of indexing an array with the open pattern
..
and the return type of theas_slice
/as_mut_slice
methods are changed to the pattern type describing a slice with the length of the array.
Furthermore, when a type is match
ed upon, type inference generates a pattern
constraint: the combination-set of all the matched variants must contain the
pattern-set of the type being match
ed. This replaces the existing
exhaustiveness check.
This RFC proposes to introduce pattern types with a flow-insensitive form of pattern checking, hence pattern bounds must hold in the entire CFG.
To prove a bound, the analysis has to verify that every variant matched by the pattern of the subtype is contained in/matched by the pattern of the base type.
With bounds between named patterns, this is equivalent to today's exhaustiveness check. For pattern variables, on the other hand, the analysis needs to ensure that there is no contradiction between the bounds. For example
a: 0..=10
0..=42: a
and
b: 0..=10
b: 0..=42
0..=100: b
work, but
c: 0..=10
0..=5: c
and
c: 0..=10
d: c
d: 0..=42
0..=5: d
don't.
As with any new language feature, the deciding question at the end is if the benefits outwheigh the added cost in language complexity.
However, at least for niche optimization, some language feature will always
need to exist, because e.g. NonZero*
has [a guaranteed niche]
(https://doc.rust-lang.org/nightly/std/num/struct.NonZeroI32.html#layout-1).
While the presented syntax is unambiguous and backwards-compatible in the normal syntax, introducing pattern types in the current edition would break macros. Therefore, the choosen syntax would need to be introduced with a new edition.
Choosing subtyping instead of coersion for forming relationships between pattern
types makes them easier to use and analyse. It has however the disadvantage of
making [T] is [_, _, _]
and [T; 3]
(or any other array) different types.
This is necessary because &[T] is [_, _, _]
needs to have the same layout as
&[T]
(i.e. a wide pointer) to allow for subtyping, which is incompatible
with &[T; 3]
(a single pointer). The resulting confusion could be remedied by
adding coercion from fixed-length slices to arrays, but that is left as a future
possibility.
As described above, the Any
trait is, in its current form, incompatible with
pattern types, at least if they are created through match
. The changes above
can only be avoided if pattern types use coercion and are created with a
special syntax, which makes them much less appealing as a language feature aimed
at being a lightweight addition.
Patterns provide a useful middle-ground between exhaustive enumeration and fully-general expressions.
- Sticking to a non-turing-complete mechanism avoids Rice's Theorem, allowing checking to be tractable.
- Patterns allow expressing more complicated properties than just "this value is the niche" or "it's exactly this variant".
- Reusing an existing set of constructs is easier for humans to learn, and helps simplify the rules for what has which type when.
This is like how Rust has exhaustiveness checking for patterns in match
,
but not for if
s.
As mentioned in the reference section, multiple trait implementations on the same base type conflict, even when their patterns do not intersect. In particular, this prohibits associated types from depending on the pattern. This rule is enforced to avoid complex interactions between pattern checking and the type system that would need further studying to ensure tractability. It is possible that the requirement could be relaxed in the future, allowing specialization depending on the pattern.
It is very difficult to reconcile a coercion-based approach with matching, as applying the coercion rules as they are today to pattern types would break previously working code. Consider this snippet:
let x = match 42 {
x @ 42 => x,
_ => return,
};
let y: &mut &i32 = &mut &x;
If the type of x
changes to i32 is 42
, the compiler will reject the snippet,
as reference creation is not a coercion site.
Making pattern types subtypes of their base types avoids these kinds of problems,
and additionally makes it possible to refine the return type of a function with
patterns in a backwards-compatible way, which is a necessary precondition for
the adoption of pattern types in existing APIs like std
s.
However, it also has some downsides:
Option<PatternType>
needs to have the same layout asOption<BaseType>
- traits can only be implemented once per base type, eliminating/complicating
specializations like
i32: Div<i32> // panicking version i32: Div<i32 is ..=-1 | 1..> // non-panicking version
Currently, niches for NonZeroI32
and friends are implemented using the perma-
unstable rustc_layout_scalar_valid_range
attribute. This attribute makes
constructing the type unsafe
and does not allow const-generics parameters to
be used. To solve this, a previous
proposal attempted to introduce
a new wrapper type for integers, associating them with a valid range similar to
the pattern refinement in this proposal:
#[lang_item = ".."]
pub struct Ranged<T, const VALID: RangeInclusive<T>>(..);
While this mostly solves the niche optimization use-case, it
- is inflexible, as it only allows a simple inclusive range as pattern.
- is unergonomic, since a special constructor is needed and no subtype relations are formed.
- does not work with enums and slices. The design has the advantage of not requiring any new syntax or analysis, however.
For the niche optimization usecase, other solutions have been proposed:
- RFC 3334 tried to make the existing internal attribute for setting niches public, but was closed in favour of
- PR 103724, which tried to
introduce a
Ranged
wrapper annotating an integer typeT
with its valid range. Was closed in favour of - PR 107606, implementing a basic subset of the feature proposed in this RFC
Prior art for refinement types includes:
- Liquid Haskell uses an SMT solver to implement flexible refinement types atop the normal Haskell type system. While allowing even more flexibility, this is out-of-scope for a general-purpose compiler.
- Flux is the Rust equivalent to Liquid Haskell.
The flexibility of the refinement bounds makes it easy to generate practically
unsolvable problems:
By not considering arithmetic operations and limiting the checks to a simple set-based analysis, proofs can be always achieved and in reasonable time, assuming that the pattern is simple (complex patterns are hard to write anyways because they are long).
#[flux::sig(fn(n: u128) -> u128{v: v < n})] fn collatz(num: u128) -> u128 { let mut len = 0; while num != 1 { len += 1; match num % 2 { 0 => num = num / 2, 1 => num = 3 * num + 1, } } len }
- Zulip discussion about this RFC
- Nico Lehman's (maintainer of Flux) thoughts on the pre-RFC
The current proposal uses is
as the separator between type and pattern. This
is unambigous and easy to read, but does not fit well with the @
separator
used in match guards.
If a type is specified without a pattern, e.g. i32
, there are two possible
interpretations:
- The type is unconstrained:
i32 is _
- The pattern needs to be inferred:
i32 is ?
(pseudo-syntax)
For everything but local variables, option 1 is obviously the right choice, as type inference should be local.
But for local variables, the choice becomes less clear:
- is stricter and may prevent subtle breakage
- allows more code and leaves more possibilities for the future.
-
Allow using the
enum
variant name as a shorthand for the pattern type constrained to only the variant and allow accessing the variant's fields (aka variant types).type Register = /* */; enum Operation { Nop, Add { dest: Register, source: Register, } } fn assemble_add(buf: &mut String, op: Operation::Add) { // Instead of `Operation is Operation::Add`. writeln!(buf, "add {dest}, {source})", op.dest, op.source); }
-
Introduce a shorthand for constrained slices, where the length is a pattern:
[i32; 314..=4096]
-
Introduce an empty pattern
$ty is !
that behaves similar to!
which allows changing the return type of functions to clarify behavour:trait Trait { fn method() -> i32; } impl Trait for () { // `-> !` would not have satisfied the trait's signature fn method() -> i32 in ! { loop { println!("rust rulezz") } } } fn main() { // Never-to-any coercion let _: u16 = ().method(); }
-
Make pattern checking control-flow aware (like polonius), meaning that bounds only have to hold at the point where they are used. This would allow code like
let mut num = 25; // some code later num = 5; let bind: i32 is 0..=10 = num;
to work.
-
Consider arithmetic expressions for pattern checking. While this makes trivially-correct code like
let i32 is 2 = 1 + 1;
work, it also significantly complicates analysis.
I assume you mean multiple impl blocks? This works today:
How does the pattern checker/pattern inference handle invariance? For example, what would be the inferred type of
m
in the following code?TypeId::of<i32 is 0>::()
presumably can't be equal toTypeId::of<i32 is 0 | 1>::()
. Could this force the compiler to generate a bunch of extra monomorphizations for all the different pattern types that a function is called with? Does it pose a semver hazard?Future possibility: have a special
!
pattern that matches no values, egi32 is !
would be uninhabited.is !
types could then have unique coercions, similar to those of the never type.