Skip to content

Instantly share code, notes, and snippets.

@qinwf
Created February 26, 2016 08:10
Show Gist options
  • Star 22 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save qinwf/55f651daece54a6fdfa2 to your computer and use it in GitHub Desktop.
Save qinwf/55f651daece54a6fdfa2 to your computer and use it in GitHub Desktop.
Code Reuse in Rust

Code Reuse in Rust

This post is not a introduction to Rust, and it covers many features and tells you when to use thems.

Basic features

  1. loop - you do not need to write 1000 lines for a loop
  2. vectorize style function - functional loop
  3. recursive - loop in function
  4. switch - less if else
  5. common data structer vector map - handle a bunch of data easier
  6. package system - use code from other folks

Function

The most common form of a code-reuse is function. But you may write some functions to do sevaral similar jobs. You might want "meta-programming" (code to write code) or "polymorphism" (code that can handle different kinds of data).

Lots of features in different languages hook into all this:

  1. macros
  2. templates
  3. generics
  4. inheritance
  5. function pointers
  6. interfaces
  7. overloading
  8. unions

and so on.

Major strategies

There are three major strategies: monomorphization, virtualization, and enumeration.

Monomorphization

copy-pasting code with details changed.

good - custom implementation, easier for compiler to understand

bad - duplicated code, hard to change, and it might make programs bloated and slow.

Virtualization

Adding more indirection. Virtualizing allow a single function to have custom behaviours, without having to copy-paste it.

good - less duplicated code.

bad - worse for performance to add lots of indirection. Heap allocation, and jumping to random places in memory. Compilers might inline that function call, but they won't always.

Examples:

  • function pointers and void pointers in C
  • callbacks, inheritance (Java, C++, C#)
  • generics in Java
  • prototypes in JavaScript

Note that in many of these examples, virtualization of functionality and data is combined.

Inheritance often results in virtualization of both code and data. If I have a pointer to an Animal, it might might actually be a Cat or a Dog. If I ask the animal to speak, how does it know to "bark" or "meow"?

The standard way to handle this is for every single instance of a type in an inheritance hierarchy to secretly store a pointer to various pieces of information that may be needed at runtime, called a "vtable".

The standard thing for a vtable to store would be a bunch of function pointers (one of which would be this instance's implementation of speak), but it might also store things like size, alignment, and the actual type.

Enumeration

Enumerations are a compromise between virtualization and monomorphization. Enumerations can be anything from a fixed list. The usual strategy is to pass around an integer "tag" which specifies which implementation to use.

good - benefits of Monomorphization and Virtualization

bad - know the full set of choices upfront.

Examples:

  • Many languages provide a notion of an enumeration as an enum, but often lack support for those enums having associated data, which limits their usefulness.
  • C provides the ability to declare that a field is the union of two types, but leaves it up to the programmer to determine which type the field should be interpreted as.
  • Many functional languages provide tagged unions, which are the combination of an enum and a C union, allowing arbitrary data to be associated with each variant of an enum.

Rust

Rust has 5 ways to do code reuse:

  • Macros (simple monomorphization!)
  • Enums (full enumeration!)
  • Traits (where the complexity is)
  • Generics (Monomorphization)
  • Traits Objects (Virtualization)
  • Compiler plugins (Not stable yet)

1. Macro

When to use

Macros are the easiest. They're pure, raw, code reuse. It works on parts of the AST (Abstract Syntax Tree; chunks of source code). You give them some bits of AST, and it spews out some new bits of AST at compile time.

There's no type information beyond stuff like "this string looked like a type name".

When to use:

  • you want to extend the language
  • you want to copy-paste some code with minor tweaks

Bad:

  • the monomorphization feature works like Generics, but with no type information, so the emmited compiler errors can be hard to understand.

Macros should be your last resort for code reusing problem. You can always try #[inline] function and other methods with type information on compile time.

Examples:

  • println!
  • thread_local!
  • vec! - vec! allows Vecs to be defined with the same syntax as array expressions.
  • try! - early return with less code

2. Enums

When to Use

Enums in Rust are exactly the tagged unions.

What about your own custom enums? You want this code to be generic over IPv4 and IPv6. You are absolutely certain that you don't care about the possibility of some hypothetical IPv8, and honestly you don't have a clue how to define an interface that would handle that anyway.

enum IpAddress {
    V4(IPv4Address),
    V6(Ipv6Address),
}

fn connect(addr: IpAddress) {
    // Check which version it was, and choose the right impl
    match addr {
        V4(ip) => connect_v4(ip),
        V6(ip) => connect_v6(ip),
    }
}

3. Traits

When to Use

In due time, traits will also be the center piece for specialization and probably every other big new user-facing feature added to Rust.

traits are just interfaces. Trait tell you what a type can do, and what is it.

When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.

trait DuckType{
	fn walks();
	fn swims();
	fn quacks();
}

traits as interfaces in Java or C#, but there's some slight differences.

In C# and Java, as far as I know, the only one who can implement MyTrait for MyType is the declarer of MyType.

But in Rust, the declarer of MyTrait can also implement it for MyType.

This lets a downstream library or application define interfaces and have them implemented by types declared in e.g. the standard library.

Trait implementations are only visible to code that has the relevant trait in scope. This is why doing I/O without importing the Read and Write traits often falls apart.

Coherence

The bulk of these restrictions are:

you need to either be declaring the trait or declaring the type to impl Trait for Type, and crates can't circularly depend on each-other (dependencies must form a DAG).

The messy case is that this is actually a lie, and you can do things like impl Trait for Box<MyType>, even though Trait and Box are declared elsewhere.

Most of the complexity in coherence, to my knowledge, is dealing with these special cases. The rules that govern this are the "orphan rules", which basically ensure that, for any web of dependencies, there's a single crate which can declare a particular impl Trait for ...

The result is that it's impossible for two separate libraries to compile but introduce a conflict when imported at the same time.

Also that wasn't even sufficient and a special hack had to be added to the compiler called #[fundamental] which declares that certain things have special coherence rules.

4. Generics

When to Use

  • When there are some containers you want to abstract the data and behavior, you can try Generics Types with struct or enum.

  • When there are traits may work with Generics types, you can try Generics Traits. For example, The Add<T, U> trait needs two generic parameters: T is the type of the RHS summand, and U is the type of the sum.

  • When function contains similar logic for different types, and you can try Generics Functions.

Declaring a monomorphic interface is done with what Rust calls generics. A monomorphic interface can be converted into a virtualized one by Trait Objects.

Generics Types

A generic struct. <..> is how we declare generic arguments. One can create a version of Generic for any type, unlike Concrete, which only works with u32.

struct Generic<T> {
    data: T,
}

Implementing functionality for a specific version of Foo. Note that this is not "specialization", in the sense that any names declared here can't conflict with other impls.

impl Generic<u32> {
    fn is_big(&self) -> bool {
        self.data > 120
    }
}

Implementing functionality for all choices of T. Note that the "impl" is also Generic here.

Hopefully this in conjunction with the previous example demonstrates why the extra is necessary.

impl<T> Generic<T> {
    fn new(data: T) -> Generic<T> {
        Generic { data: data }
    }

    fn get(&self) -> &T {
        &self.data
    }

    // ops need default feature for specialization 
    // default fn is_big(&self) ->bool{
    // 	return self.data >200
    // }
}

Compared Generics Types with Plain Traits

trait MyTrait {
    fn hello_word(&Self) -> String;
}

and

struct<T> MyTrait {
    t: T,
    hello_world: fn(&T) -> String,
}

impl<T> MyTrait<T> {
    fn new(t: T, hello_world: fn(&T) -> String) -> MyTrait<T>;

    fn hello_world(&self) -> String {
        (self.hello_world)(self.t)
    }
}

Both excerpts above allow the user to specify in a dynamic manner how hello_world should behave.

The one difference (semantically) is that the trait implementation guarantees that for a given type T implementing the trait, hello_world will always have the same behavior.

But the struct implementation allows having a different behavior on a per instance basis.

Generics Traits

A normal trait declaration.

trait Clone {
    fn clone(&self) -> Self;
}

A generic trait declaration. Generic traits introduce a relationship to some other type.

In this case, we want to be able to compare our type to other types. This will be clearer when we see impls.

trait Equal<T> {
    fn equal(&self, other: &T) -> bool;
}

Plain-jane trait impl

impl Clone for Concrete {
    fn clone(&self) -> Self {
        Concrete { data: self.data }
    }
}

Implementing a generic trait concretely

impl Equal<Concrete> for Concrete {
    fn equal(&self, other: &Concrete) -> bool {
        self.data == other.data
    }
}

We can do this for types we don't own, like primitives!

impl Clone for u32 {
    fn clone(&self) -> Self {
        *self
    }
}

impl Equal<u32> for u32 {
    fn equal(&self, other: &u32) -> Self {
        *self == *other
    }
}

// Taking advantage of that sweet generic trait!
impl Equal<i32> for u32 {
    fn equal(&self, other: &i32) -> Self {
        if *other < 0 {
            false
        } else {
            *self == *other as u32
        }
    }
}

Implementing a generic trait for a concrete type generically

impl<T: Equal<u32>> Equal<T> for Concrete {
    fn equal(&self, other: &T) -> bool {
        other.equal(&self.data)
    }
}

Implementing a concrete trait for a generic type generically.

Note that we require that T implements the Clone trait! This is a trait bound.

impl<T: Clone> Clone for Generic<T> {
    fn clone(&self) -> Self {
        Generic { data: self.data.clone() }
    }
}

Implementing a generic trait for a generic type generically.

Note that we have two generic types in play; T and U.

impl<T: Equal<U>, U> Equal<Generic<U>> for Generic<T> {
    fn equal(&self, other: &Generic<U>) -> bool {
       self.equal(&other.data)
    }
}

Generics Functions

And finally, individual functions can also be generic.

impl Concrete {
    fn my_equal<T: Equal<u32>>(&self, other: &T) -> bool {
        other.equal(&self.data)
    }
}

Interesting problem: we've inverted the order on equal here, (x == y is being evaluated as y == x). How can we express T: Equal<U> to fix this?

Note that we can't do this at the time where we declare T, because U doesn't exist yet!

impl<T> Generic<T> {
    fn my_equal<U: Equal<T>>(&self, other: &Generic<U>) -> bool {
       other.data.equal(&self.data)
    }
}

So we can see that there's a lot of combinations of situations you can run into as soon as you want to start describing interfaces and providing implementations that are generic over those interfaces.

It may or may not surprise you to learn that some really common functions get copied a lot. For instance, brson measured that 1700 copies of Option::map were being created while building Servo. Granted, virtualizing all those calls probably would have been disastrous for runtime performance.

Generics Inference

We can specify the type of y explicitly.

let y: Vec<u8> = x.clone().into_iter().collect();

Or directly tell collect what its generic arguments should be with the turbofish operator ::<>!

let y = x.clone().into_iter().collect::<Vec<u8>>();

5. Trait Objects

When to Use

Rust's solution to virtualization is called trait objects.

All you do is specify that the type of something is some trait, and Rust will handle the rest. Of course, in order to do this, you need to put your type behind a pointer like &, &mut Box, Rc, or Arc.

trait Print {
    fn print(&self);
}

impl Print for i32 {
    fn print(&self) { println!("{}", self); }
}

impl Print for i64 {
    fn print(&self) { println!("{}", self); }
}

fn main() {
    // Normal static, monomorphized usage
    let x = 0i32;
    let y = 10i64;
    x.print();      // 0
    y.print();      // 10

    let data: [Box<Print>; 2] = [Box::new(20i32), Box::new(30i64)];

    // Now we can print all the data in this list uniformly.
    for val in &data {
        val.print();    // 20, 30
    }
}

Box is a trait object, and therefore can store any implementor of Print. To create a Box, we just create a Box<T: Print>, and try to put it somewhere that expects a Box<Print>.

Here we specify that data contains Box<Print>s, so the array literal happily does the coercion for us! Note that we use this to insert an i32 and an i64 into the same list, which would be prevented if we used static dispatch.

Obejct Safe

This Clone trait defines a function that returns Self by-value. What would happen if we tried to write the following?

trait Clone {
    fn clone(&self) -> Self;
}

fn main() {
    let x: &Clone = ...; // doesn't matter
    let y = x.clone();   // Clone the data...?
}

How much space on the stack should y reserve in this case? What type should y even have?

The answer is that we can't know at compile time. This means that a Clone trait object is actually nonsensical.

More generally, any trait that talks about Self by-value anywhere can't be turned into a trait object.

Trait objects are also, interestingly, implemented in a rather unconventional way. Recall that the usual strategy for this sort of thing is for virtualizable types to store a secret vtable field. This is annoying for two reasons.

First, everything is storing and setting a pointer even when it doesn't need to. Whether you will actually be virtualized or not doesn't matter, because a type needs to have a fixed layouts. So if some Widgets could be virtualized, then all Widgets need to store that pointer.

Second, once you get to such a type's vtable, it's actually non-trivial to determine where the functions you're interested in are stored. This is because interfaces are, effectively, multiple inheritance (C++ meanwhile literally has multiple inheritance). As an example, consider this set of types, traits, and implementations:

trait Animal { }
trait Feline { }
trait Pet { }

// Animal, Feline, Pet
struct Cat { }

// Animal, Pet
struct Dog { }

// Animal, Feline
struct Tiger { }

Rust represents trait objects as fat pointers. Box<Pet> is not, in fact, a single pointer. It's a pair of pointers, (data, vtable). The vtable pointed to also isn't the vtable for the data's type, it's a vtable. Specifically, it's a custom vtable made explicitly for Pet:

However it has its own drawbacks.

First, fat pointers obviously occupy twice as much space, which may be a problem if you're storing a lot of them.

Second, we're monomorphizing vtables for every requested combination of traits.

This is possible because everything has a statically known type at some point, and all coercions to trait objects are also statically known.

Still, generous use of virtualization could lead to some serious bloat!

Finally, let's return to a claim that was made several section ago: a monomorphic interface can be converted into a virtualized one by the user. This is done by a feature called "impl Trait for Trait", which means that trait objects implement their own traits. The end result is that the following works:

// Stuff we've seen before...
trait Print {
    fn print(&self);
}

impl Print for i32 {
    fn print(&self) { println!("{}", self); }
}

impl Print for i64 {
    fn print(&self) { println!("{}", self); }
}

?Sized specifies that T may be virtualized.

Sized is a trait that all concrete types implicitly implement. However, things like Traits and [T] are "unsized types".

Specifying that T: ?Sized indicates to the compiler that it should be an error to ever try to use T by-value, because Sized might not be implemented.

fn print_it_twice<T: ?Sized + Print>(to_print: &T) {
    to_print.print();
    to_print.print();
}

fn main() {
    // Static dispatch; monomorphized version for each type. Because there is no traits
    print_it_twice(&0i32);  // 0, 0
    print_it_twice(&10i64); // 10, 10

    // Causes vtables for i32::Print and i64::Print to be constructed
    let data: [Box<Print>; 2] = [Box::new(20i32), Box::new(30i64)];

    for val in &data {
        // Dynamic dispatch; a single virtualized version is monomorphized.
        // Annoying manual conversion from &Box<Print> to &Print because
        // generics and auto-deref have a bad interaction.
        print_it_twice(&**val);    // 20, 20, 30, 30
    }
}

Extra: Associated Types

When to Use

Associated types are a grouping mechanism, so they should be used when it makes sense to group types together.

The Graph example, shows an example of this: you want a Graph to be generic, but once you have a specific kind of Graph, you don't want the Node or Edge types to vary anymore. A particular Graph isn't going to want to vary those types within a single implementation, and in fact, wants them to always be the same. They're grouped together, or one might even say... associated.

trait MyTrait {
    type Return;
    fn hello_world(&Self) -> Self::Return;
}

Or:

trait MyTrait<Return> {
    fn hello_world(&Self) -> Return;
}

Are equivalent to the late binding of methods above:

  • the first one enforces that for a given Self there is a single Return associated
  • the second one, instead, allows implementing MyTrait for Self for multiple Return

That is, StackIter only implements Iterator. It's never going to implement Iterator. In fact, upon reflection, allowing this would probably be a bad idea.

trait Iterator {
    // Every iterator yields a particular type of
    // item, which they must specify.
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

/// An iterator that yields the elements
/// from a Vec, Stackwise.
struct StackIter<T> {
    data: Vec<T>,
}

// An iterator over the range [min, max)
struct RangeIter {
    min: u32,
    max: u32,
}

impl<T> Iterator for StackIter<T> {
    // Associated items can still be
    // derived from other generics
    type Item = T;
    fn next(&mut self) -> Option<Self::Item> {
        self.data.pop()
    }
}

impl Iterator for RangeIter {
    // Or concretely specified.
    type Item = u32;
    fn next(&mut self) -> Option<Self::Item> {
        if self.min >= self.max {
            None
        } else {
            let res = Some(self.min);
            self.min += 1;
            res
        }
    }
}

Oh, one last thing: traits objects don't work with associated types for the exact same reason they don't work with by-value Self. No way to know the type statically, so no way to work with it. If you specify all the associated types, it does work though! That is, Box doesn't work, but Box<Iterator<Item=u32>> does.

Bound

impl<T> Generic<T> {
    fn my_equal<U>(&self, other: &Generic<U>) -> bool
        where T: Equal<U>
    {
       self.data.equal(&other.data)
    }
}

fn min<I>(mut iter: I) -> Option<I::Item>
    where I: Iterator,
          I::Item: Ord,
{
    if let Some(first) = iter.next() {
        let mut min = first;
        for x in iter {
            if x < min {
                min = x;
            }
        }
        Some(min)
    } else {
        None
    }
}

Where clauses allow us to specify bounds on arbitrary types. That's it. It's just a more flexible syntax than the inline one.

You can put where clauses at the top of

  • trait declarations
  • trait implementations
  • function declarations
  • enum declarations
  • struct declarations

Pretty much anywhere you could've started defining generic arguments.

trait Print {
    fn print(&self);

    // `where Self: Sized` means this function
    // isn't available for trait-objects. So it's
    // safe to use Print as a trait object!
    fn copy(&self) -> Self where Self: Sized;
}


impl Print for u32 {
    fn print(&self) { println!("{}", self); }
    fn copy(&self) -> Self { *self }
}

fn main() {
    let x: Box<Print> = Box::new(0u32);
    x.print();
}

Intergrating all these Kinds of Code Reuse

High Kinded Trait traits bond to traits

/// A filter over an iterator
struct Filter<I, F> {
    iter: I,
    pred: F, // will be Fn<>
}

/// Constructs a filter
fn filter<I, F>(iter: I, pred: F) -> Filter<I, F> {
    Filter { iter: iter, pred: pred }
}

impl<I, F> Iterator for Filter<I, F>
    where I: Iterator,
          F: Fn(&I::Item) -> bool,  // Magic! What's the lifetime?
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        while let Some(val) = self.iter.next() {
            if (self.pred)(&val) {
                return Some(val);
            }
        }
        None
    }
}

fn main() {
    let x = vec![1, 2, 3, 4, 5];
    for v in filter(x.into_iter(), |v: &i32| *v % 2 == 0) {
        println!("{}", v); // 2, 4
    }
}

It turns out that F: Fn(&I::Item) -> bool is sugar for

for<'a> F: Fn(&I::Item) -> bool

Where for<'a> is intended to read as literally "for all 'a". We call this a higher rank trait bound (HRTB). Unless you're into some deep type-level nonsense, you will literally only ever see HRTBs used with the function traits, and function traits have this nice sugar so you usually don't have to use HRTBs at all. Note that HTRBs literally only work for lifetimes right now.

Higher Kinded Types types abstract types

For instance, say we wanted to write a data structure that uses reference-counted pointers internally. Rust's standard library provides two choices: Rc, and Arc. Rc is more efficient, but Arc is thread-safe. For the purposes of our implementation, these two types are completely interchangeable. To the consumers of out implementation, which type is used has important semantic consequences.

// Kind: Type -> Type
trait RcLike<T> {
    type Output;
    fn new(data: T) -> Self::Output;
}

// Stubs
struct Rc_;
struct Arc_;

impl<T> RcLike<T> for Rc_ {
    type Output = Rc<T>;
    fn new(data: T) -> Self::Output {
        Rc::new(data)
    }
}

impl<T> RcLike<T> for Arc_ {
    type Output = Arc<T>;
    fn new(data: T) -> Self::Output {
        Arc::new(data)
    }
}

struct Node<Ref, T>
    // This `where` clause is the problem! (more on that later)
    where Ref: RcLike<Node<Ref, T>>,
{
    elem: T,
    // basically: Option<Rc<Node<Rc_, T>>
    next: Option<<Ref as RcLike<Node<Ref, T>>>::Output>
}

struct List<Ref, T>
    where Ref: RcLike<Node<Ref, T>>,
{
    head: Option<<Ref as RcLike<Node<Ref, T>>>::Output>
}

impl<Ref, T, RefNode> List<Ref, T>
    where Ref: RcLike<Node<Ref, T>, Output=RefNode>,
          RefNode: Deref<Target=Node<Ref, T>>,
          RefNode: Clone,
{
    fn new() -> Self {
        List {
            head: None
        }
    }

    fn push(&self, elem: T) -> Self {
        List {
            head: Some(Ref::new(Node {
                elem: elem,
                next: self.head.clone(),
            }))
        }
    }

    fn tail(&self) -> Self {
        List {
            head: self.head.as_ref().and_then(|head| head.next.clone())
        }
    }
}

Generativity

Consider arrays. Normally, when we iterate an array we do it to get the elements.

However this requires us to provide various iterators for all the different modes of access: Iter, IterMut, and IntoIter.

Wouldn't it be more composable if the iterator told us where to look, but let us decide how to perform the access? This feature might use unsafe.

@doowonee
Copy link

It's wonderful! thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment