Skip to content

Instantly share code, notes, and snippets.

@bodil

bodil/index.md Secret

Last active February 8, 2023 15:25
Show Gist options
  • Star 14 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bodil/6bea70da6b946fc70cdab3a5473c8a05 to your computer and use it in GitHub Desktop.
Save bodil/6bea70da6b946fc70cdab3a5473c8a05 to your computer and use it in GitHub Desktop.
I Don't Think We're In Glasgow Any More: A Rust Survival Guide for Haskell Programmers

I Don't Think We're In Glasgow Any More: A Rust Survival Guide for Haskell Programmers

When I first started learning Rust, the language that was freshest in my mind, and which seemed the most immediately relevant as a point of reference, was Haskell. In the end, this turned out to be both a good idea and a terrible one. The two languages are surprisingly similar, so much that as a Haskell programmer you start out with a significant head start when learning Rust, but if you bring all your idioms and expectations over from one to the other, you will be writing code that is genuinely terrible. The trick lies in understanding, and accepting, what makes Rust different from Haskell, and why what's right in one context isn't always right in another.

This is the guide I wish I'd had back then as a Haskell programmer learning Rust.

A Primer

This chapter aims to teach you, very superficially, how to write code in Rust by leaning on what you already know about Haskell and explaining the differences.

Basics

I think we can all agree that Haskell possesses the clearest, most beautiful syntax of all languages{{footnote(name="1")}}. Unfortunately, Rust has gone down a more traditional path. If you're familiar with C++, Java or any of the other bastard children of Thompson, Kernighan and Richie, you'll find Rust syntax equally familiar, but underneath all the squiggly brackets hides a language perhaps more reminiscent of ML than C. Let's have a look.

Functions

Here's a simple function declaration in Haskell, with types.

add :: Integer -> Integer -> Integer
add a b = a + b

And here's the Rust equivalent.

fn add(a: i64, b: i64) -> i64 {
    a + b
}

We note a few things: Rust, being a "systems language," cares a lot about the representation of numeric types compared to Haskell. There are types for every size of integer: i8, i16, i32, i64 and even i128, with the equivalent unsigned versions u8 through u128, plus isize and usize denoting the platform preferred integer size (on 64-bit CPUs, that's 64 bits). There's also bool for booleans, which can be thought of as 1-bit integers as well as logical values.

Like C++ and Java, Rust mixes type declarations in with the function declaration, but in a momentary flash of good taste, at least it puts the type after the binding, not before. Type declarations are also mandatory at this point, though Rust does have type inference generally, and in any case, while they're not mandatory in Haskell, I think we can all agree they ought to be.

We also note that Rust does not do currying by default. The add function takes two arguments and anything else would be an error. You could define a function which takes one argument and returns another function which takes one argument and returns the result, but you're going to have to do the work yourself.

Finally, we note the absence of the return statement you'd expect in a Java function: in another flash of good taste, everything that could go inside a Rust function, with the exception of let statements, is an expression, and a function in Rust takes exactly one expression and returns its result.

Type Variables

Let's try this again with type variables: here's the ever popular identity function.

id :: a -> a
id a = a
fn id<A>(a: A) -> A {
    a
}

All types in Rust start with an upper case letter, including type variables, but excepting primitive types like i64 etc. The convention is that type variables have single letter names, and that they start at T, rather than A, but in my opinion this is misguided and I will start my type variables at A until I draw my dying breath.

Pattern Matching

Rust has pattern matching at the same level Haskell does, but not on function declarations.

fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = n + fib (n - 1)
fn fib(n: usize) -> usize {
    match n {
        0 => 0,
        1 => 1,
        n => n + fib(n - 1),
    }
}

ADTs

Rust has algebraic data types, though it makes the distinction between structs, with a single type constructor, and enums, with multiple.

data Pair a b = Pair a b
data Maybe a = Just a | None
struct Pair<A, B>(A, B);

enum Maybe<A> {
    Just(A),
    None,
}

Mind that the type constructors inside an enum are scoped to it, not to the scope you're in, so in order to construct a Just, you would use Maybe::Just(a) rather than Just(a). You could also import it into the current scope with a use statement, but they're out of scope for this guide.

Pattern Matching ADTs

You can pattern match on these as you'd expect, but note again how Rust requires a match block because the function syntax doesn't pattern match for you.

isJust :: Maybe a -> Bool
isJust (Just _) = True
isJust None = False
fn is_just<A>(maybe: Maybe<A>) -> bool {
    match maybe {
        Just(_) => true,
        None => false,
    }
}

Methods

A point where Rust starts to diverge from Haskell, which has stridently resisted anything resembling object orientedness, is in allowing you to associate functions with ADTs. Going with the Maybe type we defined above, we can define is_just as a "method" on Maybe like this, with an impl block.

impl<A> Maybe<A> {
    fn is_just(self) -> bool {
        match maybe {
            Just(_) => true,
            None => false,
        }
    }
}

Note the self argument, which has special meaning: self refers to the instance of Maybe you call it on, so that maybe.is_just() would be the same as calling is_just(maybe) using our standalone function declaration above.

If you know Rust, you'll spot a problem with the is_just declaration having to do with ownership, but we'll get to that later. For now, let's stay focussed on syntax and charge on like ownership isn't even a thing.

By the way, the Maybe type is in the Rust standard library, except it's called Option. Likewise, Either exists and is called Result. They work the same way they do in Haskell.

Type Classes

Perhaps the best thing about Rust is that it has type classes in exactly the same way Haskell does, except they're called "traits."

class Eq a where
  (==) :: a -> a -> Bool

instance (Eq (Maybe a)) where
  Just left == Just right = left == right
  None == None = True
  _ == _ = False
trait Eq<A> {
    fn eq(left: A, right: A) -> bool;
}

impl<A> Eq<A> for Maybe<A> {
    fn eq(left: A, right: A) -> bool {
        match (left, right) {
            (Just(left), Just(right)) => left == right,
            (None, None) => true,
            _ => false,
        }
    }
}

We could also have implemented eq using the self keyword that we know from method declarations, which is valid in trait impls the same way it is in bare impls.

Of course, Rust already has an Eq type class, and even a PartialEq type class, which is the one that implements the eq method in reality, because some things, like floating point numbers, have funny equality rules for some values, like NaN, which is not equal to itself, so when you're implementing equality for your types, you'll generally define PartialEq for them, and then add Eq as a marker trait saying your type isn't as funny as floating point numbers and you can always rely on identical values being equal. You can read about the peculiarities of Eq and PartialEq in the Rust API docs if you're interested, and there's a similar distinction between Ord and PartialOrd.

In Rust, like in Haskell, implementing Eq for a type lets you use the == operator on values of that type, and implementing Ord lets you use >, < and friends. Unlike in Haskell, though, these operators are built in and inextricably tied to their corresponding type classes, and you can't declare new ones. The most excruciating example of this to a Haskell programmer is >>=, which means "bit shift right and assign," and the type signature of the corresponding ShrAssign type class is incompatible with what you might have wanted it to do.

Conditionals

Rust's conditional statements are identical to Haskell's, save for the syntax.

evenOrOdd :: Integer -> String
evenOrOdd n = if (n `div` 2) * 2 == n then "even" else "odd"
fn even_or_odd(n: isize) -> &'static str {
    if (n / 2) * 2 == n {
        "even"
    } else {
        "odd"
    }
}

You might be wondering about the &'static str. That's a reference with a lifetime, and we'll get to that shortly. Let's get through the basic syntax first, though.

Conditionals In Pattern Matching

Of course, in Haskell one would prefer to write the above using a conditional pattern match rather than an if statement. In Rust, I think the if statement reads better in this case, but this is how you write a pattern guard in Rust.

evenOrOdd :: Integer -> String
evenOrOdd n | (n `div` 2) * 2 == n = "even"
evenOrOdd _ = "odd"
fn even_or_odd(n: isize) -> &'static str {
    match n {
        n if (n / 2) * 2 == n => "even",
        _ => "odd",
    }
}

Type Class Constraints

Type class constraints on type variables can be declared in one of two ways in Rust, one of which is a shorthand with certain restrictions. Let's illustrate with another absurdly contrived function.

eqInEnglish :: (Eq a) => a -> a -> String
eqInEnglish left right | left == right = "yes"
eqInEnglish _ _ = "no"
fn eq_in_english<A>(left: A, right: A) -> &'static str where A: Eq {
    if left == right {
        "yes"
    } else {
        "no"
    }
}

The shorthand looks like this, and will only let you declare direct constraints on a type variable. In the majority of cases, this is enough, but the where variant is generally preferred for readability in all cases.

fn eq_in_english<A: Eq>(left: A, right: A) -> &'static str {
    if left == right {
        "yes"
    } else {
        "no"
    }
}

Being Imperative

We already noted that Rust functions take a single expression, but Rust is also, in a very real sense, an imperative language, in that it does not require you to do I/O inside the IO monad, and functions can perform several I/O operations in sequence. How do we reconcile this apparent contradiction?

helloMovieStars :: IO ()
helloMovieStars = do
  putStrLn "Hello Joe!"
  putStrLn "Hello Mike!"
  putStrLn "Hello Robert!"

The Haskell version, as you can see, uses do notation to chain together the three IO monads. The Rust version looks a lot more like an imperative programming language like Java or C++, but the syntax is slightly deceptive. Let's have a look.

fn hello_movie_stars() {
    println!("Hello Joe!");
    println!("Hello Mike!");
    println!("Hello Robert!");
}

This is the first time we encounter the semicolon ; as, seemingly, a statement terminator, like in Java. If there were a return value expected, in Java, you'd expect to see a final return something; statement.

In Rust, the semicolon works a little differently. While, in practice, it can be seen as a statement terminator, what it actually does is combine two expressions and discard the result of the first one, leaving only whatever side effects it might have had. The three semicolons above can thus be read as "evaluate the first print, then the next print, then the final print, then evaluate the final expression and return its value." There's no final expression, or, rather, an empty function body, or nothing following a semicolon, implies the unit value (). As the absence of a -> and a return type also implies the unit type, this works out nicely.

fn hello_joe_and_number() -> isize {
    println!("Hello Joe!");
    25
}

In the above function, what the semicolon is saying is "evaluate the expression to the left of me and the expression to the right of me, and return the result of the right hand side only." You can think of the left hand expression (the println!) as returning an IO monad that is being directly evaluated, if you like. What it actually might return is what the IO monad would bind to, which is a Result type with a () (the unit type in both Rust and Haskell) and an std::io::Error. In the case of the println! macro, though, it would simply panic if there's an error, so the actual return type is just the unit type ().

If you're curious, the number returned by the function is the average number of years since Joe solved the problem you're dealing with right now.

Let Bindings

This leads us to the only thing that isn't an expression inside a function body, and which therefore has to be followed by a semicolon: the let statement.

This is functionally identical to a let binding in Haskell, except that it takes only a single binding. The syntax makes it unproblematic to chain up as many let statements as you like, though. Each let still starts a new scope, as you would if you nested let bindings in Haskell, but this is never going to be a problem.

plusThree :: Integer -> Integer
plusThree n =
  let one = 1
      two = 2
  in n + one + two
fn plus_three(n: i64) -> i64 {
    let one = 1;
    let two = 2;
    n + one + two
}

There's no Rust equivalent to the where binding.

Ownership

We've covered how Rust and Haskell differ in syntax for similar concepts, and how Rust differs slightly in having method syntax for ADTs and type classes. Now it's time to have a look at how they differ.

No Garbage Collector

Rust is not a garbage collected language. It also deals with garbage collection for you, unlike C and C++. The difference is that Rust is able to determine statically when an object needs to be cleaned up. It tracks the lifetime of any object, and when it goes out of scope, or, rather, when the scope that currently owns it goes away, the Rust compiler will automatically insert code to have it cleaned up for you.

A scope, in this context, is either a function body, or an explicit scope surrounded by { and }. You can nest these inside a function body as you like, and you'll note by this logic a function body is a scope delimited by { and } too.

When you create a value, it's owned by the scope in which you created it. When that scope ends, if it still owns the value, the value is deallocated.

{
    let one = 1;
    let two = 2;
    two
}

A scope is also an expression, with the normal evaluation rules, so this scope evaluates to the integer value 2. It creates two values, and bind them to one and two. The value of two is returned, and ownership of it thus passes to the outer scope, or the function's caller, whatever the outer scope is. The value of one, on the other hand, is still owned by the scope when it ends, so it's deallocated ("dropped," as the Rust terminology goes) along with the scope.

Thus, the lifetime of one is equal to the lifetime of the scope. The lifetime of two, on the other hand, depends on the scope that takes over ownership of it, and cannot be determined by looking at this scope in isolation.

(Of course, what really happens in the above example is that the Rust compiler realises that one is never used, and thus never actually constructs the value 1 in the first place. But from the type checker's perspective, 1 is created and dropped inside this scope.)

Passing Ownership

You might remember our Maybe type class from earlier, and the is_just function. It's time to revisit that, because it has a serious problem.

fn is_just<A>(maybe: Maybe<A>) -> bool {
    match maybe {
        Just(_) => true,
        None => false,
    }
}

When you call it, you actually pass ownership of the argument to the function—and the function does not hand ownership back! All you get back is a bool, either true or false, so in effect the value is lost! This is perhaps a bit unfortunate for a function like this.

What we could do is pass ownership back when we're done with it.

fn is_just<A>(maybe: Maybe<A>) -> (Maybe<A>, bool) {
    match maybe {
        Just(_) => (maybe, true),
        None => (maybe, false),
    }
}

This way, the function returns a tuple of the value you passed in and the boolean you requested, so ownership of the value now passes back to you as the caller, along with ownership of the boolean information you requested.

References

This might seem like a lot.

In Haskell, what happens when you call isJust with a Maybe argument is that you pass a reference to the Maybe into the function, so it doesn't actually consume the input value when you call it. This is also what you'd expect to happen in any other reasonable language.

So is there a way to call a Rust function with an argument that isn't consumed? Yes, by explicitly passing a reference to the value instead of passing the value itself. This is what the & prefix operator (on both types and values) does.

fn is_just<A>(maybe: &Maybe<A>) -> bool {
    match maybe {
        Just(_) => true,
        None => false,
    }
}

And, to call it without passing ownership:

let maybe = Maybe::Just("Joe");
is_just(&maybe)

If you use the special self argument, it's easier, as you don't even have to think about whether you're passing the argument by reference or not.

impl<A> Maybe<A> {
    fn is_just<A>(&self) -> bool {
        match self {
            Just(_) => true,
            None => false,
        }
    }
}

let maybe = Maybe::Just("Joe");
self.is_just()

The Lifetime Of References

Of course, in order for a reference to a value to be valid, the value itself has to exist. Therefore, references, as well as values, come with a lifetime. This is usually inferred by the type checker in syntax, but when you have to talk about lifetimes explicitly, they look like this: 'a. That is, an apostrophe ' followed by a lower case identifier. In the case of lifetimes, it is idiomatic in Rust to start with a. Yes, I wish they'd be consistent too.

So the lifetime of a reference is equal to the lifetime of the value it refers to. If you try to pass a reference outside of that lifetime, or, in other words, outside of the scope that owns the value, the type checked will complain at you that the reference has outlived the lifetime of its value. For instance, if your function creates a value and try to return a reference to it, you're going to have trouble with the type checker.

fn nope() -> &i64 {
    let three = 3;
    &three
}

You could do this very easily in C, of course, it's the equivalent to allocating a chunk of memory, immediately deallocating it, then returning a now extremely invalid pointer to it. Rust's type checker won't let you do that.

Mutable References

A regular reference is immutable. It means that when you pass a reference to something, all you can do is look at it, not change it. This way, it's safe to have as many references to a value floating around as you like, as long as none of them outlive the lifetime of the value.

There's another type of reference called a mutable reference. This is a way of temporarily passing full ownership to a value, so that you can modify it. At any given point in an object's lifetime, Rust guarantees that only a single mutable reference to it can exist. This means that another thing Rust's type checker will protect you from is accidental mutation.

fn set_if_just<A>(maybe: &mut Maybe<A>, value: A) {
    match maybe {
        Just(just) => {
            *just = value;
        }
        None => {}
    }
}

What this function does is get a mutable reference to the value inside of the Just case of maybe (because maybe is a mutable reference too, so just will be mutable reference). Then we overwrite the value just points to with value. This is what the * is for—it dereferences the reference, allowing us to use the assignment syntax = to put a new value in there.

What happens to the old value? Why, it goes out of scope and is dropped.

[HERE]

To Reiterate

Another adverse reaction I had to Rust initially is that there is not a single persistent data structure in sight. The standard list data type is Vec, which is just a sized array! An array, in this day and age. How do you even recurse over that?

Here's another shocker: Rust doesn't have tail call elimination, so, generally, you do not recurse over data structures. It wasn't designed by complete savages, though, so you won't be expected to increment counters and do bounds checking like an animal. What we have instead is the Iterator type class (or "trait," as the locals insist on calling type classes).

What you're about to see is going to be disturbing. Just like we have the do syntax in Haskell as syntactic sugar over binding warm fuzzy things, it turns out iterators are so common in Rust code that syntactic sugar for iteration becomes desirable. It looks like this:

let the_holy_trinity = ["functors", "applicatives", "warm fuzzy things"];

for item in the_holy_trinity {
    println!("{}", item);
}

I know what you're thinking. I see the for loop too. You have to keep in mind, though, that this expands to a loop which exhausts an Iterator, not a C style increment-counter-until-greater-than crime against all that is mathematically beautiful.

In fact, what this is syntactic sugar for looks like this:

let mut iterator = the_holy_trinity.iter();
while let Some(item) = iterator.next() {
    println!("{}", item);
}

It's still a loop, though, but without tail call elimination, what else can you do? Consider, even, that tail call elimination itself is in a real sense just syntactic sugar for a loop.

Recall how you'd iterate over a list in Haskell—let's write a function which sums the numbers in a list of integers:

sum :: [Integer] -> Integer
sum [] = 0
sum (x:xs) = x + sum xs

I can hear you sneering back there. This is the introductory course, though, it's how you'd write it if Haskell didn't come with a prelude, and it teaches you basic essential ideas like recursion and tail call elimination.

Of course, we're all adults here, we can do better:

sum :: (Foldable f) => f Integer -> Integer
sum = foldr (+) 0

Much better.

sum :: (Foldable f, Monoid m) => f m -> m
sum = fold

Even better!

Now look back at that squicky while loop. That's the equivalent of the Haskell function where we recurse over a list of integers like a sophomore. We can up our game in Rust too, because, just like how Foldable gives you more than just folds, Iterator comes with a lot of methods beyond next(), including, of course, fold().

fn sum<I: IntoIterator<Item=usize>>(i: I) -> usize {
    i.into_iter().fold(0, |a, b| a + b)
}

It's less succinct than the Haskell version, because Rust doesn't do currying, but next to that loop it looks positively delightful. Note that Iterator is doing the work of Foldable here.

We can do monoids in Rust too. First, the semigroup type class is called Add in Rust - anything which implements Add lets us use the + operator. A monoid is a semigroup with a zero value, and we don't have an equivalent type class in Rust, but we do have one for something which has a zero value: Default. Therefore, something which implements both Add and Default is a monoid{{footnote(name="3")}}. Thus:

fn sum<A: Add + Default, I: IntoIterator<Item=A>>(i: I) -> A {
    i.into_iter().fold(Default::default(), |a, b| a + b)
}

Right, but that's still a fold with a function. In Haskell, we can do fold over monoids to sum them. In Rust, we use the Sum trait, which defines a type which can be created by summing an iterator of another type. It's less elegant than fold, honestly, as it's not automatically defined for anything which implements Add, but it lets us do this:

fn sum<B: Sum<A>, I: Iterator<Item=A>>(i: I) -> B {
    i.into_iter().sum()
}

And that's as close as we get to the succinctness of the Haskell version. This isn't code golf, though—the takeaway here is that Iterator actually offers a considerable degree of abstraction beyond crude for loops, so it's not as bad as it seems at first glance. Even better, Rust is really good at optimising those iterators—our sum above, over a Vec, will be easily as fast as a for loop in C doing the same thing.

More to the point, we can manipulate crude arrays (and anything else that implements IntoIterator) with virtually the same elegance we manipulate lists (and anything else that implements Foldable) in Haskell. In this strange new land, it doesn't even matter if our data structures are persistent.

Oh, and by the way, Iterator is lazy. You can easily, say, write an iterator over an infinite stream of monotonically increasing numbers with Rust's std::iter::from_fn function, which is like a weird Rusty unfold:

let mut i: usize = 0;
let numbers_from_1 = std::iter::from_fn(|| {
    i += 1;
    Some(i)
});

Using the above iterator, you can produce a Vec containing a sequence of numbers like this:

let numbers: Vec<usize> = numbers_from_1.skip(5).take(5).collect();
assert_eq!(&[6, 7, 8, 9, 10], &numbers);

Note how collect() above takes the place of the fromFoldable idiom in Haskell: it will collect any Iterator into anything which implements FromIterator.

Let's Get Functorial

You might be wondering, at this point, about functors. You might have guessed that Iterator does indeed provide a [map][iterator::map] method. You might be feeling a little concerned because a.into_iter().map(|a| a).collect() seems like quite a detour just to modify the values inside a Vec, because what that code is doing is deconstructing a and collecting it into an entire new data structure, and while that's usually what goes on in Haskell too, Rust is supposed to be an efficient systems language.

In fact, the right way to map an endofunctor{{footnote(name="4")}} is going to freak you right out:

let mut numbers: Vec<usize> = vec![1, 2, 3, 4, 5];
for number in &mut numbers {
    *number += 1;
}
assert_eq!(&[2, 3, 4, 5, 6], &number);

That's right—a for loop. The &mut numbers bit gets us an iterator which yields mutable references to the elements inside the vector, allowing us to modify them directly. It's blindingly fast, and it's still completely pure.

You could also write it in a more Haskell minded way like this:

numbers.iter_mut().for_each(|number| *number += 1);

They're exact equivalents, down to the generated machine code, but one is generally more readable. And, this is the kicker, it's not the one liner. The one liner isn't even the more succinct one, if you'd like to count the characters.

This was another hard lesson for me: the one liner seems the obviously better piece of code, but to most people without a Haskell eye the for loop will be more readable, and in the end there's—rather shockingly, given the unapologetically imperative feel of the for loop—nothing else that distinguishes the two pieces of code. So you might as well go for the version which will be the least confusing to the largest number of people.

Close Encounters Of The Higher Kind

Hey now, sure, we can mutate an endofunctor in place, but this isn't really a proper functor! To be able to abstract over it, we'll need a way to describe an operation from a functor of A to a functor of B. In other words:

trait Functor {
    fn map<A, B, F: Fn(A) -> B>(input: Self<A>, map: F) -> Self<B>;
}

Unfortunately, I just made that syntax up. Rust, sadly, is currently lacking higher kinded types altogether, so you can't express a relationship like this in its type system.

The good news is that it turns out that Iterator can do most of the heavy lifting you'd expect from a functor hierarchy in practice. For instance, you can, as alluded to earlier, map a functor using the [map][iterator::map] method on Iterator like this:

let numbers: Vec<usize> = vec![1, 2, 3];
let strings: Vec<String> = numbers.into_iter().map(|n| n.to_string()).collect();

In the same vein, instead of your functions taking Functor arguments when you intend to map over them, have them take IntoIterator instead, and if you mean to return a Functor, return your Iterator instead, allowing the user to collect() it, or return a type variable constrained to FromIterator and do the collect() inside your function.

Footnotes

{% define_footnote(name="1") %} Except for the appalling double colon to introduce types, which, luckily, has been corrected in the superior Idris language. {% end %}

{% define_footnote(name="3") %} Someone really should start a petition to add trait Monoid: Add + Default {} to the standard library. {% end %}

{% define_footnote(name="4") %} An endofunctor is just a functor from a category back to the same category. In other words, it's a functor whose mapping function returns a value of the same type as the input value. What's the problem? {% end %}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment