bodil/index.md Secret

## index.md

      
    Raw
  

              index.md
            
          
    I Don't Think We're In Glasgow Any More: A Rust Survival Guide for Haskell Programmers

When I first started learning Rust, the language that was freshest in my mind, and which seemed the
most immediately relevant as a point of reference, was Haskell. In the end, this turned out to be
both a good idea and a terrible one. The two languages are surprisingly similar, so much that as a
Haskell programmer you start out with a significant head start when learning Rust, but if you bring
all your idioms and expectations over from one to the other, you will be writing code that is
genuinely terrible. The trick lies in understanding, and accepting, what makes Rust different from
Haskell, and why what's right in one context isn't always right in another.
This is the guide I wish I'd had back then as a Haskell programmer learning Rust.
A Primer

This chapter aims to teach you, very superficially, how to write code in Rust by leaning on what you
already know about Haskell and explaining the differences.
Basics

I think we can all agree that Haskell possesses the clearest, most beautiful syntax of all
languages{{footnote(name="1")}}. Unfortunately, Rust has gone down a more traditional path. If
you're familiar with C++, Java or any of the other bastard children of Thompson, Kernighan and
Richie, you'll find Rust syntax equally familiar, but underneath all the squiggly brackets hides a
language perhaps more reminiscent of ML than C. Let's have a look.
Functions

Here's a simple function declaration in Haskell, with types.
add :: Integer -> Integer -> Integer
add a b = a + b
And here's the Rust equivalent.
fn add(a: i64, b: i64) -> i64 {
    a + b
}
We note a few things: Rust, being a "systems language," cares a lot about the representation of
numeric types compared to Haskell. There are types for every size of integer: i8, i16, i32,
i64 and even i128, with the equivalent unsigned versions u8 through u128, plus isize and
usize denoting the platform preferred integer size (on 64-bit CPUs, that's 64 bits). There's also
bool for booleans, which can be thought of as 1-bit integers as well as logical values.
Like C++ and Java, Rust mixes type declarations in with the function declaration, but in a momentary
flash of good taste, at least it puts the type after the binding, not before. Type declarations are
also mandatory at this point, though Rust does have type inference generally, and in any case, while
they're not mandatory in Haskell, I think we can all agree they ought to be.
We also note that Rust does not do currying by default. The add function takes two arguments and
anything else would be an error. You could define a function which takes one argument and returns
another function which takes one argument and returns the result, but you're going to have to do the
work yourself.
Finally, we note the absence of the return statement you'd expect in a Java function: in another
flash of good taste, everything that could go inside a Rust function, with the exception of let
statements, is an expression, and a function in Rust takes exactly one expression and returns its
result.
Type Variables

Let's try this again with type variables: here's the ever popular identity function.
id :: a -> a
id a = a
fn id<A>(a: A) -> A {
    a
}
All types in Rust start with an upper case letter, including type variables, but excepting primitive
types like i64 etc. The convention is that type variables have single letter names, and that they
start at T, rather than A, but in my opinion this is misguided and I will start my type
variables at A until I draw my dying breath.
Pattern Matching

Rust has pattern matching at the same level Haskell does, but not on function declarations.
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = n + fib (n - 1)
fn fib(n: usize) -> usize {
    match n {
        0 => 0,
        1 => 1,
        n => n + fib(n - 1),
    }
}
ADTs

Rust has algebraic data types, though it makes the distinction between structs, with a single type
constructor, and enums, with multiple.
data Pair a b = Pair a b
data Maybe a = Just a | None
struct Pair<A, B>(A, B);

enum Maybe<A> {
    Just(A),
    None,
}
Mind that the type constructors inside an enum are scoped to it, not to the scope you're in, so in
order to construct a Just, you would use Maybe::Just(a) rather than Just(a). You could also
import it into the current scope with a use statement, but they're out of scope for this guide.
Pattern Matching ADTs

You can pattern match on these as you'd expect, but note again how Rust requires a match block
because the function syntax doesn't pattern match for you.
isJust :: Maybe a -> Bool
isJust (Just _) = True
isJust None = False
fn is_just<A>(maybe: Maybe<A>) -> bool {
    match maybe {
        Just(_) => true,
        None => false,
    }
}
Methods

A point where Rust starts to diverge from Haskell, which has stridently resisted anything resembling
object orientedness, is in allowing you to associate functions with ADTs. Going with the Maybe
type we defined above, we can define is_just as a "method" on Maybe like this, with an impl
block.
impl<A> Maybe<A> {
    fn is_just(self) -> bool {
        match maybe {
            Just(_) => true,
            None => false,
        }
    }
}
Note the self argument, which has special meaning: self refers to the instance of Maybe you
call it on, so that maybe.is_just() would be the same as calling is_just(maybe) using our
standalone function declaration above.
If you know Rust, you'll spot a problem with the is_just declaration having to do with ownership,
but we'll get to that later. For now, let's stay focussed on syntax and charge on like ownership
isn't even a thing.
By the way, the Maybe type is in the Rust standard library, except it's called Option. Likewise,
Either exists and is called Result. They work the same way they do in Haskell.
Type Classes

Perhaps the best thing about Rust is that it has type classes in exactly the same way Haskell does,
except they're called "traits."
class Eq a where
  (==) :: a -> a -> Bool

instance (Eq (Maybe a)) where
  Just left == Just right = left == right
  None == None = True
  _ == _ = False
trait Eq<A> {
    fn eq(left: A, right: A) -> bool;
}

impl<A> Eq<A> for Maybe<A> {
    fn eq(left: A, right: A) -> bool {
        match (left, right) {
            (Just(left), Just(right)) => left == right,
            (None, None) => true,
            _ => false,
        }
    }
}
We could also have implemented eq using the self keyword that we know from method declarations,
which is valid in trait impls the same way it is in bare impls.
Of course, Rust already has an Eq type class, and even a PartialEq type class, which is the one
that implements the eq method in reality, because some things, like floating point numbers, have
funny equality rules for some values, like NaN, which is not equal to itself, so when you're
implementing equality for your types, you'll generally define PartialEq for them, and then add
Eq as a marker trait saying your type isn't as funny as floating point numbers and you can always
rely on identical values being equal. You can read about the peculiarities of Eq and PartialEq
in the Rust API docs if you're interested, and there's a similar distinction between Ord and
PartialOrd.
In Rust, like in Haskell, implementing Eq for a type lets you use the == operator on values of
that type, and implementing Ord lets you use >, < and friends. Unlike in Haskell, though,
these operators are built in and inextricably tied to their corresponding type classes, and you
can't declare new ones. The most excruciating example of this to a Haskell programmer is >>=,
which means "bit shift right and assign," and the type signature of the corresponding ShrAssign
type class is incompatible with what you might have wanted it to do.
Conditionals

Rust's conditional statements are identical to Haskell's, save for the syntax.
evenOrOdd :: Integer -> String
evenOrOdd n = if (n `div` 2) * 2 == n then "even" else "odd"
fn even_or_odd(n: isize) -> &'static str {
    if (n / 2) * 2 == n {
        "even"
    } else {
        "odd"
    }
}
You might be wondering about the &'static str. That's a reference with a lifetime, and we'll get
to that shortly. Let's get through the basic syntax first, though.
Conditionals In Pattern Matching

Of course, in Haskell one would prefer to write the above using a conditional pattern match rather
than an if statement. In Rust, I think the if statement reads better in this case, but this is how
you write a pattern guard in Rust.
evenOrOdd :: Integer -> String
evenOrOdd n | (n `div` 2) * 2 == n = "even"
evenOrOdd _ = "odd"
fn even_or_odd(n: isize) -> &'static str {
    match n {
        n if (n / 2) * 2 == n => "even",
        _ => "odd",
    }
}
Type Class Constraints

Type class constraints on type variables can be declared in one of two ways in Rust, one of which is
a shorthand with certain restrictions. Let's illustrate with another absurdly contrived function.
eqInEnglish :: (Eq a) => a -> a -> String
eqInEnglish left right | left == right = "yes"
eqInEnglish _ _ = "no"
fn eq_in_english<A>(left: A, right: A) -> &'static str where A: Eq {
    if left == right {
        "yes"
    } else {
        "no"
    }
}
The shorthand looks like this, and will only let you declare direct constraints on a type variable.
In the majority of cases, this is enough, but the where variant is generally preferred for
readability in all cases.
fn eq_in_english<A: Eq>(left: A, right: A) -> &'static str {
    if left == right {
        "yes"
    } else {
        "no"
    }
}
Being Imperative

We already noted that Rust functions take a single expression, but Rust is also, in a very real
sense, an imperative language, in that it does not require you to do I/O inside the IO monad, and
functions can perform several I/O operations in sequence. How do we reconcile this apparent
contradiction?
helloMovieStars :: IO ()
helloMovieStars = do
  putStrLn "Hello Joe!"
  putStrLn "Hello Mike!"
  putStrLn "Hello Robert!"
The Haskell version, as you can see, uses do notation to chain together the three IO monads. The
Rust version looks a lot more like an imperative programming language like Java or C++, but the
syntax is slightly deceptive. Let's have a look.
fn hello_movie_stars() {
    println!("Hello Joe!");
    println!("Hello Mike!");
    println!("Hello Robert!");
}
This is the first time we encounter the semicolon ; as, seemingly, a statement terminator, like in
Java. If there were a return value expected, in Java, you'd expect to see a final
return something; statement.
In Rust, the semicolon works a little differently. While, in practice, it can be seen as a statement
terminator, what it actually does is combine two expressions and discard the result of the first
one, leaving only whatever side effects it might have had. The three semicolons above can thus be
read as "evaluate the first print, then the next print, then the final print, then evaluate the
final expression and return its value." There's no final expression, or, rather, an empty function
body, or nothing following a semicolon, implies the unit value (). As the absence of a -> and a
return type also implies the unit type, this works out nicely.
fn hello_joe_and_number() -> isize {
    println!("Hello Joe!");
    25
}
In the above function, what the semicolon is saying is "evaluate the expression to the left of me
and the expression to the right of me, and return the result of the right hand side only." You can
think of the left hand expression (the println!) as returning an IO monad that is being directly
evaluated, if you like. What it actually might return is what the IO monad would bind to, which is
a Result type with a () (the unit type in both Rust and Haskell) and an std::io::Error. In the
case of the println! macro, though, it would simply panic if there's an error, so the actual
return type is just the unit type ().
If you're curious, the number returned by the function is the average number of years since Joe
solved the problem you're dealing with right now.
Let Bindings

This leads us to the only thing that isn't an expression inside a function body, and which therefore
has to be followed by a semicolon: the let statement.
This is functionally identical to a let binding in Haskell, except that it takes only a single
binding. The syntax makes it unproblematic to chain up as many let statements as you like, though.
Each let still starts a new scope, as you would if you nested let bindings in Haskell, but this
is never going to be a problem.
plusThree :: Integer -> Integer
plusThree n =
  let one = 1
      two = 2
  in n + one + two
fn plus_three(n: i64) -> i64 {
    let one = 1;
    let two = 2;
    n + one + two
}
There's no Rust equivalent to the where binding.
Ownership

We've covered how Rust and Haskell differ in syntax for similar concepts, and how Rust differs
slightly in having method syntax for ADTs and type classes. Now it's time to have a look at how they
differ.
No Garbage Collector

Rust is not a garbage collected language. It also deals with garbage collection for you, unlike C
and C++. The difference is that Rust is able to determine statically when an object needs to be
cleaned up. It tracks the lifetime of any object, and when it goes out of scope, or, rather, when
the scope that currently owns it goes away, the Rust compiler will automatically insert code to have
it cleaned up for you.
A scope, in this context, is either a function body, or an explicit scope surrounded by { and }.
You can nest these inside a function body as you like, and you'll note by this logic a function body
is a scope delimited by { and } too.
When you create a value, it's owned by the scope in which you created it. When that scope ends, if
it still owns the value, the value is deallocated.
{
    let one = 1;
    let two = 2;
    two
}
A scope is also an expression, with the normal evaluation rules, so this scope evaluates to the
integer value 2. It creates two values, and bind them to one and two. The value of two is
returned, and ownership of it thus passes to the outer scope, or the function's caller, whatever the
outer scope is. The value of one, on the other hand, is still owned by the scope when it ends, so
it's deallocated ("dropped," as the Rust terminology goes) along with the scope.
Thus, the lifetime of one is equal to the lifetime of the scope. The lifetime of two, on the
other hand, depends on the scope that takes over ownership of it, and cannot be determined by
looking at this scope in isolation.
(Of course, what really happens in the above example is that the Rust compiler realises that one
is never used, and thus never actually constructs the value 1 in the first place. But from the
type checker's perspective, 1 is created and dropped inside this scope.)
Passing Ownership

You might remember our Maybe type class from earlier, and the is_just function. It's time to
revisit that, because it has a serious problem.
fn is_just<A>(maybe: Maybe<A>) -> bool {
    match maybe {
        Just(_) => true,
        None => false,
    }
}
When you call it, you actually pass ownership of the argument to the function—and the function
does not hand ownership back! All you get back is a bool, either true or false, so in effect
the value is lost! This is perhaps a bit unfortunate for a function like this.
What we could do is pass ownership back when we're done with it.
fn is_just<A>(maybe: Maybe<A>) -> (Maybe<A>, bool) {
    match maybe {
        Just(_) => (maybe, true),
        None => (maybe, false),
    }
}
This way, the function returns a tuple of the value you passed in and the boolean you requested, so
ownership of the value now passes back to you as the caller, along with ownership of the boolean
information you requested.
References

This might seem like a lot.
In Haskell, what happens when you call isJust with a Maybe argument is that you pass a
reference to the Maybe into the function, so it doesn't actually consume the input value when
you call it. This is also what you'd expect to happen in any other reasonable language.
So is there a way to call a Rust function with an argument that isn't consumed? Yes, by explicitly
passing a reference to the value instead of passing the value itself. This is what the & prefix
operator (on both types and values) does.
fn is_just<A>(maybe: &Maybe<A>) -> bool {
    match maybe {
        Just(_) => true,
        None => false,
    }
}
And, to call it without passing ownership:
let maybe = Maybe::Just("Joe");
is_just(&maybe)
If you use the special self argument, it's easier, as you don't even have to think about whether
you're passing the argument by reference or not.
impl<A> Maybe<A> {
    fn is_just<A>(&self) -> bool {
        match self {
            Just(_) => true,
            None => false,
        }
    }
}

let maybe = Maybe::Just("Joe");
self.is_just()
The Lifetime Of References

Of course, in order for a reference to a value to be valid, the value itself has to exist.
Therefore, references, as well as values, come with a lifetime. This is usually inferred by the
type checker in syntax, but when you have to talk about lifetimes explicitly, they look like this:
'a. That is, an apostrophe ' followed by a lower case identifier. In the case of lifetimes, it
is idiomatic in Rust to start with a. Yes, I wish they'd be consistent too.
So the lifetime of a reference is equal to the lifetime of the value it refers to. If you try to
pass a reference outside of that lifetime, or, in other words, outside of the scope that owns the
value, the type checked will complain at you that the reference has outlived the lifetime of its
value. For instance, if your function creates a value and try to return a reference to it, you're
going to have trouble with the type checker.
fn nope() -> &i64 {
    let three = 3;
    &three
}
You could do this very easily in C, of course, it's the equivalent to allocating a chunk of memory,
immediately deallocating it, then returning a now extremely invalid pointer to it. Rust's type
checker won't let you do that.
Mutable References

A regular reference is immutable. It means that when you pass a reference to something, all you
can do is look at it, not change it. This way, it's safe to have as many references to a value
floating around as you like, as long as none of them outlive the lifetime of the value.
There's another type of reference called a mutable reference. This is a way of temporarily passing
full ownership to a value, so that you can modify it. At any given point in an object's lifetime,
Rust guarantees that only a single mutable reference to it can exist. This means that another
thing Rust's type checker will protect you from is accidental mutation.
fn set_if_just<A>(maybe: &mut Maybe<A>, value: A) {
    match maybe {
        Just(just) => {
            *just = value;
        }
        None => {}
    }
}
What this function does is get a mutable reference to the value inside of the Just case of maybe
(because maybe is a mutable reference too, so just will be mutable reference). Then we overwrite
the value just points to with value. This is what the * is for—it dereferences the
reference, allowing us to use the assignment syntax = to put a new value in there.
What happens to the old value? Why, it goes out of scope and is dropped.
[HERE]

To Reiterate

Another adverse reaction I had to Rust initially is that there is not a single persistent data
structure in sight. The standard list data type is Vec, which is just a sized array! An
array, in this day and age. How do you even recurse over that?
Here's another shocker: Rust doesn't have tail call elimination, so, generally, you do not recurse
over data structures. It wasn't designed by complete savages, though, so you won't be expected to
increment counters and do bounds checking like an animal. What we have instead is the
Iterator type class (or "trait," as the locals insist on calling type classes).
What you're about to see is going to be disturbing. Just like we have the do syntax in Haskell as
syntactic sugar over binding warm fuzzy things, it turns out iterators are so common in Rust code
that syntactic sugar for iteration becomes desirable. It looks like this:
let the_holy_trinity = ["functors", "applicatives", "warm fuzzy things"];

for item in the_holy_trinity {
    println!("{}", item);
}
I know what you're thinking. I see the for loop too. You have to keep in mind, though, that this
expands to a loop which exhausts an Iterator, not a C style
increment-counter-until-greater-than crime against all that is mathematically beautiful.
In fact, what this is syntactic sugar for looks like this:
let mut iterator = the_holy_trinity.iter();
while let Some(item) = iterator.next() {
    println!("{}", item);
}
It's still a loop, though, but without tail call elimination, what else can you do? Consider,
even, that tail call elimination itself is in a real sense just syntactic sugar for a loop.
Recall how you'd iterate over a list in Haskell—let's write a function which sums the numbers in a
list of integers:
sum :: [Integer] -> Integer
sum [] = 0
sum (x:xs) = x + sum xs
I can hear you sneering back there. This is the introductory course, though, it's how you'd write it
if Haskell didn't come with a prelude, and it teaches you basic essential ideas like recursion and
tail call elimination.
Of course, we're all adults here, we can do better:
sum :: (Foldable f) => f Integer -> Integer
sum = foldr (+) 0
Much better.
sum :: (Foldable f, Monoid m) => f m -> m
sum = fold
Even better!
Now look back at that squicky while loop. That's the equivalent of the Haskell function where we
recurse over a list of integers like a sophomore. We can up our game in Rust too, because, just like
how Foldable gives you more than just folds, Iterator comes with a lot of methods
beyond next(), including, of course, fold().
fn sum<I: IntoIterator<Item=usize>>(i: I) -> usize {
    i.into_iter().fold(0, |a, b| a + b)
}
It's less succinct than the Haskell version, because Rust doesn't do currying, but next to that loop
it looks positively delightful. Note that Iterator is doing the work of Foldable
here.
We can do monoids in Rust too. First, the semigroup type class is called Add in Rust -
anything which implements Add lets us use the + operator. A monoid is a semigroup with a
zero value, and we don't have an equivalent type class in Rust, but we do have one for something
which has a zero value: Default. Therefore, something which implements both
Add and Default is a monoid{{footnote(name="3")}}. Thus:
fn sum<A: Add + Default, I: IntoIterator<Item=A>>(i: I) -> A {
    i.into_iter().fold(Default::default(), |a, b| a + b)
}
Right, but that's still a fold with a function. In Haskell, we can do fold over monoids to sum
them. In Rust, we use the Sum trait, which defines a type which can be created by summing
an iterator of another type. It's less elegant than fold, honestly, as it's not automatically
defined for anything which implements Add, but it lets us do this:
fn sum<B: Sum<A>, I: Iterator<Item=A>>(i: I) -> B {
    i.into_iter().sum()
}
And that's as close as we get to the succinctness of the Haskell version. This isn't code golf,
though—the takeaway here is that Iterator actually offers a considerable degree of
abstraction beyond crude for loops, so it's not as bad as it seems at first glance. Even better,
Rust is really good at optimising those iterators—our sum above, over a Vec, will be
easily as fast as a for loop in C doing the same thing.
More to the point, we can manipulate crude arrays (and anything else that implements
IntoIterator) with virtually the same elegance we manipulate lists (and anything
else that implements Foldable) in Haskell. In this strange new land, it doesn't even matter if our
data structures are persistent.
Oh, and by the way, Iterator is lazy. You can easily, say, write an iterator over an
infinite stream of monotonically increasing numbers with Rust's std::iter::from_fn
function, which is like a weird Rusty unfold:
let mut i: usize = 0;
let numbers_from_1 = std::iter::from_fn(|| {
    i += 1;
    Some(i)
});
Using the above iterator, you can produce a Vec containing a sequence of numbers like this:
let numbers: Vec<usize> = numbers_from_1.skip(5).take(5).collect();
assert_eq!(&[6, 7, 8, 9, 10], &numbers);
Note how collect() above takes the place of the fromFoldable idiom in
Haskell: it will collect any Iterator into anything which implements
FromIterator.
Let's Get Functorial

You might be wondering, at this point, about functors. You might have guessed that
Iterator does indeed provide a [map][iterator::map] method. You might be feeling a
little concerned because a.into_iter().map(|a| a).collect() seems like quite a detour just to
modify the values inside a Vec, because what that code is doing is deconstructing a and
collecting it into an entire new data structure, and while that's usually what goes on in Haskell
too, Rust is supposed to be an efficient systems language.
In fact, the right way to map an endofunctor{{footnote(name="4")}} is going to freak you right out:
let mut numbers: Vec<usize> = vec![1, 2, 3, 4, 5];
for number in &mut numbers {
    *number += 1;
}
assert_eq!(&[2, 3, 4, 5, 6], &number);
That's right—a for loop. The &mut numbers bit gets us an iterator which yields mutable
references to the elements inside the vector, allowing us to modify them directly. It's blindingly
fast, and it's still completely pure.
You could also write it in a more Haskell minded way like this:
numbers.iter_mut().for_each(|number| *number += 1);
They're exact equivalents, down to the generated machine code, but one is generally more readable.
And, this is the kicker, it's not the one liner. The one liner isn't even the more succinct one, if
you'd like to count the characters.
This was another hard lesson for me: the one liner seems the obviously better piece of code, but to
most people without a Haskell eye the for loop will be more readable, and in the end
there's—rather shockingly, given the unapologetically imperative feel of the for loop—nothing else
that distinguishes the two pieces of code. So you might as well go for the version which will be the
least confusing to the largest number of people.
Close Encounters Of The Higher Kind

Hey now, sure, we can mutate an endofunctor in place, but this isn't really a proper functor! To be
able to abstract over it, we'll need a way to describe an operation from a functor of A to a
functor of B. In other words:
trait Functor {
    fn map<A, B, F: Fn(A) -> B>(input: Self<A>, map: F) -> Self<B>;
}
Unfortunately, I just made that syntax up. Rust, sadly, is currently lacking higher kinded types
altogether, so you can't express a relationship like this in its type system.
The good news is that it turns out that Iterator can do most of the heavy lifting
you'd expect from a functor hierarchy in practice. For instance, you can, as alluded to earlier, map
a functor using the [map][iterator::map] method on Iterator like this:
let numbers: Vec<usize> = vec![1, 2, 3];
let strings: Vec<String> = numbers.into_iter().map(|n| n.to_string()).collect();
In the same vein, instead of your functions taking Functor arguments when you intend to map over
them, have them take IntoIterator instead, and if you mean to return a Functor,
return your Iterator instead, allowing the user to collect()
it, or return a type variable constrained to FromIterator and do the
collect() inside your function.
Footnotes

{% define_footnote(name="1") %} Except for the appalling double colon to introduce types, which,
luckily, has been corrected in the superior Idris language. {% end %}
{% define_footnote(name="3") %} Someone really should start a petition to add
trait Monoid: Add + Default {} to the standard library. {% end %}
{% define_footnote(name="4") %} An endofunctor is just a functor from a category back to the same
category. In other words, it's a functor whose mapping function returns a value of the same type as
the input value. What's the problem? {% end %}