When I first started learning Rust, the language that was freshest in my mind, and which seemed the most immediately relevant as a point of reference, was Haskell. In the end, this turned out to be both a good idea and a terrible one. The two languages are surprisingly similar, so much that as a Haskell programmer you start out with a significant head start when learning Rust, but if you bring all your idioms and expectations over from one to the other, you will be writing code that is genuinely terrible. The trick lies in understanding, and accepting, what makes Rust different from Haskell, and why what's right in one context isn't always right in another.
This is the guide I wish I'd had back then as a Haskell programmer learning Rust.
This chapter aims to teach you, very superficially, how to write code in Rust by leaning on what you already know about Haskell and explaining the differences.
I think we can all agree that Haskell possesses the clearest, most beautiful syntax of all languages{{footnote(name="1")}}. Unfortunately, Rust has gone down a more traditional path. If you're familiar with C++, Java or any of the other bastard children of Thompson, Kernighan and Richie, you'll find Rust syntax equally familiar, but underneath all the squiggly brackets hides a language perhaps more reminiscent of ML than C. Let's have a look.
Here's a simple function declaration in Haskell, with types.
add :: Integer -> Integer -> Integer
add a b = a + b
And here's the Rust equivalent.
fn add(a: i64, b: i64) -> i64 {
a + b
}
We note a few things: Rust, being a "systems language," cares a lot about the representation of
numeric types compared to Haskell. There are types for every size of integer: i8
, i16
, i32
,
i64
and even i128
, with the equivalent unsigned versions u8
through u128
, plus isize
and
usize
denoting the platform preferred integer size (on 64-bit CPUs, that's 64 bits). There's also
bool
for booleans, which can be thought of as 1-bit integers as well as logical values.
Like C++ and Java, Rust mixes type declarations in with the function declaration, but in a momentary flash of good taste, at least it puts the type after the binding, not before. Type declarations are also mandatory at this point, though Rust does have type inference generally, and in any case, while they're not mandatory in Haskell, I think we can all agree they ought to be.
We also note that Rust does not do currying by default. The add
function takes two arguments and
anything else would be an error. You could define a function which takes one argument and returns
another function which takes one argument and returns the result, but you're going to have to do the
work yourself.
Finally, we note the absence of the return
statement you'd expect in a Java function: in another
flash of good taste, everything that could go inside a Rust function, with the exception of let
statements, is an expression, and a function in Rust takes exactly one expression and returns its
result.
Let's try this again with type variables: here's the ever popular identity function.
id :: a -> a
id a = a
fn id<A>(a: A) -> A {
a
}
All types in Rust start with an upper case letter, including type variables, but excepting primitive
types like i64
etc. The convention is that type variables have single letter names, and that they
start at T
, rather than A
, but in my opinion this is misguided and I will start my type
variables at A
until I draw my dying breath.
Rust has pattern matching at the same level Haskell does, but not on function declarations.
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = n + fib (n - 1)
fn fib(n: usize) -> usize {
match n {
0 => 0,
1 => 1,
n => n + fib(n - 1),
}
}
Rust has algebraic data types, though it makes the distinction between struct
s, with a single type
constructor, and enum
s, with multiple.
data Pair a b = Pair a b
data Maybe a = Just a | None
struct Pair<A, B>(A, B);
enum Maybe<A> {
Just(A),
None,
}
Mind that the type constructors inside an enum
are scoped to it, not to the scope you're in, so in
order to construct a Just
, you would use Maybe::Just(a)
rather than Just(a)
. You could also
import it into the current scope with a use
statement, but they're out of scope for this guide.
You can pattern match on these as you'd expect, but note again how Rust requires a match
block
because the function syntax doesn't pattern match for you.
isJust :: Maybe a -> Bool
isJust (Just _) = True
isJust None = False
fn is_just<A>(maybe: Maybe<A>) -> bool {
match maybe {
Just(_) => true,
None => false,
}
}
A point where Rust starts to diverge from Haskell, which has stridently resisted anything resembling
object orientedness, is in allowing you to associate functions with ADTs. Going with the Maybe
type we defined above, we can define is_just
as a "method" on Maybe
like this, with an impl
block.
impl<A> Maybe<A> {
fn is_just(self) -> bool {
match maybe {
Just(_) => true,
None => false,
}
}
}
Note the self
argument, which has special meaning: self
refers to the instance of Maybe
you
call it on, so that maybe.is_just()
would be the same as calling is_just(maybe)
using our
standalone function declaration above.
If you know Rust, you'll spot a problem with the is_just
declaration having to do with ownership,
but we'll get to that later. For now, let's stay focussed on syntax and charge on like ownership
isn't even a thing.
By the way, the Maybe
type is in the Rust standard library, except it's called Option
. Likewise,
Either
exists and is called Result
. They work the same way they do in Haskell.
Perhaps the best thing about Rust is that it has type classes in exactly the same way Haskell does, except they're called "traits."
class Eq a where
(==) :: a -> a -> Bool
instance (Eq (Maybe a)) where
Just left == Just right = left == right
None == None = True
_ == _ = False
trait Eq<A> {
fn eq(left: A, right: A) -> bool;
}
impl<A> Eq<A> for Maybe<A> {
fn eq(left: A, right: A) -> bool {
match (left, right) {
(Just(left), Just(right)) => left == right,
(None, None) => true,
_ => false,
}
}
}
We could also have implemented eq
using the self
keyword that we know from method declarations,
which is valid in trait impl
s the same way it is in bare impl
s.
Of course, Rust already has an Eq
type class, and even a PartialEq
type class, which is the one
that implements the eq
method in reality, because some things, like floating point numbers, have
funny equality rules for some values, like NaN
, which is not equal to itself, so when you're
implementing equality for your types, you'll generally define PartialEq
for them, and then add
Eq
as a marker trait saying your type isn't as funny as floating point numbers and you can always
rely on identical values being equal. You can read about the peculiarities of Eq
and PartialEq
in the Rust API docs if you're interested, and there's a similar distinction between Ord
and
PartialOrd
.
In Rust, like in Haskell, implementing Eq
for a type lets you use the ==
operator on values of
that type, and implementing Ord
lets you use >
, <
and friends. Unlike in Haskell, though,
these operators are built in and inextricably tied to their corresponding type classes, and you
can't declare new ones. The most excruciating example of this to a Haskell programmer is >>=
,
which means "bit shift right and assign," and the type signature of the corresponding ShrAssign
type class is incompatible with what you might have wanted it to do.
Rust's conditional statements are identical to Haskell's, save for the syntax.
evenOrOdd :: Integer -> String
evenOrOdd n = if (n `div` 2) * 2 == n then "even" else "odd"
fn even_or_odd(n: isize) -> &'static str {
if (n / 2) * 2 == n {
"even"
} else {
"odd"
}
}
You might be wondering about the &'static str
. That's a reference with a lifetime, and we'll get
to that shortly. Let's get through the basic syntax first, though.
Of course, in Haskell one would prefer to write the above using a conditional pattern match rather
than an if
statement. In Rust, I think the if statement reads better in this case, but this is how
you write a pattern guard in Rust.
evenOrOdd :: Integer -> String
evenOrOdd n | (n `div` 2) * 2 == n = "even"
evenOrOdd _ = "odd"
fn even_or_odd(n: isize) -> &'static str {
match n {
n if (n / 2) * 2 == n => "even",
_ => "odd",
}
}
Type class constraints on type variables can be declared in one of two ways in Rust, one of which is a shorthand with certain restrictions. Let's illustrate with another absurdly contrived function.
eqInEnglish :: (Eq a) => a -> a -> String
eqInEnglish left right | left == right = "yes"
eqInEnglish _ _ = "no"
fn eq_in_english<A>(left: A, right: A) -> &'static str where A: Eq {
if left == right {
"yes"
} else {
"no"
}
}
The shorthand looks like this, and will only let you declare direct constraints on a type variable.
In the majority of cases, this is enough, but the where
variant is generally preferred for
readability in all cases.
fn eq_in_english<A: Eq>(left: A, right: A) -> &'static str {
if left == right {
"yes"
} else {
"no"
}
}
We already noted that Rust functions take a single expression, but Rust is also, in a very real
sense, an imperative language, in that it does not require you to do I/O inside the IO
monad, and
functions can perform several I/O operations in sequence. How do we reconcile this apparent
contradiction?
helloMovieStars :: IO ()
helloMovieStars = do
putStrLn "Hello Joe!"
putStrLn "Hello Mike!"
putStrLn "Hello Robert!"
The Haskell version, as you can see, uses do
notation to chain together the three IO
monads. The
Rust version looks a lot more like an imperative programming language like Java or C++, but the
syntax is slightly deceptive. Let's have a look.
fn hello_movie_stars() {
println!("Hello Joe!");
println!("Hello Mike!");
println!("Hello Robert!");
}
This is the first time we encounter the semicolon ;
as, seemingly, a statement terminator, like in
Java. If there were a return value expected, in Java, you'd expect to see a final
return something;
statement.
In Rust, the semicolon works a little differently. While, in practice, it can be seen as a statement
terminator, what it actually does is combine two expressions and discard the result of the first
one, leaving only whatever side effects it might have had. The three semicolons above can thus be
read as "evaluate the first print, then the next print, then the final print, then evaluate the
final expression and return its value." There's no final expression, or, rather, an empty function
body, or nothing following a semicolon, implies the unit value ()
. As the absence of a ->
and a
return type also implies the unit type, this works out nicely.
fn hello_joe_and_number() -> isize {
println!("Hello Joe!");
25
}
In the above function, what the semicolon is saying is "evaluate the expression to the left of me
and the expression to the right of me, and return the result of the right hand side only." You can
think of the left hand expression (the println!
) as returning an IO
monad that is being directly
evaluated, if you like. What it actually might return is what the IO
monad would bind to, which is
a Result
type with a ()
(the unit type in both Rust and Haskell) and an std::io::Error
. In the
case of the println!
macro, though, it would simply panic if there's an error, so the actual
return type is just the unit type ()
.
If you're curious, the number returned by the function is the average number of years since Joe solved the problem you're dealing with right now.
This leads us to the only thing that isn't an expression inside a function body, and which therefore
has to be followed by a semicolon: the let
statement.
This is functionally identical to a let
binding in Haskell, except that it takes only a single
binding. The syntax makes it unproblematic to chain up as many let
statements as you like, though.
Each let
still starts a new scope, as you would if you nested let
bindings in Haskell, but this
is never going to be a problem.
plusThree :: Integer -> Integer
plusThree n =
let one = 1
two = 2
in n + one + two
fn plus_three(n: i64) -> i64 {
let one = 1;
let two = 2;
n + one + two
}
There's no Rust equivalent to the where
binding.
We've covered how Rust and Haskell differ in syntax for similar concepts, and how Rust differs slightly in having method syntax for ADTs and type classes. Now it's time to have a look at how they differ.
Rust is not a garbage collected language. It also deals with garbage collection for you, unlike C and C++. The difference is that Rust is able to determine statically when an object needs to be cleaned up. It tracks the lifetime of any object, and when it goes out of scope, or, rather, when the scope that currently owns it goes away, the Rust compiler will automatically insert code to have it cleaned up for you.
A scope, in this context, is either a function body, or an explicit scope surrounded by {
and }
.
You can nest these inside a function body as you like, and you'll note by this logic a function body
is a scope delimited by {
and }
too.
When you create a value, it's owned by the scope in which you created it. When that scope ends, if it still owns the value, the value is deallocated.
{
let one = 1;
let two = 2;
two
}
A scope is also an expression, with the normal evaluation rules, so this scope evaluates to the
integer value 2
. It creates two values, and bind them to one
and two
. The value of two
is
returned, and ownership of it thus passes to the outer scope, or the function's caller, whatever the
outer scope is. The value of one
, on the other hand, is still owned by the scope when it ends, so
it's deallocated ("dropped," as the Rust terminology goes) along with the scope.
Thus, the lifetime of one
is equal to the lifetime of the scope. The lifetime of two
, on the
other hand, depends on the scope that takes over ownership of it, and cannot be determined by
looking at this scope in isolation.
(Of course, what really happens in the above example is that the Rust compiler realises that one
is never used, and thus never actually constructs the value 1
in the first place. But from the
type checker's perspective, 1
is created and dropped inside this scope.)
You might remember our Maybe
type class from earlier, and the is_just
function. It's time to
revisit that, because it has a serious problem.
fn is_just<A>(maybe: Maybe<A>) -> bool {
match maybe {
Just(_) => true,
None => false,
}
}
When you call it, you actually pass ownership of the argument to the function—and the function
does not hand ownership back! All you get back is a bool
, either true
or false
, so in effect
the value is lost! This is perhaps a bit unfortunate for a function like this.
What we could do is pass ownership back when we're done with it.
fn is_just<A>(maybe: Maybe<A>) -> (Maybe<A>, bool) {
match maybe {
Just(_) => (maybe, true),
None => (maybe, false),
}
}
This way, the function returns a tuple of the value you passed in and the boolean you requested, so ownership of the value now passes back to you as the caller, along with ownership of the boolean information you requested.
This might seem like a lot.
In Haskell, what happens when you call isJust
with a Maybe
argument is that you pass a
reference to the Maybe
into the function, so it doesn't actually consume the input value when
you call it. This is also what you'd expect to happen in any other reasonable language.
So is there a way to call a Rust function with an argument that isn't consumed? Yes, by explicitly
passing a reference to the value instead of passing the value itself. This is what the &
prefix
operator (on both types and values) does.
fn is_just<A>(maybe: &Maybe<A>) -> bool {
match maybe {
Just(_) => true,
None => false,
}
}
And, to call it without passing ownership:
let maybe = Maybe::Just("Joe");
is_just(&maybe)
If you use the special self
argument, it's easier, as you don't even have to think about whether
you're passing the argument by reference or not.
impl<A> Maybe<A> {
fn is_just<A>(&self) -> bool {
match self {
Just(_) => true,
None => false,
}
}
}
let maybe = Maybe::Just("Joe");
self.is_just()
Of course, in order for a reference to a value to be valid, the value itself has to exist.
Therefore, references, as well as values, come with a lifetime. This is usually inferred by the
type checker in syntax, but when you have to talk about lifetimes explicitly, they look like this:
'a
. That is, an apostrophe '
followed by a lower case identifier. In the case of lifetimes, it
is idiomatic in Rust to start with a
. Yes, I wish they'd be consistent too.
So the lifetime of a reference is equal to the lifetime of the value it refers to. If you try to pass a reference outside of that lifetime, or, in other words, outside of the scope that owns the value, the type checked will complain at you that the reference has outlived the lifetime of its value. For instance, if your function creates a value and try to return a reference to it, you're going to have trouble with the type checker.
fn nope() -> &i64 {
let three = 3;
&three
}
You could do this very easily in C, of course, it's the equivalent to allocating a chunk of memory, immediately deallocating it, then returning a now extremely invalid pointer to it. Rust's type checker won't let you do that.
A regular reference is immutable. It means that when you pass a reference to something, all you can do is look at it, not change it. This way, it's safe to have as many references to a value floating around as you like, as long as none of them outlive the lifetime of the value.
There's another type of reference called a mutable reference. This is a way of temporarily passing full ownership to a value, so that you can modify it. At any given point in an object's lifetime, Rust guarantees that only a single mutable reference to it can exist. This means that another thing Rust's type checker will protect you from is accidental mutation.
fn set_if_just<A>(maybe: &mut Maybe<A>, value: A) {
match maybe {
Just(just) => {
*just = value;
}
None => {}
}
}
What this function does is get a mutable reference to the value inside of the Just
case of maybe
(because maybe
is a mutable reference too, so just
will be mutable reference). Then we overwrite
the value just
points to with value
. This is what the *
is for—it dereferences the
reference, allowing us to use the assignment syntax =
to put a new value in there.
What happens to the old value? Why, it goes out of scope and is dropped.
Another adverse reaction I had to Rust initially is that there is not a single persistent data
structure in sight. The standard list data type is Vec
, which is just a sized array! An
array, in this day and age. How do you even recurse over that?
Here's another shocker: Rust doesn't have tail call elimination, so, generally, you do not recurse
over data structures. It wasn't designed by complete savages, though, so you won't be expected to
increment counters and do bounds checking like an animal. What we have instead is the
Iterator
type class (or "trait," as the locals insist on calling type classes).
What you're about to see is going to be disturbing. Just like we have the do
syntax in Haskell as
syntactic sugar over binding warm fuzzy things, it turns out iterators are so common in Rust code
that syntactic sugar for iteration becomes desirable. It looks like this:
let the_holy_trinity = ["functors", "applicatives", "warm fuzzy things"];
for item in the_holy_trinity {
println!("{}", item);
}
I know what you're thinking. I see the for
loop too. You have to keep in mind, though, that this
expands to a loop which exhausts an Iterator
, not a C style
increment-counter-until-greater-than crime against all that is mathematically beautiful.
In fact, what this is syntactic sugar for looks like this:
let mut iterator = the_holy_trinity.iter();
while let Some(item) = iterator.next() {
println!("{}", item);
}
It's still a loop, though, but without tail call elimination, what else can you do? Consider, even, that tail call elimination itself is in a real sense just syntactic sugar for a loop.
Recall how you'd iterate over a list in Haskell—let's write a function which sums the numbers in a list of integers:
sum :: [Integer] -> Integer
sum [] = 0
sum (x:xs) = x + sum xs
I can hear you sneering back there. This is the introductory course, though, it's how you'd write it if Haskell didn't come with a prelude, and it teaches you basic essential ideas like recursion and tail call elimination.
Of course, we're all adults here, we can do better:
sum :: (Foldable f) => f Integer -> Integer
sum = foldr (+) 0
Much better.
sum :: (Foldable f, Monoid m) => f m -> m
sum = fold
Even better!
Now look back at that squicky while
loop. That's the equivalent of the Haskell function where we
recurse over a list of integers like a sophomore. We can up our game in Rust too, because, just like
how Foldable
gives you more than just folds, Iterator
comes with a lot of methods
beyond next()
, including, of course, fold()
.
fn sum<I: IntoIterator<Item=usize>>(i: I) -> usize {
i.into_iter().fold(0, |a, b| a + b)
}
It's less succinct than the Haskell version, because Rust doesn't do currying, but next to that loop
it looks positively delightful. Note that Iterator
is doing the work of Foldable
here.
We can do monoids in Rust too. First, the semigroup type class is called Add
in Rust -
anything which implements Add
lets us use the +
operator. A monoid is a semigroup with a
zero value, and we don't have an equivalent type class in Rust, but we do have one for something
which has a zero value: Default
. Therefore, something which implements both
Add
and Default
is a monoid{{footnote(name="3")}}. Thus:
fn sum<A: Add + Default, I: IntoIterator<Item=A>>(i: I) -> A {
i.into_iter().fold(Default::default(), |a, b| a + b)
}
Right, but that's still a fold with a function. In Haskell, we can do fold
over monoids to sum
them. In Rust, we use the Sum
trait, which defines a type which can be created by summing
an iterator of another type. It's less elegant than fold
, honestly, as it's not automatically
defined for anything which implements Add
, but it lets us do this:
fn sum<B: Sum<A>, I: Iterator<Item=A>>(i: I) -> B {
i.into_iter().sum()
}
And that's as close as we get to the succinctness of the Haskell version. This isn't code golf,
though—the takeaway here is that Iterator
actually offers a considerable degree of
abstraction beyond crude for
loops, so it's not as bad as it seems at first glance. Even better,
Rust is really good at optimising those iterators—our sum
above, over a Vec
, will be
easily as fast as a for
loop in C doing the same thing.
More to the point, we can manipulate crude arrays (and anything else that implements
IntoIterator
) with virtually the same elegance we manipulate lists (and anything
else that implements Foldable
) in Haskell. In this strange new land, it doesn't even matter if our
data structures are persistent.
Oh, and by the way, Iterator
is lazy. You can easily, say, write an iterator over an
infinite stream of monotonically increasing numbers with Rust's std::iter::from_fn
function, which is like a weird Rusty unfold
:
let mut i: usize = 0;
let numbers_from_1 = std::iter::from_fn(|| {
i += 1;
Some(i)
});
Using the above iterator, you can produce a Vec
containing a sequence of numbers like this:
let numbers: Vec<usize> = numbers_from_1.skip(5).take(5).collect();
assert_eq!(&[6, 7, 8, 9, 10], &numbers);
Note how collect()
above takes the place of the fromFoldable
idiom in
Haskell: it will collect any Iterator
into anything which implements
FromIterator
.
You might be wondering, at this point, about functors. You might have guessed that
Iterator
does indeed provide a [map
][iterator::map] method. You might be feeling a
little concerned because a.into_iter().map(|a| a).collect()
seems like quite a detour just to
modify the values inside a Vec
, because what that code is doing is deconstructing a
and
collecting it into an entire new data structure, and while that's usually what goes on in Haskell
too, Rust is supposed to be an efficient systems language.
In fact, the right way to map an endofunctor{{footnote(name="4")}} is going to freak you right out:
let mut numbers: Vec<usize> = vec![1, 2, 3, 4, 5];
for number in &mut numbers {
*number += 1;
}
assert_eq!(&[2, 3, 4, 5, 6], &number);
That's right—a for
loop. The &mut numbers
bit gets us an iterator which yields mutable
references to the elements inside the vector, allowing us to modify them directly. It's blindingly
fast, and it's still completely pure.
You could also write it in a more Haskell minded way like this:
numbers.iter_mut().for_each(|number| *number += 1);
They're exact equivalents, down to the generated machine code, but one is generally more readable. And, this is the kicker, it's not the one liner. The one liner isn't even the more succinct one, if you'd like to count the characters.
This was another hard lesson for me: the one liner seems the obviously better piece of code, but to
most people without a Haskell eye the for
loop will be more readable, and in the end
there's—rather shockingly, given the unapologetically imperative feel of the for
loop—nothing else
that distinguishes the two pieces of code. So you might as well go for the version which will be the
least confusing to the largest number of people.
Hey now, sure, we can mutate an endofunctor in place, but this isn't really a proper functor! To be
able to abstract over it, we'll need a way to describe an operation from a functor of A
to a
functor of B
. In other words:
trait Functor {
fn map<A, B, F: Fn(A) -> B>(input: Self<A>, map: F) -> Self<B>;
}
Unfortunately, I just made that syntax up. Rust, sadly, is currently lacking higher kinded types altogether, so you can't express a relationship like this in its type system.
The good news is that it turns out that Iterator
can do most of the heavy lifting
you'd expect from a functor hierarchy in practice. For instance, you can, as alluded to earlier, map
a functor using the [map
][iterator::map] method on Iterator
like this:
let numbers: Vec<usize> = vec![1, 2, 3];
let strings: Vec<String> = numbers.into_iter().map(|n| n.to_string()).collect();
In the same vein, instead of your functions taking Functor
arguments when you intend to map over
them, have them take IntoIterator
instead, and if you mean to return a Functor
,
return your Iterator
instead, allowing the user to collect()
it, or return a type variable constrained to FromIterator
and do the
collect()
inside your function.
{% define_footnote(name="1") %} Except for the appalling double colon to introduce types, which, luckily, has been corrected in the superior Idris language. {% end %}
{% define_footnote(name="3") %} Someone really should start a petition to add
trait Monoid: Add + Default {}
to the standard library. {% end %}
{% define_footnote(name="4") %} An endofunctor is just a functor from a category back to the same category. In other words, it's a functor whose mapping function returns a value of the same type as the input value. What's the problem? {% end %}