flying-sheep/r4cpp.md

    Rust for C++ programmers

part 1: Hello world

This is the first in a series of blog posts (none written yet) which aim to help experienced C++ programmers learn Rust. Expect updates to be sporadic at best. In this first blog post we'll just get set up and do a few super basic things. Much better resources are at the tutorial and reference manual.
First you need to install Rust. You can download a nightly build from http://www.rust-lang.org/install.html. I recommend the nightlies rather than 'stable' versions: the nightlies are stable in that they won't crash too much (no more than the stable versions), and you're going to have to get used to Rust evolving under you sooner or later anyway. Assuming you manage to install things properly, you should then have a rustc command available to you. Test it with rustc -v.
Now for our first program. Create a file, copy and paste the following into it and save it as hello.rs or something equally imaginative.
fn main() {
	println!("Hello world!");
}
Compile this using rustc hello.rs, and then run ./hello. It should display the expected greeting \o/
Two compiler options you should know are -o ex_name to specify the name of the executable and -g to output debug info; you can then debug as expected using gdb or lldb, etc. Use -h to show other options.
OK, back to the code. A few interesting points – we use fn to define a function or method. main() is the default entry point for our programs (we'll leave program args for later). There are no separate declarations or header files as with C++. println! is Rust's equivalent of printf. The ! means that it is a macro; for now you can just treat it like a regular function. A subset of the standard library is available without needing to be explicitly imported/included (we'll talk about that later). The println! macro is included as part of that subset.
Let's change our example a little bit:
fn main() {
	let world = "world";
	println!("Hello {}!", world);
}
let is used to introduce a variable, world is the variable name and it is a string (technically the type is &'static str, but more on that in a later post). We don't need to specify the type; it will be inferred for us.
Using {} in the println! statement is like using %s in printf. In fact, it is a bit more general than that because Rust will try to convert the variable to a string if it is not one already*. You can easily play around with this sort of thing – try multiple strings and using numbers (integer and float literals will work).
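As a minimal sketch of the {} placeholder with several kinds of values, written for a current stable Rust toolchain: the helper name greeting is hypothetical and exists only for this illustration, while format! is the standard macro that builds a String using the same placeholder syntax as println!.

```rust
// Hypothetical helper: builds the greeting instead of printing it,
// so the result can be inspected by the caller.
fn greeting(name: &str, count: i32, ratio: f64) -> String {
    // Each `{}` is filled, in order, by the next argument,
    // whether it is a string, an integer, or a float.
    format!("Hello {}! count = {}, ratio = {}", name, count, ratio)
}
```

Calling println!("{}", greeting("world", 3, 1.5)) from main prints the assembled line.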
If you like, you can explicitly give the type of world:
let world: &'static str = "world";
In C++ we write T x to declare a variable x with type T. In Rust we write x: T, whether in let statements or function signatures, etc. Mostly we omit explicit types in let statements, but they are required for function arguments. Let's add another function to see it work:
fn foo(_x: &'static str) -> &'static str {
    "world"
}

fn main() {
    println!("Hello {}!", foo("bar"));
}
The function foo has a single argument _x which is a string literal (we pass it "bar" from main). We don't actually use that argument in foo. Usually, Rust will warn us about this. By prefixing the argument name with _ we avoid these warnings. In fact, we don't need to name the argument at all, we could just use _.
The return type for a function is given after ->. If the function doesn't return anything (a void function in C++), we don't need to give a return type at all (as in main). If you want to be super-explicit, you can write -> (); () is the void type in Rust. foo returns a string literal.
You don't need the return keyword in Rust: if the last expression in a function body (or any other body, we'll see more of this later) is not finished with a semicolon, then it is the return value. So foo will always return "world". The return keyword still exists so we can do early returns. You can replace "world" with return "world"; and it will have the same effect.
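To make the two ways of returning concrete, here is a small sketch for a current stable toolchain (where the integer type is spelled i32). The function describe is a hypothetical name invented for this example.

```rust
// Hypothetical example: an early `return` handles the special case,
// and a trailing expression supplies the normal result.
fn describe(x: i32) -> &'static str {
    if x < 0 {
        return "negative"; // early return needs the keyword
    }
    "non-negative" // no semicolon, so this is the function's value
}
```

Adding a semicolon after "non-negative" would turn it into a statement, and the compiler would then complain that the function does not return a string.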

*This is a programmer specified conversion which uses the Show trait, which works a bit like toString in Java. You can also use {:?} which gives a compiler generated representation which is sometimes useful for debugging. As with printf, there are many other options.
an intermission – why Rust

I realise that in terms of learning Rust, I had jumped straight to the 'how' and skipped the 'why'. I guess I am in enough of a Rust bubble that I can't imagine why you wouldn't want to learn it. So, I will make a bit more of an effort to explain why things are how they are. Here I will try to give a bit of an overview/motivation.
If you are using C or C++, it is probably because you have to – either you need low-level access to the system, or need every last drop of performance, or both. Rust aims to offer the same level of abstraction around memory and the same performance, but be safer and make you more productive.
Concretely, there are many languages out there that you might prefer to C++: Java, Scala, Haskell, Python, and so forth, but you can't use them because either the level of abstraction is too high – you don't get direct access to memory, you are forced to use garbage collection, etc. – or there are performance issues – either performance is unpredictable or it's simply not fast enough. Rust does not force you to use garbage collection, and as in C++, you get raw pointers to memory to play with. Rust subscribes to the 'pay for what you use' philosophy of C++. If you don't use a feature, then you don't pay any performance overhead for its existence. Furthermore, all language features in Rust have predictable (and usually small) cost.
Whilst these constraints make Rust a (rare) viable alternative to C++, Rust also has benefits: it is memory safe. Rust's type system ensures that you don't get the kind of memory errors which are common in C++: accessing un-initialised memory and dangling pointers are impossible in safe Rust, and memory leaks are much harder to create by accident (although safe code can still leak, for example through reference cycles). Furthermore, whenever other constraints allow, Rust strives to prevent other safety issues too – for example, all array indexing is bounds checked (of course, if you want to avoid the cost, you can (at the expense of safety) – Rust allows you to do this in unsafe blocks, along with many other unsafe things. Crucially, Rust ensures that unsafety in unsafe blocks stays in unsafe blocks and can't affect the rest of your program). Finally, Rust takes many concepts from modern programming languages and introduces them to the systems language space. Hopefully, that makes programming in Rust more productive, efficient, and enjoyable.
I would like to motivate some of the language features from part 1. Local type inference is convenient and useful without sacrificing safety or performance (it's even in modern versions of C++ now). A minor convenience is that language items are consistently denoted by keyword (fn, let, etc.), which makes scanning by eye or by tools easier. In general the syntax of Rust is simpler and more consistent than C++. The println! macro is safer than printf – the number of arguments is statically checked against the number of 'holes' in the string and the arguments are type checked. This means you can't make the printf mistakes of printing memory as if it had a different type or addressing memory further down the stack by mistake. These are fairly minor things, but I hope they illustrate the philosophy behind the design of Rust.
part 2: control flow

If

The if statement is pretty much the same in Rust as C++. One difference is that the braces are mandatory, but brackets around the expression being tested are not. Another is that if is an expression, so you can use it the same way as the ternary ? operator in C++ (remember from last time that if the last expression in a block is not terminated by a semi-colon, then it becomes the value of the block). There is no ternary ? in Rust. So, the following two functions do the same thing:
fn foo(x: int) -> &'static str {
    let mut result: &'static str;
    if x < 10 {
        result = "less than 10";
    } else {
        result = "10 or more";
    }
    return result;
}

fn bar(x: int) -> &'static str {
    if x < 10 {
        "less than 10"
    } else {
        "10 or more"
    }
}
The first is a fairly literal translation of what you might write in C++. The second is in better Rust style.
You can also write let x = if ..., etc.
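A short sketch of if used on the right-hand side of a let, assuming a current stable toolchain (i32 in place of int). The function name size_message is hypothetical.

```rust
// Hypothetical example: the `if` expression yields a value that is
// bound directly by `let`, much like `cond ? a : b` in C++.
fn size_message(x: i32) -> String {
    let size = if x < 10 { "less than 10" } else { "10 or more" };
    format!("{} is {}", x, size)
}
```

Both branches must have the same type, since the whole if expression has a single type.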
Loops

Rust has while loops, again just like C++:
fn main() {
    let mut x = 10;
    while x > 0 {
        println!("Current value: {}", x);
        x -= 1;
    }
}
There is no do...while loop in Rust, but we do have the loop statement which just loops forever:
fn main() {
    loop {
        println!("Just looping");   
    }
}
Rust has break and continue just like C++.
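Since there is no do...while, one way to get a loop whose body runs at least once is loop with a break at the end. The following is only a sketch; the function halvings is a made-up example, written for a current stable toolchain.

```rust
// Hypothetical helper: counts how many times `n` must be halved
// to reach zero. The body always runs at least once.
fn halvings(mut n: u32) -> u32 {
    let mut steps = 0;
    loop {
        n /= 2;
        steps += 1;
        if n == 0 {
            break; // plays the role of the `while` condition at the end
        }
    }
    steps
}
```

For an input of 8 the values of n after each pass are 4, 2, 1, 0, so the loop runs four times.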
For loops

Rust also has for loops, but these are a bit different. Let's say you have a vector of ints and you want to print them all. (We'll cover vectors/arrays, iterators, and generics in more detail in the future. For now, know that a Vec<T> is a sequence of Ts and iter() returns an iterator from anything you might reasonably want to iterate over.) A simple for loop would look like:
fn print_all(all: Vec<int>) {
    for a in all.iter() {
        println!("{}", a);
    }
}
If we want to loop over the indices of all (a bit more like a standard C++ for loop over an array), we could do
fn print_all(all: Vec<int>) {
    for i in range(0, all.len()) {
        println!("{}: {}", i, all.get(i));
    }
}
Hopefully, it is obvious what the range and len functions do.
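For readers on a current stable toolchain, here is a hedged sketch of the same indexed loop. It assumes today's spelling, where range(0, n) is written 0..n and elements are read with all[i]; the function names are hypothetical and the lines are collected rather than printed so the result can be checked.

```rust
// Hypothetical helper: formats each element together with its index.
fn indexed_lines(all: &[i32]) -> Vec<String> {
    let mut lines = Vec::new();
    // `0..all.len()` is the half-open range 0, 1, ..., len - 1.
    for i in 0..all.len() {
        lines.push(format!("{}: {}", i, all[i]));
    }
    lines
}

// The same result using `enumerate`, which pairs each element
// with its index and avoids indexing altogether.
fn indexed_lines_enumerate(all: &[i32]) -> Vec<String> {
    let mut lines = Vec::new();
    for (i, a) in all.iter().enumerate() {
        lines.push(format!("{}: {}", i, a));
    }
    lines
}
```

The second form is usually preferred, since it cannot index out of bounds.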
Switch/Match

Rust has a match expression which is similar to a C++ switch statement, but much more powerful. This simple version should look pretty familiar:
fn print_some(x: int) {
    match x {
        0 => println!("x is zero"),
        1 => println!("x is one"),
        10 => println!("x is ten"),
        y => println!("x is something else {}", y),
    }
}
There are some syntactic differences – we use => to go from the matched value to the expression to execute, and the match arms are separated by , (that last , is optional). There are also some semantic differences which are not so obvious: the matched patterns must be exhaustive, that is all possible values of the matched expression (x in the above example) must be covered. Try removing the y => ... line and see what happens; that is because we only have matches for 0, 1, and 10 and obviously there are lots of other ints which don't get matched. In that last arm, y is bound to the value being matched (x in this case). We could also write:
fn print_some(x: int) {
    match x {
        x => println!("x is something else {}", x)
    }
}
Here the x in the match arm introduces a new variable which hides the argument x, just like declaring a variable in an inner scope.
If we don't want to name the variable, we can use _ for an unnamed variable, which is like having a wildcard match. If we don't want to do anything, we can provide an empty branch:
fn print_some(x: int) {
    match x {
        0 => println!("x is zero"),
        1 => println!("x is one"),
        10 => println!("x is ten"),
        _ => {}
    }
}
Another semantic difference is that there is no fall through from one arm to the next.
We'll see in later posts that match is extremely powerful. For now I want to introduce just a couple more features – the 'or' operator for values and if clauses on arms. Hopefully an example is self-explanatory:
fn print_some_more(x: int) {
    match x {
        0 | 1 | 10 => println!("x is one of zero, one, or ten"),
        y if y < 20 => println!("x is less than 20, but not zero, one, or ten"),
        y if y == 200 => println!("x is 200 (but this is not very stylish)"),
        _ => {}
    }
}
Just like if expressions, match statements are actually expressions so we could re-write the last example as:
fn print_some_more(x: int) {
    let msg = match x {
        0 | 1 | 10 => "one of zero, one, or ten",
        y if y < 20 => "less than 20, but not zero, one, or ten",
        y if y == 200 => "200 (but this is not very stylish)",
        _ => "something else"
    };

    println!("x is {}", msg);
}
Note the semi-colon after the closing brace. That is because the let statement is a statement and must take the form let msg = ...;. We fill the rhs with a match expression (which doesn't usually need a semi-colon), but the let statement does. This catches me out all the time.
Motivation: Rust match statements avoid the common bugs with C++ switch statements – you can't forget a break and unintentionally fall through; if you add a case to an enum (more later on) the compiler will make sure it is covered by your match statement.
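The exhaustiveness point can be sketched with a small enum (enums are covered properly later). Shape and sides are hypothetical names invented for this illustration, and the code targets a current stable toolchain.

```rust
// Hypothetical enum used only for this illustration.
enum Shape {
    Circle,
    Square,
    Triangle,
}

// Returns the number of sides of a shape.
fn sides(shape: Shape) -> u32 {
    match shape {
        Shape::Circle => 0,
        Shape::Square => 4,
        Shape::Triangle => 3,
        // There is deliberately no `_` arm: adding a new variant to
        // `Shape` makes this `match` fail to compile until it is handled.
    }
}
```

A wildcard arm would silence that check, which is why it is often left out when matching on an enum.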
Method call

Finally, just a quick note that methods exist in Rust, similarly to C++. They are always called via the . operator (no ->, more on this in another post). We saw a few examples above (len, iter). We'll go into more detail in the future about how they are defined and called. Most assumptions you might make from C++ or Java are probably correct.
part 3: primitive types and operators

Rust has pretty much the same arithmetic and logical operators as C++. bool is the same in both languages (as are the true and false literals). Rust has similar concepts of integers, unsigned integers, and floats. However the syntax is a bit different. Rust uses int to mean an integer and uint to mean an unsigned integer. These types are pointer sized. E.g., on a 32 bit system, uint means a 32 bit unsigned integer. Rust also has explicitly sized types which are u or i followed by 8, 16, 32, or 64. So, for example, u8 is an 8 bit unsigned integer and i32 is a 32 bit signed integer. For floats, Rust has f32 and f64 (f128 is coming soon too).
Numeric literals can take suffixes to indicate their type (using i and u instead of int and uint). If no suffix is given, Rust tries to infer the type. If it can't infer, it uses int or f64 (if there is a decimal point). Examples:
fn main() {
    let x: bool = true;
    let x = 34;   // type int
    let x = 34u;  // type uint
    let x: u8 = 34u8;
    let x = 34i64;
    let x = 34f32;
}
As a side note, Rust lets you redefine variables so the above code is legal – each let statement creates a new variable x and hides the previous one. This is more useful than you might expect due to variables being immutable by default.
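One place where redefining a variable is handy is when a value passes through several forms. The sketch below is an illustration only: parse_and_double is a hypothetical function, and it assumes a current stable toolchain and the standard str::trim and str::parse methods.

```rust
// Hypothetical example: each `let value` creates a new immutable
// variable that hides the previous one, so no `mut` is needed.
fn parse_and_double(input: &str) -> i32 {
    let value = input.trim(); // a string slice
    let value: i32 = value.parse().unwrap_or(0); // now an integer (0 if parsing fails)
    let value = value * 2;
    value
}
```

Note that the type of value changes from one let to the next, which would not be possible by assigning to a single mutable variable.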
Numeric literals can be given as binary, octal, and hexadecimal, as well as decimal. Use the 0b, 0o, and 0x prefixes, respectively. You can use an underscore anywhere in a numeric literal and it will be ignored. E.g.,
fn main() {
    let x = 12;
    let x = 0b1100;
    let x = 0o14;
    let x = 0xc;
    let y = 0b_1100_0011_1011_0001;
}
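As a quick check of the prefixes, this sketch writes the value twelve in each base and uses underscores as digit separators. The function names are hypothetical, and the code assumes a current stable toolchain.

```rust
// Hypothetical helper: the value twelve written in decimal, binary,
// octal, and hexadecimal.
fn twelve_in_every_base() -> [i32; 4] {
    [12, 0b1100, 0o14, 0xc]
}

// Underscores are ignored by the compiler and only aid readability.
fn with_separators() -> i32 {
    1_000_000
}
```

All four array elements compare equal, since the base only affects how the literal is written.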
Rust has chars and strings, but since they are Unicode, they are a bit different from C++. I'm going to postpone talking about them until after I've introduced pointers, references, and vectors (arrays).
Rust does not implicitly coerce numeric types. In general, Rust has much less implicit coercion and subtyping than C++. Rust uses the as keyword for explicit coercions and casting. Any numeric value can be cast to another numeric type. as cannot be used to convert between booleans and numeric types. E.g.,
fn main() {
    let x = 34u as int;     // cast unsigned int to int
    let x = 10 as f32;      // int to float
    let x = 10.45f64 as i8; // float to int (loses precision)
    let x = 4u8 as u64;     // gains precision
    let x = 400u16 as u8;   // 144, loses precision (and thus changes the value)
    println!("`400u16 as u8` gives {}", x);
    let x = -3i8 as u8;     // 253, signed to unsigned (changes sign)
    println!("`-3i8 as u8` gives {}", x);
    //let x = 45u as bool;  // FAILS!
}
Rust has the following numeric operators: +, -, *, /, %; bitwise operators: |, &, ^, <<, >>; comparison operators: ==, !=, >, <, >=, <=; short-circuit logical operators: ||, &&. All of these behave as in C++, however, Rust is a bit stricter about the types the operators can be applied to – the bitwise operators can only be applied to integers and the logical operators can only be applied to booleans. Rust has the - unary operator which negates a number. The ! operator negates a boolean and inverts every bit on an integer type (equivalent to ~ in C++ in the latter case). Rust has compound assignment operators as in C++, e.g., +=, but does not have increment or decrement operators (e.g., ++).
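A few of the less familiar points can be sketched as follows, assuming a current stable toolchain; the function names are hypothetical.

```rust
// `!` on an integer inverts every bit, like `~` in C++.
fn invert_bits(x: u8) -> u8 {
    !x
}

// `!` on a boolean is logical negation.
fn negate(flag: bool) -> bool {
    !flag
}

// There is no `n++`, so a compound assignment is used instead.
fn increment(mut n: i32) -> i32 {
    n += 1;
    n
}
```

Trying to apply ! to a float, or && to an integer, is rejected at compile time rather than silently converted.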
part 4: unique pointers

Rust is a systems language and therefore must give you raw access to memory. It does this (as in C++) via pointers. Pointers are one area where Rust and C++ are very different, both in syntax and semantics. Rust enforces memory safety by type checking pointers. That is one of its major advantages over other languages. Although the type system is a bit complex, you get memory safety and bare-metal performance in return.
I had intended to cover all of Rust's pointers in one post, but I think the subject is too large. So this post will cover just one kind – unique pointers – and other kinds will be covered in follow up posts.
First, an example without pointers:
fn foo() {
    let x = 75;

    // ... do something with `x` ...
}
When we reach the end of foo, x goes out of scope (in Rust as in C++). That means the variable can no longer be accessed and the memory for the variable can be reused.
In Rust, for every type T we can write ~T for an owning (aka unique) pointer to T. We use the box keyword to allocate space on the heap and initialise that space with the supplied value (this has very recently changed from using ~ for allocation too). This is similar to new in C++. For example,
fn foo() {
    let x = box 75;
}
Here x is a pointer to a location on the heap which contains the value 75. x has type ~int; we could have written let x: ~int = box 75;. This is similar to writing int* x = new int(75); in C++. Unlike in C++, Rust will tidy up the memory for us, so there is no need to call free or delete. Unique pointers behave similarly to values – they are deleted when the variable goes out of scope. In our example, at the end of the function foo, x can no longer be accessed and the memory pointed at by x can be reused.
Owning pointers are dereferenced using the * as in C++. E.g.,
fn foo() {
    let x = box 75;
    println!("`x` points to {}", *x);
}
As with primitive types in Rust, owning pointers and the data they point to are immutable by default. Unlike C, you can't have a mutable (unique) pointer to immutable data or vice-versa. Mutability of the data follows from the pointer. E.g.,
fn foo() {
    let x = box 75;
    let y = box 42;
    // x = y;         // Not allowed, x is immutable.
    // *x = 43;       // Not allowed, *x is immutable.
    let mut x = box 75;
    x = y;            // OK, x is mutable.
    *x = 43;          // OK, *x is mutable.
}
Owning pointers can be returned from a function and continue to live on. If they are returned, then their memory will not be freed, i.e., there are no dangling pointers in Rust. The memory will not leak, however: eventually it must go out of scope and then it will be freed. E.g.,
fn foo() -> ~int {
    let x = box 75;
    x
}

fn bar() {
    let y = foo();
    // ... use y ...
}
Here, memory is initialised in foo, and returned to bar. x is returned from foo and stored in y, so it is not deleted. At the end of bar, y goes out of scope and so the memory is reclaimed.
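For readers trying this on a current stable toolchain, the same idea can be sketched with today's spelling, in which the owning pointer type ~int is written Box<i32> and box 75 is written Box::new(75). The function names make_boxed and use_boxed are hypothetical.

```rust
// Allocates on the heap and hands ownership to the caller.
fn make_boxed() -> Box<i32> {
    let x = Box::new(75);
    x // ownership moves out of the function; nothing is freed here
}

// Takes ownership of the returned box, modifies the pointee, and
// lets the memory be released when `y` goes out of scope.
fn use_boxed() -> i32 {
    let mut y = make_boxed();
    *y += 1; // allowed because `y` is declared `mut`
    *y
}
```

No explicit free or delete appears anywhere; the heap memory is released at the end of use_boxed.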
Owning pointers are unique (also called linear) because there can be only one (owning) pointer to any piece of memory at any time. This is accomplished by move semantics. When one pointer points at a value, any previous pointer can no longer be accessed. E.g.,
fn foo() {
    let x = box 75;
    let y = x;
    // x can no longer be accessed
    // let z = *x;   // Error.
}
Likewise, if an owning pointer is passed to another function or stored in a field it can no longer be accessed:
fn bar(y: ~int) {}

fn foo() {
    let x = box 75;
    bar(x);
    // x can no longer be accessed
    // let z = *x;   // Error.
}
Rust's unique pointers are similar to C++ std::unique_ptrs. In Rust, as in C++, there can be only one unique pointer to a value and that value is deleted when the pointer goes out of scope. Rust does most of its checking statically rather than at runtime. So, in C++ dereferencing a unique pointer whose value has been moved is an error that only shows up at runtime (the pointer will be null, so this is undefined behaviour and usually a crash). In Rust this produces a compile time error and you cannot go wrong at runtime.
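The move rules can be sketched in today's syntax, assuming a current stable toolchain where the owning pointer is written Box<i32>. The names consume and moved_value are hypothetical; the commented-out line is the one the compiler rejects.

```rust
// Takes ownership of the box and returns the value it points to.
fn consume(b: Box<i32>) -> i32 {
    *b
}

fn moved_value() -> i32 {
    let x = Box::new(75);
    let y = x; // ownership moves from `x` to `y`
    // let z = *x; // would not compile: `x` has been moved
    consume(y) // `y` is moved into `consume` and is unusable afterwards
}
```

Uncommenting the marked line turns the mistake into a compile error, where the equivalent C++ would compile and fail only when run.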
We'll see later that it is possible to create other pointer types which point at a unique pointer's value in Rust. This is similar to C++. However, in C++ this allows you to cause errors at runtime by holding a pointer to freed memory. That is not possible in Rust (we'll see how when we cover Rust's other pointer types).
As shown above, owning pointers must be dereferenced to use their values. However, method calls automatically dereference, so there is no need for a -> operator or to use * for method calls. In this way, Rust pointers are a bit similar to both pointers and references in C++. E.g.,
fn bar(x: ~Foo, y: ~~~~~Foo) {
    x.foo();
    y.foo();
}
Assuming that the type Foo has a method foo(), both these expressions are OK.
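A self-contained sketch of the automatic dereferencing, in today's syntax (Box<Foo> rather than ~Foo) and assuming a current stable toolchain. The field value and the function call_through are hypothetical additions so the result can be observed.

```rust
struct Foo {
    value: i32, // hypothetical field, added so `foo` has something to return
}

impl Foo {
    fn foo(&self) -> i32 {
        self.value
    }
}

// Method calls look through any number of owning pointers,
// so no `*` or `->` is needed at either call site.
fn call_through(x: Box<Foo>, y: Box<Box<Box<Foo>>>) -> (i32, i32) {
    (x.foo(), y.foo())
}
```

Reading the field directly through the nested boxes works the same way, e.g. y.value.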
Using the box operator on an existing value does not take a reference to that value, it copies that value. So,
fn foo() {
    let x = 3;
    let mut y = box x;
    *y = 45;
    println!("x is still {}", x);
}
In general, Rust has move rather than copy semantics (as seen above with unique pointers). Primitive types have copy semantics, so in the above example the value 3 is copied, but for more complex values it would be moved. We'll cover this in more detail later.
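The difference between the two can be sketched side by side, assuming a current stable toolchain (Box::new in place of box). String is used here only as a convenient example of a more complex, non-primitive value, and copy_then_move is a hypothetical name.

```rust
fn copy_then_move() -> (i32, i32, String) {
    let x = 3;
    let mut y = Box::new(x); // `x` is a primitive, so its value is copied
    *y = 45; // changes the boxed copy only

    let s = String::from("hello");
    let t = s; // `String` is not a primitive, so this is a move
    // println!("{}", s); // would not compile: `s` has been moved

    (x, *y, t) // `x` is still 3 and still usable
}
```

After the call, the first element is the untouched original, the second is the modified copy, and the third is the moved string.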
Sometimes when programming, however, we need more than one reference to a value. For that, Rust has borrowed pointers. I'll cover those in the next post.