Skip to content

Instantly share code, notes, and snippets.

@aidanhs
Last active March 19, 2024 16:01
Show Gist options
  • Star 100 You must be signed in to star a gist
  • Fork 4 You must be signed in to fork a gist
  • Save aidanhs/5ac9088ca0f6bdd4a370 to your computer and use it in GitHub Desktop.
Save aidanhs/5ac9088ca0f6bdd4a370 to your computer and use it in GitHub Desktop.
Rust binary tree worked example

PLEASE DON'T USE THIS GUIDE

It's over 9 years old (as of 2024-02-18), there are many better guides! You might like https://rust-unofficial.github.io/too-many-lists/

% Let's build a binary tree!

Let's build a binary tree of strings in Rust. To recap, each node in a binary tree:

  1. must have a value
  2. may or may not have left and/or right children

So we can describe a single node like this:

struct Node {
    val: &str,
    l: Option<Node>,
    r: Option<Node>,
}

Unfortunately the above won't compile! The first issue is explained in the Rust book - if a struct will contain borrowed values (val), we must tell the compiler how long they're expected to last. The borrow checker can then enforce that any use of the val field obeys the restriction. The second issue is simply that Rust cannot know how much memory to allocate to the data type if it recursively contains itself. The fix is simply to fill this field with a pointer instead of an instance of the type. Our fixed Node:

struct Node<'a> {
    val: &'a str,
    l: Option<Box<Node<'a>>>,
    r: Option<Box<Node<'a>>>,
}

Now we know what our trees are made up of, but we have no way to build one. Let's create a method to allow insertion of items into a tree. My preference as a rust beginner is to write down the Rust-like pseudocode (i.e. minus mutability and lifetime specifiers) of the logic I'd like to use, then convince the borrow checker that my code is correct.

impl Node {
    pub fn insert(&self, new_val: &str) {
        if self.val == new_val {
            return
        }
        let target_node = if new_val < self.val { &self.l } else { &self.r };
        match target_node {
            &Some(subnode) => subnode.insert(new_val),
            &None => {
                let new_node = Node { val: new_val, l: None, r: None };
                let boxed_node = Some(Box::new(new_node));
                *target_node = boxed_node;
            }
        }
    }
}

There's nothing exciting going on here. We find which side of the current Node the new value should go on, then either add it as a subnode if there isn't one already on that side, or recurse down by calling insert on the subnode.

Needless to say, the borrow checker finds this piece of code very offensive! Since the first compiler complaints are about lifetimes, let's work those out first.

The first error we get is pretty clear (wrong number of lifetime parameters) and easily fixed, right? We're doing an impl for Node, but our type is actually Node<'a>, so tweak that and the problem is solved? Unfortunately not.

Let's consider at a bit of background here. Looking back at the section on 'Ownership' in the rust book, we can see that when creating functions we have to declare the lifetimes as a parameter of the function to bring those lifetime names in scope. This is the key - whenever a lifetime is referenced, it has been declared somewhere beforehand. In fact, this is what we did for the struct when saying that val would last for the lifetime 'a.

Coming back to the problem at hand, we know know we need to bring the lifetime into scope somehow. The correct way to do this in an impl is as a parameter to the impl itself, like so:

impl<'a> Node<'a> {

If we put this in, we now get an error about not being able to infer an appropriate lifetime for new_val when creating a new Node. There's a temptation when seeing an error like this to shove in a lifetime as a parameter to the function, add it to the function argument and hope for the best. In fact, a lot of the time this does work.

Unfortunately you need to take more care when implementing functions for an impl, as there's already a lifetime hanging around - the lifetime for the entire struct. Looking back at the definition of a Node, we've said that the value and sub-Nodes of a Node (and therefore all their values as well) must have the same lifetime as the Node itself. The error is about the value of a sub-Node, so let's make sure it has the same lifetime as the parent Node.

    pub fn insert(&self, new_val: &'a str) {

We've used the lifetime brought in scope by the impl, used as the lifetime of the struct the impl is for, as the lifetime of the value.

Hooray! We've completed a first pass of convincing the borrow-checker that all the values involved live for the correct amount of time. The next stage is to make sure that we can actually modify values, as Rust requires us to explicitly keep track of mutability.

We get two different types of error if we try to compile now. The first talks about moving values out of a borrow - when we assigned the value of target_node, the & operator was used so we took a reference. The pattern we match on attempts to take ownership (i.e. 'move') the value inside the target_node option...which isn't allowed, because we're just borrowing it! The Rust book chapter on patterns mentions the way forward here: ref.

            &Some(ref subnode) => subnode.insert(new_val),

The next error relates to target_node. We want to replace the value being pointed to (*target_node = ...), so we need to make the reference we take be mutable. This has the knock on effect of needing to change our pattern matching, as mut is technically part of the type.

        let target_node = if new_val < self.val { &mut self.l } else { &mut self.r };
        match target_node {
            &mut Some(ref subnode) => subnode.insert(new_val),
            &mut None => {

Attempting to compile now tells us that we can't take a mutable reference to self.l or self.r as self isn't actually mutable. We fix this in the arguments to the function:

    pub fn insert(&mut self, new_val: &'a str) {

As is typical when fixing an error by adding mutability or lifetime constraints, changes are required elsewhere to comply with the new rules. In this case, our ref subnode is not mutable, so you can't call subnode.insert (which now requires self to be mutable). Again referring back to the patterns section of the Rust book, we find our answer:

            &mut Some(ref mut subnode) => subnode.insert(new_val),

It's worth noting that this has worked because we're already taking a mutable reference to the Option - if that was immutable, so would the sub-Node inside be.

You should now find that compilation is successful! The completed code is below along with a few test cases to see it working.

#[derive(PartialEq)]
struct Node<'a> {
    val: &'a str,
    l: Option<Box<Node<'a>>>,
    r: Option<Box<Node<'a>>>,
}
impl<'a> Node<'a> {
    pub fn insert(&mut self, new_val: &'a str) {
        if self.val == new_val {
            return
        }
        let target_node = if new_val < self.val { &mut self.l } else { &mut self.r };
        match target_node {
            &mut Some(ref mut subnode) => subnode.insert(new_val),
            &mut None => {
                let new_node = Node { val: new_val, l: None, r: None };
                let boxed_node = Some(Box::new(new_node));
                *target_node = boxed_node;
            }
        }
    }
}
fn main () {
    let mut x = Node { val: "m", l: None, r: None };
    x.insert("z");
    x.insert("b");
    x.insert("c");
    assert!(x == Node {
        val: "m",
        l: Some(Box::new(Node {
            val: "b",
            l: None,
            r: Some(Box::new(Node { val: "c", l: None, r: None })),
        })),
        r: Some(Box::new(Node { val: "z", l: None, r: None })),
    });
}

In the next instalment we'll look at implementing an iterator and why putting 'a as a lifetime everywhere can be somewhat limiting.

@aidanhs
Copy link
Author

aidanhs commented Feb 19, 2016

Yes you're right, I've corrected that now.

This was actually written with the intent of submitting it to some rust documentation (perhaps rust by example) as I previously struggled with understanding this. However the feedback I got noted that:

  1. The motivating example needs some work - in most cases you'd make a tree of String to simplify lifetimes involved (I couldn't because I wanted to use string slices from structures returned from C).
  2. It's unidiomatic rust, particularly the recursion, but also some other bits and pieces like I should have used match *x { foo => ... } rather than match x { &foo => ... }.
  3. Most structures like this would try and be generic over T, so wouldn't have to deal with lifetimes at all.

In the end I didn't have the motivation to continue with the writing, though I'm sure someone could take the concept and run with it.

The next instalment would have implemented an iterator effectively identical to the one in https://github.com/aidanhs/tclscan/blob/c52df/src/rstcl.rs#L38-L81 and would have introduced the 'b: 'a syntax you can see there in a few places. In short, this says "b will outlive a", which is an important thing for the compiler to know, otherwise a struct like

pub struct TclTokenIter<'a, 'b: 'a> {
    token: &'a TclToken<'b>,
    cur: usize,
}

would be rejected (if a lifetime associated with TclToken was less than the reference lifetime, the reference could become invalid).

If you are interested in documentation like this, I'd suggest posting to https://users.rust-lang.org/ and noting that this could be useful.

@Tower450
Copy link

nice article :) helped a lot!

@tancorko
Copy link

tancorko commented Dec 23, 2016

What do you think about this version?

use std::cmp::Ordering;

#[derive(Debug)]
enum BTree<T: Ord> {
	Leaf {
		v: T,
		l: Box<BTree<T>>,
		r: Box<BTree<T>>,
	},
	Empty,
}

impl<T: Ord> BTree<T> {
	fn new() -> BTree<T> {
		BTree::Empty
	}

	fn insert(&mut self, nv: T) {
		match self {
			&mut BTree::Leaf { ref v, ref mut l, ref mut r } => {
				match nv.cmp(v) {
					Ordering::Less => r.insert(nv),
					Ordering::Greater => l.insert(nv),
					_  => return
				}
			},
			&mut BTree::Empty => {
				*self = BTree::Leaf { v: nv, l: Box::new(BTree::Empty), r: Box::new(BTree::Empty) }
			},
		};
	}

	fn is_empty(&self) -> bool {
		match self {
			&BTree::Leaf { .. } => false,
			&BTree::Empty => true,
		}
	}

	fn find(&self, fv: T) -> bool {
		match self {
			&BTree::Leaf { ref v, ref l, ref r } => {
				match fv.cmp(v) {
					Ordering::Less => r.find(fv),
					Ordering::Greater => l.find(fv),
					_  => true
				}
			},
			&BTree::Empty => false,
		}
	}
}

@VJ-OK
Copy link

VJ-OK commented Feb 18, 2017

When you enter large amount of items, won't it, stack overflow ?

@donaldpipowitch
Copy link

Thank you for the gist!

@nlinker
Copy link

nlinker commented Aug 11, 2017

@tancorko I like this immutable variant 👍

@ishanjain28
Copy link

@VJ-OK It'll Stack overflow. I think, Traversing the tree for correct position to insert the node should be done using a while loop.

@mitnk
Copy link

mitnk commented Jan 25, 2019

@tancorko nice one!

@aka-bash0r
Copy link

aka-bash0r commented Oct 26, 2019

@VJ-OK @ishanjain28 what's missing here is the balancing of the tree. This is nothing you should use in production. As soon as you start to balance your tree, worst case runtime of insertion boils down to O(log_2 n). You'll run out of memory before your stack ever hits its limits.

@yehohanan7
Copy link

may be a bit more simplified

struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

impl Node {
    fn new(value: i32) -> Node {
        Node {
            value: value,
            left: None,
            right: None,
        }
    }

    fn insert(&mut self, value: i32) {
        let new_node = Some(Box::new(Node::new(value)));
        if value < self.value {
            match self.left.as_mut() {
                None => self.left = new_node,
                Some(left) => left.insert(value),
            }
        } else {
            match self.right.as_mut() {
                None => self.right = new_node,
                Some(right) => right.insert(value),
            }
        }
    }

    fn search(&self, target: i32) -> Option<i32> {
        match self.value {
            value if target == value => Some(value),
            value if target < value => self.left.as_ref()?.search(target),
            value if target > value => self.right.as_ref()?.search(target),
            _ => None,
        }
    }
}

@Horki
Copy link

Horki commented Jun 19, 2020

@tancorko I have updated with binary tree traversals

    fn print_preorder(&self) {
        match self {
            &Btree::Empty => return,
            &Btree::Leaf {
                ref data,
                ref left,
                ref right,
            } => {
                print!("{}, ", data);
                left.print_preorder();
                right.print_preorder();
            }
        };
    }
    fn print_postorder(&self) {
        match self {
            &Btree::Empty => return,
            &Btree::Leaf {
                ref data,
                ref left,
                ref right,
            } => {
                left.print_postorder();
                right.print_postorder();
                print!("{}, ", data);
            }
        };
    }

    fn print_inorder(&self) {
        match self {
            &Btree::Empty => return,
            &Btree::Leaf {
                ref data,
                ref left,
                ref right,
            } => {
                left.print_inorder();
                print!("{}, ", data);
                right.print_inorder();
            }
        };
    }
    fn get_left(&self) -> Option<&Box<Btree>> {
        match self {
            &Btree::Empty => None,
            &Btree::Leaf {
                data: _,
                ref left,
                right: _,
            } => Some(left),
        }
    }

    fn get_right(&self) -> Option<&Box<Btree>> {
        match self {
            &Btree::Empty => None,
            &Btree::Leaf {
                data: _,
                left: _,
                ref right,
            } => Some(right),
        }
    }
    fn get_data(&self) -> Option<&i32> {
        match self {
            &Btree::Empty => None,
            &Btree::Leaf {
                ref data,
                left: _,
                right: _,
            } => Some(data),
        }
    }

    fn print_levelorder(&self) {
        match self {
            &Btree::Empty => return,
            &Btree::Leaf {
                data: _,
                left: _,
                right: _,
            } => {
                let mut q = VecDeque::new();
                q.push_front(self);
                while !q.is_empty() {
                    let node = q.pop_front().unwrap();
                    if let Some(data) = node.get_data() {
                        print!("{}, ", data);
                    }
                    if let Some(l) = node.get_left() {
                        q.push_back(l);
                    }
                    if let Some(r) = node.get_right() {
                        q.push_back(r);
                    }
                }
                println!();
            }
        }
    }

@DearYong
Copy link

@Horki Can "print!("{}, ", data);" pass the compiler when data is generic type?

@Horki
Copy link

Horki commented Oct 13, 2020

@Horki Can "print!("{}, ", data);" pass the compiler when data is generic type?

@DearYong, of course, if a generic type has Display trait implemented

@vsraptor
Copy link

vsraptor commented Feb 23, 2021

Another one ... put simplified version here :

https://gist.github.com/vsraptor/96b60b2758919fa34769c8e4d5d49a9f

@kaphula
Copy link

kaphula commented Jan 12, 2022

Very good, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment