Skip to content

Instantly share code, notes, and snippets.

@yawaramin
Forked from matthieubulte/avl.ml
Created June 20, 2020 03:25
Show Gist options
  • Save yawaramin/2b86cee1dd3cd3a678747e4038298743 to your computer and use it in GitHub Desktop.
Save yawaramin/2b86cee1dd3cd3a678747e4038298743 to your computer and use it in GitHub Desktop.
(* Good morning everyone, I'm currently learning ocaml for one of my CS class and needed to implement
an avl tree using ocaml. I thought that it would be interesting to go a step further and try
to verify the balance property of the avl tree using the type system. Here's the resulting code
annotated for people new to the ideas of type level programming :)
*)
(* the property we are going to try to verify is that at each node of our tree, the height difference between
the left and the right sub-trees is at most of 1. *)
(* let's start by implementing the stuff we need for our invariance checking, namely natural numbers
at the type level. They will be useful to help us track the height of a tree at the type level
letting us then use this height to construct new trees with the right height balance. *)
(* what are natural numbers? *)
(* well, zero is a natural number for sure! *)
type zero = Z;;
(* good, we have one natural number, another infinity of them and we should be good. The nice thing about nats is that
they can be constructed and defined in a recursive way. We have the zero case, now we can reach any other number
by repeatedly adding one to zero, for instance 3 is 1+1+1+0. Let's try to create the type for that. *)
type 'n succ = S of 'n;;
(* so now you're going to tell me that this is wrong... you're right, one can construct the type unit succ and this
wouldn't be a natural number anymore. But in practice, the user will never have to be touching this nat type
so the damages are limited there and we get really awesome stuff be using this. *)
(* Let's see how we can use translate our natural numbers into these types with some examples *)
type one = zero succ;;
type two = one succ;;
(* That's a function, at the type level. Let it sink. *)
type 'n plus_two = 'n succ succ;;
type three = one plus_two;;
(* Nice, now let's try to define the avl type. Remember that we don't want anyone to be able to create an invalid avl tree,
that is, we don't want anyone to be able to create a tree where one of both children is deeper than the other children by
two layers.
In order to enforce this property, we'll need to be creative in the definition of our type! *)
(* Here, the type parameter 'n encodes the height of our tree in order the be able to enforce the tree
balance invariant using smart constructors *)
type 'n avl =
(* The most basic tree is an empty tree, called a Leaf, as it doesn't contain any key or children it
has a 0 height *)
| Leaf: zero avl
(* Another valid form of avl tree is if both of the children have the same height.
Note the type of the children, they are both parametrized by the same type n, meaning they
should both have the same height!
Note also the type of the constructed tree, it has a height of ('n succ) which is one more
than the height of each of the children because we're adding a layer to the previous trees. *)
| Balanced: 'n avl * int * 'n avl -> ('n succ) avl
(* The only other valid shapes are trees where one of the children is one layer deeper than the other tree. We
enforce this property by matching n with the height of the smallest subtree and forcing the height of the largest
subtree to be exactly n+1.
The resulting tree, because it adds one layer to the subtrees has a height of 1 + largers subtree, which means
it has a height of n+2
*)
| LeftHeavy: ('n succ) avl * int * 'n avl -> ('n succ succ) avl
| RightHeavy: 'n avl * int * ('n succ) avl -> ('n succ succ) avl
;;
(* Let's try this out by implementing the singleton function. This function
creates a tree with a single element in it. As you can read in the type
signature, the created tree has a height of 1. The nice thing here is that one
wouldn't be able the get the height of the output wrong because the type system
checks that the height in our type signature corresponds to the height of the constructed
tree.
This means the tree returned by the singleton function can't be annotated as a two tree, so
let t : two tree = singleton 1
would be rejected by the type checker. *)
let singleton : int -> one avl = fun x -> Balanced (Leaf, x, Leaf);;
(* Next step is going to be a bit more interesting, we are going to start implementing the
building blocks for our insert function by implementing the single_rotate_left that
rebalances the tree as following:
(p) (q)
/ \ / \
A (q) (p) C
/ \ / \ / \
B C A B -----
/ \
-----
As you can see in the diagram, the right subtree has a height of n+2 while the left subtree has
a height one n, which can be the case when inserting an element in the right part of the tree. But now
we can't reconstruct the tree at the level of (p) because the height difference between it's children is
of two. That's why we perform what is called a rotation in order to be able to construct a tree with
the height and ordering invariances respected.
In this implementation we only check the height invariance.
*)
let rotate_single_left : type n. n avl -> int -> (n succ succ) avl -> (n succ succ) avl =
fun a p -> function
| RightHeavy (b, q, c) -> Balanced (Balanced (a, q, b), q, c)
| _ -> failwith "we'll take care of it later in the implementation of insert ;)"
;;
(* Well, that's pretty! Don't worry that our functon is not total, I just wanted to demonstrate
the case covered by our little diagram.
The important part here is the type signature. As you can see this function takes a deconstructed
non balanced tree, that is, a tree with one side two level longer than the other side, and creates
a well formed avl tree of a certain hight.
The cool thing here is that the compiler can actually verify that out implementation creates a tree
of height n+2 provided the right arguments. So the compiler knows from the input parameters that the
output we're constructing has the right height properties. Our implementation could not have been wrong
because the compiler would have complained about it otherwise.
Again, let it sink aind think about how cool that is.
*)
(* Before we go and implement the insert function, we will need to first create some little types
and functions that will help us and the compiler prove that our implementation is correct. *)
(* Because the tree might or might not grow in height after insering a value in
it, we need to create a new type that will allow us to return the two
different heights the tree can have after inserting a value in it. *)
type 'n avl_insert =
| Grew of ('n succ) avl
| Same of 'n avl
;;
(* Trichotomy of integers order. This type will let the totality checker verify in our insert
implementation that we have covered all the comparison cases we can have when comparing the
element we want to insert in the tree to the current node.
It sounds like a small detail, but it's another great example of properly using the type system
to model out domain enabling more expressiveness which is better for us to express ourselves and
for the compiler to check what we're doing.
*)
type ordering = Lt | Eq | Gt;;
let compare l r = if l == r
then Eq
else if l < r
then Lt
else Gt
;;
(* Here we are. The implementation of the insert function. It's rather large and some might even
argue that this is ugly. But all of this doesn't matter, what matters is that this implementation
is verifiably correct. That there is no input that could make our function fail at runtime and
that regardless of the input, the output will be a well balanced avl tree. No unit test, no property test
will ever give you this level of confidence. Here the our implementation proves it's invariance and
the type checker verifies our proof.
You can just skim through the implementation, it's not important. The idea is the important part.
Ideas are what's important. Take a moment to think about it :)
*)
let rec insert : type n. int -> n avl -> n avl_insert = fun x t -> match t with
| Leaf -> Grew (singleton x)
| Balanced (left, key, right) ->
(match compare x key with
| Eq -> Same t
| Lt -> (match insert x left with
| Grew new_left -> Grew (LeftHeavy (new_left, key, right))
| Same new_left -> Same (Balanced (new_left, key, right))
)
| Gt -> (match insert x right with
| Grew new_right -> Grew (RightHeavy (left, key, new_right))
| Same new_right -> Same (Balanced (left, key, new_right))
)
)
| LeftHeavy (left, key, right) ->
(match compare x key with
| Eq -> Same t
| Lt -> (match insert x left with
| Same new_left -> Same (LeftHeavy (new_left, key, right))
(* This can't happen because the only way to have a growing
balanced tree is to have appended a singleton. But this only
happens if the left tree was a Leaf, so the current tree
could not have been LeftHeavy if its left child was a Leaf.
TODO: check if the totality checker can receive a hint on that
*)
| Grew (Balanced _) -> failwith "unreachable pattern not detected"
(* Single rotation *)
| Grew (LeftHeavy (left_left, left_key, left_right)) ->
Same (Balanced (left_left,
left_key,
Balanced (left_right, key, right)))
(* Double Rotation *)
| Grew (RightHeavy (left_left, left_key, left_right)) ->
(match left_right with
| LeftHeavy (left_right_left, left_right_key, left_right_right)
-> Same (Balanced (
Balanced (left_left, key, left_right_left),
left_key,
RightHeavy (left_right_right, left_right_key, right)
))
| RightHeavy (left_right_left, left_right_key, left_right_right)
-> Same (Balanced (
LeftHeavy (left_left, key, left_right_left),
left_key,
Balanced (left_right_right, left_right_key, right)
))
| Balanced (left_right_left, left_right_key, left_right_right)
-> Same (Balanced (
Balanced (left_left, key, left_right_left),
left_key,
Balanced (left_right_right, left_right_key, right)
))
)
)
(* Simple Insert *)
| Gt -> (match insert x right with
| Grew new_right -> Same (Balanced (left, key, new_right))
| Same new_right -> Same (LeftHeavy (left, key, new_right))
)
)
| RightHeavy (left, key, right) ->
(match compare x key with
| Eq -> Same t
(* Simple Insert *)
| Lt -> (match insert key left with
| Grew new_left -> Same (Balanced (new_left, key, right))
| Same new_left -> Same (RightHeavy (new_left, key, right))
)
| Gt -> (match insert x right with
| Same new_right -> Same (RightHeavy (left, key, new_right))
(* This can't happen because the only way to have a growing
balanced tree is to have appended a singleton. But this only
happens if the right tree was a Leaf, so the current tree
could not have been RightHeavy if its right child was a tree.
TODO: check if the totality checker can receive a hint on that
*)
| Grew (Balanced _) -> failwith "unreachable pattern not detected"
(* Double Rotation *)
| Grew (LeftHeavy (right_left, right_key, right_right)) ->
(match right_left with
| Balanced (right_left_left, right_left_key, right_left_right) ->
Same ( Balanced (
Balanced (left, key, right_left_left),
right_left_key,
Balanced (right_left_right, right_key, right_right)
))
| LeftHeavy (right_left_left, right_left_key, right_left_right) ->
Same ( Balanced (
Balanced (left, key, right_left_left),
right_left_key,
RightHeavy (right_left_right, right_key, right_right)
))
| RightHeavy (right_left_left, right_left_key, right_left_right) ->
Same ( Balanced (
LeftHeavy (left, key, right_left_left),
right_left_key,
Balanced (right_left_right, right_key, right_right)
))
)
(* Simple Rotation *)
| Grew (RightHeavy (right_left, right_key, right_right)) ->
Same (Balanced (Balanced (left, key, right_left),
right_key,
right_right))
)
)
;;
(* That's all folk! We've shown that it's possible to leverage to verify more advanced properties about
our program. In this example one might also want to check that the elements in the left part
of the tree are indeed smaller than the current key, and that the right part of the tree larger
keys have than the current key.
The ideas are the same, it would just be a little bit more work in a language like ocaml
because you would need to encode integers at the type level and keep them in sync with the value
level integers you use as keys. Again, same idea, but more grunt work.
Here are some keyword that will be interesting if you want to learn more about all of this:
Dependent typing, Coq, Idris, Curry Howard Correspondence
*)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment