public
Last active

red-black trees in haskell, using GADTs and Zippers

  • Download Gist
RedBlackTree.hs
Haskell
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
{-# LANGUAGE GADTs #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE StandaloneDeriving #-}
module RedBlackTree where
 
data Zero
data Succ n
type One = Succ Zero
 
data Black
data Red
 
-- red-black trees are rooted at a black node
data RedBlackTree a = forall n. T ( Node Black n a )
deriving instance Show a => Show (RedBlackTree a)
 
-- all paths from a node to a leaf have exactly n black nodes
data Node c n a where
-- all leafs are black
Leaf :: Node Black One a
-- internal black nodes can have children of either color
B :: Node cL n a -> a -> Node cR n a -> Node Black (Succ n) a
-- internal red nodes can only have black children
R :: Node Black n a -> a -> Node Black n a -> Node Red n a
deriving instance Show a => Show (Node c n a)
 
-- one-hole context for red-black trees
data Context m c n a where
-- if we're at the root, the hole is a black node
Root :: Context n Black n a
-- we can go left or right from a red node hole, creating a hole for a black node
BC :: Bool -> a -> Node Black n a -> Context m Red n a -> Context m Black n a
-- we can go left or right from a black node hole, creating a hole for either
EC :: Bool -> a -> Node cY n a -> Context m Black (Succ n) a -> Context m cX n a
 
data Zipper m a = forall c n. Zipper (Node c n a) (Context m c n a)
 
-- create a zipper
unZip :: Node Black n a -> Zipper n a
unZip = flip Zipper Root
 
-- destroy a zipper
zipUp :: Zipper m a -> Node Black m a
zipUp (Zipper x Root) = x
zipUp (Zipper x (BC goLeft a y c)) = zipUp $ Zipper (if goLeft then R x a y else R y a x) c
zipUp (Zipper x (EC goLeft a y c)) = zipUp $ Zipper (if goLeft then B x a y else B y a x) c
 
-- locate the node that should contain a in the red-black tree
zipTo :: Ord a => a -> Zipper n a -> Zipper n a
zipTo _ z@(Zipper Leaf _) = z
zipTo a z@(Zipper (R l a' r) c) = case compare a a' of
EQ -> z
LT -> zipTo a $ Zipper l (BC True a' r c)
GT -> zipTo a $ Zipper r (BC False a' l c)
zipTo a z@(Zipper (B l a' r) c) = case compare a a' of
EQ -> z
LT -> zipTo a $ Zipper l (EC True a' r c)
GT -> zipTo a $ Zipper r (EC False a' l c)
 
-- create a red-black tree
empty :: RedBlackTree a
empty = T Leaf
 
-- insert a node into a red-black tree
-- (see http://en.wikipedia.org/wiki/Red%E2%80%93black_tree#Insertion)
insert :: Ord a => a -> RedBlackTree a -> RedBlackTree a
insert a t@(T root) = case zipTo a (unZip root) of
-- find matching leaf and replace with red node (pointing to two leaves)
Zipper Leaf c -> insertAt (R Leaf a Leaf) c
-- if it's already in the tree, there's no need to modify it
_ -> t
 
insertAt :: Node Red n a -> Context m c n a -> RedBlackTree a
-- 1) new node is root => paint it black and done
insertAt (R l a r) Root = T $ B l a r
-- 2) new node's parent is black => done
insertAt x (EC b a y c) = T . zipUp $ Zipper x (EC b a y c)
-- 3) uncle is red => paint parent/uncle black, g'parent red. recurse on g'parent
insertAt x (BC pb pa py (EC gb ga (R ul ua ur) gc)) = insertAt g gc
where p = if pb then B x pa py else B py pa x
u = B ul ua ur
g = if gb then R p ga u else R u ga p
-- 4) node is between parent and g'parent => inner rotation
insertAt (R l a r) (BC False pa py pc@(EC True _ _ _)) = insertAt (R py pa l) (BC True a r pc)
insertAt (R l a r) (BC True pa py pc@(EC False _ _ _)) = insertAt (R r pa py) (BC False a l pc)
-- 5) otherwise => outer rotation
-- XXX: GHC seems unable to infer that gy is Black so I have to do both cases
-- explicitly, rather than
-- insertAt x (BC True pa py (EC True ga gy gc)) =
-- T . zipUp $ Zipper (B x pa $ R py ga gy) gc
-- insertAt x (BC False pa py (EC False ga gy gc)) =
-- T . zipUp $ Zipper (B (R gy ga py) pa x) gc
insertAt x (BC True pa py (EC True ga gy@Leaf gc)) =
T . zipUp $ Zipper (B x pa $ R py ga gy) gc
insertAt x (BC True pa py (EC True ga gy@(B _ _ _) gc)) =
T . zipUp $ Zipper (B x pa $ R py ga gy) gc
insertAt x (BC False pa py (EC False ga gy@Leaf gc)) =
T . zipUp $ Zipper (B (R gy ga py) pa x) gc
insertAt x (BC False pa py (EC False ga gy@(B _ _ _) gc)) =
T . zipUp $ Zipper (B (R gy ga py) pa x) gc
 
-- can't derive, since we abstract over n, so we have to manually
-- check for identical structure
instance Eq a => Eq (RedBlackTree a) where
T Leaf == T Leaf = True
T (B l@(B _ _ _) a r@(B _ _ _)) == T (B l'@(B _ _ _) a' r'@(B _ _ _)) =
a == a' && T l == T l' && T r == T r'
T (B (R ll la lr) a r@(B _ _ _)) == T (B (R ll' la' lr') a' r'@(B _ _ _)) =
a == a' && la == la' &&
T ll == T ll' && T lr == T lr' && T r == T r'
T (B l@(B _ _ _) a r@(R rl ra rr)) == T (B l'@(B _ _ _) a' r'@(R rl' ra' rr')) =
a == a' && ra == ra' &&
T l == T l' && T rl == T rl' && T rr == T rr'
T (B (R ll la lr) a r@(R rl ra rr)) == T (B (R ll' la' lr') a' r'@(R rl' ra' rr')) =
a == a' && la == la' && ra == ra' &&
T ll == T ll' && T lr == T lr' && T rl == T rl' && T rr == T rr'
_ == _ = False
 
-- can't derive, since B abstracts over child node colors, so
-- manually check for identical structure
instance Eq a => Eq (Node c n a) where
Leaf == Leaf = True
R l a r == R l' a' r' = a == a' && l == l' && r == r'
b@(B _ _ _) == b'@(B _ _ _) = T b == T b'
_ == _ = False

Just for the heck of it I tried it with DataKinds, to make it use the ordinary data Nat = ... and a data RedBlack = Red | Black. It worked fine as you no doubt would have expected, only a couple of signatures needed changing. What most surprised me is that the deriving still worked. You can add deriving Show ... for Context and Zipper, by the way, and then zip around in ghci just fine.

There is T . zipUp $ redundancy in your insertAt function. It would be nicer, I think, if it returned a Zipper instead:

  • less redudant code
  • a clearer signature: you take an almost-zipper with a potentially-wrong color on the node, and re-balance it into a correct zipper
  • while insert would zipUp the returned zipper immediately, there might be use cases where this zipper has value; I'm wondering for example if code to insert a list of (key,value) pair in one batch could not reuse the resulting context

I can think of two problems with having insertAt return a Zipper.

  1. User confusion with the Zipper location. If insertAt recurses, the final Zipper may not point to the same position in the tree as before the insertion - it could move up from a leaf all the way to the root. I'd put this down as surprising behaviour for a Zipper, so I'd want to code a return to the original location.

  2. Type. In case 1 we paint the red node black and insert it at the root. With a zipper, this would have a signature of Node Red n a -> Context n Black n a -> Zipper (Succ n) a since this adds one black node to every path from the root to a leaf. In case 2 however, there tree's number of black nodes remains the same, so that would have a signature of Node Red n a -> Context m c n a -> Zipper m a. So we'd actually need to return Either (Zipper (Succ m) a) (Zipper m a).

Although I suppose I could alter RedBlackTree a to contain a Zipper m a rather than a Node Black m a, and update insert to make sure we don't need to check upwards.

Nice. I've taken a copy for GHC's regression test suite (with a link back here), so that we'll know if this program ever fails to compile. Please yell if you don't want me to do that and I'll remove it again.

Simon

Does anyone have an explanation for why ghc isn't able to infer gy is Black? (line 87)
I tried using DataKinds and KindSignatures, which works fine, but doesn't seem to help infering this.

As types are closed (you cannot add new constructors), I would expect it's possible to create a mapping from types to and from constructors.

Well, gy :: Node cY n a
and cY is existentially bound with absolutely no other static information. In the code that works you pattern-match explicilty on Leaf and B, but you omit a match on R. So, why do you think R is impossible here? You need to explain why in the types!

I think you believe that cY must be Black. But why?

I believe cY must be Black because if it were Red it must have been constructed using R, which would have been covered by an earlier pattern match:

-- 3) uncle is red => paint parent/uncle black, g'parent red.  recurse on g'parent
insertAt x (BC pb pa py (EC gb ga (R ul ua ur) gc)) = --...

So it looks to me like I'm failing to prove that given gy :: Node cY n a and gy does not match R _ __ _, that cY is Black despite the fact that only other two constructors constrain cY to be Black.

Which I really think of as a failure on my part to understand GHC's type inference.

When I comment out the cases at line 93-100 and run with -Wall:

    Warning: Pattern match(es) are non-exhaustive
             In an equation for `insertAt':
                 Patterns not matched:
                     (R _ _ _) (BC False _ _ (EC False _ Leaf _))
                     (R _ _ _) (BC False _ _ (EC False _ (B _ _ _) _))
                     (R _ _ _) (BC True _ _ (EC True _ Leaf _))
                     (R _ _ _) (BC True _ _ (EC True _ (B _ _ _) _))

While I don't know a lot about ghc's type inference, this proves to me that some part of GHC is able to figure out that "Leaf" and "B" are the only 2 constructors possible in this spot. So when I write a _ there, it can be inferred that whatever is matched there is always gonna be Black.

Consider this

data T a where
TX :: T Int
TY :: T Bool
TZ :: T Bool

f :: a -> T a -> a
f v TX = v+1
f v _ = not v

Should this type check. You argue: sinc you have matched TX,
'a' must be 'Bool'. Hence (not v) should be fine.

But this does not work at all. For a start the equations are
treated separately by the type checker. And, more important,
there is not translation into our internal langauage FC. The
argument relies on the fact that other alternatives are eliminated
so the only remaining alternative is that a~Bool. But there is
no positive evidence that a~Bool.

So we have to write

f :: a -> T a -> Bool
f v TX = v > 1
f v TY = not v
f v TZ = not v

If we translate into FC we get:

f = /\a. (v::a) (t::T a).
case t of
TX (c :: a~Int) -> (v |> c) > 1
TY (c :: a~Bool) -> not (v |> c)
TZ (c :: a~Bool) -> not (v |> c)

Notice that the proof, c, that (a~Bool) is used in the RHS
to cast 'v' from 'a' to Bool.

Sorry!

Thanks for the explanation!

Please sign in to comment on this gist.

Something went wrong with that request. Please try again.