Skip to content

Instantly share code, notes, and snippets.

@treeowl
Last active March 16, 2020 09:11
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save treeowl/b7d5daee549eb9bc6e6f2d9f851aab8c to your computer and use it in GitHub Desktop.
Save treeowl/b7d5daee549eb9bc6e6f2d9f851aab8c to your computer and use it in GitHub Desktop.
Breadth-first binary tree creation
-- | This module defines a function that produces a complete binary tree
-- from a breadth-first list of its (internal) node labels. It is an
-- optimized version of an implementation by Will Ness that avoids
-- any "impossible" cases. See
--
-- https://stackoverflow.com/a/60561480/1477667
module Bftr (Tree (..), bft, list, deftest) where
import Data.Function (fix)
import Data.Monoid (Endo (..))
-- | A binary tree.
data Tree a
= Empty
| Node a (Tree a) (Tree a)
deriving Show
-- An infinite stream. Why don't we just use a list? First, I think it's
-- easier to see what's going on this way. Second, the garbage collector
-- can clean up more garbage with this representation. The `bft` function
-- includes several lazy patterns. These cause the runtime system to
-- produce thunks to actually perform the pattern matches. Unlike a list
-- cons, a stream cons is a *record*, so the runtime system will produce
-- *selector thunks* for these pattern matches. That means that as soon
-- as one field of a particular cons is forced, all the selector thunks
-- matching on *either* field become available for GC simplification.
data SS a = a :< SS a
infixr 5 :<
bft' :: [a] -> Tree a
bft' xs = tree
where
-- `subtrees` is a stream of all proper subtrees of the result tree,
-- in breadth-first order, followed by infinitely many empty trees.
-- We form each tree in the result by combining a label from the input
-- list with consecutive subtrees.
tree :< subtrees = go xs subtrees
go :: [a] -> SS (Tree a) -> SS (Tree a)
go (a : as) ~(b1 :< ~(b2 :< bs)) = Node a b1 b2 :< go as bs
go [] _ = fix (Empty :<)
-- Ugh. The above definition, while beautiful, can leak space, thanks
-- to limitations of GHC's selector thunk mechanism. To avoid this,
-- at the cost of efficiency in some cases, we write this instead.
-- | Build a complete binary tree from a list of its breadth-first
-- traversal.
bft :: [a] -> Tree a
bft xs = tree
where
-- `subtrees` is a stream of all proper subtrees of the result tree,
-- in breadth-first order, followed by infinitely many empty trees.
-- We form each tree in the result by combining a label from the input
-- list with consecutive subtrees.
tree :< subtrees = go xs subtrees
go :: [a] -> SS (Tree a) -> SS (Tree a)
go (a : as) ys = Node a b1 b2 :< go as bs
where
{-# NOINLINE b2bs #-}
b1 :< b2bs = ys
b2 :< bs = b2bs
go [] _ = fix (Empty :<)
-- | Perform a simple laziness test; takes a non-negative integer.
-- This should not throw an exception.
deftest :: Int -> Int
deftest n = sum . take n . list . bft $ [1..n] ++ undefined
-- | Convert a tree to a list in breadth-first order (useful for testing).
--
-- @list . bft = id@
--
-- When @t@ is a complete tree,
--
-- @bft . list $ t = t@
list :: Tree a -> [a]
-- there's probably a better way to do this.
list t = appEndo (foldMap id (levels t)) []
levels :: Tree a -> [Endo [a]]
levels Empty = []
levels (Node a l r) = Endo (a:) : combine (levels l) (levels r)
where
combine [] ys = ys
combine (x : xs) ys = (x <> hd) : combine xs tl
where
(hd, tl) = case ys of
[] -> (mempty, [])
y:ys' -> (y, ys')
@WillNess
Copy link

WillNess commented Mar 7, 2020

isn't foldMap id just fold?

Wikipedia's Corecursion page suggests:

bflist :: Tree a -> [a]
bflist t = [x | Node x _ _ <- q]
     where
     q  =  t : go 1 q
     go 0  _                =          []
     go i (Empty      : q)  =          go (i-1) q
     go i (Node _ l r : q)  =  l : r : go (i+1) q

@treeowl
Copy link
Author

treeowl commented Mar 7, 2020

Yeah, I just didn't want to bother importing Data.Foldable to get fold. I don't think I'm going to understand that bflist tonight. Maybe I'll give it a try next week.

@WillNess
Copy link

WillNess commented Mar 7, 2020

thanks again; you version is amazingly clear and simple. yeah simple is hard.

btw I dumbed it down a bit in my SO answer - switched to plain : for simplicity.

@treeowl
Copy link
Author

treeowl commented Mar 7, 2020

Thanks! I really just manually fused yours to avoid intermediate structures, and to my surprise lots of stuff collapsed to give an ultimately simpler form.

@WillNess
Copy link

WillNess commented Mar 7, 2020

so, Why Can't Compilers Do That For Us?? :) /rant

I continue to be amazed humans still program "by hand".

I think the millions of programmers are bound to get replaced like the paralegals are going to be (or already are in the process of being replaced?)

@treeowl
Copy link
Author

treeowl commented Mar 8, 2020

I don't think the strictness annotation was actually exploitable ever, so I've removed it.

@treeowl
Copy link
Author

treeowl commented Mar 8, 2020

It's harder to understand and I have no particular reason to believe it's more efficient, but I thought it interesting to write this alternative implementation of go:

go :: [a] -> SS (Tree a) -> SS (Tree a)
go ys zs@~(_ :< _ :< bs) = t :< go ys' bs
  where
    (t, ys') = case ys of
      a : as -> case zs of
                       b1 :< b2 :< _ -> (Node a b1 b2, as)
      [] -> (Empty, [])

The interesting thing is that the result is "immediately infinite"; there's no need to be careful about how quickly it's consumed, so we can use a strict pattern match to get b1 and b2. I would like to play around with performance testing this variant at some point; it obviously produces the fringe less efficiently, but maybe it makes up for that elsewhere?

@treeowl
Copy link
Author

treeowl commented Mar 8, 2020

Wow.... It's a bit tough to convince GHC to actually do what I told it to in the last comment. It really wants to turn that into something else. Unfortunately, the something else looks very likely to leak memory, and also does some totally unnecessary pattern matching. Wat. (Caveat: I haven't checked with the latest GHC, so maybe that's changed.) I haven't dug into it to try to see what transformations lead to that weirdness. Anyway, here's a version that actually does what I meant it to:

go :: [a] -> SS (Tree a) -> SS (Tree a)
go ys zs@ ~(_ :< _ :< bs) = t :< go ys' bs
  where
    (t, ys') = tys'
    {-# NOINLINE tys' #-}
    tys' = case ys of
      a : as -> case zs of
        b1 :< b2 :< _ -> (Node a b1 b2, as)
      [] -> (Empty, [])

@WillNess
Copy link

WillNess commented Mar 10, 2020

interesting. it blazes the trail first i.e. creates more entries in the spine (than just the one as before), filled with completely uninstantiated variables at first - driven by the strict pattern match of zs. so by the time ys hits the [] the spine is built in full.

@treeowl
Copy link
Author

treeowl commented Mar 10, 2020

Sorry, I deleted the comment I think you just responded to, because I realized it was bogus. Anyway, Here's one based on a similar function for Data.Tree by @oisdk:

breadthFirst :: forall a. [Tree a] -> [a]
breadthFirst ts = foldr f b ts []
  where
    f (Node x l r) fw bw = x : fw (r : l : bw)
    f Empty fw bw = fw bw

    b :: [Tree a] -> [a]
    b [] = []
    b ts = foldl (flip f) b ts []

I don't really like this, TBH, because it's not remotely obvious that the foldl here actually represents a queue rotation. But here's a partially deobfuscated version of the same idea:

breadthFirst :: forall a. [Tree a] -> [a]
breadthFirst ts = foldr f b ts []
  where
    -- The forest we fold over represents the front of a banker's queue. bw
    -- represents its back. Once the front is empty (we've finished a level)
    -- we rotate the queue.
    f (Node x l r) fw bw = x : fw (r : l : bw)
    f Empty fw bw = fw bw

    b :: [Tree a] -> [a]
    b [] = []
    b ts = breadthFirst (reverse ts) 

@WillNess
Copy link

WillNess commented Mar 14, 2020

no, I replied to the comment directly above mine. I tried to summarize the workings of your go function there.

is "rotate the queue" common terminology for the switching it describes? for my mind, rotate (x:xs) == xs++[x].

also, I'd name b's argument bw, for uniformity and clarity. I can already see it's [Tree a], so calling it ts doesn't add any new clues; but seeing that it's the same bw as in f clarifies f's definition for me. Or, we could get rid of b altogether with

breadthFirst :: forall a. [Tree a] -> [a]
breadthFirst [] = []
breadthFirst ts = foldr f (breadthFirst . reverse) ts []
  where
    f (Node x l r) fw bw = x : fw (r : l : bw)
    f Empty        fw bw =     fw bw

my bflist can be made to skip over the Empty's ahead of time, at the cost of putting more clauses into its definition:

bflist2 :: Tree a -> [a]
bflist2 Empty =  []
bflist2 t     =  map (\ ~(Node x _ _) -> x) q
     where
     q  =  t : go 1 q
     go 0  _                 =  []
     go i (Node _ Empty Empty : ns)  =          go (i-1) ns
     go i (Node _ l     Empty : ns)  =  l     : go (i)   ns
     go i (Node _ Empty r     : ns)  =      r : go (i)   ns
     go i (Node _ l     r     : ns)  =  l : r : go (i+1) ns

Might perform better than the shorter version.

@oisdk
Copy link

oisdk commented Mar 14, 2020

I haven't followed the whole conversation here, but for breadth-first traversals I think the optimal function is the following:

levels :: Tree a -> [[a]]
levels t = takeWhile (not . null) (f t (repeat []))
  where
    f Empty                qs  = qs
    f (Node x ls rs) ~(q : qs) = (x:q) : f ls (f rs qs)

It would probably benefit from some optimisation, like using streams instead of lists:

data Stream a = a :< Stream a

toListWhile p (x :< xs)
  | p x       = x : toListWhile p xs
  | otherwise = []

levels :: Tree a -> [[a]]
levels t = toListWhile (not . null) (f t (fix ([] :<)))
  where
    f Empty                 qs  = qs
    f (Node x ls rs) ~(q :< qs) = (x:q) :< f ls (f rs qs)

It also looks the most like the "inverse" of the unfold.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment