ocramz/comonad-graph.md

## comonad-graph.md

      
    Raw
  

              comonad-graph.md
            
          
  layout
  title
  date
  comments
  author
  categories
  commentIssueId
  
  
  post
  A comonad of graph decompositions
  2016-04-02 13:02:54 -0400
  true
  Rúnar
  scala comonads
  22
  
  
I want to talk about a comonad that came up at work the other day. Actually, two of them, as the data structure in question is a comonad in at least two ways, and the issue that came up is related to the difference between those two comonads.
This post is sort of a continuation of the Comonad Tutorial, and we can call this "part 3". I'm going to assume the reader has a basic familiarity with comonads.
Inductive Graphs

At work, we develop and use a Scala library called Quiver for working with graphs. In this library, a graph is a recursively defined immutable data structure. A graph, with node IDs of type V, node labels N, and edge labels E, is constructed in one of two ways. It can be empty:
empty :: Graph v n e
Or it can be of the form c & g, where c is the context of one node of the graph and g is the rest of the graph with that node removed:
data Context v n e =
    Context { inEdges :: Vector (e, v)
            , vertex :: v
            , label :: n
            , outEdges :: Vector (e, v)
            }

(&) :: Graph v n e -> Graph v n e
(&) g = undefined
By the same token, we can decompose a graph on a particular node:
decomp :: Graph v n e -> v -> Option (GDecomp v n e)
Where a GDecomp is a Context for the node v (if it exists in the graph) together with the rest of the graph:
data GDecomp v n e =
    GDecomp { ctx :: Context v n e
            , rest :: Graph v n e
            }
Recursive decomposition

Let's say we start with a graph g, like this:
![Example graph](/images/quiver/Quiver1.png =300x)
I'm using an undirected graph here for simplification. An undirected graph is one in which the edges don't have a direction. In Quiver, this is represented as a graph where the "in" edges of each node are the same as its "out" edges.
If we decompose on the node a, we get a view of the graph from the perspective of a. That is, we'll have a Context letting us look at the label, vertex ID, and edges to and from a, and we'll also have the remainder of the graph, with the node a "broken off":
![GDecomp on a](/images/quiver/decompa.png =400x)
Quiver can arbitrarily choose a node for us, so we can look at the context of some "first" node:
decompAny :: Graph v n e -> Maybe (GDecomp v n e)
We can keep decomposing the remainder recursively, to perform an arbitrary calculation over the entire graph:
fold :: Graph v n e -> b -> ((Context v n e, b) -> b) -> b
fold g b f = undefined
The implementation of fold will be something like:
fromMaybe b $ do
    (GDecomp ctx rest) <- decompAny g
    return f (ctx, fold rest b f)
For instance, if we wanted to count the edges in the graph g, we could do:
do
    (Context(ins, _, _, outs), b) <- fold g 0
    return size ins + size outs + b
The recursive decomposition will guarantee that our function doesn't see any given edge more than once. For the graph g above, (g fold b)(f) would look something like this:

Graph Rotations

Let's now say that we wanted to find the maximum [degree](https://en.wikipedia.org/wiki/Degree_(graph_theory) of a graph. That is, find the highest number of edges to or from any node.
A first stab might be:
maxDegree :: Graph v n e -> Int
maxDegree g = fold g 0 $ \(Context ins _ _ outs) -> max (size ins + size outs) z
But that would get the incorrect result. In our graph g above, the nodes b, d, and f have a degree of 3, but this fold would find the highest degree to be 2. The reason is that once our function gets to look at b, its edge to a has already been removed, and once it sees f, it has no edges left to look at.
This was the issue that came up at work. This behaviour of fold is both correct and useful, but it can be surprising. What we might expect is that instead of receiving successive decompositions, our function sees "all rotations" of the graph through the decomp operator:

That is, we often want to consider each node in the context of the entire graph we started with. In order to express that with fold, we have to decompose the original graph at each step:
maxDegree :: Graph v n e -> Int
maxDegree g = fold g 0 $ \(c, z) -> max z $ fromMaybe 0 $ do
    (GDecomp (Context ins _ _ outs) _) <- decompose g (vertex c)
    return size ins + size outs
But what if we could have a combinator that labels each node with its context?
contextGraph :: Graph v n e -> Graph v (Context v n e) e
Visually, that looks something like this:

If we now fold over contextGraph(g) rather than g, we get to see the whole graph from the perspective of each node in turn. We can then write the maxDegree function like this:
maxDegree :: Graph v n e -> Int
maxDegree g = fold (contextGraph g) 0 $ \(c, z) -> max z (insSize c + outsSize c)
  where
    insSize = size . ins . label
    outsSize = size . outs . label
Two different comonads

This all sounds suspiciously like a comonad! Of course, Graph itself is not a comonad, but GDecomp definitely is. The counit just gets the label of the node that's been decomped out:
type GDecomp2 v e n = GDecomp v n e

instance Comonad (GDecomp2 v e) where
    extract g = label (ctx g)
    extend g f = undefined
The cobind can be implemented in one of two ways. There's the "successive decompositions" version:
extend :: GDecomp v a e -> GDecomp v a e -> b
extend g f = GDecomp newCtx newRest
  where
    newCtx = (ctx g) { label = f g }
    newRest = fromMaybe empty $ do
        gdecomp <- decompAny (rest g)
        let (GDecomp c r) = extract gdecomp f
        return c & r
Visually, it looks like this:

It exposes the substructure of the graph by storing it in the labels of the nodes. It's very much like the familiar NonEmptyList comonad, which replaces each element in the list with the whole sublist from that element on.
So this is the comonad of recursive folds over a graph. Really its action is the same as as just fold. It takes a computation on one decomposition of the graph, and extends it to all sub-decompositions.
But there's another, comonad that's much more useful as a comonad. That's the comonad that works like contextGraph from before, except instead of copying the context of a node into its label, we copy the whole decomposition; both the context and the remainder of the graph.
That one looks visually more like this:

Its cobind takes a computation focused on one node of the graph (that is, on a GDecomp), repeats that for every other decomposition of the original graph in turn, and stores the results in the respective node labels:
extend :: GDecomp v a e -> GDecomp v a e -> b
extend g f = GDecomp newCtx newRest
  where
    newCtx = (ctx g) { label = f g }
    newRest = fold rest empty $ \(c, acc) -> c { label = newLabel }
    newLabel = f (get (decomp orig (vertex c))) & acc
This is useful for algorithms where we want to label every node with some information computed from its neighborhood. For example, some clustering algorithms start by assigning each node its own cluster, then repeatedly joining nodes to the most popular cluster in their immediate neighborhood, until a fixed point is reached.
As a simpler example, we could take the average value for the labels of neighboring nodes, to apply something like a low-pass filter to the whole graph:
lopass :: Graph v Int e -> Graph v Int e
lopass g = fromMaybe g $ do
    d <- decompAny g 
    return $ cobind d $ \x ->
        let neighbors = fmap (label . get . decomp g) (inEdges x ++ outEdges x) in
            (sum neighbors + label x) / (length neighbors + 1)
The difference between these two comonad instances is essentially the same as the difference between NonEmptyList and the nonempty list Zipper.
It's this latter "decomp zipper" comonad that I decided to ultimately include as the Comonad instance for quiver.GDecomp.