Last active
June 15, 2020 18:47
-
-
Save jaspervdj-luminal/6683e147fe5a3fc84846b7624e8b24a8 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Cryptographic hashes and dependency cycles | |
========================================== | |
Introduction | |
------------ | |
_"Much more than encryption algorithms, one-way hash functions are the workhorses | |
of modern cryptography." - Bruce Schneier_ | |
NOTE: The above should be formatted as a quote. I don't know if we have anything | |
special for that. | |
Cryptographic hashes (or _one-way hash functions_) allow us to compute a digest | |
that uniquely identifies a resource. If we would make a small change anywhere | |
in a resource, its digest also changes (drastically, because of the [Avalanche] | |
effect). | |
[Avalanche]: https://en.wikipedia.org/wiki/Avalanche_effect | |
This makes them very practical for detecting changes in applications that deal | |
with dependency trees. If we include the cryptographic hashes of the | |
dependencies of a resource in its cryptographic hash, we have a cheap way to | |
check if a resource _or any of its dependencies_ was changed. | |
This is very commonly used in the software world. For example, think of git | |
hashes, blockchains, and the [nix] build system. | |
[nix]: https://nixos.org/nix/ | |
Implementing this is theoretically not that hard, given a good cryptographic | |
library. However, it gets trickier when we want to allow dependency cycles. At | |
that point, we get a bit of a chicken-and-egg probblem: we need resource _A_s | |
hash to compute resource _B_s hash, but that in turn depends on _B_s hash again! | |
Cyclical cloud resources | |
------------------------ | |
Let's start by giving a more concrete example of how we encountered this | |
problem. In [Fugue], we are dealing with constructing complicated cloud | |
infrastructure. Cyclical dependencies between resources inevitably pop up once | |
in a while. | |
[Fugue]: https://fugue.co/ | |
For example, it is not uncommon two have two [Security Groups] that refer to | |
each other. Imagine you are have a _default_ security group, which contains the | |
core of your application, and a _monitoring_ security group which performs | |
health checks. | |
[Security Groups]: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html | |
- You want to allow traffic from _monitoring_ to _default_ in order to perform | |
external sanity checks; | |
- but you also want to allow traffic from _default_ to _monitoring_, in order to | |
perform heartbeat-like checks, or to have your application send detailed | |
information. | |
Configuring these in mainstream configuration languages often leads to annoying | |
cyclic dependency issues, which in turn require annoying and verbose | |
workarounds. | |
In Fugue, we use a statically typed, declarative programming language called | |
_Ludwig_ to configure infrastructure. This allows us to solve this problem in a | |
neat way. In this blogpost, we study how we can solved part of the problem: | |
detecting changes in these possibly cyclical dependency graphs. | |
Literate Haskell | |
---------------- | |
This blogpost is written in Literate Haskell. This is an implementation of | |
[literate programming] for the [Haskell] programming language. The idea of | |
literate crogramming was coined a long time ago by Dr. Donald Knuth, who cited | |
[CWEB] as his favorite programming language. | |
[literate programming]: https://en.wikipedia.org/wiki/Literate_programming | |
[Haskell]: http://haskell.org | |
[CWEB]: http://www-cs-faculty.stanford.edu/~uno/cweb.html | |
In literate programming, we communicate an idea by presenting a human-readable | |
text interspersed with snippets of code. This can be considered as a sort of | |
dual to _"normal"_ programming, where we intersperse our code with comment | |
snippets. | |
This is perfect for technical blogposts (and of course [we really like | |
Haskell]). | |
[we really like Haskell]: https://blog.fugue.co/2016-01-27-our-functional-future-or-how-i-learned-to-stop-worrying-and-love-haskell.html | |
As a result, you can compile this blogpost easily with a Haskell compiler, or | |
even better, load it into the [GHCi REPL] and play around with it. In order to | |
do that, download the [code to this blogpost] and load it in GHCi: | |
$ ghci cyclic-hash.lhs | |
GHCi, version 7.10.3: http://www.haskell.org/ghc/ :? for help | |
[1 of 1] Compiling Data.CyclicHash ( cyclic-hash.lhs, interpreted ) | |
Ok, modules loaded: Data.CyclicHash. | |
*Data.CyclicHash> | |
[GHCi REPL]: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/ghci.html | |
[code to this blogpost]: TODO | |
In order for this to blogpost work as a standalone Haskell module, we need to do | |
some setup first. We start with a reasonably standard Haskell module header: | |
first we declare language pragmas we need. | |
> {-# LANGUAGE BangPatterns #-} | |
> {-# LANGUAGE FlexibleContexts #-} | |
> {-# LANGUAGE GADTs #-} | |
> {-# LANGUAGE MultiParamTypeClasses #-} | |
> {-# LANGUAGE TypeFamilies #-} | |
Then we describe the exported interface of this module. Following good | |
technical design standards, we try to keep this to a minimum. | |
> module Data.CyclicHash | |
> ( CyclicHash (..) | |
> , HashField (..) | |
> , cyclicHashes | |
> | |
We also have some exports for testing: | |
> , Username | |
> , Person (..) | |
> , people | |
> , showCyclicHashes | |
> ) where | |
Lastly we have to import a number of other modules we rely on. Most of these | |
modules ship with the base library provided by the compiler, and the others are | |
provided by three very commonly used libraries: [containers], [hashable] and | |
[unordered-containers]. | |
[containers]: https://hackage.haskell.org/package/containers | |
[hashable]: https://hackage.haskell.org/package/hashable | |
[unordered-containers]: https://hackage.haskell.org/package/unordered-containers | |
> import Data.Hashable (Hashable, hashWithSalt) | |
> import Data.List (foldl') | |
> import Data.Map.Strict (Map) | |
> import qualified Data.Map.Strict as Map | |
> import Data.Maybe (fromMaybe, maybeToList) | |
> import Text.Printf (printf) | |
> import qualified Data.Graph as G | |
Problem definition | |
------------------ | |
In this blogpost, we will look at a simple example of cyclical dependencies. We | |
are modeling a trust graph, and we want to compute cryptographic hashes for all | |
the people in this trust graph. Because trust can be mutual, we quickly | |
encounter the problem of dependency cycles here. | |
Let's look at the datatype we will use to model our problem: | |
> type Username = String | |
> data Person = Person | |
> { pUserName :: Username | |
> , pFullName :: String | |
> , pTrusts :: [Username] | |
> } | |
A person has a `Username` (which we will use as key) and a full name. A person | |
also has a list of other people he or she *trusts*. The cryptographic hash of a | |
`Person` consists of the username, the full name, and *the hashes of the people | |
this person trusts*. | |
The CyclicHash class | |
-------------------- | |
We start out with a class that captures the essence of computing hashes for | |
cyclic data structures. | |
We need an associated type synonym, `Key`, which gives us a unique key for the | |
value. We also have a corresponding `key` function to obtain this key. | |
Usually the hash of a datatype is a hash of all the fields in that | |
datastructure. This is why we have the second important function, `fields`. It | |
returns a list of all the fields we want to hash. | |
> class (Hashable (Key a), Ord (Key a)) => CyclicHash a where | |
> type Key a | |
> key :: a -> Key a | |
> fields :: a -> [HashField a] | |
The fields we want to hash are represented using an intermediate datatype. | |
We make a distinction between two kinds of fields: | |
- Primitive fields (`Int`, `String`, and other things that have a reasonable | |
`Hashable` instance). These just contain the value that needs to be hashed. | |
- Dependency fields which refer to other values. These are the ones we need to | |
be careful about, since they can introduce dependency cycles. | |
> data HashField a where | |
> PrimitiveField :: Hashable b => b -> HashField a | |
> DependencyField :: Key a -> HashField a | |
We can now write down the `CyclicHash` instance for `Person`. It is fairly | |
straightforward: `pUserName` and `pFullName` contribute to the hash as primitive | |
fields, and we create a `DependencyField` for every person in `pTrusts`. | |
> instance CyclicHash Person where | |
> type Key Person = Username | |
> key p = pUserName p | |
> fields p = | |
> [ PrimitiveField (pUserName p) | |
> , PrimitiveField (pFullName p) | |
> ] ++ | |
> map DependencyField (pTrusts p) | |
Lastly, let's add an auxiliary method that gives simply us the dependencies of a | |
node. We can do this by finding all the `DependencyField` fields and | |
concatenating their keys in a list. | |
> dependencies :: CyclicHash a => a -> [Key a] | |
> dependencies node = [k | DependencyField k <- fields node] | |
Aside: the Hashable library | |
--------------------------- | |
In this blogpost, we use the [hashable] library, because of its easy interface | |
and wide adoption. However, we must keep in mind that this library does not | |
provide us with cryptographically strong hash functions. | |
In production code, it would be better to use an algorithm like SHA-2. The idea | |
behind the cyclic hashing stays the same though, so updating this code to use | |
such an algorithm is not a hard task. | |
A primitive hash | |
---------------- | |
We can start by defining a "primitive" hash function, which does not look at the | |
dependencies of a node. | |
> primitiveHash :: CyclicHash a => Int -> a -> Int | |
> primitiveHash salt = | |
> foldl' hashField salt . fields | |
> where | |
> hashField s (PrimitiveField x) = | |
> s `hashWithSalt` (0 :: Int) `hashWithSalt` x | |
> hashField s (DependencyField k) = | |
> s `hashWithSalt` (1 :: Int) `hashWithSalt` k | |
A component hash | |
---------------- | |
Now, we want some way to take the hashes of the dependencies into account. But | |
that is impossible to do if we have the following trust graph like this: | |
Mary ←- David | |
| ↑ | | |
↓ | ↓ | |
Rob Susan | |
In Haskell, this looks like: | |
> people :: [Person] | |
> people = | |
> [ Person "rob" "Robert Smith" ["mary"] | |
> , Person "mary" "Mary Johnson" ["rob"] | |
> , Person "susan" "Susan Williams" [] | |
> , Person "david" "David Brown" ["mary", "susan"] | |
> ] | |
Mary's hash depends on Rob's hash and vice versa! An elegant way to solve this | |
is provided to us by computing the [Strongly connected components] of the graph. | |
[Strongly connected components]: https://en.wikipedia.org/wiki/Strongly_connected_component | |
We will be able to compute the strongly connected components using the vastly | |
underrated [Data.Graph] module. The output for the example graph we saw above | |
is: | |
[Data.Graph]: https://hackage.haskell.org/package/containers/docs/Data-Graph.html | |
B. Mary | C. ←- David | A. + | |
| ↑ | | | | |
↓ | | | ↓ | |
Rob | | Susan | |
A very important property of the resulting components is that if we look at the | |
graph of components, **there cannot be any cycles**. This is easy to see: if | |
there was a cycle between two or more components, they would belong to the same | |
cyclic component! | |
Now, we can compute a hash *per component*. The simplest strongly connected | |
component is a single vertex (represented by the `G.AcyclicSCC` constructor). In | |
this case, we can rely on the `primitiveHash` function we saw above. Otherwise, | |
we have a list of vertices (`G.CyclicSCC`). We can hash these together by using | |
a *fold*. | |
Note that this hash also contains the *shape* of the component. | |
> componentHash :: CyclicHash a => Int -> G.SCC a -> Int | |
> componentHash salt0 (G.AcyclicSCC x) = | |
> salt0 `hashWithSalt` (0 :: Int) `primitiveHash` x | |
> componentHash salt0 (G.CyclicSCC xs0) = | |
> salt0 `hashWithSalt` (1 :: Int) `hashPrimitives` xs0 | |
> where | |
> -- | Chain of primitiveHash. | |
> hashPrimitives salt = foldl' primitiveHash salt | |
This gives us something like the following for our example (we use `<+>`) as an | |
_informal_ operator to denote a combination of hashes: | |
componentHash(A) = 0 <+> primitiveHash(susan) | |
componentHash(B) = 1 <+> primitiveHash(rob) <+> primitiveHash(mary) | |
componentHash(C) = 0 <+> primitiveHash(david) | |
Note that the the `<+>` operator is _not_ commutative, so the order in which we | |
do the fold is important. Fortunately, the `Data.Graph` library takes care of | |
that for us: the nodes returned in a component are sorted in the order we need. | |
A component hash per node | |
------------------------- | |
With the help of the `componentHash` function, we can define another function | |
which gives us the map containing the hash for each component, by node. | |
> componentHashes :: CyclicHash a => Int -> [G.SCC a] -> Map (Key a) Int | |
> componentHashes salt sccs = Map.fromList $ do | |
> scc <- sccs | |
> let !hash = componentHash salt scc | |
> node <- G.flattenSCC scc | |
> return (key node, hash) | |
For the example graph, this gives us the following mapping: | |
rob => componentHash(B) | |
david => componentHash(C) | |
susan => componentHash(A) | |
mary => componentHash(B) | |
The cyclic hash | |
--------------- | |
Now, we have the almost all the ingredients to compute the full cyclic hash for | |
a value. What we want to do is easy to informally define but a bit harder to | |
implement. | |
Since we know there are no cyclic dependencies between the strongly connected | |
components of the graph, we can run through them _in order_ now. | |
1. Let's start with component _A_ which is the simplest. Since the nodes in this | |
component have no dependencies whatsoever, we get: | |
cyclicHash(susan) = primitiveHash(susan) | |
2. For component _B_, it's a bit more complicated. Ideally we would have | |
something like this, which is of course impossible: | |
cyclicHash(rob) = primitiveHash(rob) <+> cyclicHash(mary) | |
cyclicHash(mary) = primitiveHash(mary) <+> cyclicHash(rob) | |
But we can fix this by, instead of taking the cyclic hashes of nodes in the | |
same component, taking the component hashes, since these hashes also contain | |
the whole component! This gives us: | |
cyclicHash(rob) = primitiveHash(rob) <+> componentHash(B) | |
cyclicHash(mary) = primitiveHash(mary) <+> componentHash(B) | |
3. When we arrive at component _C_, we have already computed the cyclic hashes | |
of its dependencies, so we simply get: | |
cyclicHash(david) = | |
primitiveHash(david) <+> | |
cyclicHash(mary) <+> | |
cyclicHash(susan) | |
The implementation of `cyclicHash` is similar to `primitiveHash`, except we now | |
pass in an argument `dependencyHashes`. The idea is that this will contain the | |
`cyclicHash`es of actual dependencies, and the `componentHash`es of cyclic | |
dependencies (which are in the same component). Fortunately, we don't really | |
need to check which of these cases we are in -- we can just grab the hash from | |
the map. | |
> cyclicHash :: CyclicHash a => Map (Key a) Int -> Int -> a -> Int | |
> cyclicHash dependencyHashes salt0 = foldl' hashField salt0 . fields | |
> where | |
> hashField !s (PrimitiveField x) = | |
> s `hashWithSalt` (0 :: Int) `hashWithSalt` x | |
> hashField !s (DependencyField k) = | |
> let !x = fromMaybe 0 $ Map.lookup k dependencyHashes in | |
> s `hashWithSalt` (1 :: Int) `hashWithSalt` x | |
Tying it all together | |
--------------------- | |
Finally, it's time to put everything together in a single function that users | |
can call. In fact, this is the only top-level function exported from this | |
module! | |
The function takes a number of values, and returns the values paired with their | |
hashes. Of course, the interesting bit is how we construct `fullHashes`. | |
> cyclicHashes :: CyclicHash a => Int -> [a] -> [(a, Int)] | |
> cyclicHashes salt0 nodes = | |
> [ (n, h) | |
> | n <- nodes | |
> , h <- maybeToList $ Map.lookup (key n) fullHashes | |
> ] | |
We start of by calculating the strongly connected components using the | |
`Data.Graph` library. Then we calculate a hash for every component, using the | |
`componentHashes` we defined before. | |
> where | |
> comps = G.stronglyConnComp [(n, key n, dependencies n) | n <- nodes] | |
> compHashes = componentHashes salt0 comps | |
It is important to note that `G.stronglyConnComp` returns the components in | |
[topological order]. For our use case, this means that the components are in the | |
"right" order with regards to dependencies: if _y_ depends on _x_, _x_ will | |
always be placed before _y_ in the list. | |
[topological order]: https://en.wikipedia.org/wiki/Topological_sorting | |
This means that we can just fold through the `comps` components. The accumulator | |
we use is the map we calculated using `componentHashes`. Initially, this map | |
contains the `componentHash` for every node, but as we fold through the list, we | |
update it to contain the full hash for every node. This is exactly what | |
`cyclicHash` expects! | |
> fullHashes = foldl' | |
> (\acc scc -> | |
> let !newHashes = Map.fromList | |
> [ (key x, cyclicHash acc salt0 x) | |
> | x <- G.flattenSCC scc | |
> ] | |
> in Map.union newHashes acc) compHashes comps | |
Full example code | |
----------------- | |
Let's add a quick printing function to check the results: | |
> showCyclicHashes :: (CyclicHash a, Show (Key a)) => [a] -> String | |
> showCyclicHashes values = unlines | |
> [ printf "%s: %x" (show (key x)) hash | |
> | (x, hash) <- cyclicHashes 42 values | |
> ] | |
*Data.CyclicHash> putStr $ showCyclicHashes people | |
"rob": 9923cc533bb3432 | |
"mary": e8c6608ddd959521 | |
"susan": 6dd70fb3a3e127ed | |
"david": 86b49a12d999cc35 | |
And then, after we edit e.g. `mary`'s name, we get: | |
*Data.CyclicHash> putStr $ showCyclicHashes people | |
"rob": 201403e3ca292874 | |
"mary": 8370448e0b8598cf | |
"susan": 6dd70fb3a3e127ed | |
"david": 853624228d76fe7f | |
Seems like it is behaving as expected! [^test] | |
[^test]: Of course, it goes without saying that algorithms like this need more | |
rigorous testing. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment