Skip to content

Instantly share code, notes, and snippets.

@jlouis
Created September 21, 2014 23:43
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jlouis/2c5cf7ab6bc9653df0f4 to your computer and use it in GitHub Desktop.
Save jlouis/2c5cf7ab6bc9653df0f4 to your computer and use it in GitHub Desktop.
— [2014-09-22] [GH/jlouis/dht_bt] Thoughts on QuickChecking a DHT
I am currently trying to get the BitTorrent DHT, which is Kademlia based into shape. I have code for all of it, but it is tied
deeply into the way we did stuff in BitTorrent and I want a way to do this outside of eTorrent, because a world-global DHT
for Erlang nodes are generally applicable and useful. So this is where we stand. The BitTorrent DHT has two important properties
* Very high resilience against node failure
* Slow query rates
For a global DHT, these are exactly the properties you are after. We don't care that queries will take a bit of time, as long as they
are resistent to error in other ways.
The correctness of such a DHT relies on extensive testing. My way of testing is to build QuickCheck models. At least for code I write
in my spare time at my own leisure. So here goes.
— General considerations:
A complete model would essentially be a simulator of a world. That is, our System-under-test (SUT) lives inside a simulator where we
can inject faulty scenarios and have the system fail in reproducible ways. This attacks the “distributed” part of being a DHT. However,
the problem with such a model is the complexity buildup. We don't want a model with this kind of complexity early on in a software
project. In fact, there has to be simpler models capturing simpler problems with DHT software for much less of the work involved.
— Metrics:
In Kademlia, each node is mapped into a space S on which we have a distance metric d : (S * S) → double(). The properties of a
metric must be present:
reflexivity: d(s, s) = 0 forall s in S
symmetry: d(x, y) = d(y, x) forall x,y in S
triangle-inequality: d(x, y) + d(y, z) ≥ d(x, y) forall x,y,z in S
The XOR operation has this property, but it is good to verify this. It is actually peculiar that the triangle inequality holds. XOR
takes operations from a 160 bit integer space into an integer, which are embedded in the real numbers. So it is valid as a metric.
But interestingly it turns out the triangle inequality is true for this.
— A single-node DHT:
The next phase is to observe that the simplest possible DHT is a single-node DHT. For such a DHT, a collapse happens and we
are left with a hash table—no distribution in sight. For such a hash table, we should be able to carry out standard DHT operations
where we store data into the DHT, query data out again. And we should get the data we stored.
While not testing the distributed features, it does validate our DHT operates as it should when storing data locally.
— A single-value DHT
Another simplifying assumption is a DHT consisting of several nodes, but only stores a single value. Such a DHT is simpler
to comprehend and understand. For a node, a store is either such that the node itself is closest, or there are another node
which is closer in which case the store should be forwarded.
For query, the same holds true. It is either a local query or a forwardable query.
— Cache correctness
In Kademlia, you keep a "gossip" cache of other nodes you know about. Since you can't store every node, you look at the
bitwise prefix of a node compared to your own. For each prefix count of equal bits, you store up to 8 foreign nodes. This means
you know qualitatively more about close nodes than about far away nodes. It also limits the total node count to 160 * 8 = 1280.
This is excellent since this is low enough we can construct real-world simulations of such a small number.
It is important to write a cache test which tests important properties of the node cache. But the actual properties we are seeking
are hiding themselves for me currently.
— What properties do we even want?
What is a correct hash table? We want to check that if enough nodes collaborate then we will eventually find stored data if we
search for it. And we want to make sure that storing data is actually stored.
But the last property doesn't hold! If we store data at foreign nodes and all of those nodes dies just thereafter, then the data
is lost. Forever. We rely on this to be highly unlikely to happen. So the state of the universe is stochastic, and we rely on it being
highly unlikely that bad things happens. For a test though, we have to be able to express safety properties exactly. So maybe these
are not the properties we are looking for.
Convergence could be a nice property. If I ask for nodes closer to an ID, then the node will never respond with nodes farther from
the ID.
Hunch: there are a lot of simple properties which describe DHT systems. But it is rather hard to find a single property which uniquely
describes the system as a whole. So a DHT can be described as a set of simple properties the system must have.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment