Created
September 21, 2014 23:43
-
-
Save jlouis/2c5cf7ab6bc9653df0f4 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
— [2014-09-22] [GH/jlouis/dht_bt] Thoughts on QuickChecking a DHT | |
I am currently trying to get the BitTorrent DHT, which is Kademlia based into shape. I have code for all of it, but it is tied | |
deeply into the way we did stuff in BitTorrent and I want a way to do this outside of eTorrent, because a world-global DHT | |
for Erlang nodes are generally applicable and useful. So this is where we stand. The BitTorrent DHT has two important properties | |
* Very high resilience against node failure | |
* Slow query rates | |
For a global DHT, these are exactly the properties you are after. We don't care that queries will take a bit of time, as long as they | |
are resistent to error in other ways. | |
The correctness of such a DHT relies on extensive testing. My way of testing is to build QuickCheck models. At least for code I write | |
in my spare time at my own leisure. So here goes. | |
— General considerations: | |
A complete model would essentially be a simulator of a world. That is, our System-under-test (SUT) lives inside a simulator where we | |
can inject faulty scenarios and have the system fail in reproducible ways. This attacks the “distributed” part of being a DHT. However, | |
the problem with such a model is the complexity buildup. We don't want a model with this kind of complexity early on in a software | |
project. In fact, there has to be simpler models capturing simpler problems with DHT software for much less of the work involved. | |
— Metrics: | |
In Kademlia, each node is mapped into a space S on which we have a distance metric d : (S * S) → double(). The properties of a | |
metric must be present: | |
reflexivity: d(s, s) = 0 forall s in S | |
symmetry: d(x, y) = d(y, x) forall x,y in S | |
triangle-inequality: d(x, y) + d(y, z) ≥ d(x, y) forall x,y,z in S | |
The XOR operation has this property, but it is good to verify this. It is actually peculiar that the triangle inequality holds. XOR | |
takes operations from a 160 bit integer space into an integer, which are embedded in the real numbers. So it is valid as a metric. | |
But interestingly it turns out the triangle inequality is true for this. | |
— A single-node DHT: | |
The next phase is to observe that the simplest possible DHT is a single-node DHT. For such a DHT, a collapse happens and we | |
are left with a hash table—no distribution in sight. For such a hash table, we should be able to carry out standard DHT operations | |
where we store data into the DHT, query data out again. And we should get the data we stored. | |
While not testing the distributed features, it does validate our DHT operates as it should when storing data locally. | |
— A single-value DHT | |
Another simplifying assumption is a DHT consisting of several nodes, but only stores a single value. Such a DHT is simpler | |
to comprehend and understand. For a node, a store is either such that the node itself is closest, or there are another node | |
which is closer in which case the store should be forwarded. | |
For query, the same holds true. It is either a local query or a forwardable query. | |
— Cache correctness | |
In Kademlia, you keep a "gossip" cache of other nodes you know about. Since you can't store every node, you look at the | |
bitwise prefix of a node compared to your own. For each prefix count of equal bits, you store up to 8 foreign nodes. This means | |
you know qualitatively more about close nodes than about far away nodes. It also limits the total node count to 160 * 8 = 1280. | |
This is excellent since this is low enough we can construct real-world simulations of such a small number. | |
It is important to write a cache test which tests important properties of the node cache. But the actual properties we are seeking | |
are hiding themselves for me currently. | |
— What properties do we even want? | |
What is a correct hash table? We want to check that if enough nodes collaborate then we will eventually find stored data if we | |
search for it. And we want to make sure that storing data is actually stored. | |
But the last property doesn't hold! If we store data at foreign nodes and all of those nodes dies just thereafter, then the data | |
is lost. Forever. We rely on this to be highly unlikely to happen. So the state of the universe is stochastic, and we rely on it being | |
highly unlikely that bad things happens. For a test though, we have to be able to express safety properties exactly. So maybe these | |
are not the properties we are looking for. | |
Convergence could be a nice property. If I ask for nodes closer to an ID, then the node will never respond with nodes farther from | |
the ID. | |
Hunch: there are a lot of simple properties which describe DHT systems. But it is rather hard to find a single property which uniquely | |
describes the system as a whole. So a DHT can be described as a set of simple properties the system must have. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment