Skip to content

Instantly share code, notes, and snippets.

@paulkernfeld
Created March 3, 2016 14:24
Show Gist options
  • Save paulkernfeld/e409974464ed2f286812 to your computer and use it in GitHub Desktop.
Save paulkernfeld/e409974464ed2f286812 to your computer and use it in GitHub Desktop.
Modular dynamic Hypercore feeds

Modular Dynamic Hypercore Feeds

Purpose: Create distributed single-writer append-only logs with hypercore while keeping the hypercore protocol simple.

Design

The core of this design is that hypercore itself would remain mostly the same, and that most of the logic to support dynamic feeds would stay out of the hypercore protocol. This relies on a separate module that can securely replicate a small mutable value (a "signed value feed").

Every time data is appended to a hypercore feed, the ID of the feed will change. This ID is the hash of all the feed's Merkle tree roots. The feed's owner will then broadcast the new ID over a signed value feed. When a reader receives a new feed ID distributed over the signed value feed, it can use this ID to discover a swarm for the feed, connect to that swarm, and download the content in the normal way.

This is implemented in protozor, using BEP 44 as the mechanism for distributing signed values.

Replication

Example

Let's say that peerA is the writer, and peerB is the reader.

peerA's digest

      3
  1       5   
0   2   4   6   8
peerB's digest: (empty)

Replication works as follows:

  1. peerA joins the swarm for hash(3, 8)
  2. peerA broadcasts out hash(3, 8) to the signed value feed
  3. peerB receives hash(3, 8) from the signed value feed
  4. peerB joins the swarm for hash(3, 8)
  5. peerB starts to replicate blocks from peerA

Now, let's say peerB finishes replicating, and peerA appends another block to the feed.

peerA's digest

      3
  1       5       9
0   2   4   6   8   10
peerB's digest

      3
  1       5   
0   2   4   6   8

The hash of the feed is now hash(3, 9). Replication works as follows:

  1. peerA leaves the swarm for hash(3, 8)
  2. peerA joins the swarm for hash(3, 9)
  3. peerA broadcasts out hash(3, 9) to the signed value feed
  4. peerB receives hash(3, 9) from the signed value feed
  5. peerB joins the swarm for hash(3, 9)
  6. peerB starts to replicate blocks from peerA. Since it has everything except 10, it would only need block 10.

Problems

The main problem with this approach is the need to join and leave swarms each time the feed is updated. This should be fine for feeds that are updated every few minutes, but it would be terrible for feeds that were updated every few seconds.

Specifically:

  • Joining a swarm requires a few seconds, introducing latency.
  • Repeatedly making new swarms could cause a lot of load on signalling servers (this could be fixed by doing some signaling over a swarm).
  • It would be possible for the swarm to fragment into several swarms, depending on how far each node is behind in the signed value feed.

Variant: Distribute Merkle roots

Instead of only distributing the hash, the signed value feed could instead distribute a signed message that contained all Merkle roots. If the tree were as follows, then the message would include hash(3) and hash(8). This might make replication a little simpler.

      3
  1       5   
0   2   4   6   8

Signed value feeds

A draft implementation by me: pubkey-value.

A "signed value feed" is a single-writer distributed system that signs and replicates a mutable value. It is not an append-only log because only the latest value is stored; earlier values are thrown away. BEP 44 is an example of a system that meets these requirements.

Each signed value feed has a single public/private cryptographic key pair. The private key is held only by the owner of the feed. Anyone who wants to subscribe to the feed must have the public key. Each update to the feed must include the value itself, a sequence number and a signature. The sequence number is increased on every write, and it is used to identify which version of the data to keep when a new value is received. The signature can be verified with the public key.

In addition to the security model, there must be some way to distribute updates. BEP 44 distributes updates by writing them to BitTorrent's DHT. This is a slow way to distribute updates, because it requires a DHT traversal for every put and get. A more efficient distribution model could create a swarm for each feed, and broadcast updates throughout the swarm when a new value is written. It makes sense to use the public key as the ID of the swarm.

In addition to the security provided by verifying the public key, it would be possible to plug in an additional security model to a pubkey-value feed, by requiring that each received value pass through a user-defined "validate" function before being accepted. This could be used with hypercore by

Although a system with a single public/private key pair is simple and useful, it would be possible to substitute in other ways of signing values, such as M-of-N multisignatures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment