merkle dag replication draft

Synchronization for Merkle graphs

abstract

intro

  • Problem we’re solving: Efficient p2p Merkle DAG replication
  • Proposed solution in a nutshell: a simple, symmetric protocol with a reference implementation in Node.js

design goals

  • resumability
  • concurrency
  • minimizing latency/roundtrips
  • content addressability
  • modularity: decoupling the replication scheme from on-disk/on-wire formats (for backwards compatibility)
  • branching
  • simplicity/low level

related work

  • what are the similar solutions?
  • why are the current ways of doing this not working well?

other papers/chapters/books:

methodology

The tool itself can cryptographically prove that the data on one machine is the same as the data on another machine.

Merkle Directed Acyclic Graphs (DAG)

The Merkle Tree [Mer79, Section 4] is a tree where each node is a cryptographic hash of a value that includes all of the hashes of its child nodes. This applies recursively to all non-leaf nodes in the tree. Here is a simple example where each node's hash is the hash of the concatenation of its child hashes.

      a         a = hash(b + d)
     / \        b = hash(c)
    b   d       d = hash(e)
   /     \      
  c       e
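
As a quick illustration, here is a minimal Node.js sketch of the figure above (sha256 with hex digests and the made-up leaf values are assumptions; the draft does not fix a hash function at this point):

var crypto = require('crypto')

// hash a string (or buffer) and return a hex digest
function hash (data) {
  return crypto.createHash('sha256').update(data).digest('hex')
}

var c = hash('leaf c') // leaf values are made up for the example
var e = hash('leaf e')
var b = hash(c)        // b = hash(c)
var d = hash(e)        // d = hash(e)
var a = hash(b + d)    // a = hash(b + d), the root commits to the whole tree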

The hash function can be customized to include data other than just the child hashes. For example, if your hash function hashes some external piece of content (such as a JPEG photo) together with the hashes of all of the child nodes, you can use each node to prove that someone else has the same photo and all of the previous photos referenced in the graph.

    a         a = hash(hash(ant.jpg) + b + c)
   / \        b = hash(bunny.jpg)
  b   c       c = hash(cat.jpg)

In the above figure the value of a can only be derived if the data for ant.jpg, bunny.jpg and cat.jpg is available. If you are trying to establish over a network connection whether you have the same set of files as someone else, both sides construct a Merkle DAG using the same hash function and simply send their latest hash (a in the above figure), which is usually small enough to fit in a single network packet. If the values match then both graphs must contain exactly the same data, since (assuming the hash function is collision resistant) equal hashes can only be produced by equal content.

It is also possible to apply this concept to Directed Acyclic Graphs. These are graphs that contain no cyclic dependencies, meaning it is always possible to calculate an acyclic path to the root of the graph. They are also directed, meaning nodes have a parent/child relationship. A node may have multiple children and/or multiple parents, but a node may not be the parent of a node that is already in its own ancestry, as this would create a cyclical relationship.

  a         a = hash(hash(ant.jpg) + b + c)
  |\        b = hash(bunny.jpg + d)
  b c       c = hash(cat.jpg + d)
  |/        d = hash(dog.jpg)
  d
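
Continuing the sketch above (reusing the same hash helper; reading the files from disk and hashing their contents first is an assumption, the figure leaves the exact encoding open):

var fs = require('fs')

// hash of the raw file contents
function fileHash (name) {
  return hash(fs.readFileSync(name))
}

var d = fileHash('dog.jpg')               // d = hash(dog.jpg), shared by b and c
var b = hash(fileHash('bunny.jpg') + d)   // b = hash(bunny.jpg + d)
var c = hash(fileHash('cat.jpg') + d)     // c = hash(cat.jpg + d)
var a = hash(fileHash('ant.jpg') + b + c) // a = hash(hash(ant.jpg) + b + c), compare only this over the network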

graph format

 a     e
 |    / \
 b   d   f
  \ /    |
   c     g
    \   /  
      j
      |
      k
      |
      l

All nodes (a, b, ...) look like this:

{
  key: <hash of the node>
  links: [keys of the nodes this node links to]
  value: <some binary value>
}

The hash is calculated like this

node.key = hash(length(value) + '\n' + value + length(link-1) + '\n' + link-1 + ... link-n)

Where hash is sha256. The length prefixes are there to prevent ambiguity attacks: without them, different combinations of value and links could concatenate to the same preimage and therefore produce the same key.

I.e. the following two node objects should give different hashes:

{
  links: [hash-1, hash-2]
  value: hash-3
}

vs

{
  links: [hash-2],
  value: hash-3 + hash-1
}
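
A sketch of this key derivation in Node.js (string values, hex digests and this exact concatenation are assumptions based on the formula above); it produces different keys for the two node objects, because the length prefixes fix where value ends and each link begins:

var crypto = require('crypto')

function nodeKey (node) {
  var h = crypto.createHash('sha256')
  h.update(node.value.length + '\n')
  h.update(node.value)
  node.links.forEach(function (link) {
    h.update(link.length + '\n')
    h.update(link)
  })
  return h.digest('hex')
}

nodeKey({ links: ['hash-1', 'hash-2'], value: 'hash-3' })  // one key
nodeKey({ links: ['hash-2'], value: 'hash-3' + 'hash-1' }) // a different key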

local vs. global

Replication Protocol

The replication wire protocol itself is simple. Expressed as protocol buffers it looks like this:

message Handshake {
  optional uint32 version = 1;
  optional MODE mode = 2;
  enum MODE {
    SYNC = 1;
    PUSH = 2;
    PULL = 3;
  }
}

message Question {
  optional uint32 id = 1;
  repeated bytes hashes = 2;
}

message Answer {
  optional uint32 id = 1;
  repeated uint32 matches = 2;
}

message Node {
  repeated bytes links = 1;
  required bytes value = 2;
}
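
These messages can be compiled at runtime, for example with the protocol-buffers module from npm (using that particular module, and saving the schema above as schema.proto, are assumptions; any protobuf implementation works):

var fs = require('fs')
var crypto = require('crypto')
var protobuf = require('protocol-buffers')

// compile the schema above
var messages = protobuf(fs.readFileSync('schema.proto'))

var hashes = [crypto.createHash('sha256').update('some value').digest()]
var buf = messages.Question.encode({ id: 1, hashes: hashes }) // Buffer, ready for the wire
var question = messages.Question.decode(buf)                  // { id: 1, hashes: [...] }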

A peer can send a question message to another peer with some graph node hashes, and the other peer should reply with an answer saying which of the requested hashes it has in its local graph.

The question/answer protocol is stateful for the asker, but not for the answerer.

when receiving a question (this procedure never changes)

  • receive array of hashes
  • look up which of these you have
  • for every hash you have, put its relative index (into the received hash array) in the answer array
  • send the array of indices back (a sketch follows below)
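
A minimal sketch of the answering side (graph.has(hash) is a hypothetical lookup against the local store):

function answerQuestion (question, graph) {
  var matches = []
  question.hashes.forEach(function (hash, index) {
    if (graph.has(hash)) matches.push(index) // relative index into question.hashes
  })
  return { id: question.id, matches: matches }
}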

when asking a question

  • send array of hashes you have (must fit in memory)
  • send more in parallel if you want, each question has an id
  • when you receive a response, use the array of indices in it to reconstruct the list of hashes the other side has; to do this you need to keep the hash arrays and ids of the questions you sent (see the sketch after this list)
  • do_something_with_answer(hashes)
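
A sketch of the asker's bookkeeping (send and doSomethingWithAnswer are stand-ins for the transport and for the last bullet above):

var pending = {} // question id -> array of hashes asked
var nextId = 1

function ask (hashes, send) {
  var id = nextId++
  pending[id] = hashes             // remember what was asked under this id
  send({ id: id, hashes: hashes }) // several questions can be in flight at once
}

function onAnswer (answer, doSomethingWithAnswer) {
  var asked = pending[answer.id]
  delete pending[answer.id]
  var theirs = answer.matches.map(function (i) { return asked[i] }) // hashes the other side has
  doSomethingWithAnswer(theirs)
}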

TODO describe what ‘cut of the graph’ means

Requirements

You cannot delete nodes from the graph; it must be append-only. This is so hashes can always be resolved during the answer phase.

The values stored in the individual graph nodes must fit in memory.

Replication Schemes

The actual algorithm used to satisfy the above replication protocol is up to the implementer. The main constraint is that the receiver has to write the nodes to disk in the correct order: a node cannot be written into the graph until all of the nodes it links to have already been written.

Trivial replication

The asker wants to push data to the answerer. The asker sends all the hashes in their local graph to the answerer. The answerer replies back with the hashes it has. The asker then sends the nodes for the hashes the answerer didn’t have. The answerer receives these nodes, and will end up computing the same set of hashes as the asker.
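
A sketch of this scheme on top of the question/answer messages (graph.allHashes(), graph.get(hash), peer.question() and peer.sendNode() are hypothetical helpers; allHashes() is assumed to return hashes in an order where links come before the nodes that reference them, so the receiver can write them in the correct order):

function trivialPush (graph, peer) {
  var mine = graph.allHashes() // every hash in the local graph
  peer.question({ id: 1, hashes: mine }, function (answer) {
    var theirs = {}
    answer.matches.forEach(function (i) { theirs[mine[i]] = true })
    mine.forEach(function (hash) {
      if (!theirs[hash]) peer.sendNode(graph.get(hash)) // only send what is missing
    })
  })
}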

Using binary search

Locally, index all paths in the graph from the roots to the heads. Using the graph from the graph format section above it might look like this:

1,6    2,3   
 |    /  \  
1,5 3,1  2,2
  \ /     |
  1,4    2,1
    \   /  
      1,3
      |
      1,2
      |
      1,1 (path-id, seq)

That gives us 3 different local paths, 1, 2 and 3. These paths are only stored locally and two peers do not need to agree on which paths they choose.

You only need as many paths as your graph is "wide". In the above graph the widest point (just before the heads) is 3 nodes, so we'll need 3 paths.

Using the above local path index we can do better than the trivial scheme. Since every local path is ordered, we just need to find the "highest" node the other peer has on each path and send the remaining ones. This can be accomplished with binary search.

Using our graph example we have 3 paths, with heads at (1,6), (2,3) and (3,1), so in our first message to the other peer we ask whether it has the hashes stored at (1,3), (2,2) and (3,1) in our local paths. If the peer answers yes to a hash we take the upper half of that path next; if not we take the lower half. This gives us O(log n) round trips (instead of O(n) in the trivial solution).
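
A sketch of the search along a single path (path.hashAt(seq), path.length and the asynchronous peerHas(hash, cb) round trip are hypothetical helpers; in practice the probes for all paths would be batched into one Question message):

function findHighestShared (path, peerHas, cb) {
  var lo = 0                // highest seq the peer is known to have (0 = none yet)
  var hi = path.length + 1  // lowest seq the peer is known to be missing

  function step () {
    if (hi - lo === 1) return cb(lo) // done: send every node with seq > lo
    var mid = Math.floor((lo + hi) / 2)
    peerHas(path.hashAt(mid), function (peerHasIt) {
      if (peerHasIt) lo = mid // keep searching the upper half of the path
      else hi = mid           // keep searching the lower half
      step()
    })
  }
  step()
}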

Since every TCP packet can carry roughly 1400 bytes without fragmenting, it can be beneficial to divide the paths into more than two sections. 1400 bytes is around 40 sha256 hashes with a bit of protocol overhead, so if you only have 4 paths it might make sense to divide each path into 10 sections instead of 2, giving O(log10(n)) round trips instead of O(log2(n)).

TODO

  • backwards compatibility with sha - yes
  • atomic batches
  • get count for a path
  • implementation with http is possible because it is stateless

evaluation

trivial vs. binary search

security

Bibliography

[Mer79] Ralph C. Merkle. Secrecy, Authentication, and Public Key Systems (A Certified Digital Signature). Ph.D. dissertation, Dept. of Electrical Engineering, Stanford University, 1979.
