
@cobordism
Last active February 16, 2016 09:24

Proposal for inclusion of Erasure Codes into the Swarm client.

This is a follow-up to https://gist.github.com/homotopycolimit/64997b8cd2e0916b4430

This is a proposal to include an m-of-n Cauchy-Reed-Solomon erasure code into the swarm trie.

1 Why?

Why do we need erasure coding? Can't we just rely on storing multiple copies of every chunk (as Maidsafe does)? Maidsafe cites the “industry standard” of storing three copies of all data as its policy. This is, however, only partially true. For starters, the cloud storage industry is trying very hard to get away from 3x redundancy because it is wasteful and incurs high costs; erasure coding can guarantee the same level of security using only a fraction of the storage space. For example, a 100-of-128 code adds only 28% storage overhead yet survives the loss of any 28 of the 128 chunks, whereas 3x replication adds 200% overhead and loses data as soon as all three copies of a single chunk disappear. Secondly, in a data center 3x redundancy is applied to hard drives, whose failure rates are low, independent, and predictable; in a p2p cloud scenario, 3x redundancy refers to nodes, which are fickle and can disappear far more frequently than hard drives fail. Swarm does in fact plan for several layers of redundancy, and (at least at first) we expect much higher than 3x retention of chunks. However, as the swarm grows and storage space begins to fill up, the redundancy will drop automatically. Erasure coding is then the best way to ensure file availability. It will also allow for a self-healing swarm; see “Added Benefits” below.

2 How?

There are open-source libraries for Reed-Solomon and Cauchy-Reed-Solomon encoding. More info: https://www.usenix.org/legacy/event/fast09/tech/full_papers/plank/plank_html/ and also https://www.backblaze.com/blog/reed-solomon/ and http://rscode.sourceforge.net/

We just have to make sure that we use it at all levels of the Swarm trie.

We want to encode parity data into the Swarm file trie such that at every node we have the same m-of-128 level of redundancy. Here is how we might proceed (a hedged Go sketch follows the list below):

  1. Read the file 4096 bytes at a time, producing one chunk per read.
    • If fewer than 4096 bytes are left in the file, pad the last fraction to 4096 bytes and use Cauchy-Reed-Solomon to encode enough parity chunks to bring the group to a total of 128 chunks.
    • Otherwise, after every m chunks, use Cauchy-Reed-Solomon to encode 128-m parity chunks.

  2. Repeat until EOF.

  3. Record the list of the hashes of the sets of 128 chunks constructed in the previous step, and read that list of hashes 4096 bytes at a time.
    • If fewer than 4096 bytes are left, pad the last fraction to 4096 bytes and use Cauchy-Reed-Solomon to encode enough parity chunks to bring the group to a total of 128 chunks.
    • Otherwise, after every m chunks, use Cauchy-Reed-Solomon to encode 128-m parity chunks.

  4. Repeat until the list is exhausted.

  5. Go to step 3 with the new list of hashes; the recursion ends when a level's hashes fit into a single chunk, which becomes the root.
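To make the encoding loop concrete, here is a minimal sketch in Go of steps 1 and 2. Everything in it is an assumption rather than existing Swarm code: it uses the open-source github.com/klauspost/reedsolomon library as a stand-in for a Cauchy-Reed-Solomon codec, SHA-256 in place of Swarm's chunk hash, and an illustrative m = 112.

```go
// Package erasure is an illustrative sketch only, not Swarm code.
package erasure

import (
	"crypto/sha256"
	"io"

	"github.com/klauspost/reedsolomon"
)

const (
	chunkSize = 4096 // bytes per Swarm chunk
	branches  = 128  // chunks per group / per trie node
	m         = 112  // data chunks per group (illustrative); 128-m = 16 parity
)

// encodeGroup takes k <= m data chunks (each exactly chunkSize bytes, the
// last one zero-padded if needed) and appends 128-k parity chunks, so every
// group totals 128 chunks.
func encodeGroup(data [][]byte) ([][]byte, error) {
	parity := branches - len(data)
	enc, err := reedsolomon.New(len(data), parity)
	if err != nil {
		return nil, err
	}
	shards := append(make([][]byte, 0, branches), data...)
	for i := 0; i < parity; i++ {
		shards = append(shards, make([]byte, chunkSize))
	}
	// Encode fills the trailing parity shards from the data shards.
	if err := enc.Encode(shards); err != nil {
		return nil, err
	}
	return shards, nil
}

// chunkLevel reads r one chunk at a time (steps 1 and 2) and returns the
// hashes of all 128-chunk groups, i.e. the input for the next trie level.
func chunkLevel(r io.Reader) ([][]byte, error) {
	var hashes [][]byte
	var group [][]byte
	for {
		chunk := make([]byte, chunkSize) // zero bytes double as padding
		n, err := io.ReadFull(r, chunk)
		if n > 0 {
			group = append(group, chunk)
		}
		atEOF := err == io.EOF || err == io.ErrUnexpectedEOF
		if err != nil && !atEOF {
			return nil, err
		}
		if len(group) == m || (atEOF && len(group) > 0) {
			shards, err := encodeGroup(group)
			if err != nil {
				return nil, err
			}
			for _, s := range shards {
				h := sha256.Sum256(s) // stand-in for the Swarm chunk hash
				hashes = append(hashes, h[:])
			}
			group = nil
		}
		if atEOF {
			return hashes, nil
		}
	}
}
```

Steps 3 to 5 then amount to running chunkLevel again over the concatenated hashes of each level until only a single chunk remains.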

The Swarm trie also includes a file-size integer, I believe. I do not think this is necessary: it should suffice to supply a file-size integer along with the root hash. That alone lets everyone calculate the Cauchy-Reed-Solomon redundancy coding at every node, and also which chunks hold original file data and which hold parity data, as the sketch below illustrates.
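Assuming 32-byte chunk hashes, the arithmetic works out neatly: a group's 128 hashes occupy 128 × 32 = 4096 bytes, exactly one chunk on the level above. Continuing the sketch above (the layout rule is this proposal's scheme, not existing Swarm code), a function like this shows how any client could derive the data/parity split at every level from the file size alone:

```go
// levelDataChunks returns, for each trie level from the leaves upward, how
// many data chunks that level holds; all other chunks on a level are parity.
// Assumes 32-byte hashes, so one group's 128 hashes fill exactly one chunk.
func levelDataChunks(fileSize int64) []int64 {
	var levels []int64
	n := (fileSize + chunkSize - 1) / chunkSize // leaf-level data chunks
	for {
		levels = append(levels, n)
		if n <= m {
			break // a single group: its 128 hashes form the root chunk
		}
		// ceil(n/m) groups, each emitting 128 hashes = 4096 bytes,
		// i.e. exactly one data chunk on the level above
		n = (n + m - 1) / m
	}
	return levels
}
```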

3 Added Benefits

“All chunks are created equal”

A trie encoded as suggested above has the same (*) redundancy at every node. This means that chunks nearer the root are no longer more important than chunks holding the file data itself. Every node has an m-of-128 redundancy level, and below the root chunk no chunk is more important than any other.

(*) If the file size is not an exact multiple of m × 4096 bytes, the last group at each level contains fewer than m data chunks and therefore ends up with even higher redundancy than the rest.

“The Swarm can heal itself”

Any(!) client downloading a file from the swarm can detect whether a chunk has been lost. The client can reconstruct the missing file data from the parity data (or the parity data from the file data) and re-sync the repaired chunks into the swarm. That way, even if a large fraction of the swarm is wiped out simultaneously, an organic healing process can occur, and the default client behavior should be to repair any damage it detects.
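Continuing the sketch above (with the standard errors package added to the imports), repairing a single 128-chunk group might look like the following; Reconstruct and Verify are calls from the stand-in klauspost/reedsolomon library, not a Swarm API:

```go
// repairGroup regenerates the missing chunks of one 128-chunk group.
// Missing chunks are passed as nil entries; reconstruction succeeds as
// long as at least dataShards of the 128 entries are present.
func repairGroup(shards [][]byte, dataShards int) error {
	enc, err := reedsolomon.New(dataShards, branches-dataShards)
	if err != nil {
		return err
	}
	// Reconstruct fills in every nil shard from the surviving ones.
	if err := enc.Reconstruct(shards); err != nil {
		return err
	}
	// Verify recomputes the parity to confirm the group is consistent;
	// the repaired shards can then be re-synced into the swarm.
	ok, err := enc.Verify(shards)
	if err != nil {
		return err
	}
	if !ok {
		return errors.New("group inconsistent after reconstruction")
	}
	return nil
}
```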

“Speeding up lookups while keeping alpha = 1”

Alpha is the name Kademlia gives to the number of peers in a Kademlia bin that are queried simultaneously during a lookup. The original Kademlia paper sets alpha = 3. This is impractical for Swarm because peers do not report back with new addresses, as they would in pure Kademlia, but instead forward all queries to their own peers. Swarm is coded this way to make use of semi-stable, longer-term ethp2p connections. Setting alpha to anything greater than 1 thus increases the amount of network traffic substantially, setting up an exponential cascade of forwarded lookups. [Lookups would cause an exponentially growing cascade at first, but it would soon collapse back down onto the target of the lookup.]

However, setting alpha = 1 has its own downsides. For starters, lookups can stall if they are forwarded to a dead node, and even if all nodes are live, there can be large delays before a query completes. The practice of setting alpha = 2 in Swarm is designed to speed up file retrieval: clients are configured to accept chunks from the first/fastest forwarding connection to be established.

In an erasure-coded setting we can, in a sense, have the best of both worlds. The default behavior should be to set alpha = 1, i.e. to query only one peer for each chunk lookup, but, crucially, to issue lookup requests not just for the data chunks but for the parity chunks as well. The client can then accept the first m of every 128 chunks queried, gaining some of the faster-retrieval benefits that redundant lookups provide without a whole exponential cascade.
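As an illustration of "accept the first m of 128", the following sketch (same illustrative package; fetchChunk is a hypothetical retrieval callback, not an existing Swarm API, and timeouts plus the fewer-than-m-successes case are omitted for brevity) fires one alpha = 1 lookup per chunk, data and parity alike, and returns as soon as any m chunks arrive; the rest can then be rebuilt locally with repairGroup:

```go
// fetchGroup looks up all 128 chunks of a group concurrently and stops
// waiting once any m of them have been retrieved.
func fetchGroup(hashes [][]byte, fetchChunk func([]byte) ([]byte, error)) [][]byte {
	type result struct {
		idx   int
		chunk []byte
	}
	results := make(chan result, len(hashes))
	for i, h := range hashes {
		go func(i int, h []byte) {
			if c, err := fetchChunk(h); err == nil {
				results <- result{i, c}
			}
		}(i, h)
	}
	shards := make([][]byte, len(hashes)) // nil = not (yet) retrieved
	for got := 0; got < m; {
		r := <-results
		if shards[r.idx] == nil {
			shards[r.idx] = r.chunk
			got++
		}
	}
	return shards // pass to repairGroup to regenerate the nil entries
}
```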

@cobordism (Author)

Does swarm use the same alpha for search queries as for forwarded search queries?

@zelig commented Feb 16, 2016

they are the same.

Remember that I do not want to restrict parallelism at all; maybe in back-forwarding-only mode, but even then I am not sure.

In normal cases (realtime web browsing) we use full parallel r-kademlia with the beeline protocol ;)

There are 3 cases where this is not allowed:

  • originator chooses not to reveal his IP: private browsing
  • originator cannot be called (no udp piercing the NAT possible)
  • originator has low budget and/or a lot of time (backup, syncing, downloading vs. streaming a movie)

that said, in a lean kademlia full parallelism may not be much better than alpha = 2 or 3 anyway
