
@cobordism
Last active February 16, 2016 09:24

Proposal for inclusion of Erasure Codes into the Swarm client.

This is a follow-up to https://gist.github.com/homotopycolimit/64997b8cd2e0916b4430

This is a proposal to include an m-of-n Cauchy-Reed-Solomon erasure code into the swarm trie.

1 Why?

Why do we need erasure coding? Can't we just rely on storing multiple copies of every chunk (as Maidsafe does)? Maidsafe cites the “industry standard” of storing three copies of all data as its policy. This is, however, only partially true. For starters, the cloud storage industry is trying very hard to get away from 3x redundancy because it is wasteful and incurs high costs; erasure coding can guarantee the same level of security using only a fraction of the storage space. For example, a 100-of-128 code adds only 28% storage overhead yet survives the loss of any 28 of the 128 chunks, whereas 3x replication adds 200% overhead and loses data as soon as all three copies of a single chunk disappear. Secondly, in a data center 3x redundancy is applied to hard drives, whose failure rates are low, independent, and predictable; in a p2p cloud scenario, 3x redundancy refers to nodes, which are fickle and can disappear far more frequently than hard drives fail. Swarm does in fact plan for several layers of redundancy, and (at least at first) we expect much higher than 3x retention of chunks. However, as the swarm grows and storage space begins to fill up, the redundancy will drop automatically. Erasure coding is then the best way to ensure file availability. It will also allow for a self-healing swarm; see “Added Benefits” below.

2 How?

There are open-source libraries for Reed-Solomon and Cauchy-Reed-Solomon encoding. More info: https://www.usenix.org/legacy/event/fast09/tech/full_papers/plank/plank_html/ and also https://www.backblaze.com/blog/reed-solomon/ and http://rscode.sourceforge.net/

We just have to make sure that we use it at all levels of the Swarm trie.

We want to encode parity data into the Swarm file trie such that at every node we have the same m-of-128 level of redundancy. Here is how we might proceed (a hedged Go sketch follows the list below):

  1. Read the file 4096 bytes at a time, producing one chunk per read.
    • If fewer than 4096 bytes are left in the file, pad the last fraction to 4096 bytes and use Cauchy-Reed-Solomon to encode enough parity chunks to bring the group to a total of 128 chunks.
    • Otherwise, after every m chunks, use Cauchy-Reed-Solomon to encode 128-m parity chunks.

  2. Repeat until EOF.

  3. Record the list of the hashes of the sets of 128 chunks constructed in the previous step, and read that list of hashes 4096 bytes at a time.
    • If fewer than 4096 bytes are left, pad the last fraction to 4096 bytes and use Cauchy-Reed-Solomon to encode enough parity chunks to bring the group to a total of 128 chunks.
    • Otherwise, after every m chunks, use Cauchy-Reed-Solomon to encode 128-m parity chunks.

  4. Repeat until the list is exhausted.

  5. Go to step 3 with the new list of hashes; the recursion ends when a level's hashes fit into a single chunk, which becomes the root.
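To make the encoding loop concrete, here is a minimal sketch in Go of steps 1 and 2. Everything in it is an assumption rather than existing Swarm code: it uses the open-source github.com/klauspost/reedsolomon library as a stand-in for a Cauchy-Reed-Solomon codec, SHA-256 in place of Swarm's chunk hash, and an illustrative m = 112.

```go
// Package erasure is an illustrative sketch only, not Swarm code.
package erasure

import (
	"crypto/sha256"
	"io"

	"github.com/klauspost/reedsolomon"
)

const (
	chunkSize = 4096 // bytes per Swarm chunk
	branches  = 128  // chunks per group / per trie node
	m         = 112  // data chunks per group (illustrative); 128-m = 16 parity
)

// encodeGroup takes k <= m data chunks (each exactly chunkSize bytes, the
// last one zero-padded if needed) and appends 128-k parity chunks, so every
// group totals 128 chunks.
func encodeGroup(data [][]byte) ([][]byte, error) {
	parity := branches - len(data)
	enc, err := reedsolomon.New(len(data), parity)
	if err != nil {
		return nil, err
	}
	shards := append(make([][]byte, 0, branches), data...)
	for i := 0; i < parity; i++ {
		shards = append(shards, make([]byte, chunkSize))
	}
	// Encode fills the trailing parity shards from the data shards.
	if err := enc.Encode(shards); err != nil {
		return nil, err
	}
	return shards, nil
}

// chunkLevel reads r one chunk at a time (steps 1 and 2) and returns the
// hashes of all 128-chunk groups, i.e. the input for the next trie level.
func chunkLevel(r io.Reader) ([][]byte, error) {
	var hashes [][]byte
	var group [][]byte
	for {
		chunk := make([]byte, chunkSize) // zero bytes double as padding
		n, err := io.ReadFull(r, chunk)
		if n > 0 {
			group = append(group, chunk)
		}
		atEOF := err == io.EOF || err == io.ErrUnexpectedEOF
		if err != nil && !atEOF {
			return nil, err
		}
		if len(group) == m || (atEOF && len(group) > 0) {
			shards, err := encodeGroup(group)
			if err != nil {
				return nil, err
			}
			for _, s := range shards {
				h := sha256.Sum256(s) // stand-in for the Swarm chunk hash
				hashes = append(hashes, h[:])
			}
			group = nil
		}
		if atEOF {
			return hashes, nil
		}
	}
}
```

Steps 3 to 5 then amount to running chunkLevel again over the concatenated hashes of each level until only a single chunk remains.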

The Swarm trie also includes a file-size integer, I believe. I do not think this is necessary: it should suffice to supply a file-size integer along with the root hash. That alone lets everyone calculate the Cauchy-Reed-Solomon redundancy coding at every node, and also which chunks hold original file data and which hold parity data, as the sketch below illustrates.
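Assuming 32-byte chunk hashes, the arithmetic works out neatly: a group's 128 hashes occupy 128 × 32 = 4096 bytes, exactly one chunk on the level above. Continuing the sketch above (the layout rule is this proposal's scheme, not existing Swarm code), a function like this shows how any client could derive the data/parity split at every level from the file size alone:

```go
// levelDataChunks returns, for each trie level from the leaves upward, how
// many data chunks that level holds; all other chunks on a level are parity.
// Assumes 32-byte hashes, so one group's 128 hashes fill exactly one chunk.
func levelDataChunks(fileSize int64) []int64 {
	var levels []int64
	n := (fileSize + chunkSize - 1) / chunkSize // leaf-level data chunks
	for {
		levels = append(levels, n)
		if n <= m {
			break // a single group: its 128 hashes form the root chunk
		}
		// ceil(n/m) groups, each emitting 128 hashes = 4096 bytes,
		// i.e. exactly one data chunk on the level above
		n = (n + m - 1) / m
	}
	return levels
}
```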

3 Added Benefits

“All chunks are created equal”

A trie encoded as suggested above has the same (*) redundancy at every node. This means that chunks nearer the root are no longer more important than chunks holding the file data itself. Every node has an m-of-128 redundancy level, and below the root chunk no chunk is more important than any other.

(*) If the file size is not an exact multiple of m × 4096 bytes, the last group at each level contains fewer than m data chunks and therefore ends up with even higher redundancy than the rest.

“The Swarm can heal itself”

Any(!) client downloading a file from the swarm can detect whether a chunk has been lost. The client can reconstruct the missing file data from the parity data (or the parity data from the file data) and re-sync the repaired chunks into the swarm. That way, even if a large fraction of the swarm is wiped out simultaneously, an organic healing process can occur, and the default client behavior should be to repair any damage it detects.
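Continuing the sketch above (with the standard errors package added to the imports), repairing a single 128-chunk group might look like the following; Reconstruct and Verify are calls from the stand-in klauspost/reedsolomon library, not a Swarm API:

```go
// repairGroup regenerates the missing chunks of one 128-chunk group.
// Missing chunks are passed as nil entries; reconstruction succeeds as
// long as at least dataShards of the 128 entries are present.
func repairGroup(shards [][]byte, dataShards int) error {
	enc, err := reedsolomon.New(dataShards, branches-dataShards)
	if err != nil {
		return err
	}
	// Reconstruct fills in every nil shard from the surviving ones.
	if err := enc.Reconstruct(shards); err != nil {
		return err
	}
	// Verify recomputes the parity to confirm the group is consistent;
	// the repaired shards can then be re-synced into the swarm.
	ok, err := enc.Verify(shards)
	if err != nil {
		return err
	}
	if !ok {
		return errors.New("group inconsistent after reconstruction")
	}
	return nil
}
```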

“Speeding up lookups while keeping alpha = 1”

Alpha is the name Kademlia gives to the number of peers in a Kademlia bin that are queried simultaneously during a lookup. The original Kademlia paper sets alpha = 3. This is impractical for Swarm because peers do not report back with new addresses, as they would in pure Kademlia, but instead forward all queries to their own peers. Swarm is coded this way to make use of semi-stable, longer-term ethp2p connections. Setting alpha to anything greater than 1 thus increases the amount of network traffic substantially, setting up an exponential cascade of forwarded lookups. [Lookups would cause an exponentially growing cascade at first, but it would soon collapse back down onto the target of the lookup.]

However, setting alpha = 1 has its own downsides. For starters, lookups can stall if they are forwarded to a dead node, and even if all nodes are live, there can be large delays before a query completes. The practice of setting alpha = 2 in Swarm is designed to speed up file retrieval: clients are configured to accept chunks from the first/fastest forwarding connection to be established.

In an erasure-coded setting we can, in a sense, have the best of both worlds. The default behavior should be to set alpha = 1, i.e. to query only one peer for each chunk lookup, but, crucially, to issue lookup requests not just for the data chunks but for the parity chunks as well. The client can then accept the first m of every 128 chunks queried, gaining some of the faster-retrieval benefits that redundant lookups provide without a whole exponential cascade.
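As an illustration of "accept the first m of 128", the following sketch (same illustrative package; fetchChunk is a hypothetical retrieval callback, not an existing Swarm API, and timeouts plus the fewer-than-m-successes case are omitted for brevity) fires one alpha = 1 lookup per chunk, data and parity alike, and returns as soon as any m chunks arrive; the rest can then be rebuilt locally with repairGroup:

```go
// fetchGroup looks up all 128 chunks of a group concurrently and stops
// waiting once any m of them have been retrieved.
func fetchGroup(hashes [][]byte, fetchChunk func([]byte) ([]byte, error)) [][]byte {
	type result struct {
		idx   int
		chunk []byte
	}
	results := make(chan result, len(hashes))
	for i, h := range hashes {
		go func(i int, h []byte) {
			if c, err := fetchChunk(h); err == nil {
				results <- result{i, c}
			}
		}(i, h)
	}
	shards := make([][]byte, len(hashes)) // nil = not (yet) retrieved
	for got := 0; got < m; {
		r := <-results
		if shards[r.idx] == nil {
			shards[r.idx] = r.chunk
			got++
		}
	}
	return shards // pass to repairGroup to regenerate the nil entries
}
```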

@cobordism (Author)

Does swarm use the same alpha for search queries as for forwarded search queries?

@zelig commented Feb 16, 2016

they are the same.

Remember that I do not want to restrict parallelism at all; maybe in back-forwarding-only mode, but even then I am not sure.

In normal cases (realtime web browsing) we use full parallel r-kademlia with the beeline protocol ;)

There are 3 cases where this is not allowed:

  • originator chooses not to reveal his IP: private browsing
  • originator cannot be called (no udp piercing the NAT possible)
  • originator has low budget and/or a lot of time (backup, syncing, downloading vs. streaming a movie)

that said, in a lean kademlia full parallelism may not be much better than alpha = 2 or 3 anyway
