# Proposal for integrating Erasure Coding directly into the swarm client.

Erasure coding can increase the security of file availability by many orders of magnitude compared to simply storing multiple copies.

## What Problems are we trying to address?

### 1. Not all chunks (in the Merkle Trie) are created equal.

When we encode a file in Swarm, the chunks that represent nodes near the root of the tree are in some sense more important than the nodes nearer to the file layer. More specifically, if the root chunk is lost, then the entire file is lost; if one of the chunks on the next layer is lost, then 1/128 of the file is lost, and so on until a chunk is just a chunk. In many cases this distinction may seem unimportant; for example, if I upload an encrypted file to swarm, then every chunk is as important to me as any other, because with even one chunk missing I cannot decrypt. However, even in this case it seems unsettling that a single lost chunk (the root chunk) can result in my inability to access my file even if the file is otherwise still correctly stored on swarm in its entirety. Ultimately we want every chunk to be as important as any other chunk.
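To make the asymmetry concrete, here is a tiny Go sketch (illustrative only, not part of the swarm client) that computes what fraction of a file becomes unreadable when a single chunk at a given depth of the 128-ary trie is lost:

```go
package main

import (
	"fmt"
	"math"
)

// lostFraction returns the (worst-case) fraction of a file that becomes
// unreadable when a single chunk at the given depth of a 128-ary Merkle
// trie is lost: depth 0 is the root chunk, depth 1 the next layer, etc.
func lostFraction(depth int) float64 {
	return 1.0 / math.Pow(128, float64(depth))
}

func main() {
	for depth := 0; depth <= 3; depth++ {
		fmt.Printf("chunk lost at depth %d -> up to %g of the file is lost\n",
			depth, lostFraction(depth))
	}
}
```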

### 2. Insurance Pricing

When storing files in the swarm and protecting them with the SWEAR and SWINDLE contracts, we wish to communicate which level of security we want and possibly pay for extra redundancy. The mechanisms to ensure this have proved complex to implement, and it is much simpler to add the extra security into the file itself by encoding it suitably before handing it off to the swarm.

## The Interaction of the Erasure Code and the Merkle Trie

Disclaimer: The encoding suggested here may be very costly to implement in terms of CPU overhead. It is presented as a sketch of how such a system might look and is not meant as a final design.

Recall that at each node in the trie, we have 128 children, each of which represents the root hash of a subtree or, at the last level, one of 128 consecutive hash-length pieces of the file. Let us now suppose that we divide our file into 100 equally sized pieces and then add 28 more parity check pieces using a Reed-Solomon code, so that any 100 of the 128 pieces are sufficient to reconstruct the file. Assume also that, rather than appending the 28 parity pieces at the end, we intersperse them at random positions between the 100 data pieces. The result of this encoding would be that below the root hash (in the root chunk) we have 128 subtree root hashes, each of which corresponds to one of the pieces of the file, and any 100 of these are sufficient to reconstruct the file. Thus, purely in terms of availability, every subtree at this level is as important as every other subtree; though in terms of practicality we still prefer the data pieces over the parity pieces in order to avoid CPU overhead in reconstruction.
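As an illustration of this one-level encoding, here is a sketch in Go. It assumes the third-party klauspost/reedsolomon library (an assumption, not something the swarm client is stated to use) and a hypothetical 4096-byte piece size; the random permutation of the parity pieces is only indicated, since a real implementation would need to record or deterministically derive it so that readers can undo it:

```go
package main

import (
	"fmt"
	"math/rand"

	"github.com/klauspost/reedsolomon"
)

// Sketch only: uses the klauspost/reedsolomon library as a stand-in for
// whatever Reed-Solomon implementation the swarm client would adopt.
const (
	dataPieces   = 100 // any 100 of the 128 pieces suffice to reconstruct
	parityPieces = 28
)

func main() {
	enc, err := reedsolomon.New(dataPieces, parityPieces)
	if err != nil {
		panic(err)
	}

	// A toy payload standing in for 100 chunk-sized pieces of a file.
	payload := make([]byte, dataPieces*4096)
	for i := range payload {
		payload[i] = byte(i)
	}

	// Split into 100 data pieces (plus room for 28 parity pieces),
	// then fill in the parity pieces.
	pieces, err := enc.Split(payload)
	if err != nil {
		panic(err)
	}
	if err := enc.Encode(pieces); err != nil {
		panic(err)
	}

	// Intersperse the parity pieces at random positions among the data
	// pieces, as suggested above.
	perm := rand.Perm(len(pieces))
	interleaved := make([][]byte, len(pieces))
	for i, p := range perm {
		interleaved[p] = pieces[i]
	}

	fmt.Printf("produced %d pieces of %d bytes; any %d of them recover the payload\n",
		len(interleaved), len(interleaved[0]), dataPieces)
}
```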

Our goal is to do this at every level.

Thus the procedure might look somewhat like this: take the first 100 hash-length pieces of the file, intersperse them with 28 parity pieces, and repeat this for the next 100 until you reach the end of the file. Next, we apply the same procedure to the hashes of the first 100 resulting chunks (each a batch of 128 hash-length pieces), interspersing them with 28 parity pieces, and so on up the trie. The upshot is that, at the end, every single chunk in the trie, whether at the file level or higher up, consists of 128 pieces/subtrees of which any 100 suffice. Thus no lost chunk is more detrimental than any other lost chunk.
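A deliberately naive recursive sketch of that procedure follows, again assuming the klauspost/reedsolomon library, using plain SHA-256 as a stand-in for swarm's chunk hash, and omitting the random interspersion and proper handling of short groups for brevity; the function names (`encodeLevel`, `buildTrie`) and sizes are hypothetical:

```go
package main

import (
	"crypto/sha256"
	"fmt"

	"github.com/klauspost/reedsolomon"
)

const (
	dataPieces   = 100
	parityPieces = 28
	branches     = dataPieces + parityPieces // 128 references per chunk
	refSize      = 32                        // hash-length piece
)

// encodeLevel takes the hash-length references making up one level of the
// trie and, for every group of (up to) 100 of them, adds 28 Reed-Solomon
// parity references. Short final groups are zero-padded here for simplicity.
func encodeLevel(enc reedsolomon.Encoder, refs [][]byte) [][]byte {
	var out [][]byte
	for start := 0; start < len(refs); start += dataPieces {
		group := make([][]byte, branches)
		for i := 0; i < branches; i++ {
			if i < dataPieces && start+i < len(refs) {
				group[i] = append([]byte(nil), refs[start+i]...)
			} else {
				group[i] = make([]byte, refSize) // padding or empty parity slot
			}
		}
		if err := enc.Encode(group); err != nil {
			panic(err)
		}
		out = append(out, group...)
	}
	return out
}

// buildTrie re-applies the same 100-of-128 encoding at every level: pack 128
// references into a chunk, hash the chunk to get the reference for the level
// above, and repeat until a single root chunk remains.
func buildTrie(enc reedsolomon.Encoder, refs [][]byte) [][]byte {
	refs = encodeLevel(enc, refs)
	for len(refs) > branches {
		var next [][]byte
		for start := 0; start < len(refs); start += branches {
			h := sha256.New() // stand-in for swarm's chunk hash
			for _, ref := range refs[start : start+branches] {
				h.Write(ref)
			}
			next = append(next, h.Sum(nil))
		}
		refs = encodeLevel(enc, next)
	}
	return refs // the 128 references held by the root chunk
}

func main() {
	enc, err := reedsolomon.New(dataPieces, parityPieces)
	if err != nil {
		panic(err)
	}
	// 20,000 hash-length file pieces as a toy input.
	refs := make([][]byte, 20000)
	for i := range refs {
		refs[i] = make([]byte, refSize)
		refs[i][0] = byte(i)
	}
	root := buildTrie(enc, refs)
	fmt.Printf("root chunk holds %d references\n", len(root))
}
```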

## The effect of the encoding on Insurance Pricing

Using erasure codes in this way allows us to do away with our previous redundancy pricing model. We can posit that any chunk must be stored at the three(?) five(?) ten(?) closest nodes [we need multiple copies not only for redundancy purposes but also to make sure that DHT lookups are successful]. Beyond that, if a user requires a higher level of assurance that their file will remain available, the user may set the parameter of the erasure code (the 100 in the example above). In this way, a lower number means more security but also a larger file size and more chunks, and thus a higher insurance cost.
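As a rough, purely illustrative calculation of that trade-off (the particular values of k below are hypothetical), for a k-of-128 code every group of k data pieces is stored as 128 pieces, so each level tolerates 128-k losses per group at a storage, and hence insurance, cost of roughly 128/k times the original size:

```go
package main

import "fmt"

func main() {
	// For a k-of-128 code, every group of k data pieces is stored as 128
	// pieces: up to 128-k of them may be lost, at an expansion factor of
	// 128/k in stored (and therefore insured) data per level.
	for _, k := range []int{112, 100, 64, 32} {
		fmt.Printf("k=%3d: tolerates %3d lost pieces per 128, expansion factor %.2f\n",
			k, 128-k, 128.0/float64(k))
	}
}
```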

## The effect of the encoding on retrieval pricing and storage incentives

A problem immediately presents itself: if nodes are compensated only for serving chunks, then less popular chunks are less profitable and more likely to be deleted; therefore, if users only download the 100 data chunks and never request the parity chunks, then the parity chunks are more likely to be deleted and ultimately unavailable when they are finally needed. We partly alleviate this issue by placing the parity chunks at random positions, thereby increasing the chances that users will end up downloading them too (let's face it, we trick the users into paying for the parity data) and, as a side effect, keeping the swarm properly incentivised.

Another approach would be to use non-systematic coding. Recall that a systematic code is one in which the data remains intact and we add extra parity data, whereas in a non-systematic code we replace all data with parity data such that (in our example) all 128 pieces really are created equal. While the symmetry of this approach is appealing, it leads to forced decoding and thus to high CPU usage, and it also prevents us from streaming files from the swarm. Since we anticipate that streaming will be a common use case, we cannot choose just any non-systematic code and would have to find/choose codes that are streamable.
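The systematic property is what keeps streaming cheap. A minimal Go check, again assuming the klauspost/reedsolomon library (which happens to implement a systematic Reed-Solomon code), shows that the data shards are literal slices of the original payload, so a reader can consume them in order and only needs to decode when a piece is missing:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// Sketch only; the 100-of-128 parameters match the example above.
	enc, err := reedsolomon.New(100, 28)
	if err != nil {
		panic(err)
	}
	payload := make([]byte, 100*32) // toy payload
	for i := range payload {
		payload[i] = byte(i)
	}
	shards, err := enc.Split(payload)
	if err != nil {
		panic(err)
	}
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}
	// With a systematic code the first 100 shards are the payload itself.
	fmt.Println("first data shard is raw payload:",
		bytes.Equal(shards[0], payload[:len(shards[0])]))
}
```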

Finally, there is the option of placing all 100 data pieces at the beginning (of the 128 in question) and all parity pieces at the end, thereby explicitly allowing users to request only the data part, while buying insurance receipts from the SWEAR contract to cover specifically the parity data in order to have it available when it is needed.

## Further reduction in Swarm storage requirements

We could imagine further decreasing the storage requirement of swarm (while maintaining security) by extending erasure coding to the last instance where we still use multiple copies: the most proximate bin. Under current operation a chunk is stored with multiple nodes that are close to the chunk hash; these nodes are the 'most proximate bin' of the chunk. Instead of requiring the 10 most proximate nodes each to store a full copy of a chunk, we could have them encode the chunk using, for example, a 5-of-10 erasure code. This would decrease their storage requirement while still allowing the chunk to be recovered if no more than half of the nodes disappear at the same time.
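A small sketch of this idea, once more assuming the klauspost/reedsolomon library and a hypothetical 4096-byte chunk: each of the ten nodes in the bin would hold one shard of roughly a fifth of the chunk, and the chunk remains recoverable after any five of them disappear:

```go
package main

import (
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	const chunkSize = 4096 // hypothetical chunk size

	// A 5-of-10 code over a single chunk: each of the 10 most proximate
	// nodes stores one shard instead of a full copy of the chunk.
	enc, err := reedsolomon.New(5, 5)
	if err != nil {
		panic(err)
	}
	chunk := make([]byte, chunkSize)
	shards, err := enc.Split(chunk)
	if err != nil {
		panic(err)
	}
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}

	perNode := len(shards[0])
	fmt.Printf("each node stores %d bytes; bin total %d bytes vs %d bytes for 10 full copies\n",
		perNode, perNode*len(shards), chunkSize*10)

	// Simulate five of the ten nodes disappearing: the chunk is still
	// recoverable from the remaining shards.
	for i := 0; i < 5; i++ {
		shards[2*i] = nil
	}
	if err := enc.Reconstruct(shards); err != nil {
		panic(err)
	}
	fmt.Println("chunk reconstructed from the remaining 5 shards")
}
```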

## Caveats

Erasure codes in general, and Reed-Solomon codes in particular, are very sensitive to their parameters. It may be cheap to use a 4-of-6 code for small files but very expensive to use a 100-of-128 code for large files... or any other variant. The scheme described above uses 100-of-128 both for small hash-sized pieces and for large pieces of 1/100th of the whole file. It is possible that at least some of these are prohibitively expensive to do in practice.

There are codes that are specifically tailored to various storage systems. For example, in a data center with 100 racks of 25 servers each and 10 hard drives per server, there are outages in which a specific 10 drives become unavailable (a server crash) that are more likely than outages in which certain other 13 drives crash (distributed across servers), and there are outages in which a specific 250 drives become unavailable (a rack failure) that are more likely than a random 500 drives failing... These kinds of considerations have led to new codes being designed that are specifically tailored to such outages (locally repairable codes? pyramid codes?). We could also model our trie as a data center with 128 racks of 128 servers of 128 drives of 128 sectors of ... Perhaps there is a much better code for us to use, but it has yet to be designed.

A final point is that with this encoding, the swarm is not self-healing. To recreate missing chunks, a user has to download the file, re-encode it, and upload the missing chunks back to swarm. We can imagine a more ideal encoding scheme in which missing chunks can be reconstructed locally, so that the swarm may self-heal from major outage events, though making this line up with the incentivisation may further complicate matters.
