
@karalabe /chainwipe.md Secret
Last active Dec 10, 2018

Pruning historical chain segments

DISCLAIMER: All ideas and numbers in this document are preliminary! The goal is to have discussion starters, not to present polished final ideas! The document itself is a rough brain-dump too!

Background

Ethereum 1.0 has a storage scaling issue: in its current incarnation, disk growth is unbounded. Yes, the rate of growth is implicitly limited by the block gas limit, but there is no limit on the total amount of data accumulated over time. This makes Ethereum 1.0 unsustainable in the long term.

Historically the solution was to kick the can:

  • Ethereum 2.0 will introduce sharding, solving the data scaling issue.
  • Moore's Law ensures that disk capacity grows faster than required storage.

There are significant flaws in both arguments:

  • If Ethereum 2.0 introduces sharding, that will split the chain into N shards, reducing the storage requirement to 1/Nth of the current amount.
    • The current long term storage requirement is infinite. You can split that into arbitrarily many static pieces, the end result will still be infinite, but with a slower growth rate.
    • The slower growth rate is questionable, as any capacity increase is readily gobbled up by suboptimal applications. Realistically, splitting the chain into N pieces would only result in each sub-chain growing at the same rate as current mainnet.
  • If Moore's Law holds for storage, you will always be able to buy a system powerful enough to hold your (shard) chain.
    • Anyone joining the network not only needs to store the chain, but also needs to obtain it first, which in its most naive form means downloading the raw data. There are latency and bandwidth limitations as well as associated costs, which outweigh the cost of raw storage. E.g. a 1TB HDD costs 47 USD, whereas syncing 1TB on a perfectly saturated, zero-latency 100Mbit line would take ~23 hours and cost 85 USD at AWS pricing. These costs hit both the joiner and the network!
    • A significant part of the total storage size on Ethereum is active data. That means it's not some historical blob you can stash away and forget about; rather, it may be accessed by arbitrary transactions. This requires sub-millisecond access times (i.e. memory caches and SSD indexes). Higher storage consumption means proportionally higher memory requirements and disk IO, which means additional hardware costs + SSD amortization (Ethereum will kill your SSD).
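
As a rough sanity check of the transfer time quoted above, here is a minimal sketch. It assumes a perfectly saturated line and shows both the decimal (10^12 bytes) and binary (2^40 bytes) reading of "1TB"; the function name and constants are illustrative only.

```python
def transfer_hours(size_bytes: float, line_bits_per_sec: float) -> float:
    """Hours needed to move size_bytes over a fully saturated line."""
    seconds = size_bytes * 8 / line_bits_per_sec
    return seconds / 3600


TB = 10**12          # decimal terabyte (assumption)
TIB = 2**40          # binary tebibyte, for comparison
LINE = 100 * 10**6   # 100 Mbit/s, ignoring latency and protocol overhead

print(f"1 TB : {transfer_hours(TB, LINE):.1f} h")
print(f"1 TiB: {transfer_hours(TIB, LINE):.1f} h")
```

Depending on which unit is meant, the result lands slightly below or slightly above the ~23 hours mentioned above; real links with latency and protocol overhead would be slower.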

The above points are meant to highlight that Ethereum 1.0 has problems that Ethereum 2.0 doesn't aim to solve (yet), so it will be just as vulnerable to them as we are now. The argument that Ethereum 2.0's sharding + Moore's Law will net us enough time to fix these is based on the assumption that a production ready Ethereum 2.0 arrives before Ethereum 1.0 grinds to a halt, which seems a reckless position to take.

Before diving into possible solutions, let's try to put some numbers on the problem.

Chain growth

A full Ethereum node currently stores various types of information (we don't go into archive nodes, that's a separate issue). Whilst the exact list depends on client implementations, most will probably feature at least the following:

  • Chain of headers that cryptographically defines a blockchain's content.
  • Chain of block bodies that store past uncles and past transactions.
  • Chain of receipts that store past transaction results and contract logs.
  • Index of transaction-hash to block mappings and a few others.
  • Account and storage Merkle-Patricia tries.

Let's see how these numbers play out on mainnet from genesis until block #6775081 (please ignore the X axis, I just used the block number as a timestamp to make it simple for Grafana). The numbers shown are Geth's current storage usage; they may be different on other clients. We're mostly interested in magnitudes here, not exact numbers.

[Figure: Grafana screenshot from 2018-11-29 14-32-26]

The headers are mostly boring, each hovering around the 530 byte mark (minimally compressible since transaction traffic picked up), independent of block gas limits.

The block bodies (i.e. uncles + transactions) fluctuate a bit, but we can put 16KB compressed as a safe number for each recent block. Similarly, receipts fluctuate around 14KB compressed per block. Bodies and receipts, however, depend on both the block gas limit and the fork rules: naively guessing, increasing the gas limit will increase the body/receipt sizes proportionally; similarly, forks might bump these numbers up (e.g. Constantinople charges less for SSTORE, supporting more transactions per block, thus larger storage is anticipated).

Putting these numbers into perspective, the Ethereum mainnet currently produces about 6100 blocks per day, meaning the current growth can be very (!!) roughly extrapolated to:

| | Current | +1 Day | +1 Month | +1 Year | +5 Years |
|---|---|---|---|---|---|
| Headers | 2.547GB | +3.23MB | +97MB | +1.164GB | +5.82GB |
| Bodies | 42GB | +99.94MB | +3GB | +36GB | +179.9GB |
| Receipts | 40GB | +87.44MB | +2.62GB | +31.5GB | +157.4GB |
| Total | 84.547GB | +190.61MB | +5.71GB | +68.66GB | +343.12GB |

These numbers, however, only represent the raw chain itself. Besides the account/storage trie (which this document will not go into), full nodes also need to maintain a number of fast-access indexes to allow filtering for events and to allow looking up blocks, transactions and receipts via a hash.

With Geth's current storage data model, these indices tally up to 28.6GB (sadly I don't have historical charts for these). The bulk of this data is the transaction lookups (on mainnet there are currently 288,731,936 transactions), so we can conclude that this number will grow proportionally with the transactions (i.e. with bodies and receipts).

Adding this best guess to the previous table, we get to the database growth caused solely by the raw chain, not accounting for smart contract state itself.

| | Current | +1 Day | +1 Month | +1 Year | +5 Years |
|---|---|---|---|---|---|
| Headers | 2.547GB | +3.23MB | +97MB | +1.164GB | +5.82GB |
| Bodies | 42GB | +99.94MB | +3GB | +36GB | +179.9GB |
| Receipts | 40GB | +87.44MB | +2.62GB | +31.5GB | +157.4GB |
| Indexes (guess) | 28.6GB | +62.52MB | +1.875GB | +22.5GB | +112.5GB |
| Total | 113.15GB | +253.13MB | +7.59GB | +91.2GB | +455.62GB |

If we freeze the capacity of Ethereum 1.0 in place until Ethereum 2.0 is finalized, we're in the 0.6TB ballpark for storing the chain and its indexes (today's 113.15GB plus five years of growth), not accounting for the account and contract tries. Quite a lot, but manageable. If, however, we want to do a 2x, 5x or ideally 10x capacity bump, that will result in a storage load of roughly 1.02TB, 2.39TB or 4.67TB respectively. Realistically, even the 2x is an unreasonable expectation.
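
The per-day, per-month and per-year growth figures in the tables above can be roughly reproduced from the per-block sizes. This is only a sketch: it assumes 30-day months, 12-month years, binary kilobytes for the per-block sizes and decimal MB/GB for the output, none of which the document states explicitly.

```python
BLOCKS_PER_DAY = 6100
DAYS_PER_MONTH = 30      # assumption: 30-day months
MONTHS_PER_YEAR = 12

# Approximate per-block sizes in bytes, taken from the text above.
PER_BLOCK = {
    "Headers": 530,
    "Bodies": 16 * 1024,
    "Receipts": 14 * 1024,
}


def growth(bytes_per_block: int) -> dict:
    """Projected growth in bytes per day, month and year."""
    day = bytes_per_block * BLOCKS_PER_DAY
    month = day * DAYS_PER_MONTH
    return {"day": day, "month": month, "year": month * MONTHS_PER_YEAR}


def fmt(n: float) -> str:
    """Format a byte count using decimal MB/GB."""
    return f"{n / 1e9:.2f}GB" if n >= 1e9 else f"{n / 1e6:.2f}MB"


for name, size in PER_BLOCK.items():
    g = growth(size)
    print(f"{name:9} +{fmt(g['day'])}/day  +{fmt(g['month'])}/month  +{fmt(g['year'])}/year")
```

Small differences in the last digit compared to the tables come down to rounding versus truncation.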

If we want decentralization, we need to get these numbers down by at least one order of magnitude, preferably two.

Theoretical solution

In the blockchain world, the philosophical debate of archive nodes vs. full nodes rears its head every now and again. The mantra usually goes that only nodes that store every past state permutation should be considered full nodes. In Ethereum we took the practical approach of pruning away past historical state, since there aren't really meaningful use cases for average people to care about past balances and contract states.

This however raises the question: if average users (who we'd like to run full nodes) don't care about what their balance was 4 years ago, will they care about the list of transactions executed 4 years ago? Will they care about an event that a smart contract raised 4 years ago? Not really. But if not, why burden everyone in the network with data nobody cares about? [There is one meaningful answer, security, but we'll get back to that a bit later].

Let's delete historical blocks, historical logs and historical indexes!

Perhaps it's not immediately apparent why this proposal is so powerful: it puts a hard, guaranteed cap on the amount of disk space that Ethereum 1.0 would consume. The exact definition of "historical" is not relevant. Full nodes could host 1 month of recent data (on one insane end of the spectrum) or 3 years of recent data (on the other insane end of the spectrum). Whatever the choice, we can calculate a final, hard number for the disk requirements based on it.

If we take our extremely rough storage growth guesses from above, and calculate how much data we'd need at different gas limits and retention intervals, we'll get some interesting insights:

| Retention | 8M gas | 2x | 5x | 10x | 25x |
|---|---|---|---|---|---|
| 1 month | 7.59GB | 15.18GB | 37.95GB | 75.9GB | 189.75GB |
| 3 months | 22.77GB | 45.54GB | 113.85GB | 227.7GB | 569.25GB |
| 6 months | 45.54GB | 91.08GB | 227.7GB | 455.4GB | 1.14TB |
| 1 year | 91.08GB | 182.16GB | 455.4GB | 910.8GB | 2.28TB |

If we'd say that we are currently comfortable with Ethereum's 113.15GB storage requirements for the raw chain + indexes, the same number could cater for:

  • 14.9x transaction throughput with 1 month data retention
  • 4.96x transaction throughput with 3 months data retention
  • 2.48x transaction throughput with 6 months data retention
  • 1.24x transaction throughput with 1 year data retention
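
The retention table and the throughput multipliers above follow from two numbers: the estimated monthly growth (7.59GB) and today's footprint (113.15GB). A minimal sketch, assuming (as the text does) that storage scales linearly with the gas limit; the function names are illustrative:

```python
MONTHLY_GROWTH_GB = 7.59    # raw chain + indexes at the current gas limit
CURRENT_SIZE_GB = 113.15    # today's raw chain + indexes


def retained_gb(months: int, throughput: float = 1.0) -> float:
    """Disk needed to keep `months` of history at a given throughput multiple."""
    return MONTHLY_GROWTH_GB * months * throughput


def affordable_throughput(months: int, budget_gb: float = CURRENT_SIZE_GB) -> float:
    """Throughput multiple that fits into `budget_gb` with `months` of retention."""
    return budget_gb / (MONTHLY_GROWTH_GB * months)


for months in (1, 3, 6, 12):
    print(f"{months:2} months: {retained_gb(months, 10):7.1f}GB at 10x, "
          f"budget allows {affordable_throughput(months):.2f}x")
```

Changing `budget_gb` shows how the trade-off shifts for operators willing to dedicate more (or less) disk than today's footprint.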

Which one is the best choice? That's not up to me or this document to make really, but it's nice to know that a 1 month data retention could net us a 10x throughput increase while reducing storage at the same time! [Note again, we're not addressing account/storage trie growth in this document].

Requirements

Before diving into proposals on how we might achieve our goal of pruning historical chain segments, it's important to highlight the invariants that we want to retain or guarantee:

  • Data retention policies must be agreed upon across all clients.
    • Theoretically - even now - every client is free to keep or discard any data. Practically, if majority clients discard data that minority clients need, the minority client will be barred from joining the network. More generally, if there is data asymmetry among the clients, network health will suffer.
  • Historical data must not ever be totally forgotten by the network.
    • If a node wants to do a full sync (i.e. reprocessing every block from genesis), there must be public archives containing the original blocks. Retrieval latency and bandwidth are not required to be optimized for (the bottleneck is local hardware capacity, not network). We can even incentivize these archives, since they should not be needed for a fast/warp/leaf/light sync.
  • Cryptographic proofs of the historical blocks (i.e. headers) must remain in the network.
    • If the network prunes away all traces of the chain history, reconstructing a full sync becomes problematic as syncing nodes would need to reprocess millions of blocks at face value before being able to prove they are from the correct chain. By retaining proofs of block ancestry, historical chain segments can be retrieved from arbitrary untrusted sources.
  • Historical archives must be accessible in a generic, decentralized way.
    • There are easy ways to make data available for some specific network (e.g. mainnet), but if we want to retain Ethereum's usability as a generic platform, we need to keep private networks as first class citizens and come up with solutions that don't require central/coordinated management efforts.

Practical solution

The goal of the practical solution is to be practical! As dumb as this sounds, it means that certain suboptimality is acceptable if it keeps things simpler, especially in the context of multiple client implementations.

Historical chain proofs

The first challenge to solve with regard to pruning historical chain segments is to ensure that we can prove the past even though we've deleted the past. There are two possible approaches I see here:

  • Maintain a Merkle (or other cryptographic) proof of deleted chain segments
  • Maintain the header chain indefinitely

Maintaining a Merkle proof of deleted chain segments is exactly how light clients work currently, and how they're able to sync in a couple minutes. Instead of having to go through all the headers from genesis, clients are hard coded (or fed from the config file) a trusted checkpoint, which they start syncing from. This mechanism has two issues however:

  • To keep sync fast, we constantly have to update (release a new client or a new config file) the hard coded snapshots. This works for mainnet with an active maintenance schedule, but does not scale for private Ethereum networks.
  • If no release is made, sync currently takes longer. If however the full nodes would start deleting old headers themselves, old checkpoints would become useless, forcing devs to constantly issue new releases and users to constantly pull new releases. It just doesn't scale.

Maintaining the header chain indefinitely would solve all of the issues that the Merkle proof mechanism has: you can always fast sync based on the header chain with only the genesis (or light sync with arbitrarily old snapshots). The downside is that, as opposed to the Merkle proof, which is 32 bytes for arbitrary history, keeping the headers available indefinitely means indefinite chain growth.

That said, the size of a header (530 bytes) is independent of the transactions included in the block, so no matter how much we scale Ethereum, the growth rate is constant. Using our rough calculations from the previous sections, keeping the headers indefinitely would entail a storage growth of 1.164GB per year. That is imho an acceptable tradeoff for keeping the protocol and client implementations simple.
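
To illustrate why a retained header chain is enough to prove ancestry starting from nothing but the genesis, here is a simplified sketch. The `Header` type, its ad hoc encoding and the use of SHA-256 are assumptions made for brevity: real Ethereum headers are RLP-encoded, hashed with Keccak-256, and also need their proof-of-work and other consensus rules validated, which this sketch skips.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    number: int
    parent_hash: bytes
    extra: bytes = b""   # stands in for every other header field

    def hash(self) -> bytes:
        # Assumption: SHA-256 over an ad hoc encoding, not Keccak-256 over RLP.
        data = self.number.to_bytes(8, "big") + self.parent_hash + self.extra
        return hashlib.sha256(data).digest()


def verify_header_chain(headers: list, genesis_hash: bytes) -> bool:
    """Check that the first header is the trusted genesis and every later
    header links to its predecessor via parent_hash."""
    if not headers or headers[0].hash() != genesis_hash:
        return False
    for parent, child in zip(headers, headers[1:]):
        if child.number != parent.number + 1 or child.parent_hash != parent.hash():
            return False
    return True


# Build a tiny three-block chain and verify it against the genesis hash.
genesis = Header(0, b"\x00" * 32)
h1 = Header(1, genesis.hash())
h2 = Header(2, h1.hash())
print(verify_header_chain([genesis, h1, h2], genesis.hash()))
```

The only trusted input is the genesis hash, which is what allows the headers themselves to come from arbitrary peers.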

Synchronization changes

If we assume that full nodes only retain the header chain and the past N months of blocks/receipts from now on, the next obvious question is how a new node can join the network. This depends on the desired mode of synchronization.

  • If the new node is a light client, the existing snapshot + header sync algorithm will remain completely compatible with the pruned chain.
  • If the new node is a full node doing fast sync, some changes are needed. Currently fast sync streams the headers from the network, forming a skeleton for the chain. While the headers are progressing (throttled if they advance too much), older headers are filled with the associated block bodies and receipts. This will have a minor breakage, since bodies/receipts will no longer be available at the chain genesis.
    • The solution would be to download the entire header chain first and, when the head is reached, backtrack the N months' worth of blocks which are still available in the network and fast sync from that artificial "genesis". All nodes in the network need to agree on the same retention policy to allow proper syncing!
  • If the new node is a full node doing warp sync, only minimal changes would be needed. The node would download the same snapshot as currently from the network, but when back-filling, it would only download bodies/receipts up to N months, after which only headers would be back-filled.
    • Note, I'm not familiar with the warp sync algo beyond the concepts. Feel free to challenge me on this or request further ideas.
  • If the new node is a full node wanting to do a full or archive sync, things get a bit more involved. The headers would still be available from the network, but the bodies need to be pulled from an alternative data source.
    • This depends on later decisions, so I'll postpone describing it here.
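
The fast sync change described above boils down to requesting headers for the whole chain but bodies and receipts only from the artificial "genesis" onwards. A minimal sketch, in which the function name and the choice of expressing the retention policy in blocks are assumptions:

```python
def sync_plan(head: int, retention_blocks: int) -> dict:
    """Inclusive block ranges a pruning-aware fast sync would request."""
    horizon = max(0, head - retention_blocks)   # the artificial "genesis"
    return {
        "headers": (0, head),                   # always the full header chain
        "bodies_and_receipts": (horizon, head), # only what peers still retain
    }


BLOCKS_PER_DAY = 6100
# Example: 3 months of retention at roughly 6100 blocks per day.
plan = sync_plan(head=6_775_081, retention_blocks=3 * 30 * BLOCKS_PER_DAY)
print(plan)
```

Because every node derives the horizon from the same agreed retention policy, a syncing node never asks peers for data they have already deleted.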

Garbage collection

If we agree on an N month/block retention policy, whenever the chain progresses, each client would delete bodies and receipts older than HEAD-N. Furthermore each client would also need to delete any acceleration indices maintained for the old blocks (transaction lookups, bloom filters, etc).

This has implications for the RPC APIs too, however. We need to introduce the concept of a "virtual genesis block" (open for better names), which defines the point of history before which the APIs cannot return data (or return that they don't maintain it any more).
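
A toy sketch of how garbage collection and the virtual genesis could interact on the API side. Everything here (the `ChainStore` class, `PrunedError`, the in-memory dictionaries) is hypothetical and only meant to show the control flow; a real client would prune its on-disk database and lookup indices in the same step.

```python
class PrunedError(Exception):
    """The requested block lies before the virtual genesis block."""


class ChainStore:
    """Toy in-memory store: headers are kept forever, bodies and receipts are pruned."""

    def __init__(self, retention: int):
        self.retention = retention   # N, expressed in blocks
        self.head = -1               # number of the latest block, -1 when empty
        self.headers = {}
        self.bodies = {}
        self.receipts = {}

    @property
    def virtual_genesis(self) -> int:
        """Oldest block whose body and receipts are still retained (HEAD-N)."""
        return max(0, self.head - self.retention)

    def append(self, header, body, receipts) -> None:
        self.head += 1
        self.headers[self.head] = header
        self.bodies[self.head] = body
        self.receipts[self.head] = receipts
        # At most one block crosses the horizon per append, so dropping the
        # block just below the virtual genesis is sufficient.
        stale = self.virtual_genesis - 1
        self.bodies.pop(stale, None)
        self.receipts.pop(stale, None)

    def get_body(self, number: int):
        if not 0 <= number <= self.head:
            raise KeyError(f"block {number} does not exist")
        if number < self.virtual_genesis:
            raise PrunedError(f"block {number} is older than the virtual genesis")
        return self.bodies[number]


store = ChainStore(retention=2)
for n in range(5):
    store.append(f"header-{n}", f"body-{n}", f"receipts-{n}")

print(store.virtual_genesis, sorted(store.bodies), len(store.headers))
```

Since headers are retained, a lookup by block number can tell "pruned" apart from "never existed"; a lookup by transaction hash cannot make the same distinction once the lookup index is gone.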

Block / receipt archives

One of the hard parts of this proposal is archiving historical chain segments so they remain available for later reconstruction if need be. The ray of hope here is that both the chain of bodies and the chain of receipts are just immutable lists of binary blobs, which makes them perfect for long term dumb archiving.

The first choice we need to make is whether to have these archives stored/accessible from within the Ethereum peer-to-peer protocol (whatever extension we add on to support it) or only from the outside. To give a few examples:

  • Extra-protocol storage means hosting the data files on classical external servers, mirrored and replicated according to our security needs: FTP, S3, CDNs, etc. These could be archived by major players (Ethereum Foundation, Consensys, Parity Technologies, Internet Archive, etc). Access to these could boil down to dumb web requests.
  • Intra-protocol storage means hosting the data files within some of the nodes in the Ethereum network itself: Swarm/devp2p, IPFS/libp2p, BitTorrent, etc. The archives would still be run by the same major players, but running an archive would be approachable to anyone, thus closer to the ethos of decentralization.

Extra-protocol is simple but enterprisey, intra-protocol is flexible but needs work. All in all, the extra-protocol storage approach doesn't scale for private networks, test networks, etc. If we want Ethereum to be useful as a technology, we need to retain its decentralized nature. As such, I'd argue that intra-protocol is the only way.

Decentralized archives

We already have a lot of tools in our toolkit for distributing files on the internet; there are, however, a lot of gotchas:

  • Swarm: As Ethereum developers, we could say that Swarm (our very own data distribution network) should be the choice of archiving and making history available for ourselves.
    • Problem is, Swarm is not production ready and we don't know when it will be.
    • Second problem is that Swarm is only implemented for go-ethereum, so although any client could run it as an external process, nobody can include it in their client binaries, making it a significant barrier of entry.
    • Lastly, arguing that client developers should just implement Swarm themselves is of course misguided, since it's a huge effort that cannot be replicated in every language.
  • IPFS: An alternative to the Swarm idea is to host the historical data through IPFS.
    • Opposed to Swarm, IPFS is production ready.
    • As with Swarm, embedding IPFS is also limited to a handful of languages (Go, JavaScript), however a sliver of hope here is that there are many IPFS gateways (including Cloudflare) which make accessing the data easy for any client, even if hosting it is hard.
  • BitTorrent: An elegant possibility would be to piggyback the data distribution king of the last 10 years and create torrent archives out of our historical chain segments.
    • It's as production ready as it gets.
    • It's available from any meaningful language, embeddable into any client.
    • The significant hiccup is that BitTorrent is hard coded to operate on SHA1 hashes. From a security perspective this is irrelevant as clients have the header chain to cross reference data with. From a practicality perspective this is a huge problem: with only the headers available, clients don't know what SHA1 hash they need to download to get the desired data. We could have full nodes maintain hashes of past chain segments, but they are not part of consensus, so it's always an eclipse vulnerability and griefing factor.
  • LES/PIP: The light protocols are designed to retrieve data that only certain nodes have.
    • Light clients are not production ready.
    • Devp2p was not designed for asymmetric protocols, which is one of the reasons light clients have hard times syncing. Light servers are hacking around the issue of serving light clients by rotating them, but it's a weird client-server architecture on top of a p2p network.
    • Discovery does not support finding the required nodes. Geth has been working on ENR to fix this issue, which hopefully will open up a world of possibilities, but it's one more barrier of entry.

All in all, I can't say which solution is the best. I myself am leaning towards IPFS or BitTorrent, because supporting them puts less strain on the Ethereum ecosystem, and because I think that retrieving this data in a peer-to-peer fashion, but off of the Ethereum network, will help scale it better as it leaves our network speedy and clean of archive traffic.

If we can solve the hash discoverability issue, BitTorrent seems the best approach. If we cannot, IPFS might be the second best. Looking for input on these. My main design goal is to support it for arbitrary networks, not just for mainnet.

Broken invariants

Of course, every optimization has its downsides too. Pruning historical chain segments breaks a few important invariants within the Ethereum ecosystem:

  • DApps expect that nodes can filter for contract events arbitrarily far in the past. Certain DApps (e.g. Akasha) also use logs as cheap storage, requiring users to constantly filter the entire chain for their data. This proposal breaks this invariant: DApps will no longer be able to access events past the retention policy.
    • The goal of contract logs in Ethereum is to allow external processes to watch for events happening on the chain. Their goal was never to be a data storage mechanism, and their retention is not specified in the Yellow paper / Ethereum consensus protocol.
  • Any Ethereum node can currently return all the information about a past transaction, both the input and the result. Pruning historical chain segments and indexes would break this invariant: nodes will have no way of knowing whether a transaction was already deleted or never existed in the first place.
    • Realistically speaking, is there a good reason why every node in the network would want to be able to look up arbitrary transactions that happened arbitrarily long in the past? Yes, it's a cute powerful feature, but is it genuinely needed?
  • The Ethereum peer-to-peer network is currently fully self-contained. Any node that speaks the eth protocol can choose its own preferred way to sync, and all required data is readily available from all peers. This invariant is broken, as nodes doing a full sync will need a second data source to fetch the historical blocks from.
    • This is possibly the most painful part of this proposal, making the life of nodes wanting to do a full sync harder. That said, a full sync on Ethereum mainnet with current Geth takes about 5 days, 4 days out of which is the last 2.7M blocks. If we bump the transaction throughput to 10x, apart from very special users, nobody will be able to do a full sync, nor will want to really.

Summary

This document described a way to put a hard cap on the storage growth of the Ethereum network (apart from the state trie), and demonstrated a possible solution to its long term viability both from a decentralization perspective (manageable full nodes, manageable sync times) and from a scalability perspective (10x transaction throughput).

I also acknowledge that in the process of doing these improvements, certain invariants of the current network would break, nuking some DApps along the way. Some of these breakages would also draw the spotlight towards philosophical debates around immutability.

All in all, we've got a decision to make. Do we want Ethereum 1.0 to be here in 10 years (independent of the arrival of Ethereum 2.0) and make it a robust system as is, or do we go down the planned obsolescence path and hope for the best?

My personal choice would be to make Ethereum 1.0 the best we can and see what the future brings when it arrives. If there is a pragmatic way to make Ethereum 1.0 much more than it is today, it would seem (to me) irresponsible not to take the path.

This proposal requires cross client coordination. It does not however require a hard fork!

cartercarlson commented Nov 30, 2018

Awesome write up! From what I've seen, I definitely think the bulk of research and work towards Ethereum 2.0 has focused on scalability and privacy. The blockchain trilemma has had a lot of coverage, and from a developers standpoint, it's a little more exciting to work on than solving how chain data can be effectively and efficiently stored.

I like that you brought up Moore's law and how it applies to storing the Ethereum blockchain. Something that I want to see from the Ethereum dev team is an analysis of how much data will be required to store the current Sharding implementation. With that, an interesting comparison would be the increased size of storing Ethereum with Sharding over time relative to Moore's law, and whether Moore's law will be able to keep our storage abilities ahead of Ethereum 2.0.

illuzen commented Dec 1, 2018

Great write up. Thanks for the thoughtful and honest assessment of the state of Ethereum resource usage.

You've set a high bar, constant storage requirements. It seems difficult to summarize arbitrarily large bodies of data in a meaningful way with constant sized data. Cryptographic hashes output constant sized data, but they destroy any semantic content in the process. The goal might be more attainable if we settle for logarithmic growth.

fuzzyTew commented Dec 1, 2018

This proposal uses faulty logic and decreases security of the network.
The document states "The current long term storage requirement is infinite" but this will only happen given infinite time.
The document states "Yes, the rate of growth itself is implicitly limited by the block gas limit, but there is no limit on the total amount of data accumulated over time" -- this limit is time. By controlling the rate of growth, we can put a cap on the size of the blockchain at any point in the future. We can decide it will only grow by one more gigabyte in the lifetime of the sun, if we want. Moore's Law would rapidly overtake.

I thought everyone understood that the proper solution is to increase the fee, and wait for Moore's Law. Use of Ethereum has risen; prices need to rise to slow the growth rate. 1 TB will be nothing in a decade. This proposal provides for potentially a lot more data to be held in the network, but it beats around the bush by stating that it is serving long-term storage requirements, when these are already served by the gas price.

If there is something that is in the ethereum history that somebody needs to not be shared, store the data encrypted and code the clients to refuse to provide it in their API calls. Don't cripple the integrity and security of the whole network.

If you must prune history, please provide for clients to choose to hold it, as in SegWit. Many people will likely do this anyway.

hadees commented Dec 2, 2018

@fuzzyTew we've already seen people create really insecure smart contracts because they were overly concerned with storage fees. Increasing fees is just going to encourage more costly mistakes so you are just shifting around the security problem. Ultimately it seems like there has to be a reasonable expiration for old data with the obvious option to pay to keep it around. Maybe even have a really expensive fee for having the data stored forever.

mcdee commented Dec 2, 2018

"We need to introduce the concept of a "vitual genesis block" (open for better names) which define the point of history before which the APIs cannot return data (or return that they don't maintain it any more)."

I'd suggest "event horizon", or if you don't like the use of "event" given that has a specific meaning in Ethereum just "horizon".

Regarding storage for the decentralised archives, I'm a little unsure about the rationale behind not considering the current archive node configuration as the preferred method of retaining complete historical state. An archive node could stay as-is; it would be unsuitable for lots of API requests but has the benefits of requiring no real changes to existing code. A new archive node could still sync from the existing Ethereum network without any code changes (although it might require some "well-known" archive node addresses to find a suitable source).

Staying within the Ethereum network also allows for incentives to be put in place for nodes that are able to serve all historical information, which I don't see as possible if the archives are outside of the network. If a more efficient mechanism than the existing ones is required to transfer the large amounts of data involved, it could be added to the network protocols rather than built externally.

illuzen commented Dec 7, 2018

@hadees,

"Increasing fees is just going to encourage more costly mistakes so you are just shifting around the security problem."

There is a tradeoff here, but you are asserting the exchange rate between fees and money-lost-via-mistakes is constant, AKA a law of conservation, which is an assertion that is not justified here. Could be true, and it would be fascinating if it were.

illuzen commented Dec 7, 2018

Why don't we create a simple system by which nodes can specify their resource allocations and ethereum makes the best use of them, moving the event horizon for full blocks to the right spot, but keeping block headers all the way back to the beginning?

Recmo commented Dec 10, 2018

"I'd suggest "event horizon", or if you don't like the use of "event" given that has a specific meaning in Ethereum just "horizon"."

Actually, the Ethereum specific meaning of 'event' is quite appropriate in the phrase "event horizon". It is the point after which log events can no longer be observed. (Nitpick: the physical meaning has more to do with causality than observability, and this is not appropriate in the analogy as the forgotten events are definitely still in the causal past.)

Re: transactions receipts

If the state and block body is available, wouldn't a node be able to recompute the transaction receipt on demand? Obviously this would not be as fast, and may require different DOS mitigation for public nodes, but it would satisfy the security requirements.
