DISCLAIMER: All ideas and numbers in this document are preliminary! The goal is to have discussion starters, not to present polished final ideas! The document itself is a rough brain-dump too!
Ethereum 1.0 has a storage scaling issue, specifically, Ethereum's current incarnation has unbounded disk growth. Yes, the rate of growth itself is implicitly limited by the block gas limit, but there is no limit on the total amount of data accumulated over time. This is a problem, because Ethereum 1.0 is not sustainable long term.
Historically the solution was to kick the can:
- Ethereum 2.0 will introduce sharding, solving the data scaling issue.
- Moore's Law ensures that disk capacity grows faster than required storage.
There are significant flaws in both arguments:
- If Ethereum 2.0 introduces sharding, that will split the chain into N shards, reducing the storage requirement to 1/Nth of the current amount.
- The current long term storage requirement is infinite. You can split that into arbitrarily many static pieces, the end result will still be infinite, but with a slower growth rate.
- The slower growth rate is questionable, as any capacity increase is readily gobbled up by suboptimal applications. Realistically, splitting the chain into N pieces would only result in each sub-chain growing at the same rate as current mainnet.
- If Moore's Law holds for storage, you will always be able to buy a system powerful enough to hold your (shard) chain.
- Anyone joining the network not only needs to store the chain, but also needs to obtain it first, which at its most naive form means downloading raw data. There's latency and bandwidth limitations as well as associated costs, which outweigh the cost of raw storage. E.g. A 1TB HDD costs 47 USD, syncing 1TB on a perfectly saturated 0 latency 100mbit line would take ~23 hours and cost 85 USD at AWS pricing. These costs hit both the joiner and the network too!
- A significant part of the total storage size on Ethereum is active data. That means it's not some historical blob you can stash away and not care about any more, rather may be accessed by arbitrary transactions. This means you need sub millisecond access times (i.e. memory caches and SSD indexes). Higher storage consumption means proportionally higher memory requirements and disk IO, which means additional hardware costs + SSD amortization (Ethereum will kill your SSD).
The above points are meant to highlight that Ethereum 1.0 has problems that Ethereum 2.0 doesn't aim to solve (yet), so it will be just as vulnerable to them as we are now. The argument that Ethereum 2.0's sharding + Moore's Law will net us enough time to fix these is based on the assumption that a production ready Ethereum 2.0 arrives before Ethereum 1.0 grinds to a halt, which seems a reckless position to take.
Before diving into possible solutions, let's try to put some numbers on the problem.
A full Ethereum node currently stores various types of information (we don't go into archive nodes, that's a separate issue). Whilst the exact list depends on client implementations, most will probably feature at least the following:
- Chain of headers that cryptographically defines a blockchain's content.
- Chain of block bodies that store past uncles and past transactions.
- Chain of receipts that store past transaction results and contract logs.
- Index of transaction-hash to block mappings and a few others.
- Account and storage Merkle-Patricia tries.
Let's see how these numbers play out on mainnet from genesis till block #6775081 (please ignore the X axis, I just used the block number as a timestamp to make it simple for Grafana). The numbers shown are Geth's current storage usage, they may be different on other clients. We're mostly interested in magnitudes here, not exact numbers.
The headers are mostly boring, each hovering around the 530 byte mark (minimally compressible since transaction traffic picked up), independent of block gas limits.
The block bodies (i.e. uncles + transactions) fluctuate a bit, but we can put 16KB compressed as a safe number for each recent block. Similarly, receipts fluctuate around 14KB compressed per block. Bodies and receipts, however, are dependent on both the block gas limit and fork rules: naively guessing, increasing the gas limit will increase body/receipt sizes proportionally; similarly, forks might bump these numbers up (e.g. Constantinople charges less for SSTORE, supporting more transactions per block, and thus larger storage is anticipated).
Putting these numbers into perspective, currently the Ethereum mainnet produces about 6100 blocks per day, meaning the current growth can be very (!!) roughly interpolated to:
| Current | +1 Day | +1 Month | +1 Year | +5 Years |
|---------|--------|----------|---------|----------|
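The interpolation can be reproduced from the per-block figures quoted above (530 byte headers, ~16KB bodies, ~14KB receipts, ~6100 blocks/day). This is strictly back-of-the-envelope arithmetic, not a measurement:

```python
# Back-of-the-envelope raw chain growth, using the rough per-block
# figures from the text: 530B header + ~16KB body + ~14KB receipts.
HEADER, BODY, RECEIPTS = 530, 16 * 1024, 14 * 1024  # bytes, compressed
BLOCKS_PER_DAY = 6100  # rough current mainnet rate

def chain_growth_gb(days):
    """Estimated raw chain growth over `days` days, in decimal GB."""
    return (HEADER + BODY + RECEIPTS) * BLOCKS_PER_DAY * days / 1e9

for label, days in [("+1 day", 1), ("+1 month", 30),
                    ("+1 year", 365), ("+5 years", 5 * 365)]:
    print(f"{label}: ~{chain_growth_gb(days):.1f} GB")
```

At these rates the raw chain alone grows by roughly 70GB per year, which is the magnitude that matters for the rest of the argument.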
These numbers, however, only represent the raw chain itself. Besides the account/storage trie (which this document will not go into), full nodes also need to maintain a number of fast-access indexes to allow filtering for events and to allow looking up blocks, transactions and receipts via a hash.
With Geth's current storage data model, these indices tally up to 28.6GB (sadly I don't have historical charts for these). The bulk of this data is the transaction lookups (on mainnet there are currently 288,731,936 transactions), as such, we can conclude that this number will grow proportionally with the transactions (i.e. with bodies and receipts).
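As a sanity check on the "grows proportionally with transactions" claim, dividing the quoted index size by the quoted transaction count gives a rough per-transaction index cost (illustrative arithmetic only):

```python
# Rough per-transaction cost of the lookup indexes, from the figures
# quoted above: 28.6GB of indexes across ~288.7M mainnet transactions.
INDEX_BYTES = 28.6e9
TX_COUNT = 288_731_936

bytes_per_tx = INDEX_BYTES / TX_COUNT
print(f"~{bytes_per_tx:.0f} bytes of index data per transaction")
```

A fixed per-transaction cost of roughly 100 bytes is exactly why the indexes track body/receipt growth linearly.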
Adding this best guess to the previous table, we get to the database growth caused solely by the raw chain, not accounting for smart contract state itself.
| Current | +1 Day | +1 Month | +1 Year | +5 Years |
|---------|--------|----------|---------|----------|
If we freeze the capacity of Ethereum 1.0 in place until Ethereum 2.0 is finalized, we're in the 0.6TB ballpark figure for storing the chain and its indexes, not accounting for the account and contract tries. Quite a lot, but manageable. If, however, we want to do a 2x, 5x or ideally 10x capacity bump, that will result in a 1.06TB, 2.48TB or 4.85TB storage load respectively. Realistically, even the 2x is an unreasonable expectation.
If we want decentralization, we need to get these numbers down by at least one order of magnitude, preferably two.
In the blockchain world, the philosophical debate of archive nodes vs. full nodes pops its head up every now and again. The mantra usually goes that only nodes that store every past state permutation should be considered a full node. In Ethereum we took the practical approach of pruning away past historical state, since there aren't really meaningful use cases for average people to care about past balances and contract states.
This however raises the question: if average users (who we'd like to run full nodes) don't care about what their balance was 4 years ago, will they care about the list of transactions executed 4 years ago? Will they care about an event that a smart contract raised 4 years ago? Not really. But if not, why burden everyone in the network with data nobody cares about? [There is one meaningful answer, security, but we'll get back to that a bit later].
Let's delete historical blocks, historical logs and historical indexes!
Perhaps it's not immediately apparent why this is such a powerful suggestion: it puts a hard, guaranteed cap on the amount of disk space that Ethereum 1.0 would consume. The definition of "historical" is not relevant. Full nodes could host 1 month of recent data (on the insane end of the spectrum) or 3 years of recent data (on the other insane end of the spectrum). No matter the choice, we can calculate a final, hard number of disk requirements based on it.
If we take our extremely rough storage growth guesses from above, and calculate how much data we'd need at different gas limits and retention intervals, we'll get some interesting insights:
If we'd say that we are currently comfortable with Ethereum's 113.15GB storage requirements for the raw chain + indexes, the same number could cater for:
- 14.9x transaction throughput with 1 month data retention
- 4.96x transaction throughput with 3 months data retention
- 2.48x transaction throughput with 6 months data retention
- 1.24x transaction throughput with 1 year data retention
Which one is the best choice? That's not up to me or this document to make really, but it's nice to know that a 1 month data retention could net us a 10x throughput increase while reducing storage at the same time! [Note again, we're not addressing account/storage trie growth in this document].
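The retention figures above follow a simple inverse relation: with a fixed storage budget, throughput multiplier times retention window is constant. A quick sketch, seeding the relation with the 14.9x one-month figure from the list above:

```python
# With a fixed storage budget, doubling the retention window halves the
# affordable throughput: multiplier(retention) ~ k / retention_months.
ONE_MONTH_MULTIPLIER = 14.9  # from the retention list above

def throughput_multiplier(retention_months):
    """Affordable throughput multiplier for a given retention window."""
    return ONE_MONTH_MULTIPLIER / retention_months

for months in (1, 3, 6, 12):
    print(f"{months:2d} months retention -> "
          f"{throughput_multiplier(months):.2f}x throughput")
```

This reproduces the 4.96x / 2.48x / 1.24x figures (up to rounding), which is the whole tradeoff in one line: retention window and throughput trade off linearly against each other.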
Before diving into proposals on how we might achieve our goal of pruning historical chain segments, it's important to highlight the invariants that we want to retain or guarantee:
- Data retention policies must be agreed upon across all clients.
- Theoretically - even now - every client is free to keep or discard any data. Practically, if majority clients discard data that minority clients need, the minority clients will be barred from joining the network. More generally, if there is data asymmetry among the clients, network health will suffer.
- Historical data must not ever be totally forgotten by the network.
- If a node wants to do a full sync (i.e. reprocessing every block from genesis), there must be public archives containing the original blocks. Retrieval latency and bandwidth don't need to be optimized for (the bottleneck is local processing capacity, not the network). We don't even necessarily need to incentivize these archives, since they should not be needed for a fast/warp/leaf/light sync.
- Cryptographic proofs of the historical blocks (i.e. headers) must remain in the network.
- If the network prunes away all traces of the chain history, reconstructing a full sync becomes problematic as syncing nodes would need to reprocess millions of blocks at face value before being able to prove they are from the correct chain. By retaining proofs of block ancestry, historical chain segments can be retrieved from arbitrary untrusted sources.
- Historical archives must be accessible in a generic, decentralized way.
- There are easy ways to make data available for some specific network (e.g. mainnet), but if we want to retain Ethereum's usability as a generic platform, we need to keep private networks as first class citizens and come up with solutions that don't require central/coordinated management efforts.
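The header-retention invariant is what makes untrusted archives workable: as long as headers are kept, a historical block body fetched from any archive can be verified locally. A minimal sketch, using SHA-256 over the raw body as a stand-in for Ethereum's actual header commitments (which are Keccak-256 roots over transaction/uncle tries):

```python
import hashlib

def commitment(body: bytes) -> str:
    # Stand-in for the real header commitments (tx root, uncle hash);
    # SHA-256 over the raw body is used purely for illustration.
    return hashlib.sha256(body).hexdigest()

# Retained header chain: block number -> body commitment.
headers = {1_000_000: commitment(b"block 1000000 body")}

def verify_archived_body(number: int, body: bytes) -> bool:
    """Check a body fetched from an untrusted archive against our headers."""
    return headers.get(number) == commitment(body)

print(verify_archived_body(1_000_000, b"block 1000000 body"))  # genuine
print(verify_archived_body(1_000_000, b"tampered body"))       # forged
```

Because verification is local, the archive source doesn't need to be trusted at all, only to be available.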
The goal of the practical solution is to be practical! As dumb as this sounds, it means that certain suboptimality is acceptable if it keeps things simpler, especially in the context of multiple client implementations.
Historical chain proofs
The first challenge to solve with regard to pruning historical chain segments is to ensure that we can prove the past even though we've deleted the past. There are two possible approaches I see here:
- Maintain a Merkle (or other cryptographic) proof of deleted chain segments
- Maintain the header chain indefinitely
Maintaining a Merkle proof of deleted chain segments is exactly how light clients work currently, and how they're able to sync in a couple of minutes. Instead of having to go through all the headers from genesis, clients ship with a hard-coded trusted checkpoint (or read one from a config file), from which they start syncing. This mechanism has two issues however:
- To keep sync fast, we constantly have to update (release a new client or a new config file) the hard coded snapshots. This works for mainnet with an active maintenance schedule, but does not scale for private Ethereum networks.
- If no release is made, sync currently takes longer. If however the full nodes would start deleting old headers themselves, old checkpoints would become useless, forcing devs to constantly issue new releases and users to constantly pull new releases. It just doesn't scale.
Maintaining the header chain indefinitely would solve all of the issues that the Merkle proof mechanism has: you can always fast sync based on the header chain with only the genesis (or light sync with arbitrary old snapshots). The downside is that, as opposed to the Merkle proof, which is 32 bytes for arbitrary history, keeping the headers available indefinitely means indefinite chain growth.
That said, the size of a header is independent of the transactions included in the block (530 bytes), so it doesn't matter how much we scale Ethereum, the growth rate is constant. Using our rough calculations from the previous sections, keeping the headers indefinitely would entail a storage growth of 1.164GB per year. That is imho an acceptable tradeoff for keeping the protocol and client implementations simple.
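The constant-rate claim is easy to check: header growth is just header size times block rate, independent of gas limits (rough arithmetic in decimal GB; the exact figure shifts slightly with the assumed block rate):

```python
# Header chain growth is throughput-independent: a fixed ~530 bytes
# per block, regardless of how full the blocks are.
HEADER_BYTES = 530
BLOCKS_PER_DAY = 6100  # rough current mainnet rate

yearly_gb = HEADER_BYTES * BLOCKS_PER_DAY * 365 / 1e9
print(f"~{yearly_gb:.2f} GB of headers per year")
```

Even a 10x throughput bump leaves this number unchanged, which is what makes indefinite header retention cheap.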
If we assume that full nodes only retain the header chain and the past N months of blocks/receipts from now on, the next obvious question is how a new node can join the network. This depends on the desired mode of synchronization.
- If the new node is a light client, the existing snapshot + header sync algorithm will remain completely compatible with the pruned chain.
- If the new node is a full node doing fast sync, some changes are needed. Currently fast sync streams the headers from the network, forming a skeleton for the chain. While the headers are progressing (throttled if they advance too much), older headers are filled with the associated block bodies and receipts. This will have a minor breakage, since bodies/receipts will become unavailable at chain genesis.
- The solution would be to download the entire header chain first and when the head is reached, backtrack the N month worth of blocks which are still available in the network and fast sync from that artificial "genesis". All nodes in the network need to agree on the same retention policy to allow proper syncing!
- If the new node is a full node doing warp sync, only minimal changes would be needed. The node would download the same snapshot as currently from the network, but when back-filling, it would only download bodies/receipts up to N months, after which only headers would be back-filled.
- Note, I'm not familiar with the warp sync algo beyond the concepts. Feel free to challenge me on this or request further ideas.
- If the new node is a full node wanting to do a full or archive sync, things get a bit more involved. The headers would still be available from the network, but the bodies need to be pulled from an alternative data source.
- This depends on later decisions, so I'll postpone describing it here.
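The fast sync variant above can be sketched as: download all headers first, then back-fill bodies/receipts only from an "artificial genesis" derived from the agreed retention window. Names and the months-to-blocks conversion below are illustrative, not a spec:

```python
BLOCKS_PER_DAY = 6100  # rough mainnet rate from earlier

def artificial_genesis(head: int, retention_months: int) -> int:
    """First block whose body/receipts the network still retains."""
    retention_blocks = retention_months * 30 * BLOCKS_PER_DAY
    return max(0, head - retention_blocks)

def fast_sync(head: int, retention_months: int):
    # 1. Header sync covers genesis..head (headers are never pruned).
    headers = range(0, head + 1)
    # 2. Body/receipt back-fill only starts at the artificial genesis.
    pivot = artificial_genesis(head, retention_months)
    bodies = range(pivot, head + 1)
    return headers, bodies

headers, bodies = fast_sync(head=6_775_081, retention_months=3)
print(f"headers from {headers.start}, bodies from {bodies.start}")
```

This is also why the retention policy must be network-wide: a syncing node computes the pivot locally and simply assumes its peers can serve everything above it.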
If we agree on an N month/block retention policy, whenever the chain progresses, each client would delete bodies and receipts older than HEAD-N. Furthermore each client would also need to delete any acceleration indices maintained for the old blocks (transaction lookups, bloom filters, etc).
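The pruning step itself can be sketched as a key-value sweep on each new head, dropping bodies, receipts and per-block indexes below the cutoff while leaving headers untouched. The flat store layout here is hypothetical, purely for illustration:

```python
# Hypothetical flat store: per-kind mapping of block number -> blob.
store = {
    "headers":  {n: f"header-{n}" for n in range(10)},
    "bodies":   {n: f"body-{n}" for n in range(10)},
    "receipts": {n: f"receipts-{n}" for n in range(10)},
    "txlookup": {n: f"index-{n}" for n in range(10)},
}

def prune(store, head, retention_blocks):
    """Drop bodies/receipts/indexes older than HEAD-N; keep all headers."""
    cutoff = head - retention_blocks
    for kind in ("bodies", "receipts", "txlookup"):
        for number in [n for n in store[kind] if n < cutoff]:
            del store[kind][number]

prune(store, head=9, retention_blocks=4)
print(sorted(store["bodies"]))  # only blocks within the retention window
print(len(store["headers"]))    # header chain is never pruned
```

A real implementation would batch the deletions and run them incrementally, but the invariant is the same: everything below HEAD-N except headers disappears.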
This has an implication on the RPC APIs too however. We need to introduce the concept of a "virtual genesis block" (open for better names) which defines the point of history before which the APIs cannot return data (or return that they don't maintain it any more).
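On the RPC side, the virtual genesis check could look like the sketch below: requests below the cutoff get an explicit "pruned" answer rather than a silent miss. The error shape, names and cutoff value are all made up for illustration:

```python
VIRTUAL_GENESIS = 6_226_081  # illustrative: first block still retained

def get_transaction_receipt(number: int, receipts: dict):
    """RPC-style lookup that distinguishes 'pruned' from 'unknown'."""
    if number < VIRTUAL_GENESIS:
        return {"error": "historical data pruned (before virtual genesis)"}
    return receipts.get(number, {"error": "unknown block"})

recent = {6_500_000: {"status": 1}}
print(get_transaction_receipt(6_000_000, recent))  # pruned history
print(get_transaction_receipt(6_500_000, recent))  # retained receipt
```

Distinguishing "pruned" from "never existed" matters for DApps, since the two cases call for very different fallback behavior.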
Block / receipt archives
One of the hard parts of this proposal is archiving historical chain segments so they remain available for later reconstruction if need be. The ray of hope here is that both the chain of bodies as well as the chain of receipts are just an immutable list of binary blobs, which makes them perfect for long term dumb archiving.
The first choice we need to make is whether to have these archives stored/accessible from within the Ethereum peer-to-peer protocol (whatever extension we add on to support it) or only from the outside? To give a few examples:
- Extra-protocol storage means hosting the data files on classical external servers, mirrored and replicated according to our security needs: FTP, S3, CDNs, etc. These could be archived by major players (Ethereum Foundation, Consensys, Parity Technologies, Internet Archive, etc). Access to these could boil down to dumb web requests.
- Intra-protocol storage means hosting the data files within some of the nodes in the Ethereum network itself: Swarm/devp2p, IPFS/libp2p, BitTorrent, etc. The archives would still be run by the same major players, but running an archive would be approachable to anyone, thus closer to the ethos of decentralization.
Extra-protocol is simple but enterprisey, intra-protocol is flexible but needs work. All in all, the extra-protocol storage approach doesn't scale for private networks, test networks, etc. If we want Ethereum to be useful as a technology, we need to retain its decentralized nature. As such, I'd argue that intra-protocol is the only way.
We have a lot of tools already in our toolkit for distributing files in the internet, there are however a lot of gotchas:
- Swarm: As Ethereum developers, we could say that Swarm (our very own data distribution network) should be the choice of archiving and making history available for ourselves.
- Problem is, Swarm is not production ready and we don't know when it will be.
- The second problem is that Swarm is only implemented for go-ethereum, so although any client could run it as an external process, nobody can include it in their client binaries, making it a significant barrier to entry.
- Lastly, arguing that client developers should just implement Swarm themselves is of course misguided, since it's a huge effort that cannot be replicated into every language.
- IPFS: An alternative to the Swarm idea is to host the historical data through IPFS.
- As opposed to Swarm, IPFS is production ready.
- BitTorrent: An elegant possibility would be to piggyback the data distribution king of the last 10 years and create torrent archives out of our historical chain segments.
- It's as production ready as it gets.
- It's available from any meaningful language, embeddable into any client.
- The significant hiccup is that BitTorrent is hard coded to operate on SHA1 hashes. From a security perspective this is irrelevant as clients have the header chain to cross reference data with. From a practicality perspective this is a huge problem: with only the headers available, clients don't know what SHA1 hash they need to download to get the desired data. We could have full nodes maintain hashes of past chain segments, but they are not part of consensus, so it's always an eclipse vulnerability and griefing factor.
- LES/PIP: The light protocols are designed to retrieve data that only certain nodes have.
- Light clients are not production ready.
- Devp2p was not designed for asymmetric protocols, which is one of the reasons light clients have a hard time syncing. Light servers are hacking around the issue of serving light clients by rotating them, but it's a weird client-server architecture on top of a p2p network.
- Discovery does not support finding the required nodes. Geth has been working on ENR to fix this issue, which hopefully will open up a world of possibilities, but it's one more barrier to entry.
All in all, I can't say which solution is the best. I myself am leaning towards IPFS or BitTorrent, because they put less strain on the Ethereum ecosystem; and I think that retrieving this data in a peer-to-peer fashion, but off of the Ethereum network, will help Ethereum scale better, as it keeps our network speedy and clean of archive traffic.
If we can solve the hash discoverability issue, BitTorrent seems the best approach. If we cannot, IPFS might be the second best. Looking for input on these. My main design goal is to support it for arbitrary networks, not just for mainnet.
Of course, every optimization has its downsides too. Pruning historical chain segments breaks a few important invariants within the Ethereum ecosystem:
- DApps expect that nodes can filter for contract events arbitrarily long in the past. Certain DApps (e.g. Akasha) also use logs as cheap storage, requiring users to constantly filter the entire chain for their data. This proposal breaks this invariant: DApps will no longer be able to access events past the retention policy.
- The goal of contract logs in Ethereum is to allow external processes to watch for events happening on the chain. Their goal was never to be a data storage mechanism, and their retention is not specified in the Yellow paper / Ethereum consensus protocol.
- Any Ethereum node can currently return all the information about a past transaction, both the input as well as the result. Pruning historical chain segments and indexes would break this invariant: nodes will have no way of knowing if a transaction was already deleted, or never existed in the first place.
- Realistically speaking, is there a good reason why every node in the network would want to be able to look up arbitrary transactions that happened arbitrarily long in the past? Yes, it's a cute powerful feature, but is it genuinely needed?
- The Ethereum peer-to-peer network is currently fully self contained. Any node that speaks the eth protocol can choose its own preferred way to sync, and all required data is readily available from all peers. This invariant is broken as nodes doing a full sync will need a second data source to fetch the historical blocks from.
- This is possibly the most painful part of this proposal, making the life of nodes wanting to do a full sync harder. That said, a full sync on Ethereum mainnet with current Geth takes about 5 days, 4 days out of which are spent on the last 2.7M blocks. If we bump the transaction throughput to 10x, apart from very special users, nobody will be able to do a full sync, nor will really want to.
This document described a way to put a hard cap on the storage growth of the Ethereum network (apart from the state trie), and demonstrated a possible solution to its long term viability both from a decentralization perspective (manageable full nodes, manageable sync times) and from a scalability perspective (10x transaction throughput).
I also acknowledge that in the process of doing these improvements, certain invariants of the current network would break, nuking some DApps along the way. Some of these breakages would also draw the spotlight towards philosophical debates around immutability.
All in all, we've got a decision to make. Do we want Ethereum 1.0 to be here in 10 years (independent of the arrival of Ethereum 2.0) and make it a robust system as is, or do we go down the planned obsolescence path and hope for the best.
My personal choice would be to make Ethereum 1.0 the best we can and see what the future brings when it arrives. If there is a pragmatic way to make Ethereum 1.0 much more than it is today, it would seem (to me) irresponsible not to take the path.
This proposal requires cross client coordination. It does not however require a hard fork!