@arowser
Forked from n1bor/readme.md
Created January 2, 2018 00:01
Bitcoin Chainstate Download

Chainstate only download - Proof of Concept

Introduction

The bitcoin blockchain is currently (9th Sept 2016) 88 GB, but the chainstate is only 1.7 GB, a factor of about 50 smaller. One concern for bitcoin at the moment is a lack of full nodes. This is partly due to the length of time it takes to download and process the blockchain from scratch. In addition, a significant cost of hosting a full node is the bandwidth for serving historical blocks, and a lot of this goes to nodes that start downloading the blockchain and then abandon the sync. From my tests only 30% of syncs that start get to completion. A node can still operate as a full node with just a valid chainstate; there is little need for the full blockchain.

Overview of proposal

  1. Every 100th block, nodes take a snapshot of the LevelDB chainstate database.
  2. They then calculate a hash of hashes of that 1.7 GB database (takes about 2 minutes).
  3. Miners embed that hash in the coinbase of a block 20 blocks later. This is a consensus rule, so it MUST be correct. This embeds a hash of the chainstate in the blockchain.
  4. Nodes maintain the 3 previous LevelDB snapshots.
  5. A new node downloads all the block headers.
  6. The new node then requests a "snapshot" message that contains the hash matching the hash in the blockchain and a hash for each of X chunks, each being a hash of the hashes of the transactions and their UTXOs in that chunk. The hash of these chunk hashes MUST match the hash in the blockchain.
  7. The node requests the one block containing the coinbase with the hash of hashes and confirms it is correct.
  8. The new node then requests each of the X chunks from other nodes. Each chunk contains a subset of the transactions and their UTXOs. The node requests the chunks in order, checks the hash of each chunk against the chunk hashes received earlier and, if correct, loads the chunk into its chainstate database.
  9. Once all the chunks are loaded the new node can update its active chain and operate as a fully validating pruned node. It can even start serving chainstate to other nodes once it has passed a 100th block.
  10. If needed it could then start back-populating the blockchain, although there is little point for most users.
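
The hashing and verification in steps 2, 6 and 8 can be sketched in Python as follows. This is only an illustration of the hash-of-hashes idea, not the code in the proof of concept: the use of double SHA-256, the way entries are serialized and assigned to chunks, and all function names are assumptions made for the example.

```python
import hashlib
from typing import Dict, List


def sha256d(data: bytes) -> bytes:
    """Double SHA-256 (assumed here; the proof of concept may hash differently)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def split_into_chunks(utxos: Dict[bytes, bytes], num_chunks: int) -> List[List[bytes]]:
    """Assign each serialized (txid + outputs) entry to a chunk by sorted position.

    The real chunking rule is not specified in the text; this is hypothetical.
    """
    entries = [txid + outs for txid, outs in sorted(utxos.items())]
    chunks: List[List[bytes]] = [[] for _ in range(num_chunks)]
    for i, entry in enumerate(entries):
        chunks[i * num_chunks // len(entries)].append(entry)
    return chunks


def chunk_hash(chunk: List[bytes]) -> bytes:
    """Hash of the hashes of every entry in one chunk."""
    return sha256d(b"".join(sha256d(entry) for entry in chunk))


def snapshot_hash(chunk_hashes: List[bytes]) -> bytes:
    """Hash of all chunk hashes: the value miners would commit in the coinbase."""
    return sha256d(b"".join(chunk_hashes))


def verify_snapshot_message(chunk_hashes: List[bytes], committed_hash: bytes) -> bool:
    """Step 6: the chunk hashes must hash to the value found in the blockchain."""
    return snapshot_hash(chunk_hashes) == committed_hash


def verify_chunk(chunk: List[bytes], index: int, chunk_hashes: List[bytes]) -> bool:
    """Step 8: a downloaded chunk must match the hash announced for its index."""
    return chunk_hash(chunk) == chunk_hashes[index]


# Toy chainstate with 10 entries split into 4 chunks.
toy_utxos = {bytes([i]) * 32: b"outputs-" + bytes([i]) for i in range(10)}
toy_chunks = split_into_chunks(toy_utxos, 4)
toy_hashes = [chunk_hash(c) for c in toy_chunks]
toy_commitment = snapshot_hash(toy_hashes)
print(verify_snapshot_message(toy_hashes, toy_commitment))
print(all(verify_chunk(c, i, toy_hashes) for i, c in enumerate(toy_chunks)))
```

Because each chunk is checked against its own hash, a bad chunk from one peer can be rejected and requested again elsewhere without discarding the chunks already loaded.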

Proof of concept

A fully functioning (but totally insecure - due to lack of soft fork) proof of concept is available here: https://github.com/n1bor/bitcoin/tree/chaindownload

It includes functioning code for 1, 2, 4, 5, 6, 8 and 9 above.

Once built you can start a new node and chainstate sync by running:

./bitcoind -connect=ec2-52-25-188-92.us-west-2.compute.amazonaws.com -pruned=100000 -debug=snapshot -debug=chaindownload -debug=net

Sorry this node is now offline. But you can run one yourself using this codebase.

This will connect to a node that is also running the above code, snapshotting and serving chunks, download the chainstate and then work as normal. Alternatively you can run the code on a node with an existing blockchain and then run a second node with an empty blockchain and connect it to the first.

By tailing debug.log you will be able to see first the progress in downloading the headers, then the 8192 chunks; then it will download the last 100 or so blocks and be in sync. The second time you run it you can let it connect to any nodes.
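
Why "the last 100 or so blocks"? A rough sketch of the arithmetic, assuming snapshots sit at heights that are multiples of 100 and that a snapshot only becomes usable once the block 20 later (which commits its hash) exists. The constant and function names are hypothetical and the heights are made-up example values.

```python
from typing import Optional

SNAPSHOT_INTERVAL = 100  # a snapshot every 100th block (from the proposal)
COMMIT_DELAY = 20        # hash is committed 20 blocks later (from the proposal)


def latest_committed_snapshot(tip_height: int) -> Optional[int]:
    """Height of the newest snapshot whose commitment block is at or below the tip.

    Assumes snapshot heights are exact multiples of SNAPSHOT_INTERVAL.
    Returns None if no commitment block exists yet.
    """
    if tip_height < COMMIT_DELAY:
        return None
    return ((tip_height - COMMIT_DELAY) // SNAPSHOT_INTERVAL) * SNAPSHOT_INTERVAL


def blocks_after_snapshot(tip_height: int) -> Optional[int]:
    """Number of full blocks still to download after loading the chainstate."""
    snapshot = latest_committed_snapshot(tip_height)
    if snapshot is None:
        return None
    return tip_height - snapshot


for tip in (429119, 429120, 429175):
    print(tip, latest_committed_snapshot(tip), blocks_after_snapshot(tip))
```

Under these assumptions a new node always has between 20 and 119 blocks left to fetch and validate after the chunks are loaded.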

On a good quality home internet connection this will sync the chain and be up and running in about 30 minutes. This could be improved.

RPC additions

There are a number of additions to the RPC interface to aid with testing and understanding what is going on:

  • createsnapshot - this creates a snapshot of the chainstate at the current block. The hash will be computed in the background within the next minute.
  • updateallsnapshots - this forces any unhashed snapshots to be hashed.
  • listsnapshots - this lists all the current snapshots and the first few chunk hashes. To see all chunk hashes add the parameter all.
  • getchunk - pass the chainstate hash and the hash of one chunk and it will return the hashes of all the transactions in that chunk.
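
As a rough illustration, these calls could be issued as JSON-RPC requests like any other bitcoind RPC. The sketch below only builds the request bodies and sends nothing. The method names come from the list above; the exact parameter encoding (for example passing all as the string "all", and the argument order for getchunk) is an assumption, and the two hash values are placeholders, not real data.

```python
import json
from typing import Any, List


def rpc_request(method: str, params: List[Any], request_id: int = 1) -> str:
    """Build a JSON-RPC 1.0 request body in the shape bitcoind accepts."""
    return json.dumps(
        {"jsonrpc": "1.0", "id": request_id, "method": method, "params": params}
    )


# Placeholder values; real ones would be taken from listsnapshots output.
CHAINSTATE_HASH = "<chainstate hash>"
CHUNK_HASH = "<chunk hash>"

requests = [
    rpc_request("createsnapshot", []),
    rpc_request("updateallsnapshots", []),
    rpc_request("listsnapshots", ["all"]),  # assumed encoding of the "all" parameter
    rpc_request("getchunk", [CHAINSTATE_HASH, CHUNK_HASH]),  # assumed argument order
]
for body in requests:
    print(body)
```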

Additional work needed before release

  1. Review of hashing of UTXOs. We have the option to store the hash per tx in LevelDB. This should reduce the 2 minutes needed to hash the whole chainstate, making the solution more scalable, but would increase the disk space used. Tests are needed to find the best trade-off.
  2. Make the number of chunks (currently 8192) dynamic.
  3. Add code for getblocktemplate and other mining related code.
  4. Improvements to networking code so new nodes can "find" a node that has a snapshot.
  5. Retry and other improvements to the chunk download code.
  6. Downloading of the block containing the chainstate hash and checking it against the hash in the snapshot message, then trying a new node if it is wrong.
  7. GUI updates on progress.
  8. Softfork code.
  9. Unit tests.

Later release ideas

  1. Post chainsync block download.
  2. Making chainsync default mode.

Next Steps

  1. Get agreement that this is a "good idea" and confirmation that miners will support the soft fork needed.
  2. Assign BIP and formally document.