Skip to content

Instantly share code, notes, and snippets.

@oleganza
Created April 30, 2014 12:07
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save oleganza/c8fdd3f83f6c27df2fda to your computer and use it in GitHub Desktop.
Save oleganza/c8fdd3f83f6c27df2fda to your computer and use it in GitHub Desktop.
BIP-Fractional-Full-Node

BIP Draft: Fractional Full Node

Abstract

We explore the way to make full Bitcoin nodes store only a fraction of data while being able to fully validate every incoming block and get information about any output or transaction.

Strategy

Step 1. UTXO-only app. Stores all block headers, but only a fraction of blocks to help other peers (see previous BIP Fractional Storage Identifier). Stores entire UTXO index to validate new blocks and also keeps 300-500 recent blocks to allow for swift reorganization. This will bring down the size to 400-500 Mb this year from 17+ Gb of storage used by entire blockchain. Caveats: index should be crash-proof otherwise the client would have to re-download and re-validate all blocks from scratch.

Step 1.1. Block downloading can be done in multiple threads from several peers at once to speed up the process.

Step 2. Since UTXO will only grow in the future (proportionally to number of people owning bitcoins), it also needs to be scaled. A path to a solution is to organize all outputs (spent and unspent) into patricia merkle trees and keep around roots of these trees per each block. Then, the client is free to throw away 99% of the branches and request them from its peers when needed during block validation. Received branches are verified by checking that interior hashes all hash into a trusted root branch at a given block height (that's the usual merkle tree technique). Each output will point to a block height at which it is spent using 0xffffffff value as "unspent". Note that storing the entire index of all outputs would take at least as much space as raw blockchain, but nodes can keep only a fraction of that index.

The real difficulty is to efficiently store such full-output-index trees per each block to allow new nodes to validate past blocks while having only partial knowledge of all outputs. This means that any node should be able to return output state as seen from the chain of a given height. Naïve non-compact implementation would either increase storage requirements quadratically (when having a separate tree for each block) or increase computational requirements to O(N), where N is total number of outputs (when the peer needs to compute proper merkle hashes for an interior tree). This part is being in process of being solved.

Step 3. Profit. Lighter full node will enable much more people to participate in p2p network, thus making everyone keep smaller and smaller fraction of the blockchain. Also, every smartphone or notebook can benefit from new-generation protocols built on top of blockchain like a DNS+SSL replacement (everyone will be able to verify identity of a name without trusting DNS servers or Certificate Authorities).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment