O(1) Block Propagation
Bitcoin miners want their newly-found blocks to propagate across the network as quickly as possible, because every millisecond of delay increases the chances that another block, found at about the same time, wins the "block race."
With today's p2p protocol, this gives miners an incentive to limit the number of transactions included in their blocks. A transaction must pay more fees to the miner than they are statistically likely to lose due to the increased chance of losing a block race, since new block announcements include all of the data for all of the transactions in the block. This is inefficient (transaction data is transmitted across the network twice, using twice as much bandwidth) and artificially increases transaction fees to be much higher than they need to be.
Each fully-validating node on the Bitcoin network maintains a transaction memory pool, which is a list of valid but unconfirmed transactions. Transactions are added to nodes' memory pools as they are relayed across the network, and are removed from memory pools when new blocks are found. Mining nodes select a subset of memory pool transactions to include in their new blocks.
If memory pools were perfectly synchronized and all miners had exactly the same policy for choosing which transactions to include in their blocks and in what order to include them, then miners could just announce new blocks as the fixed 80-byte block header and the coinbase transaction; there would be no need to include any transaction data at all, it could be reconstructed from their peers' memory pools.
In the real world, memory pools are not perfectly synchronized. However, they tend to be very similar, and we can take advantage of that similarity to optimize the amount of information that needs to be transmitted when a new block is found.
Set reconciliation and Invertible Bloom Lookup Tables
Set reconciliation means finding the difference between two sets of data. To optimize new block broadcasts, the sets we care about are the set of transactions transactions in a new block (call that set "B") and the set of transactions that a node already knows about (call that set "P").
Ideally we want to transmit just the 80-byte block header and transaction data for B-P. Both B and P will grow in size as transaction volume and the size of blocks grows, but if transaction propagation is mostly reliable B-P will be a constant size (see the section below on incentives for more discussion).
Invertible Bloom Lookup Tables (IBLTs) are a new (described in 2011) data structure that can be used to solve the set reconciliation problem.
The idea of an IBLT is to squash together (using hashing and XOR) all of the transaction data into a fixed-size data structure. Then transmit that squashed data to a peer, who mostly knows what data you are sending them, but might be missing a few transactions or might be expecting that you put a few transaction in your block that you actually didn't.
The size of the IBLT needed for all of that to work is proportional to the number of differences between what you send and what your peer expects.
There are three key ideas to make this work:
- canonical ordering of transactions in blocks
- similar (but not identical!) policy for selecting which mempool transactions go into blocks
- Peer sends us an IBLT(newblock) large enough that we are very likely to be able to recover transaction data that is not yet in our mempool.
Canonical ordering of transactions
I propose the following algorithm as the canonical ordering for transactions in a block. Canonical ordering will not be a consensus rule; it is just used to eliminate the need to communicate the order of transactions when IBLTs are used.
- Start with the set of transactions in the block (including the coinbase transaction)
- Sort them in ascending order, using the smallest (previous-transaction-hash, index) of each transaction's inputs as the sort key. Note that the coinbase transaction will be first, since its single input's prevout/index is all zero.
- Add the first transaction on the sorted list that does not depend on a transaction later in the list to the block (and remove it from the sorted list).
- Continue adding transactions until the sorted list is empty.
Communicating transaction selection policy
The reference implementation allows miners to specify: Maximum size (in bytes) of their blocks Maximum size set aside for "high priority" transactions, included regardless of fee.
Transactions are selected based on priority and then fee-paid-per-kilobyte.
Communicating the 80-byte block header, the size (in bytes) of the transaction data, and the size (in bytes) of the high-priority transactions is sufficient for the IBLT set reconciliation algorithm to work.
Encoding transaction data in the IBLT
Invertible Bloom Lookup Tables store key-value pairs. So given a set of transactions in a new block, what is the best way to encode them in an IBLT? What should be used as the key, and what the value?
We want the values in the IBLT to be a fixed, not-too-long length. Transmitting an IBLT with values that can hold the maximum possible transaction size (100K bytes) would not save any bandwidth. So I propose that the transaction data be serialized and split into 8-byte (64-bit) chunks. Two bytes of the key can then be used as a sequence number to reassemble the transaction.
We want keys to be unique, but they only have to be unique for the transactions in this block, they don't have to be globally unique. Using 48 bits of the transaction data's hash gives about a 0.1% chance of a collision even if we scale up to blocks with a million transactions in them, and combined with a 16-bit sequence number that give 64-bit keys. The transaction selection code can simply refuse to select transactions whose hashes collide with a transaction already in the block (they will just have to wait and get confirmed in the next block; a 0.1% chance of collision in million-transaction blocks means one transaction in a billion gets delayed).
Using the transaction hash directly opens up a possible denial-of-service attack; an attacker might try to brute-force the first 48 bits of the transaction id to keep somebody else's transaction from getting included. Or they might try to create transaction ids that collide in the hash buckets used internally by the IBLT. Those attacks can be prevented by combining the transaction ID with 48 bits of random "salt" (which an attacker cannot know) before using it as a key in the IBLT.
Pseudo-code algorithm for encoding a new block in an IBLT:
Pick transactions from the memory pool to be in the new block, and put them in the block in canonical order. Create a new empty IBLT that is large enough to encode expected differences between the block we created and the blocks our peers are creating. Select a random 48-bit salt. Foreach transaction, including the coinbase: Calculate the 48-bit txid+salt Serialize the transaction and split it into 64-bit chunks Foreach chunk i: insert key=txid48+i, value=chunk Search for a solution to the new block. When a block solution is found: Foreach peer running IBLT-capable version of the protocol: Transmit block header, sizeof(txdata), sizeof(high_priority_tx), salt, IBLT Foreach peer running an older version of the protocol: Send the whole block in a 'block' message
Securely calculating txid48
To prevent brute-force attacks, we want at least 128 bits of the transaction ID to contribute to the IBLT key. I propose hashing the transaction ID with a 256-bit salt chosen by the miner, and taking 48 bits of the result.
Decoding an IBLT new block message
Peers receiving the block header and IBLT could immediately relay it to their peers after checking the proof-of-work in the block header-- although it is debatable as to whether or not they should before decoding the IBLT and validating the transactions.
Pseudo-code for decoding the data in an IBLT:
When an IBLT for a new block ("IBLT_new") is received from a peer: Create a new, empty IBLT ("IBLT_us") and fill it with transactions from the memory pool (based on byte-size hints received from peer). Calculate IBLT_diff = IBLT_new - IBLT_us (this takes constant time) Decode IBLT_diff; if decoding succeeds, the result is two lists of (txid48+sequence, chunk) : transactions that the peer included that need to be added to the block we created from our memory pool, and transactions that the peer did not include that we need to remove. Concatenate the chunks and deserialize the transactions, then insert them into the new block in canonical order. Then proceed with all the normal block and transaction validation. If decoding fails (memory pools or mining policies differ too much, or a bug in the peer's IBLT creation code leaves out a chunk of transaction data or something), fall back to getdata/block messages or maybe ask for an extension of the IBLT that will successfully decode.
What about blockchain re-orgs?
All of the above assumes we are receiving a new block building on our current idea of the best block chain.
If our current best chain is A->B, and we receive IBLTs for an alternative longer chain with block B' and C', how do we reconstruct B' ?
The simplest answer is to just request the full block data for B' -- re-orgs are rare.
But the set of transactions in B and B' should be very similar, so it might be worthwhile to reconstruct B' from it's IBLT and an IBLT created from block B.
Alternative or complementary approaches
"I know that you know..."
Matt Corallo has implemented a fast block relayer tool. It keeps track of which transactions have been sent to each peer, and then when a new block is found it replaces transaction data that it knows a peer has with 10 bytes of the transaction ID.
Transmitting just 10 bytes of the transaction ID instead of the full average-250-byte transaction gives a 25-times bandwidth savings.
TODO: measure Matt's approach against an IBLT approach. My guess is at small block sizes the simpler approach works better, but at some point IBLTs win.
THOUGHT EXPERIMENT: IBLT-ifiy the data Matt's fast relayer sends? Should theoretically get the same constant factor of 25 improvement but be O(1) instead of O(n)....
Relay just headers
Several people have proposed relaying just the 80-byte block header, with miners checking the proof-of-work and then beginning to mine an empty block on top of whichever header has the most proof-of-work (assuming that nobody waste time creating a valid-proof-of-work block that contained invalid transactions). Once the full transaction data was transmitted miners would then switch to mining a block full of transactions.
See bitcoin/bitcoin#3195 for discussion on why this is not a good idea.
Miners could broadcast the blocks they are working on to the network before they have solved any blocks. If they then do find a block, they could then just broadcast the winning nonce and coinbase transaction (and some identifier for "that unsolved block I sent you a while ago").
I think broadcasting "working blocks" is a great idea, for a couple of reasons.
First, it gives merchants an idea of what percentage of hashing power is working on including their transction in the blockchain, helping make 0-confirmation transactions more secure (if 90% of active miners are working to include your transaction after a few seconds, and none are working on including a double-spend, then it is reasonably likely it will be confirmed in the next block or two).
Second, it gives other miners a view into what the majority of hashing power is working on, and should allow miners to better monitor the health of their network connections.
A proposal for pre-broadcasting unsolved blocks would have to address bandwidth usage and denial-of-service attacks; perhaps such a proposal could use IBLTs to minimize the amount of memory and bandwidth used by the nodes on the network.
Road map / TODO
Instrument some bitcoind's running on the production network to find out how much memory pools and transaction selection policies differ today.
Calculate the IBLT size needed for that set size difference to achieve 1% decoding failure.
Create a test bench environment with real-world transactions and blocks, so the IBLT approach can be compared against other approaches or optimizations to the IBLT code can be measured to make sure they actually improve latency.
Write code, work out all the details, benchmark/test along the way.
Once stable and starting to be deployed: write up BIPs describing new p2p protocol changes.
Will this skew incentives?
Making propagation of new block announcements O(1) instead of O(n_transactions_in_block) means miners should include more fee-paying transactions in their blocks.
However, propagation is only O(1) for blocks containing transactions already seen and accepted by a majority of the network. There is an incentive for all miners to include roughly the same set of transactions in their blocks; no miner will want to create a block containing mostly transactions that some of the network refuses to relay or mine, because they will have to include that transaction data in their new block announcement, which slows down block propagation.
The converse is also true: miners that refuse to include transactions that the rest of the network would accept will have to transmit bigger IBLTs.
A miner trying to optimize the propagation of their blocks would only include transactions that are in all of their peers' memory pools; note that this gives power to "relay nodes" that are not mining. If a significant number of relay nodes refuse to remember very-small-fee transactions in their memory pools (perhaps they are dropping them because they run into CPU or bandwidth limits) then miners have an incentive not to include them in their blocks. This will put a floor on how low transaction fees will go.
But I heard that Bitcoin Doesn't Scale...
People claiming that "Bitcoin Doesn't Scale" are theoretically correct: you still need O(n) bandwidth and CPU to fully validate n transactions-per-second.
Someday, when Bitcoin is the number 2 payment network in the world, we might have to start worrying about that. Here are a couple of back-of-the-envelope calculations that show that we should be able to scale up to n=15,000 transactions per second before running into that O(n) bandwidth limit.
For perspective, the number 1 payment network in the world today (Visa) handles about 212 million transactions per day; 2,500 transactions per second on average. Their peak processing capacity, needed on the busiest shopping days, is reported to be 40,000 tps.
My home Internet connection is getting about 90 megabits download bandwidth per second right now. An average Bitcoin transaction is about 2,000 bits, so my current consumer-level Internet connection could download 45,000 transactions per second, over ten times average Visa transaction volume.
While it is nice to know that I could run a full node handling more-than-Visa-scale transaction volume from my house, running a dedicated machine in a data center somewhere makes more sense. 15,000 250-byte transactions per second works out to about 7 terabytes of bandwidth per month. One of my hosting providers charges $20 per month for a virtual private server with 8 TB of bandwidth per month-- or $240 per year to handle MasterCard-level transaction volume today (August 2014).
Goodrich, Michael T., and Michael Mitzenmacher. "Invertible bloom lookup tables." Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on. IEEE, 2011. http://arxiv.org/pdf/1101.2245
Eppstein, David, et al. "What's the difference?: efficient set reconciliation without prior context." ACM SIGCOMM Computer Communication Review. Vol. 41. No. 4. ACM, 2011. http://conferences.sigcomm.org/sigcomm/2011/papers/sigcomm/p218.pdf
Thanks to Gregory Maxwell and Emin Gün Sirer for pointing out relevant results from the information coding and set reconciliation literature.