gavinandresen/BlockPropagation.md

## BlockPropagation.md

      
    O(1) Block Propagation

The problem

Bitcoin miners want their newly-found blocks to propagate across the network
as quickly as possible, because every millisecond of delay increases the
chances that another block, found at about the same time, wins the "block race."
With today's p2p protocol, this gives miners an incentive to limit the
number of transactions included in their blocks. To be worth including, a
transaction must pay the miner more in fees than the miner is statistically
likely to lose from the increased chance of losing a block race, because new
block announcements include all of the data for all of the transactions in the
block. This is inefficient (transaction data is transmitted across the network
twice, using twice as much bandwidth) and artificially pushes transaction fees
much higher than they need to be.

Proposed approach

Each fully-validating node on the Bitcoin network maintains a transaction
memory pool, which is a list of valid but unconfirmed transactions.
Transactions are added to nodes' memory pools as they are relayed across
the network, and are removed from memory pools when new blocks are found.
Mining nodes select a subset of memory pool transactions to include in their
new blocks.

If memory pools were perfectly synchronized and all miners had exactly the same
policy for choosing which transactions to include in their blocks and in what
order to include them, then miners could just announce new blocks as the fixed
80-byte block header and the coinbase transaction; there would be no need to
include any other transaction data at all, because it could be reconstructed
from their peers' memory pools.

In the real world, memory pools are not perfectly synchronized. However,
they tend to be very similar, and we can take advantage of that similarity to
minimize the amount of information that needs to be transmitted
when a new block is found.

Set reconciliation and Invertible Bloom Lookup Tables

Set reconciliation means finding the difference between two sets of data.
To optimize new block broadcasts, the sets we care about are the set of
transactions in a new block (call that set "B") and the set
of transactions that a node already knows about (call that set "P").
Ideally we want to transmit just the 80-byte block header and transaction
data for B-P. Both B and P will grow in size as transaction volume and the
size of blocks grow, but if transaction propagation is mostly reliable, B-P
will stay roughly constant in size (see the section below on incentives for more discussion).
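
As a toy illustration of the two set differences involved, the sketch below uses short strings in place of real transaction IDs. The variable names are made up for this example and are not part of the proposal.

```python
# Toy transaction IDs; real IDs are 32-byte hashes.
block_txs = {"tx1", "tx2", "tx3", "tx4"}     # set B: transactions in the new block
mempool_txs = {"tx2", "tx3", "tx4", "tx5"}   # set P: transactions the peer already knows

missing_from_peer = block_txs - mempool_txs  # B-P: data that must reach the peer
not_in_block = mempool_txs - block_txs       # P-B: peer expected these, miner left them out

print(sorted(missing_from_peer))
print(sorted(not_in_block))
```

Computing these differences directly requires one side to see the other's full set; the point of the IBLT described next is to recover both differences without sending either set in full.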
Invertible Bloom Lookup Tables (IBLTs) are a new (described in 2011) data
structure that can be used to solve the set reconciliation problem.
The idea of an IBLT is to squash together (using hashing and XOR) all of
the transaction data into a fixed-size data structure. You then transmit that
squashed data to a peer, who mostly knows what data you are sending them,
but might be missing a few transactions or might be expecting that you
put a few transactions in your block that you actually didn't.
The size of the IBLT needed for all of that to work is proportional
to the number of differences between what you send and what your peer
expects.
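
To make the "hashing and XOR" idea concrete, here is a minimal IBLT sketch with 64-bit integer keys and values. It is an illustration under assumptions, not the data structure this proposal would ship: the cell layout (count, key XOR, value XOR, checksum XOR), the use of SHA-256 for bucket selection, and the choice of three hash functions are all picked for readability.

```python
import hashlib

def _h(data: bytes, seed: int) -> int:
    """Seeded 64-bit hash (SHA-256 is an assumption made for this sketch)."""
    digest = hashlib.sha256(bytes([seed]) + data).digest()
    return int.from_bytes(digest[:8], "big")

class IBLT:
    def __init__(self, cells_per_hash: int, num_hashes: int = 3):
        self.m = cells_per_hash
        self.k = num_hashes
        # Each cell holds [count, xor of keys, xor of values, xor of key checksums].
        self.cells = [[0, 0, 0, 0] for _ in range(self.m * self.k)]

    def _check(self, key: int) -> int:
        return _h(key.to_bytes(8, "big"), 255)

    def _indexes(self, key: int):
        kb = key.to_bytes(8, "big")
        # One cell in each of k disjoint sub-tables, so a key never hits a cell twice.
        return [i * self.m + _h(kb, i) % self.m for i in range(self.k)]

    def _update(self, key: int, value: int, delta: int) -> None:
        for idx in self._indexes(key):
            cell = self.cells[idx]
            cell[0] += delta
            cell[1] ^= key
            cell[2] ^= value
            cell[3] ^= self._check(key)

    def insert(self, key: int, value: int) -> None:
        self._update(key, value, +1)

    def subtract(self, other: "IBLT") -> "IBLT":
        """Cell-wise difference; entries present in both tables cancel out."""
        diff = IBLT(self.m, self.k)
        for d, a, b in zip(diff.cells, self.cells, other.cells):
            d[0], d[1], d[2], d[3] = a[0] - b[0], a[1] ^ b[1], a[2] ^ b[2], a[3] ^ b[3]
        return diff

    def decode(self):
        """Peel cells holding exactly one entry. Returns (ours_only, theirs_only, ok)."""
        ours, theirs = {}, {}
        progress = True
        while progress:
            progress = False
            for cell in self.cells:
                if cell[0] in (1, -1) and cell[3] == self._check(cell[1]):
                    sign, key, value = cell[0], cell[1], cell[2]
                    (ours if sign == 1 else theirs)[key] = value
                    self._update(key, value, -sign)  # remove the entry everywhere
                    progress = True
        ok = all(cell == [0, 0, 0, 0] for cell in self.cells)
        return ours, theirs, ok

# 1,000 shared entries plus one difference on each side, in only 150 cells.
a, b = IBLT(cells_per_hash=50), IBLT(cells_per_hash=50)
for k in range(1, 1001):
    a.insert(k, k * 7)
    b.insert(k, k * 7)
a.insert(5000, 111)   # only "we" have this entry
b.insert(6000, 222)   # only the peer has this entry
ours, theirs, ok = a.subtract(b).decode()
print(ours, theirs, ok)
```

The table size here depends only on how many differences must be recovered, not on the 1,000 shared entries, which is the property the proposal relies on.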
There are three key ideas to make this work:

1. Canonical ordering of transactions in blocks.
2. Similar (but not identical!) policy for selecting which mempool
   transactions go into blocks.
3. A peer sends us an IBLT(newblock) large enough that we are very likely
   to be able to recover transaction data that is not yet in our mempool.

Canonical ordering of transactions

I propose the following algorithm as the canonical ordering for transactions
in a block. Canonical ordering will not be a consensus rule; it is just used
to eliminate the need to communicate the order of transactions when IBLTs are
used.

1. Start with the set of transactions in the block (including the coinbase transaction).
2. Sort them in ascending order, using the smallest
   (previous-transaction-hash, index) of each transaction's inputs as the sort key.
   Note that the coinbase transaction will be first, since its single input's
   previous-transaction hash is all zero.
3. Add the first transaction on the sorted list that does not depend on
   a transaction later in the list to the block (and remove it from the
   sorted list).
4. Continue adding transactions until the sorted list is empty.
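
The ordering steps above can be sketched as follows. The `Tx` type is a hypothetical stand-in for a real transaction, and this sketch reads "does not depend on a transaction later in the list" as "does not spend an output of any transaction still waiting in the list", which is an interpretation rather than something the text spells out.

```python
from typing import List, NamedTuple, Tuple

class Tx(NamedTuple):
    txid: bytes
    inputs: List[Tuple[bytes, int]]  # (previous-transaction-hash, output index)

def canonical_order(txs: List[Tx]) -> List[Tx]:
    # Step 2: sort by each transaction's smallest (prev hash, index) input.
    remaining = sorted(txs, key=lambda tx: min(tx.inputs))
    ordered = []
    # Steps 3 and 4: repeatedly take the first transaction whose parents
    # are not still waiting in the list.
    while remaining:
        pending = {tx.txid for tx in remaining}
        for pos, tx in enumerate(remaining):
            if not any(prev in pending for prev, _ in tx.inputs):
                ordered.append(remaining.pop(pos))
                break
        else:
            raise ValueError("dependency cycle among transactions")
    return ordered

coinbase = Tx(b"\xcc" * 32, [(b"\x00" * 32, 0xFFFFFFFF)])
parent = Tx(b"\xbb" * 32, [(b"\x99" * 32, 0)])
# The child sorts before its parent because of its small first input,
# so the dependency step has to move it after the parent.
child = Tx(b"\xaa" * 32, [(b"\x11" * 32, 1), (parent.txid, 0)])

order = canonical_order([child, parent, coinbase])
print([tx.txid[:1].hex() for tx in order])
```

Any two nodes that run this on the same set of transactions get the same sequence, which is why the order itself never needs to be transmitted.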

Communicating transaction selection policy

The reference implementation allows miners to specify:

- Maximum size (in bytes) of their blocks
- Maximum size set aside for "high priority" transactions, included regardless of fee.

Transactions are selected based on priority and then fee-paid-per-kilobyte.
Communicating the 80-byte block header, the size (in bytes) of the transaction data,
and the size (in bytes) of the high-priority transactions is sufficient for
the IBLT set reconciliation algorithm to work.
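
One way to read this: a peer that learns the two byte sizes can rerun a similar selection over its own memory pool and arrive at nearly the same block. The sketch below is a deliberately simplified, hypothetical selection routine (dictionaries with `txid`, `size`, `fee`, and `priority` fields are assumptions for illustration); the reference implementation's real selection logic has more rules than this.

```python
def select_transactions(mempool, max_block_bytes, max_priority_bytes):
    """Fill the high-priority area by priority, then the rest by fee per byte."""
    chosen, used = [], 0
    # txid is used as a tie-breaker so every node sorts identically.
    for tx in sorted(mempool, key=lambda t: (-t["priority"], t["txid"])):
        if used + tx["size"] <= max_priority_bytes:
            chosen.append(tx)
            used += tx["size"]
    priority_bytes = used
    chosen_ids = {t["txid"] for t in chosen}
    rest = [t for t in mempool if t["txid"] not in chosen_ids]
    for tx in sorted(rest, key=lambda t: (-t["fee"] / t["size"], t["txid"])):
        if used + tx["size"] <= max_block_bytes:
            chosen.append(tx)
            used += tx["size"]
    return chosen, used, priority_bytes

mempool = [
    {"txid": "a", "size": 300, "fee": 0, "priority": 9.0},
    {"txid": "b", "size": 250, "fee": 5000, "priority": 1.0},
    {"txid": "c", "size": 500, "fee": 2000, "priority": 2.0},
    {"txid": "d", "size": 400, "fee": 100, "priority": 0.5},
]
chosen, total_bytes, priority_bytes = select_transactions(mempool, 1100, 300)
print([t["txid"] for t in chosen], total_bytes, priority_bytes)
```

In this reading, the sender's `total_bytes` and `priority_bytes` play the role of the two size hints, and the receiver passes them back in as the limits when guessing the block from its own pool.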

Encoding transaction data in the IBLT

Invertible Bloom Lookup Tables store key-value pairs. So given a set of
transactions in a new block, what is the best way to encode them in an
IBLT? What should be used as the key, and what the value?
We want the values in the IBLT to be a fixed, not-too-long length. Transmitting
an IBLT with values that can hold the maximum possible transaction size (100K bytes)
would not save any bandwidth. So I propose that the transaction data be serialized and
split into 8-byte (64-bit) chunks.
Two bytes of the key can then be used as a sequence number to reassemble the
transaction.
We want keys to be unique, but they only have to be unique for the
transactions in this block; they don't have to be globally unique.
Using 48 bits of the transaction data's hash gives
about a 0.1% chance of a collision even if we scale up to blocks with
a million transactions in them, and combined with a 16-bit sequence number
that gives 64-bit keys. The transaction selection code can simply refuse to
select transactions whose hashes collide with a transaction already in the block
(they will just have to wait and get confirmed in the next block; a 0.1% chance
of collision in million-transaction blocks means roughly one transaction in a
billion gets delayed).
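
The collision estimate can be checked with the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2·2^48)). The function name below is made up for this sketch; the result lands between 0.1% and 0.2%, the same order of magnitude as the figure quoted above.

```python
import math

def collision_probability(num_txs: int, key_bits: int = 48) -> float:
    """Birthday approximation: chance that any two of num_txs keys collide."""
    pairs = num_txs * (num_txs - 1) / 2
    return 1.0 - math.exp(-pairs / 2 ** key_bits)

p_block = collision_probability(1_000_000)  # per million-transaction block
p_per_tx = p_block / 1_000_000              # rough per-transaction delay rate

print(f"{p_block:.4%} per block, about {p_per_tx:.1e} per transaction")
```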
Using the transaction hash directly opens up a possible denial-of-service attack;
an attacker might try to brute-force the first 48 bits of the transaction id to
keep somebody else's transaction from getting included. Or they might try to
create transaction ids that collide in the hash buckets used internally by the IBLT.
Those attacks can be prevented by combining the transaction ID with 48 bits of
random "salt" (which an attacker cannot know) before using it
as a key in the IBLT.
Pseudo-code algorithm for encoding a new block in an IBLT:
Pick transactions from the memory pool to be in the new block, and
put them in the block in canonical order.

Create a new empty IBLT that is large enough to encode expected
differences between the block we created and the blocks our peers
are creating. Select a random 48-bit salt.

Foreach transaction, including the coinbase:
  Calculate txid48, the 48-bit salted transaction ID
  Serialize the transaction and split it into 64-bit chunks
  Foreach chunk i:
    insert key=(txid48, 16-bit sequence number i), value=chunk

Search for a solution to the new block.

When a block solution is found:

Foreach peer running IBLT-capable version of the protocol:
  Transmit block header, sizeof(txdata), sizeof(high_priority_tx), salt, IBLT
Foreach peer running an older version of the protocol:
  Send the whole block in a 'block' message

Securely calculating txid48

To prevent brute-force attacks, we want at least 128 bits of the transaction ID
to contribute to the IBLT key. I propose hashing the transaction ID with a
256-bit salt chosen by the miner, and taking 48 bits of the result.
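
A minimal sketch of that calculation, assuming SHA-256 over the salt followed by the transaction ID and keeping the first 6 bytes of the digest. The hash function, the concatenation order, and which 48 bits are kept are all assumptions; the text only says "hash with a salt and take 48 bits".

```python
import hashlib
import os

def salted_txid48(txid: bytes, salt: bytes) -> int:
    """48-bit IBLT key prefix derived from the full txid and the miner's salt."""
    digest = hashlib.sha256(salt + txid).digest()
    return int.from_bytes(digest[:6], "big")

salt = os.urandom(32)     # 256-bit salt, chosen by the miner per block
txid = bytes(range(32))   # placeholder 32-byte transaction ID
key48 = salted_txid48(txid, salt)
print(hex(key48))
```

Because the whole transaction ID feeds the hash and the salt is unknown in advance, an attacker cannot grind transaction IDs ahead of time to collide with a victim's key.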

Decoding an IBLT new block message

Peers receiving the block header and IBLT could relay them to their peers
immediately after checking the proof-of-work in the block header, although
it is debatable whether they should do so before decoding
the IBLT and validating the transactions.
Pseudo-code for decoding the data in an IBLT:
When an IBLT for a new block ("IBLT_new") is received from a peer:

Create a new, empty IBLT ("IBLT_us") and fill it with transactions from the memory
pool (based on byte-size hints received from peer).

Calculate IBLT_diff = IBLT_new - IBLT_us  (this takes constant time)

Decode IBLT_diff; if decoding succeeds, the result is two lists of
(txid48+sequence, chunk) : transactions that the peer included
that need to be added to the block we created from our memory pool, and
transactions that the peer did not include that we need to remove.

Concatenate the chunks and deserialize the transactions, then insert
them into the new block in canonical order. Then proceed with all the
normal block and transaction validation.

If decoding fails (memory pools or mining policies differ too much, or a bug
in the peer's IBLT creation code leaves out a chunk of transaction data or
something), fall back to getdata/block messages or maybe ask for an extension
of the IBLT that will successfully decode.
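
The "concatenate the chunks" step can be sketched as below, assuming keys are laid out as a 48-bit ID followed by a 16-bit sequence number and values are 8-byte big-endian chunks (both assumptions of this sketch). The helper name is hypothetical.

```python
from collections import defaultdict

def reassemble(entries):
    """Group decoded (key, value) pairs by txid48 and join chunks in sequence order."""
    by_tx = defaultdict(dict)
    for key, value in entries:
        by_tx[key >> 16][key & 0xFFFF] = value.to_bytes(8, "big")
    result = {}
    for txid48, chunks in by_tx.items():
        # A gap in the sequence numbers means a chunk is missing.
        if sorted(chunks) != list(range(len(chunks))):
            raise ValueError(f"missing chunk for txid48 {txid48:#x}")
        result[txid48] = b"".join(chunks[i] for i in range(len(chunks)))
    return result

decoded = [
    ((7 << 16) | 1, int.from_bytes(b"IJKLMNOP", "big")),
    ((9 << 16) | 0, int.from_bytes(b"abcdefgh", "big")),
    ((7 << 16) | 0, int.from_bytes(b"ABCDEFGH", "big")),
]
print(reassemble(decoded))
```

This check only catches gaps; a missing final chunk would surface later, when the joined bytes fail to deserialize as a transaction, at which point the fallback described above applies.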


What about blockchain re-orgs?

All of the above assumes we are receiving a new block building on our
current idea of the best block chain.
If our current best chain is A->B, and we receive IBLTs for
an alternative longer chain with block B' and C', how do we
reconstruct B' ?
The simplest answer is to just request the full block data for
B' -- re-orgs are rare.
But the set of transactions in B and B' should be very similar,
so it might be worthwhile to reconstruct B' from its IBLT and
an IBLT created from block B.

Alternative or complementary approaches

"I know that you know..."

Matt Corallo has implemented a fast block relayer tool. It keeps track of
which transactions have been sent to each peer, and then when a new
block is found it replaces transaction data that it knows a peer has
with 10 bytes of the transaction ID.
Transmitting just 10 bytes of the transaction ID instead of the full
average-250-byte transaction gives a 25-times bandwidth savings.
TODO: measure Matt's approach against an IBLT approach. My guess is
at small block sizes the simpler approach works better, but at
some point IBLTs win.
THOUGHT EXPERIMENT: IBLT-ify the data Matt's fast relayer sends?
Should theoretically get the same constant factor of 25 improvement
but be O(1) instead of O(n)....

Relay just headers

Several people have proposed relaying just the 80-byte block header,
with miners checking the proof-of-work and then beginning to mine
an empty block on top of whichever header has the most proof-of-work
(assuming that nobody would waste time creating a valid-proof-of-work
block that contained invalid transactions). Once the full transaction
data was transmitted, miners would then switch to mining a block full of
transactions.
See bitcoin/bitcoin#3195 for discussion on why this
is not a good idea.

Pre-broadcast blocks

Miners could broadcast the blocks they are working on to the network before they
have solved any blocks. If they then do find a block, they could then just
broadcast the winning nonce and coinbase transaction (and some identifier for
"that unsolved block I sent you a while ago").
I think broadcasting "working blocks" is a great idea, for a couple of reasons.
First, it gives merchants an idea of what percentage of hashing power is working on including
their transaction in the blockchain, helping make 0-confirmation transactions more secure
(if 90% of active miners are working to include your transaction after a few seconds, and none
are working on including a double-spend, then it is reasonably likely it will be confirmed in
the next block or two).
Second, it gives other miners a view into what the majority of hashing power is working on, and
should allow miners to better monitor the health of their network connections.
A proposal for pre-broadcasting unsolved blocks would have to address bandwidth usage and
denial-of-service attacks; perhaps such a proposal could use IBLTs to minimize the amount of
memory and bandwidth used by the nodes on the network.

Road map / TODO

1. Instrument some bitcoind nodes running on the production network to find out
   how much memory pools and transaction selection policies differ today.
   Calculate the IBLT size needed for that set size difference to achieve 1%
   decoding failure.
2. Create a test bench environment with real-world transactions and blocks,
   so the IBLT approach can be compared against other approaches, and
   optimizations to the IBLT code can be measured to make sure they actually
   improve latency.
3. Write code, work out all the details, benchmark/test along the way.
4. Once stable and starting to be deployed: write up BIPs describing
   new p2p protocol changes.

Will this skew incentives?

Making propagation of new block announcements O(1) instead of
O(n_transactions_in_block) means miners should include more fee-paying
transactions in their blocks.
However, propagation is only O(1) for blocks containing transactions already seen and
accepted by a majority of the network. There is an incentive for all miners to include
roughly the same set of transactions in their blocks; no miner will want to create
a block containing mostly transactions that some of the network refuses to relay
or mine, because they will have to include that transaction data in their new
block announcement, which slows down block propagation.
The converse is also true: miners that refuse to include transactions that the
rest of the network would accept will have to transmit bigger IBLTs.
A miner trying to optimize the propagation of their blocks would only include
transactions that are in all of their peers' memory pools; note that this gives
power to "relay nodes" that are not mining. If a significant number of relay
nodes refuse to remember very-small-fee transactions in their memory pools
(perhaps they are dropping them because they run into CPU or bandwidth limits)
then miners have an incentive not to include them in their blocks. This will put
a floor on how low transaction fees will go.

But I heard that Bitcoin Doesn't Scale...

People claiming that "Bitcoin Doesn't Scale" are theoretically correct:
you still need O(n) bandwidth and CPU to fully validate n transactions-per-second.
Someday, when Bitcoin is the number 2 payment network in the world,
we might have to start worrying about that. Here are a couple of
back-of-the-envelope calculations that show that we should be able
to scale up to n=15,000 transactions per second before running into
that O(n) bandwidth limit.
For perspective, the number 1 payment network in the world today (Visa)
handles about 212 million transactions per day; 2,500 transactions per second
on average. Their peak processing capacity, needed on the busiest shopping days,
is reported to be 40,000 tps.
My home Internet connection is getting about 90 megabits download bandwidth
per second right now. An average Bitcoin transaction is about 2,000 bits,
so my current consumer-level Internet connection could download 45,000
transactions per second, over ten times average Visa transaction volume.
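
The arithmetic behind those two figures, using only the numbers quoted above (variable names are illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60

visa_tx_per_day = 212_000_000
visa_avg_tps = visa_tx_per_day / SECONDS_PER_DAY  # a little under 2,500 tps

link_bits_per_second = 90_000_000  # 90 megabits per second of download bandwidth
bits_per_tx = 2_000                # an average transaction of about 250 bytes
link_tps = link_bits_per_second // bits_per_tx

print(f"Visa average: {visa_avg_tps:.0f} tps")
print(f"Home link:    {link_tps} tps ({link_tps / visa_avg_tps:.0f}x Visa average)")
```

This counts download bandwidth only; relaying transactions to other peers and validating them cost extra and are not part of this estimate.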
While it is nice to know that I could run a full node handling
more-than-Visa-scale transaction volume from my house, running a dedicated
machine in a data center somewhere makes more sense. 15,000 250-byte
transactions per second works out to about 7 terabytes of bandwidth per month.
One of my hosting providers charges $20 per month for a virtual private
server with 8 TB of bandwidth per month-- or $240 per year to handle
MasterCard-level transaction volume today (August 2014).

References

Goodrich, Michael T., and Michael Mitzenmacher. "Invertible bloom lookup tables." Communication, Control, and Computing (Allerton), 2011 49th Annual Allerton Conference on. IEEE, 2011.  http://arxiv.org/pdf/1101.2245
Eppstein, David, et al. "What's the difference?: efficient set reconciliation without prior context." ACM SIGCOMM Computer Communication Review. Vol. 41. No. 4. ACM, 2011.  http://conferences.sigcomm.org/sigcomm/2011/papers/sigcomm/p218.pdf

Acknowledgements

Thanks to Gregory Maxwell and Emin Gün Sirer for pointing out relevant results from the
information coding and set reconciliation literature.