JeremyRubin/gist:1a3260431b3ee4afbaa0

## gistfile1.md

      
    Raw
  

              gistfile1.md
            
          
    Block Size Fix: Two Phase Commit

With thanks to:
Adam Back, Matt Corallo, Peter Todd, Alex Morcos, Dan Elitzer, Nathan Wilcox, Pindar Wong, Chelsea Barbaras, Jonathan Harvey-Buschel, and others for feedback on this proposal.
TL;DR:

The basics of this fix are switching the Block Confirmation protocol from a
single phase to a double phase commit protocol, where the second commit commits
an amount downloaded. The goal of the Two Phase Commit is to allow for
optimistic up-sizing of blocks while decentralizing power, giving the ability to
partially reject over sized blocks.
Introduction:

This proposal suggest adding the following field to the block header.


Bytes
Name
Data Type


32
two phase amount
char[32]


The two phase amount field specifies a pointer into the merkle tree of the
previous block which was mined.
The miner selects the pointer based on the
transactions which they were able to download and verify. The miner should
attempt to download the Merkle tree in a fixed order (ie, from 0 +).
All transactions less than the pointer are considered committed after these two
blocks. Transactions greater than that point may or may not be valid, they are
not checked and not committed.
For example, a miner who mines a block at height X-1 with 10,000 transactions would begin to spread it on the network.
The miner of block X would select a pointer (say, they were only able to get 1000 transactions) to the 1000th entry.
Essentially, this gives the rest of the miners the ability to try to make blocks
as large as possible, while giving the other miners a feedback mechanism to
respond to bandwidth concerns and antagonistic run time verification.
There are a couple of key concerns with this method.
Pointer Format:

Generally, there are multiple possible pointer algorithms which could be used, and fall generally into the following design constraints:


Pointer Scheme
Description


Direct addressing
Every single transaction has a position number in the Merkle tree which is the ID or each "word" does based on a very large max size (ie, kilobyte words with 64-bit pointer)


Percentage addressing
Direct addressing with a larger "word size" of transactions dynamically chosen with respect to the total size of the block. The challenge is that block size must also be propagated as it is not apparent from the Merkle root.


Hash addressing
Use the hash of the transaction read up till.


Merkle Addressing
A variable length bit string which specifies the highest branch fully known in the Merkle tree


Mask Addressing
A 32 or 64-bit integer representing a masking pattern over the block is used. The benefit of this is that a download can "nucleate" around the transactions already recieved so that out of order processing can occur. The downside is complexity!


There are likely other options as well. Hash addressing has been selected
for the sake of simplicity in this proposal but may be non-optimal. The
table of other ideas serves to demonstrate that clever addressing
schemes may convey interesting advantages or disadvantages.
Antagonistic Pointer Selection:

A miner might be incentivized to try to select a too-small pointer so that they
may claim the fees for the transactions. However, this can be prevented because
a block which sets a pointer to H at some intermediate position m could be
rejected if it contains any transaction present > m in the previous block
(which would suggest cheating). Rather than slowing down the consensus
protocol, because it may be hard to fully download these blocks as they can be
large, the threat of rejection can simply be that if this is later detected
(over the 100-block wait time for Coinbase transactions) that there is an
additional header field for a rejection proof set that would invalidate the
block rewards but not the block. This provides a strong incentive for honesty.
Another possible way to incentivize this would be phase shifting or smoothing out
the fee rewards.
Consensus Time:

Although an additional block is required to perform the two-phase commit,
consensus time is not worsened. This is because a user can still verify
that they were in the block, but they cannot guarantee that their transaction made
it into the next block's cutoff. However, this is not much worse than the
current situation, as a block may be orphaned anyways. Furthermore, a
transaction at an earlier position in the block could be considered to be
more likely to be accepted, giving a somewhat granular chance of
confirmation for arbitrary sized blocks.
Transaction Order:

A miner would prefer to order their block by transaction fees so that the
most likely to be committed are high value.
Effect on Block Size:

Block sizes will be floating as a result of this changes. A miner can try
to propagate any size block, and the rest of the network can try to mine it
with full verification correctness without worry of continuing an invalid
chain.
All bottle necks are accounted for implicitly, as a miner can increment the
pointer as they are mining and process more transactions in the last block.
Incentives:

In order to incentivize the adoption of this system, as touched on
earlier, it might also be desirable for a miner who commits a
higher pointer to share in the transaction fees.  However, this
becomes tricky because then miners might have an incentive to not
verify. However, both blocks would be invalid if that were the
case so this is not a true incentive.
Fork Resolution

When there are two competing blocks, fork resolution rules could remain the
same. However, other factors could also be considered, such as amount in
prior block committed.
User Downloads:

Unfortunately, this does not lighten the burden on users running nodes,
however a miner could select it's pointer based on the average of it's
peers pointers or some other metrics which take into account non miner user
bandwidths. Fortunately, the burden of SPV proofs is not greatly increased
Block Withholding:

This proposal creates new, interesting incentives with regards to
block withholding attacks (attacks where a miner only propagates to a
subset).. It is unclear how this affects optimal mining strategy,
which in current Bitcoin also involves block withholding.
Storage Increase

Storage increases are minimal in this system because the Merkle
tree can be lopped off, eliminating the non-committed
region. Merkle trees grow at log(n) so it isn't a major problem to
have a slightly larger proof for SPV clients.
Real World Use Case:

This system, at first glance, seems to suggest that block sizes will be
rampantly oscillating or trying to be as large as possible. This should
not be the case, as it will form a feedback loop to make appropriately
sized blocks so it should damp out, but provide a flexible method for
organic creep.
There is still a meta problem of a true max block size (eg, what
if just one path of a Merkle tree is a megabyte, would accepting
only a few transactions be ok?), however this meta-protocol could
be made naive and use a multiplier of 10x the last few blocks
pointer. Additionally, fees create a disincentive for creating
blocks that won't propagate.
Phase Delay + ith-Phase:

This document assumes the next block should file the commit. However, it
would be equally valid for the 2nd or 3rd following block to commit. This
would support potentially larger blocks, however there might be overlap in
an intermediary block. Special rules could apply which allow a transaction to
appear multiple times in the n-sized window but only be put in the mempool
once.
Additionally, some set of Phases could be used to increase the
pointer size over multiple blocks. There are many possible rules
for pointer selection, such as min, max, average, exponentially weighted moving average, etc, over the
window.
Pointer Scheme	Description
Direct addressing	Every single transaction has a position number in the Merkle tree which is the ID or each "word" does based on a very large max size (ie, kilobyte words with 64-bit pointer)
Percentage addressing	Direct addressing with a larger "word size" of transactions dynamically chosen with respect to the total size of the block. The challenge is that block size must also be propagated as it is not apparent from the Merkle root.
Hash addressing	Use the hash of the transaction read up till.
Merkle Addressing	A variable length bit string which specifies the highest branch fully known in the Merkle tree
Mask Addressing	A 32 or 64-bit integer representing a masking pattern over the block is used. The benefit of this is that a download can "nucleate" around the transactions already recieved so that out of order processing can occur. The downside is complexity!