Bitcoin block propagation - Ultra compression

This solution paper is a sequel to my earlier reddit post Bitcoin-SV: are terabyte blocks feasible?. I recommend reading that article first, as it deals with requirements scoping & feasibility, before continuing with this paper.

Abstract:

  • A node, alongside its mempool, now maintains a 'stateful' index of unconfirmed transactions exchanged with each of its peers. So each node maintains a map of seqNo<->transactionIDs for every active peer connection.
  • Currently, when a miner succeeds in a mining attempt, i.e. finds a nonce that produces a hash below the network difficulty target, it transmits the entire contents of the block (legacy or Compact Blocks, BIP-152) to its peers.
  • In this proposal the miner can efficiently (in space & time) transmit a packed block containing the sequence of indices to its peer.
  • The peer unpacks the block, effectively reconstructing the ordered set of transactions contained in the block.
  • A node can relay the received block to a downstream peer by repacking the block (individually per peer) and transmitting the repacked block.

In Depth:

Since peers operate in an asynchronous network, and transactions (inbound & outbound) are disseminated highly concurrently through the network, it is necessary to ensure there are no index collisions, to avoid ambiguity while packing/unpacking the block. This could be achieved by an interactive acknowledgement (ACK) based sequential numbering protocol between two peers, but that increases time & computational complexity. Hence it is preferable to achieve deterministic, collision-proof numbering by maintaining exclusive ingress & egress sequence numbers for inbound & outbound transactions respectively.
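As a minimal sketch (Python; the names PeerRelayState, record_ingress and record_egress are illustrative, not part of any existing implementation), the per-peer state amounts to two monotonically increasing counters plus two seqNo<->transactionID maps per active peer connection:

```python
class PeerRelayState:
    """Per-peer relay state assumed by this proposal (illustrative sketch)."""

    def __init__(self):
        self.egress_seq = 0     # last seqNo assigned to a tx sent to this peer
        self.ingress_seq = 0    # last seqNo assigned to a tx received from this peer
        self.egress_map = {}    # seqNo -> txid (outbound)
        self.ingress_map = {}   # seqNo -> txid (inbound)

    def record_egress(self, txid: str) -> int:
        """Assign the next egress seqNo to a transaction relayed to the peer."""
        self.egress_seq += 1
        self.egress_map[self.egress_seq] = txid
        return self.egress_seq

    def record_ingress(self, txid: str) -> int:
        """Assign the next ingress seqNo to a transaction received from the peer."""
        self.ingress_seq += 1
        self.ingress_map[self.ingress_seq] = txid
        return self.ingress_seq
```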

The notional character stream of this naive implementation can be shown as below, i.e. an ingress/egress flag & a sequence number:

..., e|1539, e|1540, e|1541, e|1542, e|1543, e|1544, e|1545, e|1546, i|978, i|979, i|980, i|982, e|1549, ...
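Packing a block for a given peer then reduces to looking up each transaction's (direction, seqNo) pair in that peer's maps, and unpacking is the reverse lookup. A hypothetical sketch of this naive absolute-addressing format (note that a transaction marked egress by the sender sits in the receiver's ingress map, and vice versa; handling of transactions the peer has never seen is omitted):

```python
def pack_block_absolute(block_txids, state: PeerRelayState):
    """Replace each txid with an ('e' | 'i', seqNo) pair recorded for this peer."""
    egress_rev = {txid: seq for seq, txid in state.egress_map.items()}
    ingress_rev = {txid: seq for seq, txid in state.ingress_map.items()}
    return [('e', egress_rev[t]) if t in egress_rev else ('i', ingress_rev[t])
            for t in block_txids]

def unpack_block_absolute(packed, state: PeerRelayState):
    """Reconstruct the ordered txid list; directions are mirrored on the receiving side."""
    return [state.ingress_map[seq] if d == 'e' else state.egress_map[seq]
            for d, seq in packed]
```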

If we assume the average transaction size to be ~250 bytes, then in a typical block of 1 TB size we can expect the total number of sequence numbers to be upwards of 4 billion (ingress & egress combined). In such a scheme an average ingress/egress seqNo will be a large integer about 9-10 digits long. There is scope for further data compression by switching from the above absolute addressing format to a more compact relative addressing format using Interval, Base & Offset counters, as shown below. Since nodes primarily order transactions by the time sequence in which they first see them, and since the sequence numbers follow the same time-based ordering, a given subset of consecutive outbound transactions can be represented succinctly with the following:

  • Interval, a predetermined egress (or ingress) interval value, typically a small/tiny int, fixed for a session.
  • Base = previous Base + Interval; this value can be recomputed using an efficient single-bit flag as a trigger.
  • Offset, an integer indicating the distance from the Base.

The notional stream would now be ingress/egress flag, base-recompute flag & offset, for Interval = 20:

..., e|n|19, e|y|0, e|n|1, e|n|2, e|n|3, e|n|4, e|n|5, e|n|6, i|n|18, i|n|19, i|y|0, i|n|2, e|n|9, ...
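A minimal encoder sketch for this relative-addressing format (Python; the initial Base is taken as the start of the interval containing the first seqNo seen, which is an assumption of this sketch rather than part of the proposal):

```python
INTERVAL = 20  # predetermined per-session interval

def to_relative(pairs, interval=INTERVAL):
    """Convert ('e' | 'i', absolute seqNo) pairs into (direction, base-recompute flag,
    offset) tokens, keeping an independent running Base per direction."""
    base = {'e': None, 'i': None}
    tokens = []
    for d, seq in pairs:
        flag = 'n'
        if base[d] is None:
            base[d] = (seq // interval) * interval
        elif seq >= base[d] + interval:   # single-bit flag: advance Base by one Interval
            base[d] += interval
            flag = 'y'
        tokens.append((d, flag, seq - base[d]))
    return tokens

# Reproduces the notional stream above:
# to_relative([('e', 1539), ('e', 1540), ('i', 978), ('i', 979), ('i', 980), ('i', 982), ('e', 1549)])
# -> [('e','n',19), ('e','y',0), ('i','n',18), ('i','n',19), ('i','y',0), ('i','n',2), ('e','n',9)]
```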

Next, we can use number ranges to efficiently represent sets of consecutive sequence numbers of a given type (ingress/egress):

..., e|n|19, e|y|0-6, i|n|18-19, i|y|0, i|n|2, e|n|9, ...

We could also have range-sets span across Intervals by including a span-count, allowing the receiver to recompute the Base value implicitly:

..., e|1|19-6, i|n|18-19, i|y|0, i|n|2, e|n|9, ...

Note: the stream representation is merely for human readability; a binary encoding would be used on the wire. It could be a custom binary encoding or an off-the-shelf one like CBOR (RFC 8949).
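For illustration only, here is how such a token stream could be serialized with an off-the-shelf codec, using the Python cbor2 package (the choice of codec and the token layout are assumptions of this sketch, not part of the proposal):

```python
import cbor2  # third-party CBOR codec: pip install cbor2

# Serialize a few relative-addressed tokens into a compact binary payload.
tokens = [['e', 'n', 19], ['e', 'y', 0], ['i', 'n', 18], ['i', 'n', 19], ['i', 'y', 0]]
payload = cbor2.dumps(tokens)            # bytes suitable for the wire
assert cbor2.loads(payload) == tokens    # receiver side round-trips losslessly
```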

Besides the above, we can add a bi-directional bit-wise compaction format (ingress & egress) to the list, which vastly benefits compression, especially when the sender is the original block publisher. In this format the individual bits of an integer (in binary) represent ingress vs. egress, and sequence numbers are implicitly incremented under each direction.

..., b|4730, ...

4730 in binary is 1001001111010; 1s indicate ingress sequences and 0s egress. Of course this model is rigid & the seqNos are strictly sequential, so typically this format will be used in combination with the other formats. But when the sender is the original block publisher, it is probabilistically more likely that the transaction ordering in the block closely resembles the chronology of transactions exchanged with its peer. So the farther away the sender-receiver pair is from the block publisher (winning miner), the less this format will contribute to block compression. But since Bitcoin SV favours a Mandala-like network topology, the most powerful miners are likely to be connected directly to each other, and hence this format would be very useful indeed to the peers who form the core of such a network.
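A small decoder sketch for such a token (Python; it assumes the most significant bit corresponds to the earliest transaction and that the next expected seqNo for each direction is known from the surrounding stream, both assumptions of this sketch):

```python
def decode_bitmap(value: int, next_ingress: int, next_egress: int):
    """Expand a bi-directional bitmap into ('i' | 'e', seqNo) pairs; 1 = ingress, 0 = egress.
    A real wire format would carry an explicit bit length so that leading egress (0) bits
    are not lost when the bitmap is stored as an integer."""
    pairs = []
    for bit in format(value, 'b'):           # format(4730, 'b') == '1001001111010'
        if bit == '1':
            pairs.append(('i', next_ingress)); next_ingress += 1
        else:
            pairs.append(('e', next_egress)); next_egress += 1
    return pairs

# decode_bitmap(4730, 983, 1550) expands into 13 transactions in the order
# i, e, e, i, e, e, i, i, i, i, e, i, e, with implicitly incremented seqNos.
```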

So, considering our initial assumption of a 1 TB block containing about 4 billion transactions, the best case scenario using the bi-directional bit-wise format would be 4 billion bits => ~500 MB. Shrinking a 1 TB block down to 500 MB is about a 2000x compression factor. Similarly, the absolute worst case scenario would be completely devoid of the bi-directional bit-wise & range-set features, and hence roughly about 4 GB. For the miners forming the core of the network topology, the size would lean strongly towards the best case scenario.

This solution does not need strict ordering of transactions of any sort, such as canonical ordering or CTOR (as adopted by BCH), and is designed to work with the loose topological (predominantly chronological) ordering of transactions that SV embraces.

Lastly, we can see how this packing (sender) & unpacking (receiver) process can itself be parallelized:

  • The ordered set of transactions in a block can be segmented into chunks, and the chunks can be packed and transmitted to the peer independently.
  • Likewise, the receiving peer unpacks the segments independently, and they can finally be coalesced together.
  • Interestingly, the segments can be made to fit Merkle sub-trees (by having 2^n segments), which can then be computed independently in parallel and efficiently combined to validate the Merkle root contained in the block header (a sketch follows below).
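A minimal sketch of the parallel Merkle sub-tree idea (Python; it assumes the transaction count divides evenly into 2^n power-of-two-sized segments so that segment roots line up with sub-tree roots of the full tree):

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor

def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(hashes):
    """Bitcoin-style Merkle combination (duplicate the last hash on odd-length levels)."""
    hashes = list(hashes)
    while len(hashes) > 1:
        if len(hashes) % 2:
            hashes.append(hashes[-1])
        hashes = [dsha256(hashes[i] + hashes[i + 1]) for i in range(0, len(hashes), 2)]
    return hashes[0]

def parallel_merkle_root(txids_hex, segments=4):
    """Split the ordered txid list into `segments` equal chunks, compute each sub-tree
    root in parallel, then combine the sub-tree roots into the block's Merkle root.
    Run under an `if __name__ == "__main__":` guard when using a process pool."""
    leaves = [bytes.fromhex(t)[::-1] for t in txids_hex]   # txids are displayed big-endian
    size = len(leaves) // segments
    chunks = [leaves[i * size:(i + 1) * size] for i in range(segments)]
    with ProcessPoolExecutor() as pool:
        subroots = list(pool.map(merkle_root, chunks))
    return merkle_root(subroots)[::-1].hex()
```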

Comparison with Graphene:

Graphene uses techniques such as Bloom filters and invertible bloom lookup tables (IBLTs) to compress blocks with better efficiency than Compact Blocks (BIP-152). Graphene works on a probabilistic model for IBLT decoding, and there is a small chance of failure; in that event the sender must resend the IBLT with double the number of cells. The authors did some empirical testing and found this doubling was sufficient for the very few blocks that actually failed. Graphene sizes appear to be linearly proportional to mempool sizes. But practically speaking, we need to take another factor, "mempool divergence", into account: as the network grows and mempools become larger the divergence increases, and in practice decoding failures will rise. One proposal to counter this is to request blocks from multiple (2/3) peers and merge them together, which decreases the probability of IBLT decoding errors at the cost of additional resources. There is also an open attack vector called the poison block attack, where a malicious miner could mine a block with transactions that are held private; this will lead to an inevitable decode failure. Although this attack seems fatal to Graphene's adoption, there is hope that game-theoretical PoW underpinnings may come to the rescue.

Graphene distills the block propagation problem into the classical set reconciliation problem (set theory; the order of elements is irrelevant), building on previous academic literature on set reconciliation which also involved Bloom filters & IBLTs. It discards the concomitant time information of transactions and defaults to implicit ordering, typically canonical (ID sorting), although it supports supplemental order information being included. If a topological ordering of transactions is needed, the additional ordering information has to be included at the cost of increasing the size of the block. It complements implicit ordering techniques like CTOR (Canonical Transaction Ordering) well, although this deviates from Nakamoto-style chronological ordering of transactions within a block.

Ultra compression (this paper), in contrast, takes a novel approach which leverages the concomitant time information of transactions to its advantage and achieves a much better compression factor. It does not treat the problem as merely one of set reconciliation; instead it improves efficiency by encoding the relative time sequence of transactions into the block.

The primary advantages are as below:

  • High compression factor, considerably smaller blocks compared to Graphene.
  • Not a probabilistic algorithm; hence no decoding failures and no need for complex recovery mechanisms.
  • Mempool divergences are pre-addressed and the corresponding problems do not arise; the efficiency of packing/unpacking does not get worse as the network and mempools grow.
  • No serious attack vectors like the poison block attack (at least none known yet)
  • Allows highly concurrent packing and unpacking
  • Allows Merkle sub-tree computations concurrently while unpacking
  • Highly scalable due to the above; terabyte blocks & beyond.

Notable disadvantages:

  • Places a higher memory load on the system, needed for maintaining additional data structures like maps of seqNo<->transactionIDs
  • Packing & unpacking are needed at every hop; passive relaying is not possible. However, the high parallelism ensures very low latency.

Compression factors compared

Below are some straightforward calculations to help derive & compare the compression factors of Ultra >|< & Graphene; for simplicity they avoid factoring in mempool divergence, decoding failures, etc.

The Graphene authors have produced empirical test results showing that it performs 10x better than Compact Blocks (BIP-152). Since each transaction is represented by a 6-byte short ID (SipHash) in Compact Blocks, that roughly gives a compression factor of 40x over the full uncompressed legacy block (avg. tx of ~240 bytes / 6 bytes).

So, assuming the best case, Graphene offers a 400x compression factor over a full uncompressed block; applying that factor, the 1 TB block is reduced to 2.5 GB. This assumes implicit transaction ordering by ID; if the transaction ordering is to be explicitly included, then for a 1 TB block with ~4000000000 transactions we need an additional n log2(n) bits = 4000000000 × 31.89 = ~16 GB. So for our example 1 TB block it is now 2.5 GB plus ~16 GB, leading to a total of ~18.5 GB. I am assuming CTOR (or any ID-based implicit ordering) is unnecessary and has its own complexities & issues.

In contrast, Ultra >|< offers a best case compression factor of 2000x & an absolute worst case of 250x, and allows for any transaction ordering (unlike Graphene) without any additional space overhead.
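A back-of-the-envelope restatement of the figures above (Python; all inputs are assumptions already stated in this section):

```python
import math

TX_COUNT   = 4_000_000_000    # ~1 TB block at ~250-byte average transactions
BLOCK_SIZE = 10**12           # 1 TB in bytes

# Graphene, best case quoted above: 400x compression plus explicit ordering cost.
graphene_block = BLOCK_SIZE / 400                        # ~2.5 GB
ordering_bytes = TX_COUNT * math.log2(TX_COUNT) / 8      # n * log2(n) bits ~ 16 GB
graphene_total = graphene_block + ordering_bytes         # ~18.5 GB

# Ultra >|<: best case ~1 bit per transaction, worst case ~250x as stated above.
ultra_best  = TX_COUNT / 8           # ~500 MB  => ~2000x
ultra_worst = BLOCK_SIZE / 250       # ~4 GB    => ~250x
```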

Comparison with IBS (Incremental Block Synchronization)

IBS is a candidate-block image mirroring protocol, where the candidate block is synced between peers incrementally until the block is found. In Ultra >|< the compressed block is propagated only AFTER the block is found. The distinction seems trivial at first but has important downstream consequences & trade-offs.

  • In IBS, a node has to continually mirror candidate block images of every peer at all times; this includes computing Merkle trees etc., which means significant computational complexity for each (mining) node. When a winning block is found, all the other candidates are discarded, hence wasted effort. One could choose to view this as a small portion of the effective wastage of PoW hashing, but the truth is it does not aid in securing the chain and is inefficient.
  • Essentially linear/single-threaded processing of transaction verification & Merkle root computation/validation. Also, due to the possibility of unconfirmed dependent transactions, the transaction order must be validated for each peer exclusively.
  • Mempool sharding will be exceedingly hard and exponentially resource intensive, which implies scaling limitations.
  • Most significantly, IBS works well in a 'complete digraph' network topology ("a complete digraph is a directed graph in which every pair of distinct vertices is connected by a pair of unique edges"), but fails to work in a practical setting even when assuming a 'small world network' topology as embraced by Bitcoin SV. This is due to the lack of provision (or at least the complexity involved) in efficiently relaying a block between nodes that are not directly connected. In such cases, it will either (a) need to devolve into explicitly transmitting the entire formed block, which will be prohibitively expensive as sizes get larger (GBs/TBs), or (b) resort to another fallback mechanism, which means additional complexity and defeats the purpose. In any case it is not very prudent to adopt a propagation protocol which places undue restrictions on the network topology it can support in the future.

But in case of Ultra >|<,

  • Efficient, stable performance under a small world network topology even when the 'avg. distance' & 'clustering coefficient' values are pushed to the limits (no formal mathematical proof yet; stated merely intuitively).
  • Mempool sharding is feasible, by maintaining ZTXI, UTXO shards.
  • Transaction verification, mempool acceptance & Merkle root computation/validation are parallelizable, as demonstrated in the Mangrove architecture: https://gist.github.com/stoichammer/3bb383f3b4634b2196742ddac5ed6cd1
  • The computational complexity is much lower as there is no wasted mirroring of candidate blocks.
  • And although the memory/space complexity is high, it is still comparable to IBS.

In my subsequent post, I will cover a more comprehensive distributed system architecture for a Node that covers the following:

  • Parallel Transaction Verification
  • Parallel Block processing (under topological ordering)
  • Mempool sharding (intra-node)
  • UTxO set sharding
  • Parallel Merkle subtree computation