sipa/rsblockxfer.md Secret

## rsblockxfer.md

      
    Raw
  

              rsblockxfer.md
            
          
    Assume we are sending a block with t transactions, and are trying to reconstruct it using m mempool transactions. Each of those t transactions has an expected chance r to belong to the set of m mempool transactions.
Transactions are assigned a 64-bit identifier which is a function of the block hash and the txid, so this identifier is
assumed to not be grindable. Thus, we assume that these identifiers are indistinguishable from uniformly random.
For each transaction, we send i first bits of its identifier.
The chance of not hitting any given (wrong) mempool transaction is 1-2^(-i). Call this number x.
There are 5 different cases to consider for each transaction in the block:

The receiver has exactly one mempool transaction, which is the correct one. This has chance c1 = r*x^(m-1).
The receiver has no transactions with those initial bits in its mempool. This has chance c2 = (1-r)*x^m.
The receiver has the correct transaction in its mempool, but also at least one other transaction with the same i
bits. This has chance c3 = r*(1-x^(m-1)).
The receiver has at least two incorrect transactions in its mempool, but not the correct one. This has chance
c4 = (1-r)*(1-x^m-m*(1-x)*x^(m-1)).
The receiver has exactly one matching transaction in its mempool, but the wrong one. This has chance
c5 = (1-r)*m*(1-x)*x^(m-1).

Example numbers for i=32 m=10000 r=0.9:

c1=89.99979047%
c2=9.99997671%
c3=0.00020953%
c4=0.000000000027%
c5=0.00002328%

Note that c1 + c2 + c3 + c4 + c5 = 100%.
This results in 3 combined cases:

We match, with chance c1 = r*x^(m-1)
We think we match but don't, with chance c5 = (1-r)*m*(1-x)*x^(m-1)
We know we don't match, with chance c2 + c3 + c4 = 1 - c1 - c5.

This means that after filtering out the known mismatches, we are left with (1 - (c2 + c3 + c4)) * t = (c1 + c5) * t
transactions, each of which has a chance of c5 / (c1 + c5) = 1 - r / (r + (1-r)*m*(1-x)) to be wrong. This number is
0.00002587% in our case.
This leads to a binomial distribution B((c1 + c5) * t, c5 / (c1 + c5)) for the total number of errors.
As (c1 + c5) * t is large and c5 / (c1 + c5) is small, this can be approximated using a Poisson distribution with
lambda = c5 * t.
If t=2500, that means that there is a chance of 0.000017% that there are more than 2 errors.
The chance that every transaction in a block is either a match or a known nonmatch (so no error correction is needed) is (1 - c5) ^ t. This number can be approximated by m * t * r / 2**i.