Instantly share code, notes, and snippets.

# sipa/rsblockxfer.mdSecret Last active May 9, 2016

Assume we are sending a block with `t` transactions, and are trying to reconstruct it using `m` mempool transactions. Each of those `t` transactions has an expected chance `r` to belong to the set of `m` mempool transactions.

Transactions are assigned a 64-bit identifier which is a function of the block hash and the txid, so this identifier is assumed to not be grindable. Thus, we assume that these identifiers are indistinguishable from uniformly random.

For each transaction, we send `i` first bits of its identifier.

The chance of not hitting any given (wrong) mempool transaction is `1-2^(-i)`. Call this number `x`.

There are 5 different cases to consider for each transaction in the block:

1. The receiver has exactly one mempool transaction, which is the correct one. This has chance `c1 = r*x^(m-1)`.
2. The receiver has no transactions with those initial bits in its mempool. This has chance `c2 = (1-r)*x^m`.
3. The receiver has the correct transaction in its mempool, but also at least one other transaction with the same `i` bits. This has chance `c3 = r*(1-x^(m-1))`.
4. The receiver has at least two incorrect transactions in its mempool, but not the correct one. This has chance `c4 = (1-r)*(1-x^m-m*(1-x)*x^(m-1))`.
5. The receiver has exactly one matching transaction in its mempool, but the wrong one. This has chance `c5 = (1-r)*m*(1-x)*x^(m-1)`.

Example numbers for `i=32` `m=10000` `r=0.9`:

1. `c1=89.99979047%`
2. `c2=9.99997671%`
3. `c3=0.00020953%`
4. `c4=0.000000000027%`
5. `c5=0.00002328%`

Note that `c1 + c2 + c3 + c4 + c5 = 100%`.

This results in 3 combined cases:

• We match, with chance `c1 = r*x^(m-1)`
• We think we match but don't, with chance `c5 = (1-r)*m*(1-x)*x^(m-1)`
• We know we don't match, with chance `c2 + c3 + c4 = 1 - c1 - c5`.

This means that after filtering out the known mismatches, we are left with `(1 - (c2 + c3 + c4)) * t = (c1 + c5) * t` transactions, each of which has a chance of `c5 / (c1 + c5) = 1 - r / (r + (1-r)*m*(1-x))` to be wrong. This number is `0.00002587%` in our case.

This leads to a binomial distribution `B((c1 + c5) * t, c5 / (c1 + c5))` for the total number of errors. As `(c1 + c5) * t` is large and `c5 / (c1 + c5)` is small, this can be approximated using a Poisson distribution with `lambda = c5 * t`.

If `t=2500`, that means that there is a chance of `0.000017%` that there are more than 2 errors.

The chance that every transaction in a block is either a match or a known nonmatch (so no error correction is needed) is `(1 - c5) ^ t`. This number can be approximated by `m * t * r / 2**i`.