@bissias
Last active May 5, 2019 16:57

Improving Graphene Decode Rates by Predicting Missing Transactions

Graphene v1.0 assumes that the receiver's mempool is a superset of the block. When this assumption is not true, the IBLT will not decode and a fallback block must be sent. The goal of this gist is to explore ways to reduce the number of decode failures by having the sender detect and react to missing transactions on the receiver end. There is a fundamental tension to this problem: by acting conservatively, we can reduce the decode rate to nearly zero, but in doing so we might sacrifice the message efficiency of Graphene. In the solutions presented below, we will attempt to balance these competing factors.

Sender Detection

There are (at least) two means of detection by the sender: 1) comparison of mempool sizes, and 2) use of filterInventoryKnown.

Comparing mempool sizes

This is the most straightforward approach, but it requires slightly more involved reasoning by the sender than what is done currently. For clarity, in the description below I will ignore corner cases such as times when the receiver mempool is smaller than the size of the block.

Let nB denote the size of a new block (in transactions). When the receiver initiates a Graphene block request, she sends the size of her mempool, nR (also in transactions). From this, the sender must infer two things: nE, the number of excess transactions (beyond those in the block) in the receiver's mempool, and nM, the number of transactions from the block the receiver is likely missing. The current Graphene implementation assumes that nM = 1, and therefore that nE = nR - nB + 1. However, this assumption proves to be grossly inaccurate at times when the sender and receiver mempools are out of sync. In particular, the receiver is frequently missing a large number of transactions from the block, so that nM > 1. Moreover, it is often the case that nR is approximately equal to nB, but only because nE is large enough to offset a large nM.

The approach we take is to have the sender use the size of his mempool nS (after adding transactions from the block, and before clearing them from the mempool) to infer nE and nM. To that end, he estimates nE = nS - nB and nM = nB - (nR - nE). These estimates have some nice properties. First, when the sender and receiver mempools actually are in sync, nE and nM are exact. Second, when the mempools are not entirely in sync, the estimates still get us closer to the correct values. For example, today's implementation conflates two common scenarios. Suppose nB = 50 and the receiver conveys nR = 50 as well. At one extreme, nM = nE = 0: the receiver has exactly the transactions in the block. At the other extreme, nM = nE = nB: the receiver has no transactions from the block. Today's protocol always assumes the former, optimistic, case. With the proposed scheme, even if the sender's and receiver's mempools differ by, say, 5 transactions, he can still largely differentiate between the two scenarios. In the first scenario, the sender might find that nS = 55, which suggests nM = nE = 5, close to the true value of 0. In the second scenario, he could find nS = 95, from which he would conclude that nM = nE = 45, close to the true value of 50. Both are much better estimates than the current assumption.
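
For concreteness, here is a minimal sketch of the two estimators. The variable and function names are illustrative only and do not match the actual BitcoinUnlimited code:

```cpp
#include <cstdint>

// Illustrative sketch of the two estimators described above.
struct MempoolEstimates {
    uint64_t nExcess;  // nE: receiver mempool transactions beyond the block
    uint64_t nMissing; // nM: block transactions the receiver lacks
};

// Current (v1.0) assumption: the receiver is missing exactly one transaction.
// (Corner cases such as nR < nB are ignored, as in the text.)
MempoolEstimates CurrentAssumption(uint64_t nB, uint64_t nR)
{
    return MempoolEstimates{nR - nB + 1, 1}; // nE = nR - nB + 1, nM = 1
}

// Proposed estimator: use the sender's mempool size nS, taken after block
// transactions are added and before they are cleared.
MempoolEstimates ProposedEstimate(uint64_t nB, uint64_t nR, uint64_t nS)
{
    uint64_t nE = nS > nB ? nS - nB : 0;             // nE = nS - nB
    uint64_t nInBlock = nR > nE ? nR - nE : 0;       // block txs the receiver has
    uint64_t nM = nB > nInBlock ? nB - nInBlock : 0; // nM = nB - (nR - nE)
    return MempoolEstimates{nE, nM};
}
```

For the example above, ProposedEstimate(50, 50, 55) yields nE = nM = 5, while ProposedEstimate(50, 50, 95) yields nE = nM = 45.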

Using filterInventoryKnown

The filterInventoryKnown data structure is an instance of the rolling Bloom filter CRollingFastFilter that is used in Bitcoin Unlimited's main transaction processing loop. Any time we (the sender) receive an inv for a transaction from a peer, or send one to that peer (the receiver in our case), the transaction's ID is added to filterInventoryKnown. Thus, to the extent that the filter is accurate, we can pass the transactions from the block through filterInventoryKnown and determine the set sM of transactions the receiver is missing.

Of course the filter is not entirely accurate. In fact, because it is a rolling Bloom filter, it produces both false positives and false negatives. Currently the false positive rate is fixed at just under 1%, while the false negative rate floats depending on how full the filter is (how many bits are set). The precise false negative rate for filterInventoryKnown is not calculated, and it may not be straightforward to do so. Note that false negatives will result in transactions being incorrectly added to sM, while false positives will result in transactions being incorrectly omitted from sM. Thus we can approximate the number of transactions the receiver is missing as nM = 1.01 |sM|, but this will be an overestimate to the extent of the false negative rate. It is also important to note that this approach has a distinct advantage over the one presented in the previous section: the set sM can be used to proactively send transactions along with the block (see below).
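
The sketch below illustrates how sM could be assembled and its size corrected for false positives. The TxId type and the filter predicate are stand-ins for the actual transaction ID type and the filterInventoryKnown membership test, which differ in the BitcoinUnlimited code:

```cpp
#include <cmath>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Placeholder transaction ID type for this sketch.
using TxId = std::string;

// Collect the block transactions that do NOT appear in the receiver's known
// inventory filter; these are the candidates for the missing set sM.
std::vector<TxId> EstimateMissingSet(const std::vector<TxId>& blockTxs,
                                     const std::function<bool(const TxId&)>& filterContains)
{
    std::vector<TxId> setMissing;
    for (const TxId& txid : blockTxs) {
        if (!filterContains(txid))
            setMissing.push_back(txid);
    }
    return setMissing;
}

// Scale |sM| up by the filter's ~1% false positive rate to account for block
// transactions that were incorrectly omitted from sM. False negatives still
// make this an overestimate, as noted in the text.
uint64_t EstimateMissingCount(size_t nMissingSetSize, double fpr = 0.01)
{
    return static_cast<uint64_t>(std::ceil((1.0 + fpr) * nMissingSetSize));
}
```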

Sender Reaction

There are also two approaches to reaction by the sender: 1) pad vAdditionalTxs, and 2) pad the IBLT.

Pad vAdditionalTxs

Here we use filterInventoryKnown to estimate the missing set of transactions sM. Because of the large number of false negatives, we observe empirically that sM typically includes several transactions that the receiver is not actually missing. Therefore, if we were to always send sM, then each block would be padded with several unnecessary full transactions. Yet comparing mempool sizes tends to be fairly robust. Empirically, when nR < nS, the receiver typically is missing some transactions. So our approach is to first test mempool sizes, and if nR < nS, then add the contents of sM to vAdditionalTxs, which is sent as part of the Graphene block.
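
A simplified sketch of this reaction follows; the real CGrapheneBlock layout and transaction types in BitcoinUnlimited are more involved than what is shown here:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Placeholder for a full serialized transaction in this sketch.
using TxRef = std::string;

struct GrapheneBlockSketch {
    std::vector<TxRef> vAdditionalTxs; // full transactions sent with the block
};

void MaybePadAdditionalTxs(GrapheneBlockSketch& block,
                           uint64_t nReceiverMempool,            // nR, from the block request
                           uint64_t nSenderMempool,              // nS, after adding block txs
                           const std::vector<TxRef>& missingTxs) // sM via filterInventoryKnown
{
    // Only pad when the mempool-size comparison suggests the receiver is
    // actually missing transactions; this avoids shipping sM on every block.
    if (nReceiverMempool < nSenderMempool) {
        block.vAdditionalTxs.insert(block.vAdditionalTxs.end(),
                                    missingTxs.begin(), missingTxs.end());
    }
}
```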

The file pad_vAdditionalTxs.patch (linked below) is an implementation of this approach, which can be applied to commit b672681 on the BitcoinUnlimited dev branch. We ran this patch ("vAdd") on mainnet and processed over 600 blocks. During that time, 597 Graphene blocks were successfully decoded while 13 failed to decode (see link pad_vAdditionalTxs.log below). These numbers are slightly misleading because the first two blocks contained so many additional transactions that they were sent as full blocks; it is highly likely that those blocks would have failed to decode as well. At the same time, we ran a "baseline" Graphene peer that connected to 6 other Graphene peers on mainnet (see link baseline.log below). This peer successfully processed 597 Graphene blocks and experienced 16 failures.

The decode results for "vAdd" represent a modest improvement over "baseline", especially considering that the "vAdd" peer connected to the network through a single peer, which typically increases decode failure rates. Nevertheless, this approach does have some significant limitations.

  1. On the "vAdd" peer, the compression ratios were frequently rather poor (in the single digits); in aggregate the "baseline" peer saved 69.29MB while the "vAdd" peer saved only 41.71MB (~40% less).
  2. More troubling is the fact that "vAdd" does not significantly improve decode failures over "baseline" peers. It seems that for large blocks, the false positive rate of filterInventoryKnown (which is set at 1%) foils the approach. For example, a 500-transaction block is expected to produce 5 false positives. In cases where the receiver is missing a large portion of the block, it becomes quite likely that one or more of the 5 false positives will be among the set of missing transactions (see the sketch after this list).
  3. Graphene peers cannot be gradually upgraded with this approach because a receiver-side change is necessary to prevent erroneous collision detection.
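
To make limitation 2 concrete, here is a back-of-envelope model (an illustration, not code from the patch) that treats each block transaction as an independent false positive of filterInventoryKnown with probability 1%:

```cpp
#include <cmath>
#include <cstdio>

// If the receiver is missing nMissing of the block's transactions, the chance
// that at least one of them is wrongly filtered out (and therefore omitted
// from sM) is 1 - (1 - fpr)^nMissing under this simple independence model.
double ProbMissedByFilter(unsigned int nMissing, double fpr = 0.01)
{
    return 1.0 - std::pow(1.0 - fpr, nMissing);
}

int main()
{
    // A 500-transaction block where the receiver is missing half the block.
    std::printf("%.2f\n", ProbMissedByFilter(250)); // ~0.92
    return 0;
}
```

Under this model, an omission from sM is very likely (~92%) for such a block, which is consistent with the observation that large blocks foil the approach.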

Potential improvements

  1. It is possible that compression could be improved by sending missing transaction cheap hashes instead of the entire transactions.
  2. It might be possible to reduce decode failures by padding the IBLT with a quantity of cells equal to the expected number of false positives (see the sketch below).

Note that improvement 1 would require a change to the Graphene spec to accommodate a vector of missing cheap hashes. The change would also require that all Graphene nodes upgrade, otherwise receivers on the old version of Graphene would not be able to deserialize blocks sent from upgraded senders.
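
As a rough sketch of improvement 2 (not part of the patches linked below), the IBLT's capacity could simply be padded by the expected number of filter false positives among the block's transactions:

```cpp
#include <cmath>
#include <cstdint>

// Pad the symmetric-difference capacity of the IBLT by the number of
// filterInventoryKnown false positives expected among the block's nBlockTxs
// transactions (illustrative only; not implemented in the linked patches).
uint64_t PaddedSymDiff(uint64_t nSymDiff, uint64_t nBlockTxs, double filterFpr = 0.01)
{
    uint64_t nExpectedFalsePositives =
        static_cast<uint64_t>(std::ceil(filterFpr * nBlockTxs));
    return nSymDiff + nExpectedFalsePositives;
}
```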

Pad the IBLT

In this approach, we add the detection logic from Section Comparing mempool sizes to the constructor for CGrapheneSet, which requires that the sender pass his mempool size to the constructor. From there, the receiver's excess (mempool) transactions and missing transactions are estimated. These values are 1) passed to OptimalSymDiff to be used in determining the optimal IBLT size and 2) incorporated into the calculation of the optimal false positive rate for the Bloom filter. The changes are captured in pad_IBLT.patch (linked below), which can be applied to commit b672681 on the BitcoinUnlimited dev branch.
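
The sketch below conveys the idea only; the actual CGrapheneSet constructor and OptimalSymDiff in BitcoinUnlimited have different signatures, and the one-in-ten pass-through rate used here is purely an illustrative placeholder:

```cpp
#include <cstdint>

// Highly simplified sketch of "padIBLT" parameter selection.
struct GrapheneSetParams {
    uint64_t nSymDiff; // items the IBLT must be able to recover
    double fpr;        // Bloom filter false positive rate
};

GrapheneSetParams ChooseParams(uint64_t nExcess,  // nE, estimated as in the earlier sketch
                               uint64_t nMissing) // nM, estimated as in the earlier sketch
{
    GrapheneSetParams params;

    // The IBLT must recover the missing block transactions plus the excess
    // transactions that slip past the Bloom filter. Here we allow roughly one
    // in ten excess transactions through; the real OptimalSymDiff balances
    // Bloom filter and IBLT sizes rather than using a fixed ratio.
    uint64_t nSlipThrough = nExcess / 10;
    params.nSymDiff = nMissing + nSlipThrough;

    // The Bloom filter's false positive rate is the fraction of excess
    // transactions we are willing to let through.
    params.fpr = nExcess > 0 ? static_cast<double>(nSlipThrough) / nExcess : 1.0;
    return params;
}
```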

We ran this "padIBLT" patch concurrently with "vAdd" and "baseline" (see pad_IBLT.log linked below). In total, 611 Graphene blocks were successfully decoded with only 2 decode failures. Compression was also quite good with 58.96MB saved (~15% less than baseline). When compared to the compression rate for Compact Blocks (with 8 bytes per cheap hash), "padIBLT" offers consistently greater compression for all blocks containing more than 200 transactions (see figure pad_IBLT_performance.png below). This can be compared to empirical results (Figure 14), which show that "baseline" Graphene offers superior compression over Compact Blocks when the block contains more than 100 transactions. For larger blocks (e.g. 5K transactions, not shown in the figure), "padIBLT" produces Graphene blocks that are as little as 1/3 the size of the analogous Compact Block, even when order information is included. This is actually slightly better than the empirical results for "baseline". Thus, overall "padIBLT" offers vastly improved decode rates over "baseline" at the expense of slightly degraded compression rates for smaller blocks.

An important benefit to padding the IBLT is that, although the signatures of several methods in CGrapheneBlock and CGrapheneSet must be altered, no changes need be made to the Graphene spec. This means that "padIBLT" can be rolled out incrementally, and every receiver can process blocks from an updated sender, even if the receiver itself is running "baseline" Graphene code.

Links

  1. pad_vAdditionalTxs.patch
  2. pad_vAdditionalTxs.log
  3. baseline.log
  4. pad_IBLT.patch
  5. pad_IBLT.log

gmaxwell commented Dec 1, 2018

Compact blocks do not use 8 bytes per shortid. Compact blocks have never used 8 bytes per shortid. When you are comparing to something using 8 bytes per short id, please don't call it compact blocks; it is very confusing.

DarrenTa commented May 5, 2019

The chart seems to correctly put compact blocks at the 6 bytes/transaction + overhead level.

For completeness, Compact Blocks use six-byte identifiers with a salt to prevent collisions.
