amitiuttarwar/transaction_rebroadcasting.md

## transaction_rebroadcasting.md

      
Background: Currently, a node rebroadcasts a transaction only if its own wallet originated it. This is bad for privacy, because a rebroadcast tells peers that the transaction likely belongs to that node.
PR #16698 reworks the rebroadcast logic to significantly improve privacy.
Overview of changes


Instead of the wallet directly relaying transactions to peers, the wallet will submit unconfirmed txns to the node, and the node will apply logic to trigger txn rebroadcasts.
The node will apply rebroadcast conditions to all transactions.
The wallet will attempt to resubmit unconfirmed transactions to the node on a scheduled timer. This is only useful if the txn is dropped from the local mempool before it gets mined.
The mempool tracks locally submitted transactions (wallet & rpc) to ensure they are successfully rebroadcast. Success is defined as receiving a GETDATA for the txn.

New rebroadcast conditions:

Regularly run a fee rate cache job that computes the top of the mempool & stores the min package fee rate a txn needs to be included
When it is time to rebroadcast, calculate the top transactions that are older than 30 mins.
Filter out any txns with fee rate < cached fee rate
Queue remaining set to be sent to peers
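
The conditions above can be sketched in a few lines of Python. This is a simplified illustration, not the code in PR #16698: the `Tx` type and all function names are hypothetical, a per-transaction fee rate stands in for the package fee rate, and applying the age filter before selecting the top of the mempool is an assumption. The 30-minute age and 3/4-block size are the proposed parameter values.

```python
from dataclasses import dataclass

MIN_AGE_SECS = 30 * 60            # only rebroadcast txns older than 30 minutes
MAX_WEIGHT = 3 * 4_000_000 // 4   # 3/4 of a block, in weight units


@dataclass(frozen=True)
class Tx:                         # hypothetical stand-in for a mempool entry
    txid: str
    fee_rate: float               # simplified: ignores ancestor/descendant packages
    weight: int
    entry_time: int               # unix seconds


def top_of_mempool(mempool, max_weight=MAX_WEIGHT):
    """Greedily pick the highest fee rate txns that fit within max_weight."""
    picked, used = [], 0
    for tx in sorted(mempool, key=lambda t: t.fee_rate, reverse=True):
        if used + tx.weight <= max_weight:
            picked.append(tx)
            used += tx.weight
    return picked


def compute_cached_fee_rate(mempool):
    """Run periodically: lowest fee rate among txns at the top of the mempool."""
    return min((tx.fee_rate for tx in top_of_mempool(mempool)), default=0.0)


def rebroadcast_candidates(mempool, cached_fee_rate, now):
    """Run at rebroadcast time: keep old, top-of-mempool txns above the cached rate."""
    old_enough = [tx for tx in mempool if now - tx.entry_time > MIN_AGE_SECS]
    return [tx for tx in top_of_mempool(old_enough)
            if tx.fee_rate >= cached_fee_rate]
```

Because the cached fee rate is computed some time before the rebroadcast runs, a txn in this sketch only survives if it ranks highly at both points in time.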

Params & currently proposed values

These constants are all able to change to adjust rebroadcast behavior.

Frequency of resubmission attempt from wallet to node -> wallet resubmits once / day
Frequency of triggering rebroadcast -> ~ once / hour
Defining highest priority transactions (top of mempool for potential txns to rebroadcast) -> 3/4 block worth of txns based on package fee rate
Define what “recent” transaction means -> only rebroadcast if txn is >30 minutes old.
Frequency of fee rate cache -> 20 minutes

Fundamental concepts & design choices

topic: avoid extreme bandwidth spikes network-wide
current prevention strategies:

poisson distribution of rebroadcast timings per node
filtering logic for rebroadcast candidates
filterInventoryKnown is a per-peer rolling bloom filter that prevents resending invs to the same peer within a short time span.
worst case hard limit from chosen params (currently 3/4 block of txns every ~1 hr)
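
As a rough illustration of the first strategy, Poisson-distributed timings can be produced by drawing each delay from an exponential distribution whose mean is the target interval. The function name and constant below are hypothetical; only the roughly hourly average comes from the proposed params.

```python
import random

AVG_REBROADCAST_INTERVAL_SECS = 60 * 60  # ~ once / hour, from the proposed params


def next_rebroadcast_time(now, rng=random,
                          avg_interval=AVG_REBROADCAST_INTERVAL_SECS):
    """Return `now` plus an exponentially distributed delay.

    Exponential gaps between events give a Poisson process, so nodes that
    start at the same moment still spread their rebroadcasts out over time.
    """
    return now + rng.expovariate(1.0 / avg_interval)
```

Each node draws its delays independently, so rebroadcasts across the network are unlikely to line up into a single spike.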

Monitoring the network

after running the patch for..

10 days, node has only outbound connections -> 30 additional invs sent to peers (28, 2, 1)
8 days, node also accepts incoming connections -> 186 additional invs sent to peers (22, 29, 28, 2, 3, 24, 35, 7, 5, 3,  28)

Since each inv message is 36 bytes, this means...

~1 kb of data sent in 10 days with only outbound connections
~6.5 kb of data sent in 8 days when also accepting incoming connections
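
The arithmetic behind these estimates, using the 36-byte figure above (a 4-byte type plus a 32-byte hash per inv entry) and ignoring message headers:

```python
INV_ENTRY_BYTES = 36  # figure used in the text: 4-byte type + 32-byte hash

outbound_only = 30 * INV_ENTRY_BYTES    # 10 days, outbound connections only
with_inbound = 186 * INV_ENTRY_BYTES    # 8 days, also accepting incoming

print(outbound_only, with_inbound)  # prints "1080 6696"
```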

Other things to monitor

2 rebroadcast nodes connected to each other
How many of these INV messages are actually followed with a GETDATA?

Other resources


PR review club: https://bitcoincore.reviews/16698.html
Conceptual example of how the rebroadcast filters work: https://gist.github.com/amitiuttarwar/17ddf44e28e3de896b9be0139621f6f9

Open questions & solutions

concern: excessive bandwidth usage per node

possible solution -> add a max of [# of rebroadcasts per duration] as a safety net (eg. 1000 txns / hour)
possible solution -> have ability to enable/disable new rebroadcast logic. could also be used for rolling out. downside would be fingerprint abilities, but privacy leak might be minimal. See walletbroadcast=0.
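
A minimal sketch of the first safety net, assuming a sliding time window. The class name and interface are hypothetical; the default of 1000 txns per hour is the example figure above.

```python
from collections import deque


class RebroadcastLimiter:
    """Hypothetical safety net: at most max_count rebroadcasts per window."""

    def __init__(self, max_count=1000, window_secs=3600):
        self.max_count = max_count
        self.window_secs = window_secs
        self._sent = deque()  # timestamps of recent rebroadcasts, oldest first

    def allow(self, now):
        """Return True and record the rebroadcast if the cap is not yet reached."""
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_secs:
            self._sent.popleft()
        if len(self._sent) >= self.max_count:
            return False
        self._sent.append(now)
        return True
```

A node would call `allow()` once per candidate txn and stop queuing rebroadcasts as soon as it returns `False`.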

concern: introducing dependency on mining code

explanation -> bitcoin/bitcoin#16698 (review)
possible solution -> rebroadcast kill switch discussed above.
possible solution -> (proposed in same comment) rebuild mapTx index as a bounded-size priority queue. The current diff of the rebroadcast change is already significant so I'd prefer to avoid this route.

concern: nodes with old policies will always have txns that cannot be mined

explanation -> when policy rules are updated, nodes that are not upgraded can have their mempools cluttered with txns that will never be mined & never expire, since the rebroadcast logic ping pongs the txns between their pools. While this is already the case (it just takes 1 node to rebroadcast), these proposed changes increase these chances.
possible solution -> additional max_rebroadcast_count data structure. It would maintain a blacklist of txids, expiry time, and count (num times I rebroadcasted). The txn would be removed from the list if mined into a block. ATMP would reject a txn if maintained on the blacklist.
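
One way the max_rebroadcast_count structure could look, as a sketch only. The class and method names are hypothetical, and setting the expiry when a txid is first rebroadcast is an assumption the notes above do not specify.

```python
from dataclasses import dataclass


@dataclass
class RebroadcastRecord:
    count: int   # number of times this node rebroadcast the txn
    expiry: int  # unix seconds after which the record is forgotten


class RebroadcastTracker:
    """Hypothetical per-txid rebroadcast counter with expiry."""

    def __init__(self, max_count, ttl_secs):
        self.max_count = max_count
        self.ttl_secs = ttl_secs
        self._records = {}

    def note_rebroadcast(self, txid, now):
        # Assumption: the expiry clock starts at the first rebroadcast.
        rec = self._records.setdefault(
            txid, RebroadcastRecord(0, now + self.ttl_secs))
        rec.count += 1

    def note_mined(self, txid):
        # A confirmed txn no longer needs tracking.
        self._records.pop(txid, None)

    def should_reject(self, txid, now):
        """What an ATMP-style check could consult before accepting txid."""
        rec = self._records.get(txid)
        if rec is None:
            return False
        if now >= rec.expiry:
            del self._records[txid]
            return False
        return rec.count >= self.max_count
```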

concern: one GETDATA is insufficient to ensure a txn was successfully propagated to the network

explanation -> bitcoin/bitcoin#16698 (comment) & bitcoin/bitcoin#16698 (comment)
probably not a big deal. reasoning explained in links.
possible solution -> add a timer. Eg. timer starts when first GETDATA is received & txn is only removed from unbroadcast set after timer is up.
possible solution -> require x number of GETDATA messages before removing from unbroadcast set.
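
Both possible solutions can be expressed with one small tracker, sketched below. The class name, its methods, and the choice to combine the two conditions are assumptions for illustration, not part of the PR.

```python
class UnbroadcastSet:
    """Hypothetical tracker: a txn leaves the set only after enough GETDATAs
    and after a hold timer, started at the first GETDATA, has elapsed."""

    def __init__(self, required_getdata=1, hold_secs=0):
        self.required_getdata = required_getdata
        self.hold_secs = hold_secs
        self._state = {}  # txid -> [getdata_count, time_of_first_getdata]

    def add(self, txid):
        self._state.setdefault(txid, [0, None])

    def on_getdata(self, txid, now):
        entry = self._state.get(txid)
        if entry is None:
            return  # not a locally submitted txn we are tracking
        entry[0] += 1
        if entry[1] is None:
            entry[1] = now

    def prune(self, now):
        """Remove and return txids that satisfy both conditions."""
        done = [
            txid for txid, (count, first) in self._state.items()
            if count >= self.required_getdata
            and first is not None
            and now - first >= self.hold_secs
        ]
        for txid in done:
            del self._state[txid]
        return done

    def __contains__(self, txid):
        return txid in self._state
```

With the defaults (`required_getdata=1`, `hold_secs=0`) this reduces to the current success definition of a single GETDATA.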

Follow up PRs


persist the unbroadcast txn set to mempool.dat
remove m_best_block_time
fix circular dependency introduced between txmempool & miner


Other stuff


There is an inherent tradeoff between defining param [top of mempool] vs param [age of transactions filtered out]. The values I have proposed opt to reduce the former to allow leniency in the latter. Making more recent transactions eligible enables txns evicted from the mempool during volatility to be rebroadcast and thus confirmed sooner.


Open question: what are the privacy implications when nodes have varied mempool expiry settings? For example, is it a privacy leak if you expire txns quicker, your wallet resubmits the txn, and you rebroadcast sooner than the default 2-week expiration? -> While it would be great to understand the fingerprint possibilities, this is not a blocker because the privacy leak is dramatically less than the current situation.


Can the compact blocks relay code be used to minimize the data set? -> we'd need to introduce a different P2P message to indicate these are mempool transactions. Concerns about bandwidth usage can be addressed with simpler solutions.


What are the implications of empty blocks? -> TODO