jnewbery/tx_relay_peer_prioritization.md

## tx_relay_peer_prioritization.md

      
### Scenario

Bitcoin nodes relay txs to each other over the P2P network. If a node receives
a valid tx from a peer, it adds it to the mempool and relays it to its
other peers. If it receives an invalid tx from a peer, it must decide what action to
take. We distinguish three cases of invalid transaction:

1. A transaction which is valid according to consensus rules, but invalid
   according to the node's policy rules.
2. A transaction which is valid according to long-established consensus rules,
   but invalid according to a recent consensus rule change (there are currently no recent
   changes to consensus rules, so this category doesn't exist, but in
   the event of a future softfork, it will).
3. A transaction which is invalid according to long-established consensus
   rules.

Additionally, we distinguish three types of peers that may send us invalid txs:
(A) Malicious actors which are trying to waste our resources (CPU, bandwidth,
etc.)

(B) Consensus-incompatible nodes (e.g. Bitcoin Cash or similar nodes)

(C) 'Honest' peers that wish to remain in consensus and are relaying txs that
they believe are valid. In practice, these are nodes on old versions of Bitcoin
Core.

Intuitively, we want to disconnect from types (A) and (B) and remain connected
to type (C) to avoid splitting the network when there are changes in policy or
consensus.

### Problem

It isn't easy to distinguish between peers (A), (B) and (C), and distinguishing
between case (1) and cases (2)/(3) consumes additional resources.

In the worst case, a malicious party (A) can construct transactions that make it
appear to be an unupgraded node (C).

### Current behaviour

When we validate a transaction for acceptance to our mempool, we first validate
its scripts against our own policy rules. If any of those script validations fail, we
then revalidate the scripts against the "Mandatory consensus rules" (everything up to the
P2SH softfork) to distinguish between policy-invalid and
long-established-consensus-invalid transactions. We disconnect peers which
send us long-established-consensus-invalid transactions and do not punish peers
which send us policy-invalid transactions.

This achieves the goal of disconnecting from (B) and remaining connected to
(C), but does not protect us from (A). A malicious node can send us
transactions that are expensive to verify, consensus-valid and policy-invalid.
We will do the expensive verification (more than once, since AcceptToMemoryPool
(ATMP) calls CheckInputs() twice more to determine whether the transaction failed
due to a witness mutation) and not disconnect the peer.
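The two-pass check described above can be sketched roughly as follows. This is an illustrative Python model, not Bitcoin Core's code: `classify`, `should_disconnect`, the `check_scripts` callback and the flag names in `POLICY_FLAGS` are all assumptions made for the example. It only shows the decision structure: a policy failure triggers a second validation against the mandatory rules, and only a failure there leads to disconnection.

```python
from enum import Enum, auto
from typing import Callable, FrozenSet


class TxResult(Enum):
    VALID = auto()
    POLICY_INVALID = auto()     # fails policy rules, passes the mandatory rules
    CONSENSUS_INVALID = auto()  # fails even the mandatory rules


# Hypothetical flag sets: placeholder names, not the real flag definitions.
MANDATORY_FLAGS: FrozenSet[str] = frozenset({"P2SH"})
POLICY_FLAGS: FrozenSet[str] = MANDATORY_FLAGS | {"DERSIG", "NULLDUMMY", "WITNESS"}


def classify(tx, check_scripts: Callable[[object, FrozenSet[str]], bool]) -> TxResult:
    """Validate against policy first; on failure, revalidate against mandatory rules."""
    if check_scripts(tx, POLICY_FLAGS):
        return TxResult.VALID
    # Second (extra) validation pass: this is the additional cost the text refers to.
    if check_scripts(tx, MANDATORY_FLAGS):
        return TxResult.POLICY_INVALID
    return TxResult.CONSENSUS_INVALID


def should_disconnect(result: TxResult) -> bool:
    """Only long-established-consensus failures lead to disconnection."""
    return result is TxResult.CONSENSUS_INVALID
```

Note that a peer whose transactions always land in `POLICY_INVALID` makes us pay for at least two validation passes per transaction and is never disconnected, which is the gap for type (A) peers.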

### Suggested improvements

#### Deprioritize traffic from policy-incompatible nodes

We should continue to ban nodes which send us consensus-invalid txs, and
remain connected to nodes which send us policy-invalid txs, but deprioritize
tx traffic from them. There are various ways we could do this:

- visit the node less frequently in the ThreadMessageHandler() loop (i.e. when
  looping through the peers to call ProcessMessages(), only call
  it for the misbehaving node on one in every x passes).
- add a delay to the next time we visit the node every time we receive a
  policy-invalid message.
- rate-limit how many txs we request and process from the peer, and how
  frequently, by adding logic to the CNodeState.TxDownloadState object.
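As a rough illustration of the first two options, the sketch below keeps a small per-peer scheduling record. Everything in it is hypothetical: the `PeerSchedule` class, the doubling of the skip factor, the cap of 16 and the 2-second delay are assumptions chosen for the example, not values proposed in this note or taken from Bitcoin Core.

```python
from dataclasses import dataclass


@dataclass
class PeerSchedule:
    """Hypothetical per-peer state consulted by the message handler loop."""
    skip_factor: int = 1     # process the peer on one in every skip_factor passes
    next_visit: float = 0.0  # earliest time (seconds) we will process the peer again
    _counter: int = 0        # passes seen since the peer was last processed

    def on_policy_invalid(self, now: float, delay: float = 2.0, max_skip: int = 16) -> None:
        # Option 1: visit less often (assumed here: double the skip factor, capped).
        self.skip_factor = min(self.skip_factor * 2, max_skip)
        # Option 2: push back the next visit by a fixed (assumed) delay.
        self.next_visit = max(self.next_visit, now) + delay

    def should_process(self, now: float) -> bool:
        """Called once per loop pass; True if we should process this peer's messages now."""
        if now < self.next_visit:
            return False
        self._counter += 1
        if self._counter >= self.skip_factor:
            self._counter = 0
            return True
        return False
```

A well-behaved peer keeps `skip_factor == 1` and is processed on every pass. Because this state lives with the connection, it is lost when the peer reconnects.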

All of these approaches are limited in their effectiveness by the fact that a
malicious peer can simply disconnect/reconnect to circumvent the
deprioritization.

An alternative approach is not to deprioritize traffic from nodes sending us
policy-invalid transactions, but to prioritize traffic from nodes which aren't:

- when we establish a new connection to a peer, rate-limit tx traffic from
  it to 1 tx/s (max 1 tx in flight, getdata interval of 1 second).
- over the course of one week, linearly increase that rate limit to 20 tx/s.
- whenever the peer sends us a policy-invalid tx, reset the rate limit to
  1 tx/s and start linearly increasing the rate limit again.
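The ramp described above is a linear interpolation between 1 tx/s and 20 tx/s over one week, measured from the later of connection time and the last policy-invalid tx. A minimal sketch, assuming hypothetical names (`allowed_rate`, `PeerRate`) and timestamps in seconds:

```python
WEEK = 7 * 24 * 60 * 60  # ramp duration in seconds
MIN_RATE = 1.0           # tx/s for a new or recently misbehaving peer
MAX_RATE = 20.0          # tx/s after a full week of good behaviour


def allowed_rate(seconds_since_reset: float) -> float:
    """Linearly interpolate from MIN_RATE to MAX_RATE over one week, then hold."""
    frac = min(max(seconds_since_reset, 0.0) / WEEK, 1.0)
    return MIN_RATE + (MAX_RATE - MIN_RATE) * frac


class PeerRate:
    """Hypothetical per-peer tracker for the tx rate limit."""

    def __init__(self, connected_at: float) -> None:
        self.reset_at = connected_at

    def on_policy_invalid(self, now: float) -> None:
        # Restart the ramp from 1 tx/s.
        self.reset_at = now

    def rate(self, now: float) -> float:
        return allowed_rate(now - self.reset_at)

    def getdata_interval(self, now: float) -> float:
        """Seconds between getdata requests implied by the current rate."""
        return 1.0 / self.rate(now)
```

With these numbers, a peer that has behaved for half a week is allowed 10.5 tx/s. Since a reconnecting peer starts again at 1 tx/s, disconnecting gains it nothing under this scheme.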