Skip to content

Instantly share code, notes, and snippets.

@karalabe
Last active January 2, 2024 17:24
Show Gist options
  • Save karalabe/28aab3a5296d80cf7616a8b8b859131b to your computer and use it in GitHub Desktop.
Save karalabe/28aab3a5296d80cf7616a8b8b859131b to your computer and use it in GitHub Desktop.
Cleaner transaction propagation (eth/69)

Background

Transactions on Ethereum are currently propagated kind of ad-hoc. When tranactions enter a node's executable pool, they are immediately broadcast to sqrt(peers) and announced to the rest. Remote nodes maintain a set of random transactions hashes they heard about, and after a slight delay waiting for broadcasts, they retrieve non-received ones from the network.

This mechanism is fairly optimal for independent transactions:

  • The broadcast mechanism quickly ripples the full transaction across the network.
  • The announce mechanism notifies any isolated subnet of the transaction's existence.

The mechanism, however, if fairly suboptimal for dependent transactions:

  • Since the broadcast mechanism propagates only to a subset of the peers, it can get full transactions to a node that does not yet have the prerequisite transactions (i.e. previous nonces). This is an issue because the node cannot fully validate the received transaction (does the previous transaction even exit, does the account have enough balance to cover both txs, etc). The node could drop the transaction on the floor, but that would be very bad user experience as transactions would keep disappearing. The only solution is to keep some sort of a limbo pool where dangling transactions pile until they can be linked. With limited in-memory space, this opens up the door to pool wars, where transactions are evicting each other, further worsening the ordering issue.
  • The announce mechanism is also problematic. As the announced hashes have no context around them, from the perspective of the receiving node, it's a bag of random transactions. Furthermore, as certain transactions are broadcast and not announced, we surely know that even announced transactions have gaps among them. The receiver doens't have any information to prioritize retrieving one transaction vs another. If a single announce contains multiple hashes, we could optimistically say that some might depend on others in front of them, but without more infos, at best we could try to make a single sort order across all peers. Retrieving then becomes sequential or we end up with the same random arrival issue.

This proposal aims to tweak the eth protocol a bit to make it easier to convey ordering information, which would allow a node to make smarter decisions on what to retrieve and when, reducing the disappearing transaction issue that's plaguing the network. Furthermore, the proposal attempts to minimize the changes both bandwidth wise as well as client implementation wise (serving would be trivial; consuming could use the current behavior and extend when developers find the time).

Proposed protocol:

The proposal is a new eth protocol (eth/69) with the following changes:

  1. Instead of announcing transactions as a 1D list of hashes (ignoring the size/type metadata, that remains unchanged), the proposal is to announce transactions in a 2D list of hashes:

     [
       [types_A: B, [size_A0: P, size_A1: P, ...], [hash_A0: B_32, hash_A1: B_32, ...]],
       [types_B: B, [size_B0: P, size_B1: P, ...], [hash_B0: B_32, hash_B1: B_32, ...]],
       ...
     ]
    

    The cumulative content of the nested lists above would amount to the same original list eth/68 sent previously, but the proposal aims to split them by sender. The contextual information conveyed is that hash_B1 depends on hash_B0, but does not depend on hash_A1. The bandwidth cost of this layout change is 1-2 bytes per list depending on how many hashes are announced (i.e. essentially free). Implementation wise, clients already need to nonce-sort transactions per account to validate them, so it's just about announcing them as-stored instead of a flat list (i.e. essentially free).

  2. Reduce the use of broadcasts to the first (nonce order) non-included transaction. As most accounts issue transactions one-by-one, there would be no bandwidth or latency change for them, or nodes serving/propagating them. Tansactions that are behind already pooled (but not yet included) ones, broadcasting would be advised against in favor of announcing to everyone. This would slightly increase the bandwidth of the origin node (as it would end up seeding all its direct peers), and it would also slightly increase the propagation latency for the transaction. The upside however would be a high degree of stability in transaction propagation across the entire network. Implementation wise, clients need to broadcast depending on whether the transactions nonce is next for inclusion or not (i.e. essentially free).

  3. Implicitly remove nonce-gaps from announcements. With broadcasts limited to the first non-included transaction per account, announcements will be complete sort-order wise. This would permit remote nodes to schedule their retrievals in a way that's consistent with their pooling order, minimizing unexpeted transactions needed to limbo.

  4. For non-first transactions (i.e. all of the announced ones), also add the "parent" transaction to the list of announcements (i.e. either the broadcasted first one; or an announced one from a previous announcement). This would increase the size of announcements by one hash per group (account), but would allow linking hashes across different announcements as well as to broadcasts. Implementation wise, clients need to announce the parent (pooled transaction) of a newly pooled transaction, essentially free.

The effect of the above proposed protocol changes is that receiving clients will have all the information needed to run a topological sort across the entire set of announced transactions, permitting concurrent retrievals whilst at the same time permitting in-order retrievals that can be immediately pooled instead of having to limbo for later.

The advantage of the proposal is that adding supprt to clients to serve the new protocol is trivial; whereas adding support to consume the new protocol can be delayed to whenever by simply flattening the newly announced hash-lists together, ending up with the old protocol that each client consumes currently.

Open questions

  • What happens with malicious announcements?

    Ethereum being an untrusted network, it is expected that certain nodes will not honour the protocol, either due to bugs or with malicious intent. My random thought is that the transaction dependency graph should be weighed by how many peers announce a certain order, and the topology sort done on the heavyest weights, ignoring any lightweight cycles. In case enough peers are malicious, we would end up with a behavior similar to the current network, but being eclipsed is possibly a bigger issue than then transaction ordering.

  • How to evaluate the protocol vs the current version?

    This is a tougher nut to crack as we would need to have all (most) our peers serving announcements according to the newer model to be able to see how effective they are. Possibly the simplest solution is to just implement the above protocol - being fairly trivial - and run a node ecplised off the network with own-update nodes, seeing how it performs vs. stock Geth.

  • How does the proposal behave in a mixed eth/68 - eth/69 network?

    Whilst trivial to implement and served, it is expected that clients have a certain inertia on rolling out protocol updates; and also node operators have a significantly bigger inertia in updating to new protocols. Apart from linking it to a hard-fork, it is expected that a new protocol will run a long time with only a small subset of supporting peers. How does that impact us? Good question, todo :)

Compatibility

The protocol update does not require a hard fork and can be rolled out by individual clients independently. Serving is trivial to implement and consuming can be made backwards compatible with eth/68.

@xinbenlv
Copy link

Hi @karalabe, a great proposal. One curious question: is there by any chance a slight motivation or relevance of this proposal to possibly building a Directed acyclic graph of candidate TX groups in mempool?

@ariutokintumi
Copy link

gm @karalabe,

Your proposal for improving Ethereum's transaction propagation is insightful and addresses key inefficiencies in the current system. I appreciate the depth of your analysis and the clarity of your proposed solution. I'd like to add some thoughts and considerations, potentially adding value to your work.

  1. The concept of a 'Limbo Mempool' is fascinating and in fact it already exists! I was experimenting a lot with 'future nonce' transactions that reveals intriguing scenarios. When the expected nonce transaction is received, the subsequent transactions are processed in order, like they have huge priority. However, there's a notable "bug" in various clients when this method is exploited with malicious intent (we can go deeper on that if you want).

  2. I strongly agree with your point on the inefficiency of the current announcing/broadcasting methods. To shake things up, I've been advocating for separating the Mempool from nodes. A decentralized network of Mempool providers could optimize resources for every chain, enhancing validators' performance in terms of bandwidth and CPU usage. Imagine a Mempool chain for all the Blockchain nodes to consume.

  3. The assertion that clients need to nonce-sort transactions for validation might be an oversimplification. A different but also efficient approach seems to be waiting list construction based just on gas fees, thereby reducing the need to sort through ~1M pending transactions. This process would focus only on the ~1000 transactions likely to be included in a block, significantly reducing CPU usage (let's say to 0.1%).

  4. Reducing broadcasts to only the first non-included transaction (in nonce order) might induce unexpected network behaviors. For example, if I send transaction N followed by transaction N+1 (which I later want to cancel), and then send an overriding transaction N+1' with higher gas fees but somehow it is sent to a different node (this is totally valid, don't ask me why), there's a risk of unintended processing of N+1 due to propagation delays or issues. This scenario underscores the need for robust, multidirectional Mempool communication for all pending transactions.

  5. Also, from my point of view, while I find this very interesting, I wouldn't focus this into nonces since that will stuck us -the current state of Ethereum- in nonces dependency, something that should be 'fixed' soon. Not shilling, we (Roll a Mate) are working in an algorithm to implement unsorted nonces acceptance on transactions, since we believe the future is with asorted nonces for practical usage.

Thanks for your proposal and your time reading my contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment