karalabe/eth69.md Secret

## eth69.md

      
    Raw
  

              eth69.md
            
          
    Background

Transactions on Ethereum are currently propagated kind of ad-hoc. When tranactions enter a node's executable pool, they are immediately broadcast to sqrt(peers) and announced to the rest. Remote nodes maintain a set of random transactions hashes they heard about, and after a slight delay waiting for broadcasts, they retrieve non-received ones from the network.
This mechanism is fairly optimal for independent transactions:

The broadcast mechanism quickly ripples the full transaction across the network.
The announce mechanism notifies any isolated subnet of the transaction's existence.

The mechanism, however, if fairly suboptimal for dependent transactions:

Since the broadcast mechanism propagates only to a subset of the peers, it can get full transactions to a node that does not yet have the prerequisite transactions (i.e. previous nonces). This is an issue because the node cannot fully validate the received transaction (does the previous transaction even exit, does the account have enough balance to cover both txs, etc). The node could drop the transaction on the floor, but that would be very bad user experience as transactions would keep disappearing. The only solution is to keep some sort of a limbo pool where dangling transactions pile until they can be linked. With limited in-memory space, this opens up the door to pool wars, where transactions are evicting each other, further worsening the ordering issue.
The announce mechanism is also problematic. As the announced hashes have no context around them, from the perspective of the receiving node, it's a bag of random transactions. Furthermore, as certain transactions are broadcast and not announced, we surely know that even announced transactions have gaps among them. The receiver doens't have any information to prioritize retrieving one transaction vs another. If a single announce contains multiple hashes, we could optimistically say that some might depend on others in front of them, but without more infos, at best we could try to make a single sort order across all peers. Retrieving then becomes sequential or we end up with the same random arrival issue.

This proposal aims to tweak the eth protocol a bit to make it easier to convey ordering information, which would allow a node to make smarter decisions on what to retrieve and when, reducing the disappearing transaction issue that's plaguing the network. Furthermore, the proposal attempts to minimize the changes both bandwidth wise as well as client implementation wise (serving would be trivial; consuming could use the current behavior and extend when developers find the time).
Proposed protocol:

The proposal is a new eth protocol (eth/69) with the following changes:


Instead of announcing transactions as a 1D list of hashes (ignoring the size/type metadata, that remains unchanged), the proposal is to announce transactions in a 2D list of hashes:
 [
   [types_A: B, [size_A0: P, size_A1: P, ...], [hash_A0: B_32, hash_A1: B_32, ...]],
   [types_B: B, [size_B0: P, size_B1: P, ...], [hash_B0: B_32, hash_B1: B_32, ...]],
   ...
 ]

The cumulative content of the nested lists above would amount to the same original list eth/68 sent previously, but the proposal aims to split them by sender. The contextual information conveyed is that hash_B1 depends on hash_B0, but does not depend on hash_A1. The bandwidth cost of this layout change is 1-2 bytes per list depending on how many hashes are announced (i.e. essentially free). Implementation wise, clients already need to nonce-sort transactions per account to validate them, so it's just about announcing them as-stored instead of a flat list (i.e. essentially free).


Reduce the use of broadcasts to the first (nonce order) non-included transaction. As most accounts issue transactions one-by-one, there would be no bandwidth or latency change for them, or nodes serving/propagating them. Tansactions that are behind already pooled (but not yet included) ones, broadcasting would be advised against in favor of announcing to everyone. This would slightly increase the bandwidth of the origin node (as it would end up seeding all its direct peers), and it would also slightly increase the propagation latency for the transaction. The upside however would be a high degree of stability in transaction propagation across the entire network. Implementation wise, clients need to broadcast depending on whether the transactions nonce is next for inclusion or not (i.e. essentially free).


Implicitly remove nonce-gaps from announcements. With broadcasts limited to the first non-included transaction per account, announcements will be complete sort-order wise. This would permit remote nodes to schedule their retrievals in a way that's consistent with their pooling order, minimizing unexpeted transactions needed to limbo.


For non-first transactions (i.e. all of the announced ones), also add the "parent" transaction to the list of announcements (i.e. either the broadcasted first one; or an announced one from a previous announcement). This would increase the size of announcements by one hash per group (account), but would allow linking hashes across different announcements as well as to broadcasts. Implementation wise, clients need to announce the parent (pooled transaction) of a newly pooled transaction, essentially free.


The effect of the above proposed protocol changes is that receiving clients will have all the information needed to run a topological sort across the entire set of announced transactions, permitting concurrent retrievals whilst at the same time permitting in-order retrievals that can be immediately pooled instead of having to limbo for later.
The advantage of the proposal is that adding supprt to clients to serve the new protocol is trivial; whereas adding support to consume the new protocol can be delayed to whenever by simply flattening the newly announced hash-lists together, ending up with the old protocol that each client consumes currently.
Open questions


What happens with malicious announcements?
Ethereum being an untrusted network, it is expected that certain nodes will not honour the protocol, either due to bugs or with malicious intent. My random thought is that the transaction dependency graph should be weighed by how many peers announce a certain order, and the topology sort done on the heavyest weights, ignoring any lightweight cycles. In case enough peers are malicious, we would end up with a behavior similar to the current network, but being eclipsed is possibly a bigger issue than then transaction ordering.


How to evaluate the protocol vs the current version?
This is a tougher nut to crack as we would need to have all (most) our peers serving announcements according to the newer model to be able to see how effective they are. Possibly the simplest solution is to just implement the above protocol - being fairly trivial - and run a node ecplised off the network with own-update nodes, seeing how it performs vs. stock Geth.


How does the proposal behave in a mixed eth/68 - eth/69 network?
Whilst trivial to implement and served, it is expected that clients have a certain inertia on rolling out protocol updates; and also node operators have a significantly bigger inertia in updating to new protocols. Apart from linking it to a hard-fork, it is expected that a new protocol will run a long time with only a small subset of supporting peers. How does that impact us? Good question, todo :)


Compatibility

The protocol update does not require a hard fork and can be rolled out by individual clients independently. Serving is trivial to implement and consuming can be made backwards compatible with eth/68.