Blob Propagation Issues Mar 27-28th

On March 27-28 the Ethereum network suffered from an extremely high rate of missed slots. Most of these slots were first relayed by the bloXroute relays. We identified that the bloXroute relays worked properly throughout the incident, publishing blocks and blobs correctly. However, they propagated the blocks quickly through the BDN, while the blob sidecars propagated more slowly through the p2p network (the sidecar is expected to propagate slower, and is allowed to be accepted until t=8 sec). This uncovered a specific CL behavior that caused clients to reject these blocks, resulting in missed slots. In the current Lighthouse version, the node is expecting the peer that first provided the block to also provide the blobs. The BDN does not propagate blobs, and that caused the BDN-connected consensus nodes to ignore blocks that were first received from the BDN.

A recent release of the BDN improved the speed of gossiped blocks without blobs, relying on the rest of the p2p network to propagate blobs as needed, which caused the significant increase in missed slots. The BDN relies heavily on Lighthouse, which makes up the majority of our beacon nodes at bloXroute, due to its performance and speed. After the release we witnessed successful block propagation through our BDN and assumed the release was valid. The issue showed up mainly on the bloXroute relays because of their tight coupling with the BDN. The speed at which the BDN provided beacon nodes with the block caused this behavior even when other relays were publishing blocks that bloXroute did not have.

Throughout this time the bloXroute relays were providing blocks with blobs back to validators and also publishing blocks with blobs to our BDN and to our network of beacon nodes. These publish requests returned a 202 response because the beacon nodes had already seen the block from the BDN.
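The sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Lighthouse's actual code: the class `ToyBeaconNode` and all of its method names are hypothetical, and the rule "an HTTP publish for an already-seen block returns 202 and its blobs are not used" is taken from the behavior reported in this write-up rather than from any client source.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBeaconNode:
    """Hypothetical, heavily simplified beacon node (not a real client)."""

    seen_blocks: set = field(default_factory=set)
    blobs_by_block: dict = field(default_factory=dict)

    def on_gossip_block(self, block_root: str) -> None:
        # Block arrives first over gossip (e.g. via the BDN), without blobs.
        self.seen_blocks.add(block_root)

    def on_gossip_blobs(self, block_root: str, blobs: list) -> None:
        # Blob sidecars arrive over the regular p2p network.
        self.blobs_by_block[block_root] = list(blobs)

    def http_publish(self, block_root: str, blobs: list) -> int:
        # Assumption from the report: if the block was already seen,
        # the request is answered with 202 and its blobs are dropped.
        if block_root in self.seen_blocks:
            return 202
        self.seen_blocks.add(block_root)
        self.blobs_by_block[block_root] = list(blobs)
        return 200

    def is_importable(self, block_root: str) -> bool:
        # A block is only usable once both the block and its blobs are known.
        return (
            block_root in self.seen_blocks
            and block_root in self.blobs_by_block
        )


node = ToyBeaconNode()
node.on_gossip_block("0xabc")                            # block races ahead
status = node.http_publish("0xabc", ["blob0", "blob1"])  # relay publish
print(status, node.is_importable("0xabc"))
```

In this model the block stays unusable until `on_gossip_blobs` is called for the same root, which mirrors the situation where the only copy of the blobs was the one sent over HTTP.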

The issue was resolved after a series of tests isolated it to Lighthouse's behavior after seeing a block first through the BDN. We then gradually migrated our relay away from using the BDN for block publishing, and finally disabled the BDN's propagation of any blocks containing blobs.

@djrtwo

djrtwo commented Mar 29, 2024

the sidecar is expected to propagate slower, and is allowed to be accepted until t=8 sec

Sidecars are the same order of magnitude in size as blocks. You would expect them to propagate at roughly the same speed, and there to commonly be a race over which one a particular node gets first in any given slot (the block or an associated blob).

There is not a strict acceptance cut-off at 8s. It is unlikely under normal conditions that a block with late blobs (4s***) would garner attestations and be accepted in the canonical chain but under adverse network conditions, a block whose blobs are found after 4, 8, or even 12+ seconds can be seen as valid and available and integrated into the block tree.

In the current Lighthouse version, the node is expecting the peer that first provided the block to also provide the blobs.

This does not accurately explain the Lighthouse issue. Lighthouse is very well tested to expect race conditions between blocks and blobs on the p2p network and makes no assumptions about which peers send which.

This issue you ran into is only via the HTTP api and not a p2p rule/issue. Thus this is not in relation to "peers" as the HTTP API is not a p2p API. This is in relation to advanced publishing techniques in which you were trying to share blobs directly to nodes you control. These were thus dropped before ever hitting the p2p.

@benhenryhunter

benhenryhunter commented Mar 29, 2024

This issue you ran into is only via the HTTP api and not a p2p rule/issue.

We confirmed multiple times that the blobs were included in every publish request sent to the beacon nodes via HTTP. However, the beacon nodes had all seen the blocks* from p2p first via the BDN, so every response was a 202 because the block had already been seen, and the beacon nodes would not use the requests containing the blobs.

@benhenryhunter

Also, @djrtwo, for more context from Sproul: the behavior is seen when blobs are not provided over p2p.
[Screenshot: Screen Shot 2024-03-29 at 4 32 17 PM]

@michaelsproul

michaelsproul commented Mar 29, 2024

@benhenryhunter I haven't seen any evidence of a Lighthouse bug at the p2p level. This statement of yours is not correct:

In the current Lighthouse version, the node is expecting the peer that first provided the block to also provide the blobs.

What Danny wrote is correct: Lighthouse's issue is at the HTTP level, and is only reachable if the only blobs sent by the relay are sent via HTTP to nodes that have already received the block via gossip. See my full response on Twitter: https://twitter.com/sproulM_/status/1773853486373130708
