
@karalabe

karalabe/ethx.md

Last active July 11, 2024 09:22
Execution layer cross-validation

Changelog

  • 2024.06.26: Convert witness to fully opaque RLP on the engine API
  • 2024.06.25: First detailed draft with specs and benchmarks
  • 2023.11.16: First braindump with the base idea

Background

Client diversity in Ethereum is exceedingly important due to the aggressive slashing penalties: in case of a consensus error, the more validators are in the wrong, the heavier the penalties are. Even worse, if a majority of validators are in the wrong, the bad chain can get finalized, leading to gnarly governance issues around how to recover from the error, with the majority validators having perverse incentives not to. Such an event could have a chilling effect on Ethereum adoption as a whole.

The standard solution to the problem is client diversity: making sure that every client flavor in the network (consensus and execution alike) has a market share below 50%. In case of a consensus error, the faulty clients would get penalized, but the penalties wouldn't be exorbitant, and they wouldn't have a detrimental effect on the network as a whole. This approach worked well on the consensus layer; however, the CL clients were all relatively new, with similar performance profiles.

On the execution layer, things are a bit more complicated (at least for now). Although it's not clear how large an edge Geth has over other clients market-share-wise (the stats are unreliable at best), it is generally accepted that Geth dominates the other clients. This places both Geth users and the entire network into a higher risk bracket than ideal. The usual mantra of "use a minority client" helps a bit, but switching can surface hidden incompatibilities, different resource requirements, new monitoring systems, etc. Doable, but less than ideal.

A better solution is to run more than one client side by side, and cross-reference blocks between them. This is an approach "expected" of large players (staking pools, high risk providers), but it is also an expensive solution, both hardware- and effort-wise: every client has its own quirks that the operator needs to be aware of, and its own maintenance burden. A non-issue for dedicated teams, but a definite issue from a decentralization perspective.

Pushing for diversity seems like a necessity, but the practicalities make it somewhat unrealistic, at least in the short term. We do, however, need a short term solution too.

Rethinking the problem

The only solution for true resilience is verifying blocks with multiple clients. For most users, however, running multiple clients is unrealistic. Theory and practice seem to be at odds with one another, until we realise that we're not in a binary situation. Instead of either running one client or running many clients, there is a third option: running one client but verifying with all clients (without actually running them). A wha'? 🤔

The observation is that verifying a block is actually quite cheap. 100-200ms worth of computation is enough for most clients running on most hardware. So from a computational perspective - taking into account that any remotely recent computer has ample CPU cores - verifying a block with one client or all clients is kind of the same.

The problem is not CPU or memory or even networking. The problem is state. Even though it takes 100ms to verify a block with almost any client, keeping the necessary state around to serve the verification is what makes it expensive. Except... it is absolutely redundant to maintain the state N times: it's the exact same data, just stored a bit differently in each client. We, of course, cannot share the state across clients, so it seems we're back to square one... or so it seems...

Whilst it is true that we cannot share a 250GB state across clients (ignoring historical blocks now), there's also no need to do such a thing. Executing a single block only needs a couple hundred accounts and storage slots to be available, and verifying the final state roots only requires the proofs for those state items. In short, we don't need to share an entire state with other clients to verify a block, we just need to share (or rather send) a witness to them.

This changes everything...

Diversity, without diversity

Instead of asking people to run a minority client (may be inconvenient), or asking them to run multiple clients (may be expensive), we can let them use whatever client they fancy, and only ask them to cross-validate with other clients, statelessly.

From a high level perspective, block validation within the execution client consists of running a bundle of transactions and then comparing the produced results with the expected ones according to the block header.

The proposal is to extend this very last step a bit. Instead of just running the transactions and considering the results final, the user's client would at the same time also create a witness for the block as it is executing it. After execution finishes, we propose to add an extra cross validation step, sending the witness to a variety of other clients to run statelessly.

Aggregating the result from them, if all (or most) clients agree with the host, the result can be transmitted to the consensus client. On the other hand, if there are multiple cross-validating clients disagreeing, the user's client would refuse to accept and attest the block.
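To make the aggregation step concrete, here is a minimal Go sketch of a quorum check. `Validator`, `ExecuteStateless`, `crossValidate` and the `quorum` parameter are hypothetical names used only for illustration; the text leaves the exact policy open ("all (or most)"), so the threshold is the caller's choice here. A validator that errors out (e.g. times out) is counted as disagreeing.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Verdict is a hypothetical result returned by one stateless cross-validator.
type Verdict struct {
	Client    string
	StateRoot [32]byte
	Err       error
}

// Validator is a hypothetical client able to run a block statelessly from a
// witness and return the post-state root it computed.
type Validator interface {
	Name() string
	ExecuteStateless(block, witness []byte) ([32]byte, error)
}

// crossValidate runs the witness on all validators concurrently and accepts
// the host's root only if at least `quorum` validators reproduce it.
func crossValidate(host [32]byte, block, witness []byte, vals []Validator, quorum int) error {
	verdicts := make([]Verdict, len(vals))
	var wg sync.WaitGroup
	for i, v := range vals {
		wg.Add(1)
		go func(i int, v Validator) {
			defer wg.Done()
			root, err := v.ExecuteStateless(block, witness)
			verdicts[i] = Verdict{Client: v.Name(), StateRoot: root, Err: err}
		}(i, v)
	}
	wg.Wait()

	agree := 0
	for _, vd := range verdicts {
		if vd.Err == nil && vd.StateRoot == host {
			agree++
		}
	}
	if agree < quorum {
		return fmt.Errorf("only %d/%d cross-validators agree, need %d", agree, len(vals), quorum)
	}
	return nil
}

// fixedValidator is a stand-in that always returns the same answer.
type fixedValidator struct {
	name string
	root [32]byte
	err  error
}

func (f fixedValidator) Name() string { return f.name }
func (f fixedValidator) ExecuteStateless(_, _ []byte) ([32]byte, error) {
	return f.root, f.err
}

func main() {
	good, bad := [32]byte{1}, [32]byte{2}
	vals := []Validator{
		fixedValidator{"a", good, nil},
		fixedValidator{"b", good, nil},
		fixedValidator{"c", bad, nil},
		fixedValidator{"d", good, errors.New("timeout")},
	}
	fmt.Println(crossValidate(good, nil, nil, vals, 2)) // <nil>
	fmt.Println(crossValidate(good, nil, nil, vals, 3)) // only 2/4 cross-validators agree, need 3
}
```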

Technicalities

There are a number of problems (with different tradeoffs) that need to be solved for cross-client validation:

  • A witness bundle needs to be defined that contains all the information EL clients need to statelessly run a block and derive the resulting post-state root hash. Ideally this needs to be completely trustless; in practice, some tradeoffs are needed.
  • Clients need to implement gathering both read and write witnesses during block execution. The catch here is that different orderings of trie operations produce different witnesses, so clients need to agree on the order in which updates are applied.
  • Clients need to implement running a block execution backed solely by a witness, which means that certain operations (e.g. BLOCKHASH) need to be abstracted away from accessing the local chain. The witness should force clients to verify its content.
  • Clients need a way to run in "stateless verification" mode, where they receive encoded witnesses over some transport and respond with verification replies without incurring unproductive runtime costs (e.g. startup, warmup, etc).

Witness content

The witness needs to contain all the data needed to execute a block (doh). This means the accessed account/storage trie nodes (read/written slots, or siblings touched during inserts/deletes), the accessed bytecodes (via contracts run or EXTCODESIZE/EXTCODECOPY) and the accessed block hashes (via BLOCKHASH). Whilst the block being executed is naturally also needed, it would not be part of the witness; rather, it would live alongside it.

Some caveats to the above:

  • The block being executed should have its stateRoot and receiptsRoot fields zeroed out. The rationale is that we want the verifier to actually execute the block. If the roots were retained, the cross validator could fail in a number of different ways and still erroneously report that everything is fine. Removing these two fields ensures that the witness was sufficient and correct, and that EVM cross-execution produced the same results.
  • The accessed block hashes would be inserted into the witness as the past N headers (from the parent back to the earliest one accessed). We could have added just the accessed block hashes verbatim, but that would make them trusted (no way to verify them); whereas with a chain of headers, at worst we'd need 256 of them to reach the earliest BLOCKHASH-accessible block and still be able to verify it (so 128KB-ish). Long term, with EIP-2935 pushing the hashes into the state and a follow-up fork switching BLOCKHASH over to use it, this field will become obsolete (i.e. don't over-engineer now).
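The header-based BLOCKHASH design can be sketched as follows: witness headers only become trusted once they form an unbroken hash chain anchored at the executed block's parentHash. The same check also pins down the parent header's state root, which is where the state walk later starts. `Header` is a heavily simplified, hypothetical stand-in, and SHA-256 over a fixed byte layout replaces the real keccak256(rlp(header)) purely to keep the sketch dependency-free.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// Header is a hypothetical, minimal stand-in for an Ethereum block header.
type Header struct {
	ParentHash [32]byte
	Number     uint64
	StateRoot  [32]byte
}

// Hash stands in for keccak256(rlp(header)); SHA-256 keeps it stdlib-only.
func (h *Header) Hash() [32]byte {
	var num [8]byte
	binary.BigEndian.PutUint64(num[:], h.Number)
	buf := make([]byte, 0, 72)
	buf = append(buf, h.ParentHash[:]...)
	buf = append(buf, num[:]...)
	buf = append(buf, h.StateRoot[:]...)
	return sha256.Sum256(buf)
}

// verifyAncestry checks that the witness headers (0=parent, 1=grandparent, ...)
// chain back from the executed block's parentHash. Only then are the parent's
// state root (the pre-root) and the number->hash map for BLOCKHASH trusted.
func verifyAncestry(parentHash [32]byte, headers []*Header) ([32]byte, map[uint64][32]byte, error) {
	if len(headers) == 0 {
		return [32]byte{}, nil, errors.New("witness must contain at least the parent header")
	}
	if len(headers) > 256 {
		return [32]byte{}, nil, errors.New("witness contains more than 256 headers")
	}
	hashes := make(map[uint64][32]byte, len(headers))
	want := parentHash
	for i, h := range headers {
		got := h.Hash()
		if got != want {
			return [32]byte{}, nil, fmt.Errorf("header %d (block %d): hash mismatch", i, h.Number)
		}
		hashes[h.Number] = got
		want = h.ParentHash
	}
	return headers[0].StateRoot, hashes, nil
}

func main() {
	grand := &Header{Number: 9}
	parent := &Header{Number: 10, ParentHash: grand.Hash(), StateRoot: [32]byte{0xaa}}
	root, hashes, err := verifyAncestry(parent.Hash(), []*Header{parent, grand})
	fmt.Println(err == nil, root == parent.StateRoot, len(hashes)) // true true 2

	// Swapping in a different pre-root changes the parent's hash and is caught.
	forged := *parent
	forged.StateRoot = [32]byte{0xbb}
	_, _, err = verifyAncestry(parent.Hash(), []*Header{&forged, grand})
	fmt.Println(err != nil) // true
}
```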

For the state, in its most abstract form, we need to gather a soup of MPT nodes that will act as our data source plus proofs for the state root calculations, and we need the starting root hash to define the trie. The cross validating client can then walk the received MPT from the root through the available children and convert it into its own local node representation (e.g. Geth uses MPTs keyed by path, so it would walk the witness and populate an in-memory pathdb).

The pre-root hash, however, is not added to the witness standalone, as a malicious trusted pre-root would allow arbitrary post-root to be derived. Rather, the pre-root is shipped inside the parent header, which is always included in every witness. That allows the current block to be linked to the parent block and the post-state to be linked to the pre-state trustlessly.

  • One caveat is that Geth could just as easily generate witnesses not as a trie node soup, but with the trie nodes keyed by MPT path (i.e. the trie structure would also be part of the witness). The downside of sending over the structure too is that cross validating clients would have the "opportunity" to skip validating the MPT and rely on the paths for quick, direct lookups, potentially leading to undetected consensus faults. Omitting the explicit structure forces clients to walk the MPT, thereby also verifying it.
  • Another caveat is that Verkle will most probably behave completely differently. Verkle might need trie nodes actually keyed by path, some additional proofs might be needed, and by that time codes (and possibly block hashes too) will be part of the trie. As such, even though we could make things a bit more Verkle friendly now, a redesign is going to be needed either way, so it's cleaner to focus on MPT now and add versioning capability for future changes.
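A minimal sketch of why a path-less node soup forces verification: if nodes are only reachable by their own hash, every lookup while walking down from the pre-root can only return the exact node its parent committed to, and a missing node means the witness was insufficient. `NodeSet` and `Resolve` are hypothetical names. Two simplifications apply: real MPT nodes are hashed with keccak256 (SHA-256 is used here to stay stdlib-only), and nodes shorter than 32 bytes are embedded inline in their parent rather than referenced by hash, which a real walker must also handle.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// NodeSet is a hypothetical hash-keyed index over the witness "state" field.
type NodeSet map[[32]byte][]byte

// hashNode stands in for keccak256 over the RLP-encoded trie node.
func hashNode(blob []byte) [32]byte { return sha256.Sum256(blob) }

// NewNodeSet indexes every node in the soup by its own hash.
func NewNodeSet(state [][]byte) NodeSet {
	set := make(NodeSet, len(state))
	for _, blob := range state {
		set[hashNode(blob)] = blob
	}
	return set
}

// Resolve returns the node a parent referenced by hash, or an error if the
// witness did not include it (i.e. the witness was insufficient).
func (s NodeSet) Resolve(hash [32]byte) ([]byte, error) {
	blob, ok := s[hash]
	if !ok {
		return nil, fmt.Errorf("missing trie node %x", hash[:4])
	}
	return blob, nil
}

func main() {
	child := []byte("leaf node")
	nodes := NewNodeSet([][]byte{[]byte("root node"), child})

	blob, err := nodes.Resolve(hashNode(child))
	fmt.Println(string(blob), err) // leaf node <nil>

	_, err = nodes.Resolve(hashNode([]byte("not in witness")))
	fmt.Println(err != nil) // true
}
```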

With all of the above said and done, the proposed witness content is:

```go
type Witness struct {
	headers []*Header // Past headers in reverse order (0=parent (always present), 1=parent's-parent, etc)
	codes   [][]byte  // Set of bytecodes run or accessed
	state   [][]byte  // Set of MPT state trie nodes (account and storage together)
}
```

Its RLP encoding (just the standard RLP encoding of the above, spelled out for clarity):

```
witness = [headers, codes, state]
header  = [
    parent-hash:      B_32,
    ommers-hash:      B_32,
    coinbase:         B_20,
    state-root:       B_32, // Zeroed out in the executed block only
    txs-root:         B_32,
    receipts-root:    B_32, // Zeroed out in the executed block only
    bloom:            B_256,
    difficulty:       P,
    number:           P,
    gas-limit:        P,
    gas-used:         P,
    time:             P,
    extradata:        B,
    mix-digest:       B_32,
    block-nonce:      B_8,
    basefee-per-gas:  P,
    withdrawals-root: B_32,
]
headers = [header₁, header₂, ...]
codes   = [B₁, B₂, ...]
state   = [B₁, B₂, ...]
```

The witness will end up containing thousands of trie nodes. Encoding those as individual items in any textual format (e.g. JSON) would perform very poorly. As such, we RLP-encode the witness first, and wrap the result as an opaque binary blob in any outer format that doesn't need to look inside it.

To digress a bit, there have been very complex witness formats proposed in the past (e.g. Erigon's Block Witness Format Specification). As long as we're in MPT world, the witnesses will be too large for internet transmission (i.e. stateless clients), so spending significant time on over-optimizing them defeats the purpose. Our goal is to run with something minimally viable and expand if/when needed. The format proposed in this doc is minimally complex, potentially trading off some local inter-process bandwidth. Again, we can always do V+1 when needed.

Witness gathering

Creating the witness isn't particularly hard, but there are some correctness and performance gotchas.

The accessed bytecodes are fairly simple, but care needs to be taken to aggregate all occurrences. Specifically, you can hit a new bytecode via calls, as in CALL (and naturally, an outer tx), CALLCODE, DELEGATECALL and STATICCALL. Other opcodes that can hit new contracts are EXTCODESIZE and EXTCODECOPY. The ops CODESIZE and CODECOPY are not needed because they operate on already accessed code; EXTCODEHASH does not trigger code access as it resolves via the account content, not the code content.
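The opcode rules above could be expressed as a small classifier plus a content-deduplicating set. `loadsTargetCode` and `codeSet` are hypothetical hook names, not Geth's tracer API; the opcode byte values are the standard EVM ones. The outer transaction's target code would be added the same way before execution starts, which this sketch does not model.

```go
package main

import "fmt"

// Standard EVM opcode values relevant to code access.
const (
	CODESIZE     byte = 0x38
	CODECOPY     byte = 0x39
	EXTCODESIZE  byte = 0x3b
	EXTCODECOPY  byte = 0x3c
	EXTCODEHASH  byte = 0x3f
	CALL         byte = 0xf1
	CALLCODE     byte = 0xf2
	DELEGATECALL byte = 0xf4
	STATICCALL   byte = 0xfa
)

// loadsTargetCode reports whether op pulls another account's bytecode into
// the witness. CODESIZE/CODECOPY act on the already-loaded running code, and
// EXTCODEHASH is answered from the account's code hash field.
func loadsTargetCode(op byte) bool {
	switch op {
	case CALL, CALLCODE, DELEGATECALL, STATICCALL, EXTCODESIZE, EXTCODECOPY:
		return true
	}
	return false
}

// codeSet collects accessed bytecodes, deduplicated by content, since the
// witness only needs each distinct code once.
type codeSet map[string]struct{}

func (s codeSet) onOpcode(op byte, targetCode []byte) {
	if loadsTargetCode(op) {
		s[string(targetCode)] = struct{}{}
	}
}

func main() {
	set := codeSet{}
	code := []byte{0x60, 0x80, 0x60, 0x40} // some contract bytecode
	set.onOpcode(CALL, code)
	set.onOpcode(STATICCALL, code)          // same code, deduplicated
	set.onOpcode(EXTCODEHASH, []byte{0xfe}) // hash only, no code needed
	fmt.Println(len(set))                   // 1
}
```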

To generate the witness for BLOCKHASH (and make it trustless), the client needs to track the earliest block accessed, and then collect all the headers from the chain head back to that earliest one (capped at 256, naturally).
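Tracking the earliest BLOCKHASH access and sizing the header list might look like the sketch below. `blockhashTracker` is a hypothetical helper. It relies on BLOCKHASH only returning non-zero hashes for the 256 most recent blocks, so requests outside that window (or for the current block) need no header. The parent header is always included.

```go
package main

import "fmt"

// blockhashTracker records the oldest block whose hash was read via
// BLOCKHASH while executing block `current`.
type blockhashTracker struct {
	current  uint64
	earliest uint64
	accessed bool // distinguishes "never accessed" from "accessed block 0"
}

// record notes a BLOCKHASH(num) call. Out-of-window lookups yield zero in the
// EVM and therefore need no header in the witness.
func (t *blockhashTracker) record(num uint64) {
	if num >= t.current || t.current-num > 256 {
		return
	}
	if !t.accessed || num < t.earliest {
		t.earliest, t.accessed = num, true
	}
}

// headerCount is how many headers the witness carries, parent first: just the
// parent if BLOCKHASH was unused, otherwise everything back to the earliest.
func (t *blockhashTracker) headerCount() uint64 {
	if !t.accessed {
		return 1
	}
	return t.current - t.earliest
}

func main() {
	tr := &blockhashTracker{current: 1000}
	fmt.Println(tr.headerCount()) // 1
	tr.record(995)
	tr.record(990)
	fmt.Println(tr.headerCount()) // 10
	tr.record(1000) // current block: BLOCKHASH yields zero
	tr.record(743)  // 257 blocks back: out of window
	fmt.Println(tr.headerCount()) // 10
	tr.record(744) // oldest hash still reachable
	fmt.Println(tr.headerCount()) // 256
}
```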

Lastly, the state, which is slightly more complex:

  • Most clients nowadays have some sort of acceleration structure to serve account accesses and SLOADs directly, without touching the state trie (the state snapshot in the case of Geth). The witness, however, has to contain reads too, so clients do need to access the MPT during execution. To avoid a very hard performance hit, Geth has a background trie prefetcher (running concurrently across the different tries being accessed). While the EVM is executing (reading data from the snapshots), every accessed account and slot is scheduled for background loading. FWIW, the same happens for writes too: Geth always preloads the trie nodes for them (not just when building witnesses) to accelerate hashing. At the end of the block, we wait for all prefetcher threads to complete before calculating the post-state root hash.
  • Pre-fetching trie nodes during execution is a wonderful way to speed up witness creation, but it's important to note that the prefetched tries may be incomplete. Before the block is finalized and the final root hash can be computed, the self-destructed accounts and deleted slots are removed from the tries. This can end up collapsing trie paths from full nodes to short nodes, which requires sibling trie nodes to be accessed for the hashing. Trie insertions on nearby paths might also interfere, causing different siblings to be accessed depending on whether the delete or the insert happens first. To make this part of the witness deterministic, clients need to apply deletions first and updates afterwards (I think this produces smaller witnesses than the other way around, i.e. applying updates first and deletions afterwards).
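A sketch of the deletions-first rule, assuming pending writes for one trie are collected in a map where an empty value means deletion (storing zero into a storage slot deletes it). `trieMutations` and `applyOrder` are hypothetical names. Sorting keys within each group is an extra assumption made here so the order does not depend on Go's randomized map iteration; the text itself only prescribes deletions before updates.

```go
package main

import (
	"fmt"
	"sort"
)

// trieMutations maps trie keys to new values; an empty (or nil) value marks
// a deletion, such as a self-destructed account or a zeroed storage slot.
type trieMutations map[string][]byte

// applyOrder returns the keys in the order they should be applied to the
// trie: every deletion first, then every update, each group sorted by key.
func applyOrder(muts trieMutations) []string {
	var deletes, updates []string
	for k, v := range muts {
		if len(v) == 0 {
			deletes = append(deletes, k)
		} else {
			updates = append(updates, k)
		}
	}
	sort.Strings(deletes)
	sort.Strings(updates)
	return append(deletes, updates...)
}

func main() {
	muts := trieMutations{
		"slot-b": {0x01},
		"slot-a": nil, // deleted
		"slot-c": {},  // zeroed, also a deletion
		"slot-0": {0x02},
	}
	fmt.Println(applyOrder(muts)) // [slot-a slot-c slot-0 slot-b]
}
```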

We ran a rough, 3.5-hour, follow-the-live-chain benchmark on Ethereum mainnet on 24th June 2024 (around block 20162050) with two Geth instances. The baseline instance followed the chain without witness collection, whereas the benchmarked one had witness collection enabled but discarded the collected data afterwards. The difference (the extra trie reads) was a block processing overhead of 20ms (about 21%): 93ms vs 113ms on average.

The chart above is only a rough visualization (hence the negative values), because measurements from different machines with different reporting timestamps had to be merged.

Witness execution

The purpose of executing a witness (or rather, of statelessly executing a batch of transactions based on a witness and a given block context) is to verify whether an EVM implementation reaches the same execution results as a different EVM implementation, before accepting a newly built or propagated block. To guarantee proper execution across multiple EVM validators, we deliberately withhold the state root and the receipts root of the to-be-validated block, and instead require stateless executors to return them to the caller.

Other than that, however, stateless witness execution is 100% equivalent to a full node running a block, and ideally, should be implemented with the exact same code that the live client is running. That is indeed what we have done within Geth itself. When receiving a Witness to execute, Geth creates an in-memory database with the contents of the witness (parent headers, bytecodes, state tries); and runs the production go-ethereum chain code on top as if it was a real disk database.

If the client you are working on can represent the MPT state tries keyed by hash (the legacy representation, which is also the consensus one), then creating the database is trivial: just dump the witness to disk. If your client's EVM code relies on acceleration structures (e.g. Geth's snapshots and its newer pathdb MPT layout), then those structures may need to be created and seeded into the database too. But as long as your client's data access pathways are clean, it should not be hard to fake a database.

Another alternative would be to have the witness itself act as a database for your EVM, or create a separate tool altogether for running EVM on top of a witness. For go-ethereum, we have explicitly foregone that path to ensure that stateless execution is exactly the same as full-node production execution.

We ran a rough, 2-hour, follow-the-live-chain benchmark on Ethereum mainnet on 24th June 2024 (around block 20163650) with two Geth instances. The baseline instance followed the chain without witness collection (snapshots on, pathdb on), whereas the benchmarked one had witness collection enabled, on top of which it did a self-stateless execution (snapshots off, pathdb off). We only measured the execution part in this test (witness creation was covered in the previous section).

Interestingly, stateless execution seems to have run approximately 15ms slower (about 16%) in these benchmarks (91ms vs 106ms on average). Our hunch is that this is simply due to the stateless execution code running in Geth's legacy hashdb mode (with no acceleration structures), vs. mainline Geth using pathdb tries and the snapshot as a direct state accelerator. More work is needed to find the bottlenecks when the state is this hot in memory, yet still served from a faked database.
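Both benchmark overheads quoted above are relative to the baseline average. This trivial helper makes that explicit; its only inputs are the averages reported in the text.

```go
package main

import "fmt"

// overhead returns the absolute (ms) and relative (%) slowdown of a measured
// average block processing time against a baseline average.
func overhead(baselineMs, measuredMs float64) (deltaMs, pct float64) {
	deltaMs = measuredMs - baselineMs
	return deltaMs, 100 * deltaMs / baselineMs
}

func main() {
	d, p := overhead(93, 113) // witness collection benchmark
	fmt.Printf("witness collection: +%.0fms (%.1f%%)\n", d, p) // +20ms (21.5%)
	d, p = overhead(91, 106) // stateless execution benchmark
	fmt.Printf("stateless execution: +%.0fms (%.1f%%)\n", d, p) // +15ms (16.5%)
}
```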

Cross-validation integration

Whilst witness generation and execution are obviously the task of execution layer clients, integrating everything together opens up an interesting design possibility.

It might seem like a good first attempt to keep everything within ELs. We could define a new validation protocol through which multiple ELs could talk to one another (or rather, a full EL to some stateless counterparts), sending witnesses and validation results back and forth. Whilst this wouldn't be complicated in theory, in practice there have been so many DoS issues across everything in the past that nobody has the appetite for yet another thing that can break.

The proposal, thus, is to move the integration to the boundary of the EL and CL clients, specifically into the engine API. By piggy-backing on existing infra, we avoid two thorny issues: DoS and configuration overhead. The only question is whether the functionality we need can fit naturally into the engine API without forcing unnecessary complexity.

Engine API specs

And the answer is a resounding yes! Whilst any final design would, naturally, rely on reaching consensus across a lot of ecosystem actors, a starting point is presented below.

We need one small helper data type for the stateless execution results. The witness itself will be passed back and forth on the engine API as an opaque binary blob, since CLs don't need to interpret its contents at all.

```js
// statelessPayloadStatusV1 is the result of a stateless payload execution.
var statelessPayloadStatusV1 = {
	status:          "same as payloadStatusV1.status",
	stateRoot:       "0x0000000000000000000000000000000000000000000000000000000000000000",
	receiptsRoot:    "0x0000000000000000000000000000000000000000000000000000000000000000",
	validationError: "same as payloadStatusV1.validationError",
}
```

We can define some new methods on the engine API that fit naturally and enable all the use-cases we are interested in. Again, the approach taken was to allow shipping these without even potentially interfering with production calls. A final, agreed-upon integration might have a tighter coupling.

  • Add forkchoiceUpdatedWithWitnessV1,2,3 with the same params and returns as forkchoiceUpdatedV1,2,3, but additionally triggering stateless witness creation if block production is requested.
  • Extend getPayloadV2,3 to return executionPayloadEnvelope with an additional witness field of type bytes iff the payload was created via forkchoiceUpdatedWithWitnessV2,3.
  • Add newPayloadWithWitnessV1,2,3 with the same params and returns as newPayloadV1,2,3, but additionally triggering stateless witness creation during payload execution to allow cross validating it.
  • Extend payloadStatusV1 with a witness field of type bytes when returned by newPayloadWithWitnessV1,2,3.
  • Add executeStatelessPayloadV1,2,3 with the same base params as newPayloadV1,2,3 plus one additional witness param of type bytes. The method returns statelessPayloadStatusV1, which mirrors payloadStatusV1 but replaces latestValidHash with stateRoot and receiptsRoot.

Engine API usage

The above primitives permit us to enable the following use-cases:

  • Cross validating locally created blocks:

    • Call forkchoiceUpdatedWithWitness instead of forkchoiceUpdated to trigger witness creation too.
    • Call getPayload as before to retrieve the new block and also the above created witness.
    • Call executeStatelessPayload against another client to cross-validate the block.
  • Cross validating locally processed blocks:

    • Call newPayloadWithWitness instead of newPayload to trigger witness creation too.
    • Call executeStatelessPayload against another client to cross-validate the block.
  • Block production for stateless clients (local or MEV builders):

    • Call forkchoiceUpdatedWithWitness instead of forkchoiceUpdated to trigger witness creation too.
    • Call getPayload as before to retrieve the new block and also the above created witness.
    • Propagate witnesses across the consensus libp2p network for stateless Ethereum.
  • Stateless validator validation:

    • Call executeStatelessPayload with the propagated witness to statelessly validate the block.
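The "cross validating locally processed blocks" flow could look roughly like this in Go. `FullEL`, `StatelessEL`, their methods and the fake clients are hypothetical stand-ins for real engine API connections. The sketch withholds both roots before handing the block to each cross-validator and requires every one of them to agree, which is the strictest of the "all (or most)" policies mentioned earlier.

```go
package main

import (
	"errors"
	"fmt"
)

// Payload keeps only the header roots this sketch compares.
type Payload struct {
	StateRoot    [32]byte
	ReceiptsRoot [32]byte
}

// PayloadStatus is a trimmed payloadStatusV1 extended with a witness.
type PayloadStatus struct {
	Status  string
	Witness []byte
}

// StatelessStatus is a trimmed statelessPayloadStatusV1.
type StatelessStatus struct {
	Status       string
	StateRoot    [32]byte
	ReceiptsRoot [32]byte
}

type FullEL interface {
	NewPayloadWithWitness(p Payload) (PayloadStatus, error)
}

type StatelessEL interface {
	ExecuteStatelessPayload(p Payload, witness []byte) (StatelessStatus, error)
}

// processAndCrossValidate imports a block via the full EL while creating a
// witness, has each stateless EL re-execute it with both roots withheld, and
// requires the returned roots to match the block header.
func processAndCrossValidate(full FullEL, others []StatelessEL, p Payload) error {
	st, err := full.NewPayloadWithWitness(p)
	if err != nil {
		return err
	}
	if st.Status != "VALID" {
		return fmt.Errorf("local EL returned %s", st.Status)
	}
	blinded := p
	blinded.StateRoot, blinded.ReceiptsRoot = [32]byte{}, [32]byte{}
	for i, el := range others {
		res, err := el.ExecuteStatelessPayload(blinded, st.Witness)
		if err != nil {
			return fmt.Errorf("cross-validator %d: %w", i, err)
		}
		if res.Status != "VALID" || res.StateRoot != p.StateRoot || res.ReceiptsRoot != p.ReceiptsRoot {
			return fmt.Errorf("cross-validator %d disagrees", i)
		}
	}
	return nil
}

// Fakes standing in for engine API connections.
type fakeFull struct{}

func (fakeFull) NewPayloadWithWitness(Payload) (PayloadStatus, error) {
	return PayloadStatus{Status: "VALID", Witness: []byte{0xc3, 0xc0, 0xc0, 0xc0}}, nil
}

type fakeStateless struct{ state, receipts [32]byte }

func (f fakeStateless) ExecuteStatelessPayload(p Payload, _ []byte) (StatelessStatus, error) {
	if p.StateRoot != ([32]byte{}) || p.ReceiptsRoot != ([32]byte{}) {
		return StatelessStatus{}, errors.New("roots were not withheld")
	}
	return StatelessStatus{Status: "VALID", StateRoot: f.state, ReceiptsRoot: f.receipts}, nil
}

func main() {
	block := Payload{StateRoot: [32]byte{1}, ReceiptsRoot: [32]byte{2}}
	agree := fakeStateless{state: [32]byte{1}, receipts: [32]byte{2}}
	differ := fakeStateless{state: [32]byte{3}, receipts: [32]byte{2}}
	fmt.Println(processAndCrossValidate(fakeFull{}, []StatelessEL{agree}, block))         // <nil>
	fmt.Println(processAndCrossValidate(fakeFull{}, []StatelessEL{agree, differ}, block)) // cross-validator 1 disagrees
}
```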

Again, note that the various WithWitness methods could also just be an additional boolean flag on the base methods, but this PR wanted to keep the methods separate until a final consensus is reached on how to integrate them in production.

Stateless Ethereum

This whole document revolved around execution layer cross validation, but in reality it actually defined, designed and implemented stateless Ethereum as a whole. The one place where it falls short is the size of the witnesses.

As seen in the above benchmarks, Merkle-Patricia stateless witnesses are small enough to be passed across processes running on the same machine (or perhaps even on high throughput local networks), but they are too heavy to be a robust, cross-internet validation mechanism for Ethereum.

(The very slow encoding and decoding speed seems to come from Go's stdlib hex encoder/decoder doing things the naive way, without SIMD instructions. We'll look into replacing it in Geth to bring it close to RLP performance.)

That said, with all the primitives, mechanisms and APIs defined and implemented, the only missing piece is a drop-in replacement of Merkle-Patricia tries with Verkle tries, after which stateless Ethereum should be ready to go with minimal changes to the witness format.

Until then, the work can immediately be immensely valuable for avoiding slashing penalties caused by consensus faults.

Epilogue

The work described in this document is implemented and functional in go-ethereum and was benchmarked through it. You should naturally not consider it final (and by the time you are reading this it might have changed), but every attempt was made to keep the design simple and naturally fitting into how Ethereum functions currently.


vshvsh commented Jun 27, 2024

This is very cool! If universally implemented, it should make running and cross-validating 3-4 execution clients at once possible on a usual EL machine.
