Something we've been discussing a bit on the geth team is a new failure mode for Ethereum. Basically, it would be interesting to first outline in theory, and later go through in practice, the case where an unacceptable consensus issue occurs.
Scenario: geth version X contains a flaw, whereby the coinbase receives 1 eth in every block, after block N. This bug causes a chain split (geth-x: chain A, other: chain B) at block N.
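To make the scenario concrete, here is a toy sketch of how such a flaw turns into a split. Nothing here is geth code: `ToyClient`, `state_root` and `first_divergence` are made-up names, the "state" is just the coinbase balance, and I am assuming the flaw is active from block N onward so that the split happens at block N as described.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

WEI_PER_ETH = 10**18


@dataclass(frozen=True)
class ToyClient:
    """Hypothetical stand-in for an execution client."""
    name: str
    flaw_block: Optional[int]  # block N where the flaw activates; None = correct client

    def coinbase_bonus(self, block_number: int) -> int:
        """Extra wei credited to the coinbase in this block."""
        # Assumption: the flaw applies from block N onward.
        if self.flaw_block is not None and block_number >= self.flaw_block:
            return 1 * WEI_PER_ETH
        return 0


def state_root(coinbase_balance: int) -> str:
    """Toy state root: a hash over the only piece of state we track."""
    return hashlib.sha256(str(coinbase_balance).encode()).hexdigest()


def first_divergence(a: ToyClient, b: ToyClient, max_block: int) -> Optional[int]:
    """Return the first block at which the two clients compute different roots."""
    balance_a = balance_b = 0
    for number in range(1, max_block + 1):
        balance_a += a.coinbase_bonus(number)
        balance_b += b.coinbase_bonus(number)
        if state_root(balance_a) != state_root(balance_b):
            return number
    return None


geth_x = ToyClient("geth-x", flaw_block=100)
other = ToyClient("other", flaw_block=None)
split_at = first_divergence(geth_x, other, max_block=200)
```

With N = 100, `split_at` is 100: from that block on, each client rejects the other's blocks because the state roots no longer match.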
Version A: geth-x is super-majority, and the erroneous fork reaches finality.
Version B: geth-x is not super-majority, but large enough (e.g. ~50%) that neither fork reaches finality.
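The two versions differ only in which side of the 2/3 finality threshold the geth-x stake falls on. A minimal sketch of that classification, assuming shares are measured in stake (not node count) and that every validator attests on the fork its EL client follows; `classify` is a made-up helper:

```python
from fractions import Fraction

# Finality needs attestations from at least 2/3 of the active stake.
SUPERMAJORITY = Fraction(2, 3)


def classify(geth_x_share: Fraction) -> str:
    """Classify the scenario from the share of stake running geth-x."""
    if not 0 <= geth_x_share <= 1:
        raise ValueError("share must be between 0 and 1")
    other_share = 1 - geth_x_share
    if geth_x_share >= SUPERMAJORITY:
        return "Version A: chain A finalizes"
    if other_share >= SUPERMAJORITY:
        return "No incident: chain B finalizes"
    return "Version B: neither fork finalizes"
```

For example, `classify(Fraction(1, 2))` falls into Version B, and anything from 2/3 upward falls into Version A.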
In both scenarios, it is decided that the flaw is unacceptable, and that none of the blocks on chain A can be accepted. A fixed version of geth is released.
A few possible narratives here. Once chain A has finalized, the validators that were on that fork cannot jump to chain B unless it finalizes at an equal or higher block. And chain B cannot finalize until sufficient validators have been removed to make the set B validators a super-majority. In other words:
- If a subset of set A voluntarily equivocates and gets slashed, then a new finality can be reached somewhat quickly, at block M.
- If all validators in set A wait it out, then the inactivity leak will make set B become a super-majority after a non-finality period of ~2.5 weeks.
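Both paths come down to the same arithmetic: how much of set A's stake has to disappear before set B holds 2/3 of what is left. If set A starts with share a of the stake and keeps a fraction f of it, set B is a super-majority once (1 - a) / ((1 - a) + f·a) >= 2/3, i.e. f <= (1 - a) / (2a). A small sketch (the function name is made up, and it ignores any change in set B's own stake):

```python
from fractions import Fraction


def max_remaining_fraction(a: Fraction) -> Fraction:
    """Largest fraction of set A's stake that may remain while set B holds >= 2/3.

    `a` is set A's initial share of total stake, with 0 < a < 1.
    """
    if not 0 < a < 1:
        raise ValueError("a must be strictly between 0 and 1")
    # From (1 - a) / ((1 - a) + f * a) >= 2/3  =>  f <= (1 - a) / (2 * a)
    return min(Fraction(1), (1 - a) / (2 * a))


def min_removed_fraction(a: Fraction) -> Fraction:
    """Smallest fraction of set A's stake that must be slashed, exited or leaked."""
    return 1 - max_remaining_fraction(a)
```

For a = 3/4, at most 1/6 of set A's stake may remain, so 5/6 of it has to go, whether through slashing, exits or the leak.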
In both these cases, set A validators stand to lose substantial amounts of money, and the community decides to implement slashing impunity for this incident.
- It is decided that the fork-version will be incremented twice (slashings can only be done from prev fork-version to current).
After ~1 week, the decision has been made. It is decided that
- As of epoch (three days from now), the new fork "plustwo" is scheduled.
- All CLs roll out the "plustwo" fork.
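The reason a double increment would give impunity, as I understand it: the beacon state only tracks a previous and a current fork version, and a signature is checked against whichever of the two matches the epoch it was signed in. The sketch below is a toy model of that selection rule, loosely based on how the consensus spec's `get_domain` picks a version. The `Fork` fields mirror the spec, but `version_for_epoch` and `evidence_verifiable` are made-up helpers, and real verification involves BLS signatures and domains rather than a plain equality check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fork:
    previous_version: bytes
    current_version: bytes
    epoch: int  # epoch at which current_version became active


def version_for_epoch(fork: Fork, epoch: int) -> bytes:
    """Version the verifier assumes for a message signed at `epoch`."""
    return fork.previous_version if epoch < fork.epoch else fork.current_version


def evidence_verifiable(fork: Fork, signed_version: bytes, signed_epoch: int) -> bool:
    """Toy check: slashing evidence only verifies if the versions line up."""
    return version_for_epoch(fork, signed_epoch) == signed_version


# Hypothetical version numbers; the incident messages were signed under V0.
V0, V1, V2 = b"\x00\x00\x00\x00", b"\x01\x00\x00\x00", b"\x02\x00\x00\x00"
INCIDENT_EPOCH, FORK_EPOCH = 100, 200

single_bump = Fork(previous_version=V0, current_version=V1, epoch=FORK_EPOCH)
double_bump = Fork(previous_version=V1, current_version=V2, epoch=FORK_EPOCH)

still_slashable = evidence_verifiable(single_bump, V0, INCIDENT_EPOCH)
after_plustwo = evidence_verifiable(double_bump, V0, INCIDENT_EPOCH)
```

After a single bump the old evidence still verifies (`still_slashable` is `True`); after the double bump the verifier looks for V1 where the message was signed under V0, so `after_plustwo` is `False`.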
There are a lot of open questions regarding this scenario, primarily how and when to act as a validator. Client teams may individually have attempted rollbacks, but do week-long rollbacks involving both layers work as intended? How is "slashing impunity" realized in practice? In this case, 1 week spent discussing/analyzing plus three days for scheduling the fork leads to 10 days of inactivity leak. Is that also discounted, or are those simply facts that have to be factored in?
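To get a feel for what those 10 days cost a validator that is absent from chain B, here is a rough per-epoch simulation. The constants are my reading of the post-Bellatrix consensus spec (score bias 4, penalty quotient 2^24). The model is simplified: it leaks from the actual balance rather than the stepped effective balance, and it ignores ordinary missed-attestation penalties, the few epochs before the leak kicks in, and ejection.

```python
SECONDS_PER_EPOCH = 32 * 12                    # 32 slots of 12 seconds
EPOCHS_PER_DAY = 86400 // SECONDS_PER_EPOCH    # 225
INACTIVITY_SCORE_BIAS = 4                      # assumed spec value
INACTIVITY_PENALTY_QUOTIENT = 2**24            # assumed post-Bellatrix value


def remaining_fraction_after(days: float) -> float:
    """Fraction of balance left after `days` of continuous inactivity leak."""
    epochs = int(days * EPOCHS_PER_DAY)
    balance = 1.0   # balance as a fraction of the starting balance
    score = 0       # inactivity score grows every epoch the validator is absent
    for _ in range(epochs):
        score += INACTIVITY_SCORE_BIAS
        penalty = balance * score / (INACTIVITY_SCORE_BIAS * INACTIVITY_PENALTY_QUOTIENT)
        balance -= penalty
    return balance


loss_after_delay = 1.0 - remaining_fraction_after(10)
```

Because the score keeps growing, the loss is roughly quadratic in time, so each extra day of deliberation costs more than the day before it.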
- A modified geth version which can be instructed to contain the flaw-fork at a specified blocknumber.
- Are modified CL clients needed?
- Chain-definitions for a new dedicated network
- Transaction-generators, we don't want an empty network.
- Ideally, we would want to fork off an existing large network, otherwise problems with rolling back state are not true-to-life.
This whole exercise should be carried out by a group of people with expertise from EL, CL and devops.
TBD, let's focus on Version A first.
Voluntary slashing would take 18 days for the secondary slashing to 32 ETH and dropping weight to 0, wouldn't it?
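For reference, the arithmetic behind that 18-day figure, as a sketch. The constants are what I believe the post-Bellatrix consensus spec uses (slashings vector of 8192 epochs, proportional multiplier 3); the spec works in integer Gwei increments, which this ignores, and the function names are made up.

```python
SECONDS_PER_EPOCH = 32 * 12               # 384 seconds
EPOCHS_PER_SLASHINGS_VECTOR = 8192        # assumed spec value
PROPORTIONAL_SLASHING_MULTIPLIER = 3      # assumed post-Bellatrix value


def correlation_penalty_delay_days() -> float:
    """Days from slashing until the correlation penalty is applied.

    The penalty lands halfway through the slashings vector.
    """
    epochs = EPOCHS_PER_SLASHINGS_VECTOR // 2
    return epochs * SECONDS_PER_EPOCH / 86400


def correlation_penalty_eth(effective_balance_eth: float, slashed_share: float) -> float:
    """Correlation penalty for one validator.

    `slashed_share` is the share of total active stake slashed within the window.
    """
    factor = min(PROPORTIONAL_SLASHING_MULTIPLIER * slashed_share, 1.0)
    return effective_balance_eth * factor
```

The delay comes out at a little over 18 days, and once a third or more of the stake is slashed in the same window the factor saturates at 1, so the whole effective balance is taken.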
Inactivity leak takes 39 days in my calculation, at 84% Geth. Best case at 67% Geth is 31 days.
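A quick closed-form cross-check of those two figures, under stated assumptions: the leak is approximated as continuous, so an absent validator's balance decays as exp(-t² / 2^25) after t epochs (from a score bias of 4 and a penalty quotient of 2^24, which I believe are the post-Bellatrix values), and ejections, exit churn, effective-balance steps and slashings are all ignored. The function name is made up.

```python
import math

EPOCHS_PER_DAY = 225                       # 86400 s / (32 slots * 12 s)
INACTIVITY_PENALTY_QUOTIENT = 2**24        # assumed post-Bellatrix value


def days_until_supermajority(geth_share: float) -> float:
    """Days of leak until the non-Geth stake reaches 2/3 of what remains."""
    if not 0 < geth_share < 1:
        raise ValueError("geth_share must be strictly between 0 and 1")
    if geth_share <= 1 / 3:
        return 0.0  # the other side is already a super-majority
    # The leaking side must shrink to fraction f = (1 - a) / (2a) of its stake.
    target_fraction = (1 - geth_share) / (2 * geth_share)
    # Solve exp(-t^2 / (2 * quotient)) = f for t, in epochs.
    epochs = math.sqrt(2 * INACTIVITY_PENALTY_QUOTIENT * math.log(1 / target_fraction))
    return epochs / EPOCHS_PER_DAY


at_67 = days_until_supermajority(0.67)
at_84 = days_until_supermajority(0.84)
```

Under these assumptions I get about 30.5 days at 67% and about 39.5 days at 84%, which is in the same ballpark as the numbers above.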
The loss in either case is not reasonable. The loss if this "bailout" idea is adopted within 7 days is still staggering.
Calculations at https://docs.google.com/spreadsheets/d/1N9Rjia84SQSedFzmBtnipnWj8_ND0tFS0p1C6q8lybc/ . I highly welcome a set of eyes on whether I have the numbers right, or at least directionally right.