harding/21377-activation-review.md Secret

## 21377-activation-review.md

      
    Raw
  

              21377-activation-review.md
            
          
    TODO


 Read through all comments
 Read through complete src/test/versionbits_tests.cpp and think about
whether there's any state transitions not adequetely tested

Comments to re-review maybe after additional discussion:
https://github.com/bitcoin/bitcoin/pull/21377/commits/aa6a95de12b265cd3dc16b2d1303b89a20331a56#r598900096
tests: pull ComputeBlockVersion test into its own function

3da5ca7acebcc874b8f3c319008841319727ebfe


 Code review LGTM, simple move/rename as described in the commit message


 Compiled cleanly


 Confirmed make check ran cleanly


 Deliberately broke versionbits and confirmed test_bitcoin detected
it:
test/versionbits_tests.cpp(137): error: in "versionbits_tests/versionbits_test": Test 75 for ACTIVE height 4000 (got FAILED)
test/versionbits_tests.cpp(137): error: in "versionbits_tests/versionbits_test": Test 77 for ACTIVE height 14333 (got FAILED)


tests: test ComputeBlockVersion for all deployments

754ab11f2085a0c5e417cfbba80cd024fc8ea67a


 Code review LGTM.  Generalizes like described; also adds several
checks


 Compiled cleanly


 Confirmed make check ran cleanly


 Deliberately broke two different deployments on two different
networks and confirmed test_bitconi detected it:
test/versionbits_tests.cpp(262): fatal error: in "versionbits_tests/versionbits_computeblockversion": critical check nStartTime < nTimeout has failed
test/versionbits_tests.cpp(266): fatal error: in "versionbits_tests/versionbits_computeblockversion": critical check ((1 << bit) & VERSIONBITS_TOP_MASK) == 0 has failed


Outstanding issues

        // This works because VERSIONBITS_LAST_OLD_BLOCK_VERSION happens
        // to be 4, and the bit we're testing happens to be bit 28.

        confused what this is trying to say; addressed in follow up
        by being removed

tests: clean up versionbits test

423a730da6f423f36a57859d1e9d2f5d9f5e6d38


 Code review LGTM.  Minor cleanup like described.


 Compiles cleanly


 make check ran cleanly


Didn't try to break the tests since it's just a minor refactoring.
versionbits: Add support for delayed activation after lockin

aa6a95de12b265cd3dc16b2d1303b89a20331a56


 Code review LGTM.  Adds minimum activation height but requires it
be set to 0 in chainparams, presumably to ensure all tests continue
working


 Compiles cleanly (some unrelated warning for addrman; I probably
need toclean thigs)


 make check ran cleanly


 Deliberatly set min_activation_height > 0 knowing that should
break tests, and it did


 Deliberately delayed an activation by hardcoding an activation
delay and saw that break multiple tests in test_bitcoin
test/versionbits_tests.cpp(138): error: in "versionbits_tests/versionbits_test": Test 75 for ACTIVE height 4000 (got LOCKED_IN)
test/versionbits_tests.cpp(138): error: in "versionbits_tests/versionbits_test": Test 75 for ACTIVE height 4000 (got LOCKED_IN)
test/versionbits_tests.cpp(138): error: in "versionbits_tests/versionbits_test": Test 75 for ACTIVE height 4000 (got LOCKED_IN)
test/versionbits_tests.cpp(116): error: in "versionbits_tests/versionbits_test": Test 76 for StateSinceHeight
test/versionbits_tests.cpp(116): error: in "versionbits_tests/versionbits_test": Test 78 for StateSinceHeight
test/versionbits_tests.cpp(116): error: in "versionbits_tests/versionbits_test": Test 78 for StateSinceHeight
test/versionbits_tests.cpp(116): error: in "versionbits_tests/versionbits_test": Test 80 for StateSinceHeight
test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [268435456 != 0]
test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [4 != 0]
test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [268435456 != 0]
test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [4 != 0]
test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [268435456 != 0]
test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [268435456 != 0]


 Checked updated help for getblockchainfo


 Set a min_activation_height for testdummy and verified it appeared
in results from getblockchaininfo (and not in the results for the
other soft fork)


TODO


 Ensure a future commit lifts the min_activation_height == 0 requirement.

Changed to >=0 in next commit


tests: test versionbits delayed activation

e94fc819d8a996ce3690274c307c537329dfb95b


 Code review LGTM


 Builds cleanly


 Passes make check using my normal ./configure flags


 Deliberately introduced failures, most of which were caught by
tests, but the following small error wasn't.  Tried to see if fuzz
testing would catch it but it didn't in 30 minutes.  I'll try
overnight or provision a VM when I get a chance


diff --git a/src/versionbits.cpp b/src/versionbits.cpp
index 11da729596..598c5925cf 100644
--- a/src/versionbits.cpp
+++ b/src/versionbits.cpp
@@ -80,7 +80,7 @@ ThresholdState AbstractThresholdConditionChecker::GetStateFor(const CBlockIndex*
}
case ThresholdState::LOCKED_IN: {
// Progresses into ACTIVE provided activation height will have been reached.


           if (pindexPrev->nHeight + 1 >= min_activation_height) {


           if (pindexPrev->nHeight + 2 >= min_activation_height) {
               stateNext = ThresholdState::ACTIVE;
           }
           break;


fuzz: test versionbits min activation height

90c40342ec7a8f4c4df4fa16bf9ff2dca49b9ec1


 Code review LGTM


 Builds cleanly


 Passes make check with my normal ./configure flags


 Passes make check with clang and --enable-fuzz


 Fuzz tests run without error (ran for 10 minutes)


 Fuzz tests fail when I deliberately introduced a large computational error


versionbits: Add explicit NEVER_ACTIVE deployments

c7008a1a07ae8bd8dfb9c2001a4fdef6e08eb395o


 Code review LGTM


 Builds cleanly


 Passes make check


 Tests fail when I introduced a deliberate error into vbparams


 Checked NEVER_ACTIVE deployments don't show in getblockchaininfo


versionbits: simplify state transitions

ccfe854884276fbdd2ddbcc65689dbd3306f6332


 Code review LGTM, although I need to really dig into the state
transitions


 Builds cleanly


 Passes make check


 I restored the previous logic but the unit tests didn't catch it
(addressed in follow up; confirmed it was also caught by fuzz tests)


           if (pindexPrev->GetMedianTimePast() >= nTimeTimeout) {


              stateNext = ThresholdState::FAILED;


           } else if (pindexPrev->GetMedianTimePast() >= nTimeStart) {


               stateNext = ThresholdState::STARTED;


           }


chainparams: drop versionbits threshold to 90% for mainnnet and signet

be7d1368d8e98b6f9f692289b7d25ac369bf2984


 Code review LGTM.  (It's two lines changing an arbitrary number!)


 Builds cleanly


 Passes make check


(I'm not aware of any tests that apply to this magic number.)

2021-04-14


 Played around with fuzz testing to ensure they actually caught
problems, then ran for five days on 48 physical cores, no problems found


 git diff be7d1368d8e98b6f9f692289b7d25ac369bf2984..ffe33dfbd4c3b11e3475b022b6c1dd077613de79 --- delta since my last review; code review LGTM


 Tested updated getblockchaininfo RPC results


 Reviewed getblocktemplate

In defined:   "version": 536870912,
In started:   "version": 805306372,


State change analysis

What I know about MTP:


Median of block x and the previosu 10 blocks.  Must increase by at
least one second from previous block.  Only considered in retarget
algo based on second block in period and last block in period
(off-by-one error), allowing time warp attack where miners hold open
window most of the time and then jump around within that window to
lower difficulty without lowering block production rate.  Requires a
signiifacnt number of miners to cooperate to hold window open.
Cannot be set higher than two hours in the future (enforced by nodes
using their local clocks, so miners probably want to set it a bit
below two hours in the future but might be able to get away with a
bit further in the future than two hours; I'm not sure how the reject
caching works for time).


Time warp takes about a month to really get going, so we'd have
time to react.  A single well-timed invalidateblock can cause the
attacking miners to lose a significant amonut of invested PoW.
Also, MTP requiring +1 seconds every block means it peaks at about
one block per second (any faster and MTP starts moving into the
future, hitting the max two hour future limit).  That implies
miners can only use time warp to delay a transition if they limit
the rate at which they steal subsidy, so I'd expect them to prefer
stealing subsidy over delaying an activation.


If there's a reorg near a retarget period, this can change the MTP
evaluated as part of a versionbits attempt, changing the state for
the subsequent period.  E.g., block x is the first block of a new
period; when you first see it, it's in STARTED but after a reorg
that moves MTP back a bit, it's still in DEFINED.  Practical
implications of this:


If miners false signal (set their versionbits manually), they
may start or stop signaling at the wrong retarget boundary.
This won't cost them money unless there's a UASF mandatory
signaling required.  Miners who want to false signal and don't
care about mimicking Bitcoin Core can just signal
indescriminately or can use realtime or getblocktemplate's
mintime to signal around the parameters.  This is slightly more
difficult than false signaling for BIP8, where they can convincingly
mimick Bitcoin Core by checking getblocktemplates height
parameter.  Making it harder to convincingly false signal is
arguably a benefit.


Someone could report that signaling had started or stopped
based on the time stamp in a single block; a one block reorg
could bump that start or stop date forward by two weeks
(theoretically more if miners are willing to seriously screw
with block times).  This should rarely be a problem (we'll be
near a retarget block about 0.1% of the time if parameter
selection were random; less often than that for carefully
chosen parameters; my node sees a one block reorg about once
every 2,000 blocks (0.05%); so the overall expected probability
assuming no maliciousness is 0.0005%).  It's solvable by just
waiting for a few confirmations before reporting.  False
reporting shouldn't be an issue for most applications, but it
could theoretically be a problem for anyone betting on start or
stop times specifically (anyone betting more generally, e.g.
on whether taproot locked in, may need to wait for several
confirmations anyway to be resistant against small reorgs
changing the outcome).  Heights have a clear advantage for this
very narrow application due to generally being monotonically
increasing, but having to wait for a few confirmations 0.1% of
the time before reporting start or stop doesn't seem like a
significant difference to me.


    // A block's state is always the same as that of the first of its period, so it is computed based on a pindexPrev whose height equals a multiple of nPeriod - 1.
    if (pindexPrev != nullptr) {
        pindexPrev = pindexPrev->GetAncestor(pindexPrev->nHeight - ((pindexPrev->nHeight + 1) % nPeriod));
    }

DEFINED is the default state, e.g. the consensus rules defined by the
previous implementation.  In this code, we rewind until we find the
last block in a retarget period that either had a state defined or which
was before the nTimeStart (giving it state DEFINED).
    // Walk backwards in steps of nPeriod to find a pindexPrev whose information is known
    std::vector<const CBlockIndex*> vToCompute;
    while (cache.count(pindexPrev) == 0) {
        if (pindexPrev == nullptr) {
            // The genesis block is by definition defined.
            cache[pindexPrev] = ThresholdState::DEFINED;
            break;
        }
        if (pindexPrev->GetMedianTimePast() < nTimeStart) {
            // Optimization: don't recompute down further, as we know every earlier block will be before the start time
            cache[pindexPrev] = ThresholdState::DEFINED;
            break;
        }

This builds a list of the pre-retarget blocks that we'll revist later.
        vToCompute.push_back(pindexPrev);
        pindexPrev = pindexPrev->GetAncestor(pindexPrev->nHeight - nPeriod);
    }

Crash if we somehow didn't get a state.
    // At this point, cache[pindexPrev] is known
    assert(cache.count(pindexPrev));

Walking forward, the only option for DEFINED is to move to STARTED at or
after nStartTime.  Miners can delay nStartTime from arriving, but they
can't skip over it.
The easiest way for miners to delay nStartTime (or an nStartHeight) is
to simply not mine blocks; this is an identical problem for both MTP and
heights.  With MTP, miners can lie about block times, keeping MTP in the
past, but without timewarp, this will penalize them by increasing
difficulty.  With timewarp, this will be obvious to everyone that
they're going to be taking the remaining subsidy, so we'll actually have
a strong incentive to fix this old problem in Bitcoin (which we know how
to fix).  This bad behavior can be fixed in the short term with an
invalidate block.
    ThresholdState state = cache[pindexPrev];

    // Now walk forward and compute the state of descendants of pindexPrev
    while (!vToCompute.empty()) {
        ThresholdState stateNext = state;
        pindexPrev = vToCompute.back();
        vToCompute.pop_back();

        switch (state) {
            case ThresholdState::DEFINED: {
                if (pindexPrev->GetMedianTimePast() >= nTimeStart) {
                    stateNext = ThresholdState::STARTED;
                }
                break;
            }

In STARTED, we can move to either LOCKED_IN or FAILED.  LOCKED_IN is
evaluated first, and if it's true, we skip the FAILED test, so there's
no way to get to FAILED without giving us a chance to get to LOCKED_IN.
It doesn't make any difference, but we actually count the blocks
backwards here, startig from the last block in the period and going to
the first block.

 Tweaked the for statement to confirm tests catch an
off-by-one error (tested -1, +1, and starting count = 1).

            case ThresholdState::STARTED: {
                // We need to count
                const CBlockIndex* pindexCount = pindexPrev;
                int count = 0;
                for (int i = 0; i < nPeriod; i++) {
                    if (Condition(pindexCount, params)) {
                        count++;
                    }
                    pindexCount = pindexCount->pprev;
                }
                if (count >= nThreshold) {
                    stateNext = ThresholdState::LOCKED_IN;
                } else if (pindexPrev->GetMedianTimePast() >= nTimeTimeout) {
                    stateNext = ThresholdState::FAILED;
                }
                break;
            }

In LOCKED_IN, we're guaranteed to move to ACTIVE but we required the
min_activation_height to pass.  Since we're looking at the final block
in a retarget period but activation actually happens in the first block
of a new retarget period, we add one to the height we're evaluating.
Miners can delay a height by not mining blocks or by focusing on
remining old blocks (theoretically allowing them to capture transaction
fees, which will rise if no new blocks are mined; this is theoretical
because reorgs break confidence in confirmation scores, so it shouldn't
be possible for this to be a stable "new normal").  Since both of those
things are hard to do and not stable in a
            case ThresholdState::LOCKED_IN: {
                // Progresses into ACTIVE provided activation height will have been reached.
                if (pindexPrev->nHeight + 1 >= min_activation_height) {
                    stateNext = ThresholdState::ACTIVE;
                }
                break;
            }

What it says in the comment:
            case ThresholdState::FAILED:
            case ThresholdState::ACTIVE: {
                // Nothing happens, these are terminal states.
                break;
            }
        }

Move forward to the next retarget period.
        cache[pindexPrev] = state = stateNext;
    }

    return state;


Feedback still needing to be addressed IMO:

bitcoin/bitcoin#21377 (comment)