Skip to content

Instantly share code, notes, and snippets.

@harding
Last active April 15, 2021 11:37
Show Gist options
  • Save harding/e622323eaf80d620826a7cb74ab3fb40 to your computer and use it in GitHub Desktop.
Save harding/e622323eaf80d620826a7cb74ab3fb40 to your computer and use it in GitHub Desktop.

TODO

  • Read through all comments
  • Read through complete src/test/versionbits_tests.cpp and think about whether there's any state transitions not adequetely tested

Comments to re-review maybe after additional discussion:

https://github.com/bitcoin/bitcoin/pull/21377/commits/aa6a95de12b265cd3dc16b2d1303b89a20331a56#r598900096

tests: pull ComputeBlockVersion test into its own function

3da5ca7acebcc874b8f3c319008841319727ebfe

  • Code review LGTM, simple move/rename as described in the commit message

  • Compiled cleanly

  • Confirmed make check ran cleanly

  • Deliberately broke versionbits and confirmed test_bitcoin detected it:

    test/versionbits_tests.cpp(137): error: in "versionbits_tests/versionbits_test": Test 75 for ACTIVE height 4000 (got FAILED) test/versionbits_tests.cpp(137): error: in "versionbits_tests/versionbits_test": Test 77 for ACTIVE height 14333 (got FAILED)

tests: test ComputeBlockVersion for all deployments

754ab11f2085a0c5e417cfbba80cd024fc8ea67a

  • Code review LGTM. Generalizes like described; also adds several checks

  • Compiled cleanly

  • Confirmed make check ran cleanly

  • Deliberately broke two different deployments on two different networks and confirmed test_bitconi detected it:

    test/versionbits_tests.cpp(262): fatal error: in "versionbits_tests/versionbits_computeblockversion": critical check nStartTime < nTimeout has failed test/versionbits_tests.cpp(266): fatal error: in "versionbits_tests/versionbits_computeblockversion": critical check ((1 << bit) & VERSIONBITS_TOP_MASK) == 0 has failed

Outstanding issues

        // This works because VERSIONBITS_LAST_OLD_BLOCK_VERSION happens
        // to be 4, and the bit we're testing happens to be bit 28.

        confused what this is trying to say; addressed in follow up
        by being removed

tests: clean up versionbits test

423a730da6f423f36a57859d1e9d2f5d9f5e6d38

  • Code review LGTM. Minor cleanup like described.

  • Compiles cleanly

  • make check ran cleanly

Didn't try to break the tests since it's just a minor refactoring.

versionbits: Add support for delayed activation after lockin

aa6a95de12b265cd3dc16b2d1303b89a20331a56

  • Code review LGTM. Adds minimum activation height but requires it be set to 0 in chainparams, presumably to ensure all tests continue working

  • Compiles cleanly (some unrelated warning for addrman; I probably need toclean thigs)

  • make check ran cleanly

  • Deliberatly set min_activation_height > 0 knowing that should break tests, and it did

  • Deliberately delayed an activation by hardcoding an activation delay and saw that break multiple tests in test_bitcoin

    test/versionbits_tests.cpp(138): error: in "versionbits_tests/versionbits_test": Test 75 for ACTIVE height 4000 (got LOCKED_IN) test/versionbits_tests.cpp(138): error: in "versionbits_tests/versionbits_test": Test 75 for ACTIVE height 4000 (got LOCKED_IN) test/versionbits_tests.cpp(138): error: in "versionbits_tests/versionbits_test": Test 75 for ACTIVE height 4000 (got LOCKED_IN) test/versionbits_tests.cpp(116): error: in "versionbits_tests/versionbits_test": Test 76 for StateSinceHeight test/versionbits_tests.cpp(116): error: in "versionbits_tests/versionbits_test": Test 78 for StateSinceHeight test/versionbits_tests.cpp(116): error: in "versionbits_tests/versionbits_test": Test 78 for StateSinceHeight test/versionbits_tests.cpp(116): error: in "versionbits_tests/versionbits_test": Test 80 for StateSinceHeight test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [268435456 != 0] test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [4 != 0] test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [268435456 != 0] test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [4 != 0] test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [268435456 != 0] test/versionbits_tests.cpp(384): error: in "versionbits_tests/versionbits_computeblockversion": check ComputeBlockVersion(lastBlock, params) & (1<<bit) == 0 has failed [268435456 != 0]

  • Checked updated help for getblockchainfo

  • Set a min_activation_height for testdummy and verified it appeared in results from getblockchaininfo (and not in the results for the other soft fork)

TODO

  • Ensure a future commit lifts the min_activation_height == 0 requirement.

    • Changed to >=0 in next commit

tests: test versionbits delayed activation

e94fc819d8a996ce3690274c307c537329dfb95b

  • Code review LGTM

  • Builds cleanly

  • Passes make check using my normal ./configure flags

  • Deliberately introduced failures, most of which were caught by tests, but the following small error wasn't. Tried to see if fuzz testing would catch it but it didn't in 30 minutes. I'll try overnight or provision a VM when I get a chance

diff --git a/src/versionbits.cpp b/src/versionbits.cpp index 11da729596..598c5925cf 100644 --- a/src/versionbits.cpp +++ b/src/versionbits.cpp @@ -80,7 +80,7 @@ ThresholdState AbstractThresholdConditionChecker::GetStateFor(const CBlockIndex* } case ThresholdState::LOCKED_IN: { // Progresses into ACTIVE provided activation height will have been reached.

  •            if (pindexPrev->nHeight + 1 >= min_activation_height) {
    
  •            if (pindexPrev->nHeight + 2 >= min_activation_height) {
                   stateNext = ThresholdState::ACTIVE;
               }
               break;
    

fuzz: test versionbits min activation height

90c40342ec7a8f4c4df4fa16bf9ff2dca49b9ec1

  • Code review LGTM

  • Builds cleanly

  • Passes make check with my normal ./configure flags

  • Passes make check with clang and --enable-fuzz

  • Fuzz tests run without error (ran for 10 minutes)

  • Fuzz tests fail when I deliberately introduced a large computational error

versionbits: Add explicit NEVER_ACTIVE deployments

c7008a1a07ae8bd8dfb9c2001a4fdef6e08eb395o

  • Code review LGTM

  • Builds cleanly

  • Passes make check

  • Tests fail when I introduced a deliberate error into vbparams

  • Checked NEVER_ACTIVE deployments don't show in getblockchaininfo

versionbits: simplify state transitions

ccfe854884276fbdd2ddbcc65689dbd3306f6332

  • Code review LGTM, although I need to really dig into the state transitions

  • Builds cleanly

  • Passes make check

  • I restored the previous logic but the unit tests didn't catch it (addressed in follow up; confirmed it was also caught by fuzz tests)

  •            if (pindexPrev->GetMedianTimePast() >= nTimeTimeout) {
    
  •               stateNext = ThresholdState::FAILED;
    
  •            } else if (pindexPrev->GetMedianTimePast() >= nTimeStart) {
    
  •                stateNext = ThresholdState::STARTED;
    
  •            }
    

chainparams: drop versionbits threshold to 90% for mainnnet and signet

be7d1368d8e98b6f9f692289b7d25ac369bf2984

  • Code review LGTM. (It's two lines changing an arbitrary number!)

  • Builds cleanly

  • Passes make check

(I'm not aware of any tests that apply to this magic number.)


2021-04-14

  • Played around with fuzz testing to ensure they actually caught problems, then ran for five days on 48 physical cores, no problems found

  • git diff be7d1368d8e98b6f9f692289b7d25ac369bf2984..ffe33dfbd4c3b11e3475b022b6c1dd077613de79 --- delta since my last review; code review LGTM

  • Tested updated getblockchaininfo RPC results

  • Reviewed getblocktemplate

    • In defined: "version": 536870912,
    • In started: "version": 805306372,

State change analysis

What I know about MTP:

  1. Median of block x and the previosu 10 blocks. Must increase by at least one second from previous block. Only considered in retarget algo based on second block in period and last block in period (off-by-one error), allowing time warp attack where miners hold open window most of the time and then jump around within that window to lower difficulty without lowering block production rate. Requires a signiifacnt number of miners to cooperate to hold window open. Cannot be set higher than two hours in the future (enforced by nodes using their local clocks, so miners probably want to set it a bit below two hours in the future but might be able to get away with a bit further in the future than two hours; I'm not sure how the reject caching works for time).

  2. Time warp takes about a month to really get going, so we'd have time to react. A single well-timed invalidateblock can cause the attacking miners to lose a significant amonut of invested PoW. Also, MTP requiring +1 seconds every block means it peaks at about one block per second (any faster and MTP starts moving into the future, hitting the max two hour future limit). That implies miners can only use time warp to delay a transition if they limit the rate at which they steal subsidy, so I'd expect them to prefer stealing subsidy over delaying an activation.

  3. If there's a reorg near a retarget period, this can change the MTP evaluated as part of a versionbits attempt, changing the state for the subsequent period. E.g., block x is the first block of a new period; when you first see it, it's in STARTED but after a reorg that moves MTP back a bit, it's still in DEFINED. Practical implications of this:

    • If miners false signal (set their versionbits manually), they may start or stop signaling at the wrong retarget boundary. This won't cost them money unless there's a UASF mandatory signaling required. Miners who want to false signal and don't care about mimicking Bitcoin Core can just signal indescriminately or can use realtime or getblocktemplate's mintime to signal around the parameters. This is slightly more difficult than false signaling for BIP8, where they can convincingly mimick Bitcoin Core by checking getblocktemplates height parameter. Making it harder to convincingly false signal is arguably a benefit.

    • Someone could report that signaling had started or stopped based on the time stamp in a single block; a one block reorg could bump that start or stop date forward by two weeks (theoretically more if miners are willing to seriously screw with block times). This should rarely be a problem (we'll be near a retarget block about 0.1% of the time if parameter selection were random; less often than that for carefully chosen parameters; my node sees a one block reorg about once every 2,000 blocks (0.05%); so the overall expected probability assuming no maliciousness is 0.0005%). It's solvable by just waiting for a few confirmations before reporting. False reporting shouldn't be an issue for most applications, but it could theoretically be a problem for anyone betting on start or stop times specifically (anyone betting more generally, e.g. on whether taproot locked in, may need to wait for several confirmations anyway to be resistant against small reorgs changing the outcome). Heights have a clear advantage for this very narrow application due to generally being monotonically increasing, but having to wait for a few confirmations 0.1% of the time before reporting start or stop doesn't seem like a significant difference to me.

    // A block's state is always the same as that of the first of its period, so it is computed based on a pindexPrev whose height equals a multiple of nPeriod - 1.
    if (pindexPrev != nullptr) {
        pindexPrev = pindexPrev->GetAncestor(pindexPrev->nHeight - ((pindexPrev->nHeight + 1) % nPeriod));
    }

DEFINED is the default state, e.g. the consensus rules defined by the previous implementation. In this code, we rewind until we find the last block in a retarget period that either had a state defined or which was before the nTimeStart (giving it state DEFINED).

    // Walk backwards in steps of nPeriod to find a pindexPrev whose information is known
    std::vector<const CBlockIndex*> vToCompute;
    while (cache.count(pindexPrev) == 0) {
        if (pindexPrev == nullptr) {
            // The genesis block is by definition defined.
            cache[pindexPrev] = ThresholdState::DEFINED;
            break;
        }
        if (pindexPrev->GetMedianTimePast() < nTimeStart) {
            // Optimization: don't recompute down further, as we know every earlier block will be before the start time
            cache[pindexPrev] = ThresholdState::DEFINED;
            break;
        }

This builds a list of the pre-retarget blocks that we'll revist later.

        vToCompute.push_back(pindexPrev);
        pindexPrev = pindexPrev->GetAncestor(pindexPrev->nHeight - nPeriod);
    }

Crash if we somehow didn't get a state.

    // At this point, cache[pindexPrev] is known
    assert(cache.count(pindexPrev));

Walking forward, the only option for DEFINED is to move to STARTED at or after nStartTime. Miners can delay nStartTime from arriving, but they can't skip over it.

The easiest way for miners to delay nStartTime (or an nStartHeight) is to simply not mine blocks; this is an identical problem for both MTP and heights. With MTP, miners can lie about block times, keeping MTP in the past, but without timewarp, this will penalize them by increasing difficulty. With timewarp, this will be obvious to everyone that they're going to be taking the remaining subsidy, so we'll actually have a strong incentive to fix this old problem in Bitcoin (which we know how to fix). This bad behavior can be fixed in the short term with an invalidate block.

    ThresholdState state = cache[pindexPrev];

    // Now walk forward and compute the state of descendants of pindexPrev
    while (!vToCompute.empty()) {
        ThresholdState stateNext = state;
        pindexPrev = vToCompute.back();
        vToCompute.pop_back();

        switch (state) {
            case ThresholdState::DEFINED: {
                if (pindexPrev->GetMedianTimePast() >= nTimeStart) {
                    stateNext = ThresholdState::STARTED;
                }
                break;
            }

In STARTED, we can move to either LOCKED_IN or FAILED. LOCKED_IN is evaluated first, and if it's true, we skip the FAILED test, so there's no way to get to FAILED without giving us a chance to get to LOCKED_IN. It doesn't make any difference, but we actually count the blocks backwards here, startig from the last block in the period and going to the first block.

  • Tweaked the for statement to confirm tests catch an off-by-one error (tested -1, +1, and starting count = 1).
            case ThresholdState::STARTED: {
                // We need to count
                const CBlockIndex* pindexCount = pindexPrev;
                int count = 0;
                for (int i = 0; i < nPeriod; i++) {
                    if (Condition(pindexCount, params)) {
                        count++;
                    }
                    pindexCount = pindexCount->pprev;
                }
                if (count >= nThreshold) {
                    stateNext = ThresholdState::LOCKED_IN;
                } else if (pindexPrev->GetMedianTimePast() >= nTimeTimeout) {
                    stateNext = ThresholdState::FAILED;
                }
                break;
            }

In LOCKED_IN, we're guaranteed to move to ACTIVE but we required the min_activation_height to pass. Since we're looking at the final block in a retarget period but activation actually happens in the first block of a new retarget period, we add one to the height we're evaluating.

Miners can delay a height by not mining blocks or by focusing on remining old blocks (theoretically allowing them to capture transaction fees, which will rise if no new blocks are mined; this is theoretical because reorgs break confidence in confirmation scores, so it shouldn't be possible for this to be a stable "new normal"). Since both of those things are hard to do and not stable in a

            case ThresholdState::LOCKED_IN: {
                // Progresses into ACTIVE provided activation height will have been reached.
                if (pindexPrev->nHeight + 1 >= min_activation_height) {
                    stateNext = ThresholdState::ACTIVE;
                }
                break;
            }

What it says in the comment:

            case ThresholdState::FAILED:
            case ThresholdState::ACTIVE: {
                // Nothing happens, these are terminal states.
                break;
            }
        }

Move forward to the next retarget period.

        cache[pindexPrev] = state = stateNext;
    }

    return state;

Feedback still needing to be addressed IMO:

@ajtowns
Copy link

ajtowns commented Apr 11, 2021

if (pindexPrev->nHeight + 2 >= min_activation_height) { ->

$ echo '//+93v8AAAAAAH//l2AAAAD/O///vd7//+ylv6V/K/8=' | base64 -d | FUZZ=versionbits test/fuzz/fuzz
fuzz: test/fuzz/versionbits.cpp:314: void (anonymous namespace)::versionbits_fuzz_target(FuzzBufferType): Assertion `always_active_test || min_activation <= current_block->nHeight + 1' failed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment