Skip to content

Instantly share code, notes, and snippets.

@RubenSomsen
Created March 24, 2025 17:12

CoreDev Feb 2025

Fast IBD Session

Idea: Speed up IBD by giving "hints" to whoever does it

  • With Core, there would be some information that we give to every client that tells them whether or not an output is going to enter the utxo set

    • For simplicity take asssumevalid, you take the same points, the question is -- once we reach that point, are the outputs gonna be in the utxo set -- create a bitfield, 200mb of data, compressable because you have a bunch of 0s and very few 1s. Not going be much.
    • When you have this, on the output side, when it enters the utxo set, great, write it to disk, append-only
    • If it doesn't enter utxo set, spent by the time you reach that point, the question is what do you need in order to validate the spending of this output
  • With assumevalid you can say, we dont need the scripts. the output scripts become irrelevant for the transition

  • If we take some liberties with assumevalid and we take the remaining checks, e.g. coinbase output, you need height, and nsequence fields, you also need height.

    • We put them under the umbrella of assumevalid, dont need coinbase or height bits, can even the forgo the amounts -- could be controversial. Would mean core no longer checking amounts. For simplicity leave that out. all that's left are the outpoints
    • All that's left is whether an outpoint was consumed
    • Nearly stateless. A single MuHash where you add the hashed outpoints that didn't enter the utxo set
    • When checking the inputs you hash the outpoint and subtract it from the muhash
    • If the hints were correct, you end up with a muhash of 0
    • You could check the amounts too, but you would need to receive extra data (the amounts for each input) to support this
    • An extra 15gb? roughly, 2.5% more data. How you receive the data is not part of this discussion
    • Without checking the amounts is most elegant
    • This is 100% parallelizable
    • Doesn't matter if you remove the inputs first, then add the outputs, you still end up with 0
    • No disk writes other than writing the utxo set to disk. state is a single muhash
  • To what extent is this diff from downloading utxo set? assumeutxo?

    • Still validating that outputs are spent once, dont check the scripts, dont check the spending happens after the creation of an output anymore?
    • Fundamental difference when you skip checking amounts: you don't check for inflation bugs
  • Other than the order, everything else -- you don't need assumevalid - you need more data. 15% more

  • We can at the end just sum the values in the utxo set, and make sure the 21M limit is preserved

  • Are there any things in the utxo not taken into account? Burn coins?

  • What if your helper data contains for every spent utxo you dont have just a bitset, but inform what height it was set. then you can add that height into the muhash commitments, and at spending time remove them. so if anyone lies about what has been spent it will be considered a failure

  • If you add the heights at creation time and amounts at spending time, if you wanted to validate everything.

  • If you were to check everything, it's 100GB. Now whenever you are checking the input, someone needs to give you the data (similar to utreexo), but more parallelizable

  • Maybe similar to XYZ idea where you have utxo set where it just stores the hash of utxo, at spending time you change the block format to express which utxo you are spending. you cant do script validation statically. disentangle the utxo creation and removal. if you process things out of order you can store the hash of the utxo being spent as a spent-but-not-created-yet. utxo set contains both created but unspent output hashes but also spent but not yet created hashes, then you can process in any order. but significant bandwidth changes


  • We could have this simple case vs complexity case, but maybe interesting tradeoffs

  • Do version without assumevalid (AV) now you have more data, but could reintroduce a cache, e.g. some in cache so you dont have to re-download. Save bandwidth but lose some parallelization benefits

  • Version with AV: for every input you need to download the entire output script (similar to other idea described XYZ).

  • Version without AV: but now you download the UTXO set AND every output script. need to commit to more things

You can combine this with assumeutxo where you do background validation with ibd and now you don't need two chainstates. Does mean you need this fast-IBD version for assumeutxo

This could be extended even beyond the assumevalid point if you trust a third party with further hints up to the tip, though this part would not use assumevalid and thus use more bandwidth


Expect codebase to commit to these hints? YES

  • Packing it together with the assumeutxo snapshot may make sense in this context.

Is MuHash performant enough?

  • Even if slower, should be okay if throwing lots of cores at it?
  • Perhaps it appears slower because the "finalize" step. Doing that on every block connection. But in this Fast IBD design you don't need to finalize except for once at the end. But possibility of overflow, so may need to finalize more than once (not often)
@RubenSomsen
Copy link
Author

RubenSomsen commented Mar 24, 2025

This protocol is now called SwiftSync and no longer requires MuHash. I have a full write-up coming up which is not ready yet, but here's a shorter version that at least explains the basics.

Full write-up now available here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment