Skip to content

Instantly share code, notes, and snippets.

@fjahr
Last active April 30, 2021 17:00
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save fjahr/fa4892874b090d3a4f4fccc5bafa0210 to your computer and use it in GitHub Desktop.
Save fjahr/fa4892874b090d3a4f4fccc5bafa0210 to your computer and use it in GitHub Desktop.
Rolling UTXO Set Hash proposal

Rolling UTXO Set Hash proposal

TL;DR

I have picked up Pieter Wuille's proposal from 2017 to use a rolling hash for the UTXO set hash. It deals with the problem of a long computation time of the UTXO set hash which results in a slow RPC call gettxoutsetinfo (can take several minutes depending on hardware). I investigated three hash functions: MuHash, ECMH and LtHash and started implementing them in Bitcoin Core for comparison. However only MuHash has a rolling hash implementation so far and my LtHash code is not as optimized as MuHash and ECMH. I am looking for feedback on concept, which choice to make for the hash function and implementation details before filing a PR to Bitcoin Core.

MuHash (rolling): ChaincodeResidency/bitcoin#25

Elliptic Curve Multiset Hash (ECMH) (non-rolling): https://github.com/fjahr/bitcoin/tree/ecmh

LtHash (non-rolling): https://github.com/fjahr/bitcoin/tree/lthash

Content

  1. Motivation and prior work
  2. Implementation
  3. Use Cases
  4. Acknowledgements

Motivation and prior work

The RPC call gettxoutsetinfo returns (among other things) a hash of the whole UTXO set which can be used as an efficient way to compare your UTXO set with your peers. Since the hash is calculated in real time this call can take several minutes depending on your hardware. A solution to this was presented on the Bitcoin Mailing List by Pieter Wuille. He discusses MuHash and ECMH as potential homomorphic hashing function solutions to implement the UTXO set hash as a rolling hash which could be efficiently updated with every new block by adding hashes of new UTXOs and removing hashes of spent UTXOs. The resulting hash of each new chainstate could be cached and returned instantly without spending CPU cycles.

Pieter followed up with a PR that implemented MuHash in Bitcoin Core but not as a rolling hash, yet. This was certainly meant as an intermediate step to a rolling hash but did not receive enough reviewer attention in order to get merged.

Tomas van der Wansem implemented ECMH in Secp256k1 but the PR has remained open after some good discussion.

Implementation

Hash function choices

I looked at each function's security, performance and maintainability.

Security

From Pieter's Mailing list message:
  • On MuHash: "Very common security assumption. Even if the DL assumption would be broken (but no k-sum algorithm faster than Wagner's is found), this still maintains 110 bits of security."
  • On ECMH: "Identical security assumption as Bitcoin's signatures."
LtHash

According to Facebook researchers the current choice of specialization of LtHash (1024 16bit lattices) should offer over 200 bits of security.

Facebook research suggests that smaller LtHash with smaller parameters would not offer sufficient security.

Performance

Add operation benchmarks
Benchmark Min Max Median
MuHash (optimized; bitcoin benchmark) 1.9878e-05 s 2.2394e-05 s 2.0323e-05 s
ECMH (optimized; secp256k1 benchmark) 1.3700e-05 s 1.4300e-05 s 1.4000e-05 s
LtHash (non-optimized; bitcoin benchmark) 11.1928e-05 s 11.5482e-05 s 11.3306e-05 s
MuHash (non-optimized for LtHash reference) 2.5360e-05 s 2.9062e-05 s 2.6299e-05 s

All benchmark created on my personal laptop (3 year old macbook pro) on 09/03/2019.

The slow performance of LtHash has only limited value for comparison. First of all the security of the current configuration of LtHash that I am using is higher than what is used in MuHash or ECMH (estimated over 200 bits of security). There are currently also no optimizations being used as there are in MuHash and Secp256k1.

IBD

TODO: Since the rolling functionality is only implemented for MuHash so far I am not able to show good comparison benchmarks for IBD. Also I did not have time create reliable benchmarks for IBD with MuHash vs. without MuHash.

Maintenance

Much of this is a matter of perspective so here are my personal thoughts on maintainability pros and cons of the different hashing options:

MuHash
  • Introduces new but common security assumption into Bitcoin Core
  • Basic concept is easy enough to grasp but implementation including hand-optimized assembly is a bit more complicated (implementation has about 320 lines of code including optimizations)
ECMH
  • Is part of Secp256k1 so maintainers of that repo should speak up if they are willing to take this on or if functionality of the rolling hash should live in Bitcoin Core
  • Otherwise reuses much of the code from Secp256k1 which makes it very manageable (impl file has just over 200 lines of code)
  • For others may be hard to get into it as it requires deep knowledge of how Secp256k1 works
LtHash

It introduces a new cryptographic primitve in Bitcoin (lattice cryptography). Talking to people I often heard something like "I don't know much about lattice cryptography". If we want to go this route I think at least a few people need to show interest in getting a better understanding of this or maybe some people step forward that I have not met. Otherwise lack of knowledge in this area might be an issue.

Integration/design discussion

The one design decision that I spent considerable time on was whether to implement the rolling hash as an index (subclass of BaseIndex) or hook the calculations of the hash into ValidationInterface directly. I opted to implement it as an index at the end as it seems like the safer in terms of performance impact on IBD while hooking into ValidationInterface directly this could have turned out to be an issue.

I also thought about using a flat file structure instead of LevelDB but opted to use the database as it gives more options and is easier to implement when using BaseIndex.

Use Cases

gettxoutsetinfo RPC call/P2P layer

The gettxoutsetinfo RPC call is the current way to get a hash of the UTXO set. However, as mentioned before, it can take several minutes to compute which may be surprising to some users (there is no feedback while the calculation is running) and makes usage of the call less than ideal. There are other values returned in the call that would not need this much compute time.

Given that the current hash value can be cached and returned instantly, exposing the hash of the UTXO set on the P2P layer is now an option if other new features want to build on top of this.

AssumeUTXO

The rolling hash can be used to validate snapshots in AssumeUTXO. See discussion in proposal.

BTCPayServer FastSync

BTCPayServer offers a FastSync feature where you can download a signed UTXO set snapshot from Nicolas Dorier. The UTXO set hash serves as a security check to make sure your UTXO set is correct but it has to be done manually.

Acknowledgements

Special thanks to Jonas Nick and Chaincode Labs for their support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment