Rolling UTXO Set Hash proposal
I have picked up Pieter Wuille's proposal
from 2017 to use a rolling hash for the UTXO set hash. It deals with the problem of
a long computation time of the UTXO set hash which results in a slow RPC call
gettxoutsetinfo (can take several minutes depending on hardware). I investigated
three hash functions: MuHash, ECMH and LtHash and started implementing them
in Bitcoin Core for comparison. However only MuHash has a rolling hash implementation
so far and my LtHash code is not as optimized as MuHash and ECMH. I am looking for
feedback on concept, which choice to make for the hash function and implementation details
before filing a PR to Bitcoin Core.
- Motivation and prior work
- Use Cases
Motivation and prior work
The RPC call
gettxoutsetinfo returns (among other things) a hash of the whole
UTXO set which can be used as an efficient way to compare your UTXO set with your
peers. Since the hash is calculated in real time this call can take several
minutes depending on your hardware. A solution to this was presented on the
Bitcoin Mailing List by Pieter Wuille.
He discusses MuHash and ECMH as potential homomorphic hashing function solutions to implement
the UTXO set hash as a rolling hash which could be efficiently updated with every
new block by adding hashes of new UTXOs and removing hashes of spent UTXOs. The
resulting hash of each new chainstate could be cached and returned instantly without
spending CPU cycles.
Pieter followed up with a PR that implemented MuHash in Bitcoin Core but not as a rolling hash, yet. This was certainly meant as an intermediate step to a rolling hash but did not receive enough reviewer attention in order to get merged.
Tomas van der Wansem implemented ECMH in Secp256k1 but the PR has remained open after some good discussion.
Hash function choices
I looked at each function's security, performance and maintainability.
From Pieter's Mailing list message:
- On MuHash: "Very common security assumption. Even if the DL assumption would be broken (but no k-sum algorithm faster than Wagner's is found), this still maintains 110 bits of security."
- On ECMH: "Identical security assumption as Bitcoin's signatures."
According to Facebook researchers the current choice of specialization of LtHash (1024 16bit lattices) should offer over 200 bits of security.
Facebook research suggests that smaller LtHash with smaller parameters would not offer sufficient security.
Add operation benchmarks
|MuHash (optimized; bitcoin benchmark)||1.9878e-05 s||2.2394e-05 s||2.0323e-05 s|
|ECMH (optimized; secp256k1 benchmark)||1.3700e-05 s||1.4300e-05 s||1.4000e-05 s|
|LtHash (non-optimized; bitcoin benchmark)||11.1928e-05 s||11.5482e-05 s||11.3306e-05 s|
|MuHash (non-optimized for LtHash reference)||2.5360e-05 s||2.9062e-05 s||2.6299e-05 s|
All benchmark created on my personal laptop (3 year old macbook pro) on 09/03/2019.
The slow performance of LtHash has only limited value for comparison. First of all the security of the current configuration of LtHash that I am using is higher than what is used in MuHash or ECMH (estimated over 200 bits of security). There are currently also no optimizations being used as there are in MuHash and Secp256k1.
TODO: Since the rolling functionality is only implemented for MuHash so far I am not able to show good comparison benchmarks for IBD. Also I did not have time create reliable benchmarks for IBD with MuHash vs. without MuHash.
Much of this is a matter of perspective so here are my personal thoughts on maintainability pros and cons of the different hashing options:
- Introduces new but common security assumption into Bitcoin Core
- Basic concept is easy enough to grasp but implementation including hand-optimized assembly is a bit more complicated (implementation has about 320 lines of code including optimizations)
- Is part of Secp256k1 so maintainers of that repo should speak up if they are willing to take this on or if functionality of the rolling hash should live in Bitcoin Core
- Otherwise reuses much of the code from Secp256k1 which makes it very manageable (impl file has just over 200 lines of code)
- For others may be hard to get into it as it requires deep knowledge of how Secp256k1 works
It introduces a new cryptographic primitve in Bitcoin (lattice cryptography). Talking to people I often heard something like "I don't know much about lattice cryptography". If we want to go this route I think at least a few people need to show interest in getting a better understanding of this or maybe some people step forward that I have not met. Otherwise lack of knowledge in this area might be an issue.
The one design decision that I spent considerable time on was whether to implement the
rolling hash as an index (subclass of
BaseIndex) or hook the calculations of the
ValidationInterface directly. I opted to implement it as an index at
the end as it seems like the safer in terms of performance impact on IBD while
ValidationInterface directly this could have turned out to be an issue.
I also thought about using a flat file structure instead of LevelDB but opted to
use the database as it gives more options and is easier to implement when using
gettxoutsetinfo RPC call/P2P layer
gettxoutsetinfo RPC call is the current way to get a hash of the UTXO set.
However, as mentioned before, it can take several minutes to compute which may be
surprising to some users (there is no feedback while the calculation is running)
and makes usage of the call less than ideal. There are other values returned in the
call that would not need this much compute time.
Given that the current hash value can be cached and returned instantly, exposing the hash of the UTXO set on the P2P layer is now an option if other new features want to build on top of this.
The rolling hash can be used to validate snapshots in AssumeUTXO. See discussion in proposal.
BTCPayServer offers a FastSync feature where you can download a signed UTXO set snapshot from Nicolas Dorier. The UTXO set hash serves as a security check to make sure your UTXO set is correct but it has to be done manually.
Special thanks to Jonas Nick and Chaincode Labs for their support.