Skip to content

Instantly share code, notes, and snippets.

@gavinandresen
Last active February 20, 2021 09:26
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save gavinandresen/639ffe092de1a65de755add226c1dfe4 to your computer and use it in GitHub Desktop.
Save gavinandresen/639ffe092de1a65de755add226c1dfe4 to your computer and use it in GitHub Desktop.
Balanced Merkle Forest design

I want a cryptographic accumulator1 that has the following properties:

  • It can be build incrementally using a small amount of persistent memory
  • Witness proofs are reasonably small.
  • Witness proofs are computationally easy to verify.
  • Must have at least 100-bit security.

The idea is to minimize on-chain storage; the accumulator will live on-chain, maintained by a smart contract. Transactions submitted to the smart contract can add to the accumulator. And then later transactions can be created that include proofs that a value is part of the accumulator.

For example, a lottery contract might work as follows:

Transactions are submitted to a lottery contract, each paying for one (or more) lottery 'tickets'. The smart contract adds each 'ticket' to the accumulator, but does not remember every entry individually -- they all get mushed together.

At some point, one or more winners are chosen (this is non-trivial, but I won't go into details for this gist). The winner then has a certain amount of time to claim their winnings, by submitting a proof that their 'ticket' is part of the accumulator. If the ticket is a winner and the proof is valid, the contract pays out.

The reason to use an accumulator and not just store every entry is scalability; blockchain storage is expensive, and a lottery that cost less than a cent to enter is better than one that costs a dollar (or more if transaction fees are high).

There are many academic papers on cryptographic accumulators. The simplest accumulator is a merkle tree, but a simple merkle tree doesn't work for our application because it can't be built incrementally2.

There are wicked-cool accumulator schemes that rely on pairing based cryptography, and one of them would probably meet my criteria, but after spending a lot of time researching and thinking I decided not to go that route for two reasons: 1) I don't think I'm smart enough to get the code right. 2) CPU costs for a pairing operations (in ethereum) are just as bad as costs for on-chain storage.

Merkle trees are almost perfect, I just need to be able to build them incrementally. Here is how:

Instead of storing one merkle root as the accumulator, store log2(N) merkle roots, where N is the number of items added. Merkle root 'i' will contain either nothing or will be a balanced merkle tree containing 2i items.

This structure can be built incrementally; the algorithm to add an entry is just a few lines of code:

if there is nothing at merkle_roots[0]:
  set merkle root 0 to the item being added (remember the new item)
otherwise:
  build a new 2-item balanced merkle tree using the new item and merkle_roots[0]
  set merkle_roots[0] to 'nothing'
  repeat this algorithm with the new 2-item tree and merkle_roots[1] 

merkle_roots[] is essentially counting in binary; after the first item is added, merkle_roots[0] will contain it. After eleven items are added (binary 1011), merkle_roots[3] will be the root of an 8-element balanced merkle tree, merkle_roots[2] will be empty, merkle_roots[1] will have 2 items, and merkle_roots[0] will have the last item added.

Anybody watching all of the accumulated values can easily create a full merkle tree, and create merkle paths up to one of the merkle_roots.

Continuing our lottery example: a lottery with four billion entries (one for almost every person in the world) would need just 32 entries in the merkle_roots[] array (32 hash values, about 900 bytes). A winner's proof would also be 32 hash values, and validating the proof is just 32 hash operations.

Removing a value from the accumulator, given a merkle path proving that the value is part of the accumulator, is also easy-- all the information needed to update the accumulator is in the merkle path. The remove algorithm is:

 Use the path to verify that the value is in the accumulator; if it is not, return 'fail'
 set i = length of the path
 set roots[i] to 'nothing'
 for j in 0 to length of the path:
     combine path[j] with roots[j] using the add_item algorithm
 return 'success'

For example, start with eleven items in the accumulator, 0 through b. The merkle forest will have three trees-- one with eight items, one with two, and one with one. The accumulator will look like:

      h12345678        h9a        hb

... where hb is the hash of the last item, h9a is the root of the 2-element hash tree containing items 9 and ten, etc.

The merkle path proof for item number 3 can be just the array of hashes: [h4, h12, h5678] ... if we use a hash-combining function that doesn't care about the order in which siblings are hashed (if the order does matter then the merkle path has to keep track of left/right siblings). I use if hash1 < hash2: return hash(hash1+hash2) else return hash(hash2+hash1). Hash together item number three and [h4, h12, h45678] and you get the root hash h12345678, proving that item 3 is in the accumulator.

Add a twelfth item 'c', and it first gets combined with 'b' and then '9a', and the new accumulator state is:

    h12345678        h9abc

The merkle path proof for item number 3 is the same as before: [h4, h12, h5678]. Removing it means the h12345678 root disappears, and the sub-trees h4, h12, and h5678 get re-added:

    h12345678        h9abc   <-- accumulator state before removal
    h9abc                    <-- remove the root containing item 3
    h9abc    h4              <-- add the first item in the merkle path, h4
    h9abc    h12    h4       <-- add h12
    h56789abc    h12    h4   <-- add h5678, it combines with h9abc and we're done with removal.

Accumulator roots can change with every add or remove, invalidating any merkle paths that might be part of transactions waiting to interact with the smart contract (not a problem for a lottery, with distinct 'enter' and 'payout' time periods). I have created another gist describing how old paths can be updated as items are added or removed.

References

Efficient Asynchronous Accumulators for Distributed PKI : 2016 paper I found that also describes the 'balanced merkle forest' data structure (but they don't use that name).

Footnotes

Footnotes

  1. A cryptographic accumulator is a weird representation of set of values. You can add lots of values to the accumulator, which mixes them all together in a clever way so they take up many fewer bits than just storing a list of all the values. And you can create a "witness" proof for a value that is in the set that proves to somebody who only has the accumulator that the value was, indeed, added to the accumulator.

  2. That's not actually true; you can build an extremely unbalanced merkle tree incrementally, but then one of our other criteria (proof sizes must be reasonable) is violated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment