Pruning Grin

Summary

A Grin full node is already lighter than a Bitcoin full node because of its cut-through ability. We can run a full node on a mobile device today, but to keep that possible as the chain size grows, we will likely need to shrink the requirements for running a full node; otherwise it might become too heavy to run, or require an expensive mobile device that not everyone can afford. A lighter version of a full node could be enabled by introducing a new flag ./grin --prune that would run a full node with pruning enabled. A pruning node is also good for devices with little available disk space. There are other use cases where a lighter full node could be convenient, e.g. cross-chain atomic swaps: when making an atomic swap, you probably don't want to keep the whole chain history around, only as little data as is needed to validate your transaction.

Motivation

Most blockchains require storing large amounts of data to run a full node. This is why SPV nodes became the way to participate in the network when using a mobile phone. The difference between a full node and an SPV node is that the SPV node only validates a subset of rules which means it requires more trust than a full node. Having the ability to fully verify the network state on a mobile phone enables easier onboarding for many more users in a trustless way. While we already have a mobile full node implementation for Grin, we need to make sure this is still possible as the chain size grows.

Reference-level explanation

This document describes a way to prune all five MMRs that Grin has while keeping the ability to fully validate new blocks.

An MMR is an append-only data structure, which means that past entries never change. Every new entry that gets added either builds on top of the existing mountains or creates a new mountain.

The main idea is that given an MMR, we can prune everything apart from the MMR peaks and still be able to fully validate the state transition to the new MMR peaks with some additional information.

To give an example, given the following MMR, we can prune everything except for the MMR peaks:

           unpruned                                 pruned
           
              14                                      14
            /    \                    
           /      \                   
          /        \                  
         /          \                       =>
        6            13               
      /   \        /    \             
     2     5      9     12     17                                      17
    / \   / \    / \   /  \   /  \    
   0   1 3   4  7   8 10  11 15  16 18                                      18
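
To make the peak structure concrete, here is a minimal Rust sketch (a hypothetical helper, not Grin's actual code) that computes the peak positions of an MMR of a given size; for the 19-node MMR above it returns 14, 17 and 18:

/// Positions (0-indexed, as in the diagram above) of the peaks of an MMR
/// containing `mmr_size` nodes. Assumes `mmr_size` is a valid MMR size.
fn peak_positions(mmr_size: u64) -> Vec<u64> {
    let mut peaks = Vec::new();
    let mut remaining = mmr_size;
    let mut offset = 0u64;
    while remaining > 0 {
        // find the largest perfect binary tree (2^(h+1) - 1 nodes) that fits
        let mut tree_size = 1u64;
        while 2 * tree_size + 1 <= remaining {
            tree_size = 2 * tree_size + 1;
        }
        // the peak is the last (rightmost, topmost) node of that tree
        peaks.push(offset + tree_size - 1);
        offset += tree_size;
        remaining -= tree_size;
    }
    peaks // e.g. peak_positions(19) == vec![14, 17, 18]
}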

We show how to prune all five MMRs that Grin uses and describe how to fully validate the history and new blocks even when the node only holds information about the MMR peaks.

Pruning rangeproofs

When we have validated a new block, we add the new rangeproofs to the rangeproof MMR as we would in a non-pruning node, but as soon as we have finished processing the block we can prune everything except the MMR peaks right away. Past rangeproofs are never needed for further validation, which is why we can prune everything apart from the peaks and retain full validation.
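
A minimal sketch of that pruning step, assuming a simple position-indexed store and the peak_positions helper sketched earlier (both hypothetical):

use std::collections::{HashMap, HashSet};

// After a block has been fully validated, drop every stored rangeproof whose
// MMR position is not one of the current peaks.
fn prune_to_peaks<T>(store: &mut HashMap<u64, T>, mmr_size: u64) {
    let peaks: HashSet<u64> = peak_positions(mmr_size).into_iter().collect();
    store.retain(|pos, _| peaks.contains(pos));
}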

Pruning kernels

In order to get rid of the kernels, we use a trick similar to the one Mimblewimble uses to validate the supply. Given two signed curve points y*G and z*G that sum to (y+z)*G, we know that the private key y+z is collectively known, because both y*G and z*G have been signed. This means that if we have two kernels with commitments y*G and z*G, once we have validated their signatures we can sum their public keys to obtain a curve point (y+z)*G and pretend we have a signature for this point. Note that we can do this because we have validated the signatures of both points, so we only need to trust the validation process, which does not introduce any new assumptions. This means that we can reduce two kernel commitments to a single curve point that represents their sum. We use this trick to keep only the kernel MMR peaks along with their sums.
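
As a rough sketch of this aggregation step, assuming rust-secp256k1's PublicKey type as a stand-in for the kernel excess commitments (aggregate_excesses is a hypothetical helper):

use secp256k1::PublicKey;

// Once the signatures on both kernels have been verified, their excess points
// y*G and z*G can be replaced by the single point (y+z)*G.
fn aggregate_excesses(a: &PublicKey, b: &PublicKey) -> PublicKey {
    a.combine(b).expect("sum is not the point at infinity")
}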

Since an MMR is an append-only structure, we know that kernels are ordered from oldest to newest, and we only need to keep the leaves of kernels younger than 2 weeks. This means that we can prune most of the kernel MMR and make it a PMMR. If we assume an average full block adds 0.24 MB of kernels, then a week gets us to 0.24 * 60 * 24 * 7 = 2419 MB, which is ~2.4 GB of kernels. We could improve this by pruning the signatures and keeping only the kernel commitment points, which would get us to around 800 MB of data.

We can do better still: we don't need to store all the past kernels. If we ignore NRD kernels for a moment, we (again) only need to store the MMR peaks, BUT for every peak we need to know the sum of the kernel commitments in the subtree it represents. When we merge two peaks into a new parent peak, we sum the children's values and assign this sum to the parent. Once we have this, we know the sum for the parent peak and can prune the children. This brings us to the same log(N) number of peaks for the kernel PMMR at each block, which should reduce the state we save to ~10 MB or so - a huge space saving! The only exception are the NRD kernels, which should be handled separately; for these we could use the other technique, i.e. keep a separate set that only contains the kernel commitments and drops the signatures.
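
A minimal sketch of the peak-plus-sum idea, again using rust-secp256k1's PublicKey as a stand-in for Grin's kernel excess commitments (PeakWithSum and merge are hypothetical names):

use secp256k1::PublicKey;

// Each retained kernel-MMR peak carries the sum of the kernel excesses in the
// subtree it represents.
struct PeakWithSum {
    hash: [u8; 32],         // MMR peak hash
    excess_sum: PublicKey,  // sum of all kernel excesses under this peak
}

// When two sibling peaks get merged under a new parent, the parent's sum is the
// sum of the children's sums; afterwards the children can be pruned.
fn merge(left: &PeakWithSum, right: &PeakWithSum, parent_hash: [u8; 32]) -> PeakWithSum {
    PeakWithSum {
        hash: parent_hash,
        excess_sum: left
            .excess_sum
            .combine(&right.excess_sum)
            .expect("sum is not the point at infinity"),
    }
}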

Alternative kernel storage implementations

If storing a sum for each peak is not simple to do, we could instead just remember the sum over all the peaks for each block header. We start off with a 0*G accumulator value and keep adding the sum of the kernels a block produces to this accumulator. To take reorgs into account, we only need to remember the accumulator value at each height for the past week.
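
A rough sketch of this accumulator variant, with rust-secp256k1's PublicKey standing in for the excess sum (KernelAccumulator and apply_block are hypothetical names):

use secp256k1::PublicKey;

// Running sum of all kernel excesses, remembered per height for the last week
// of blocks so a reorg can roll back to an earlier accumulator value.
struct KernelAccumulator {
    // (height, sum of every kernel excess up to and including that block)
    history: Vec<(u64, PublicKey)>,
}

impl KernelAccumulator {
    fn apply_block(&mut self, height: u64, block_kernel_sum: PublicKey) {
        // the identity point 0*G cannot be represented as a PublicKey, so the
        // first block's sum seeds the accumulator instead
        let new_sum = match self.history.last() {
            Some((_, acc)) => acc
                .combine(&block_kernel_sum)
                .expect("sum is not the point at infinity"),
            None => block_kernel_sum,
        };
        self.history.push((height, new_sum));
        // only a week of history (60 * 24 * 7 blocks) is needed for reorgs
        if self.history.len() > 60 * 24 * 7 {
            self.history.remove(0);
        }
    }
}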

Pruning headers

The header MMR can also be pruned. As with every other MMR structure, new block headers get added to the right side of the MMR, so we can keep only the last K headers and prune the old ones. We don't need any further optimizations here because this will be a small constant: the size of a block header is ~415 bytes, so a whole week of headers is 415 * 60 * 24 * 7 = 4183200 bytes, which is ~4.2 MB.

Pruning UTXO set

We have compressed the state of 3 out of the 5 MMR structures; the only two we have not are the output PMMR and the bitmap MMR. We can reduce these two to their peaks as well. Utreexo describes a dynamic hash-based accumulator where the idea is that a full node does not need to keep all the outputs locally; instead, it just stores the hash of the Utreexo root, and transactions provide the merkle inclusion proofs which convince us that the inputs indeed exist in the UTXO set and have not been spent.

We are in luck here because we already have all the data structures we need to get this functionality without the need to add the Utreexo accumulator!

Assuming we hold only the MMR peaks, then given a transaction where every input comes with an inclusion proof for the output PMMR and an inclusion proof for the bitmap MMR, we can:

  1. prove - Given an output, we can prove it is in the output MMR by computing the merkle inclusion proof and attaching it to the input in the transaction. The solution we propose does not change the way a wallet constructs the transaction; that remains the same, and the archive nodes provide the inclusion proofs instead.
  2. verify - Given a merkle inclusion proof for the output MMR, the MMR peaks are enough to verify the output exists (see the sketch after this list).
  3. prove not spent - Given a merkle inclusion proof for the bitmap MMR, the MMR peaks are enough to verify the output was not spent.
  4. add - Given the previous MMR peaks, we can continue to add elements to the MMR and prune everything but the MMR peaks (because MMRs are append-only).
  5. update bitmap MMR - The bitmap MMR is the only MMR that we mutate, but to produce its new state it is enough to know the inclusion proof of the leaf; with it, we can update the path up to the peak.
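
A simplified sketch of the verify step (point 2), using SHA-256 as a stand-in for Grin's actual hashing, which also commits to node positions; ProofStep and verify_inclusion are hypothetical names:

use sha2::{Digest, Sha256};

type Hash = [u8; 32];

// A sibling hash plus a flag saying whether it sits to the left of the running hash.
struct ProofStep {
    sibling: Hash,
    sibling_is_left: bool,
}

// Fold the merkle path from the leaf upwards and compare the result against the
// retained peak hash.
fn verify_inclusion(leaf: Hash, path: &[ProofStep], expected_peak: Hash) -> bool {
    let mut acc = leaf;
    for step in path {
        let mut hasher = Sha256::new();
        if step.sibling_is_left {
            hasher.update(step.sibling);
            hasher.update(acc);
        } else {
            hasher.update(acc);
            hasher.update(step.sibling);
        }
        acc = hasher.finalize().into();
    }
    acc == expected_peak
}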

I think that we don’t need to support the “delete” operation that Utreexo supports, because the inclusion proof for the Bitmap MMR can already answer whether an output has been ‘deleted’ or not. Similarly, this inclusion proof is enough to update the state of 'deleted' outputs.

We have now also reduced the data needed to validate the output MMR and bitmap MMR transitions. Both scale with the number of MMR peaks per block, which is at most logarithmic in the MMR size. In the same way that we need to keep all the MMR peaks of the kernel MMR and rangeproof MMR for the last block, we'd also need the MMR peaks of both the output PMMR and the bitmap MMR. This brings the whole state to less than 100 MB (not accounting for the case where the majority of kernels are NRD!) for a blockchain of almost arbitrary size, while the node can still fully validate.

Block witness

As mentioned above, this does not require the wallet to provide the inclusion proofs. In fact, that would not work because an inclusion proof could change before the transaction lands in a block and would hence need to be updated by others. Instead, a simpler approach is to have a function that computes the inclusion proofs needed for a block: GetBlockWitness(blockHash: BlockHash) => BlockWitness. A block witness would be a compact representation of both the output MMR and bitmap MMR inclusion proofs. We say compact because the more inputs the block has, the higher the chance of an overlap between the inclusion proofs. We need to store each path vertex only once; for any subsequent occurrence, we can refer to it with a shorter identifier to save space. In a naive implementation, we'd add 20*32 bytes (output MMR) + 10*32 bytes (bitmap MMR) per input - assuming average proof path lengths of 20 and 10 respectively.
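
A rough sketch of what a compact BlockWitness could look like, with deduplicated path vertices referenced by index (all field names are assumptions, not an existing Grin type):

// Compact block witness: every distinct path vertex is stored exactly once and
// each input's proof path refers to those hashes by index.
struct BlockWitness {
    // deduplicated path vertices
    hashes: Vec<[u8; 32]>,
    // per input: indices into `hashes` forming the output MMR inclusion path
    output_proofs: Vec<Vec<u32>>,
    // per input: indices into `hashes` forming the bitmap MMR inclusion path
    bitmap_proofs: Vec<Vec<u32>>,
}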

To support this, we would need to add support for a new pair of P2P messages:

  1. GetBlockWitness(blockHash) - p2p request message for a block witness
  2. BlockWitness - a compact block witness representation for a block

Upon receiving a block, a pruning node would pick a random peer with the capability of an archive node and send a p2p request message for the block witness.

Handling chain reorganization

Having only the latest MMR peaks is not enough information to recover from a chain reorganization. The node would need to know the state of the peaks of every MMR at a given block hash. Given the size of an MMR, it is possible to tell how many peaks (merkle trees) it has. The number of trees (MMR peaks) scales logarithmically with the size of the MMR, so we need to store at most log(N) MMR peaks to represent the "MMR peaks state" for a certain block. Assuming the worst case where a block has exactly log(N) trees, this takes log(N)*32 bytes. We have 60 * 24 * 7 = 10080 blocks per week, so if the MMR had 50 million leaves, we'd have ~26 MMR peaks in the worst case. This means that to represent the MMR peak state we would need an additional 26 * 32 = 832 bytes stored for each block, i.e. 832 * 10080 = 8386560 bytes, which is ~8.4 MB of data to represent the MMR peaks for all the blocks in the last week (regardless of how full the blocks are).

We don't need to store a whole week of MMR peaks; a pruning node could define a parameter K that determines how big a reorg it is able to handle. If a reorg is bigger than K blocks, the pruning node starts syncing again from scratch.
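
A minimal sketch of such a rolling window of peak snapshots, with K as the maximum reorg depth (PeakSnapshot and PruningWindow are hypothetical names):

use std::collections::VecDeque;

struct PeakSnapshot { /* peaks of all five MMRs at this block */ }

// Keep the peak snapshots of the last K blocks so the node can rewind to a
// recent ancestor on a reorg.
struct PruningWindow {
    k: usize,
    snapshots: VecDeque<(u64, PeakSnapshot)>, // (height, peaks at that height)
}

impl PruningWindow {
    fn push(&mut self, height: u64, snapshot: PeakSnapshot) {
        self.snapshots.push_back((height, snapshot));
        if self.snapshots.len() > self.k {
            self.snapshots.pop_front();
        }
    }

    // Rewind to `height`; returns None if the reorg is deeper than K blocks,
    // in which case the node has to resync from scratch.
    fn rewind_to(&mut self, height: u64) -> Option<&PeakSnapshot> {
        while matches!(self.snapshots.back(), Some((h, _)) if *h > height) {
            self.snapshots.pop_back();
        }
        match self.snapshots.back() {
            Some((h, snap)) if *h == height => Some(snap),
            _ => None,
        }
    }
}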

Initial sync

We assume parallel IBD (PIBD) is implemented. As we are downloading the MMR data, we can also prune the state that is no longer needed. This means that when a tree has been validated completely, its children can be pruned. After we have finished the PIBD, our node is synced up to the sync horizon point, from where we receive full blocks along with their inclusion proofs and keep pruning whatever we can, keeping only the MMR peaks for the reorg scenario. Once we reach the head of the chain, we validate a new block by using the MMR peaks and the block witness. We would need to decide how to communicate with a wallet which outputs belong to us; this is still an open question.

NOTE: NRD kernels are special because their validation depends on another kernel instance. This means that we need to keep their kernel commitments (signatures can be pruned) around until they are out of the 2-week window - that is, 1 week in case of a week-long reorg + 1 week for the NRD window.

Instant full validation

Given a block header at the horizon point, we can obtain provable information about what the peaks of each MMR are. We simply need a list of hashes that hash to the committed root hash; an adversary would need to find a hash collision to fool us. We could start validating new blocks under the assumption "the block header at the horizon point has a valid chain history". This would allow us to start processing blocks instantly. It is possible because we don't need to check the full balance of the chain every time a new block comes in: a new valid block (balanced sums and existing inputs) always preserves the grand formula balance (I believe this can be shown with an induction proof). This means that we can start processing blocks from the horizon point while assuming the past is balanced. Meanwhile, we ask for the kernel sums and the UTXO set in the background (in reverse order through the merkle tree, to find invalid entries sooner) to make for a smoother user experience. This would make for an instant optimistic sync that is verified in the background. Such a sync never gets worse, because it is bound to download a week of blocks, and it is highly reliable because the assumption we start with has a week's worth of energy spent on top of it.

Network impact

There are two things a node can help share with other nodes: the history of the blockchain and the present activity at the head of the chain. A pruning node can't help with the history of the chain or the current UTXO set because it aggressively prunes all the data, but it can help with block propagation at the head. This includes propagating the block witness to other pruning nodes.

Pruning node - block propagation

Apart from the mentioned MMR peaks for the last K blocks, a pruning node would also keep the last K blocks and their block witnesses. This makes it as helpful as an archive node for keeping everyone able to validate the head of the network. It does add some storage requirements, but if we kept the last 60 blocks, it would likely be less than 200 MB of data.

Bandwidth

Inclusion proofs take up space. The additional bandwidth here is the block witness message that all the pruning nodes need to receive. Pruning nodes being able to send these to each other helps, but we'd need actual data points to measure the additional bandwidth they introduce and to think about how this propagates through the network.

Network topology

TODO

  1. How should the nodes be connected to each other?
  2. Should a node try to be connected to at least some archive nodes?
  3. What happens when the network has many more pruning nodes than archive?
  4. What happens if a pruning node is not connected to an archive node? I guess it can't be seeded, but others should be able to propagate blocks and block witnesses

Further possible improvements

MMR peaks overlap

We need to store the MMR peaks for every block in the last week for both the kernel PMMR and the rangeproof PMMR. Due to the persistent nature of MMRs, there will be a lot of overlap in these peaks. We could, as with the block witness, store each 32-byte hash only once and use a shorter identifier as a pointer to it. I don't think this is needed, since a week of MMR peaks for all the blocks probably won't take up much space, so simpler code would be preferred in this case. But it is possible to further reduce this size if the need ever arises.

In-memory MMRs

Perhaps we could minimize I/O operations by keeping all the MMRs in memory to speed everything up, including IBD. Possible benefits have been mentioned in section 3 of the Utreexo paper. This could also be done later if needed.

Drawbacks

  1. wallet restore and other commands that rely on data that is no longer around can't be used without resyncing

We need to make it clear that a pruning node is a different type of node that can't bootstrap other nodes. This is because it does not contain all the data needed for full validation.

Given that many new nodes could come from mobile phones, the ratio of regular full nodes vs pruning nodes would probably start moving towards pruning nodes dominating over time. How does that impact the network stability/health? Are there any new sync attacks that are made possible through pruning nodes? What should the peer connection ratio of regular/pruning nodes look like for a mobile node and what for a regular full node?

Open questions

Consensus changes

How simple and unified can we make the consensus/block validation? Is it possible to make the two node types differ only in how they get the inclusion proofs? An archive node would call a local function to get them, while the pruning variant would make a p2p request. Both would add elements to the MMRs, but after the validation there would be a check:

if is_pruning {
  self.prune_mmrs();  // prunes all mmrs (except the block header mmr) up to the peaks
  // pruning_blocks - rolling window list that remembers the snapshot of the mmr peaks for last K blocks
  self.pruning_blocks.append(self.mmrs.snapshot())  // add this new block to the list
}

When do we check whether the outputs in a block belong to us, and where do we store them? It would be handy to do this so that wallets can keep track of these outputs.

What does the current block validation look like? Does it depend only on the MMR peaks? Could it be made to depend only on the MMR peaks, calling a function to obtain inclusion proofs for the data that needs them? I'm not sure how the current lookups work.

Restore and other functionality

How does this impact grin-wallet restore or similar commands that rely on state/data that is no longer available? What options do we have here?

How exactly would a pruning node for a wallet sync? Would the pruning node accept the seeds to try to rewind? Can this be done in a safe way?

Implementation details

  1. Does this mean we need to change the data structure to be Bitmap PMMR and kernel PMMR? Or are these already PMMRs, but we never prune them?
  2. Would we need to introduce an interface for MMR structure and then have two separate implementations of the interface where one uses a database and the other is held in memory?
  3. Would it make sense to define which MMRs to prune in a configuration setting? Here are a couple of options we might have:
pruned_mmrs = []
# uncomment below to prune rangeproofs and kernels
# pruned_mmrs = ["kernel", "rangeproof"]
# uncomment below to prune everything and get a constant size node
# pruned_mmrs = ["kernel", "rangeproof", "header", "output", "bitmap"]

# Note: perhaps a simpler option would be to define three history capability modes as a configuration e.g.
retain_history = "full"
# retain_history = "semi"     # prunes kernels and rangeproofs
# retain_history = "minimal"  # prunes everything to get a constant size node

The pruned MMRs would be held purely in memory. But we'd need to think hard about which transitions between the options would be possible, e.g. adding new MMRs to the pruned list means we can just take the peaks from the existing full MMR, but removing an MMR from the pruned list might require a resync/refetch of the data.

Attack vectors

Does this open up any DoS vectors?

What happens when a chain reorgs > default_K blocks and all the pruning nodes start requesting witness blocks?

Alternative approaches

Should we consider Bridge nodes like in the Utreexo paper?
