phyro/nimble.md

Last active April 10, 2021 09:41
Nimble node

Summary

A Grin full node is already lighter than a Bitcoin full node because of its cut-through ability, but the current implementation of a full node is still too heavy to run on mobile devices. A mobile full node could be made possible by introducing a new flag, ./grin --nimble, that would run a full node with pruning.

Motivation

Most blockchains require storing large amounts of data to run a full node. This is why SPV nodes became the way to participate in the network from a mobile phone. The difference between a full node and an SPV node is that the SPV node only validates a subset of the rules, which means it requires more trust than a full node. Having the ability to fully verify the network state on a mobile phone would enable easier onboarding for many more users while keeping full validation.

Reference-level explanation

The main idea is to reduce the size of the data by pruning rangeproofs and merging kernel commitments. Once a rangeproof has been validated, a node no longer needs it, so it can be pruned. Pruning the rangeproof reduces the size of a UTXO from ~700 bytes to 33 bytes. UTXOs should now be small enough, but we are still left with the long sequence of kernels that each transaction leaves behind.

Here we use a trick similar to the one Mimblewimble uses to validate the supply. Given two signed curve points y*G and z*G that sum to (y+z)*G, we know that the private key y+z is collectively known, because y*G and z*G have each been signed. This means that if we have two kernels with commitments y*G and z*G, then once we have validated their signatures, we can sum their public keys to obtain a curve point (y+z)*G and pretend we have a signature for this point. We can do this because we have validated the signatures of both points, so we only need to trust the validation process, which does not introduce any new assumptions. In other words, we can reduce two kernel commitments to a single one.

Below we introduce a made-up kernel commitment we call a kernel commitment accumulator: a sum of multiple kernel commitments, each of which came with a valid signature. Even though we won't have a signature for the accumulator point itself, we can act as if we do because we have seen the signatures for the points that sum up to it.
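As a rough sketch of the accumulator idea, the snippet below shows only the underlying group arithmetic: folding verified kernel commitments into a running sum that equals a commitment to the sum of the private excesses. It is an illustration, not Grin's implementation - it uses the ristretto group from the curve25519-dalek crate instead of secp256k1 Pedersen commitments, and the signature check is elided:

```rust
use curve25519_dalek::constants::RISTRETTO_BASEPOINT_POINT;
use curve25519_dalek::ristretto::RistrettoPoint;
use curve25519_dalek::scalar::Scalar;
use curve25519_dalek::traits::Identity;

fn main() {
    let g = RISTRETTO_BASEPOINT_POINT;

    // The private excesses y and z are known only to the transaction builders.
    let y = Scalar::from(1234u64);
    let z = Scalar::from(5678u64);

    // Each kernel only reveals its public commitment (y*G, z*G) plus a signature.
    let kernel_a: RistrettoPoint = y * g;
    let kernel_b: RistrettoPoint = z * g;

    // The accumulator starts at the identity element (the "0*G" point).
    let mut accumulator = RistrettoPoint::identity();
    for kernel in [kernel_a, kernel_b] {
        // ... verify the kernel's signature against `kernel` here ...
        accumulator += kernel;
    }

    // By the group law y*G + z*G == (y+z)*G, so the accumulator behaves like a
    // commitment to a collectively known key, even though no signature for the
    // accumulator point itself ever exists.
    assert_eq!(accumulator, (y + z) * g);
}
```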

Initial sync

We assume parallel IBD is implemented. We start off with a kernel commitment accumulator initialized to the curve point 0*G. When syncing with parallel IBD we obtain kernel chunks through GetKernelChunk calls, which give us subsets of kernels that can be validated separately. Once we have validated a kernel chunk, we can add its commitments together to obtain a single kernel commitment and add that to our kernel commitment accumulator. For outputs we create a set of Pedersen commitments that starts off empty. In the IBD phase we download output chunks through GetOutputChunk calls, and once we have validated a chunk we can drop its rangeproofs and put its Pedersen commitments in our set. After parallel IBD our node is synced up to the horizon point. From that point on we receive each block separately and validate it.
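A hypothetical sketch of this IBD flow is shown below. The types, the stubbed crypto functions and the chunk-handling methods are placeholders chosen for illustration, not Grin's actual structs or P2P messages:

```rust
use std::collections::HashSet;

/// Serialized curve point / Pedersen commitment (33 bytes).
type Commitment = [u8; 33];

struct Kernel {
    excess: Commitment,
    signature: Vec<u8>,
}

struct Output {
    commitment: Commitment,
    rangeproof: Vec<u8>, // dropped as soon as it has been verified
}

struct NimbleState {
    kernel_accumulator: Commitment, // conceptually starts as the 0*G point
    utxo_set: HashSet<Commitment>,
}

impl NimbleState {
    /// Handle a GetKernelChunk response: verify every kernel signature, fold
    /// the kernel commitments into the accumulator, then forget the signatures.
    fn apply_kernel_chunk(&mut self, chunk: &[Kernel]) {
        for kernel in chunk {
            assert!(verify_signature(&kernel.excess, &kernel.signature));
            self.kernel_accumulator = point_add(&self.kernel_accumulator, &kernel.excess);
        }
    }

    /// Handle a GetOutputChunk response: verify every rangeproof, then keep
    /// only the 33-byte commitments and drop the rangeproofs.
    fn apply_output_chunk(&mut self, chunk: Vec<Output>) {
        for output in chunk {
            assert!(verify_rangeproof(&output.commitment, &output.rangeproof));
            self.utxo_set.insert(output.commitment);
        }
    }
}

// Stubs standing in for the real secp256k1 operations.
fn verify_signature(_excess: &Commitment, _sig: &[u8]) -> bool { true }
fn verify_rangeproof(_commitment: &Commitment, _proof: &[u8]) -> bool { true }
fn point_add(a: &Commitment, _b: &Commitment) -> Commitment { *a } // placeholder, not real point addition
```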

NOTE: NRD kernels are special because their validation depends on another kernel instance. This means that we can't just sum NRD kernels, because we wouldn't be able to validate them, so their validation needs to happen separately. A possible solution is to keep all NRD kernels intact and validate them with a moving-window NRD validation check once we have received all of them. Those that have been validated (not necessarily all of them, because a relative NRD kernel could appear at the horizon_point + 1 block) can be added to the kernel accumulator. There may be better ways to handle NRD kernel validation, but those could be implemented later on.

Block validation

We define the Pedersen commitment set as the result of evaluating a sequence of set operations. We start off with the set obtained through IBD up to the horizon point. To compute the latest UTXO state we loop through each block from the horizon onwards, remove the inputs that were spent in the block from the set and add the outputs that were introduced in the block to the set. We need to keep the inputs and outputs of the blocks from the horizon point onwards in case of a reorg.

When a new block arrives we validate it and check whether its set of inputs is a subset of our Pedersen commitment set (we know these have valid rangeproofs because they are in our set). Once the block has been validated, we keep its contents but can prune the rangeproofs that came with its outputs and merge the kernel commitments.

For a block that has gone out of the horizon window, we remove its inputs from the starting Pedersen commitment set and add its outputs to the starting set.
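The set bookkeeping described above could look roughly like the sketch below (placeholder types, not Grin's actual block or commitment structs):

```rust
use std::collections::HashSet;

/// Serialized Pedersen commitment (33 bytes).
type Commitment = [u8; 33];

struct Block {
    inputs: Vec<Commitment>,  // commitments being spent
    outputs: Vec<Commitment>, // new commitments (rangeproofs already dropped)
}

/// Apply a newly validated block to the current UTXO set: every input must
/// already be in the set (so its rangeproof was checked earlier), then inputs
/// are removed and outputs are added.
fn apply_block(utxo_set: &mut HashSet<Commitment>, block: &Block) -> bool {
    if !block.inputs.iter().all(|c| utxo_set.contains(c)) {
        return false; // spends an unknown or already spent output -> invalid
    }
    for input in &block.inputs {
        utxo_set.remove(input);
    }
    for output in &block.outputs {
        utxo_set.insert(*output);
    }
    true
}

/// Once a block falls out of the horizon window it can no longer be reorged
/// away, so its effect is folded into the starting set and its full contents
/// can be discarded.
fn roll_out_of_horizon(starting_set: &mut HashSet<Commitment>, block: Block) {
    for input in &block.inputs {
        starting_set.remove(input);
    }
    for output in block.outputs {
        starting_set.insert(output);
    }
}
```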

Drawbacks

We need to make it clear that a nimble node is a different type of node that can't bootstrap other nodes, because it does not contain all the data other nodes need for full validation. I also think rangeproofs are needed to identify your own UTXOs, but I'm not sure about this part.

Given that many new nodes could come from mobile phones, the ratio of regular full nodes to nimble nodes would probably shift towards nimble nodes dominating over time. How does that impact network stability and health? Are there any new sync attacks made possible by nimble nodes? What should the peer connection ratio of regular to nimble nodes look like for a mobile node, and what for a regular full node?

@antiochp commented May 3, 2020

Thanks for writing this up. This has been discussed in various forms before but I don't think anybody actually sat down and took the time to write this out.

We have 4 MMRs on disk - output, rangeproof, kernel and header.
And 1 MMR currently held purely in memory - the UTXO bitmap.

Nodes currently store all headers, all kernels and all unspent outputs and rangeproofs.

We technically do not need to store headers or kernels once they have been validated. So a nimble node could potentially discard these (presumably keeping some recent history around to handle fork/reorg as necessary).

The MMR data structure effectively allows leaves (the data) to be pruned and replaced with hashes (parent nodes), and we can do this because the MMR is strictly append-only. So we can take advantage of the MMR here to prune old data, leaving parent hashes in place and allowing the full MMR to continue to be validated in terms of the root hash.
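To illustrate why pruned leaves do not affect root validation, here is a toy example using a plain 4-leaf Merkle tree rather than a real MMR (a sketch only, using the sha2 crate):

```rust
use sha2::{Digest, Sha256};

fn hash_pair(left: &[u8], right: &[u8]) -> Vec<u8> {
    let mut hasher = Sha256::new();
    hasher.update(left);
    hasher.update(right);
    hasher.finalize().to_vec()
}

fn main() {
    // Four leaves standing in for pruned data (e.g. kernels or rangeproofs).
    let leaves: Vec<Vec<u8>> = (0u8..4).map(|i| Sha256::digest([i]).to_vec()).collect();

    // Full tree: root = H(H(l0, l1), H(l2, l3)).
    let left_parent = hash_pair(&leaves[0], &leaves[1]);
    let right_parent = hash_pair(&leaves[2], &leaves[3]);
    let root = hash_pair(&left_parent, &right_parent);

    // Prune leaves l0 and l1: drop their data, keep only `left_parent`.
    // The root can still be recomputed, so the structure still validates
    // against the committed root hash even though the leaf data is gone.
    let pruned_root = hash_pair(&left_parent, &right_parent);
    assert_eq!(root, pruned_root);
}
```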

So a nimble node can aggressively prune historic data from the kernel and header MMRs and still fully validate.

Rangeproofs get pruned over time (once the outputs are spent). But in a nimble node these can be pruned more aggressively as well. We only need to validate rangeproofs when the outputs are first created (or processed during IBD etc.)

And we can take advantage of the MMR structure here as well. The rangeproof MMR root can continue to be fully validated even if rangeproofs associated with unspent outputs are pruned locally.

@antiochp commented May 3, 2020

Regarding NRD kernels: Nodes will always need to maintain some recent history of full blocks to support rewind/reorg/forks.
NRD kernels may require a larger period of recent history, but only 2 weeks of data total (1 week of reorg support + 1 week of NRD locks).

@phyro (Author) commented Jul 2, 2020

@antiochp I am so sorry, I missed your comments - I seem to have forgotten I shared this gist. After some ping-pong, people told me that a full node would likely drain the battery too quickly, which I assume is true, so I stopped thinking about this. I thought about it again today and I still think it's a good idea to have this option, as it allows querying blockchain information in a trustless way, e.g. you run a nimble Grin stack locally:
1.) nimble node
2.) nimble explorer

and you don't need to trust external explorers, e.g. the BitGrin explorer, which (from what I understood) lied to people. The plus here is clearly that it is a minimal, fully verifying stack (it preserves trust minimization) and can be run by anyone.

@antiochp commented Jul 2, 2020

There is definitely value in having a "nimble" node along the lines of what you suggest. Nodes on the network do not all necessarily need to be able to provide all data to other nodes.
I am interested in seeing what the lightest-weight fully validating node would look like, even if it's not currently possible to do this on a mobile device.

@phyro (Author) commented Apr 10, 2021

Just dropping the link to the gist which I believe answers what the lightest-weight fully validating node would look like: https://gist.github.com/phyro/0ad4ccf71e936dd90545648110224e96
This is only to keep links to discussions/ideas around for when we decide to actually tackle this.
