@rjl493456442
Last active March 15, 2021 08:01

Snapshot generation proposal

Martin pointed out the problem that the snapshot needs to be regenerated from scratch after the snap sync, which is really inefficient and unnecessary. So this proposal tries to optimize this procedure by reusing the flat state data received during the sync and amending the outdated states with a range prover.

The heal trie idea

In Martin's proposal, the healtrie is introduced. The basic idea is that during the heal stage, several trie paths are retrieved in order to fill the gaps in the "incomplete" state trie.

Each healed path indicates that the flat state data (e.g. raw account data, storage slots) it points to is inconsistent with the current trie: some mutations were applied between the "snap version" and the "heal version", so the state became outdated.

So in the healtrie, there are two types of nodes: "hash" and "node". A "hash" node means that the range of state data covered by it is correct in the local snap database, while a "node" node means that the range of state data covered by this path is outdated.

For all the "healed" paths, regeneration is required: wipe all the state data in the range, load the correct states by iterating the synced trie, and persist them into the snap database.

The snap version refers to the state trie version when the flat data was received. The heal version refers to the state trie version at healing time.

But there is a major concern: during the heal stage, the healing target can be switched. The frequency won't be very high (the switch happens every 64 blocks, which is around 16 minutes), but once the target is switched, it becomes complicated to handle healing paths that belong to different healing targets.

What's more, we have to consider the restart scenario. If a restart happens during the healing stage, the healing paths have to be persisted somehow, and that persistence is not trivial.

Also, we would have to invent a new trie structure in order to meet the requirements of the healing trie. It may not be strictly necessary, but it is extra work we would like to avoid.

The range prover based correction

The main idea is quite similar: after the snap sync, 99% of the state data is correct and unchanged during the syncing, so these parts of the state can be reused as the snapshot directly.

An effective and simple detection approach needs to be figured out which can find the "outdated" state segments quickly and then fix them by regeneration.

The range prover sounds like a good candidate. After the snap sync, we have a complete trie and almost complete flat states. In order to detect the outdated state segments, we can construct range proofs against the trie root.

Note that generating the edge proofs from the local trie is efficient, and iterating the flat states in the database is also efficient (sequential read).

And the algorithm can also be simple. First, a reasonable range is picked for verification. The range shouldn't be too large or too small: if the range is too large, the probability that it contains outdated entries is very high; if it is too small, the proving overhead is not trivial. For now we can pick 128 entries per account range and 1024 per storage range by default; a few iterations can be made to find better numbers.

If the range prover says OK, then the state segment in this range is correct; otherwise, wipe the entire range and regenerate it using the fallback method: iterating from the trie.

Whenever a contract account is encountered, a storage task should be created for verification too.

The background correction

In the current generation mechanism, the disk layer is paired with the pivot point trie (the healing target trie). All the following blocks are executed one by one and the diff layers are accumulated.

If there are too many diff layers, the bottom diff layers are merged and flushed into the disk, and then the disk layer generation is restarted with the new target. The entire snapshot tree is bound to the head block.

The new generation mechanism should also meet these requirements. We can do the background correction in this way.

  • Initially, set the genMarker to nil; it prevents reads from the "incomplete" disk layer.
  • Whenever the range prover makes progress (a range passes verification, or outdated segments are detected and regenerated), update the genMarker as well.
  • Whenever too many diff layers have accumulated, still flush the bottom diff layers, but only flush the updates before the genMarker. The updates after the genMarker will be fixed later by the range prover against the new trie.
  • Switch the target trie to the new disk layer root.

Conclusion

The biggest advantage of this mechanism is reusing the existing flat state data. Iterating the trie directly is very expensive, while iterating a flat state segment and doing the range proof is cheap. The assumption holds that 99% of the state data received during the snap sync is still correct, so only 1% of the state needs to be regenerated (with the same complexity as the original generation mechanism).

