Skip to content

Instantly share code, notes, and snippets.

@kocubinski
Last active October 26, 2023 21:03
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save kocubinski/e365787a8ee7b1275f46b845fbb8fb0e to your computer and use it in GitHub Desktop.
Save kocubinski/e365787a8ee7b1275f46b845fbb8fb0e to your computer and use it in GitHub Desktop.
iavl v2 design

IAVL v2 design

1 folder per module prefix, each with 2 databases, changelog.db and tree.db. storekey defaults to “root” for a single tree implementation.

receive changeset, write leaves to leaf table in changelog.db, always. this table contains a btree index. the writes are append only on the end.

write all tree nodes to tree.db in the tree_shard_n table. this db has 2 operating modes.

  1. fast this table is without an index. it will not be read until after a checkpoint when the index is built. this means that historical proofs cannot be produced for the current epoch, and that rollbacks cannot be done until an index is built.
  2. slow this table has a btree index. it can be read at any time. write speed will suffer, but historical proofs and rollback are possible immediately after commit.

tree.db has a very large WAL policy to make writes blazing fast.

write all tree nodes evicted by the height/level policy to tree_n table. this table contains a btree index. when using the maximum amount of memory, no nodes are written to this table. when using the minimum amount of memory, it is identical to tree.db/tree_shard_n, and therefore one of them is obviated.

Pruning: track orphans into an unordered table history.db/orphans_n where n is the epoch. during pruning build an index and just join into tree.db/tree_shard_n to delete old versions.

DELETE FROM tree
JOIN orphan on orphan.node_key = tree.node_key
WHERE version < prune_version

Latest mode

  • all changes (including deletes) persisted to changelog.db
  • tree is only materialized to disk at checkpoints
  • a checkpoint:
    1. writes novel nodes (since last checkpoint) to table tree_shard_n where n=checkpoint height. create index on tree_shard_n. tree_shard_n is now readonly.
    2. writes a root node with checkpoint=true
  • SaveVersion writes a root node (with root hash) with checkpoint=false
  • rollback to height n: read most recent root node where checkpoint=true then replay leafs to height n

Road to mainnet

  • Complete latest mode implementation
  • Implement migration code in costor from iavl-v0 -> iavl-v2 from latest pruned state
  • Create & maintain patched osmosis (or hub) importing iavl-v2 instead of iavl-v0
  • Start node process and catch up. checkpoint every 3600 blocks (around 4 times a day given current 6s block speed on both hub and osmosis)

TODOs

  • per storekey mmap estimator. Given a global mmap limit, estimate and divide space per storekey informed by tree size and mean key size. overridable by operator.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment