Skip to content

Instantly share code, notes, and snippets.

@pervognsen
Created November 10, 2013 07:45
Show Gist options
  • Save pervognsen/7395107 to your computer and use it in GitHub Desktop.
Save pervognsen/7395107 to your computer and use it in GitHub Desktop.
lsm_tree.c
/*
** MVCC NOTES
**
** The in-memory tree structure supports SQLite-style MVCC. This means
** that while one client is writing to the tree structure, other clients
** may still be querying an older snapshot of the tree.
**
** One way to implement this is to use an append-only b-tree. In this
** case instead of modifying nodes in-place, a copy of the node is made
** and the required modifications made to the copy. The parent of the
** node is then modified (to update the pointer so that it points to
** the new copy), which causes a copy of the parent to be made, and so on.
** This means that each time the tree is written to a new root node is
** created. A snapshot is identified by the root node that it uses.
**
** The problem with the above is that each time the tree is written to,
** a copy of the node structure modified and all of its ancestor nodes
** is made. This may prove excessive with large tree structures.
**
** To reduce this overhead, the data structure used for a tree node is
** designed so that it may be edited in place exactly once without
** affecting existing users. In other words, the node structure is capable
** of storing two separate versions of the node at the same time.
** When a node is to be edited, if the node structure already contains
** two versions, a copy is made as in the append-only approach. Or, if
** it only contains a single version, it is edited in place.
**
** This reduces the overhead so that, roughly, one new node structure
** must be allocated for each write (on top of those allocations that
** would have been required by a non-MVCC tree). Logic: Assume that at
** any time, 50% of nodes in the tree already contain 2 versions. When
** a new entry is written to a node, there is a 50% chance that a copy
** of the node will be required. And a 25% chance that a copy of its
** parent is required. And so on.
**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment