

@Dentrax
Last active December 13, 2022 07:55
How does etcd defragmentation work?

Abstract

Bolt operations are copy-on-write (etcd stores its data in bbolt, a fork of Bolt). When a page is updated, its new contents are written to a completely new page, and the old page is added to a "freelist" that Bolt consults whenever it needs a new page. As a result, deleting large amounts of data does not shrink the file on disk: the freed pages stay on Bolt's freelist for future reuse. To return this space to the file system, you need to defragment the database.

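Defragmentation rewrites the database into a new, compact file and thereby releases this storage space back to the file system. It is issued on a per-member basis so that cluster-wide latency spikes can be avoided: a member being defragmented blocks its own reads and writes, so defragmenting members one at a time keeps the rest of the cluster serving.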

Algorithm

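  1. Lock batchTx to ensure nobody is using the previous transaction, then close the ongoing transaction.
  2. Lock the database only after locking the transaction, to avoid deadlock.
  3. Block concurrent read requests while the transactions are being reset.
  4. Create a db.tmp.* file for the new database and open it.
  5. Start the defragmentation copy:
    1. Open a write transaction on the temporary db.
    2. Open a read transaction on the old db.
    3. Traverse the old db's buckets from first to last using a cursor.
    4. Create a new bucket in the temporary db for each one.
    5. Traverse all keys in the bucket.
    6. Put each key/value pair into the new bucket, committing the temporary transaction periodically.
    7. Roll back if any error occurs.
  6. Close both databases.
  7. Rename the temporary db to the actual db file name and reopen it.
  8. Observe (record) the defragmentation metrics.
  9. Release all locks.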

Notes
