@debasishg
Last active February 17, 2024 13:12

Basics

  1. The Log-Structured Merge-Tree (LSM-Tree)
  2. B-Tree vs Log-Structured Merge-Tree
  3. Modern B-tree techniques
  4. LSM-based Storage Techniques: A Survey

B-trees and CPU Caches

  1. B-tree Indexes and CPU Caches by Goetz Graefe and Per-Åke Larson
  2. Cache-Conscious Index Structures for Main-Memory Databases by Vilho Raatikka
  3. Making B+-Trees Cache Conscious in Main Memory by Jun Rao and Kenneth Ross

Scaling write intensive key-value LSM trees

  1. Monkey: Optimal Navigable Key-Value Store by Niv Dayan et al.
  2. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging by Niv Dayan et al.
  3. The Log-Structured Merge-Bush & the Wacky Continuum by Niv Dayan et al.
  4. Scaling Write-Intensive Key-Value Stores by Niv Dayan

Research Articles

  1. Stratified B-trees and versioning dictionaries
  2. Reducing Bloom Filter CPU Overhead in LSM-Trees on Modern Storage Devices
  3. Optimal Bloom Filters and Adaptive Merging for LSM-Trees
  4. SHaMBa: Reducing Bloom Filter Overhead in LSM Trees
  5. A Comprehensive Performance Evaluation of Modern in-Memory Indices
  6. FPGA-Accelerated Compactions for LSM-based Key-Value Store
  7. Revisiting the design of LSM-tree Based OLTP storage engine with persistent memory
  8. LB+Trees: optimizing persistent index performance on 3DXPoint memory
  9. SLM-DB: Single-Level Key-Value Store with Persistent Memory - this inspired LotusDB
  10. Small Refinements to the DAM Can Have Big Consequences for Data-Structure Design
  11. External-memory Dictionaries in the Affine and PDAM Models
  12. A High Throughput B+tree for SIMD Architectures
  13. TreeLine: An Update-In-Place Key-Value Store for Modern Storage
  14. FD-Tree: a Tree Index on Solid State Drives
  15. Tree Indexing on Solid State Drives
  16. Tree Indexing on Flash Disks
  17. Revisiting B+-tree vs. LSM-tree

Cassandra

  1. Trie Memtables in Cassandra
  2. How Cassandra Stores Data: An Exploration of Log Structured Merge Trees

Percona / TokuDB

Percona XtraDB is an enhanced version of the InnoDB storage engine, designed to better scale on modern hardware.

  1. Write Optimization: Myths, Comparison, Clarifications
  2. Write Optimization: Myths, Comparison, Clarifications - Part 2
  3. How TokuDB Fractal Tree Indexes work
  4. TokuMX Fractal Tree(R) indexes, what are they?
  5. A Comparison of Fractal Trees to LSM Trees
  6. Percona Live Slides and Video Available: The Right Read Optimization is Actually Write Optimization
  7. An article explaining B-trees, LSM trees and Fractal tree indexes

ScyllaDB

An open-source distributed NoSQL wide-column data store. It was designed to be compatible with Apache Cassandra while achieving significantly higher throughputs and lower latencies.

  1. The taming of the B-Trees

DuckDB

An in-process SQL OLAP database management system, serverless and optimized for analytics.

  1. Persistent Storage of Adaptive Radix Trees in DuckDB

SplinterDB

SplinterDB is a key-value store from VMware designed for high performance on fast storage devices.

  1. SplinterDB: Closing the Bandwidth Gap for NVMe Key-Value Stores
  2. SplinterDB and Maplets: Improving the Tradeoffs in Key-Value Store Compaction Policy
  3. SplinterDB - a key-value store from VMware designed for high performance on fast storage devices

PolarDB

Alibaba Cloud PolarDB is a cloud-native relational database service that decouples computing resources from storage resources and uses integrated software and hardware to provide secure and reliable services with high performance, auto scaling capabilities within seconds, and a large storage capacity.

  1. PolarDB-IMCI: A Cloud-Native HTAP Database System at Alibaba
  2. Review: PolarDB-SCC: A Cloud-Native Database Ensuring Low Latency for Strongly Consistent Reads
  3. PolarDB - cloud-native distributed SQL database from Alibaba

LotusDB

LotusDB describes itself as the most advanced key-value store written in Go: extremely fast, compatible with both LSM trees and B+ trees, and an optimization of badger and bbolt.

  1. LotusDB—A fast kv database in Go

FoundationDB

An open source transactional key value store created more than ten years ago. It is one of the first systems to combine the flexibility and scalability of NoSQL architectures with the power of ACID transactions.

  1. FoundationDB: A Distributed Unbundled Transactional Key Value Store
  2. How FoundationDB works and why it works

OrioleDB

OrioleDB is a new storage engine for PostgreSQL, bringing a modern approach to database capacity, capabilities and performance to the world's most-loved database platform.

  1. OrioleDB – building a modern cloud-native storage engine

RocksDB

RocksDB is a key-value store targeting large-scale distributed systems and optimized for solid-state drives (SSDs).

  1. Disaggregating RocksDB: A Production Experience
  2. Evolution of Development Priorities in Key-value Stores Serving Large-scale Applications: The RocksDB Experience

VectorDB / Astra (Cassandra)

Vector search integrated with Astra

  1. What is a Vector Database
  2. 5 Hard Problems in Vector Search, and How Cassandra Solves Them
  3. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
  4. FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search