sreeix/LSM-tree.md Secret

## code-refs.md

      
    Raw
  

              code-refs.md
            
          
    C++ Implementations


Rocks Db has 3 implementations for Memtables
LevelDb has a skiplist based implementation of memtable and tables

Java Implementations


[Cassandra] (https://cassandra.apache.org/) has a SSTable Implementation here and memtables
 Fast impl of LSM tree

Erlang Implementation


HanoiDB


## LSM-tree.md

      
    Raw
  

              LSM-tree.md
            
          
    Notes for reading the LSM tree paper


Slow disk access: Disk(not withstanding SSD) are very slow for access compared to memory. Generally established number for this is 3 orders of magnitude. ie. a memory lookup that may take 10 microseconds will take 10 ms when it hits the disk. So all databases will avoid reading/writing to disk as much as possible(think where does mongodb's perf gains come from?)
Five minute rule refers to the Jim gray paper on when to cache an object in memory. Wikipedia, [Paper] (http://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf), a later paper Five minute rule, ten years later and a later paper Five minute rule 20 years later and how flash memory changes it.

But from the perspective of the paper. The cost of increasing the Ram size
Intuition for B tree based indexes.
A sparse index, that has the Pointer to the start of the first record on the page, thse
Assume we want to do select * from employees.
We can't just scan every available disk block. So we store all the elements in this relation in some form in a different place on the disk(same/different, typically different ) file. Obviously storing the data here does not make sense, so only pointers to the disk block are stored.
Now we just have to find something like select * from employees where id = 100;
Assume we are querying on the db for something like select * from employees where salary > 10000;
Remember salary is(typically) stored contigiously on the disk based on the employeeid(assuming employeeid is primary key)
Terms


TPC - A bunch of tests suites that are used to test for database performance. Usually you will hear something like "DB2 beats Oracle on the TPCC benchmark"
Transaction Logs - For durability, most databases will first write the changes it is going to perform on to the database to a file. These are transaction logs, also known as Write ahead logging (WAL). Because this is write only and a lot of data is being written fairly continuosly and flushed, the performance of TLOGS become super important for db performance
Btrees -> loosly refers to B+ trees. A data structure that allows fast inserts and finds (for a specific id) and also sequential ordered access.
Blocks -> refers to the the contents of one node in the btree. A block in b+ tree contains the keys for guiding the search process. And number size of the block has is related to how tall the tree is going to be.It has nothing to do with the disk blocks(sectors)
Pages -> Refers to a smallest chunk of information db can work with. Ie. do locks and persistence. Data in a db are kept in a series of pages that are mapped to the disk(sectors)
A Btree will hold links to these data pages. Multiple pages can be resident on a single disk block(not the same block as above)
Multi-page blocks -> A block that contains multiple pages and can be fit into a single read/write operations. In LSM tree a block contains as many entires as can fit a disk block.
Leaf nodes - Typically the pointers to the records on disk?
Buffer(pool) -> A collection of data pages that have been loaded into the main memory. Typically managed by buffer manager. All operation on pages happen in memory(ie. search on the recordid/match record fields.), so if the page requested is not in memory(buffer) then it is loaded from the disk into the main memory, some exisitng pages may have to be evicted.
Records -> Each row of data stored on the disk. One page typically has multiple records stored sequentially(clustered)
Row id (RID) -> Record id pointer to the record on the disk.(

Merge process(also called compaction in some papers) -> On reaching memory thresholds, the tree is merged downstream tree, similar to merge sort where 2 pointers are used to track the current smallest( remember that indexes need to be kept in sorted order.)
Things to think about while reading the paper


What are the big ideas?
How does this data structure work for concurrent Access?. Most databases use locking for acheiving Isolation and Atomicity. Does the locking work better on BTrees or LSM trees?
What about durability? Is it easier to checkpoint and write to disk with btrees?
What sort of in memory structure would be better for fast insertions and scans?
Why are the new databases(KV) now using LSM?

Things that may make sense?


Can/Should you have multiple memtables. Does it improve performance?


## lsm-tree-presentation.pdf

      
Display the source blob

    
Display the rendered blob

    
    Raw
  

              lsm-tree-presentation.pdf
            
          
      Sorry, something went wrong. Reload?
      Sorry, we cannot display this file.
      Sorry, this file is invalid so it cannot be displayed.
      
          Viewer requires iframe.
      
    
## lsmtree-my-notes.md

      
    Raw
  

              lsmtree-my-notes.md
            
          
            View raw
              (Sorry about that, but we can’t show files that are this big right now.)
        
    
## lsmtree.pdf

      
Display the source blob

    
Display the rendered blob

    
    Raw
  

              lsmtree.pdf
            
          
      Sorry, something went wrong. Reload?
      Sorry, we cannot display this file.
      Sorry, this file is invalid so it cannot be displayed.
      
          Viewer requires iframe.