## LevelDB overview

LevelDB migrates data from lower levels to higher (deeper) levels through compaction, so the deeper a level is, the more data it holds.

For example, the size limit of level1 is 100MB (by default), level2 is 1GB, level3 is 10GB, and so on: each level is 10x the size of the previous one.
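A tiny sketch of that growth pattern (the 100MB base and 10x factor come from the defaults mentioned above; exact limits depend on the configured options):

```go
package main

import "fmt"

// levelSizeLimit sketches the 10x level-size growth described above,
// assuming level1 starts at 100MB. Real limits depend on the LevelDB
// implementation and its configured options.
func levelSizeLimit(level int) int64 {
	limit := int64(100 << 20) // 100MB for level1
	for i := 1; i < level; i++ {
		limit *= 10
	}
	return limit
}

func main() {
	for lvl := 1; lvl <= 4; lvl++ {
		fmt.Printf("level%d limit: %d bytes\n", lvl, levelSizeLimit(lvl))
	}
}
```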

If the database is full, roughly 90% of the data sits in the deepest level.

But usually it's not full, e.g. level0 holds 100MB, level1 1GB, and level2 5GB. Even so, the conclusion still holds: most of the data lives in the deepest level.

To retrieve an entry from the database, we first visit the memory db, then search level0, level1, ..., down to the deepest level, in order.
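A minimal sketch of that lookup order in Go; `table`, `level`, and `get` are hypothetical names for illustration, not goleveldb's actual API:

```go
package leveldbsketch

// table is a hypothetical minimal interface for anything that can be
// queried for a key (memtable or an SST file).
type table interface {
	Get(key []byte) (value []byte, ok bool)
}

// level holds the files of one level that may overlap the lookup key.
type level struct {
	files []table
}

// get sketches LevelDB's sequential read path: check the in-memory
// table first, then walk the levels from level0 down to the deepest,
// returning the first hit.
func get(mem table, levels []level, key []byte) ([]byte, bool) {
	if v, ok := mem.Get(key); ok {
		return v, true
	}
	for _, lvl := range levels { // level0 -> level1 -> ... -> deepest
		for _, f := range lvl.files {
			if v, ok := f.Get(key); ok {
				return v, true
			}
		}
	}
	return nil, false
}
```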

## Real benchmark case

| Level | File number | Level size (B)       |
|-------|-------------|----------------------|
| 0     | 0           | 0                    |
| 1     | 2           | 258367 (~0.25MB)     |
| 2     | 529         | 904000266 (~900MB)   |
| 3     | 2834        | 5717206863 (~5.7GB)  |

We can see that ~86% of the data is in level3 (the deepest).

When we run the read benchmark, the total number of touched files is 36293635. There are 26842700 read operations, so each read touches 1.35 files on average.

I also counted the touched files in each level.

| Level | File number | Files touched | Data hits |
|-------|-------------|---------------|-----------|
| 0     | 0           | 0             | 0         |
| 1     | 2           | 27492         | 1047      |
| 2     | 529         | 13088646      | 3664156   |
| 3     | 2834        | 23177497      | 23177497  |

For data that lives in level3, a read first searches level0 and level1 (both nearly free here: level0 is empty and level1 has only 2 files), then level2, and finally level3. Of the 13088646 level2 touches, 3664156 were hits, so 13088646 - 3664156 = 9424490 touches fell through to level3. The average number of files touched for a level3 hit is therefore 1 + 9424490/23177497 ≈ 1.4.
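The same arithmetic, spelled out as a runnable Go snippet (all constants are the benchmark numbers from the tables above):

```go
package main

import "fmt"

func main() {
	// Numbers from the benchmark tables above.
	const (
		totalTouched = 36293635.0 // files touched across all reads
		totalReads   = 26842700.0 // read operations
		l2Touched    = 13088646.0 // level2 files touched
		l2Hits       = 3664156.0  // reads satisfied in level2
		l3Hits       = 23177497.0 // reads satisfied in level3
	)
	fmt.Printf("files touched per read: %.2f\n", totalTouched/totalReads) // ~1.35
	l2Misses := l2Touched - l2Hits // 9424490 touches that fell through to level3
	fmt.Printf("avg files touched for a level3 hit: %.2f\n", 1+l2Misses/l3Hits) // ~1.4
}
```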

## Concurrent read

If most of the data is in the deepest level, why not read all the (necessary) level files at the same time!

As for read overhead: since most of the data is in the deepest level, there is almost no extra work.

Yes, for low-level data we issue some unnecessary reads, but the ratio is very small, so it's acceptable.

The benefit is that we can read concurrently. Imagine we need to touch 4 files for one entry: reading them in parallel cuts the read latency to roughly 1/4, as sketched below.
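A minimal sketch of this idea, assuming the hypothetical `sstFile` interface below (none of these names match goleveldb's real API). Because newer versions of a key shadow older ones, the hit from the shallowest level must win:

```go
package leveldbsketch

import "sync"

// sstFile is a hypothetical minimal file interface, for illustration only.
type sstFile interface {
	Get(key []byte) (value []byte, ok bool)
}

// candidate is one file that may contain the key, tagged with its level.
type candidate struct {
	level int
	file  sstFile
}

type probe struct {
	level int
	value []byte
	ok    bool
}

// concurrentGet probes every candidate file at once instead of walking
// the levels one by one. Because newer versions of a key live in
// shallower levels, the hit from the lowest-numbered level must win.
func concurrentGet(cands []candidate, key []byte) ([]byte, bool) {
	results := make(chan probe, len(cands))
	var wg sync.WaitGroup
	for _, c := range cands {
		wg.Add(1)
		go func(c candidate) {
			defer wg.Done()
			v, ok := c.file.Get(key)
			results <- probe{level: c.level, value: v, ok: ok}
		}(c)
	}
	wg.Wait()
	close(results)

	found := false
	var best probe
	for r := range results {
		if r.ok && (!found || r.level < best.level) {
			best, found = r, true
		}
	}
	return best.value, found
}
```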

## What about cache

| Level | File number | Average touches per file |
|-------|-------------|--------------------------|
| 0     | 0           | 0                        |
| 1     | 2           | 13746                    |
| 2     | 529         | 24742                    |
| 3     | 2834        | 8178                     |

We can see that lower-level files have a higher chance of being touched per file (level1 only has 2 files, so we can ignore it). We can think about how to exploit these characteristics to increase the cache hit rate.
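Those per-file averages are just the previous table's files-touched counts divided by the file counts:

```go
package main

import "fmt"

func main() {
	// Per-file touch frequency = files touched / file count,
	// using the per-level numbers from the tables above.
	levels := []struct {
		name           string
		files, touched float64
	}{
		{"level1", 2, 27492},
		{"level2", 529, 13088646},
		{"level3", 2834, 23177497},
	}
	for _, l := range levels {
		fmt.Printf("%s: ~%.0f touches per file\n", l.name, l.touched/l.files)
	}
}
```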

| Level | Data cache hit rate | Meta cache hit rate | Average touches per file |
|-------|---------------------|---------------------|--------------------------|
| 0     | 0                   | 0                   | 0                        |
| 1     | 5.7%                | 85%                 | 13746                    |
| 2     | 0.96%               | 95%                 | 24742                    |
| 3     | 0.24%               | 71%                 | 8178                     |

We can see that the more frequently a file is touched, the higher its meta cache hit rate (the cache is LRU).

As for the data cache? No meaningful conclusion can be drawn.
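One possible direction, purely as a sketch: give each level its own LRU cache and split the total budget in proportion to the observed per-file touch frequency, so that hot low-level files are not evicted by the long tail of deepest-level files. Everything below (`lru`, `levelCaches`, the weights) is hypothetical, not a goleveldb feature:

```go
package leveldbsketch

import "container/list"

// lru is a tiny LRU cache keyed by SST file number; illustrative only.
type lru struct {
	cap   int
	order *list.List               // front = most recently used
	items map[uint64]*list.Element // file number -> list element
}

func newLRU(capacity int) *lru {
	return &lru{cap: capacity, order: list.New(), items: make(map[uint64]*list.Element)}
}

// touch records a use of the given file, evicting the least recently
// used entry when the cache is full.
func (c *lru) touch(fileNum uint64) {
	if el, ok := c.items[fileNum]; ok {
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(uint64))
	}
	c.items[fileNum] = c.order.PushFront(fileNum)
}

// levelCaches splits a total entry budget across levels in proportion
// to the per-file touch frequencies measured above (the weights reuse
// the benchmark numbers as a heuristic).
func levelCaches(totalEntries int) map[int]*lru {
	weights := map[int]float64{1: 13746, 2: 24742, 3: 8178}
	var sum float64
	for _, w := range weights {
		sum += w
	}
	caches := make(map[int]*lru)
	for lvl, w := range weights {
		n := int(float64(totalEntries) * w / sum)
		if n < 1 {
			n = 1
		}
		caches[lvl] = newLRU(n)
	}
	return caches
}
```

Whether this beats a single shared LRU would need benchmarking against the same workload.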
