LevelDB migrates data from lower levels to deeper levels, so the deeper a level is, the more data it holds. For example, by default the size limit of level 1 is 100MB, level 2 is 1GB, level 3 is 10GB, and so on.
If the database is full, about 90% of the data sits in the deepest level.
But usually it is not full. E.g. level 0 holds 100MB, level 1 holds 1GB, level 2 holds 5GB. Even so, the conclusion still holds: most of the data is in the deepest level.
To retrieve an entry from the database, we first check the in-memory db, then search level 0, level 1, and so on down to the deepest level.
Level | Number of files | Level size (bytes) |
---|---|---|
0 | 0 | 0 |
1 | 2 | 258367(~0.25MB) |
2 | 529 | 904000266 (~900MB) |
3 | 2834 | 5717206863 (~5.7GB) |
We can see that about 86% of the data is in level 3, the deepest level.
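As a sanity check on that figure, the share of each level can be recomputed from the size table above (a quick sketch using the byte counts from the table; level 0 is empty and omitted):

```python
# Byte counts per level, taken from the table above.
sizes = {1: 258367, 2: 904000266, 3: 5717206863}
total = sum(sizes.values())

for level, size in sizes.items():
    print(f"level {level}: {size / total:.1%}")
# level 3 comes out to about 86% of all stored data
```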
When we run the read benchmark, the total number of touched files is 36293635. There are 26842700 read operations, so each read touches about 1.35 files on average.
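The per-read figure is just the ratio of the two benchmark counters:

```python
touched_files = 36293635   # total files touched during the benchmark
read_ops = 26842700        # total read operations
print(f"{touched_files / read_ops:.2f} files per read")  # 1.35
```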
I also counted the touched files in each level.
Level | Number of files | Files touched | Data hits |
---|---|---|---|
0 | 0 | 0 | 0 |
1 | 2 | 27492 | 1047 |
2 | 529 | 13088646 | 3664156 |
3 | 2834 | 23177497 | 23177497 |
For data in level 3, a read first misses in the lower levels before finally hitting level 3. Level 2 was touched 13088646 times but produced only 3664156 hits, so 13088646 - 3664156 = 9424490 of its touches were wasted on reads that ended in level 3. The average number of files touched by a level-3 read is therefore: 1 + 9424490/23177497 ≈ 1.4.
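The same arithmetic, spelled out (the wasted level-2 touches are the difference between that level's touched and hit counts in the table above):

```python
l2_touched, l2_hits = 13088646, 3664156
l3_hits = 23177497

wasted = l2_touched - l2_hits   # level-2 probes that missed: 9424490
avg_l3 = 1 + wasted / l3_hits   # one final hit plus the wasted probes, per level-3 read
print(f"{avg_l3:.1f}")          # 1.4
```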
If most of the data is in the deepest level, why not read all the (potentially relevant) level files at the same time!
As for read overhead: since most of the data is in the deepest level, there is little extra work.
Yes, for data in the lower levels we issue unnecessary reads, but the ratio is very small, so it's acceptable.
The benefit is that we can read concurrently. Imagine we need to touch 4 files for one entry; we could cut the read latency to roughly 1/4.
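A minimal sketch of the idea, assuming a hypothetical `search_level` that stands in for LevelDB's per-level SSTable lookup (none of these names come from LevelDB itself):

```python
from concurrent.futures import ThreadPoolExecutor

def search_level(level, key, store):
    # Stand-in for probing one level's files (hypothetical, not LevelDB's API).
    return store.get(level, {}).get(key)

def parallel_get(key, store, levels=(0, 1, 2, 3)):
    # Probe every level at once instead of level by level.
    with ThreadPoolExecutor(max_workers=len(levels)) as pool:
        results = list(pool.map(lambda lvl: (lvl, search_level(lvl, key, store)),
                                levels))
    # Lower levels hold newer data, so the shallowest hit must still win.
    for _, value in sorted(results):
        if value is not None:
            return value
    return None

store = {2: {"k": "new"}, 3: {"k": "old"}}
print(parallel_get("k", store))  # "new": level 2 shadows level 3
```

Note that correctness still requires merging the parallel results in level order, since a key in a lower level shadows the same key in a deeper one.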
Level | Number of files | Average touches per file |
---|---|---|
0 | 0 | 0 |
1 | 2 | 13746 |
2 | 529 | 24742 |
3 | 2834 | 8178 |
We can see that files in the lower levels have a higher chance of being touched (level 1 has only 2 files, so we can ignore it). We can think about how to increase the cache hit rate by exploiting this characteristic.
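The "average touches per file" column divides each level's touched-file count by its file count, recomputing from the earlier per-level table:

```python
num_files = {1: 2, 2: 529, 3: 2834}
touched = {1: 27492, 2: 13088646, 3: 23177497}

for level in num_files:
    print(level, round(touched[level] / num_files[level]))
# 1 13746, 2 24742, 3 8178 -- matching the table
```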
Level | Data cache hit rate | Meta cache hit rate | Average touches per file |
---|---|---|---|
0 | 0 | 0 | 0 |
1 | 5.7% | 85% | 13746 |
2 | 0.96% | 95% | 24742 |
3 | 0.24% | 71% | 8178 |
We can see that the more frequently a file is touched, the higher its meta cache hit rate (the meta cache is an LRU cache).
For the data cache, no meaningful conclusion can be drawn.
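The LRU effect can be illustrated with a toy simulation (purely illustrative, not LevelDB's actual table cache): a file touched far more often than its neighbors is almost never evicted, so its hit rate approaches 100%, while rarely touched files are usually gone before their next access.

```python
from collections import OrderedDict, defaultdict
import random

def simulate(trace, capacity):
    """Replay an access trace through an LRU cache; return per-key hit rates."""
    cache = OrderedDict()
    hits, counts = defaultdict(int), defaultdict(int)
    for key in trace:
        counts[key] += 1
        if key in cache:
            hits[key] += 1
            cache.move_to_end(key)         # refresh recency on a hit
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return {k: hits[k] / counts[k] for k in counts}

random.seed(0)
# One "hot" file touched as often as all 50 "cold" files combined.
files = ["hot"] + [f"cold{i}" for i in range(50)]
trace = random.choices(files, weights=[50] + [1] * 50, k=10000)
rates = simulate(trace, capacity=10)
print(f"hot: {rates['hot']:.0%}")  # near 100%; cold files sit near 0%
```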