## LevelDB overview

LevelDB migrates data from lower levels to higher (deeper) levels through compaction, so the deeper a level is, the more data it holds.

For example, the size limit of level1 is 100MB (by default), level2 is 1GB, level3 is 10GB, and so on: each level is 10x the size of the previous one.
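A tiny sketch of that growth pattern (the 100MB base and 10x factor come from the defaults mentioned above; exact limits depend on the configured options):

```go
package main

import "fmt"

// levelSizeLimit sketches the 10x level-size growth described above,
// assuming level1 starts at 100MB. Real limits depend on the LevelDB
// implementation and its configured options.
func levelSizeLimit(level int) int64 {
	limit := int64(100 << 20) // 100MB for level1
	for i := 1; i < level; i++ {
		limit *= 10
	}
	return limit
}

func main() {
	for lvl := 1; lvl <= 4; lvl++ {
		fmt.Printf("level%d limit: %d bytes\n", lvl, levelSizeLimit(lvl))
	}
}
```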

If the database is full, roughly 90% of the data sits in the deepest level.

But usually it's not full, e.g. level0 holds 100MB, level1 1GB, and level2 5GB. Even so, the conclusion still holds: most of the data lives in the deepest level.

To retrieve an entry from the database, we first visit the memory db, then search level0, level1, ..., down to the deepest level, in order.
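A minimal sketch of that lookup order in Go; `table`, `level`, and `get` are hypothetical names for illustration, not goleveldb's actual API:

```go
package leveldbsketch

// table is a hypothetical minimal interface for anything that can be
// queried for a key (memtable or an SST file).
type table interface {
	Get(key []byte) (value []byte, ok bool)
}

// level holds the files of one level that may overlap the lookup key.
type level struct {
	files []table
}

// get sketches LevelDB's sequential read path: check the in-memory
// table first, then walk the levels from level0 down to the deepest,
// returning the first hit.
func get(mem table, levels []level, key []byte) ([]byte, bool) {
	if v, ok := mem.Get(key); ok {
		return v, true
	}
	for _, lvl := range levels { // level0 -> level1 -> ... -> deepest
		for _, f := range lvl.files {
			if v, ok := f.Get(key); ok {
				return v, true
			}
		}
	}
	return nil, false
}
```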

## Real benchmark case

| Level | File number | Level size (B)       |
|-------|-------------|----------------------|
| 0     | 0           | 0                    |
| 1     | 2           | 258367 (~0.25MB)     |
| 2     | 529         | 904000266 (~900MB)   |
| 3     | 2834        | 5717206863 (~5.7GB)  |

We can see that ~86% of the data is in level3 (the deepest).

When we run the read benchmark, the total number of touched files is 36293635. There are 26842700 read operations, so each read touches 1.35 files on average.

I also counted the touched files in each level.

| Level | File number | Files touched | Data hits |
|-------|-------------|---------------|-----------|
| 0     | 0           | 0             | 0         |
| 1     | 2           | 27492         | 1047      |
| 2     | 529         | 13088646      | 3664156   |
| 3     | 2834        | 23177497      | 23177497  |

For data that lives in level3, a read first searches level0 and level1 (both nearly free here: level0 is empty and level1 has only 2 files), then level2, and finally level3. Of the 13088646 level2 touches, 3664156 were hits, so 13088646 - 3664156 = 9424490 touches fell through to level3. The average number of files touched for a level3 hit is therefore 1 + 9424490/23177497 ≈ 1.4.
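The same arithmetic, spelled out as a runnable Go snippet (all constants are the benchmark numbers from the tables above):

```go
package main

import "fmt"

func main() {
	// Numbers from the benchmark tables above.
	const (
		totalTouched = 36293635.0 // files touched across all reads
		totalReads   = 26842700.0 // read operations
		l2Touched    = 13088646.0 // level2 files touched
		l2Hits       = 3664156.0  // reads satisfied in level2
		l3Hits       = 23177497.0 // reads satisfied in level3
	)
	fmt.Printf("files touched per read: %.2f\n", totalTouched/totalReads) // ~1.35
	l2Misses := l2Touched - l2Hits // 9424490 touches that fell through to level3
	fmt.Printf("avg files touched for a level3 hit: %.2f\n", 1+l2Misses/l3Hits) // ~1.4
}
```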

## Concurrent read

If most of the data is in the deepest level, why not read all the (necessary) level files at the same time!

As for read overhead: since most of the data is in the deepest level, there is almost no extra work.

Yes, for low-level data we issue some unnecessary reads, but the ratio is very small, so it's acceptable.

The benefit is that we can read concurrently. Imagine we need to touch 4 files for one entry: reading them in parallel cuts the read latency to roughly 1/4, as sketched below.
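A minimal sketch of this idea, assuming the hypothetical `sstFile` interface below (none of these names match goleveldb's real API). Because newer versions of a key shadow older ones, the hit from the shallowest level must win:

```go
package leveldbsketch

import "sync"

// sstFile is a hypothetical minimal file interface, for illustration only.
type sstFile interface {
	Get(key []byte) (value []byte, ok bool)
}

// candidate is one file that may contain the key, tagged with its level.
type candidate struct {
	level int
	file  sstFile
}

type probe struct {
	level int
	value []byte
	ok    bool
}

// concurrentGet probes every candidate file at once instead of walking
// the levels one by one. Because newer versions of a key live in
// shallower levels, the hit from the lowest-numbered level must win.
func concurrentGet(cands []candidate, key []byte) ([]byte, bool) {
	results := make(chan probe, len(cands))
	var wg sync.WaitGroup
	for _, c := range cands {
		wg.Add(1)
		go func(c candidate) {
			defer wg.Done()
			v, ok := c.file.Get(key)
			results <- probe{level: c.level, value: v, ok: ok}
		}(c)
	}
	wg.Wait()
	close(results)

	found := false
	var best probe
	for r := range results {
		if r.ok && (!found || r.level < best.level) {
			best, found = r, true
		}
	}
	return best.value, found
}
```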

## What about cache

| Level | File number | Average touches per file |
|-------|-------------|--------------------------|
| 0     | 0           | 0                        |
| 1     | 2           | 13746                    |
| 2     | 529         | 24742                    |
| 3     | 2834        | 8178                     |

We can see that lower-level files have a higher chance of being touched per file (level1 only has 2 files, so we can ignore it). We can think about how to exploit these characteristics to increase the cache hit rate.
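Those per-file averages are just the previous table's files-touched counts divided by the file counts:

```go
package main

import "fmt"

func main() {
	// Per-file touch frequency = files touched / file count,
	// using the per-level numbers from the tables above.
	levels := []struct {
		name           string
		files, touched float64
	}{
		{"level1", 2, 27492},
		{"level2", 529, 13088646},
		{"level3", 2834, 23177497},
	}
	for _, l := range levels {
		fmt.Printf("%s: ~%.0f touches per file\n", l.name, l.touched/l.files)
	}
}
```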

| Level | Data cache hit rate | Meta cache hit rate | Average touches per file |
|-------|---------------------|---------------------|--------------------------|
| 0     | 0                   | 0                   | 0                        |
| 1     | 5.7%                | 85%                 | 13746                    |
| 2     | 0.96%               | 95%                 | 24742                    |
| 3     | 0.24%               | 71%                 | 8178                     |

We can see that the more frequently a file is touched, the higher its meta cache hit rate (the cache is LRU).

As for the data cache? No meaningful conclusion can be drawn.
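One possible direction, purely as a sketch: give each level its own LRU cache and split the total budget in proportion to the observed per-file touch frequency, so that hot low-level files are not evicted by the long tail of deepest-level files. Everything below (`lru`, `levelCaches`, the weights) is hypothetical, not a goleveldb feature:

```go
package leveldbsketch

import "container/list"

// lru is a tiny LRU cache keyed by SST file number; illustrative only.
type lru struct {
	cap   int
	order *list.List               // front = most recently used
	items map[uint64]*list.Element // file number -> list element
}

func newLRU(capacity int) *lru {
	return &lru{cap: capacity, order: list.New(), items: make(map[uint64]*list.Element)}
}

// touch records a use of the given file, evicting the least recently
// used entry when the cache is full.
func (c *lru) touch(fileNum uint64) {
	if el, ok := c.items[fileNum]; ok {
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(uint64))
	}
	c.items[fileNum] = c.order.PushFront(fileNum)
}

// levelCaches splits a total entry budget across levels in proportion
// to the per-file touch frequencies measured above (the weights reuse
// the benchmark numbers as a heuristic).
func levelCaches(totalEntries int) map[int]*lru {
	weights := map[int]float64{1: 13746, 2: 24742, 3: 8178}
	var sum float64
	for _, w := range weights {
		sum += w
	}
	caches := make(map[int]*lru)
	for lvl, w := range weights {
		n := int(float64(totalEntries) * w / sum)
		if n < 1 {
			n = 1
		}
		caches[lvl] = newLRU(n)
	}
	return caches
}
```

Whether this beats a single shared LRU would need benchmarking against the same workload.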
