As an exercise, I started to design and implement a database in some of my free time. As I worked through the details, there were a few things I knew I wanted to work with and a few things I wanted to evaluate. Since I'm looking at more of a CP system, my mind immediately jumped to Raft. But which implementation to use? And what storage mechanism? Since I had more familiarity with HashiCorp's implementation, I started there.
The first thing I wanted to do was consider the performance characteristics of the underlying data stores. One of the nice features of the HashiCorp implementation is that it allows callers to plug in different stores for logs, stable storage, and snapshots. There's a whole slew of community implementations. In this test, I evaluated BadgerDB from dgraph.io, BoltDB, and Pebble from CockroachDB. There was no community implementation for Pebble, but the interfaces HashiCorp asks you to implement are rather straightforward.
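For reference, the stable-store interface hashicorp/raft asks you to implement looks roughly like the sketch below. The in-memory implementation is a hypothetical stand-in I'm using as a template here, not one of the benchmarked drivers; a real plugin would back these four methods with Pebble, Badger, or Bolt calls instead of a map.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sync"
)

// StableStore mirrors the method set of hashicorp/raft's StableStore
// interface, reproduced here so the example stands alone.
type StableStore interface {
	Set(key []byte, val []byte) error
	Get(key []byte) ([]byte, error)
	SetUint64(key []byte, val uint64) error
	GetUint64(key []byte) (uint64, error)
}

// memStable is a toy in-memory implementation (placeholder, not a
// real driver) guarded by a RWMutex.
type memStable struct {
	mu sync.RWMutex
	m  map[string][]byte
}

func newMemStable() *memStable { return &memStable{m: map[string][]byte{}} }

func (s *memStable) Set(key, val []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[string(key)] = append([]byte(nil), val...) // copy, callers may reuse val
	return nil
}

func (s *memStable) Get(key []byte) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[string(key)]
	if !ok {
		return nil, errors.New("not found")
	}
	return v, nil
}

func (s *memStable) SetUint64(key []byte, val uint64) error {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, val)
	return s.Set(key, buf)
}

func (s *memStable) GetUint64(key []byte) (uint64, error) {
	v, err := s.Get(key)
	if err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint64(v), nil
}

func main() {
	var s StableStore = newMemStable()
	s.SetUint64([]byte("CurrentTerm"), 7)
	term, _ := s.GetUint64([]byte("CurrentTerm"))
	fmt.Println(term) // 7
}
```

The log store and snapshot store interfaces are similarly small, which is what makes swapping backends cheap to try.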
Using Go and some benchmark tests, I was able to set up some quick tests to assess the underlying performance of Get and Set operations. I executed the benchmarks two different ways. First, I used the default go test configuration, which runs each benchmark for 1s and measures operations. In the second run, I executed each operation for a fixed number of iterations (1000x).
Here are my learnings...
- BadgerDB had the lowest write latency, but at the cost of slightly higher read latency. Throughout my testing, I noticed that Badger reads can take anywhere from 1.5 to 2x longer than BoltDB or Pebble reads, but its writes are significantly faster (~375x).
- When compared to BoltDB, Pebble had lower latencies in both the write and read cases, making it a much more appealing option.
- Finally, we had BoltDB... oh, good ole BoltDB. It's worth noting that the latest version of the raft-boltdb library uses boltdb/bolt and not bbolt, the maintained fork by etcd. I'd be curious to run through this again later with a bbolt implementation to see whether it affects the outcome.
Anyway, to me BadgerDB seems like a no-brainer here. Sure, reads against the Raft log take up to 2x longer, but writes comprise the majority of the operations. Having a fast embedded database would allow control planes backed by Raft to process writes faster.
Other things I'm curious about:
- hashicorp/raft vs etcd/raft .... which is "better"?
- Longer running test evaluations
- One area of the implementations I was curious about but wasn't able to test was the GC loop for Badger. Since Badger is backed by an LSM tree, it occasionally needs to compact its records. I'm curious how / if this impacts the core functionality outside of Raft.
Fast stable store writes probably matter less when you eventually want to implement multiraft. hashicorp/raft is very beginner-friendly, but etcd/raft is used in projects like dragonboat, which is a multiraft package.
A single Raft group provides high availability and fault tolerance, but does nothing by itself to distribute the work. Each node is doing the same work as the FSM gets advanced to the next state. If you have a single group that constantly grows, the bottleneck just ends up being something other than throughput. In KV stores, disk usage is a realistic bottleneck, and multiraft can be the tool used to fix it.
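To make that concrete, the usual multiraft move is to partition the keyspace so each Raft group replicates only its own slice. A minimal sketch of hash-based routing (the function names are mine, not from any of these libraries):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a key to one of n Raft groups. Each group then holds
// only its slice of the keyspace, so log growth and disk usage spread
// across groups instead of piling onto one.
func shardFor(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % n
}

func main() {
	for _, k := range []string{"user:1", "user:2", "order:9"} {
		fmt.Printf("%s -> group %d\n", k, shardFor(k, 4))
	}
}
```

Hashing is the simplest scheme; systems like CockroachDB route on contiguous key ranges instead so ranges can split and move as they grow.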
Single-group Raft clusters like etcd and Consul are very reliable metadata stores from which you can acquire locks to coordinate disparate systems. The pattern can be seen in systems like Kubernetes and Nomad.
Hopefully some of this helps you out a bit while you're checking out raft packages. If you're aiming to build a scalable database, there is probably a significant amount of info I can share on that as well, if you like.