

@mjpitz
Last active May 31, 2024 12:22
comparison of embedded databases for raft

As an exercise, I started to design and implement a database in some of my free time. As I worked through some of the details, there were a few things I knew I wanted to work with and a few things I wanted to evaluate. Since I'm looking at more of a CP system, my mind immediately jumped to Raft. But which implementation should I use? And which storage mechanism? Since I was more familiar with HashiCorp's implementation, I started there.

The first thing I wanted to do was compare the performance characteristics of the underlying data stores. One of the nice features of the HashiCorp implementation is that it lets callers plug in different stores for the log, stable state, and snapshots. There's a whole slew of community implementations. In this test, I evaluated BadgerDB from dgraph.io, BoltDB, and Pebble from CockroachDB. There was no community implementation for Pebble, but the interfaces HashiCorp asks you to implement are rather straightforward.
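To give a sense of how small that surface area is, here is a minimal in-memory sketch of a store with the same four methods as hashicorp/raft's `StableStore` (`Set`, `Get`, `SetUint64`, `GetUint64`). The interface is redeclared locally so the example compiles on its own. `memStableStore` and `errKeyNotFound` are hypothetical names; a real backend such as Pebble would persist (and sync) the data rather than keep it in a map, and you should check your raft version's docs for how missing keys are expected to be reported.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sync"
)

// StableStore mirrors the method set of hashicorp/raft's StableStore.
// It is redeclared here only so this sketch has no external dependencies.
type StableStore interface {
	Set(key []byte, val []byte) error
	Get(key []byte) ([]byte, error)
	SetUint64(key []byte, val uint64) error
	GetUint64(key []byte) (uint64, error)
}

// errKeyNotFound is a hypothetical sentinel returned for missing keys.
var errKeyNotFound = errors.New("not found")

// memStableStore is a hypothetical, non-durable StableStore backed by a map.
type memStableStore struct {
	mu sync.RWMutex
	kv map[string][]byte
}

func newMemStableStore() *memStableStore {
	return &memStableStore{kv: make(map[string][]byte)}
}

func (s *memStableStore) Set(key, val []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Copy the value so later mutations by the caller don't leak into the store.
	s.kv[string(key)] = append([]byte(nil), val...)
	return nil
}

func (s *memStableStore) Get(key []byte) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	val, ok := s.kv[string(key)]
	if !ok {
		return nil, errKeyNotFound
	}
	return append([]byte(nil), val...), nil
}

// SetUint64 stores integers as 8 big-endian bytes by reusing Set.
func (s *memStableStore) SetUint64(key []byte, val uint64) error {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, val)
	return s.Set(key, buf)
}

// GetUint64 decodes a value written by SetUint64.
func (s *memStableStore) GetUint64(key []byte) (uint64, error) {
	buf, err := s.Get(key)
	if err != nil {
		return 0, err
	}
	if len(buf) != 8 {
		return 0, fmt.Errorf("value for %q is %d bytes, want 8", key, len(buf))
	}
	return binary.BigEndian.Uint64(buf), nil
}

// Compile-time check that memStableStore satisfies the interface.
var _ StableStore = (*memStableStore)(nil)

func main() {
	s := newMemStableStore()
	_ = s.SetUint64([]byte("CurrentTerm"), 7)
	term, _ := s.GetUint64([]byte("CurrentTerm"))
	fmt.Println(term)
}
```

Swapping this map for an embedded database is essentially what each backend in the comparison does; the interesting differences are in how each one persists writes.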

Using Go's built-in benchmarks, I set up some quick tests to assess the underlying performance of Get and Set operations against each stable store. I ran the benchmarks two different ways. First, I used the default go test configuration, which runs each benchmark for about 1s and reports the average time per operation (ns/op). In the second run, I executed each operation a fixed number of times (1000x, via -benchtime=1000x).

Here are my learnings...

  1. BadgerDB had the lowest write latency, but at the cost of slightly higher read latency. Throughout the course of my testing, I noticed that Badger reads can take up to about 2x as long as BoltDB or Pebble reads (1.1 to 1.9x in the two runs below). But its writes are significantly faster (~375x faster than BoltDB and ~180x faster than Pebble in the 1000x run).

  2. When compared to BoltDB, Pebble had lower latencies in both the write and read cases (roughly 2x faster writes), making it a much more appealing option.

  3. Finally, we had BoltDB... oh, good ole BoltDB. It's worth noting that the latest version of the raft-boltdb library uses boltdb/bolt and not etcd's maintained fork (bbolt). I'd be curious to run through this again later with a bbolt-backed implementation to see whether it affects the outcome.
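The ~375x write figure in the list above comes straight from the ns/op columns of the benchmark output. This small Go sketch recomputes the ratios from the -benchtime=1000x run; the numbers are copied from that output, and `ratio` is just a helper for this example.

```go
package main

import "fmt"

// ns/op values copied from the -benchtime=1000x run.
var setNsPerOp = map[string]float64{
	"badger": 104887,
	"bolt":   39334328,
	"pebble": 19034666,
}

var getNsPerOp = map[string]float64{
	"badger": 3826,
	"bolt":   3346,
	"pebble": 2696,
}

// ratio reports how many times longer `slow` takes than `fast`.
func ratio(slow, fast float64) float64 {
	return slow / fast
}

func main() {
	// How much faster Badger writes are than the other stores.
	fmt.Printf("bolt/badger set:   %.1fx\n", ratio(setNsPerOp["bolt"], setNsPerOp["badger"]))
	fmt.Printf("pebble/badger set: %.1fx\n", ratio(setNsPerOp["pebble"], setNsPerOp["badger"]))
	// How much slower Badger reads are than the other stores.
	fmt.Printf("badger/bolt get:   %.2fx\n", ratio(getNsPerOp["badger"], getNsPerOp["bolt"]))
	fmt.Printf("badger/pebble get: %.2fx\n", ratio(getNsPerOp["badger"], getNsPerOp["pebble"]))
}
```

Running it prints 375.0x and 181.5x for writes and 1.14x and 1.42x for reads. Plugging in the default-run numbers instead gives smaller but still large write gaps (about 111x and 53x).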

Anyway, to me BadgerDB seems like a no-brainer here. Sure, reads take up to 2x longer, but writes make up the majority of the operations. Having a fast embedded database would allow control planes backed by Raft to process writes faster.
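One way to see why the read penalty barely matters is to estimate the average cost per operation for a mixed workload: avg = w * set + (1 - w) * get, where w is the fraction of operations that are writes. The sketch below plugs in the ns/op numbers from the -benchtime=1000x run; the 50% and 90% write mixes are arbitrary assumptions for illustration, not measurements of real Raft traffic.

```go
package main

import "fmt"

// latency holds measured ns/op for one store (from the -benchtime=1000x run).
type latency struct {
	name  string
	setNs float64
	getNs float64
}

// avgNs estimates the mean ns per operation when a fraction w of operations are writes.
func avgNs(l latency, w float64) float64 {
	return w*l.setNs + (1-w)*l.getNs
}

func main() {
	stores := []latency{
		{"badger", 104887, 3826},
		{"bolt", 39334328, 3346},
		{"pebble", 19034666, 2696},
	}
	for _, w := range []float64{0.5, 0.9} {
		for _, s := range stores {
			fmt.Printf("writes=%.0f%% %-6s %12.0f ns/op\n", w*100, s.name, avgNs(s, w))
		}
	}
}
```

Even at a 50/50 mix, Badger's estimated average (about 54µs) stays far below BoltDB's (about 19.7ms) or Pebble's (about 9.5ms), because the write term dominates the sum.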

Other things I'm curious about:

  • hashicorp/raft vs. etcd/raft: which is "better"?
  • Longer-running test evaluations
    • One area of the implementations I was curious about but wasn't able to test was Badger's garbage collection (GC) loop. Since Badger is backed by an LSM tree with a separate value log, it periodically needs to compact its tables and garbage-collect stale values. I'm curious how, or if, this impacts the core functionality outside of Raft.
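For reference, as I understand the Badger docs, value-log GC is left to the application: you periodically call `(*badger.DB).RunValueLogGC(discardRatio)`, which returns nil when it rewrote a file and an error (such as `badger.ErrNoRewrite`) when there was nothing worth collecting. A common pattern is a ticker loop that keeps calling it until it stops making progress. The sketch below is dependency-free: `valueLogGCer` is a hypothetical interface that `*badger.DB` should satisfy, `fakeDB` stands in for a real database, and whether raft-badger exposes its underlying `*badger.DB` is an assumption I haven't verified. A loop like this running alongside Raft is what a longer-running test would need to exercise.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// valueLogGCer is a hypothetical interface matching the signature of
// (*badger.DB).RunValueLogGC, so this sketch compiles without Badger.
type valueLogGCer interface {
	RunValueLogGC(discardRatio float64) error
}

// runGCLoop calls RunValueLogGC on every tick and repeats while calls succeed
// (nil means a file was rewritten, so more may be reclaimable).
// It returns when ctx is cancelled.
func runGCLoop(ctx context.Context, db valueLogGCer, every time.Duration, discardRatio float64) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for db.RunValueLogGC(discardRatio) == nil {
				if ctx.Err() != nil {
					return
				}
			}
		}
	}
}

// fakeDB pretends to have `remaining` rewritable files; demonstration only.
type fakeDB struct{ remaining, calls int }

func (f *fakeDB) RunValueLogGC(float64) error {
	f.calls++
	if f.remaining == 0 {
		return errors.New("no rewrite") // stands in for badger.ErrNoRewrite
	}
	f.remaining--
	return nil
}

// demoGC runs the loop against a fake store for 50ms with 5ms ticks.
func demoGC() *fakeDB {
	db := &fakeDB{remaining: 3}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	runGCLoop(ctx, db, 5*time.Millisecond, 0.5)
	return db
}

func main() {
	db := demoGC()
	fmt.Printf("files rewritten: %d, GC calls: %d\n", 3-db.remaining, db.calls)
}
```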
```go
package raft_test

import (
	"crypto/rand"
	"os"
	"testing"

	raftbadger "github.com/BBVA/raft-badger"
	"github.com/hashicorp/raft"
	raftbolt "github.com/hashicorp/raft-boltdb"
	"github.com/stretchr/testify/require"

	raftpebble "github.com/mjpitz/store/internal/raft/raft-pebble"
)

// harness benchmarks Set and Get against any raft.StableStore. Each store
// keeps its data at a path named after the benchmark, which is removed
// once the benchmark finishes.
func harness(b *testing.B, store raft.StableStore) {
	b.Cleanup(func() {
		// Close the store if it supports it, then delete its on-disk data.
		if closer, ok := store.(raft.WithClose); ok {
			_ = closer.Close()
		}
		_ = os.RemoveAll(b.Name())
	})

	keys := []string{
		"01F6Z7V0Y932RNSEFY54NK0QVM",
		"01F6Z7V0YD8964MS32QP86WJSN",
		"01F6Z7V0YJPEJN1S72GX6PNK9V",
		"01F6Z7V0YPY3XFGMJ5X3HV3BCG",
		"01F6Z7V0YTK9JYW9EMNFJ2S037",
		"01F6Z7V0YYXTCF8EH9MFPJ3WKW",
		"01F6Z7V0Z231RZR3MJXGE592EY",
		"01F6Z7V0Z51S2Q51G3E86JB2G3",
		"01F6Z7V0Z9GZ8M0KQ5K20SW130",
		"01F6Z7V0ZCR9A7HGMMZXT8S544",
		"01F6Z7V0ZG4C4VVY2Q0C6HT6JK",
	}

	// 256-byte (2048-bit) random payload.
	expectedValue := make([]byte, 1<<8)
	_, err := rand.Read(expectedValue)
	require.NoError(b, err)

	// Seed every key up front so the Get sub-benchmark also passes when it is
	// selected on its own (e.g. -bench='BadgerStore/Get').
	for _, key := range keys {
		require.NoError(b, store.Set([]byte(key), expectedValue))
	}

	b.Run("Set", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			err := store.Set([]byte(keys[i%len(keys)]), expectedValue)
			require.NoError(b, err)
		}
	})

	b.Run("Get", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			value, err := store.Get([]byte(keys[i%len(keys)]))
			require.NoError(b, err)
			require.Equal(b, expectedValue, value)
		}
	})
}

func BenchmarkBadgerStore(b *testing.B) {
	store, err := raftbadger.New(raftbadger.Options{
		Path: b.Name(),
	})
	require.NoError(b, err)
	harness(b, store)
}

func BenchmarkBoltStore(b *testing.B) {
	store, err := raftbolt.New(raftbolt.Options{
		Path: b.Name(),
	})
	require.NoError(b, err)
	harness(b, store)
}

func BenchmarkPebbleStore(b *testing.B) {
	store, err := raftpebble.New(raftpebble.Options{
		Path: b.Name(),
	})
	require.NoError(b, err)
	harness(b, store)
}
```
```
➜ store git:(main) ✗ go test -bench=. -benchtime=1000x ./...
badger 2021/05/31 10:40:56 INFO: All 0 tables opened in 0s
badger 2021/05/31 10:40:57 INFO: Discard stats nextEmptySlot: 0
badger 2021/05/31 10:40:57 INFO: Set nextTxnTs to 0
goos: darwin
goarch: amd64
pkg: github.com/mjpitz/store/internal/raft
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkBadgerStore/Set-16     1000      104887 ns/op
BenchmarkBadgerStore/Get-16     1000        3826 ns/op
BenchmarkBoltStore/Set-16       1000    39334328 ns/op
BenchmarkBoltStore/Get-16       1000        3346 ns/op
BenchmarkPebbleStore/Set-16     1000    19034666 ns/op
BenchmarkPebbleStore/Get-16     1000        2696 ns/op
PASS
ok      github.com/mjpitz/store/internal/raft   59.066s
?       github.com/mjpitz/store/internal/raft/raft-pebble       [no test files]

➜ store git:(main) ✗ go test -bench=. ./...
badger 2021/05/31 10:27:48 INFO: All 0 tables opened in 0s
badger 2021/05/31 10:27:48 INFO: Discard stats nextEmptySlot: 0
badger 2021/05/31 10:27:48 INFO: Set nextTxnTs to 0
goos: darwin
goarch: amd64
pkg: github.com/mjpitz/store/internal/raft
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkBadgerStore/Set-16     9079      353335 ns/op
BenchmarkBadgerStore/Get-16   400754        2728 ns/op
BenchmarkBoltStore/Set-16         30    39137143 ns/op
BenchmarkBoltStore/Get-16     747498        1582 ns/op
BenchmarkPebbleStore/Set-16       60    18795033 ns/op
BenchmarkPebbleStore/Get-16   862392        1443 ns/op
PASS
ok      github.com/mjpitz/store/internal/raft   11.389s
?       github.com/mjpitz/store/internal/raft/raft-pebble       [no test files]
```
@protosam

Fast stable store writes probably matter less once you eventually want to implement multiraft. hashicorp/raft is very beginner-friendly, but etcd/raft is used in projects like dragonboat, which is a multiraft package.

A single Raft group provides high availability and fault tolerance, but does nothing by itself to distribute the work. Each node does the same work as the FSM advances to the next state. If a single group keeps growing, the bottleneck ends up being something other than throughput. In KV stores, disk usage is a realistic bottleneck, and multiraft is one tool for addressing it.
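To make the multiraft idea concrete: instead of one group replicating every key, the keyspace is split across many independent Raft groups, and each request is routed to the group that owns its key, so each group's replicas store only part of the data. The sketch below uses simple hash partitioning; `groupFor` and the group count are hypothetical, and this is not how dragonboat or any particular system assigns keys. Many real systems split by key range instead, so that groups can be split and rebalanced without remapping most keys.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// groupFor maps a key to one of numGroups independent Raft groups using
// FNV-1a. Hypothetical helper: changing numGroups remaps most keys, which is
// why range-based partitioning is often preferred in practice.
func groupFor(key []byte, numGroups uint64) uint64 {
	h := fnv.New64a()
	_, _ = h.Write(key) // hash.Hash's Write never returns an error
	return h.Sum64() % numGroups
}

func main() {
	const numGroups = 4
	counts := make([]int, numGroups)
	for i := 0; i < 1000; i++ {
		key := []byte(fmt.Sprintf("user/%d", i))
		counts[groupFor(key, numGroups)]++
	}
	// Each group (and its replicas) stores and replicates only its share of keys.
	fmt.Println(counts)
}
```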

Single-group Raft clusters like etcd and Consul are very reliable metadata stores from which you can acquire locks to coordinate distributed systems. The pattern can be seen in systems built for Kubernetes and Nomad.

Hopefully some of this helps you out a bit while you're checking out raft packages. If you're aiming to build a scalable database, there is probably a significant amount of info I can share on that as well, if you like.
