@aphyr
Created October 5, 2017 19:13
Advice on benchmarking databases
Hi S-------. I'm not an expert in performance benchmarking--I focus on
correctness and safety--but I have a few pieces of advice here.
0. Pick multiple workloads that cover the gamut of behaviors in each DB.
*Don't* choose a completely sharded workload for VoltDB. Don't choose a
purely commutative workload for Cassandra. Cassandra's Paxos
implementation is slow and a good benchmark will demonstrate
that--however, it *doesn't* (I think?) require a global coordinator,
which means it might *scale* better than a single-coordinator system
like, say, VoltDB. Talk about those differences!
1. Have someone independent run the test. Everyone assumes vendor
benchmarks are bullshit--not only because the company is biased to
select workloads which paint them in a good light, but also because
they know the tuning parameters required to adapt their own product to
that specific workload--and in performance-land, tuning is king. Pick a
neutral party with a track record of running independent tests. I...
honestly don't know anyone who does this, but they've gotta be out
there.
2. Get experts from each vendor to tune the test and OS for their
particular DB.
3. Common-denominator tests are helpful, but also keep in mind that the
safety properties and APIs of the DBs will change the shape of queries
dramatically. If it takes 5 queries to do something atomically in
Cassandra, and 1 to do it atomically in Volt, talk about those
differences.
4. Report concurrency, throughput, goodput, *and* latency distributions
(there's a rough sketch of what I mean after this list). Keep latencies
reasonable if you're talking about an online benchmark. A throughput of
100 kHz is meaningless if it takes 10 seconds to get an answer to a
query a user's waiting for.
5. Benchmarks should take multiple days to run, and should operate on
realistically sized data sets. Lots of storage engines have significant
inflection points at medium to large data volumes. LSM trees often start
real fast but drop off after several days of writing.
6. Use real hardware. It's great to test on cloud stuff too, but real
hardware is gonna make it easier to tell when you've, say, written
enough data to force the SSDs to start reclaiming sectors. TRIM and
rebuild disks between benchmarks. Test both hot and cold. The usual.
7. Okay gosh I have a lot of opinions I should stop here but have fun
and good luck!
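
For point 4, here's a minimal sketch of the kind of run summary I mean,
assuming you've recorded per-operation latency and success for a run.
The names and structure are made up for illustration, not taken from any
particular benchmarking tool:

# A rough sketch (hypothetical field names, Python 3.8+): summarize one run's
# recorded operations as throughput, goodput, and latency percentiles.
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class Op:
    latency_ms: float  # how long the client waited for an answer
    ok: bool           # True if the op completed successfully and correctly

def summarize(ops, duration_s, concurrency):
    lat = sorted(o.latency_ms for o in ops)
    cuts = quantiles(lat, n=100)  # 99 percentile cut points
    return {
        "concurrency": concurrency,
        "throughput_per_s": len(ops) / duration_s,             # everything attempted
        "goodput_per_s": sum(o.ok for o in ops) / duration_s,  # successes only
        "latency_ms": {"p50": cuts[49], "p95": cuts[94],
                       "p99": cuts[98], "max": lat[-1]},
    }

# e.g. summarize(ops, duration_s=86_400, concurrency=256) for a day-long run

Reporting the distribution (or at least the high percentiles), rather
than a single average, is what catches the 10-second outliers a mean
hides.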
--Kyle
jhugg commented Oct 5, 2017

Good advice. I would add that benchmarks should be reproducible if report/blog readers are willing to spend time and money on it. I'm sure it almost never happens, but having access to all the config (DB, OS, cloud, etc.) and the actual workload and data makes me trust a benchmark more.

I'm also skeptical there's a truly independent party out there... Jepsen's ethics policy is pretty unique in the consulting space. I've worked with some that are better than others though.
