@aphyr
Created October 5, 2017 19:13
Advice on benchmarking databases
Hi S-------. I'm not an expert in performance benchmarking--I focus on
correctness and safety--but I have a few pieces of advice here.
0. Pick multiple workloads that cover the gamut of behaviors in each DB.
*Don't* choose a completely sharded workload for VoltDB. Don't choose a
purely commutative workload for Cassandra. Cassandra's Paxos
implementation is slow and a good benchmark will demonstrate
that--however, it *doesn't* (I think?) require a global coordinator,
which means it might *scale* better than a single-coordinator system
like, say, VoltDB. Talk about those differences!
1. Have someone independent run the test. Everyone assumes vendor
benchmarks are bullshit--not only because the company is biased to
select workloads which paint them in a good light, but also because
they know the tuning parameters required to adapt their own product to
that specific workload--and in performance-land, tuning is king. Pick a
neutral party with a track record of running independent tests. I...
honestly don't know anyone who does this, but they've gotta be out
there.
2. Get experts from each vendor to tune the test and OS for their
particular DB.
3. Common-denominator tests are helpful, but also keep in mind that the
safety properties and APIs of the DBs will change the shape of queries
dramatically. If it takes 5 queries to do something atomically in
Cassandra, and 1 to do it atomically in Volt, talk about those
differences.
4. Report concurrency, throughput, goodput, *and* latency distributions
(there's a rough sketch of what I mean after this list). Keep latencies
reasonable if you're talking about an online benchmark. A throughput of
100 kHz is meaningless if it takes 10 seconds to get an answer to a
query a user's waiting for.
5. Benchmarks should take multiple days to run, and should operate on
realistically sized data sets. Lots of storage engines have significant
inflection points at medium to large data volumes. LSM trees often start
real fast but drop off after several days of writing.
6. Use real hardware. It's great to test on cloud stuff too, but real
hardware is gonna make it easier to tell when you've, say, written
enough data to force the SSDs to start reclaiming sectors. TRIM and
rebuild disks between benchmarks. Test both hot and cold. The usual.
7. Okay gosh I have a lot of opinions I should stop here but have fun
and good luck!
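
For point 4, here's a minimal sketch of the kind of run summary I mean,
assuming you've recorded per-operation latency and success for a run.
The names and structure are made up for illustration, not taken from any
particular benchmarking tool:

# A rough sketch (hypothetical field names, Python 3.8+): summarize one run's
# recorded operations as throughput, goodput, and latency percentiles.
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class Op:
    latency_ms: float  # how long the client waited for an answer
    ok: bool           # True if the op completed successfully and correctly

def summarize(ops, duration_s, concurrency):
    lat = sorted(o.latency_ms for o in ops)
    cuts = quantiles(lat, n=100)  # 99 percentile cut points
    return {
        "concurrency": concurrency,
        "throughput_per_s": len(ops) / duration_s,             # everything attempted
        "goodput_per_s": sum(o.ok for o in ops) / duration_s,  # successes only
        "latency_ms": {"p50": cuts[49], "p95": cuts[94],
                       "p99": cuts[98], "max": lat[-1]},
    }

# e.g. summarize(ops, duration_s=86_400, concurrency=256) for a day-long run

Reporting the distribution (or at least the high percentiles), rather
than a single average, is what catches the 10-second outliers a mean
hides.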
--Kyle
jhugg commented Oct 5, 2017

Good advice. I would add that benchmarks should be reproducible if report/blog readers are willing to spend time and money on it. I'm sure it almost never happens, but having access to all the config (DB, OS, cloud, etc.) and the actual workload and data makes me trust a benchmark more.

I'm also skeptical there's a truly independent party out there... Jepsen's ethics policy is pretty unique in the consulting space. I've worked with some that are better than others though.
