@christineyen
Created November 28, 2012 00:26
cassandra benchmarking

How should I think about the cost of a single operation? If we increment 10 counters in one row, does that cost about as much as incrementing a single counter in that row, or closer to 10x as much?

(Note: Updating the same row n times, without batching, produces the same numbers as any other n operations without batching.)

$ ./benchmark.rb 1000
                               user     system      total        real
# Inserting 1000 rows normally
insertion                  0.290000   0.030000   0.320000 (  0.396852)
# Updating 1000 rows normally
updating                   0.280000   0.020000   0.300000 (  0.381354)
# Updating 100 rows, adding 10 columns each
insertion/10               0.280000   0.020000   0.300000 (  0.410743)
# Updating 100 rows, updating 10 columns each
updating/10                0.270000   0.020000   0.290000 (  0.378530)
# Updating 100 rows for comparison - not batched
update/10, not batched     0.030000   0.010000   0.040000 (  0.034458)
# Updating 100 rows, batched by row, adding 10 columns each
batch insertion/10         0.110000   0.000000   0.110000 (  0.141858)
# Updating 100 rows, batched by row, updating 10 columns each
batch update/10            0.110000   0.000000   0.110000 (  0.133757)
######################################################################
$ ./benchmark.rb 5000
                               user     system      total        real
# Inserting 5000 rows normally
insertion                  1.390000   0.120000   1.510000 (  1.955963)
# Updating 5000 rows normally
updating                   1.370000   0.110000   1.480000 (  1.845302)
# Updating 500 rows, adding 10 columns each
insertion/10               1.370000   0.110000   1.480000 (  1.826632)
# Updating 500 rows, updating 10 columns each
updating/10                1.380000   0.110000   1.490000 (  1.852316)
# Updating 500 rows for comparison - not batched
update/10, not batched     0.140000   0.020000   0.160000 (  0.189078)
# Updating 500 rows, batched by row, adding 10 columns each
batch insertion/10         0.540000   0.010000   0.550000 (  0.647655)
# Updating 500 rows, batched by row, updating 10 columns each
batch update/10            0.540000   0.010000   0.550000 (  0.642064)
######################################################################
$ ./benchmark.rb 10000
                               user     system      total        real
# Inserting 10000 rows normally
insertion                  2.810000   0.240000   3.050000 (  3.846309)
# Updating 10000 rows normally
updating                   2.750000   0.230000   2.980000 (  3.712084)
# Updating 1000 rows, adding 10 columns each
insertion/10               2.770000   0.220000   2.990000 (  3.699602)
# Updating 1000 rows, updating 10 columns each
updating/10                2.760000   0.230000   2.990000 (  3.715243)
# Updating 1000 rows for comparison - not batched
update/10, not batched     0.280000   0.020000   0.300000 (  0.365043)
# Updating 1000 rows, batched by row, adding 10 columns each
batch insertion/10         1.090000   0.030000   1.120000 (  1.304325)
# Updating 1000 rows, batched by row, updating 10 columns each
batch update/10            1.080000   0.030000   1.110000 (  1.285272)
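To answer the opening question from these numbers, it helps to divide each wall-clock time by the number of requests it covers. The sketch below does that for the 10000-iteration run. The figures are copied from the `real` column above; the constant and method names (`REAL_SECONDS`, `ms_per_unit`, and so on) are illustrative and not part of the benchmark script.

```ruby
# Wall-clock seconds from the `real` column of the 10000-iteration run above.
REAL_SECONDS = {
  'updating/10'            => 3.715243, # 10000 unbatched increments
  'update/10, not batched' => 0.365043, # 1000 unbatched increments
  'batch update/10'        => 1.285272  # 1000 batches of 10 increments each
}.freeze

# Average cost in milliseconds of one unit of work (here, one request).
def ms_per_unit(real_seconds, units)
  real_seconds / units * 1000.0
end

SINGLE_MS = ms_per_unit(REAL_SECONDS['update/10, not batched'], 1_000)
BATCH_MS  = ms_per_unit(REAL_SECONDS['batch update/10'], 1_000)

puts format('one unbatched increment:      %.3f ms', SINGLE_MS)
puts format('one batch of 10 increments:   %.3f ms', BATCH_MS)
puts format('per counter inside a batch:   %.3f ms', BATCH_MS / 10)
puts format('batch of 10 vs one increment: %.2fx', BATCH_MS / SINGLE_MS)
puts format('10000 unbatched vs batched:   %.2fx',
            REAL_SECONDS['updating/10'] / REAL_SECONDS['batch update/10'])
```

On this run, a batch of 10 increments to one row costs roughly 3.5 times as much as a single increment rather than 10 times, and batching the same 10000 increments is roughly 2.9 times faster overall.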
#!/usr/bin/env ruby
# Benchmarks counter increments in the 'Parse' keyspace, comparing one
# request per increment with increments batched by row.
require 'cassandra/1.1'
require 'benchmark'

iterations = ARGV.first.to_i
if iterations.zero?
  puts 'Usage: ./benchmark.rb ITERATIONS'
  exit 1
end

store = Cassandra.new('Parse')

# Temporary counter column family; dropped again at the end of the run.
cf_def = CassandraThrift::CfDef.new(:keyspace => 'Parse',
                                    :name => 'Benchmarks',
                                    :default_validation_class => 'CounterColumnType',
                                    :comparator_type => 'UTF8Type',
                                    :key_validation_class => 'UTF8Type')
store.add_column_family(cf_def)

Benchmark.bm(25) do |x|
  # One new row per request, one counter column each.
  puts "# Inserting #{ iterations } rows normally"
  x.report('insertion') do
    iterations.times do |i|
      store.add(:Benchmarks, "key#{ i }", rand(5) + 1, 'count')
    end
  end

  # Same rows and column again, so every request hits an existing counter.
  puts "\n# Updating #{ iterations } rows normally"
  x.report('updating') do
    iterations.times do |i|
      store.add(:Benchmarks, "key#{ i }", 1, 'count')
    end
  end

  # Same number of requests, spread over 10 new columns in a tenth of the rows.
  puts "\n# Updating #{ iterations / 10 } rows, adding 10 columns each"
  x.report('insertion/10') do
    iterations.times do |i|
      row_key = i / 10
      col_key = i % 10
      store.add(:Benchmarks, "key#{ row_key }", rand(5) + 1, "count#{ col_key }")
    end
  end

  puts "\n# Updating #{ iterations / 10 } rows, updating 10 columns each"
  x.report('updating/10') do
    iterations.times do |i|
      row_key = i / 10
      col_key = i % 10
      store.add(:Benchmarks, "key#{ row_key }", 1, "count#{ col_key }")
    end
  end

  # Baseline: a single unbatched increment per row.
  puts "\n# Updating #{ iterations / 10 } rows for comparison - not batched"
  x.report('update/10, not batched') do
    (iterations / 10).times do |row|
      store.add(:Benchmarks, "key#{ row }", 1, 'count')
    end
  end

  # One batch per row; columns count10..count19 do not exist yet.
  puts "\n# Updating #{ iterations / 10 } rows, batched by row, adding 10 columns each"
  x.report('batch insertion/10') do
    (iterations / 10).times do |row|
      store.batch do
        (10...20).each do |col|
          store.add(:Benchmarks, "key#{ row }", rand(5) + 1, "count#{ col }")
        end
      end
    end
  end

  # One batch per row, incrementing the columns created just above.
  puts "\n# Updating #{ iterations / 10 } rows, batched by row, updating 10 columns each"
  x.report('batch update/10') do
    (iterations / 10).times do |row|
      store.batch do
        (10...20).each do |col|
          store.add(:Benchmarks, "key#{ row }", 1, "count#{ col }")
        end
      end
    end
  end
end

store.drop_column_family('Benchmarks')
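
One rough way to read the output of this script is as a fixed cost per request plus a cost per counter in that request. The sketch below fits that two-term model to two points from the 10000-iteration run: 0.365043 ms for a request with 1 counter (`update/10, not batched`) and 1.285272 ms for a batch of 10 (`batch update/10`). The linear model is an assumption made here for illustration, not something the benchmark establishes, and the names `FIXED_MS`, `PER_COUNTER_MS`, and `estimated_ms` are hypothetical.

```ruby
# Milliseconds per request, derived from the 10000-iteration run
# (real seconds divided by 1000 requests, times 1000).
MS_FOR_1_COUNTER   = 0.365043 # 'update/10, not batched'
MS_FOR_10_COUNTERS = 1.285272 # 'batch update/10'

# Assumed model (not from the source): cost(k) = FIXED_MS + k * PER_COUNTER_MS
PER_COUNTER_MS = (MS_FOR_10_COUNTERS - MS_FOR_1_COUNTER) / (10 - 1)
FIXED_MS       = MS_FOR_1_COUNTER - PER_COUNTER_MS

# Estimated cost in milliseconds of one request carrying k counter increments.
def estimated_ms(counters_per_batch)
  FIXED_MS + counters_per_batch * PER_COUNTER_MS
end

puts format('fixed cost per request: %.3f ms', FIXED_MS)
puts format('cost per counter:       %.3f ms', PER_COUNTER_MS)
```

Under this reading the fit gives roughly 0.26 ms of fixed cost per request and 0.10 ms per counter. Two points cannot confirm that the cost is really linear, so other batch sizes would need to be measured before relying on `estimated_ms`.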