Instantly share code, notes, and snippets.

Embed
What would you like to do?
Hash vs Array vs Set

For simple duplicate checking, which is faster, a Ruby hash, array, or set?

For arrays, we try two approaches, when inserting data either avoid duplicates, or just add them to the list.

This is most likely flawed, and should probably be taken with a grain of salt.

ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-darwin12.0]
ruby set_hash_array_bm.rb
Generating Sample Data For Writes
Generating Sample Data For Reads
Rehearsal -----------------------------------------------------
Hash Insert 0.010000 0.000000 0.010000 ( 0.006304)
Hash Read 0.000000 0.000000 0.000000 ( 0.003243)
Uniq Array Insert 2.420000 0.010000 2.430000 ( 2.418208)
Uniq Array Read 3.710000 0.000000 3.710000 ( 3.718720)
Array Insert 0.000000 0.000000 0.000000 ( 0.000787)
Array Read 0.000000 0.000000 0.000000 ( 0.000755)
Set Insert 0.010000 0.000000 0.010000 ( 0.003888)
Set Read 0.000000 0.000000 0.000000 ( 0.003065)
-------------------------------------------- total: 6.160000sec
user system total real
Hash Insert 0.010000 0.000000 0.010000 ( 0.002398)
Hash Read 0.000000 0.000000 0.000000 ( 0.002536)
Uniq Array Insert 2.450000 0.000000 2.450000 ( 2.452248)
Uniq Array Read 6.550000 0.000000 6.550000 ( 6.552133)
Array Insert 0.000000 0.010000 0.010000 ( 0.001377)
Array Read 0.000000 0.000000 0.000000 ( 0.000986)
Set Insert 0.000000 0.000000 0.000000 ( 0.003936)
Set Read 0.000000 0.000000 0.000000 ( 0.003019)
require 'benchmark'
require 'securerandom'
require 'set'
count = 10000
dup_count = (count * 0.10).to_i
samples = []
reads = []
puts "Generating Sample Data For Writes"
(count - dup_count).times{ samples << SecureRandom.base64 }
dup_count.times{ samples << samples.sample }
samples.shuffle!
puts "Generating Sample Data For Reads"
# Data to be read, some in the samples, some not
(count/2).to_i.times{ reads << samples.sample }
(count/2).to_i.times{ reads << SecureRandom.base64 }
reads.shuffle!
Benchmark.bmbm do |x|
hash = {}
uniq_array = []
array = []
set = Set.new
x.report("Hash Insert") do
samples.each{|k| hash[k] = true }
end
x.report("Hash Read") do
reads.each{|k| hash.has_key?(k) }
end
x.report("Uniq Array Insert") do
samples.each{|k| uniq_array << k unless uniq_array.include?(k) }
end
x.report("Uniq Array Read") do
reads.each{|k| uniq_array.include?(k) }
end
x.report("Array Insert") do
samples.each{|k| uniq_array << k }
end
x.report("Array Read") do
reads.each{|k| array.include?(k) }
end
x.report("Set Insert") do
samples.each{|k| set << k }
end
x.report("Set Read") do
reads.each{|k| set.include?(k) }
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment