# Performance testing different Key-Value stores in Ruby

For a project I'm working on I need a key-value store that maps file paths to Fixnum IDs. The dataset will typically be in the range of 100,000 to 1,000,000 entries. These tests map 305,000 file paths to Fixnum IDs.
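
To make the shape of the data concrete, the store ends up holding entries roughly like this (the paths are made up; store stands in for whichever backend is being tested):

    # Illustrative only; the real dataset is ~305_000 entries like these:
    store["/Users/me/archive/report_1.pdf"] = 1
    store["/Users/me/archive/report_2.pdf"] = 2
    store["/Users/me/archive/report_1.pdf"]  # => 1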

The different key-value stores tested are:

- Daybreak: "Daybreak is a simple and very fast key value store for ruby"
- GDBM: GNU dbm, "a simple database engine for storing key-value pairs on disk"
- DBM: "The DBM class provides a wrapper to a Unix-style dbm or Database Manager library"
- PStore: "PStore implements a file based persistence mechanism based on a Hash"

Out of these, all except Daybreak are in the Ruby standard library.
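
Daybreak is therefore the only one that needs to be installed separately; a minimal setup sketch (the gem name matches the require used in the test code below):

    # Only Daybreak is an external gem; the rest ship with Ruby:
    #   gem install daybreak
    require "daybreak"
    require "gdbm"
    require "dbm"
    require "pstore"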

This test was run 3 times on my SSD-based MacBook Air (2011) with 4 GB of RAM.

## Test code:

    #benchmarking different DB systems for load of 305_000 file paths.
    
    require "benchmark"
    require "fileutils"
    require "daybreak"
    require "PStore"
    require "gdbm"
    require "dbm"
    
    main_path = File.join(Dir.pwd, "test_file")
    testdatat = (1..305_000).map{|e| ["#{main_path}#{e}.pdf", e]}
    
    def delete_files
    	begin
    		FileUtils.rm("testDaybreak.db") if File.file?("testDaybreak.db")
    		FileUtils.rm("testGDBM.db")     if File.file?("testGDBM.db")
    		FileUtils.rm("testDBM.db")      if File.file?("testDBM.db")
    		FileUtils.rm("testPStore.db")   if File.file?("testPStore.db")
    	rescue Exception => e
    		puts "Error when deleting: "
    		puts e.message
    		puts e.backtrace.inspect
    	end
    end
    
    delete_files
    
    
    
    class DaybreakWrapper
    	@store = nil
    	def initialize
    		@store = Daybreak::DB.new("testDaybreak.db")
    	end
    	
    	def []=(key,val)
    		@store[key] = val
    	end
    	
    	def [](key)
    		@store[key]
    	end
    	def values
    		@store.keys.map{|k| @store[k]}
    	end
    	def keys
    		@store.keys
    	end
    	
    	def delete(key)
    		@store.delete(key)
    	end
    	
    	def stop
    		@store.close unless @store.closed?
    	end
    	
    	def destroy
    		stop
    		FileUtils.rm("testDaybreak.db")
    	end
    
    	def sync_lock
    	end
    end
    class GDBMWrapper
    	@store = nil
    	def initialize
    		@store = GDBM.new("testGDBM.db")
    	end
    	
    	def []=(key,val)
    		@store[Marshal.dump(key)] = Marshal.dump(val)
    	end
    	
    	def [](key)
    		Marshal.load(@store[Marshal.dump(key)])
    	end
    	def values
    		@store.values
    	end
    	def keys
    		@store.keys.map{|e| Marshal.load(e)}
    	end
    	
    	def delete(key)
    		@store.delete(Marshal.dump(key))
    	end
    	
    	def stop
    		@store.close unless @store.closed?
    	end
    	
    	def destroy
    		stop
    		FileUtils.rm("testGDBM.db")
    	end
    
    	def sync_lock
    	end
    end
    
    class DBMWrapper
    	@store = nil
    	def initialize
    		# @store = DBM.open("testDBM", 666, DBM::WRCREAT)
    		@store = DBM.new("testDBM")
    	end
    	
    	def []=(key,val)
    		@store[key] = val
    	end
    	
    	def [](key)
    		@store[key]
    	end
    	def values
    		@store.values
    	end
    	def keys
    		@store.keys
    	end
    	
    	def delete(key)
    		@store.delete(key)
    	end
    	
    	def stop
    		@store.close unless @store.closed?
    	end
    	
    	def destroy
    		stop
    		FileUtils.rm("testDBM.db")
    	end
    
    	def sync_lock
    	end
    end
    
    class PStoreWrapper
    	@store = nil
    	def initialize
    		@store = PStore.new("testPStore.db")
    	end
    	
    	def []=(key,val)
    		transaction do
    			return @store[key] = val
    		end
    	end
    	
    	def [](key)
    		transaction do
    			return @store[key]
    		end
    	end
    	def values
    		transaction do
    			return @store.roots.map{|e| @store[e]}
    		end
    	end
    	def keys
    		transaction do
    			return @store.roots
    		end
    	end
    	
    	def delete(key)
    		transaction do
    			return @store.delete(key)
    		end
    	end
    	
    	def stop
    		# transaction do
    		# 	@store.commit
    		# end
    	end
    	
    	def destroy
    		# transaction do
    		# 	@store.destroy
    		# end
    		FileUtils.rm("testPStore.db")
    	end
    
    	def sync_lock
    		@store.transaction do
    			yield
    		end
    	end
    
    	# Public: Creates a transaction. Nested transactions are allowed.
    	#
    	# Returns nothing.
    	def transaction
    		unless @in_transaction
    			@in_transaction = true
    			sync_lock do
    				yield
    			end
    			@in_transaction = false
    		else
    			yield
    		end
    	end
    end
    
    
    class HashWrapper
    	@@superhash = {}
    	def initialize
    		#@@superhash = {} unless @@superhash
    	end
    	def []=(key,val)
    		@@superhash[key] = val
    	end
    	def [](key)
    		@@superhash[key]
    	end
    	def values
    		@@superhash.values
    	end
    	def keys
    		@@superhash.keys
    	end
    
    	def stop
    	end
    end
    
    # require "pry-byebug"
    n = 50000
    Benchmark.bm(7) do |x|
    	x.report("daybreak insert:") { db = DaybreakWrapper.new(); testdatat.each{|v| db[v[0]] = v[1]}  ; db.stop}
    	x.report("gdbm     insert:") { db = GDBMWrapper.new()    ; testdatat.each{|v| db[v[0]] = v[1]}  ; db.stop}
    	x.report("dbm      insert:") { db = DBMWrapper.new()     ; testdatat.each{|v| db[v[0]] = v[1]}  ; db.stop}
    	x.report("PStore   insert:") { db = PStoreWrapper.new()  ; db.transaction do ; testdatat.each{|v| db[v[0]] = v[1]} end ; db.stop}
    	x.report("hash     insert:") { db = HashWrapper.new()    ; testdatat.each{|v| db[v[0]] = v[1]} }
    
    	x.report("daybreak read:  ") { db = DaybreakWrapper.new(); n.times do ; db[testdatat.sample[0]] end ; db.stop}
    	x.report("gdbm     read:  ") { db = GDBMWrapper.new()    ; n.times do ; db[testdatat.sample[0]] end ; db.stop}
    	x.report("dbm      read:  ") { db = DBMWrapper.new()     ; n.times do ; db[testdatat.sample[0]] end ; db.stop}
    	x.report("PStore   read:  ") { db = PStoreWrapper.new()  ; db.transaction do ; n.times do ; db[testdatat.sample[0]] end end; db.stop}
    	x.report("hash     read:  ") { db = HashWrapper.new()    ; n.times do ; db[testdatat.sample[0]] end ; db.stop}
    
    	x.report("daybreak keys:  ") { db = DaybreakWrapper.new(); raise "Key error in daybreak" unless  db.keys.count == 305_000 ; db.stop}
    	x.report("gdbm     keys:  ") { db = GDBMWrapper.new()    ; raise "Key error in gdbm"     unless  db.keys.count == 305_000 ; db.stop}
    	x.report("dbm      keys:  ") { db = DBMWrapper.new()     ; raise "Key error in dbm"      unless  db.keys.count == 305_000 ; db.stop}
    	x.report("PStore   keys:  ") { db = PStoreWrapper.new()  ; raise "Key error in PStore"   unless  db.keys.count == 305_000 ; db.stop}
    	x.report("hash     keys:  ") { db = HashWrapper.new()    ; raise "Key error in hash"     unless  db.keys.count == 305_000 ; db.stop}
    
    	x.report("daybreak values:") { db = DaybreakWrapper.new(); raise "Value error in daybreak" unless  db.values.count == 305_000 ; db.stop}
    	x.report("gdbm     values:") { db = GDBMWrapper.new()    ; raise "Value error in gdbm"     unless  db.values.count == 305_000 ; db.stop}
    	x.report("dbm      values:") { db = DBMWrapper.new()     ; raise "Value error in dbm"      unless  db.values.count == 305_000 ; db.stop}
    	x.report("PStore   values:") { db = PStoreWrapper.new()  ; raise "Value error in PStore"   unless  db.values.count == 305_000 ; db.stop}
    	x.report("hash     values:") { db = HashWrapper.new()    ; raise "Value error in hash"     unless  db.values.count == 305_000 ; db.stop}
    end
    
    def format_mb(size)
      conv = [ 'b', 'kb', 'mb', 'gb', 'tb', 'pb', 'eb' ];
      scale = 1024;
    
      ndx=1
      if( size < 2*(scale**ndx)  ) then
        return "#{(size)} #{conv[ndx-1]}"
      end
      size=size.to_f
      [2,3,4,5,6,7].each do |ndx|
        if( size < 2*(scale**ndx)  ) then
          return "#{'%.3f' % (size/(scale**(ndx-1)))} #{conv[ndx-1]}"
        end
      end
      ndx=7
      return "#{'%.3f' % (size/(scale**(ndx-1)))} #{conv[ndx-1]}"
    end
    
    puts "daybreak file size: #{format_mb(File.size("testDaybreak.db"))}" if File.file?("testDaybreak.db")
    puts "gdbm     file size: #{format_mb(File.size("testGDBM.db"))}"     if File.file?("testGDBM.db")
    puts "dbm      file size: #{format_mb(File.size("testDBM.db"))}"      if File.file?("testDBM.db")
    puts "PStore   file size: #{format_mb(File.size("testPStore.db"))}"   if File.file?("testPStore.db")
    
    delete_files
    
    puts "-------------------------------------------------------------"

## Results

              user     system      total        real
daybreak insert:  6.630000   2.940000   9.570000 ( 10.290654)
gdbm     insert:  3.510000   1.770000   5.280000 (  5.904929)
dbm      insert:  1.010000   1.560000   2.570000 (  5.002721)
PStore   insert:  1.190000   0.080000   1.270000 (  5.790313)
hash     insert:  0.330000   0.010000   0.340000 (  0.348286)
daybreak read:    3.250000   0.100000   3.350000 (  3.688214)
gdbm     read:    0.450000   0.170000   0.620000 (  1.727852)
dbm      read:    0.210000   0.330000   0.540000 (  1.593883)
PStore   read:    2.080000   0.080000   2.160000 (  2.439700)
hash     read:    0.070000   0.000000   0.070000 (  0.072445)
daybreak keys:    3.010000   0.110000   3.120000 (  3.230572)
gdbm     keys:    1.240000   0.250000   1.490000 (  2.769978)
dbm      keys:    0.190000   0.270000   0.460000 (  1.646055)
PStore   keys:    1.710000   0.120000   1.830000 (  2.078772)
hash     keys:    0.010000   0.000000   0.010000 (  0.003377)
daybreak values:  3.320000   0.050000   3.370000 (  3.543207)
gdbm     values:  0.720000   0.180000   0.900000 (  2.084107)
dbm      values:  0.400000   0.180000   0.580000 (  1.380364)
PStore   values:  1.150000   0.030000   1.180000 (  1.194967)
hash     values:  0.000000   0.000000   0.000000 (  0.003910)
daybreak file size: 28.627 mb
gdbm     file size: 45.885 mb
dbm      file size: 41.754 mb
PStore   file size: 25.137 mb
-------------------------------------------------------------

              user     system      total        real
daybreak insert:  6.900000   3.070000   9.970000 ( 10.570726)
gdbm     insert:  3.760000   1.970000   5.730000 (  6.984838)
dbm      insert:  0.990000   1.580000   2.570000 (  2.864508)
PStore   insert:  1.200000   0.100000   1.300000 (  1.452362)
hash     insert:  0.320000   0.000000   0.320000 (  0.327974)
daybreak read:    3.250000   0.130000   3.380000 (  3.683599)
gdbm     read:    0.470000   0.180000   0.650000 (  1.988697)
dbm      read:    0.130000   0.150000   0.280000 (  0.502423)
PStore   read:    2.090000   0.070000   2.160000 (  2.447758)
hash     read:    0.080000   0.000000   0.080000 (  0.077678)
daybreak keys:    2.960000   0.120000   3.080000 (  3.211114)
gdbm     keys:    1.260000   0.270000   1.530000 (  3.299173)
dbm      keys:    0.190000   0.270000   0.460000 (  1.530169)
PStore   keys:    1.710000   0.180000   1.890000 (  1.986568)
hash     keys:    0.000000   0.000000   0.000000 (  0.004191)
daybreak values:  3.250000   0.070000   3.320000 (  3.778728)
gdbm     values:  0.850000   0.220000   1.070000 (  3.457204)
dbm      values:  0.370000   0.190000   0.560000 (  1.823945)
PStore   values:  1.150000   0.030000   1.180000 (  1.202744)
hash     values:  0.010000   0.000000   0.010000 (  0.008623)
daybreak file size: 28.627 mb
gdbm     file size: 45.885 mb
dbm      file size: 41.754 mb
PStore   file size: 25.137 mb
-------------------------------------------------------------

              user     system      total        real
daybreak insert:  6.530000   2.910000   9.440000 (  9.938166)
gdbm     insert:  3.500000   1.770000   5.270000 (  7.916740)
dbm      insert:  0.990000   1.580000   2.570000 (  3.926296)
PStore   insert:  1.280000   0.080000   1.360000 (  6.565056)
hash     insert:  0.320000   0.010000   0.330000 (  0.339820)
daybreak read:    3.180000   0.130000   3.310000 (  4.165183)
gdbm     read:    0.460000   0.170000   0.630000 (  1.667406)
dbm      read:    0.270000   0.600000   0.870000 (  3.078985)
PStore   read:    2.080000   0.090000   2.170000 (  2.514186)
hash     read:    0.070000   0.000000   0.070000 (  0.074237)
daybreak keys:    2.910000   0.110000   3.020000 (  3.136291)
gdbm     keys:    1.270000   0.250000   1.520000 (  2.784880)
dbm      keys:    0.190000   0.260000   0.450000 (  1.463298)
PStore   keys:    1.640000   0.100000   1.740000 (  1.810947)
hash     keys:    0.000000   0.000000   0.000000 (  0.002555)
daybreak values:  3.150000   0.060000   3.210000 (  3.306449)
gdbm     values:  0.750000   0.200000   0.950000 (  2.202966)
dbm      values:  0.380000   0.180000   0.560000 (  1.565708)
PStore   values:  1.140000   0.040000   1.180000 (  1.253058)
hash     values:  0.000000   0.000000   0.000000 (  0.004755)
daybreak file size: 28.627 mb
gdbm     file size: 45.885 mb
dbm      file size: 41.754 mb
PStore   file size: 25.137 mb
-------------------------------------------------------------



## Verdict

Summing the real times (in seconds) across all three runs for each store:

daybreak:
10.290654+3.688214+3.230572+3.543207+10.570726+3.683599+3.211114+3.778728+9.938166+4.165183+3.136291+3.306449 = 62.542903
gdbm:
5.904929+1.727852+2.769978+2.084107+6.984838+1.988697+3.299173+3.457204+7.916740+1.667406+2.784880+2.202966 = 42.788770
dbm:
5.002721+1.593883+1.646055+1.380364+2.864508+0.502423+1.530169+1.823945+3.926296+3.078985+1.463298+1.565708 = 26.378355
PStore:
5.790313+2.439700+2.078772+1.194967+1.452362+2.447758+1.986568+1.202744+6.565056+2.514186+1.810947+1.253058 = 30.736431
hash:
0.348286+0.072445+0.003377+0.003910+0.327974+0.077678+0.004191+0.008623+0.339820+0.074237+0.002555+0.004755 = 1.267851
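
For reference, each total is just the twelve "real" column values (insert, read, keys and values across all three runs) added together; e.g. a quick check for dbm:

    dbm_reals = [5.002721, 1.593883, 1.646055, 1.380364, 2.864508, 0.502423,
                 1.530169, 1.823945, 3.926296, 3.078985, 1.463298, 1.565708]
    puts dbm_reals.inject(:+).round(6)  # => 26.378355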


hash     =  1.267851 # Note: does not persist
dbm      = 26.378355 # Note: stored data is CPU-architecture dependent; hard to debug if needed
PStore   = 30.736431 # Note: requires everything in a single transaction; if not, time ~= infinity
gdbm     = 42.788770
daybreak = 62.542903 # Note: does not work correctly on Windows

hash     file size: 0 bytes  # Does not persist
dbm      file size: 41.754 mb
PStore   file size: 25.137 mb
gdbm     file size: 45.885 mb
daybreak file size: 28.627 mb

As you can see, dbm seems to be the fastest overall, but has the drawback that the stored file is very dependent on the machine it was written on, so if you move it to another machine it might not be readable at all. PStore also performs well; however, if the tests were not run inside a single db.transaction, performance was so bad I had to abort the run. One point in its favour is that PStore uses fairly little disk space (the smallest in this test).
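
Concretely, the slow case wraps every write in its own transaction (each commit rewrites the file), while the fast case batches everything into a single transaction. A minimal sketch (file name and sample paths are made up):

    require "pstore"

    paths = (1..1_000).map { |i| "/tmp/example_#{i}.pdf" }  # hypothetical sample data
    store = PStore.new("ids.pstore")

    # Slow: every write gets its own transaction, so every write commits to disk
    paths.each_with_index do |path, i|
      store.transaction { store[path] = i }
    end

    # Fast: one transaction around the whole batch, committed to disk once
    store.transaction do
      paths.each_with_index { |path, i| store[path] = i }
    end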

GDBM seems to be the best all-round solution. It works on all platforms (yes, even Windows) and it performs well. Keep in mind that it has to Marshal.dump the keys and values it stores, and even with that overhead it performs fairly well.
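
The marshalling is needed because GDBM (like the other dbm variants) only stores strings, so the Fixnum IDs have to be serialized on the way in and deserialized on the way out. A minimal sketch (file name made up):

    require "gdbm"

    db = GDBM.new("ids.gdbm")
    db["/tmp/example_1.pdf"] = Marshal.dump(1)   # GDBM values must be strings
    id = Marshal.load(db["/tmp/example_1.pdf"])  # => 1
    db.close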

Daybreak also uses fairly little disk space, but is somewhat slow on this dataset, contrary to what the moneta benchmarks seem to indicate; those use very small datasets (100 and 1000 keys). Despite being "pure Ruby", Daybreak does not work on Windows because it relies on file locking that is only supported on POSIX systems.

The Hash times are included for a simple comparison with the (to my knowledge) fastest in-memory alternative. As one would expect, the Hash is far faster.

Notes: there are several key-value stores left out of this test. It was meant as a comparison of some cross-platform alternatives to Daybreak, which I wrongly assumed to be cross-platform. Feel free to suggest any you feel are missing :)

@shreyasbharath

Can we put sqlite3_hash to the test too?

@rajsahae

rajsahae commented Jun 7, 2019

This is a fantastic gist and I will be using this (with some modification) to benchmark the same utilities in our environment for some in memory storage issues we are having. Thanks!

@xanni

xanni commented Jul 29, 2020

No sdbm?

@stephan-nordnes-eriksen (Author)

Thanks for your feedback! Keep in mind that I made this in 2015. If there is enough demand, I could update it with newer versions of everything and see if there are any new databases that should be considered.
