@jimeh
Created December 24, 2009 11:51
A quick head-to-head performance test of JSON vs. BSON.

JSON vs. BSON Performance Tests in Ruby

Don't trust these tests and results for life-or-death situations, because I can't guarantee they're very scientific. The example data I'm using is what I'd consider average user data, but it's possibly not structured in an optimal way for these tests.

If you don't know what BSON is, it's the raw binary format that MongoDB uses internally to store all its data. It has some interesting advantages over JSON.
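For the curious, a round trip looks like this (a minimal sketch using the same BSON.serialize/BSON.deserialize calls my tests use; the sample document is made up):

require "rubygems"
require "mongo" # provides the BSON module

doc  = { "name" => "John Smith", "active" => 1 }
bson = BSON.serialize(doc)    # Ruby Hash -> binary BSON
hash = BSON.deserialize(bson) # binary BSON -> Ruby Hash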

The Tests

I've created four tests that time how long JSON and BSON each take to generate their format from a Ruby Hash, and how long they take to parse their data back into a Ruby Hash.

My findings

Using Ruby 1.8.7, BSON and JSON are roughly equally fast at generating their respective formats with a dataset of 100 rows or less. For parsing/reading back into a Ruby Hash, BSON is faster than JSON (0.003 seconds vs. 0.01). With a dataset of 10,000 rows, on the other hand, BSON takes on average 10 seconds to read back into a Ruby Hash, while JSON takes 0.2-0.5 seconds; writing that data, though, BSON does slightly faster.

With Ruby 1.9.1, however, the story looks different. BSON is about five times faster than JSON at generating its format, with both large (10,000 rows in 0.3 seconds) and small datasets. Parsing back into a Ruby Hash, though, BSON is about twice as slow as JSON with large datasets.

One oddity with Ruby 1.9.x, however, is that when the data.rb file contains 10,000 rows (2.3MB), requiring the file takes almost a minute, while on Ruby 1.8 it's instant.
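You can reproduce that measurement with a trivial script like this (a rough sketch; it assumes data.rb has already been generated next to it):

started = Time.now
require File.dirname(__FILE__) + "/data.rb" # defines $data
puts "require took #{Time.now - started} seconds"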

In short, BSON is terribly slow to read compared to JSON, but faster to build. Also, the BSON data seems to be slightly larger in byte size than the JSON data.
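The size comparison boils down to something like this (a minimal sketch mirroring the two size tests below; $data is the generated dataset):

json_size = JSON.generate($data).size
bson_size = BSON.serialize($data).size
puts "JSON: #{json_size} bytes, BSON: #{bson_size} bytes"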

Running the Tests

Clone this Gist

git clone git://gist.github.com/263161.git json_vs_bson_gist

Requirements

You will need to install the mongo and mongo_ext gems to get the BSON module with optimal performance.

sudo gem install mongo mongo_ext
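To make sure the gems load, a quick check in irb should print true (a hedged sketch; mongo_ext only speeds up the BSON module that the mongo gem already provides):

require "rubygems"
require "mongo"
puts !!defined?(BSON) # => true if the BSON module is available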

Create Test Data

To run the tests, you will first need to generate the data.rb file that the tests use:

./make_data 1000

This will generate a dataset with 1000 "rows" in data.rb.
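The generated data.rb just defines a global $data Hash, one entry per row; a row looks roughly like this (values are random, so yours will differ):

$data = {
	"person_0" => {"first_name" => "John", "middle_name" => "Drew", "last_name" => "Smith", "email" => "johndrewsmith@email.com", "active" => 1, "created_at" => "2003-07-14 09:12:45", "updated_at" => "2008-02-01 17:03:10"},
	# ... one entry per row ...
}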

Run the Tests

Then, to run all the tests in the tests folder, run:

./run_tests

The time of each test, along with the size of the dataset used, is displayed and logged to results.txt.
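Each run appends a block like this to results.txt (reconstructed from the logging code in lib/showdown.rb; the test names come from whatever files are in the tests folder, so the ones here are placeholders):

--------------------------------------------

Ruby Version: 1.8.7
Data Rows: 1000
Test results:
  bson_make: 0.012345 seconds
  ...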

Small Mistake

When I first pushed this gist, I had a mistake in my tests and findings: I had managed to flip the two BSON tests around, so what I thought was the read test was actually the make/write test, and vice versa. I've corrected it and updated everything accordingly.

# --- The BSON tests (three of the tests/*.rb files) ---

# Time generating BSON from the Ruby Hash in $data.
Showdown.run do
  BSON.serialize($data)
end

# Time parsing BSON back into a Ruby Hash.
bson = BSON.serialize($data)
Showdown.run do
  BSON.deserialize(bson)
end

# Log the byte size of the BSON output instead of a time.
Showdown.run do
  { :result => BSON.serialize($data).size.to_s + " bytes" }
end
require "rubygems"
require "mongo"
require "json"
require "lib/showdown"
Showdown.init
# --- The JSON tests (the remaining tests/*.rb files) ---

# Time generating JSON from $data.
Showdown.run do
  JSON.generate($data)
end

# Time parsing JSON back into a Ruby Hash.
json = JSON.generate($data)
Showdown.run do
  JSON.parse(json)
end

# Log the byte size of the JSON output instead of a time.
Showdown.run do
  { :result => JSON.generate($data).size.to_s + " bytes" }
end
#! /usr/bin/env ruby
# make_data: generates data.rb with the given number of rows (default 1000).
require "init"

Showdown.make_data(ARGV[0] ? ARGV[0].to_i : 1000)
#! /usr/bin/env ruby
# run_tests: runs every test in tests/ and prints (and logs) the results.
require "init"

puts Showdown.run_all_tests
# --- lib/showdown.rb: the tiny benchmark harness ---
class Showdown
  class << self
    attr_accessor :root
    attr_accessor :results
    attr_accessor :current_test
  end

  def self.init
    @root = File.dirname(File.dirname(__FILE__))
  end

  # Load the generated dataset (data.rb defines $data) once.
  def self.load_data
    if $data.nil? && File.exist?(@root + "/data.rb")
      require(@root + "/data.rb")
    end
  end

  # Run every tests/*.rb file, collect the results, and append a
  # summary block to results.txt.
  def self.run_all_tests
    tests = Dir.glob(@root + "/tests/*.rb").map { |file| File.basename(file).gsub(/\.rb$/, "") }
    tests.each do |test|
      Showdown.run_test(test)
    end
    result = "--------------------------------------------\n\n"
    result << "Ruby Version: #{RUBY_VERSION}\n"
    result << "Data Rows: #{$data.size}\n"
    result << "Test results:\n"
    Showdown.results.sort { |a, b| a[0].to_s <=> b[0].to_s }.each do |name, value|
      result << "  #{name}: #{value}\n"
    end
    result << "\n"
    File.open(@root + "/results.txt", "a") do |f|
      f.write(result)
    end
    result
  end

  def self.run_test(name)
    load_data
    @results ||= {}
    if File.exist?(@root + "/tests/#{name}.rb")
      @current_test = name
      require @root + "/tests/#{name}.rb"
      @current_test = nil
    end
  end

  # Time the given block. If the block returns a Hash with a :result
  # key, log that value instead of the elapsed time.
  def self.run
    started = Time.now
    result = yield
    taken = Time.now - started
    if result.is_a?(Hash) && result.has_key?(:result)
      @results[@current_test.to_sym] = result[:result]
    else
      @results[@current_test.to_sym] = taken.to_s + " seconds"
    end
  end

  # Write a data.rb file defining $data with `limit` random "rows".
  def self.make_data(limit = 100)
    firstnames  = ["John", "Jim", "James", "Sarah", "Anna", "Maria", "Sophia", "Martin", "Nick", "Bart"]
    middlenames = ["", "Richard", "Hanna", "Drew", "Jonas", "Marie", "Linc", "Matthew", "David", "Mark"]
    lastnames   = ["Smith", "Johnsson", "Andrews", "McCloud", "Windsgate",
                   "Integra", "Hellfire", "Mickelsson", "Rickson", "Dickson"]
    File.open(@root + "/data.rb", "w") do |f|
      f.write("$data = {\n")
      limit.times do |i|
        # rand(count), not rand(count - 1), so the last element can be picked too.
        first  = firstnames[rand(firstnames.count)]
        middle = middlenames[rand(middlenames.count)]
        last   = lastnames[rand(lastnames.count)]
        email  = "#{first}#{middle}#{last}@email.com".downcase
        created_at = Time.at(rand(Time.now.to_i)).strftime("%Y-%m-%d %H:%M:%S")
        updated_at = Time.at(rand(Time.now.to_i)).strftime("%Y-%m-%d %H:%M:%S")
        row  = %Q{\t"person_#{i}" => \{"first_name" => "#{first}", "middle_name" => "#{middle}", "last_name" => "#{last}", "email" => "#{email}", }
        row << %Q{"active" => #{rand(2)}, "created_at" => "#{created_at}", "updated_at" => "#{updated_at}"\},\n}
        f.write(row)
      end
      f.write("}")
    end
  end
end
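Since run_all_tests simply globs the tests folder, adding your own benchmark is just a matter of dropping another file in there (a hypothetical example; the filename is up to you). Return a Hash with a :result key to log an arbitrary value instead of the elapsed time, like the size tests do:

# tests/marshal_make.rb (hypothetical)
Showdown.run do
  Marshal.dump($data) # any code in the block is timed; $data is loaded for you
end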
@chuckremes

These results don't surprise me very much. Several users (including myself) have contributed back patches to make the C extension perform better. You should have seen the BSON performance before those patches.... argh.

I don't know if this is a BSON issue so much as it's a Ruby issue though. It would be instructive to profile (with oprofile, Instruments.app, etc) the C code and try to identify and optimize the bottlenecks. Also, try your test on JRuby which has a completely different performance profile.

@gjmurakami-10gen

Alas, as the idiom goes, "The devil is in the details." To pull in the BSON C extension, make sure that you require "bson_ext". There's a huge difference between the pure Ruby implementation and the C extension. Even with the C extension, the 10gen supported Ruby driver is not what it could be. Analysis of the C extension shows some inefficiencies. For serialization to BSON, there are some extraneous malloc's (rb_str_new2 for symbol, rb_ary_* for extra key passing) that should ideally be eliminated but that are painfully difficult to extract from the code as currently written. BSON deserialization is dominated by Ruby object creation, which includes malloc's; we could certainly benefit from optimization. It's worth taking a look at Moped to see how Ruby meta-programming can be used to simplify serialization/deserialization (have classes/objects operate on themselves). In all cases, whether JSON or BSON, extra object creation overhead and malloc's are expensive and will probably dominate over other costs.
