@jimeh
Created December 24, 2009 11:51
A quick head-to-head performance test of JSON vs. BSON.

JSON vs. BSON Performance Tests in Ruby

Don't trust these tests and results for life-or-death situations, because I can't guarantee they're very scientific. The example data I'm using is what I'd consider average user data, but it's possibly not structured in an optimal way for these tests.

If you don't know what BSON is, it's the raw binary format that MongoDB uses internally to store all its data. It has some interesting advantages over JSON.
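For the curious, a round trip looks like this (a minimal sketch using the same BSON.serialize/BSON.deserialize calls my tests use; the sample document is made up):

require "rubygems"
require "mongo" # provides the BSON module

doc  = { "name" => "John Smith", "active" => 1 }
bson = BSON.serialize(doc)    # Ruby Hash -> binary BSON
hash = BSON.deserialize(bson) # binary BSON -> Ruby Hash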

The Tests

I've created four tests that time how long JSON and BSON each take to generate their format from a Ruby Hash, and how long they take to parse their data back into a Ruby Hash.

My findings

Using Ruby 1.8.7, BSON and JSON are roughly equally fast at generating their respective formats with a dataset of 100 rows or less. For parsing/reading back into a Ruby Hash, BSON is faster than JSON (0.003 seconds vs. 0.01). With a dataset of 10,000 rows, on the other hand, BSON takes on average 10 seconds to read back into a Ruby Hash, while JSON takes 0.2-0.5 seconds; writing that data, though, BSON does slightly faster.

With Ruby 1.9.1, however, the story looks different. BSON is about five times faster than JSON at generating its format, with both large (10,000 rows in 0.3 seconds) and small datasets. Parsing back into a Ruby Hash, though, BSON is about twice as slow as JSON with large datasets.

One oddity with Ruby 1.9.x, however, is that when the data.rb file contains 10,000 rows (2.3MB), requiring the file takes almost a minute, while on Ruby 1.8 it's instant.
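You can reproduce that measurement with a trivial script like this (a rough sketch; it assumes data.rb has already been generated next to it):

started = Time.now
require File.dirname(__FILE__) + "/data.rb" # defines $data
puts "require took #{Time.now - started} seconds"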

In short, BSON is terribly slow to read compared to JSON, but faster to build. Also, the BSON data seems to be slightly larger in byte size than the JSON data.
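The size comparison boils down to something like this (a minimal sketch mirroring the two size tests below; $data is the generated dataset):

json_size = JSON.generate($data).size
bson_size = BSON.serialize($data).size
puts "JSON: #{json_size} bytes, BSON: #{bson_size} bytes"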

Running the Tests

Clone this Gist

git clone git://gist.github.com/263161.git json_vs_bson_gist

Requirements

You will need to install the mongo and mongo_ext gems to get the BSON module with optimal performance.

sudo gem install mongo mongo_ext
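To make sure the gems load, a quick check in irb should print true (a hedged sketch; mongo_ext only speeds up the BSON module that the mongo gem already provides):

require "rubygems"
require "mongo"
puts !!defined?(BSON) # => true if the BSON module is available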

Create Test Data

To run the tests, you will first need to generate the data.rb file that the tests use:

./make_data 1000

This will generate a dataset with 1000 "rows" in data.rb.
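The generated data.rb just defines a global $data Hash, one entry per row; a row looks roughly like this (values are random, so yours will differ):

$data = {
	"person_0" => {"first_name" => "John", "middle_name" => "Drew", "last_name" => "Smith", "email" => "johndrewsmith@email.com", "active" => 1, "created_at" => "2003-07-14 09:12:45", "updated_at" => "2008-02-01 17:03:10"},
	# ... one entry per row ...
}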

Run the Tests

Then, to run all the tests in the tests folder, run:

./run_tests

The time of each test, along with the size of the dataset used, is displayed and logged to results.txt.
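Each run appends a block like this to results.txt (reconstructed from the logging code in lib/showdown.rb; the test names come from whatever files are in the tests folder, so the ones here are placeholders):

--------------------------------------------

Ruby Version: 1.8.7
Data Rows: 1000
Test results:
  bson_make: 0.012345 seconds
  ...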

Small Mistake

When I first pushed this gist, I had a mistake in my tests and findings: I had managed to flip the two BSON tests around, so what I thought was the read test was actually the make/write test, and vice versa. I've corrected it and updated everything accordingly.

# --- The BSON tests (three of the tests/*.rb files) ---

# Time generating BSON from the Ruby Hash in $data.
Showdown.run do
  BSON.serialize($data)
end

# Time parsing BSON back into a Ruby Hash.
bson = BSON.serialize($data)
Showdown.run do
  BSON.deserialize(bson)
end

# Log the byte size of the BSON output instead of a time.
Showdown.run do
  { :result => BSON.serialize($data).size.to_s + " bytes" }
end
require "rubygems"
require "mongo"
require "json"
require "lib/showdown"
Showdown.init
# --- The JSON tests (the remaining tests/*.rb files) ---

# Time generating JSON from $data.
Showdown.run do
  JSON.generate($data)
end

# Time parsing JSON back into a Ruby Hash.
json = JSON.generate($data)
Showdown.run do
  JSON.parse(json)
end

# Log the byte size of the JSON output instead of a time.
Showdown.run do
  { :result => JSON.generate($data).size.to_s + " bytes" }
end
#! /usr/bin/env ruby
# make_data: generates data.rb with the given number of rows (default 1000).
require "init"

Showdown.make_data(ARGV[0] ? ARGV[0].to_i : 1000)
#! /usr/bin/env ruby
# run_tests: runs every test in tests/ and prints (and logs) the results.
require "init"

puts Showdown.run_all_tests
# --- lib/showdown.rb: the tiny benchmark harness ---
class Showdown
  class << self
    attr_accessor :root
    attr_accessor :results
    attr_accessor :current_test
  end

  def self.init
    @root = File.dirname(File.dirname(__FILE__))
  end

  # Load the generated dataset (data.rb defines $data) once.
  def self.load_data
    if $data.nil? && File.exist?(@root + "/data.rb")
      require(@root + "/data.rb")
    end
  end

  # Run every tests/*.rb file, collect the results, and append a
  # summary block to results.txt.
  def self.run_all_tests
    tests = Dir.glob(@root + "/tests/*.rb").map { |file| File.basename(file).gsub(/\.rb$/, "") }
    tests.each do |test|
      Showdown.run_test(test)
    end
    result = "--------------------------------------------\n\n"
    result << "Ruby Version: #{RUBY_VERSION}\n"
    result << "Data Rows: #{$data.size}\n"
    result << "Test results:\n"
    Showdown.results.sort { |a, b| a[0].to_s <=> b[0].to_s }.each do |name, value|
      result << "  #{name}: #{value}\n"
    end
    result << "\n"
    File.open(@root + "/results.txt", "a") do |f|
      f.write(result)
    end
    result
  end

  def self.run_test(name)
    load_data
    @results ||= {}
    if File.exist?(@root + "/tests/#{name}.rb")
      @current_test = name
      require @root + "/tests/#{name}.rb"
      @current_test = nil
    end
  end

  # Time the given block. If the block returns a Hash with a :result
  # key, log that value instead of the elapsed time.
  def self.run
    started = Time.now
    result = yield
    taken = Time.now - started
    if result.is_a?(Hash) && result.has_key?(:result)
      @results[@current_test.to_sym] = result[:result]
    else
      @results[@current_test.to_sym] = taken.to_s + " seconds"
    end
  end

  # Write a data.rb file defining $data with `limit` random "rows".
  def self.make_data(limit = 100)
    firstnames  = ["John", "Jim", "James", "Sarah", "Anna", "Maria", "Sophia", "Martin", "Nick", "Bart"]
    middlenames = ["", "Richard", "Hanna", "Drew", "Jonas", "Marie", "Linc", "Matthew", "David", "Mark"]
    lastnames   = ["Smith", "Johnsson", "Andrews", "McCloud", "Windsgate",
                   "Integra", "Hellfire", "Mickelsson", "Rickson", "Dickson"]
    File.open(@root + "/data.rb", "w") do |f|
      f.write("$data = {\n")
      limit.times do |i|
        # rand(count), not rand(count - 1), so the last element can be picked too.
        first  = firstnames[rand(firstnames.count)]
        middle = middlenames[rand(middlenames.count)]
        last   = lastnames[rand(lastnames.count)]
        email  = "#{first}#{middle}#{last}@email.com".downcase
        created_at = Time.at(rand(Time.now.to_i)).strftime("%Y-%m-%d %H:%M:%S")
        updated_at = Time.at(rand(Time.now.to_i)).strftime("%Y-%m-%d %H:%M:%S")
        row  = %Q{\t"person_#{i}" => \{"first_name" => "#{first}", "middle_name" => "#{middle}", "last_name" => "#{last}", "email" => "#{email}", }
        row << %Q{"active" => #{rand(2)}, "created_at" => "#{created_at}", "updated_at" => "#{updated_at}"\},\n}
        f.write(row)
      end
      f.write("}")
    end
  end
end
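Since run_all_tests simply globs the tests folder, adding your own benchmark is just a matter of dropping another file in there (a hypothetical example; the filename is up to you). Return a Hash with a :result key to log an arbitrary value instead of the elapsed time, like the size tests do:

# tests/marshal_make.rb (hypothetical)
Showdown.run do
  Marshal.dump($data) # any code in the block is timed; $data is loaded for you
end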
@chuckremes

These results don't surprise me very much. Several users (including myself) have contributed back patches to make the C extension perform better. You should have seen the BSON performance before those patches.... argh.

I don't know if this is a BSON issue so much as it's a Ruby issue though. It would be instructive to profile (with oprofile, Instruments.app, etc) the C code and try to identify and optimize the bottlenecks. Also, try your test on JRuby which has a completely different performance profile.

@gjmurakami-10gen

Alas, as the idiom goes, "The devil is in the details." To pull in the BSON C extension, make sure that you require "bson_ext". There's a huge difference between the pure Ruby implementation and the C extension. Even with the C extension, the 10gen supported Ruby driver is not what it could be. Analysis of the C extension shows some inefficiencies. For serialization to BSON, there are some extraneous malloc's (rb_str_new2 for symbol, rb_ary_* for extra key passing) that should ideally be eliminated but that are painfully difficult to extract from the code as currently written. BSON deserialization is dominated by Ruby object creation, which includes malloc's; we could certainly benefit from optimization. It's worth taking a look at Moped to see how Ruby meta-programming can be used to simplify serialization/deserialization (have classes/objects operate on themselves). In all cases, whether JSON or BSON, extra object creation overhead and malloc's are expensive and will probably dominate over other costs.
