Last active
February 29, 2016 23:48
Oj::ScHandler and Oj.sc_parse have seen little usage, yet they are considered by Oj's author to be the fastest way to parse. I wrote a few parsers to understand how Oj::ScHandler works and to compare its performance to Oj.load. Conclusion: If you want to parse an entire document (usually the case), then the simple Oj.load is still fastest.
require 'json'
require 'oj'
require 'multi_json'

# @see https://github.com/ohler55/oj/blob/master/lib/oj/schandler.rb
# @see https://github.com/ohler55/oj/blob/master/test/test_scp.rb#L37
# @see https://github.com/platzhirsch/metadata-harvester/blob/master/lib/dump_handler.rb

# Prints all calls, identifying hashes and arrays with integers.
class DebugHandler
  def initialize
    @id = 0
  end

  def hash_start(*args)
    debug :hash_start, *args
    @id += 1
  end

  def hash_end(*args)
    debug :hash_end, *args
  end

  def array_start(*args)
    debug :array_start, *args
    @id += 1
  end

  def array_end(*args)
    debug :array_end, *args
  end

  def hash_set(*args)
    debug :hash_set, *args
  end

  def array_append(*args)
    debug :array_append, *args
  end

  def add_value(*args)
    debug :add_value, *args
  end

  def error(*args)
    debug :error, *args
  end

  private

  def debug(*args)
    p args
  end
end

# Constructs Ruby objects and stores all objects added with `add_value` in an array.
class ConservativeHandler
  attr_reader :values

  def initialize
    @values = []
  end

  # @return [Hash] the first argument to `hash_set`
  def hash_start
    {}
  end

  def hash_end
  end

  # @return [Array] the first argument to `array_append`
  def array_start
    []
  end

  def array_end
  end

  def hash_set(h, key, value)
    h[key] = value
  end

  def array_append(a, value)
    a << value
  end

  # `add_value` seems to be called only once per document, with the top-level value.
  def add_value(value)
    @values << value
  end

  # Raise a StandardError subclass so that a plain `rescue` can catch it.
  def error(message, line, column)
    raise Oj::ParseError, "#{message} line #{line} column #{column}"
  end
end

# Constructs Ruby objects and stores one object added with `add_value`.
class ParseHandler
  attr_reader :value

  # @return [Hash] the first argument to `hash_set`
  def hash_start
    {}
  end

  def hash_end
  end

  # @return [Array] the first argument to `array_append`
  def array_start
    []
  end

  def array_end
  end

  def hash_set(h, key, value)
    h[key] = value
  end

  def array_append(a, value)
    a << value
  end

  # `add_value` seems to be called only once per document, with the top-level value.
  def add_value(value)
    @value = value
  end

  # Raise a StandardError subclass so that a plain `rescue` can catch it.
  def error(message, line, column)
    raise Oj::ParseError, "#{message} line #{line} column #{column}"
  end
end

f = File.read('/path/to/name.json')

# Inspect the output of the handlers.
Oj.sc_parse(DebugHandler.new, f)

cnt = ConservativeHandler.new
Oj.sc_parse(cnt, f)
puts JSON.pretty_generate(cnt.values)

cnt = ParseHandler.new
Oj.sc_parse(cnt, f)
puts JSON.pretty_generate(cnt.value)

# Compare the running time of different parsers (prints elapsed seconds).
t = Time.now; 100000.times { Oj.sc_parse(ParseHandler.new, f) }; puts Time.now - t
t = Time.now; 100000.times { Oj.load(f) }; puts Time.now - t
t = Time.now; 100000.times { MultiJson.load(f) }; puts Time.now - t
t = Time.now; 100000.times { JSON.load(f) }; puts Time.now - t

# `Oj.load` is faster than `Oj.sc_parse` if you are parsing a full document.
# `Oj.sc_parse` can be faster if you only want to parse part of a document.
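
The closing comment says `Oj.sc_parse` can win when only part of a document is needed. A minimal sketch of what such a handler might look like is below. `KeyCollector` and its `found` reader are hypothetical names, not part of Oj. The sketch assumes Oj only invokes the callbacks a handler defines, so a handler with nothing but `hash_set` should avoid building the surrounding hashes and arrays. It has not been benchmarked.

```ruby
# Hypothetical handler (not part of Oj): collects the values stored under one
# key, at any depth, without building the surrounding hashes and arrays.
class KeyCollector
  attr_reader :found

  def initialize(key)
    @key = key
    @found = []
  end

  # The first argument is whatever `hash_start` returned; it is ignored here
  # because this class defines no `hash_start`.
  def hash_set(_hash, key, value)
    @found << value if key == @key
  end
end
```

With the gem loaded, usage would presumably be `collector = KeyCollector.new('name')`, then `Oj.sc_parse(collector, f)`, then `p collector.found`. Values that are themselves hashes or arrays are not reconstructed by this sketch, so it only suits keys that hold scalars.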