@billdueber
Last active December 1, 2022 20:44
Slowness of parsing a string read from a gzipped file
truffleruby 22.3.0, like ruby 3.0.3, GraalVM CE Native [x86_64-darwin]
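Base unit: an array of 20 integers.
All three variants parse the same JSON text. "plain" was read from an uncompressed file with File.read, "gzdata" was read from a gzipped copy with Zlib::GzipReader, and "flattened" is gzdata passed through Truffle::Debug.flatten_string.
JSON-decode a string encoding an array of 10 of those base units.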
Calculating -------------------------------------
JSON 10 plain 6.817k (±14.0%) i/s - 133.364k in 19.995768s
JSON 10 gzdata 1.204k (±28.0%) i/s - 15.611k in 20.047997s
JSON 10 flattened 6.954k (± 9.6%) i/s - 138.006k in 20.090469s
Comparison:
JSON 10 flattened: 6954.1 i/s
JSON 10 plain: 6817.4 i/s - same-ish: difference falls within error
JSON 10 gzdata: 1203.7 i/s - 5.78x (± 0.00) slower
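The JSON text being decoded can be reproduced with a few lines of Ruby. This is a minimal sketch that assumes consecutive integers, as the benchmark script in this gist uses; `build_payload` is a hypothetical helper name, not something from the gist.

```ruby
require 'json'

# Hypothetical helper (not from the gist): builds `units` base units,
# each an array of 20 consecutive integers. The exclusive range
# i...(i + 20) yields exactly 20 values; i..(i + 20) would yield 21.
def build_payload(units)
  (1..units).map { |i| (i...(i + 20)).to_a }
end

payload = build_payload(10)
json = payload.to_json

# The benchmarked operation: decode the JSON string back into arrays.
decoded = JSON.parse(json)
puts decoded.length        # → 10 (base units)
puts decoded.first.length  # → 20 (integers per base unit)
```

Calling `build_payload(500)` gives the larger payload used in the second run.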
JSON-decode a string encoding an array of 500 of those base units.
Calculating -------------------------------------
JSON 500 plain 111.994 (± 8.9%) i/s - 2.224k in 20.040061s
JSON 500 gzdata 0.643 (± 0.0%) i/s - 13.000 in 20.250865s
JSON 500 flattened 105.381 (±10.4%) i/s - 2.079k in 20.000481s
Comparison:
JSON 500 plain: 112.0 i/s
JSON 500 flattened: 105.4 i/s - same-ish: difference falls within error
JSON 500 gzdata: 0.6 i/s - 174.20x (± 0.00) slower
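The "x slower" figures can be checked by hand. Assuming benchmark-ips reports each entry's slowdown as the fastest entry's rate divided by that entry's rate, the numbers above reproduce as follows. The rates are copied from the output; `slowdowns` is a hypothetical helper.

```ruby
# Iterations per second copied from the benchmark output above
# (10-unit rates from the Comparison block, 500-unit rates from the
# more precise Calculating block).
RATES = {
  10  => { "flattened" => 6954.1, "plain" => 6817.4, "gzdata" => 1203.7 },
  500 => { "plain" => 111.994, "flattened" => 105.381, "gzdata" => 0.643 }
}

# Slowdown of each entry relative to the fastest one (1.0 = fastest),
# rounded to two decimals like the "N.NNx slower" column.
def slowdowns(rates)
  fastest = rates.values.max
  rates.transform_values { |ips| (fastest / ips).round(2) }
end

RATES.each do |units, rates|
  puts "#{units} units: #{slowdowns(rates)}"
end
```

The 10-unit gzdata ratio comes out at 5.78x, matching the report. For the 500-unit run the printed rates give about 174.17x rather than 174.20x, presumably because benchmark-ips works from unrounded measurements.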
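require 'json'
require 'zlib'
require 'tmpdir'
require 'benchmark'
require 'oj'

# Optional arguments: LINES BMTIMES
LINES   = (ARGV.shift || 300).to_i  # number of 20-element arrays in the payload
BMTIMES = (ARGV.shift || 3).to_i    # how many times to repeat each benchmark round
PLAIN   = File.join(Dir.tmpdir, "test.json")
GZIP    = File.join(Dir.tmpdir, "test.json.gz")

# Each element is an array of 20 consecutive integers (exclusive range).
arr = (1..LINES).map { |i| (i...(i + 20)).to_a }

# Write the same JSON text to a plain file and to a gzipped file.
File.open(PLAIN, "w:utf-8") { |f| f.puts arr.to_json }
Zlib::GzipWriter.open(GZIP) { |f| f.puts arr.to_json }

# Read both back; the block form closes the gzip reader when done.
plain_data = File.read(PLAIN)
gzdata     = Zlib::GzipReader.open(GZIP) { |gz| gz.read }

# TruffleRuby only: flatten the string that was read from the gzip stream.
flattened   = Truffle::Debug.flatten_string(gzdata)
# A new string derived from gzdata (gzdata plus a trailing space).
forced_copy = gzdata + " "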
puts "\n" + RUBY_DESCRIPTION
puts "Benchmarking with an array of #{LINES} 20-element arrays (repeat #{BMTIMES} times)"

puts "\nBEGIN STDLIB JSON"
Benchmark.bm do |x|
  BMTIMES.times do
    puts "\n"
    x.report("%-25s" % "Plain") do
      3.times { JSON.parse(plain_data) }
    end
    x.report("%-25s" % "Previously gzipped") do
      3.times { JSON.parse(gzdata) }
    end
    x.report("%-25s" % "Flattened") do
      3.times { JSON.parse(flattened) }
    end
    x.report("%-25s" % "Gzipped/forced copy") do
      3.times { JSON.parse(forced_copy) }
    end
  end
end
puts "\n\nBEGIN Oj"
Oj.default_options = { mode: :compat }
BMTIMES.times do
  Benchmark.bm do |x|
    puts "\n"
    x.report("%-25s" % "Plain") do
      3.times { Oj.load(plain_data) }
    end
    x.report("%-25s" % "Previously gzipped") do
      3.times { Oj.load(gzdata) }
    end
    x.report("%-25s" % "Gzipped/forced copy") do
      3.times { Oj.load(forced_copy) }
    end
  end
end
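Because of `Truffle::Debug`, the benchmark script stops with a NameError on CRuby. A minimal sketch of the read-and-flatten steps that also runs on CRuby, assuming it is acceptable to skip flattening wherever `Truffle::Debug.flatten_string` is unavailable. `read_gzip`, `flatten_if_truffle`, and the temporary file name are hypothetical, not part of the gist or of any library.

```ruby
require 'json'
require 'zlib'
require 'tmpdir'

# Hypothetical helper: read a whole gzipped file; the block form of
# Zlib::GzipReader.open closes the reader and returns the block's value.
def read_gzip(path)
  Zlib::GzipReader.open(path) { |gz| gz.read }
end

# Hypothetical helper: on TruffleRuby, pass the string through
# Truffle::Debug.flatten_string as the benchmark script does;
# on other Rubies, return the string unchanged.
def flatten_if_truffle(str)
  if defined?(Truffle::Debug) && Truffle::Debug.respond_to?(:flatten_string)
    Truffle::Debug.flatten_string(str)
  else
    str
  end
end

data = (1..3).map { |i| (i...(i + 20)).to_a }
path = File.join(Dir.tmpdir, "flatten_sketch.json.gz")  # hypothetical temp file
Zlib::GzipWriter.open(path) { |gz| gz.puts data.to_json }

gzdata = read_gzip(path)
parsed = JSON.parse(flatten_if_truffle(gzdata))
File.delete(path)

puts parsed == data  # → true
```

On TruffleRuby this takes the same flattening path the benchmark measures; elsewhere it just parses the string as read.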