@billdueber
Created December 6, 2022 03:14
TruffleRuby JSON parsing slow on previously-gzipped data

Benchmark JSON parsing: gzipped vs. non-gzipped data

This is a very simple, self-contained (well, except for benchmark-ips) benchmark. Conceptually, it does the following:

  • Create an array of arrays, each element being a 20-element array of consecutive integers
  • Write it out to a file as JSON
  • Write it out to a gzipped file as JSON
  • Read them both back in as strings
  • Compare how long it takes to JSON.parse the never-gzipped (plain) string vs. the previously-gzipped string
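The steps above can be sketched in memory as a quick sanity check that the two strings really hold the same JSON text. This is a rough sketch, not the gist's script: it swaps the temp files for Zlib.gzip plus a StringIO, times with Process.clock_gettime instead of benchmark-ips, and `time_parse` is a hypothetical helper. Whether this in-memory path reproduces the TruffleRuby slowdown is untested here.

```ruby
require 'json'
require 'zlib'
require 'stringio'

# Same shape of data as the benchmark: rows of 20 consecutive integers.
rows = 300
arr = (1..rows).map { |i| (i...(i + 20)).to_a }
json_text = arr.to_json

# Gzip in memory, then read it back with GzipReader#read as the benchmark
# does with its file (an in-memory stand-in, not the gist's file-based path).
reader = Zlib::GzipReader.new(StringIO.new(Zlib.gzip(json_text)))
gz_text = reader.read
reader.close

puts "same content: #{json_text == gz_text}"
puts "encodings:    #{json_text.encoding} / #{gz_text.encoding}"

# Hypothetical helper: wall-clock seconds for +iterations+ JSON.parse calls.
def time_parse(str, iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { JSON.parse(str) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

{ "plain" => json_text, "gzdata" => gz_text }.each do |name, str|
  printf("%-7s %.4fs for 20 parses\n", name, time_parse(str, 20))
end
```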

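Run slow_gzip_bench.rb -h for usage. The arguments are <rows> <run time> <warmup time>, with defaults of 300, 5, and 5 (times are in seconds).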

Running with 1000 rows, a 10-second warmup, and a 10-second test:

✦ ❯ ruby slow_gzip_bench.rb 1000 10 10

truffleruby 22.3.0, like ruby 3.0.3, GraalVM CE Native [x86_64-darwin]
Base unit is an array of 20 integers
Rows: 1000; Warmup seconds: 10; run test for 10 seconds

Warming up --------------------------------------
               plain     7.000  i/100ms
              gzdata     1.000  i/100ms
Calculating -------------------------------------
               plain     92.525  (±10.8%) i/s -    924.000  in  10.134733s
              gzdata      0.176  (± 0.0%) i/s -      2.000  in  11.384485s

Comparison:
               plain:       92.5 i/s
              gzdata:        0.2 i/s - 526.66x  (± 0.00) slower
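Both strings hold the same bytes, so one way to narrow down the ~526x gap is to parse a freshly rebuilt copy of the gzip-derived string. The idea (an assumption, not something these results establish) is that if the copy parses at plain speed, the slowdown is tied to how the GzipReader-produced String object is represented internally rather than to its content. `fresh_copy` is a hypothetical helper that rebuilds the string from its bytes via unpack/pack; whether it changes anything on TruffleRuby has not been measured here.

```ruby
require 'json'
require 'zlib'
require 'stringio'

# Hypothetical helper: a new String with the same bytes and encoding as +str+,
# rebuilt from an Array of byte values instead of derived from +str+ itself.
def fresh_copy(str)
  str.unpack("C*").pack("C*").force_encoding(str.encoding)
end

# Same shape of data as the benchmark, gzipped and read back via GzipReader#read.
arr = (1..300).map { |i| (i...(i + 20)).to_a }
reader = Zlib::GzipReader.new(StringIO.new(Zlib.gzip(arr.to_json)))
gz_text = reader.read
reader.close

candidates = {
  "gzdata" => gz_text,
  "fresh"  => fresh_copy(gz_text),
}

# Time a few parses of each; compare the two lines of output.
candidates.each do |name, str|
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  10.times { JSON.parse(str) }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  printf("%-6s %-10s %.4fs for 10 parses\n", name, str.encoding, elapsed)
end
```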
# slow_gzip_bench.rb
require 'json'
require 'zlib'
require 'tmpdir'
require 'benchmark/ips' # from the benchmark-ips gem

# Not used below; handy for branching on TruffleRuby vs. CRuby.
IS_TRUFFLE = RUBY_DESCRIPTION =~ /truffle/i

if ARGV.size == 1 && ["-h", "--help"].include?(ARGV.first)
  puts "#{$0} <rows> <run time> <warmup time>"
  puts "Defaults: 300 5 5"
  exit 1
end

rows    = (ARGV.shift || 300).to_i
runtime = (ARGV.shift || 5).to_i
warmup  = (ARGV.shift || 5).to_i

# Write the same data as plain JSON and as gzipped JSON, then read both files
# back into strings. Returns {"plain" => String, "gzdata" => String}.
def generate_files_and_read_back_in(rows)
  # Each row holds 20 consecutive integers (i...(i + 20) excludes i + 20).
  arr = (1..rows).map { |i| (i...(i + 20)).to_a }
  plain_filename  = File.join(Dir.tmpdir, "test.json")
  gzdata_filename = File.join(Dir.tmpdir, "test.json.gz")

  File.open(plain_filename, "w:utf-8") { |f| f.puts arr.to_json }
  Zlib::GzipWriter.open(gzdata_filename) { |f| f.puts arr.to_json }

  {
    "plain"  => File.read(plain_filename),
    # Block form closes the reader once its contents have been read.
    "gzdata" => Zlib::GzipReader.open(gzdata_filename) { |gz| gz.read },
  }
end

FILEDATA = generate_files_and_read_back_in(rows)

puts "\n" + RUBY_DESCRIPTION
puts "Base unit is an array of 20 integers"
puts "Rows: #{rows}; Warmup seconds: #{warmup}; run test for #{runtime} seconds"
puts "\n"

Benchmark.ips do |x|
  x.config(time: runtime, warmup: warmup)
  # One report per string; each iteration parses the whole JSON document.
  FILEDATA.each do |name, data|
    x.report(name) { JSON.parse(data) }
  end
  x.compare!
end
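To check whether the reading path matters, a variant of generate_files_and_read_back_in could add a third string decompressed in a single call with Zlib.gunzip. This is a sketch under assumptions: `read_back_three_ways` is a hypothetical name, it uses Dir.mktmpdir so the files are removed afterwards, and it tags the gunzip result as UTF-8 explicitly rather than relying on the default external encoding.

```ruby
require 'json'
require 'zlib'
require 'tmpdir'

# Hypothetical variant: returns {"plain", "gzdata", "gunzip"} => String, where
# "gunzip" decompresses the whole .gz file in one Zlib.gunzip call.
def read_back_three_ways(rows)
  arr = (1..rows).map { |i| (i...(i + 20)).to_a }
  Dir.mktmpdir do |dir|
    plain_path = File.join(dir, "test.json")
    gz_path    = File.join(dir, "test.json.gz")
    File.open(plain_path, "w:utf-8") { |f| f.puts arr.to_json }
    Zlib::GzipWriter.open(gz_path) { |gz| gz.puts arr.to_json }
    {
      "plain"  => File.read(plain_path),
      "gzdata" => Zlib::GzipReader.open(gz_path) { |gz| gz.read },
      # Tag as UTF-8 explicitly so all three strings share an encoding.
      "gunzip" => Zlib.gunzip(File.binread(gz_path)).force_encoding(Encoding::UTF_8),
    }
  end
end
```

Assigning its result to FILEDATA in place of generate_files_and_read_back_in(rows) should give benchmark-ips a third report, since the benchmark loop iterates over whatever keys the hash has.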