Created February 18, 2016 06:05
Compare VPack to Sereal - *Size* only

At $work, we are looking to replace JSON encoding with another format, to improve encode/decode speed and reduce storage size.

Requirements:

  • fast decode: we can trade slower encoding for a smaller size, but decoding must be fast;
  • language support: our stack is Perl, Go, and JavaScript; PHP is a plus, but not required;
  • no schema requirement: the data is JSON-compatible and in some cases deeply nested, but we don't have a schema to start from.

We are testing msgpack, CBOR, Sereal, and others, but here I wanted to compare only Sereal (the current front-runner) with the new VPack format from the ArangoDB project.

We used the sample files from the VPack project's `tests/jsonSample/` directory.

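Please note: we are only comparing size at the moment. (Spoiler: frankly, that was enough, and there was no need to compare further. Sereal won: on the totals, uncompressed Sereal with `dedupe_strings` comes in at about 81% of the best VPack size, and zlib-compressed Sereal at about 29%.)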

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use JSON::XS qw( decode_json );
use Path::Tiny qw( path );
use Sereal::Encoder qw( SRL_SNAPPY SRL_ZLIB SRL_UNCOMPRESSED );
use Text::Table;

die "Usage: json2sereal.pl <dir>\n\n"
  . " Scans <dir> for .json files, converts them to Sereal and compares sizes\n"
  unless @ARGV;

# Candidate encoders. dedupe_strings => 1 encodes repeated strings as
# back-references; these three variants differ only in compression.
my $enc_snappy = Sereal::Encoder->new({ compress => SRL_SNAPPY,       dedupe_strings => 1 });
my $enc_zlib   = Sereal::Encoder->new({ compress => SRL_ZLIB,         dedupe_strings => 1 });
my $enc_none   = Sereal::Encoder->new({ compress => SRL_UNCOMPRESSED, dedupe_strings => 1 });
# Baseline ("Defaults" column): no compression, no string deduplication.
my $enc_def    = Sereal::Encoder->new({ compress => SRL_UNCOMPRESSED });

# Best VPack size in bytes for each sample file (hard-coded, not computed here).
my %best_vpack = (
    'api-docs.json'       => 994160,
    'commits.json'        => 20789,
    'countries.json'      => 956786,
    'directory-tree.json' => 244716,
    'doubles.json'        => 899982,
    'doubles-small.json'  => 89998,
    'file-list.json'      => 133536,
    'object.json'         => 118630,
    'pass1.json'          => 804,
    'pass2.json'          => 51,
    'pass3.json'          => 108,
    'random1.json'        => 6836,
    'random2.json'        => 5815,
    'random3.json'        => 51515,
    'sample.json'         => 153187,
    'small.json'          => 30,
);

my (@rows, %totals);

# Visit the .json files in name order so the output is reproducible.
for my $f (sort { $a->basename cmp $b->basename } path($ARGV[0])->children(qr/[.]json\z/)) {
    next unless $f->is_file;
    my $name = $f->basename;

    my $data = eval { decode_json($f->slurp_raw) };
    unless (defined $data) {
        debug("Skip file '$name', could not JSON-parse it: $@");
        next;
    }

    my $vpack = $best_vpack{$name};
    unless ($vpack) {
        debug("Skip file '$name', no VPack comparison");
        next;
    }

    # Size of the JSON input and of each Sereal encoding of the same data, in bytes.
    my $json_size = $f->stat->size;
    my ($def, $none, $snap, $zlib) =
      map { length($_->encode($data)) } $enc_def, $enc_none, $enc_snappy, $enc_zlib;

    $totals{json}  += $json_size;
    $totals{vpack} += $vpack;
    $totals{def}   += $def;
    $totals{none}  += $none;
    $totals{snap}  += $snap;
    $totals{zlib}  += $zlib;
    push @rows, table_row($name, $json_size, $vpack, $def, $none, $snap, $zlib);
}

die "No .json files with a VPack reference size found in '$ARGV[0]'\n" unless @rows;
push @rows, table_row('-- Total --', @totals{qw( json vpack def none snap zlib )});

my $tb = Text::Table->new(
    'File', 'JSON Size', 'VPack best',
    'Defaults', '% JSON', '% VPack',
    'No Compr', '% JSON', '% VPack',
    'Snappy',   '% JSON', '% VPack',
    'ZLib',     '% JSON', '% VPack',
);
$tb->load(@rows);
print $tb;

# Print diagnostics to STDERR only when the DEBUG environment variable is set.
sub debug {
    return unless $ENV{DEBUG};
    print STDERR "[DEBUG] @_\n";
}

# Build one table row: each Sereal size is followed by its size as a
# percentage of the JSON input and of the best VPack encoding.
sub table_row {
    my ($name, $json_size, $vpack, @sereal) = @_;
    my $pct = sub { sprintf '%.2f%%', $_[0] / $_[1] * 100 };
    return [
        $name, $json_size, $vpack,
        map { ($_, $pct->($_, $json_size), $pct->($_, $vpack)) } @sereal
    ];
}
```
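All sizes are in bytes. "Defaults" is Sereal with no compression and no string deduplication; "No Compr", "Snappy" and "ZLib" all use `dedupe_strings => 1`, with no compression, Snappy, and zlib compression respectively. Each "% JSON" / "% VPack" pair gives the preceding Sereal size as a percentage of the JSON size and of the best VPack size (below 100% means Sereal is smaller).

| File | JSON Size | VPack best | Defaults | % JSON | % VPack | No Compr | % JSON | % VPack | Snappy | % JSON | % VPack | ZLib | % JSON | % VPack |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| api-docs.json | 1205964 | 994160 | 963633 | 79.91% | 96.93% | 908778 | 75.36% | 91.41% | 210387 | 17.45% | 21.16% | 113945 | 9.45% | 11.46% |
| commits.json | 25216 | 20789 | 9955 | 39.48% | 47.89% | 9707 | 38.50% | 46.69% | 6593 | 26.15% | 31.71% | 4786 | 18.98% | 23.02% |
| countries.json | 1134029 | 956786 | 586108 | 51.68% | 61.26% | 528054 | 46.56% | 55.19% | 321509 | 28.35% | 33.60% | 218687 | 19.28% | 22.86% |
| directory-tree.json | 297695 | 244716 | 178976 | 60.12% | 73.14% | 168129 | 56.48% | 68.70% | 90779 | 30.49% | 37.10% | 64656 | 21.72% | 26.42% |
| doubles-small.json | 158706 | 89998 | 89990 | 56.70% | 99.99% | 89990 | 56.70% | 99.99% | 80815 | 50.92% | 89.80% | 52183 | 32.88% | 57.98% |
| doubles.json | 1187062 | 899982 | 899876 | 75.81% | 99.99% | 899876 | 75.81% | 99.99% | 804100 | 67.74% | 89.35% | 423361 | 35.66% | 47.04% |
| file-list.json | 151317 | 133536 | 123061 | 81.33% | 92.16% | 112113 | 74.09% | 83.96% | 59574 | 39.37% | 44.61% | 40492 | 26.76% | 30.32% |
| object.json | 157781 | 118630 | 118756 | 75.27% | 100.11% | 118756 | 75.27% | 100.11% | 87250 | 55.30% | 73.55% | 54952 | 34.83% | 46.32% |
| pass1.json | 1441 | 804 | 806 | 55.93% | 100.25% | 806 | 55.93% | 100.25% | 806 | 55.93% | 100.25% | 806 | 55.93% | 100.25% |
| pass2.json | 52 | 51 | 38 | 73.08% | 74.51% | 38 | 73.08% | 74.51% | 38 | 73.08% | 74.51% | 38 | 73.08% | 74.51% |
| pass3.json | 148 | 108 | 110 | 74.32% | 101.85% | 110 | 74.32% | 101.85% | 110 | 74.32% | 101.85% | 110 | 74.32% | 101.85% |
| random1.json | 9672 | 6836 | 6094 | 63.01% | 89.15% | 5863 | 60.62% | 85.77% | 4099 | 42.38% | 59.96% | 3088 | 31.93% | 45.17% |
| random2.json | 8239 | 5815 | 5192 | 63.02% | 89.29% | 4986 | 60.52% | 85.74% | 3544 | 43.01% | 60.95% | 2682 | 32.55% | 46.12% |
| random3.json | 72953 | 51515 | 45064 | 61.77% | 87.48% | 42278 | 57.95% | 82.07% | 25295 | 34.67% | 49.10% | 18146 | 24.87% | 35.22% |
| sample.json | 687491 | 153187 | 98144 | 14.28% | 64.07% | 83096 | 12.09% | 54.24% | 82961 | 12.07% | 54.16% | 75847 | 11.03% | 49.51% |
| small.json | 82 | 30 | 54 | 65.85% | 180.00% | 54 | 65.85% | 180.00% | 54 | 65.85% | 180.00% | 54 | 65.85% | 180.00% |
| -- Total -- | 5097848 | 3676943 | 3125857 | 61.32% | 85.01% | 2972634 | 58.31% | 80.85% | 1777914 | 34.88% | 48.35% | 1073833 | 21.06% | 29.20% |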
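As a quick sanity check on the results, the sketch below recomputes a few "% JSON" / "% VPack" figures from the Total row with the same formula as `table_row()` (Sereal size / reference size * 100), confirms the per-file columns add up to the totals, and lists the files where uncompressed Sereal (with `dedupe_strings`) is still larger than the best VPack encoding. The numbers are copied by hand from the results table; `pct()`, `%total` and `@rows` here are illustrative names, not part of the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same formula as table_row(): a Sereal size as a percentage of a reference size.
sub pct {
    my ($part, $whole) = @_;
    return sprintf '%.2f%%', $part / $whole * 100;
}

# Byte counts copied from the "-- Total --" row of the results table.
my %total = (json => 5097848, vpack => 3676943, none => 2972634, zlib => 1073833);

my @checks = (
    [ 'No Compr / JSON',  pct($total{none}, $total{json})  ],
    [ 'No Compr / VPack', pct($total{none}, $total{vpack}) ],
    [ 'ZLib / JSON',      pct($total{zlib}, $total{json})  ],
    [ 'ZLib / VPack',     pct($total{zlib}, $total{vpack}) ],
);
printf "%-18s %s\n", @$_ for @checks;

# [file, "VPack best", "No Compr"] per file, copied from the results table.
my @rows = (
    [ 'api-docs.json',       994160, 908778 ],
    [ 'commits.json',         20789,   9707 ],
    [ 'countries.json',      956786, 528054 ],
    [ 'directory-tree.json', 244716, 168129 ],
    [ 'doubles-small.json',   89998,  89990 ],
    [ 'doubles.json',        899982, 899876 ],
    [ 'file-list.json',      133536, 112113 ],
    [ 'object.json',         118630, 118756 ],
    [ 'pass1.json',             804,    806 ],
    [ 'pass2.json',              51,     38 ],
    [ 'pass3.json',             108,    110 ],
    [ 'random1.json',          6836,   5863 ],
    [ 'random2.json',          5815,   4986 ],
    [ 'random3.json',         51515,  42278 ],
    [ 'sample.json',         153187,  83096 ],
    [ 'small.json',              30,     54 ],
);

# Cross-check: the per-file columns should add up to the Total row.
my ($sum_vpack, $sum_none) = (0, 0);
for my $r (@rows) {
    $sum_vpack += $r->[1];
    $sum_none  += $r->[2];
}
print "Per-file sums match totals: ",
  ($sum_vpack == $total{vpack} && $sum_none == $total{none} ? 'yes' : 'no'), "\n";

# Files where uncompressed Sereal (with dedupe_strings) is bigger than the best VPack.
my @vpack_smaller = map { $_->[0] } grep { $_->[2] > $_->[1] } @rows;
print "Uncompressed Sereal larger than best VPack: @vpack_smaller\n";
```

Run as-is, it reproduces 58.31%, 80.85%, 21.06% and 29.20%, reports that the sums match, and names object.json, pass1.json, pass3.json and small.json. For the tiny files (pass1.json, pass2.json, pass3.json, small.json) all four Sereal columns are identical; this is consistent with Sereal::Encoder's `compress_threshold` option, which according to its documentation skips compression when the uncompressed output is below 1024 bytes by default.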