Gist by @melo, forked from anonymous/intro.md. Last active February 18, 2016.
# Compare VPack to Sereal - *Size* only

At $work, we are looking to replace JSON encoding with another format, to increase encode/decode speed and reduce storage size.

Requirements, in order of importance for our use case (YMMV):

- **No schema requirement:** the data is JSON-compatible and deeply nested in places, and we don't have a schema to start from;
- **Smallest size:** we store the objects in memory in Redis databases, so size is the main factor;
- **Fast decode:** we can trade slower encoding for smaller size, but decoding should be fast;
- **Language support:** our stack is Perl, Go, and JavaScript. PHP is a plus, but not required.

We are testing msgpack, CBOR, Sereal, and others, but here I wanted to compare just Sereal (the current front-runner) with the new VPack from the ArangoDB project.

We used the sample files from the VPack project's tests/jsonSample/ directory, and for VPack I took the best results from the Performance.md file (last column, VPack-c). Please note: we are only comparing *size* at the moment; that was enough for our use case, where size matters most (YMMV).

Please don't turn this into a "mine is better" competition; this is based on our criteria, for our use case.

If you find a bug in our methodology, I would appreciate a note here or at @pedromelo.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::XS;
use Path::Tiny;
use Sereal::Encoder qw( SRL_SNAPPY SRL_ZLIB SRL_UNCOMPRESSED );
use Text::Table;

die "Usage: json2sereal.pl <dir>\n\n  Scans <dir> for .json files, converts them to Sereal and compares sizes\n"
  unless @ARGV;

## One encoder per compression setting; string dedup on for all but the defaults
my $enc_snappy = Sereal::Encoder->new({ compress => SRL_SNAPPY,       dedupe_strings => 1 });
my $enc_zlib   = Sereal::Encoder->new({ compress => SRL_ZLIB,         dedupe_strings => 1 });
my $enc_none   = Sereal::Encoder->new({ compress => SRL_UNCOMPRESSED, dedupe_strings => 1 });
my $enc_def    = Sereal::Encoder->new();

## Best VPack sizes, taken from the last column (VPack-c) of Performance.md
my %best_vpack = (
  'api-docs.json'       => 994160,
  'commits.json'        => 20789,
  'countries.json'      => 956786,
  'directory-tree.json' => 244716,
  'doubles.json'        => 899982,
  'doubles-small.json'  => 89998,
  'file-list.json'      => 133536,
  'object.json'         => 118630,
  'pass1.json'          => 804,
  'pass2.json'          => 51,
  'pass3.json'          => 108,
  'random1.json'        => 6836,
  'random2.json'        => 5815,
  'random3.json'        => 51515,
  'sample.json'         => 153187,
  'small.json'          => 30,
);

my $it = path(@ARGV)->iterator;
my (@rows, %totals);
while (my $f = $it->()) {
  my $b = $f->basename;
  next unless $f->is_file and $b =~ m/[.]json$/;

  my $c = eval { decode_json($f->slurp_raw) };
  debug("Skip file '$b', could not JSON-parse it: $@"), next unless defined $c;

  my $v = $best_vpack{$b};
  debug("Skip file '$b', no VPack comparison"), next unless $v;

  my $s = $f->stat->size;
  my ($def, $none, $snap, $zlib) = (
    length($enc_def->encode($c)),    length($enc_none->encode($c)),
    length($enc_snappy->encode($c)), length($enc_zlib->encode($c)),
  );

  $totals{json} += $s;
  $totals{vpack} += $v;
  $totals{def}  += $def;
  $totals{none} += $none;
  $totals{snap} += $snap;
  $totals{zlib} += $zlib;

  push @rows, table_row($b, $s, $v, $def, $none, $snap, $zlib);
}

push @rows,
  table_row('-- Total --', $totals{json}, $totals{vpack}, $totals{def}, $totals{none}, $totals{snap}, $totals{zlib});

my $tb = Text::Table->new(
  'File',     'JSON Size', 'VPack best', '% JSON',
  'Defaults', '% JSON',    '% VPack',
  'No Compr', '% JSON',    '% VPack',
  'Snappy',   '% JSON',    '% VPack',
  'ZLib',     '% JSON',    '% VPack',
);
$tb->load(@rows);
print $tb;

sub debug {
  return unless $ENV{DEBUG};
  print STDERR "[DEBUG] @_\n";
}

sub table_row {
  my ($b, $s, $v, $def, $none, $snap, $zlib) = @_;
  return [
    $b, $s,
    $v,    sprintf('%.2f%%', $v / $s * 100),
    $def,  sprintf('%.2f%%', $def / $s * 100),  sprintf('%.2f%%', $def / $v * 100),
    $none, sprintf('%.2f%%', $none / $s * 100), sprintf('%.2f%%', $none / $v * 100),
    $snap, sprintf('%.2f%%', $snap / $s * 100), sprintf('%.2f%%', $snap / $v * 100),
    $zlib, sprintf('%.2f%%', $zlib / $s * 100), sprintf('%.2f%%', $zlib / $v * 100),
  ];
}
```
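The script above only measures encoded sizes. As a quick sanity check, not part of the original benchmark, the same encoder options can be round-tripped through Sereal::Decoder; a minimal sketch (the sample data structure is made up for illustration):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Sereal::Encoder qw( SRL_ZLIB );
use Sereal::Decoder;

# Round-trip sanity check: encode with the same options the benchmark
# script uses (Zlib compression + string dedup), decode, and compare.
my $data = { name => 'sample', tags => [ 'a', 'b', 'a' ], nested => { n => 42 } };

my $enc  = Sereal::Encoder->new({ compress => SRL_ZLIB, dedupe_strings => 1 });
my $blob = $enc->encode($data);

my $back = Sereal::Decoder->new->decode($blob);
printf "blob is %d bytes, nested n is %d\n", length($blob), $back->{nested}{n};
```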
Legend:

- **File**: name of the file;
- **JSON Size**: size of the original JSON-encoded file;
- **VPack best**: size of the VPack encoding, best result from Performance.md in the GitHub repo;
- **Defaults**: Sereal encoder result, default settings;
- **No Compr**: Sereal encoder result, no compression + string dedup;
- **Snappy**: Sereal encoder result, Snappy compression + string dedup;
- **ZLib**: Sereal encoder result, Zlib compression (level 6, the Sereal default) + string dedup.

The **% JSON** columns are compared to JSON Size, and the **% VPack** columns to VPack best. Below 100% is better.
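For concreteness, the percentage columns can be recomputed by hand; a tiny sketch, using the commits.json sizes copied from this gist:

```perl
use strict;
use warnings;

# Recompute the "% JSON" and "% VPack" columns for the Sereal-defaults
# result on commits.json (sizes copied from the results table below).
my ($json_size, $vpack_best, $sereal_def) = (25216, 20789, 9732);

my $pct_json  = sprintf '%.2f%%', $sereal_def / $json_size  * 100;
my $pct_vpack = sprintf '%.2f%%', $sereal_def / $vpack_best * 100;

print "$pct_json vs JSON, $pct_vpack vs VPack\n";  # 38.59% vs JSON, 46.81% vs VPack
```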
```
File                JSON Size VPack best % JSON Defaults % JSON % VPack No Compr % JSON % VPack Snappy % JSON % VPack ZLib   % JSON % VPack
api-docs.json       1205964   994160     82.44% 962926   79.85% 96.86%  908679   75.35% 91.40%  210957 17.49% 21.22%  114777 9.52%  11.55%
commits.json        25216     20789      82.44% 9732     38.59% 46.81%  9484     37.61% 45.62%  6365   25.24% 30.62%  4691   18.60% 22.56%
countries.json      1134029   956786     84.37% 585916   51.67% 61.24%  527862   46.55% 55.17%  323064 28.49% 33.77%  220710 19.46% 23.07%
directory-tree.json 297695    244716     82.20% 179021   60.14% 73.15%  168528   56.61% 68.87%  92232  30.98% 37.69%  64377  21.63% 26.31%
doubles-small.json  158706    89998      56.71% 89990    56.70% 99.99%  89990    56.70% 99.99%  80815  50.92% 89.80%  52183  32.88% 57.98%
doubles.json        1187062   899982     75.82% 899876   75.81% 99.99%  899876   75.81% 99.99%  804100 67.74% 89.35%  423361 35.66% 47.04%
file-list.json      151317    133536     88.25% 122334   80.85% 91.61%  111793   73.88% 83.72%  60120  39.73% 45.02%  40459  26.74% 30.30%
object.json         157781    118630     75.19% 118756   75.27% 100.11% 118756   75.27% 100.11% 87212  55.27% 73.52%  54979  34.85% 46.34%
pass1.json          1441      804        55.79% 806      55.93% 100.25% 806      55.93% 100.25% 806    55.93% 100.25% 806    55.93% 100.25%
pass2.json          52        51         98.08% 38       73.08% 74.51%  38       73.08% 74.51%  38     73.08% 74.51%  38     73.08% 74.51%
pass3.json          148       108        72.97% 110      74.32% 101.85% 110      74.32% 101.85% 110    74.32% 101.85% 110    74.32% 101.85%
random1.json        9672      6836       70.68% 6094     63.01% 89.15%  5863     60.62% 85.77%  4033   41.70% 59.00%  3096   32.01% 45.29%
random2.json        8239      5815       70.58% 5192     63.02% 89.29%  4981     60.46% 85.66%  3445   41.81% 59.24%  2694   32.70% 46.33%
random3.json        72953     51515      70.61% 45064    61.77% 87.48%  42271    57.94% 82.06%  25288  34.66% 49.09%  18224  24.98% 35.38%
sample.json         687491    153187     22.28% 98172    14.28% 64.09%  83121    12.09% 54.26%  83008  12.07% 54.19%  75831  11.03% 49.50%
small.json          82        30         36.59% 54       65.85% 180.00% 54       65.85% 180.00% 54     65.85% 180.00% 54     65.85% 180.00%
-- Total --         5097848   3676943    72.13% 3124081  61.28% 84.96%  2972212  58.30% 80.83%  1781647 34.95% 48.45% 1076390 21.11% 29.27%
```
**@neunhoef** commented:
Not that I want to start a battle of "mine is smaller" or anything, but for the sake of completeness we have added two more columns to our performance table, where we have taken the compact VPack version and run "gzip -9" and snappy compression, respectively. This now allows a sensible comparison of compressed Sereal with compressed VelocyPack.

See https://github.com/arangodb/velocypack/blob/master/Performance.md for details.

The reason we have not built compression into the VPack format itself is that, for us, the main advantage of VPack is that one can quickly access subvalues without parsing or deserialization. That is of course no longer possible after compression. On the other hand, if the aim is only compact storage, one can easily put compression on top of VPack, outside of the format specification.
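For what it's worth, the "compression on top of the format" approach described above can be sketched in a few lines of Perl with the core IO::Compress::Gzip module (the blob here is a made-up stand-in for an encoded value, not real VPack):

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw( gzip $GzipError );

# Compress an already-encoded blob outside of the format itself.
# The payload is a stand-in; any VPack/Sereal/msgpack blob works the same way.
my $blob = join '', map { "key$_:value$_;" } 1 .. 500;

gzip \$blob => \my $packed, Level => 9
  or die "gzip failed: $GzipError";

printf "raw: %d bytes, gzip -9: %d bytes\n", length($blob), length($packed);
```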

**@melo** (author) commented Feb 18, 2016:
There are two columns above, Defaults and No Compr, that show Sereal without any compression. It still saves some space. But yes, this was never meant as a "mine is smaller" contest, just a comparison of two tools.

Doing MessagePack next...
