At $work, we are looking to replace JSON encoding with another format, to improve encode/decode speed and reduce storage size.
Requirements, in order of importance for our use case (YMMV):
- no schema requirement: the data is JSON-compatible and deeply nested in places, but we have no schema to start from;
- smallest size: we store the objects in memory in Redis databases, so size is the main factor;
- fast decode: we can trade slower encoding for smaller size, but decoding should be fast;
- language support: our stack is Perl, Go, and JavaScript. PHP is a plus, but not required.
We are testing MessagePack, CBOR, Sereal, and others, but here I wanted to compare just Sereal (the current front-runner) with the new VPack from the ArangoDB project.
We used the sample files from the VPack project's tests/jsonSample/ directory, and I took the best VPack results from the Performance.md file (last column, VPack-c). Please note: we are only comparing size at the moment; that was enough for our use case, where size matters most (YMMV).
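For anyone who wants to sanity-check the baseline column, here is a minimal sketch of how a JSON size can be measured: minify first (whitespace in the sample files would skew raw file sizes), then count bytes. The sample object is made up for illustration; the Sereal and VPack numbers came from each encoder's own tooling and from Performance.md, not from a script like this.

```javascript
// Hedged sketch: measure the byte size of a minified JSON encoding.
// The sample document below is a stand-in, not one of the real test files.
const sample = { commits: [{ sha: "abc123", author: { name: "ada" }, files: ["a.c", "b.c"] }] };

// JSON.stringify with no spacing arguments produces the minified form.
const minified = JSON.stringify(sample);
console.log("minified JSON bytes:", Buffer.byteLength(minified, "utf8"));
```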
Please don't turn this into a "mine is better" competition; the comparison is based on our criteria for our use case.
If you find a bug in our methodology, I would appreciate a note here or via @pedromelo.
Not that I want to start a "mine is smaller" battle, but for the sake of completeness we have added two more columns to our performance table: we took the compact VPack output and ran it through "gzip -9" and snappy compression, respectively. This now allows a sensible comparison of compressed Sereal with compressed VelocyPack.
See https://github.com/arangodb/velocypack/blob/master/Performance.md for details.
The reason we have not built compression into the VPack format itself is that, for us, the main advantage of VPack is that one can quickly access subvalues without parsing or deserialization. That is of course no longer possible after compression. On the other hand, if the aim is only compact storage, one can easily layer compression on top of VPack, outside the format specification.