@kriszyp
Last active November 17, 2021 03:40
This is a comparison of different CBOR encoding techniques with the cbor-x JS library, including packing, the proposed record structure tag, and the combination of both, and how they affect size and performance.

cbor-x's packed implementation only packs whole strings that occur multiple times; it does not search for repeated prefixes or suffixes, as that would almost certainly be vastly more expensive. Strings are packed if they occur multiple times in a data structure. When using packed + record tags, strings used as keys are not searched for repetition (since it is assumed that repetition will mostly be eliminated by structure reuse).
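To make the "whole strings only" rule concrete, here is a minimal sketch of how packing candidates could be found. It is not cbor-x's actual code: `findRepeatedStrings` and its `includeKeys` option are hypothetical names used only for illustration, with `includeKeys: false` standing in for the packed + records case where keys are not searched.

```javascript
// Hypothetical helper, not part of cbor-x: collects the strings that occur
// more than once anywhere in a nested structure. Only exact, whole-string
// matches count; shared prefixes or suffixes are deliberately ignored.
function findRepeatedStrings(value, { includeKeys = true } = {}) {
  const counts = new Map();
  const visit = (node) => {
    if (typeof node === 'string') {
      counts.set(node, (counts.get(node) || 0) + 1);
    } else if (Array.isArray(node)) {
      node.forEach(visit);
    } else if (node && typeof node === 'object') {
      for (const [key, child] of Object.entries(node)) {
        if (includeKeys) visit(key); // skipped when structures already share keys
        visit(child);
      }
    }
  };
  visit(value);
  // Keep only strings seen at least twice, in first-seen order.
  return [...counts].filter(([, n]) => n > 1).map(([s]) => s);
}

const sample = [
  { status: 'active', label: 'active-study' },
  { status: 'active', label: 'activated' },
];

console.log(findRepeatedStrings(sample));
// → [ 'status', 'active', 'label' ]
console.log(findRepeatedStrings(sample, { includeKeys: false }));
// → [ 'active' ]
```

Note that 'active-study' and 'activated' share a prefix with 'active' but are not candidates, since only whole strings are compared.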

Each table shows the encoded size for each technique, along with its encoding and decoding performance. The last column gives the gzipped size for comparison (gzip performance is not listed, but in my tests performance is generally about 2-4x slower with gzipping). The tables compare plain CBOR encoding, packed, record structures with a 1+1 definition tag and with a 1+2 tag, the combination of packed and record structures, and stringrefs (for which no gzipped size is listed).
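For readers who want to collect the same four columns themselves, the following is a rough sketch of a measurement harness. It is not the benchmark used for the numbers here: `measure` and `jsonCodec` are hypothetical names, and JSON is used as a stand-in codec so the sketch runs with only Node's built-in `zlib` module.

```javascript
const zlib = require('zlib');

// Hypothetical harness: reports encoded size, gzipped size, and
// encode/decode operations per second for a given codec and value.
function measure(encode, decode, data, durationMs = 200) {
  const encoded = encode(data);

  // Runs fn repeatedly for roughly durationMs and returns calls per second.
  // Reading the clock on every iteration adds some overhead, so treat the
  // result as a rough figure only.
  const opsPerSec = (fn) => {
    let count = 0;
    const start = process.hrtime.bigint();
    const deadline = start + BigInt(durationMs) * 1000000n;
    let now = start;
    while (now < deadline) {
      fn();
      count++;
      now = process.hrtime.bigint();
    }
    return Math.round(count / (Number(now - start) / 1e9));
  };

  return {
    size: encoded.length,
    gzipSize: zlib.gzipSync(encoded).length,
    encodePerSec: opsPerSec(() => encode(data)),
    decodePerSec: opsPerSec(() => decode(encoded)),
  };
}

// Stand-in codec (assumption): JSON via Buffer, so no dependencies are needed.
const jsonCodec = {
  encode: (value) => Buffer.from(JSON.stringify(value)),
  decode: (buffer) => JSON.parse(buffer.toString()),
};

const data = { studies: [{ id: 1, status: 'active' }, { id: 2, status: 'active' }] };
console.log(measure(jsonCodec.encode, jsonCodec.decode, data));
```

To compare CBOR configurations, the `encode` and `decode` arguments would be swapped for the corresponding cbor-x functions; the options that select packing or record structures are documented in the cbor-x README and are not reproduced here.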

The first comparison test uses an 8KB JSON data structure from our database of medical studies, which has a fairly complicated and dynamic structure: https://github.com/kriszyp/cbor-x/blob/master/tests/example4.json

| Method | Size (bytes) | Encode/sec | Decode/sec | Gzip size (bytes) |
| --- | --- | --- | --- | --- |
| CBOR | 6376 | 140000 | 99900 | 2308 |
| CBOR Packed | 4734 | 37300 | 103800 | 2456 |
| CBOR with record tags (1+1) | 5227 | 105000 | 113000 | 2425 |
| CBOR with record tags (1+2) | 5243 | 105000 | 113000 | 2429 |
| CBOR Packed + records | 4515 | 48000 | 110400 | 2440 |
| CBOR with stringrefs | 5138 | 99000 | 101600 | |

The second comparison test uses a 25KB JSON data structure from Twitter's example response for their search API, which is much more homogeneous and repetitive in structure: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets

| Method | Size (bytes) | Encode/sec | Decode/sec | Gzip size (bytes) |
| --- | --- | --- | --- | --- |
| CBOR | 12213 | 76000 | 54000 | 3000 |
| CBOR Packed | 6795 | 23000 | 63000 | 3260 |
| CBOR with record tags (1+1) | 7633 | 82000 | 62000 | 3081 |
| CBOR with record tags (1+2) | 7643 | 80000 | 62000 | 3084 |
| CBOR Packed + records | 6008 | 39000 | 62000 | 3076 |
| CBOR with stringrefs | 7295 | 65000 | 63000 | |
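
The size columns are easier to compare as reductions relative to plain CBOR. The sketch below only does arithmetic on the sizes already listed in the two tables; the names `sizes` and `savingsPercent` are illustrative and not part of cbor-x.

```javascript
// Encoded sizes in bytes, copied from the two tables above.
const sizes = {
  medical: {
    'CBOR': 6376,
    'CBOR Packed': 4734,
    'CBOR with record tags (1+1)': 5227,
    'CBOR with record tags (1+2)': 5243,
    'CBOR Packed + records': 4515,
    'CBOR with stringrefs': 5138,
  },
  twitter: {
    'CBOR': 12213,
    'CBOR Packed': 6795,
    'CBOR with record tags (1+1)': 7633,
    'CBOR with record tags (1+2)': 7643,
    'CBOR Packed + records': 6008,
    'CBOR with stringrefs': 7295,
  },
};

// Percent reduction relative to a baseline size, rounded to one decimal place.
function savingsPercent(baseline, size) {
  return Math.round((1 - size / baseline) * 1000) / 10;
}

for (const [dataset, rows] of Object.entries(sizes)) {
  const baseline = rows['CBOR'];
  for (const [method, size] of Object.entries(rows)) {
    if (method === 'CBOR') continue;
    console.log(`${dataset}: ${method} is ${savingsPercent(baseline, size)}% smaller than plain CBOR`);
  }
}
```

By this measure, packed + records comes out about 29.2% smaller than plain CBOR on the medical data and about 50.8% smaller on the Twitter data. In both tables, however, plain CBOR has the smallest gzipped size.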