@lifthrasiir
Last active April 11, 2024 14:47
Comparison of schemaless byte-oriented binary serialization formats

What the heck?

  • Schemaless: The format has no knowledge of the structure of the underlying data.
  • Byte-oriented: The format is built upon a byte stream, probably for ease of implementation and performance.
  • Binary: The format is not intended for human consumption and is specified in terms of "bytes" (which are 8 bits long for our purposes).
  • Serialization: The format is primarily meant for storage and transmission, not for in-memory representation.

Contenders

  • S-Expressions (1997), a "canonical" encoding (which is also used for transport)
  • Bencode (2001)
  • MsgPack (2008)
  • BSON (2009), element non-terminal (which is easier to directly compare than the top-level document)
  • Smile (2010)
  • UBJSON (2012)
  • CBOR (2013)

Non-contenders that are nevertheless worth mentioning:

  • Schema-dependent
  • Bit-oriented
  • Textual
    • JSON (formally specified in 2002)

Table

Binary notations:

  • 00 through ff for literal bytes
  • 0?, ?0 or ?? etc. for varying or unspecified nibbles (notes follow)
  • "foo" or 'foo' for ASCII representation of literal bytes (no escape sequence)
  • .X for single literal ASCII byte
  • (...) for omitted bytes; in particular, (NNN) with all digits NNN means NNN omitted bytes
  • {...}*NNN for NNN copies of ...
Data SExp. Bencode MsgPack BSON Smile UBJSON CBOR
Self Identification N/A N/A N/A N/A .: .) 0a 0? [1] N/A d9 d9 f7
Null/nil N/A N/A c0 0a (name) 21 .Z f6
Undefined/undef N/A N/A N/A 06 (name) [2] N/A N/A f7
False N/A N/A c2 08 (name) 00 22 .F f4
True N/A N/A c3 08 (name) 01 23 .T f5
0 N/A "i0e" 00 10 (name) 00 00 00 00 c0† [3] .i 00 00
1 N/A "i1e" 01 10 (name) 01 00 00 00 c2 .i 01 01
2 N/A "i2e" 02 10 (name) 02 00 00 00 c4 .i 02 02
3 N/A "i3e" 03 10 (name) 03 00 00 00 c6 .i 03 03
10 N/A "i10e" 0a 10 (name) 0a 00 00 00 d4 .i 0a 0a
15 N/A "i15e" 0f 10 (name) 0f 00 00 00 de .i 0f 0f
16 N/A "i16e" 10 10 (name) 10 00 00 00 20 a0 .i 10 10
23 N/A "i23e" 17 10 (name) 17 00 00 00 20 ae .i 17 17
24 N/A "i24e" 18 10 (name) 18 00 00 00 20 b0 .i 18 18 18
31 N/A "i31e" 1f 10 (name) 1f 00 00 00 20 be .i 1f 18 1f
32 N/A "i32e" 20 10 (name) 20 00 00 00 20 01 80† [4] .i 20 18 20
99 N/A "i99e" 63 10 (name) 63 00 00 00 20 03 86 .i 63 18 63
100 N/A "i100e" 64 10 (name) 64 00 00 00 20 03 88 .i 64 18 64
127 N/A "i127e" 7f 10 (name) 7f 00 00 00 20 03 be .i 7f 18 7f
128 N/A "i128e" cc 80 10 (name) 80 00 00 00 20 04 80 .U 80 18 80
255 N/A "i255e" cc ff 10 (name) ff 00 00 00 20 07 be .U ff 18 ff
256 N/A "i256e" cd 01 00 10 (name) 00 01 00 00 20 08 80 .I 01 00 19 01 00
999 N/A "i999e" cd 03 e7 10 (name) e7 03 00 00 20 1f 8e .I 03 e7 19 03 e7
1000 N/A "i1000e" cd 03 e8 10 (name) e8 03 00 00 20 1f 90 .I 03 e8 19 03 e8
4095 N/A "i4095e" cd 0f ff 10 (name) ff 0f 00 00 20 7f be .I 0f ff 19 0f ff
4096 N/A "i4096e" cd 10 00 10 (name) 00 10 00 00 20 01 00 80 .I 10 00 19 10 00
32767 N/A "i32767e" cd 7f ff 10 (name) ff 7f 00 00 20 07 7f be .I 7f ff 19 7f ff
32768 N/A "i32768e" cd 80 00 10 (name) 00 80 00 00 20 08 00 80 .l 00 00 80 00 19 80 00
65535 N/A "i65535e" cd ff ff 10 (name) ff ff 00 00 20 0f 7f be .l 00 00 ff ff 19 ff ff
65536 N/A "i65536e" ce 00 01 00 00 10 (name) 00 00 01 00 20 10 00 80 .l 00 01 00 00 1a 00 01 00 00
524287 N/A "i524287e" ce 00 07 ff ff 10 (name) ff ff 07 00 20 7f 7f be .l 00 07 ff ff 1a 00 07 ff ff
524288 N/A "i524288e" ce 00 08 00 00 10 (name) 00 00 08 00 20 01 00 00 80 .l 00 08 00 00 1a 00 08 00 00
2^31-1 N/A "i2147483647e" ce 7f ff ff ff 10 (name) ff ff ff 7f 20 1f 7f 7f 7f be .l 7f ff ff ff 1a 7f ff ff ff
2^31 N/A "i2147483648e" ce 80 00 00 00 12 (name) 00 00 00 80 00 00 00 00 20 20 00 00 00 80 .L 00 00 00 00 80 00 00 00 1a 80 00 00 00
2^32-1 N/A "i4294967295e" ce ff ff ff ff 12 (name) ff ff ff ff 00 00 00 00 20 3f 7f 7f 7f be .L 00 00 00 00 ff ff ff ff 1a ff ff ff ff
2^32 N/A "i4294967296e" cf 00 00 00 01 00 00 00 00 12 (name) 00 00 00 00 01 00 00 00 21 40 00 00 00 80 .L 00 00 00 01 00 00 00 00 1b 00 00 00 01 00 00 00 00
2^63-1 N/A "i9223372036854775807e" cf 7f ff ff ff ff ff ff ff 12 (name) ff ff ff ff ff ff ff 7f 21 03 7f 7f 7f 7f 7f 7f 7f 7f be .L 7f ff ff ff ff ff ff ff 1b 7f ff ff ff ff ff ff ff
2^63 N/A "i9223372036854775808e" cf 80 00 00 00 00 00 00 00 N/A 22 89 00 20 00 00 00 00 00 00 00 00 00 [5] .H .i 13 "9223372036854775808" 1b 80 00 00 00 00 00 00 00
2^64-1 N/A "i18446744073709551615e" cf ff ff ff ff ff ff ff ff N/A 22 89 00 3f 7f 7f 7f 7f 7f 7f 7f 7f 03 .H .i 14 "18446744073709551615" 1b ff ff ff ff ff ff ff ff
2^64 N/A "i18446744073709551616e" N/A N/A 22 89 00 40 00 00 00 00 00 00 00 00 00 .H .i 14 "18446744073709551616" N/A
3^50 N/A "i717897987691852588770249e" N/A N/A 22 8b 00 26 00 55 1f 43 36 2f 68 27 3c 3c 09 .H .i 18 "717897987691852588770249" N/A
3^500 N/A "i36360" (229) "10001e" N/A N/A 22 01 a4 00 59 2b (109) 6e 44 01 .H .U ef "36360" (229) "10001" N/A
3^5000 N/A "i40389" (2376) "00001e" N/A N/A 22 0f 9f 0e 06 34 (1127) 0d 7a 01 .H .I 09 52 "40389" (2376) "00001" N/A
-1 N/A "i-1e" ff 10 (name) ff ff ff ff c1 .i ff 20
-2 N/A "i-2e" fe 10 (name) fe ff ff ff c3 .i fe 21
-3 N/A "i-3e" fd 10 (name) fd ff ff ff c5 .i fd 22
-16 N/A "i-16e" f0 10 (name) f0 ff ff ff df .i f0 2f
-17 N/A "i-17e" ef 10 (name) ef ff ff ff 20 a1 .i ef 30
-24 N/A "i-24e" e8 10 (name) e8 ff ff ff 20 af .i e8 37
-25 N/A "i-25e" e7 10 (name) e7 ff ff ff 20 b1 .i e7 38 18
-32 N/A "i-32e" e0 10 (name) e0 ff ff ff 20 bf .i e0 38 1f
-33 N/A "i-33e" d0 df 10 (name) df ff ff ff 20 01 81 .i df 38 20
-100 N/A "i-100e" d0 9c 10 (name) 9c ff ff ff 20 03 87 .i 9c 38 63
-128 N/A "i-128e" d0 80 10 (name) 80 ff ff ff 20 03 bf .i 80 38 7f
-129 N/A "i-129e" d1 ff 7f 10 (name) 7f ff ff ff 20 04 81 .I ff 7f 38 80
-256 N/A "i-256e" d1 ff 00 10 (name) 00 ff ff ff 20 07 bf .I ff 00 38 ff
-257 N/A "i-257e" d1 fe ff 10 (name) ff fe ff ff 20 08 81 .I fe ff 39 01 00
-32768 N/A "i-32768e" d1 80 00 10 (name) 00 80 ff ff 20 07 7f bf .I 80 00 39 7f ff
-32769 N/A "i-32769e" d2 ff ff 7f ff 10 (name) ff 7f ff ff 20 08 00 81 .l ff ff 7f ff 39 80 00
-65536 N/A "i-65536e" d2 ff ff 00 00 10 (name) 00 00 ff ff 20 0f 7f bf .l ff ff 00 00 39 ff ff
-65537 N/A "i-65537e" d2 ff fe ff ff 10 (name) ff ff fe ff 20 10 00 81 .l ff fe ff ff 3a 00 01 00 00
-2^31 N/A "i-2147483648e" d2 80 00 00 00 10 (name) 00 00 00 80 20 1f 7f 7f 7f bf .l 80 00 00 00 3a 7f ff ff ff
-2^31-1 N/A "i-2147483649e" d3 ff ff ff ff 7f ff ff ff 12 (name) ff ff ff 7f ff ff ff ff 21 20 00 00 00 81 .L ff ff ff ff 7f ff ff ff 3a 80 00 00 00
-2^32 N/A "i-4294967296e" d3 ff ff ff ff 00 00 00 00 12 (name) 00 00 00 00 ff ff ff ff 21 3f 7f 7f 7f bf .L ff ff ff ff 00 00 00 00 3a ff ff ff ff
-2^32-1 N/A "i-4294967297e" d3 ff ff ff fe ff ff ff ff 12 (name) ff ff ff ff fe ff ff ff 21 40 00 00 00 81 .L ff ff ff fe ff ff ff ff 3b 00 00 00 01 00 00 00 00
-2^63 N/A "i-9223372036854775808e" d3 80 00 00 00 00 00 00 00 12 (name) 00 00 00 00 00 00 00 80 21 03 7f 7f 7f 7f 7f 7f 7f 7f bf .L 80 00 00 00 00 00 00 00 3b 7f ff ff ff ff ff ff ff
-2^63-1 N/A "i-9223372036854775809e" N/A N/A 22 89 7f 5f 7f 7f 7f 7f 7f 7f 7f 03 .H .i 14 "-9223372036854775809" 3b 80 00 00 00 00 00 00 00
-2^64 N/A "i-18446744073709551616e" N/A N/A 22 89 7f 40 00 00 00 00 00 00 00 00 .H .i 15 "-18446744073709551616" 3b ff ff ff ff ff ff ff ff
-2^64-1 N/A "i-18446744073709551617e" N/A N/A 22 89 7f 3f 7f 7f 7f 7f 7f 7f 7f 03 .H .i 15 "-18446744073709551617" N/A
-3^50 N/A "i-717897987691852588770249e" N/A N/A 22 8b 7f 59 7f 2a 60 3c 49 50 17 58 43 43 07 .H .i 19 "-717897987691852588770249" N/A
-3^500 N/A "i-36360" (229) "10001e" N/A N/A 22 01 a4 7f 26 54 (109) 11 3b 03 .H .U f0 "-36360" (229) "10001" N/A
-3^5000 N/A "i-40389" (2376) "00001e" N/A N/A 22 0f 9f 71 79 4b (1127) 72 05 0f .H .I 09 53 "-40389" (2376) "00001" N/A
0.0 N/A N/A ca 00 00 00 00 01 (name) 00 00 00 00 00 00 00 00 28 00 00 00 00 .d 00 00 00 00 f9 00 00
-0.0 N/A N/A ca 80 00 00 00 01 (name) 00 00 00 00 00 00 00 80 28 80 00 00 00 .d 80 00 00 00 f9 80 00
1.0 N/A N/A ca 3f 80 00 00 01 (name) 00 00 00 00 00 00 f0 3f 28 3f 80 00 00 .d 3f 80 00 00 f9 3c 00
1.5 N/A N/A ca 3f c0 00 00 01 (name) 00 00 00 00 00 00 f8 3f 28 3f c0 00 00 .d 3f c0 00 00 f9 3e 00
65504.0 N/A N/A ca 47 7f e0 00 01 (name) 00 00 00 00 00 fc ef 40 28 47 7f e0 00 .d 47 7f e0 00 f9 7b ff
100000.0 N/A N/A ca 47 c3 50 00 01 (name) 00 00 00 00 00 6a f8 40 28 47 c3 50 00 .d 47 c3 50 00 fa 47 c3 50 00
3.4028235e+38 N/A N/A ca 7f 7f ff ff 01 (name) f8 af 4d e5 ff ff ef 47 28 7f 7f ff ff .d 7f 7f ff ff fa 7f 7f ff ff
-1.1 (approx.) N/A N/A cb bf f1 99 99 99 99 99 9a 01 (name) 9a 99 99 99 99 99 f1 bf 29 bf f1 99 99 99 99 99 9a .D bf f1 99 99 99 99 99 9a fb bf f1 99 99 99 99 99 9a
1.0e+300 (approx.) N/A N/A cb 7e 37 e4 3c 88 00 75 9c 01 (name) 9c 75 00 88 3c e4 37 7e 29 7e 37 e4 3c 88 00 75 9c .D 7e 37 e4 3c 88 00 75 9c fb 7e 37 e4 3c 88 00 75 9c
-1.1 (exact) N/A N/A N/A N/A 22 81 82 05 01† [6] .H .i 04 "-1.1" N/A
1.5^5000 (exact) N/A N/A N/A N/A 22 4e 88 26 8a 57 4d 5c (2785) 5d 0e 01 .H .I 16 f9 "28595" (5872) "90625" N/A
Infinity N/A N/A ca 7f 80 00 00 01 (name) 00 00 00 00 00 00 f0 7f 28 7f 80 00 00 N/A [8] f9 7c 00
-Infinity N/A N/A ca ff 80 00 00 01 (name) 00 00 00 00 00 00 f0 ff 28 ff 80 00 00 N/A [8] f9 fc 00
NaN N/A N/A ca ff c0 00 00 01 (name) 00 00 00 00 00 00 f8 ff 28 ff c0 00 00 N/A [8] f9 fe 00
Empty bytes [9] "0:" [10] "0:" c4 00 05 (name) 00 00 00 00 ?? [11] e8 00 .[ .$ .U .# .i 00 40
Bytes 00 "1:" 00 "1:" 00 c4 01 00 05 (name) 01 00 00 00 ?? 00 e8 01 00 00 [7] .[ .$ .U .# .i 01 00 41 00
Bytes ff "1:" ff "1:" ff c4 01 ff 05 (name) 01 00 00 00 ?? ff e8 01 7f 01 .[ .$ .U .# .i 01 ff 41 ff
Bytes 01 02 03 "3:" 01 02 03 "3:" 01 02 03 c4 03 01 02 03 05 (name) 03 00 00 00 ?? 01 02 03 e8 03 00 40 40 03 .[ .$ .U .# .i 03 01 02 03 43 01 02 03
Bytes {55}*23 "23:" {55}*23 "23:" {55}*23 c4 17 {55}*23 05 (name) 17 00 00 00 ?? {55}*23 e8 17 {2a 55}*13 01 .[ .$ .U .# .i 17 {55}*23 57 {55}*23
Bytes {55}*24 "24:" {55}*24 "24:" {55}*24 c4 18 {55}*24 05 (name) 18 00 00 00 ?? {55}*24 e8 18 {2a 55}*13 2a 05 .[ .$ .U .# .i 18 {55}*24 58 18 {55}*24
Bytes {55}*63 "63:" {55}*63 "63:" {55}*63 c4 3f {55}*63 05 (name) 3f 00 00 00 ?? {55}*63 e8 3f {2a 55}*36 .[ .$ .U .# .i 3f {55}*63 58 3f {55}*63
Bytes {55}*64 "64:" {55}*64 "64:" {55}*64 c4 40 {55}*64 05 (name) 40 00 00 00 ?? {55}*64 e8 01 80 {2a 55}*36 2a 01 .[ .$ .U .# .i 40 {55}*64 58 40 {55}*64
Bytes {55}*127 "127:" {55}*127 "127:" {55}*127 c4 7f {55}*127 05 (name) 7f 00 00 00 ?? {55}*127 e8 01 bf {2a 55}*72 2a 01 .[ .$ .U .# .i 7f {55}*127 58 7f {55}*127
Bytes {55}*128 "128:" {55}*128 "128:" {55}*128 c4 80 {55}*128 05 (name) 80 00 00 00 ?? {55}*128 e8 02 80 {2a 55}*72 2a 55 01 .[ .$ .U .# .U 80 {55}*128 58 80 {55}*128
Bytes {55}*255 "255:" {55}*255 "255:" {55}*255 c4 ff {55}*255 05 (name) ff 00 00 00 ?? {55}*255 e8 03 bf {2a 55}*145 2a 05 .[ .$ .U .# .U ff {55}*255 58 ff {55}*255
Bytes {55}*256 "256:" {55}*256 "256:" {55}*256 c5 01 00 {55}*256 05 (name) 00 01 00 00 ?? {55}*256 e8 04 80 {2a 55}*145 2a 55 05 .[ .$ .U .# .I 01 00 {55}*256 59 01 00 {55}*256

† Not a unique representation but representative.
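
The integer rows of the MsgPack and CBOR columns follow a simple "shortest form wins" rule. The following is a minimal Python sketch of that rule, not taken from either specification's reference code; the function names `msgpack_int` and `cbor_int` are made up for illustration, and values outside the 64-bit range are simply rejected to match the N/A cells above.

```python
import struct


def msgpack_int(n: int) -> bytes:
    """Encode n as the shortest MessagePack integer (hypothetical helper)."""
    if 0 <= n <= 0x7F:
        return bytes([n])                  # positive fixint: 00..7f
    if -32 <= n < 0:
        return struct.pack(">b", n)        # negative fixint: e0..ff
    if n > 0:
        # uint 8/16/32/64, big-endian payload after a one-byte tag
        for tag, fmt, limit in ((0xCC, ">B", 1 << 8), (0xCD, ">H", 1 << 16),
                                (0xCE, ">I", 1 << 32), (0xCF, ">Q", 1 << 64)):
            if n < limit:
                return bytes([tag]) + struct.pack(fmt, n)
    else:
        # int 8/16/32/64, two's complement, big-endian
        for tag, fmt, limit in ((0xD0, ">b", 1 << 7), (0xD1, ">h", 1 << 15),
                                (0xD2, ">i", 1 << 31), (0xD3, ">q", 1 << 63)):
            if n >= -limit:
                return bytes([tag]) + struct.pack(fmt, n)
    raise OverflowError("no MessagePack integer type holds this value")


def cbor_int(n: int) -> bytes:
    """Encode n as a CBOR major type 0/1 integer (hypothetical helper)."""
    # Negative integers store -1 - n, so -1 becomes argument 0.
    major, arg = (0x00, n) if n >= 0 else (0x20, -1 - n)
    if arg < 24:
        return bytes([major | arg])        # argument fits in the initial byte
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if arg < limit:
            return bytes([major | info]) + struct.pack(fmt, arg)
    raise OverflowError("outside the 64-bit argument range used in the table")


print(msgpack_int(-33).hex(" "))   # d0 df
print(cbor_int(-25).hex(" "))      # 38 18
```

Note how CBOR reaches -2^64 with an 8-byte argument because the sign lives in the major type, while MsgPack's two's-complement `d3` form stops at -2^63.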

Notes:

  1. The 4th byte of Smile's self-identification is an OR of feature bits, 1=shared property name optimization in use (a subset of shared string optimization), 2=shared string optimization in use, 4=raw binary in use.
  2. undefined in BSON is deprecated.
  3. "Zigzag" encoding for signed integers: the encoded number 0, 1, 2, 3, 4, 5, ... maps to the actual number 0, -1, 1, -2, 2, -3, ....
  4. Variable-length (VInt) encoding for arbitrary integers: the bit pattern (0xxxxxxx)* 10yyyyyy decodes into (xxxxxxx)* yyyyyy. The shortest representation is not enforced, so an arbitrary number of zero bytes can be prepended.
  5. VInt-encoded length followed by the 8-bit-free encoding of BigInteger#toByteArray. [7]
  6. VInt-encoded scale (as in BigDecimal#scale) followed by the unscaled BigInteger value from BigDecimal#unscaledValue. The BigInteger is encoded as in [5].
  7. The 8-bit-free encoding of binary bytes. Basically 7 raw bytes encode into 8 encoded bytes, so all but the last byte of aaaaaaaa bbbbbbbb cccccccc ... are encoded as 0aaaaaaa 0abbbbbb 0bbccccc .... When the last byte needs to be padded, it is left-padded: aaaaaaaa bbbbbbbb encodes into 0aaaaaaa 0abbbbbb 000000bb (as opposed to 0bb00000).
  8. JSON does not allow Infinity or NaN, and neither does UBJSON.
  9. If there is no explicit distinction between binary bytes and Unicode strings, every string-like representation is assumed to be binary bytes.
  10. Any byte string can be prefixed with .[ (bytes) .] to give a hint about the type of the binary data.
  11. The subtype byte may be used to determine the type of binary. The default is 00.
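
Notes 3, 4 and 7 describe the three building blocks behind most of the Smile column. Below is a minimal Python sketch of them as described in those notes, not an implementation of the Smile specification; the helper names (`zigzag`, `vint`, `decode_vint`, `seven_bit`) are invented here, and the type-prefix bytes shown in the table are deliberately left out.

```python
def zigzag(n: int) -> int:
    """Note 3: map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag()."""
    return z // 2 if z % 2 == 0 else -((z + 1) // 2)


def vint(n: int) -> bytes:
    """Note 4: shortest (0xxxxxxx)* 10yyyyyy form of a non-negative integer."""
    if n < 0:
        raise ValueError("apply zigzag() first for signed values")
    out = [0x80 | (n & 0x3F)]          # final byte carries the low 6 bits
    n >>= 6
    while n:
        out.append(n & 0x7F)           # earlier bytes carry 7 bits each
        n >>= 7
    return bytes(reversed(out))


def decode_vint(data: bytes) -> int:
    """Decode one VInt; leading zero bytes are accepted, as note 4 allows."""
    n = 0
    for b in data:
        if b & 0x80:                   # high bit marks the final byte
            return (n << 6) | (b & 0x3F)
        n = (n << 7) | b
    raise ValueError("unterminated VInt")


def seven_bit(raw: bytes) -> bytes:
    """Note 7: re-pack raw bytes into 7-bit groups, left-padding the tail."""
    value = int.from_bytes(raw, "big")
    pos = 8 * len(raw)                 # number of bits not yet emitted
    out = []
    while pos >= 7:
        pos -= 7
        out.append((value >> pos) & 0x7F)
    if pos:                            # 1..6 leftover bits, right-aligned
        out.append(value & ((1 << pos) - 1))
    return bytes(out)


print(vint(zigzag(32)).hex(" "))               # 01 80
print(seven_bit(bytes([1, 2, 3])).hex(" "))    # 00 40 40 03
```

Because `decode_vint` never checks for minimality, `00 01 80` decodes to the same value as `01 80`, which is exactly why the dagger marks those cells as non-unique.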