@mortie
Last active July 10, 2022 22:24
JCOF: JSON-like Compact Object Format

A more efficient way to represent JSON-style objects.

About

JCOF tries to be a drop-in replacement for JSON, with most of the same semantics, but with a much more compact representation of objects. It does this mainly by introducing a string table at the beginning of the document and replacing all strings with indexes into that table. It also employs a few extra tricks to make objects as small as possible, without losing the most important benefits of JSON. Most importantly, it remains a text-based, schemaless format.

The following JSON object:

{
	"people":[
		{"name":"Bob", "age":32, "occupation":"Plumber", "married":true},
		{"name":"Alice", "age":28, "occupation":"Programmer", "married":true},
		{"name":"Bernard", "age":36, "occupation":null, "married":false}
	]
}

could be represented as the following JCOF object:

people,name,age,occupation,married,Bob,Plumber,Alice,Programmer,Bernard;
{
	0:[
		{1:s5, 2:32, 3:s6, 4:b},
		{1:s7, 2:28, 3:s8, 4:b},
		{1:s9, 2:36, 3:n, 4:B}
	]
}

Minified, the JSON is 203 bytes, while the JCOF is 139 bytes. The difference grows with every person object added to the "people" array: the table of key names in the header is a fixed, one-time cost of 28 bytes, after which each person object spends 4 bytes on its key indexes where JSON spends 32 bytes on the quoted key names.
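
The byte counts can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative, and the JCOF text is simply the example above with whitespace removed.

```python
import json

doc = {
    "people": [
        {"name": "Bob", "age": 32, "occupation": "Plumber", "married": True},
        {"name": "Alice", "age": 28, "occupation": "Programmer", "married": True},
        {"name": "Bernard", "age": 36, "occupation": None, "married": False},
    ]
}

# Minified JSON: no whitespace after "," or ":".
minified_json = json.dumps(doc, separators=(",", ":"))

# The JCOF example from above with all whitespace removed.
minified_jcof = (
    "people,name,age,occupation,married,Bob,Plumber,Alice,Programmer,Bernard;"
    "{0:[{1:s5,2:32,3:s6,4:b},{1:s7,2:28,3:s8,4:b},{1:s9,2:36,3:n,4:B}]}"
)

json_size = len(minified_json.encode("utf-8"))
jcof_size = len(minified_jcof.encode("utf-8"))
print(json_size, jcof_size)  # 203 139
```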

This probably doesn't make sense for the kind of use case where you pass small JSON objects around, such as REST APIs. But it might make sense for something like a serialization format for a game or web app or similar.

Rationale

I was making a JSON-based serialization format for a game I was working on, but found myself making trade-offs between space efficiency and descriptive key names, so I decided to make a format which makes that a non-issue. Maybe others will find it useful.

The format

Here's the BNF which describes JCOF:

document ::= header ';' value
header ::= (string (',' string)*)?

string ::= [a-zA-Z0-9]+ | json-string

value ::=
  array-literal |
  object-literal |
  string-reference |
  number-literal |
  bool-literal |
  null-literal

array-literal ::= '[' (value (',' value)*)? ']'
object-literal ::= '{' (key-value (',' key-value)*)? '}'
string-reference ::= 's' base62
number-literal ::=
  'i' base62 |
  'I' base62 |
  float
bool-literal ::= 'b' | 'B'
null-literal ::= 'n'

key-value ::= base62 ':' value
base62 ::= [0-9a-zA-Z]+
float ::= '-'? [0-9]+ ('.' [0-9]+)?
json-string ::= [https://datatracker.ietf.org/doc/html/rfc8259#section-7]

The relevant concepts to understand are:

The header

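Every JCOF document starts with a header: a comma-separated list of strings (possibly empty), followed by a semicolon. A string made up only of letters and digits can be written bare; any other string is written as a JSON string. All object keys and string values are indexes into the header, written in base62 encoding.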
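
As a rough illustration of the header rules in the grammar, here is a minimal Python sketch that writes a header from a list of strings. The function name `encode_header` is hypothetical, not part of any JCOF library, and the sketch assumes the caller has already decided which strings go in the table and in what order.

```python
import json
import re

# A string may be written bare only if it matches [a-zA-Z0-9]+ (see the grammar).
PLAIN_STRING = re.compile(r"[a-zA-Z0-9]+")


def encode_header(strings):
    """Return the header text for `strings`, including the ';' that follows it."""
    parts = []
    for s in strings:
        if PLAIN_STRING.fullmatch(s):
            parts.append(s)  # bare string
        else:
            parts.append(json.dumps(s))  # fall back to a JSON string literal
    return ",".join(parts) + ";"


print(encode_header(["people", "name", "first name", ""]))
# people,name,"first name","";
```

Note that the empty string has to be quoted, since the bare form requires at least one character.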

Base62

Base62 encoding just refers to writing integer numbers in base 62 rather than base 10. This lets us use 0-9, a-z and A-Z as digits.
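
A minimal Python sketch of base62 conversion follows. The digit order (0-9, then a-z, then A-Z) is an assumption taken from the order in which the character class is written in the grammar; the text does not spell out the digit values. The function names are illustrative.

```python
import string

# Assumed digit values: '0'-'9' = 0..9, 'a'-'z' = 10..35, 'A'-'Z' = 36..61.
DIGITS = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n):
    """Write a non-negative integer in base 62."""
    if n < 0:
        raise ValueError("base62_encode expects a non-negative integer")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def base62_decode(text):
    """Read a base62 number; raises ValueError on a character outside the digit set."""
    n = 0
    for ch in text:
        n = n * 62 + DIGITS.index(ch)
    return n


print(base62_encode(61), base62_encode(62), base62_decode("10"))  # Z 10 62
```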

Values

A value can be:

  • An array literal: [, followed by 0 or more values, followed by ]
  • An object literal: {, followed by 0 or more key-value pairs, followed by }
    • A key-value pair is a base62 index into the header, followed by a :, followed by a value
  • A string reference: s followed by a base62 index into the header
  • A number literal:
    • i followed by a base62 number: A positive integer
    • I followed by a base62 number: A negative integer
    • A floating point number written in decimal, with an optional leading - and an optional fractional part
  • A bool literal:
    • b: true
    • B: false
  • A null literal: n
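
To show how these pieces fit together, here is a minimal decoder sketch in Python. It is not a reference implementation: the names `Parser` and `decode` are made up for this example, it accepts only input without whitespace (as in the grammar), it assumes the base62 digit order 0-9, a-z, A-Z, and it chooses to return a decimal number without a fractional part as an `int`.

```python
import json
import re
import string

# Assumed digit order: 0-9, a-z, A-Z (the order of the grammar's character class).
DIGITS = string.digits + string.ascii_lowercase + string.ascii_uppercase
ALNUM = re.compile(r"[0-9a-zA-Z]+")  # bare header strings and base62 numbers
FLOAT = re.compile(r"-?[0-9]+(\.[0-9]+)?")


class Parser:
    def __init__(self, text):
        self.text, self.pos, self.strings = text, 0, []

    def peek(self):
        # Next character, or "" at end of input.
        return self.text[self.pos:self.pos + 1]

    def expect(self, ch):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def match(self, pattern):
        m = pattern.match(self.text, self.pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {self.pos}")
        self.pos = m.end()
        return m.group()

    def base62(self):
        n = 0
        for ch in self.match(ALNUM):
            n = n * 62 + DIGITS.index(ch)
        return n

    def items(self, close, item):
        # Zero or more comma-separated items, then the closing character.
        out = []
        while self.peek() != close:
            if out:
                self.expect(",")
            out.append(item())
        self.expect(close)
        return out

    def header_string(self):
        if self.peek() == '"':  # quoted: reuse the JSON string parser
            s, self.pos = json.JSONDecoder().raw_decode(self.text, self.pos)
            return s
        return self.match(ALNUM)  # bare: letters and digits only

    def pair(self):
        key = self.strings[self.base62()]
        self.expect(":")
        return key, self.value()

    def value(self):
        ch = self.peek()
        if ch == "" or ch not in "[{sIibBn":
            text = self.match(FLOAT)  # raises ValueError on anything else
            return float(text) if "." in text else int(text)
        self.pos += 1
        if ch == "[":
            return self.items("]", self.value)
        if ch == "{":
            return dict(self.items("}", self.pair))
        if ch == "s":
            return self.strings[self.base62()]
        if ch == "i":
            return self.base62()
        if ch == "I":
            return -self.base62()
        return {"b": True, "B": False, "n": None}[ch]


def decode(text):
    p = Parser(text)
    p.strings = p.items(";", p.header_string)  # the header ends at ';'
    result = p.value()
    if p.pos != len(text):
        raise ValueError(f"trailing input at offset {p.pos}")
    return result


print(decode("name,Bob;{0:s1}"))  # {'name': 'Bob'}
```

Because every value starts with a distinct character, the decoder never needs to look more than one character ahead.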