You Are Doing JSON APIs Wrong

You are doing JSON APIs wrong.

When you use JSON to call an API (not a REST API, but something like JSON-RPC), you will usually want to encode one of several possible message types.

Your request body looks like this:

{
    "type": "MessageWithA",
    "data": {
        "a": 5
    }
}

Or like this:

{
    "type": "MessageWithB",
    "b": 5
}

However, that's bad. Here's how you should do it:

{
    "messageWithA": {
        "a": 5
    }
}

Or like this:

{
    "type": "MessageWithA",
    "data": {
        "messageWithA": {
            "a": 5
        }
    }
}

Why? Stream parsers.

Stream parsers?

There are two ways of processing a JSON message. The first is to read the message into a JSON object, and then process the object inside-out. This usually takes the form of

if (event["type"] == "MessageWithA") {
    handleAMessage(event["data"]);
}

However, this has the unavoidable overhead of allocating an object for every part of the JSON tree. Especially if you are decoding into a well-typed internal data structure, you allocate this object just to throw it away shortly after. This creates unnecessary memory overhead.
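For concreteness, here is the tree-first approach in runnable form (Python, with hypothetical handler and field names matching the snippet above):

```python
import json

def handle_a_message(data):
    # Hypothetical handler: pull out the payload we care about.
    return data["a"]

def process(raw):
    # json.loads materializes a dict (and nested dicts/lists) for the
    # entire document, even for parts we immediately discard.
    event = json.loads(raw)
    if event["type"] == "MessageWithA":
        return handle_a_message(event["data"])

print(process('{"type": "MessageWithA", "data": {"a": 5}}'))  # → 5
```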

A faster way is with a stream parser. A stream parser lexes the input text into a stream of JSON tokens, such as BeginObject, KeyString, BeginArray, String, String, Int, EndArray, EndObject for { "a": ["b", "c", 5]}. These tokens are then consumed by a recursive parser, usually generated, that produces the internal data structure directly. In other words, there never exists a recursive data structure for the JSON datagram.

This has the advantage of not requiring any allocation for values that we are not interested in. However, it also means we cannot access "type"; rather, we have to react to "type" as we come across it in the input stream.

As a consequence, if MessageWithA and MessageWithB have a different format for "data", as they usually do, we have to decode the message twice: once to read only the "type" field, and then a second time to decode only the "data" field.

Without knowing the value of the "type" field, the "data" field is unparseable! The stream parser cannot know which recursive parse function it is supposed to call.

You may think that you can just read the "type" field first. But your message may look like this:

{
    "data": { .... "message": "severalMegabytesOfBase64Data" },
    "type": "VeryLargeMessage"
}

Now your stream parser has to skip over a large segment of the input text! Not only is this slow, it also requires holding the entire message in memory. If we knew from the start what the type of every field was, then we could consume the message in small chunks, improving cache efficiency as well as throughput.

Let's look at our alternative:

{
    "data": {
        "veryLargeMessage": { ... }
    },
    "type": "VeryLargeMessage"
}

This time, the JSON parser has it easy. There is no field in this message whose type is indeterminate. As a result, without having to search for the "type" field, we can create an internal data structure for this message in one go. This does introduce a possible inconsistency, since the key inside "data" may disagree with the "type" field; but we can simply leave out "type" altogether:

{
    "veryLargeMessage": { ... }
}

In summary, one simple rule:

The type of every JSON object field should be uniquely determined by its field name.
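Applied in a sketch (Python, with hypothetical handler names; a real stream decoder would dispatch on the key token as it arrives rather than on a parsed dict, but the shape is the same):

```python
# Each field name maps to exactly one parser, so a decoder can
# dispatch on the first key it encounters, without ever needing a
# separate "type" tag. Handler names and payload shapes are invented
# for illustration.
HANDLERS = {
    "messageWithA": lambda body: ("A", body["a"]),
    "veryLargeMessage": lambda body: ("large", len(body["message"])),
}

def decode(message):
    (key, body), = message.items()  # exactly one key per message
    return HANDLERS[key](body)

print(decode({"messageWithA": {"a": 5}}))  # → ('A', 5)
```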
