@FeepingCreature
Last active January 7, 2024 19:33
You Are Doing JSON APIs Wrong

You are doing JSON APIs wrong.

When you use JSON to call an API - not a REST API, but something like JSON-RPC - you will usually want to encode one of several possible messages.

Your request body looks like this:

{
    "type": "MessageWithA",
    "data": {
        "a": 5
    }
}

Or like this:

{
    "type": "MessageWithB",
    "b": 5
}

However, that's bad. Here's how you should do it:

{
    "messageWithA": {
        "a": 5
    }
}

Or, if you want to keep a redundant "type" field:

{
    "type": "MessageWithA",
    "data": {
        "messageWithA": {
            "a": 5
        }
    }
}

Why? Stream parsers.

Stream parsers?

There are two ways of processing a JSON message. The first is to read the message into a JSON object, and then process the object outside-in. This usually takes the form of

if (event["type"] == "MessageWithA") {
    handleAMessage(event["data"]);
}

However, this has the unavoidable overhead of allocating an object for every part of the JSON tree. Especially if you are decoding into a well-typed internal data structure, you allocate this object just to throw it away shortly after. This creates unnecessary memory overhead.
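For illustration, here is what the read-then-dispatch approach looks like in Python (a sketch; the message and handler names are invented for this example):

```python
import json
from dataclasses import dataclass

# The target of the decode: a well-typed internal structure.
@dataclass
class MessageWithA:
    a: int

def decode(text):
    # json.loads allocates a full tree of dicts/lists/strings...
    event = json.loads(text)
    if event["type"] == "MessageWithA":
        # ...which we copy into the typed structure and then throw away.
        return MessageWithA(a=event["data"]["a"])
    raise ValueError(f'unknown type {event["type"]!r}')

print(decode('{"type": "MessageWithA", "data": {"a": 5}}'))
# prints MessageWithA(a=5)
```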

A faster way is with a stream parser. A stream parser lexes the input text into a stream of JSON tokens, such as BeginObject, KeyString, BeginArray, String, String, Int, EndArray, EndObject for { "a": ["b", "c", 5]}. These tokens are then consumed by a recursive parser, usually generated, that produces the internal data structure directly. In other words, there never exists a recursive data structure for the JSON datagram.

This has the advantage of not requiring any allocation for values that we are not interested in. However, it also means we cannot access "type"; rather, we have to react to "type" as we come across it in the input stream.

As a consequence, if MessageWithA and MessageWithB have a different format for "data", as they usually do, we have to decode the message twice: Once, only decoding the "type" field, and then a second time, only decoding the "data" field.

Without knowing the value of the type field, the data field is unparseable! The stream parser will not know which recursive function it is supposed to call.
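To see why, consider two hypothetical message types whose "data" objects reuse a field name at different types (names invented for this example). The same bytes demand different decoders:

```python
import json

# Both message types carry a "value" field in "data", but at different
# types: MessageWithA wants an int, MessageWithB keeps a string.
def decode_data_for_a(data):
    return int(data["value"])   # "value" must be parsed as an int

def decode_data_for_b(data):
    return str(data["value"])   # "value" stays a string

data = json.loads('{"value": "5"}')
print(decode_data_for_a(data))  # 5 (an int)
print(decode_data_for_b(data))  # 5 (a string)
# A stream parser must pick one of these functions before it consumes the
# first token of "data" - which it cannot do until it has seen "type".
```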

You may think that you can just read the "type" field first. But your message may look like this:

{
    "data": { .... "message": "severalMegabytesOfBase64Data" },
    "type": "VeryLargeMessage"
}

Now your stream parser has to skip over a large segment of the input text! Not only is this slow, it also requires holding the entire message in memory. If we knew from the start what the type of every field was, then we could consume the message in small chunks, improving cache efficiency as well as throughput.

Let's look at our alternative:

{
    "data": {
        "veryLargeMessage": { ... }
    },
    "type": "VeryLargeMessage"
}

This time, the JSON parser has it easy. No field in this message has an indeterminate type, so without ever searching for the "type" field, we can create the internal data structure for this message in one go. There is one remaining ambiguity: the key inside "data" may contradict the "type" field. The fix is to leave out "type" entirely:

{
    "veryLargeMessage": { ... }
}
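With this shape, a decoder can dispatch on the first key it sees, before reading anything else. A sketch (handler names are invented, and `json.loads` stands in for the per-type recursive stream decoder):

```python
import json
import re

# The first key of the top-level object decides the message type.
FIRST_KEY_RE = re.compile(r'\s*\{\s*"([^"]*)"')

def handle_message_with_a(data):
    return ("A", data["a"])

def handle_very_large_message(data):
    return ("large", len(data["message"]))

DECODERS = {
    "messageWithA": handle_message_with_a,
    "veryLargeMessage": handle_very_large_message,
}

def decode(text):
    # One token of lookahead is enough to pick the decoder.
    key = FIRST_KEY_RE.match(text).group(1)
    return DECODERS[key](json.loads(text)[key])

print(decode('{"messageWithA": {"a": 5}}'))
# prints ('A', 5)
```

A real implementation would hand the rest of the token stream to the chosen decoder instead of calling `json.loads`, but the dispatch logic is the point: the type is known as soon as the first key arrives.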

In summary, one simple rule:

The type of every JSON object field should be uniquely determined by its field name.

@FeepingCreature
Author

Correction: process the object outside-in, of course.

@nlyan

nlyan commented May 4, 2022

Looks like the JSON serializations for both Protocol Buffers and FlatBuffers have this problem.

In the Protobuf 3 JSON serialization, Any fields have an "@type" field embedded into the value object (when set to a message type), and as a sibling to the "value" field for simple types.

FlatBuffers uses "foo_type" as a sibling of the "foo" field when foo is a union, so that could end up out of order as well.

Interestingly, Apache Avro uses "foo": {"FooType": { ... } }, which should work just fine with a streaming parser (which Avro's C++ implementation actually does have). Overall I think this is the best option.
