Why do I have to write JSON decoders in Elm?

A vision for data interchange in Elm

How do you send information between clients and servers? What format should that information be in? What happens when the server changes the format, but the client has not been updated yet? What happens when the server changes the format, but the database cannot be updated?

These are difficult questions. It is not just about picking a format, but rather picking a format that can evolve as your application evolves.

Literature Review

By now there are many approaches to communicating between client and server. These approaches tend to be known within specific companies and language communities, but the techniques do not cross borders. I will outline JSON, ProtoBuf, and GraphQL here so we can learn from them all.

JSON

Within the JavaScript community, it is very common to use the JSON format for data interchange.

Pros:

  • It integrates perfectly with JavaScript.
  • It integrates decently with dynamically typed languages like Ruby and Python.
  • It is often human readable, though you may need to prettify it first.

Cons:

  • It does not fit very naturally with typed languages. (See the decoder sketch after this list.)
  • It is not very dense. Field names like "price" can easily be repeated hundreds or thousands of times in a single message.
  • No guidance on how to evolve as your client and server change. You just change the JSON and write some tests; everyone seems to make it up as they go.
  • No guidance on how to efficiently store information in your database. The database person will deal with that.
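
To see that friction concretely, here is roughly what a hand-written decoder looks like in Elm for a small record. The Product type and its fields are invented for illustration.

import Json.Decode as Decode exposing (Decoder)

-- An illustrative record.
type alias Product =
    { name : String
    , price : Float
    }

-- The decoder restates the record's structure field by field,
-- even though the type alias already describes that structure.
productDecoder : Decoder Product
productDecoder =
    Decode.map2 Product
        (Decode.field "name" Decode.string)
        (Decode.field "price" Decode.float)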

ProtoBuf

When I was at Google, we used protobuf for everything. You can think of it as an effort to do better than XML, which shares many of JSON's weaknesses.

Pros:

  • Protobuf is designed to be as dense as possible. By specifying the format in a .proto file beforehand, you can send the bits without any field annotations like "price". Just the data.

  • Protobuf has good support for union types. The latest version (proto3) supports oneof, making union types pretty easy to represent.

  • Protobuf is designed to evolve with your application. There were two major rules about protobufs when I was at Google: (1) you never remove an existing field, and (2) every field you add is optional. Together these rules guarantee that the data format is always fully backwards compatible, because you can only ever add optional fields. A client can check if fields X and Y are there. If so, great! If not, maybe the server is not updated yet. This decouples the code for data exchange from the code for backwards compatibility. (The sketch after this list shows these rules applied in client code.)

  • Protobuf facilitates use of different programming languages. You compile .proto files to C++, Java, JavaScript, etc. that can unpack protobufs into nice data structures within the language. If you decide to revamp your client or server, you do not need to revamp the data interchange format to make things feel nice.

  • Within Google, databases were set up to use protobufs. So you actually had the same format between client, server, and database. I do not think this pro exists outside of Google right now, but it at least points to a theory of how server and database can work together better.
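
To make those two rules concrete, here is a minimal Elm sketch of what they imply on the client when the payload happens to be JSON. The Price type and its fields are invented for illustration.

import Json.Decode as Decode exposing (Decoder)

-- "amount" has been in the format from the start and is never removed.
-- "discount" was added later, so it is optional: old servers that omit
-- it still produce a valid value on the client.
type alias Price =
    { amount : Float
    , discount : Maybe Float
    }

priceDecoder : Decoder Price
priceDecoder =
    Decode.map2 Price
        (Decode.field "amount" Decode.float)
        (Decode.maybe (Decode.field "discount" Decode.float))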

Cons:

  • You must specify the format explicitly up front, rather than defining it implicitly every time you write data interchange code on the client or server. I include this here mainly because I can see why people would feel this way. I personally feel that planning ahead pays for itself within a matter of hours, but other people see the delayed payoff differently.

GraphQL

Facebook introduced GraphQL in the last few years. You can think of this as an effort to address the question of how the data interchange format can evolve as your client and server change.

Pros:

  • It lets you delay database optimization. Client-side developers can just say what data they need, rather than asking database engineers to add specific queries for them. Database folks can then observe traffic patterns and optimize based on that, focusing their work on problem areas more effectively.

  • The data extracted is JSON. That means it works well when you use JavaScript, Flow, TypeScript, and other things like that.

  • The format permits “query optimization” so you can send fewer requests. Basically, if you have two GraphQL queries, you can combine them into one. This means less time doing handshakes with a server and less data sent across the wire if the two queries needed any overlapping data. (A sketch follows this list.)

  • You can optimize request size. Because the requests are all specified in your .graphql file, you can choose a denser representation for certain requests. As long as the client and server agree on which request “1” refers to, that is all you need to send.

  • You can support multiple languages. Because you have .graphql files, you can generate code in JavaScript, Python, Ruby, Elixir, or whatever else. Changing to a new language does not require redesigning your data interchange format.
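
As a sketch of that query merging, assuming a hypothetical schema with a user field:

# Two queries that each look up the same user...
query Name { user(id: 4) { name } }
query Avatar { user(id: 4) { avatarUrl } }

# ...can be combined into a single round trip:
query NameAndAvatar {
  user(id: 4) {
    name
    avatarUrl
  }
}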

Cons:

  • I do not personally know if it pushes you to design for an evolving application. It may be that there is a culture of “no required fields” and “never remove fields” like with protobuf, but I do not know the details for sure. This should be a pro if the community of users embraces that approach in practice!

Lessons

  1. Big projects use .format files. This decouples the data interchange format from the particulars of the client or server.

  2. Big projects care about asset size. When you are serving lots of data to lots of people, a couple bits here and there really add up.

  3. Decoupling the data interchange format from what people do with it is useful. For Google that was about backwards compatibility, and for Facebook it is about making sure client-side work is not blocked by server folks.

Lessons for Elm

For some reason we treat JSON as fine. It is a poor choice on basically every metric that matters for building reliable, efficient, and flexible applications. It seems unreasonable to think that folks will be using JSON on purpose in 10 or 20 years.

At this moment in the Elm ecosystem, folks can be quite upset that you have to write JSON decoders by hand. Rather than questioning the idea of using JSON at all, they wonder if Elm itself should be generating JSON decoders. But the history of building in support for a particular interchange format is pretty poor. For example, Java assumed that XML was the way to go. Scala followed in that tradition and actually has XML syntax as part of its parser, so XML is valid Scala. I do not think it is easy to argue that this direction was wise, and I know there are efforts to get this out of Scala.

In contrast, the success stories of protobuf and GraphQL are all around having a specific .format file and generating code in whatever language you need. This does not require any specific language support, so languages do not get tied to interchange formats and vice-versa.

I hope that rather than lamenting that Elm lacks language features that have historically turned out very poorly, folks can start getting into the world of data interchange formats. There is a rich history here, and many options. Going this route will mean better interop with other languages, smaller assets sent across the wire, no need to deal with JSON by hand, and no need to clutter the language.

jhrcek commented Oct 19, 2017

Insightful read Evan. Thanks for that! I'm 100% with you about not baking JSON support into the Elm language.
Two typos (just ctrl+f these): "It integrates decently with dynamicly languages" and "(2) every field you is optional."

antew commented Oct 19, 2017

I'm new to the Elm community, but I've been a developer for many years, and I really appreciate that you put these discussions out in the open and foster a great community around the language. Thank you!

I wholeheartedly agree that JSON is an imperfect format, and better data interchange formats exist, but the decision to use JSON may not lie with the front-end Elm developer. REST web services delivering JSON are the most popular form of web service, and for good or ill they will probably remain popular because they are popular.

Twenty years ago, nobody thought people would be using JavaScript in 20 years. It's terribly inconsistent, and it was written in around 10 days in, I assume, a coffee-induced fugue state. JavaScript won because when developers showed up to the party on the World Wide Web, it was waiting at the door with a rose and a crooked smile. It should not be as popular as it is, by any measure, except that it embraced Worse is Better and it was first and worst.

Coming from other stacks I can take a WSDL and generate a client, or I can document my API with Swagger and generate libraries for a bunch of languages. I think Swagger is a great solution for packaging up REST APIs for different languages/environments, and could be a good project for the community.

Whether the future is protobufs or something else, there is going to be a transition period where people have to interop with JSON APIs, and making that easy reduces friction for new developers and lets you develop things more quickly.

I keep wondering why you are spending so much time and effort explaining things like this. We understand why you made the choices you did; that is not the issue and never has been.

Regarding JSON, simply put, I think the community wants a way to generate decoders; it does not matter whether that comes via a language feature (like macros), as part of the language, or anything else. It has to be integrated into the language somehow. Using a service like json-to-elm is helpful but hard to find, and copy-pasting every time something changes is tedious.

I hope that rather than lamenting that Elm lacks language features that have historically turned out very poorly, folks can start getting into the world of data interchange formats.

People who are starting to use Elm are using JSON because you chose it from the possibilities you mentioned. There is support for it in the core package, decoders are generated automatically for ports, and it is the basic way to do JS interop (in events). It's not feasible to use anything else right now.

About generating decoders: it is already implemented, just not exposed to the public, so you can see why people are lamenting.

I'm again using the Crystal language as an example. It's a typed language that has a really nice way of handling JSON and generating decoders and encoders for data: https://crystal-lang.org/api/0.23.1/JSON.html#mapping%28properties%2Cstrict%3Dfalse%29-macro.

To me these documents are justification for yourself rather than the community.

Another example of how to handle de-/encoding in a nice way without marrying the language to a given format comes from Rust. In Rust, the [serde](https://serde.rs/#data-formats) library is the de facto solution for this problem and only requires that you annotate your type definition to tell the compiler to run the Serialize or Deserialize macro. And if the type's field names don't match the JSON, that can be customized as well.

prozacchiwawa commented Oct 19, 2017

For a different perspective on this, I find Elm's JSON decoders more concise and easier to reason about than competing methods in strictly typed languages. I've run into enough situations involving symmetric encoders and decoders that I wrote http://package.elm-lang.org/packages/prozacchiwawa/elm-json-codec/1.0.0/JsonCodec. While not perfect, its most common uses save me both code and cognitive load, so I think there's a case to be made for better symmetry in the core library.
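
For example, a plain decoder/encoder pair states the same record shape twice, once per direction; that duplication is what the package above tries to remove. A minimal sketch with an invented User type:

import Json.Decode as Decode exposing (Decoder)
import Json.Encode as Encode

type alias User =
    { name : String }

-- The shape is written out once to decode...
userDecoder : Decoder User
userDecoder =
    Decode.map User (Decode.field "name" Decode.string)

-- ...and again to encode.
encodeUser : User -> Encode.Value
encodeUser user =
    Encode.object [ ( "name", Encode.string user.name ) ]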

I've also written a bit about my approach to JSON decoding in Elm (https://medium.com/@prozacchiwawa/the-im-stupid-elm-language-nugget-16-295f201eb458). Whatever the JSON decoder turns into, what I might request is the ability to construct the error value myself, with access to the literal value that failed.

import Json.Decode.Err as JDE

type DecodeError = NameField Value | PhoneField Value -- Error type for decoder reports
nameDec = JDE.string (always NameField) -- JDE.Decoder DecodeError String
-- ...
userDec = JD.map2 (,) nameDec phoneDec  -- JD.Decoder (String,String)
type alias UserError = { e : DecodeError, v : Value }
userErrorDec = JDE.map2 (,) UserError nameDec phoneDec -- JDE.Decoder UserError (String,String)

JDE.decodeString """[...,{"name":"foo"},...]""" (JDE.list userErrorDec identity)
-- Err { e = PhoneField JD.undefined, v = {"name":"foo"} }

This would both allow for proper, relevant messaging and make it easier to identify failing values in potentially large collections.

Elm's type system basically requires code generation to make use of external data interchange formats convenient (recently I made a basic wrapper for Ethereum's JSON-RPC, and it's very clunky in its current form). Whenever I get to the point of thinking about generating Elm code, though, I generally back off, since it's usually simple enough to just bang things out by hand even if it takes an hour or two. Elm feels a bit stuck in this area; if it came down to generating JavaScript or C with a small script, I'd gladly do it.

Something that holds me up on generating Elm code, possibly irrationally, is the feeling that doing whitespace formatting on generated Elm isn't worth it... It'd be nice if the Elm compiler accepted a compatible non-whitespace-significant syntax so I could write something fast and crappy to spit out code.

G4BB3R commented Oct 19, 2017

I think Decoders are format agnostic, so we could use an automatic Decoder generator with no problems. If Decoders target another format in the future, the automatic generator would still work.

boxed commented Oct 19, 2017

Agree 100%. But I do think we need to get work done in the meantime. Which is why I keep suggesting code generation for now.

We’ve been developing a production app with GraphQL query files + Elm code generation, and it has been a joy. There is a huge side benefit: our build system can rerun that code generation whenever the backend is deployed, to test whether it breaks compilation of the front end (changed or missing field types), and front-end deploys run the generation against the prod API and halt if the live schema can’t satisfy the client. So we have some form of “type-safe deployment” across front end and backend.

sgwilym commented Oct 19, 2017

I can confirm that GraphQL pushes you to design around an evolving schema: when I have talked to FB employees about how to maintain a growing schema, they have told me that they have never removed fields from their schema, ever. We’ve done the same where I work and it has served us well.

Also, fields can only be required in the sense that you must query something from an object type, but you can't stipulate that you must always query the name field from a User, for instance.

@leebyron, one of the main GraphQL team members, explicitly recommends add-only GraphQL schemas here: facebook/graphql#134 (comment).

There is a bit of info on versioning here as well: facebook/graphql#175 (comment).

OlegIlyenko commented Oct 19, 2017

Great post! A small clarification regarding this point (GraphQL section):

The data extracted is JSON. That means it works well when you use JavaScript, Flow, TypeScript, and other things like that.

GraphQL actually does not require a specific serialization format (though "JSON is the preferred serialization format"):

GraphQL does not require a specific serialization format. However, clients should use a serialization format that supports the major primitives in the GraphQL response. In particular, the serialization format must support representations of the following four primitives: Map, List, String, Null

spacejam commented Oct 19, 2017

JSON is table stakes. While it does not excel in parsing performance, it has the massive benefit of being human readable. Its size is not as big a deal as you hint at because of typical gzip compression ratios (at the cost of some CPU, which is rarely the bottleneck for things talking to browsers). Too few large projects consider debuggability, to the madness of anyone who gets paged at 4am to figure out what's happening in a broken system. As you hint at in the GraphQL section, performance is not the primary constraint for most things, even at scale. Further, while protobufs support schema evolution if you follow a set of guidelines, there's little stopping you from doing the same with JSON (although some language-specific implementations may make this more cumbersome than others when optional fields are not present). As much as I love compact framed binary protocols for getting a high score on some benchmark I'm paying attention to, they very rarely beat the human-interface and compatibility benefits of JSON given the relevant trade-offs of the system at hand.

omouse commented Oct 19, 2017

Nice overview! My personal opinion is that ProtoBuf is the way to go. I've heard talk of using it at a few companies I've been at, because inevitably someone wants to use another language, or maybe we need to provide 3rd parties with a way to use our API and .... etc. etc.

Just for historical purposes: RDF and its companion SPARQL (a graph query language) were also efforts at data interchange.

OlegIlyenko commented Oct 19, 2017

I think it is also important to point out that it is not necessarily a mutually exclusive choice between human-readable string-based format and efficient binary format. I think Amazon Ion is a good example of a format that combines both of these aspects:

Amazon Ion is a richly-typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations. The text format (a superset of JSON) is easy to read and author, supporting rapid prototyping. The binary representation is efficient to store, transmit, and skip-scan parse. The rich type system provides unambiguous semantics for long-term preservation of business data which can survive multiple generations of software evolution.

andys8 commented Oct 19, 2017

Regarding the XML syntax in Scala: very interesting. It seems it has been decoupled from the language since 2.11.

I would also like to maybe pitch http://jsonapi.org/ here as a thing worth looking at, at least!

hashemi commented Oct 20, 2017

Add Swift Encoders as another possible source of inspiration.

Briefly, they use code generation to add generic encoders/decoders to the data types (opt-in). You then write a serializer for JSON or anything else you want.

kevinSuttle commented Oct 20, 2017

Don't forget Rich Hickey's EDN and the rest.

JSON also comes in several flavors: JSON5, JSON-LD, etc.

@steveklabnik - What do you think the advantages of JSON API over something like GraphQL are?

laczoka commented Oct 20, 2017

Hi @evancz,

interesting writeup.

I am not sure "No guidance on how to efficiently store information in your database. The database person will deal with that." is a cons for any data interchange format. Data interchange happens over the wire crossing language/system boundaries and the objectives and trade-offs are quite different from a (database) serialisation format.

Aside from different performance concerns (size, etc), a generally useful data interchange format needs to be extensible and self-describing. ProtoBuf is really neither.

Great examples of generally useful data interchange formats are Transit (from Cognitect) and Amazon Ion, both of which were designed with performance in mind.

I think the Cognitect folks produced a great analysis of the available formats (JSON, Avro, XML, ProtoBuf) and showed what these formats lack in order to be generally useful as data interchange formats. I can't find the resource right now, but if you do a shout out, I am sure they will be able to provide it to you.

leshow commented Oct 23, 2017

@hashemi That appears to require ad hoc polymorphism, which Elm doesn't have support for. AFAIK there's no way to say in Elm, this is some type that implements the interface/class/trait/protocol/whatever, which is what Encodable is doing there.

I don't really see what JSON in particular has to do with any of this. The problem, I thought, with Elm's JSON enc/dec is that you have to manually type everything out, because there is no way to define a generic encoder/decoder. So you're left with the choice of baking it into the language as magic. It's a consequence of the lack of generic expressiveness.

cmckni3 commented Oct 29, 2017

What do you think about Hypermedia?

danny-andrews commented Nov 3, 2017

Preface: I agree with your assessment of interchange formats and think we should strive for better alternatives to JSON.

folks can be quite upset that you have to write JSON decoders by hand

Yes. Because it's painful and error-prone.

I hope that rather than lamenting that Elm lacks language features that have historically turned out very poorly, folks can start getting into the world of data interchange formats.

You seem to be falling into the X-Y problem here. Elm users are upset by how painful it is to consume JSON in their applications, not that the solution doesn't involve baking JSON into the language. They don't care how the problem is solved; they just want a better path forward.

Also, how does "getting into the world of data interchange formats" help people solve this problem? Seems kind of out of touch to suggest that front-end devs wield the power to change the interchange formats all their backend APIs use.

Here's a possible solution: generate JSON decoders based on json-schema? It's quite expressive, and many shops already use it to document their APIs.

Philosophically good, but in practice this makes it harder to work with Elm. I just use Elm in the business-logic portions of the application and have other JS-based code talk to the APIs.
