@lihaoyi
Created March 12, 2018 03:50
AST-free JSON parsing

Provisional benchmarks of AST-free serialization put my WIP branch of uPickle about 40% faster than circe on my current set of ad-hoc benchmarks, if the encoders/decoders are cached (bigger numbers are better):

playJson Read 2761067
playJson Write 3412630
circe Read 6005895
circe Write 5205007
upickleDefault Read 4543628
upickleDefault Write 3814459
upickleLegacy Read 8393416
upickleLegacy Write 7431523

Circe is still significantly faster in the case where encoders/decoders are not cached. I assume I just need to spend a bit of time micro-optimizing the encoder/decoder instantiation code, and that it's not a fundamental limitation (more time optimizing should help the cached-encoder benchmark as well):

playJson Read 1975992
playJson Write 2811139
circe Read 4701980
circe Write 4252224
upickleDefault Read 2724334
upickleDefault Write 2443416
upickleLegacy Read 3142672
upickleLegacy Write 2878934

Jackson-module-scala is not included in the benchmarks because I couldn't figure out how to stop it from corrupting my data structure after being serialized/deserialized.

Note that in that branch, String -> Case Class and Case Class -> String are both AST-free: my upickle Readers simply implement jawn.Facade, and the upickle Writers effectively extend jawn.Facade => Unit. As a result, the actual definition of reader/writer instances for various types looks pretty similar to what you would see if you pattern matched over the AST (Reader example, Writer example), but it can be driven directly by the parser without any intermediate AST being constructed.
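To make the shape of that idea concrete, here is a minimal sketch in plain Scala. It does not use the real jawn.Facade API: `Sink`, `writePoint`, `PointReader` and `StringSink` are hypothetical names invented for illustration, the event interface is flat (no nested objects or arrays), and string escaping is omitted. The "writer" pushes events into whatever sink it is given, and the "reader" is itself a sink that assembles the case class from those events, so no tree is built in between.

```scala
object AstFreeSketch {
  // Hypothetical event interface, loosely in the spirit of a parser facade.
  // This is an assumption for illustration, not jawn's actual trait.
  trait Sink[T] {
    def startObj(): Unit
    def key(k: String): Unit
    def num(d: Double): Unit
    def str(s: String): Unit
    def endObj(): Unit
    def result: T
  }

  final case class Point(label: String, x: Double, y: Double)

  // "Writer": walks the case class and pushes events into any sink.
  def writePoint(p: Point, out: Sink[_]): Unit = {
    out.startObj()
    out.key("label"); out.str(p.label)
    out.key("x"); out.num(p.x)
    out.key("y"); out.num(p.y)
    out.endObj()
  }

  // "Reader": a sink that assembles a Point directly from the events.
  final class PointReader extends Sink[Point] {
    private var currentKey = ""
    private var label = ""
    private var x = 0.0
    private var y = 0.0
    def startObj(): Unit = ()
    def key(k: String): Unit = { currentKey = k }
    def num(d: Double): Unit = currentKey match {
      case "x" => x = d
      case "y" => y = d
      case _   => () // ignore unknown numeric fields
    }
    def str(s: String): Unit = { if (currentKey == "label") label = s }
    def endObj(): Unit = ()
    def result: Point = Point(label, x, y)
  }

  // A sink that renders JSON text directly (flat objects only, no escaping).
  final class StringSink extends Sink[String] {
    private val sb = new StringBuilder
    private var first = true
    def startObj(): Unit = { sb.append('{'); first = true }
    def key(k: String): Unit = {
      if (!first) sb.append(',')
      first = false
      sb.append('"').append(k).append("\":")
      ()
    }
    def num(d: Double): Unit = { sb.append(d); () }
    def str(s: String): Unit = { sb.append('"').append(s).append('"'); () }
    def endObj(): Unit = { sb.append('}'); () }
    def result: String = sb.toString
  }
}
```

Passing a `StringSink` to `writePoint` gives Case Class -> String, and a real parser emitting the same events into a `PointReader` would give String -> Case Class; in both directions the only allocations are the target value and the sink's own state.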

The patched version of jawn.Facade also gives you workflows like Case Class => Case Class, String => String (e.g. re-formatting your JSON), AST => Case Class, Case Class => AST, String => AST, and AST => String, all basically for free and also without any intermediate JSON AST.
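One way to see why these combinations come "for free" is that any event source can be paired with any event sink. The sketch below is an illustration under assumptions, not the patched jawn.Facade: `Source`, `Sink`, `KeyedSink`, `Obj` and the `from*`/`to*` helpers are hypothetical names, and the "AST" is deliberately reduced to one flat object of numeric fields. A String source (a parser) is left out to keep the example short.

```scala
object TransformSketch {
  // Hypothetical flat event interface (objects with numeric fields only).
  trait Sink[T] {
    def startObj(): Unit
    def key(k: String): Unit
    def num(d: Double): Unit
    def endObj(): Unit
    def result: T
  }
  // Anything that can replay its contents as events into a sink.
  trait Source { def drive[T](sink: Sink[T]): T }

  final case class Point(x: Double, y: Double)
  // Minimal stand-in for a JSON AST: one flat object.
  final case class Obj(fields: Vector[(String, Double)])

  def fromPoint(p: Point): Source = new Source {
    def drive[T](sink: Sink[T]): T = {
      sink.startObj()
      sink.key("x"); sink.num(p.x)
      sink.key("y"); sink.num(p.y)
      sink.endObj()
      sink.result
    }
  }
  def fromObj(o: Obj): Source = new Source {
    def drive[T](sink: Sink[T]): T = {
      sink.startObj()
      o.fields.foreach { case (k, v) => sink.key(k); sink.num(v) }
      sink.endObj()
      sink.result
    }
  }

  // Shared base for sinks that only need to remember the latest key.
  abstract class KeyedSink[T] extends Sink[T] {
    protected var current = ""
    def startObj(): Unit = ()
    def key(k: String): Unit = { current = k }
    def endObj(): Unit = ()
  }
  def toPoint(): Sink[Point] = new KeyedSink[Point] {
    private var x = 0.0
    private var y = 0.0
    def num(d: Double): Unit = current match {
      case "x" => x = d
      case "y" => y = d
      case _   => ()
    }
    def result: Point = Point(x, y)
  }
  def toObj(): Sink[Obj] = new KeyedSink[Obj] {
    private val buf = Vector.newBuilder[(String, Double)]
    def num(d: Double): Unit = { buf += (current -> d); () }
    def result: Obj = Obj(buf.result())
  }
  def toJsonString(): Sink[String] = new KeyedSink[String] {
    private val sb = new StringBuilder
    private var first = true
    override def startObj(): Unit = { sb.append('{'); first = true }
    override def key(k: String): Unit = {
      if (!first) sb.append(',')
      first = false
      sb.append('"').append(k).append("\":")
      ()
    }
    def num(d: Double): Unit = { sb.append(d); () }
    override def endObj(): Unit = { sb.append('}'); () }
    def result: String = sb.toString
  }
}
```

With two sources and three sinks, `source.drive(sink)` already covers six conversions (for example `fromPoint(p).drive(toObj())` for Case Class => AST, or `fromObj(o).drive(toJsonString())` for AST => String); adding a parser as a third source would extend the same sinks to String inputs without changing them.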

@marekzebrowski

Fantastic!
This is what I was looking for: in Scala, going through an AST does not make sense in many use cases, especially for the write part.
One thing that would be even better for the future (if the concept proves successful and there is uptake): target Array[Byte] instead of String to reduce overhead further. It would be nice to use it directly in Akka, targeting akka.util.ByteString.

@lihaoyi
Author

lihaoyi commented Mar 12, 2018

String above is a simplification and not entirely accurate. uPickle's forked non/jawn backend can parse from any ByteBuffer or file (not sure why it doesn't support arbitrary InputStream), and it can write directly to any OutputStream.

@travisbrown

travisbrown commented Mar 12, 2018

This looks great! I'll be curious to see how it compares to circe-algebra, which at least makes it possible to write an interpreter for circe decoders that doesn't require instantiating any AST (although I haven't actually done that yet).

In the meantime, a slightly fairer comparison would be against circe-derivation, which avoids the runtime overhead of going through Shapeless's generic representation (in addition to the AST). It's a drop-in replacement for io.circe.generic.semiauto, but when I tried changing the deps and imports here I got a bunch of compilation errors in codegen-ed code.

(Update: I was using the sbt build instead of mill (which works)—will try to get circe-derivation working here later today.)
