@kgriffs
Created May 21, 2021 23:05
Benchmarking Python JSON libs: std vs. orjson vs. simdjson - simple encode/decode tests
# ----------------------------------------------------------------------------
import json
import orjson
import simdjson
import libpy_simdjson
decoder_std = json.JSONDecoder()
encoder_std = json.JSONEncoder(ensure_ascii=False)
p = simdjson.Parser()
p2 = libpy_simdjson.Parser()
%timeit json.loads('{"x": 54}')
%timeit json.dumps({'x': 54}, ensure_ascii=False)
%timeit json.dumps(json.loads('{"x": 54}'), ensure_ascii=False)
2.01 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3.45 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
6.79 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Same as above, but using PyPy3.7-7.3.4 instead of CPython 3.8
%timeit json.loads('{"x": 54}')
%timeit json.dumps({'x': 54}, ensure_ascii=False)
%timeit json.dumps(json.loads('{"x": 54}'), ensure_ascii=False)
311 ns ± 10.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
807 ns ± 36.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.24 µs ± 28.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
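The `%timeit` magic only exists in IPython. Below is a minimal sketch that reproduces the standard-library measurements above as a plain script by calling the `timeit` module directly. To mirror the CPython output it fixes 7 runs of 100,000 loops; this is an assumption, since IPython picks the loop count automatically. It reports the same statistic (mean ± standard deviation of the per-loop time across runs) and uses only the standard library, so it should run unchanged on CPython or PyPy. The names `bench` and `cases` are hypothetical.

```python
import statistics
import timeit

# Hypothetical benchmark cases: the same statements timed with %timeit above.
cases = {
    "loads": """json.loads('{"x": 54}')""",
    "dumps": """json.dumps({'x': 54}, ensure_ascii=False)""",
    "round trip": """json.dumps(json.loads('{"x": 54}'), ensure_ascii=False)""",
}


def bench(stmt, number=100_000, repeat=7):
    """Return (mean, stdev) of the per-loop time in seconds, like %timeit."""
    # Each element of `totals` is the total time for `number` executions.
    totals = timeit.repeat(stmt, setup="import json", number=number, repeat=repeat)
    per_loop = [t / number for t in totals]
    return statistics.mean(per_loop), statistics.stdev(per_loop)
```

For example, `mean, std = bench(cases["dumps"])` returns seconds per loop; multiply by `1e6` to get µs. The statement is passed as a string rather than a lambda. That avoids adding a function call to every loop iteration, which is closer to how `%timeit` times a one-line statement.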
%timeit simdjson.loads('{"x": 54}') # Returns a standard dict
%timeit simdjson.dumps(simdjson.loads('{"x": 54}'), ensure_ascii=False)
3.27 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8.79 µs ± 530 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit p.parse(b'{"x": 54}') # simdjson.Object
%timeit p.parse(b'{"x": 54}').as_dict()
%timeit p.parse(b'{"x": 54}').mini # encoded JSON as a string
1.67 µs ± 31.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
2.32 µs ± 97.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2.9 µs ± 191 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# NOTE: libpy_simdjson seems to be incomplete; there is no encoder support
# to get back to a JSON string.
%timeit libpy_simdjson.loads(b'{"x": 54}') # libpy_simdjson.Object
%timeit libpy_simdjson.loads(b'{"x": 54}').as_dict() # does not decode strings
1.25 µs ± 61.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.41 µs ± 46.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit orjson.loads('{"x": 54}') # Returns a standard dict
%timeit orjson.dumps(orjson.loads('{"x": 54}')) # Returns bytes
%timeit orjson.dumps(orjson.loads('{"x": 54}')).decode()
249 ns ± 10.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
490 ns ± 14.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
626 ns ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
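Because `orjson.dumps()` returns `bytes`, code that needs a `str` has to add `.decode()`, and the last measurement above includes that cost. Below is a minimal sketch of a helper that prefers orjson when it is installed. Otherwise it falls back to the standard library with compact separators and `ensure_ascii=False`, which, for simple documents like this one, yields the same text as orjson's compact output. The helper names `dumps_str` and `loads` are hypothetical.

```python
import json

try:
    import orjson  # optional third-party dependency

    def dumps_str(obj) -> str:
        # orjson.dumps() returns UTF-8 bytes; decode to get a str.
        return orjson.dumps(obj).decode("utf-8")

    def loads(data):
        # orjson.loads() accepts both str and bytes.
        return orjson.loads(data)

except ImportError:

    def dumps_str(obj) -> str:
        # Compact separators and ensure_ascii=False approximate orjson's output.
        return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)

    def loads(data):
        # json.loads() also accepts str, bytes, or bytearray.
        return json.loads(data)


print(dumps_str(loads('{"x": 54}')))  # {"x":54}
```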
# ----------------------------------------------------------------------------
import json
import orjson
import simdjson
import libpy_simdjson
decoder_std = json.JSONDecoder()
encoder_std = json.JSONEncoder(ensure_ascii=False)
p = simdjson.Parser()
# test_json_doc is a JSON object (about 6 KB) from a real-world API. It has
# several top-level attributes, including an array of objects that each
# contain several nested objects.
print(len(test_json_doc))
6021
test_json_dict = json.loads(test_json_doc)
test_json_doc_bytes = test_json_doc.encode()  # bytes input for simdjson / libpy_simdjson
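The real-world document is not included in the gist. If you want to approximate the shape of this benchmark, you can generate a hypothetical stand-in: a top-level object with several attributes and an array of objects that each contain nested objects. The field names and the roughly 6,000-character target are assumptions, and timings on this synthetic document will not match the numbers below. The sketch also defines `test_json_doc_bytes`, the UTF-8 encoded document that the simdjson benchmarks expect.

```python
import json


def make_test_doc(target_len: int = 6000) -> str:
    """Build a hypothetical JSON document of at least target_len characters."""
    doc = {"id": "abc123", "status": "active", "count": 0, "items": []}
    i = 0
    # Keep appending items until the serialized document is large enough.
    while len(json.dumps(doc)) < target_len:
        doc["items"].append({
            "id": i,
            "name": f"item-{i}",
            "price": {"amount": 9.99 + i, "currency": "USD"},
            "owner": {"user_id": 1000 + i, "display_name": f"user {i}"},
            "tags": ["alpha", "beta"],
        })
        i += 1
    doc["count"] = len(doc["items"])
    return json.dumps(doc)


test_json_doc = make_test_doc()
test_json_doc_bytes = test_json_doc.encode("utf-8")
test_json_dict = json.loads(test_json_doc)
print(len(test_json_doc))
```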
%timeit json.loads(test_json_doc)
%timeit json.dumps(test_json_dict, ensure_ascii=False)
%timeit json.dumps(json.loads(test_json_doc), ensure_ascii=False)
46 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
55.2 µs ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
107 µs ± 5.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# Same as above, but using PyPy3.7-7.3.4 instead of CPython 3.8
%timeit json.loads(test_json_doc)
%timeit json.dumps(test_json_dict, ensure_ascii=False)
%timeit json.dumps(json.loads(test_json_doc), ensure_ascii=False)
10 µs ± 242 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
21 µs ± 1.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
34.2 µs ± 2.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit decoder_std.decode(test_json_doc)
%timeit encoder_std.encode(test_json_dict)
%timeit encoder_std.encode(decoder_std.decode(test_json_doc)) # round trip, comparable to .mini (but not minified)
43.5 µs ± 1.34 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
104 µs ± 5.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
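The stdlib round trip is only roughly comparable to simdjson's `.mini`. By default, `json.dumps()` separates items with `", "` and keys from values with `": "`, so its output is not minified. Below is a minimal sketch, using only the standard library, that produces compact output similar to `.mini` by way of the `separators` parameter. The function name `minify` is hypothetical.

```python
import json


def minify(doc: str) -> str:
    """Decode, then re-encode without insignificant whitespace."""
    return json.dumps(json.loads(doc), separators=(",", ":"), ensure_ascii=False)


src = '{"x": 54, "tags": ["a", "b"], "name": "café"}'
print(json.dumps(json.loads(src), ensure_ascii=False))  # {"x": 54, "tags": ["a", "b"], "name": "café"}
print(minify(src))  # {"x":54,"tags":["a","b"],"name":"café"}
```

You can also pass the same `separators` argument to `json.JSONEncoder(...)` when reusing an encoder instance such as `encoder_std`.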
%timeit simdjson.loads(test_json_doc) # Returns a standard dict
%timeit simdjson.dumps(test_json_dict, ensure_ascii=False)
%timeit simdjson.dumps(simdjson.loads(test_json_doc), ensure_ascii=False)
33.3 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
53.1 µs ± 2.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
91.3 µs ± 4.22 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# NOTE: The ideal use case for simdjson appears to be pass-through (using
# .mini) when only a subset of fields needs to be read. Even so,
# simdjson.loads(test_json_doc) is still competitive with orjson.loads()
# (33.3 µs vs. 33.6 µs).
#
# On the other hand, if all you need to do is serialize to JSON, orjson
# is clearly the winner with this particular sample document (8.18 µs
# vs. 53.1 µs for simdjson.dumps() and 55.2 µs for json.dumps()).
%timeit p.parse(test_json_doc_bytes) # simdjson.Object
%timeit p.parse(test_json_doc_bytes).as_dict()
%timeit p.parse(test_json_doc_bytes).mini # encoded JSON as a string
4.7 µs ± 62.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
34.7 µs ± 3.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
13.1 µs ± 388 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
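Here is a minimal sketch of the pass-through pattern from the note above: read a few fields, then forward the document via `.mini`. It assumes that pysimdjson's `Parser.parse()` returns a lazily evaluated `simdjson.Object` that supports `obj[key]` lookup and the `.mini` property used in these benchmarks, so only the requested values are turned into Python objects. If pysimdjson is not installed, the sketch falls back to the standard library, which must decode the whole document. The helper name `read_fields` and the sample keys are hypothetical. The sketch returns only plain Python values because newer pysimdjson releases may refuse to reuse a parser while proxies into its previous document are still alive.

```python
import json

try:
    import simdjson  # pysimdjson, optional

    _parser = simdjson.Parser()  # reused across calls, as in the benchmarks

    def read_fields(doc: bytes, keys):
        """Return (selected top-level fields, minified document)."""
        obj = _parser.parse(doc)  # lazy simdjson.Object; nothing converted yet
        # Only the requested values are converted to Python objects.
        # Assumes they are scalars; nested containers would come back as
        # simdjson.Object/Array proxies tied to the parser's document.
        fields = {k: obj[k] for k in keys}
        return fields, obj.mini

except ImportError:

    def read_fields(doc: bytes, keys):
        """Stdlib fallback: the whole document is decoded up front."""
        data = json.loads(doc)
        fields = {k: data[k] for k in keys}
        return fields, json.dumps(data, separators=(",", ":"), ensure_ascii=False)


doc = b'{"id": 7, "status": "ok", "payload": {"a": [1, 2, 3]}}'
fields, passthrough = read_fields(doc, ["id", "status"])
print(fields)  # {'id': 7, 'status': 'ok'}
```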
# NOTE: libpy_simdjson seems to be incomplete; there is no encoder support
# to get back to a JSON string.
%timeit libpy_simdjson.loads(test_json_doc_bytes) # libpy_simdjson.Object
%timeit libpy_simdjson.loads(test_json_doc_bytes).as_dict() # does not decode strings
1.29 µs ± 54.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.38 µs ± 56.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit orjson.loads(test_json_doc) # Returns a standard dict
%timeit orjson.dumps(test_json_dict) # Returns bytes
%timeit orjson.dumps(test_json_dict).decode()
%timeit orjson.dumps(orjson.loads(test_json_doc)) # Returns bytes
%timeit orjson.dumps(orjson.loads(test_json_doc)).decode()
33.6 µs ± 2.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
8.18 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8.89 µs ± 278 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
40.2 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
44.6 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
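To rerun the decode + encode comparison as a plain script, here is a minimal sketch of a harness. It times whichever of `json`, `orjson`, and `simdjson` can be imported and skips the rest, using `timeit` with the 10,000-loop count from the large-document runs above. The functions `load_codecs`, `_make_dumps`, and `run` are hypothetical. Each measurement also includes a small per-call lambda overhead, so absolute numbers will read slightly higher than `%timeit`.

```python
import importlib
import statistics
import timeit


def _make_dumps(name, mod):
    """Return an encoder that always produces str."""
    if name == "orjson":
        # orjson.dumps() returns bytes; decode so every library yields str.
        return lambda obj: mod.dumps(obj).decode()
    return lambda obj: mod.dumps(obj, ensure_ascii=False)


def load_codecs():
    """Return {name: (loads, dumps_to_str)} for the libraries that import."""
    codecs = {}
    for name in ("json", "orjson", "simdjson"):
        try:
            mod = importlib.import_module(name)
        except ImportError:
            continue  # library not installed; skip it
        codecs[name] = (mod.loads, _make_dumps(name, mod))
    return codecs


def run(doc: str, number: int = 10_000, repeat: int = 7):
    """Time a decode + encode round trip per library; returns µs per loop."""
    results = {}
    for name, (loads, dumps) in load_codecs().items():
        totals = timeit.repeat(lambda: dumps(loads(doc)), number=number, repeat=repeat)
        per_loop_us = [t / number * 1e6 for t in totals]
        results[name] = (statistics.mean(per_loop_us), statistics.stdev(per_loop_us))
    return results
```

Usage: `for name, (mean, std) in run(test_json_doc).items(): print(f"{name}: {mean:.1f} µs ± {std:.1f} µs")`.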
@phoerious

phoerious commented May 24, 2022

pysimdjson / cysimdjson are by far the fastest if you only need to parse documents and access a few individual keys (more than 2x faster than orjson). As soon as you convert the simdjson result to a dict or iterate over all keys, orjson is the faster option. simdjson.dumps() effectively does the latter (it walks every key), since it's only an alias for json.dumps(), so it's no surprise that orjson is faster in this benchmark. simdjson is primarily a parser, not a serializer, so measuring its serialization speed does not make for a meaningful performance comparison.
