@jcrist
Created January 23, 2023 17:14
A benchmark comparing init performance of various dataclass-like libraries
"""A quick benchmark comparing how quickly `__init__` with default values runs
for various dataclass-like libraries.

We also compare against the time it takes to initialize a `dict` or `tuple`
with the same data, as a "low-bar" for pure-python implementations.
"""

import timeit

import attrs
import dataclasses
import msgspec
import pydantic


@attrs.define
class BenchAttrs:
    a: int = 0
    b: str = ""
    c: list[int] = attrs.field(factory=list)
    d: set[int] = attrs.field(factory=set)
    e: dict[str, str] = attrs.field(factory=dict)


@dataclasses.dataclass
class BenchDataclass:
    a: int = 0
    b: str = ""
    c: list[int] = dataclasses.field(default_factory=list)
    d: set[int] = dataclasses.field(default_factory=set)
    e: dict[str, str] = dataclasses.field(default_factory=dict)


class BenchMsgspec(msgspec.Struct):
    a: int = 0
    b: str = ""
    c: list[int] = msgspec.field(default_factory=list)
    d: set[int] = msgspec.field(default_factory=set)
    e: dict[str, str] = msgspec.field(default_factory=dict)


class BenchPydantic(pydantic.BaseModel):
    a: int = 0
    b: str = ""
    c: list[int] = pydantic.Field(default_factory=list)
    d: set[int] = pydantic.Field(default_factory=set)
    e: dict[str, str] = pydantic.Field(default_factory=dict)


def bench_dict():
    return {"a": 0, "b": "", "c": [], "d": set(), "e": {}}


def bench_tuple():
    return (0, "", [], set(), {})
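

def bench(cls):
    """Return the mean time, in seconds, of a single `cls()` call."""
    timer = timeit.Timer("cls()", globals={"cls": cls})
    # autorange() picks a loop count and returns (loop count, total seconds)
    n, t = timer.autorange()
    return t / n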


def main():
    results = sorted(
        [
            ("attrs", bench(BenchAttrs)),
            ("dataclasses", bench(BenchDataclass)),
            ("msgspec", bench(BenchMsgspec)),
            ("pydantic", bench(BenchPydantic)),
            ("dict", bench(bench_dict)),
            ("tuple", bench(bench_tuple)),
        ],
        key=lambda r: r[1]
    )
    max_name_length = max(len(n) for n, _ in results)
    fastest = results[0][1]
    for i, (name, t) in enumerate(results):
        # Format time output
        if t < 1e-6:
            time = f"{t * 1e9:.1f} ns"
        elif t < 1e-3:
            time = f"{t * 1e6:.1f} μs"
        else:
            time = f"{t * 1e3:.1f} ms"
        padding = " " * (max_name_length - len(name))
        if i == 0:
            print(f"{name}:{padding} {time}")
        else:
            print(f"{name}:{padding} {time} ({t / fastest:.1f}x slower)")


if __name__ == "__main__":
    main()
jcrist commented Jan 23, 2023

Results:

$ python bench_init.py 
msgspec:     95.0 ns
tuple:       108.6 ns (1.1x slower)
dict:        180.5 ns (1.9x slower)
attrs:       316.5 ns (3.3x slower)
dataclasses: 371.3 ns (3.9x slower)
pydantic:    1.8 μs (19.2x slower)
  • msgspec structs initialize as fast as (or slightly faster than) allocating a tuple with the same data, and almost 2x faster than allocating a dict with the same data. Compared to other Python operations, allocating a new Struct instance is very cheap.
  • attrs slightly edges out dataclasses in this benchmark (316.5 ns vs. 371.3 ns). I haven't dug far enough into the internals of either to know why.
  • pydantic brings up the rear with a measurable slowdown compared to the other libraries (roughly 19x slower than msgspec).
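
One caveat when reading the dict and tuple baselines: in the script they are timed through the wrapper functions `bench_dict` and `bench_tuple`, so each measurement includes one Python-level function call on top of building the container. The sketch below is a rough way to see how much of the dict number comes from that call. It uses only `timeit`; the `per_call` helper is a hypothetical name introduced here, not part of the original script, and the absolute numbers will vary by machine and Python version.

```python
import timeit


def bench_dict():
    # Same wrapper as in the benchmark: one Python-level call plus the dict build.
    return {"a": 0, "b": "", "c": [], "d": set(), "e": {}}


def per_call(stmt, **names):
    """Mean seconds per execution of `stmt` (hypothetical helper)."""
    # autorange() returns (loop count, total seconds for that many loops)
    n, total = timeit.Timer(stmt, globals=names).autorange()
    return total / n


# Timed the way the benchmark does it: through the wrapper function.
via_function = per_call("f()", f=bench_dict)
# Timed as a bare expression, with no wrapper call.
literal_only = per_call('{"a": 0, "b": "", "c": [], "d": set(), "e": {}}')

print(f"dict via function: {via_function * 1e9:.1f} ns")
print(f"dict literal only: {literal_only * 1e9:.1f} ns")
```

If the two numbers differ noticeably, the gap between msgspec and the dict or tuple baselines is partly call overhead rather than allocation cost.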

That said, all of these operations are pretty speedy. Unless __init__ for these objects is showing up in profiling results, these operations likely aren't the bottleneck.
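
To check whether `__init__` actually shows up in profiling results, something like the following sketch could be used. It relies only on the standard-library `cProfile` and `pstats` modules; the `Point` class and `workload` function are hypothetical stand-ins for real application code. Note that the profiler adds overhead to every call, so the report is better read for relative share than for absolute timings.

```python
import cProfile
import dataclasses
import io
import pstats


@dataclasses.dataclass
class Point:  # hypothetical stand-in for a real model class
    a: int = 0
    b: str = ""
    c: list = dataclasses.field(default_factory=list)


def workload(n=100_000):
    # Hypothetical workload: create many instances and do a little work with each.
    total = 0
    for i in range(n):
        p = Point(a=i)
        total += p.a
    return total


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
# Restrict the report to entries whose name matches "__init__".
stats.print_stats("__init__")
report = out.getvalue()
print(report)
```

If the `__init__` rows account for only a small fraction of the cumulative time of the surrounding workload, switching libraries for init speed alone is unlikely to matter.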
