Jim Crist-Harif jcrist

@jcrist
jcrist / bench.py
Last active January 28, 2024 11:59
Vaex String benchmarks, updated with dask fixes
import vaex
import numpy as np
import dask.dataframe as dd
import dask
import dask.distributed
import json
import os
import time
import argparse
import multiprocessing
@jcrist
jcrist / bench.py
Last active January 11, 2024 14:34
A quick benchmark comparing msgspec (https://github.com/jcrist/msgspec), pydantic v1, and pydantic v2
"""A quick benchmark comparing the performance of:
- msgspec: https://github.com/jcrist/msgspec
- pydantic V1: https://docs.pydantic.dev/1.10/
- pydantic V2: https://docs.pydantic.dev/dev-v2/
The benchmark is modified from the one in the msgspec repo here:
https://github.com/jcrist/msgspec/blob/main/benchmarks/bench_validation.py
I make no claims that it's illustrative of all use cases. I wrote this up
@jcrist
jcrist / bench.py
Created November 28, 2023 21:47
A quick benchmark of msgspec vs mashumaro to clear up misconceptions in a flyte issue
import sys
import importlib.metadata
import timeit
from dataclasses import dataclass
import msgspec
import orjson
from mashumaro.codecs.json import JSONEncoder, JSONDecoder
from mashumaro.codecs.orjson import ORJSONEncoder, ORJSONDecoder
@jcrist
jcrist / pypi.ipynb
Created August 2, 2023 20:34
Analyzing PyPI data with Ibis
@jcrist
jcrist / altair_and_ibis.ipynb
Created August 1, 2023 16:58
A quick notebook demoing plotting in altair with ibis
@jcrist
jcrist / benchmark.py
Last active July 11, 2023 16:05
Benchmark of msgspec, orjson, pydantic, ... taken from Python discord
# This is a modified version of `orig_benchmark.py`, using different data to
# highlight performance differences.
import json
import random
import string
import timeit
from statistics import mean, stdev
import orjson
import simdjson
@jcrist
jcrist / msgspec_geojson.py
Last active July 8, 2023 06:03
A simple implementation of GeoJSON using msgspec
"""
A simple implementation of GeoJSON (RFC 7946) using msgspec
(https://jcristharif.com/msgspec/) for parsing and validation.
The `loads` and `dumps` methods work like normal `json.loads`/`json.dumps`,
but:
- Will result in high-level GeoJSON types
- Will error nicely if a field is missing or the wrong type
- Will fill in default values for optional fields
@jcrist
jcrist / example_msgspec.py
Created February 17, 2023 22:09
An example of using `msgspec`, mirroring the examples at https://github.com/ArjanCodes/2023-attrs
from datetime import date
from enum import StrEnum, auto
from typing import Annotated
from msgspec import Struct, Meta
class OrderStatus(StrEnum):
    OPEN = auto()
    CLOSED = auto()
@jcrist
jcrist / bench_init.py
Created January 23, 2023 17:14
A benchmark comparing init performance of various dataclass-like libraries
"""A quick benchmark comparing how quickly `__init__` with default values runs
for various dataclass-like libraries.
We also compare against the time it takes to initialize a `dict` or `tuple`
with the same data, as a "low-bar" for pure-python implementations.
"""
import timeit
import attrs
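
The preview above is cut off before the benchmark body. As a rough illustration of the idea in its docstring (timing `__init__` with default values against `dict` and `tuple` construction as a "low bar"), here is a minimal sketch using only the standard library. The classes `PointDC` and `PointNT` and the helper `bench` are hypothetical names made up for this sketch; they are not taken from the gist, and the third-party libraries the gist compares are left out.

```python
import timeit
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class PointDC:
    # Hypothetical example type: one required field, two with defaults.
    x: int
    y: int = 0
    z: int = 0


class PointNT(NamedTuple):
    # Same fields as PointDC, as a NamedTuple.
    x: int
    y: int = 0
    z: int = 0


def bench(label, func, number=100_000):
    """Return (label, microseconds per call), taking the best of 3 repeats."""
    best = min(timeit.repeat(func, number=number, repeat=3))
    return label, best / number * 1e6


# Use a variable rather than a literal so the tuple is actually built on
# each call instead of being folded into a constant by the compiler.
x = 1

results = [
    bench("dataclass", lambda: PointDC(x)),
    bench("namedtuple", lambda: PointNT(x)),
    bench("dict (low bar)", lambda: {"x": x, "y": 0, "z": 0}),
    bench("tuple (low bar)", lambda: (x, 0, 0)),
]

for label, us in results:
    print(f"{label:<16} {us:.3f} us/call")
```

The absolute numbers depend on the machine and Python version, so only the relative ordering is meaningful. Each lambda adds the same small call overhead to every row.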
@jcrist
jcrist / bench.py
Created January 31, 2022 20:58
A (naive) benchmark comparing pydantic & msgspec performance
"""
This benchmark is a modified version of the benchmark available at
https://github.com/samuelcolvin/pydantic/tree/master/benchmarks to support
benchmarking msgspec.
The benchmark measures the time to JSON encode/decode `n` random objects
matching a specific schema. It compares the time required for both
serialization _and_ schema validation.
"""
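
The docstring above describes timing JSON encode/decode of `n` random objects together with schema validation. A minimal standard-library sketch of that shape of benchmark follows. The `User` schema, the hand-written `__post_init__` checks, and the helper names are all assumptions made for illustration; the gist itself benchmarks pydantic and msgspec, which are not used here.

```python
import json
import random
import string
import timeit
from dataclasses import asdict, dataclass


@dataclass
class User:
    # Hypothetical schema, not the one used in the gist.
    name: str
    age: int
    email: str

    def __post_init__(self):
        # Hand-rolled type checks, standing in for a validation library.
        for field_name, expected in (("name", str), ("age", int), ("email", str)):
            if not isinstance(getattr(self, field_name), expected):
                raise TypeError(f"{field_name}: expected {expected.__name__}")


def random_user(rng):
    """Build one random object matching the User schema, as a plain dict."""
    name = "".join(rng.choices(string.ascii_lowercase, k=8))
    return {"name": name, "age": rng.randint(18, 90), "email": f"{name}@example.com"}


def make_payload(n, seed=0):
    """Return a JSON string holding a list of n random users."""
    rng = random.Random(seed)
    return json.dumps([random_user(rng) for _ in range(n)])


def decode_and_validate(payload):
    """Decode JSON and validate every item by constructing a User."""
    return [User(**item) for item in json.loads(payload)]


def encode(users):
    """Encode a list of User objects back to a JSON string."""
    return json.dumps([asdict(u) for u in users])


n = 1000
payload = make_payload(n)
users = decode_and_validate(payload)

# Time decoding plus validation, and encoding, over the same n objects.
decode_s = timeit.timeit(lambda: decode_and_validate(payload), number=10) / 10
encode_s = timeit.timeit(lambda: encode(users), number=10) / 10
print(f"decode + validate: {decode_s * 1e3:.2f} ms for {n} objects")
print(f"encode:            {encode_s * 1e3:.2f} ms for {n} objects")
```

Fixing the seed keeps the payload identical between runs, so timings from different implementations are compared on the same data.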