How to record data from Python quickly when pickle is too slow
- JSON xxxxxx
- jq is pretty fast
- Serde for Rust-level speed
- NB: use JSON Lines (one JSON object per line)
- CSV xxxx
- use pickle anyway xxx
- plus compression
- python-pickle in Haskell
- stream your processing so you don’t load everything into memory at once
- SQLite xxx
- use batched inserts, journal_mode=WAL, synchronous=NORMAL, temp_store=MEMORY and a large mmap_size
- HDF5 xxx
- but it’s tabular
- Arrow xxx
- Parquet xx
- but it’s tabular
- SSTables xx
- for minimizing disk size
- DuckDB x
- pandas x
- Vaex x
- Zarr x
- NumPy x
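
A rough sketch of the SQLite settings from the list above, using only the standard-library sqlite3 module. The table name `records`, the single `payload` column, the 256 MiB mmap size and the batch size of 10,000 are all made up for illustration; tune them for your own data.

```python
import sqlite3


def open_fast_db(path):
    """Open a SQLite database with write-friendly pragmas applied."""
    conn = sqlite3.connect(path)
    # Pragmas must be set before any transaction is open.
    conn.execute("PRAGMA journal_mode=WAL")      # write-ahead log
    conn.execute("PRAGMA synchronous=NORMAL")    # fewer fsyncs than FULL
    conn.execute("PRAGMA temp_store=MEMORY")     # temp tables/indices in RAM
    conn.execute("PRAGMA mmap_size=268435456")   # 256 MiB; arbitrary choice
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(id INTEGER PRIMARY KEY, payload TEXT)"
    )
    return conn


def insert_batched(conn, payloads, batch_size=10_000):
    """Insert payload strings in batches, one transaction per batch."""
    batch = []
    for payload in payloads:
        batch.append((payload,))
        if len(batch) >= batch_size:
            with conn:  # commits on success, rolls back on error
                conn.executemany(
                    "INSERT INTO records (payload) VALUES (?)", batch
                )
            batch.clear()
    if batch:  # flush the final partial batch
        with conn:
            conn.executemany(
                "INSERT INTO records (payload) VALUES (?)", batch
            )
```

Usage would look like `conn = open_fast_db("log.db")` followed by `insert_batched(conn, (str(x) for x in source))`. The point of batching is that each commit is comparatively expensive, so grouping many rows per transaction amortizes that cost.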
Side note: consider using compression!
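
One way to combine "use pickle anyway", "plus compression" and "stream your processing": write one pickle per record into a gzip stream, then read them back lazily. This is a minimal sketch using only the standard library; the function names and the choice of gzip at `compresslevel=1` are assumptions for illustration, and other codecs may suit your data better.

```python
import gzip
import pickle


def write_records(path, records):
    """Append each record as its own pickle inside one gzip stream."""
    # A low compression level trades file size for write speed.
    with gzip.open(path, "wb", compresslevel=1) as f:
        for rec in records:
            pickle.dump(rec, f, protocol=pickle.HIGHEST_PROTOCOL)


def read_records(path):
    """Yield records one at a time, without loading the whole file."""
    with gzip.open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:  # raised once the stream is exhausted
                return
```

Because `read_records` is a generator, downstream code can loop over it and only ever hold one record in memory. The usual caveat applies: only unpickle data you trust.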