
@gansanay
Last active June 26, 2023 12:25
Compare HDF5 and Feather performance (speed, file size) for storing / reading pandas dataframes
@Davidmenamm

Good analysis, thanks!

@fizban99

As of 2022, to_feather compresses data with LZ4 by default. Using HDF5 with blosc:lz4 at complevel 5 reaches a similar compression ratio. If you add strings into the mix, Feather's advantage is much less clear for big dataframes, especially in read times. See the modified version at https://github.com/fizban99/hdf_vs_feather/blob/main/hdf_vs_feather.ipynb

@Qoo0607

Qoo0607 commented Aug 13, 2022

Great analysis, thanks for sharing.
