
@klahrich
Created June 6, 2020 14:37
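from glob import glob

import pandas as pd
import vaex

# Step 1: export each CSV file to its own HDF5 chunk.
# 'folder/path/*.csv' is a placeholder pattern; point it at your CSV files.
for i, file in enumerate(sorted(glob('folder/path/*.csv'))):
    chunk = pd.read_csv(file)  # read one CSV at a time so memory stays bounded
    df_chunk = vaex.from_pandas(chunk, copy_index=False)
    df_chunk.export_hdf5(f'bigfile_part_{i}.hdf5')

# Step 2: open all chunks as one lazily concatenated DataFrame
# and combine them back into one big HDF5 file.
df = vaex.open('bigfile_part_*.hdf5')
df.export_hdf5('bigfile.hdf5')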