@klahrich
Last active June 5, 2020 14:39
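import pandas as pd
import vaex

# Step 1: export to hdf5 chunks
# Read the CSV one million rows at a time so it never has to fit in memory.
for i, chunk in enumerate(pd.read_csv('bigfile.csv', chunksize=1_000_000)):
    df_chunk = vaex.from_pandas(chunk, copy_index=False)
    df_chunk.export_hdf5(f'bigfile_part_{i}.hdf5')

# Step 2: Combine back into one big hdf5 file
# The glob pattern opens all part files as a single concatenated DataFrame.
df = vaex.open('bigfile_part_*.hdf5')
df.export_hdf5('bigfile.hdf5')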
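
One thing to watch with the part files from Step 1: if the glob matches are combined in lexicographic filename order (my understanding of how vaex.open treats a glob pattern, but treat that as an assumption), then bigfile_part_10.hdf5 sorts before bigfile_part_2.hdf5 and the rows of the combined file will not follow the CSV order once there are more than ten chunks. A minimal sketch of zero-padded names that keep both orders in agreement; `part_name` and its `width` parameter are hypothetical helpers, not part of vaex or pandas:

```python
def part_name(i, width=4):
    """Return a zero-padded part filename (hypothetical helper, not a vaex API)."""
    return f'bigfile_part_{i:0{width}d}.hdf5'


# Unpadded names: lexicographic order differs from numeric order.
unpadded = sorted(f'bigfile_part_{i}.hdf5' for i in (1, 2, 10))

# Padded names: lexicographic order matches numeric order.
padded = sorted(part_name(i) for i in (1, 2, 10))

print(unpadded)
print(padded)
```

In Step 1 this would mean writing each chunk with `df_chunk.export_hdf5(part_name(i))`; the glob pattern in Step 2 stays the same. If row order does not matter for your analysis, the original names are fine.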