@CMCDragonkai
Last active October 8, 2018 04:05
Dask Storing and Loading #dask #python
#!/usr/bin/env python3
import dask.array as da
# a dask array is a grid of individual numpy arrays (chunks)
# this lets you compute on numpy arrays that are too large to fit in memory
# so dask exposes IO functions for dask arrays
# that are intended to work on collections of numpy arrays
arrs = da.asarray([1,2,3])
# persisting as a directory of numpy arrays
# each chunk along the stacking axis is written as its own `.npy` file (here just `0.npy`)
# alongside an `info` file that keeps track of metadata
da.to_npy_stack('/tmp/np_stacks', arrs)
# when it loads it back, the data is loaded as memory mapped data
da.from_npy_stack('/tmp/np_stacks')
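# a minimal sketch of the round trip above with more than one chunk
# assumptions: a temporary directory is used instead of /tmp/np_stacks, and
# `stack_dir`, `source` and `loaded` are illustrative names only
import os
import tempfile
import numpy as np
import dask.array as da

stack_dir = tempfile.mkdtemp()
source = da.from_array(np.arange(6), chunks=3)  # two chunks of three elements each
da.to_npy_stack(stack_dir, source, axis=0)
# expect one `.npy` file per chunk plus the `info` metadata file
stack_files = sorted(os.listdir(stack_dir))
loaded = da.from_npy_stack(stack_dir)  # chunks are memory mapped when computed
roundtrip_ok = bool((loaded.compute() == np.arange(6)).all())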
# storing as an HDF5 file (requires h5py)
da.to_hdf5('/tmp/arrs.hdf5', {'/arrs': arrs})
# loading it back from HDF5 is more complicated
# this is because HDF5 is intended to be an entire filesystem in a single file
# https://cyrille.rossant.net/moving-away-hdf5/
# this means you need to pick and choose what inside the HDF5 file you want to load
import h5py
# open read-only and keep the file open for as long as the dask array is in use
f = h5py.File('/tmp/arrs.hdf5', 'r')
arrs = f['/arrs']
# chunks=arrs.shape loads the whole dataset as a single chunk
arrs = da.from_array(arrs, chunks=arrs.shape)
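# storing into a memory mapped object
# da.store writes each chunk into a target that supports numpy-style slice assignment
# the file name below is only an illustrative choice
import numpy as np
target = np.memmap('/tmp/arrs.dat', dtype=arrs.dtype, mode='w+', shape=arrs.shape)
da.store(arrs, target)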
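# dask now also supports zarr (requires the zarr package)
# the store path below is only an illustrative choice
da.to_zarr(arrs, '/tmp/arrs.zarr')
da.from_zarr('/tmp/arrs.zarr')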
# it has native support for loading a series of images
import dask.array.image as dam
dam.imread('tmp/*.jpg')
# it is assumed that all the images have the same shape and dtype
# the files matched by the glob are sorted, and the first one is read to get the shape and dtype
# each image then becomes exactly one chunk
# this is because many image formats (such as jpg and png) do not support random access or slicing/indexing
# so the entire image has to be loaded first
# thus this is designed for large collections of images that are all the same size
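# a minimal sketch of the one-chunk-per-file behaviour described above
# assumption: `.npy` files read with `np.load` stand in for real images, so that
# no image library is needed; `frame_dir`, `frames` and `stack` are illustrative names
import os
import tempfile
import numpy as np
import dask.array.image as dam

frame_dir = tempfile.mkdtemp()
frames = [np.full((4, 5), i, dtype=np.uint8) for i in range(3)]
for i, frame in enumerate(frames):
    np.save(os.path.join(frame_dir, 'frame_%d.npy' % i), frame)
# the `imread` argument swaps in a custom reader that is called once per matched file
stack = dam.imread(os.path.join(frame_dir, 'frame_*.npy'), imread=np.load)
# every file becomes one chunk along a new leading axis
stack_shape = stack.shape
stack_chunks = stack.chunks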