Last active
October 8, 2018 04:05
Dask Storing and Loading #dask #python
#!/usr/bin/env python3

import dask.array as da

# a dask array is a grid of individual numpy arrays (chunks)
# this lets you compute on arrays that are larger than memory
# dask therefore exposes IO functions for dask arrays
# that are intended to work on collections of numpy arrays

arrs = da.asarray([1, 2, 3])

# persisting as a directory of numpy arrays
# each chunk along the stacking axis is stored as its own file (`0.npy`, `1.npy`, ...)
# alongside an `info` file that keeps track of metadata
# here there is a single chunk, so the directory holds `0.npy` and `info`

da.to_npy_stack('/tmp/np_stacks', arrs)

# when it loads it back, the data is loaded as memory mapped data

stacked = da.from_npy_stack('/tmp/np_stacks')
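
# a minimal sketch of the same round trip with more than one chunk,
# to show how chunks map to files on disk
# assumptions: the temporary directory and the helper name `demo_npy_stack`
# are illustrative and not part of the original script

import os
import tempfile

import dask.array as da


def demo_npy_stack():
    x = da.arange(6, chunks=3)           # two chunks of three elements each
    dirname = tempfile.mkdtemp()         # throwaway directory for the stack
    da.to_npy_stack(dirname, x, axis=0)  # writes one .npy file per chunk along axis 0
    files = sorted(os.listdir(dirname))  # list what was written
    y = da.from_npy_stack(dirname)       # lazily memory-maps the files back
    return files, y.chunks, y.compute().tolist()


stack_files, stack_chunks, stack_values = demo_npy_stack()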

# storing as HDF5 (requires h5py)
# the relative `tmp/` directory must already exist

da.to_hdf5('tmp/arrs.hdf5', {'/arrs': arrs})

# loading it back from HDF5 is more complicated
# this is because HDF5 is intended to be an entire filesystem in a single file
# https://cyrille.rossant.net/moving-away-hdf5/
# this means you need to pick which dataset inside the HDF5 file you want to load

import h5py

f = h5py.File('tmp/arrs.hdf5', 'r')  # open read-only; keep `f` open while the dask array is in use
dset = f['/arrs']
arrs = da.from_array(dset, chunks=dset.shape)  # a single chunk spanning the whole dataset
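
# a minimal sketch of how the `chunks` argument of `da.from_array` shapes the result
# assumptions: a plain numpy array stands in for the h5py dataset so this runs
# without an HDF5 file, and the helper name `demo_from_array_chunks` is illustrative
# passing the full shape gives a single chunk, smaller chunks give lazy partial reads

import numpy as np

import dask.array as da


def demo_from_array_chunks():
    data = np.arange(12).reshape(4, 3)
    one_chunk = da.from_array(data, chunks=data.shape)  # whole array in one chunk
    row_pairs = da.from_array(data, chunks=(2, 3))      # two chunks of two rows each
    tail_sum = int(row_pairs[2:].sum().compute())       # only touches the second chunk
    return one_chunk.chunks, row_pairs.chunks, tail_sum


single_chunks, paired_chunks, tail_sum = demo_from_array_chunks()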
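
# storing into a memory mapped object
# da.store writes each chunk into a target that supports numpy-style slice assignment
# the `/tmp/arrs.npy` path is only a placeholder for this example

import numpy as np

target = np.lib.format.open_memmap('/tmp/arrs.npy', mode='w+', dtype=arrs.dtype, shape=arrs.shape)
da.store(arrs, target)
target.flush()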
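
# dask also supports zarr (requires the zarr package)
# the `/tmp/arrs.zarr` path is only a placeholder for this example

da.to_zarr(arrs, '/tmp/arrs.zarr')
arrs = da.from_zarr('/tmp/arrs.zarr')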

# it has native support for loading a series of images

import dask.array.image as dam

images = dam.imread('tmp/*.jpg')

# it is assumed that all the images have the same shape and dtype
# both are taken from the first image in the sorted glob, and each image becomes one chunk
# this is because many image formats (jpg and png, for example) do not support random access or slicing/indexing
# so each image has to be loaded in its entirety
# thus this is designed for large collections of images that are all the same size
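
# a minimal sketch of the one-image-per-chunk layout described above
# assumptions: `.npy` files and `np.load` stand in for real images and an image
# reader (passed through the `imread=` argument) so that no image library is needed,
# and the helper name `demo_imread_chunks` is illustrative

import os
import tempfile

import numpy as np

import dask.array.image as dam


def demo_imread_chunks():
    dirname = tempfile.mkdtemp()
    for i in range(3):
        # three fake 4x5 "images", each filled with its own index
        np.save(os.path.join(dirname, 'img%d.npy' % i), np.full((4, 5), i, dtype=np.uint8))
    stack = dam.imread(os.path.join(dirname, '*.npy'), imread=np.load)
    first_row_of_last = stack[2].compute().tolist()[0]  # reads only the third file
    return stack.shape, stack.chunks, first_row_of_last


img_shape, img_chunks, img_row = demo_imread_chunks()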