@martindurant
Last active January 18, 2017 15:57
@martindurant
Author

@shoyer, @alimanfoo
Since I have been working on fastparquet as standard storage for tabular data, I am also thinking about a standard format for array data for dask. netCDF and HDF are good legacy archival formats, but they don't play nicely with parallel access across a cluster or with reading from an archive store like S3. zarr is certainly non-standard, but it would make a very nice internal store for intermediates. This gist is a simple motivator, showing that we could use zarr not only for dask but for xarray too, without too much effort.

@martindurant
Author

@mrocklin, updated with some real data.
