@nbren12
Created May 31, 2020 20:51

rabernat commented Jun 1, 2020

@nbren12, sorry if I came off as overly critical today. It's really an excellent post about an important use case. Sometimes I have to wear the hat of Pangeo PR manager, which means I'm [overly] sensitive to the language used to describe the things we are working on.

Here are some specific examples of how you might rephrase a few sentences more optimistically. "Fail" in particular is, IMO, a very strong word that should be used sparingly, as it carries quite negative connotations.

- In my opinion, we have failed to take this concrete step...
+ In my opinion, we have not yet been able to take this concrete step...
- ...but can fail for larger datasets
+ ...but can perform poorly with larger datasets

One particular problem is that dask reshaping, the tool used by xarray's stacking routines, will not respect the boundaries of individual chunks.

Here I would definitely link out to dask/dask#5544. I believe we can really get this fixed.
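
For readers who want to see the chunking behavior for themselves, here is a minimal sketch that uses plain `dask.array` rather than xarray's stacking routines. The array shape and chunk sizes are made up for illustration, and the resulting chunks may differ between dask versions, so the example only prints them.

```python
import dask.array as da

# Hypothetical (time, y, x) array, chunked along every dimension.
arr = da.zeros((4, 6, 6), chunks=(2, 3, 3))

# Collapse the two spatial dimensions into a single axis, which is
# roughly what stacking y and x into one dimension requires.
flat = arr.reshape((4, 36))

# Compare the chunk layout before and after the reshape.
print("input chunks: ", arr.chunks)
print("output chunks:", flat.chunks)
```

If the output chunks along the merged axis do not line up with the original spatial chunks, dask had to rechunk the data to perform the reshape.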

More thoughts soon...


rabernat commented Jun 2, 2020

I thought carefully about the Zarr issue, and I think you're absolutely right that Zarr is not the right fit for your output data. My comments about potential support for uneven chunks are not really relevant. We should absolutely be using Parquet for this. Perhaps you could go one step further and actually use Parquet in the post?

In order to do this, we would just have to convert to a dask dataframe. Have you tried that?
