Skip to content

Instantly share code, notes, and snippets.

@amotl
Created February 27, 2023 21:36
Show Gist options
  • Save amotl/45819df4a635a79a32540935cc811e95 to your computer and use it in GitHub Desktop.
Save amotl/45819df4a635a79a32540935cc811e95 to your computer and use it in GitHub Desktop.
Lazily access ECMWF's ERA5 data via xarray from Planetary Computer's SpatioTemporal Asset Catalog (STAC) API. The data is physically stored in Zarr format on Azure Blob Filesystem (ABFS).
"""
About
=====
Lazily access ECMWF's ERA5 data via xarray from Planetary Computer's
SpatioTemporal Asset Catalog (STAC) API. The data is physically stored in Zarr
format on Azure Blob Filesystem (ABFS).
Setup
=====
pip install adlfs dask planetary-computer pystac-client xarray zarr
References
==========
- https://github.com/earthobservations/wetterdienst/issues/871#issuecomment-1447045874
- https://planetarycomputer.microsoft.com/dataset/era5-pds
- https://github.com/davemlz/cubo
"""
import pystac_client
import planetary_computer
import xarray as xr
from pystac import ItemCollection, Item
def stac_open_collection(name: str, **kwargs) -> ItemCollection:
catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1/")
search = catalog.search(collections=[name], **kwargs)
return search.item_collection()
def stacitem_to_dataset(item: Item) -> xr.Dataset:
signed_item = planetary_computer.sign(item)
datasets = [
xr.open_dataset(asset.href, **asset.extra_fields["xarray:open_kwargs"]) for asset in signed_item.assets.values()
]
return xr.combine_by_coords(datasets, join="exact")
if __name__ == "__main__":
collection = stac_open_collection("era5-pds", datetime="1980-01")
ds = stacitem_to_dataset(collection[0])
print(ds)
@amotl
Copy link
Author

amotl commented Feb 27, 2023

The shape of the xarray.Dataset looks like this:

<xarray.Dataset>
Dimensions:                                                                                   (
                                                                                               time: 744,
                                                                                               lat: 721,
                                                                                               lon: 1440,
                                                                                               nv: 2)
Coordinates:
  * lat                                                                                       (lat) float32 ...
  * lon                                                                                       (lon) float32 ...
  * time                                                                                      (time) datetime64[ns] ...
Dimensions without coordinates: nv
Data variables:
    air_temperature_at_2_metres_1hour_Maximum                                                 (time, lat, lon) float32 dask.array<chunksize=(372, 150, 150), meta=np.ndarray>
    time1_bounds                                                                              (time, nv) datetime64[ns] dask.array<chunksize=(744, 2), meta=np.ndarray>
    air_temperature_at_2_metres_1hour_Minimum                                                 (time, lat, lon) float32 dask.array<chunksize=(372, 150, 150), meta=np.ndarray>
    integral_wrt_time_of_surface_direct_downwelling_shortwave_flux_in_air_1hour_Accumulation  (time, lat, lon) float32 dask.array<chunksize=(372, 150, 150), meta=np.ndarray>
    precipitation_amount_1hour_Accumulation                                                   (time, lat, lon) float32 dask.array<chunksize=(372, 150, 150), meta=np.ndarray>
Attributes:
    institution:  ECMWF
    source:       Reanalysis
    tilte:        ERA5 forecasts
    title:        ERA5 forecasts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment