Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save denis-bz/4eccc086f7f9b2c3bd292e345d604835 to your computer and use it in GitHub Desktop.
Save denis-bz/4eccc086f7f9b2c3bd292e345d604835 to your computer and use it in GitHub Desktop.
Introduction to Netherlands wind data with xarray

Introduction to Netherlands wind data with xarray

Keywords: Netherlands, wind, wind speed, KNMI, python, xarray, data-carpentry

Purpose: introduce Netherlands wind data from KNMI, the Royal Netherlands$ Meteorological Institute, with the steps:

  1. download 2 \.nc files by hand, one day and one gridpoint
  2. look at them in xarray
  3. programs to download data near given lat / lon coordinates

Who for: people who know python / IPython well, numpy well, and some pandas or xarray .

Data files on KNMI: by day, or on a NL grid

The Netherlands is covered by a "NL grid" of points about 2 km apart, 234 north-south x 217 east-west = 50778 gridpoints, on a Lambert conformal map. KNMI has collections of files in NetCDF (.nc) format, organized either

  1. by day, an .nc file with data for the whole NL grid on one day
  2. or by gridpoint, an .nc file with timeseries for e.g. 2019 .. 2021 at one point. (The NETS50 by-day files are about 240 MB, the by-gridpoint files about 18 MB.)

Download one daily and one NL grid file, and look at them in xarray

Start by downloading and looking at 2 of the WINS50 \.nc files. First mkdir e.g. .../nl/data; cd .../nl/data; pip install xarray. Edit a short file, say nl-knmi-data.notes, with who, what, when.

Go to https://dataplatform.knmi.nl/group/wind in your browser: "49 datasets found". Some have identical titles, hmm. ("wins50-ctl-*" may be for internal use, don't know.) Hover on "NetCDF" to show a link, click to go there for a longer description, and download individual \.nc files.

  1. download 1 day, whole NL grid: https://dataplatform.knmi.nl/dataset/wins50-wfp-nl-daily-3d-1
    Click "Toggle browser".
    Click the "download" button on the first file:
    WINS50_43h21_fERA5_WFP_ptA_NETHERLANDS.NL_20190101.nc

  2. download 1 gridpoint, timeseries for 2019 .. 2021: https://dataplatform.knmi.nl/dataset/wins50-wfp-nl-ts-singlepoint-3
    Click "Toggle browser".
    Click the "download" button on the first file:
    WINS50_43h21_fERA5_WFP_ptA_NETHERLANDS.NL_ix001_iy001_2019010100-2022010100.nc

(Where's my downloaded file ? Depends on browser settings -- if you don't see it, try ls ~/*/WINS50* and mv it to .../data/.)

Look at these .nc files in xarray

import numpy as np
import xarray as xr

    # limit long prints
xr.set_options( display_expand_attrs=False, display_values_threshold=10, display_width=120 )
np.set_printoptions( edgeitems=2, threshold=10, linewidth=120, formatter=dict(
        float = lambda x: "%.3g" % x ))

    # the whole NL grid on 2019-01-01 --
ncday = "data/WINS50_43h21_fERA5_WFP_ptA_NETHERLANDS.NL_20190101.nc"
xday = xr.open_dataset( ncday )
print( f"\n-- {ncday} -- \n{xday} \n" )
    # Dimensions: (time: 24, y: 234, x: 217, height: 17)
    # i.e.               24h x 234 lats x 217 lons x 17 heights


	# timeseries 2019 .. 2021 at one point on the grid --
ncpt = "data/WINS50_43h21_fERA5_WFP_ptA_NETHERLANDS.NL_ix001_iy001_2019010100-2022010100.nc"
xpt = xr.open_dataset( ncpt )
print( f"\n-- {ncpt} -- \n{xpt} \n" )
    # Dimensions: (time: 26305, y: 1, x: 1, height: 17)
    # i.e. (365 + 366 + 365) * 24 + 1 hours (the last is 2022-01-01, not 2021-12-31)


    # Look at the wspeed DataArray --
    # (pandas Series (column), DataFrame = several columns  correspond roughly to
    # xarray  DataArray,       Dataset = several DataArrays)
wspeed = xpt.wspeed.squeeze()
print( f"\n-- xpt.wspeed: {wspeed.shape} \n{wspeed.coords} \n" )

    # weekly average windspeeds --
avwspeed = (wspeed
        .sel( height=150 )
        .resample( time="W" )  # wspeed.resample? is help: "D" daily ...
        .mean().squeeze()
        )
print( f"\n-- monthly average windspeeds: {avwspeed.values}" )
avwspeed.plot()

# in IPython or notebook --
# wspeed? is help
# wspeed.<tab> lists 100 fields and functions: .x .y .lat .lon .plot ...
# wspeed.plot? is help ...

#   If you're comfortable with pandas, start with that:
#   df = wspeed.to_dataframe()
#   or with numpy: A = wspeed.to_numpy()  ~= wspeed.values

Python functions to download KNMI .nc files

These are work in progress. (Programs go through stages: 1 user, 2 users ... more users, or no users. These are at stage 1.)

""" load_kmni.py
In: a .csv with columns ix iy
Do: download KNMI WINS50*_ix*_iy*.nc to outdir for each ix iy
        looking first in a list of dirs
    trim_xds e.g. only wspeed, resample 3H to e.g. 1D
Out: data/nc/*.nc in bigdir
    e.g. data/wspeed/*.nc in outdir
    see trim_xds( xds, var=var, resample=resample )
"""
    # download a single .nc
	# looking first in outdir or dirs=". $data data ../data ~/.data" with `findfile`
def load_knmi_ixiy( ix=217, iy=234, outdir="data", **kw ) -> "outdir/WINS50*_ix*_iy*.nc":
def load_knmi_day( y=2019, m=1, d=1, outdir="data", **kw ) -> "e.g. outdir/WINS50_*20190101.nc":

Lat, lon ⟶ nearby lat, lon and ix, iy on the NL grid

Say we want timeseries at a list of 100 lat, lon points which are not exactly on the NL grid of 50778 points. We have a .csv file of all the gridpoints like this:

   lat    lon  ix  iy
50.421 1.3113   1   1
...
55.656 1.4797   1 234
...
 55.27 10.012 217 234

which can be used like this:

""" near_latlon.py 
In: two csv files or DataFrames, both with columns "lat" and "lon"
    call them Query and Lookup
Do:
    for each row in Query
        near = the row in Lookup with the lat lon nearest query
        write pd.concat near + query rows  -- keep only the query lat lon
Out:
    write_csv with "# header"
    len(Query) rows, ncol(Query) + ncol(Lookup) - 2 columns
Example:
    neardf = near_latlon( querycsv, bigcsv with cols lat lon ix iy )
    ix, iy = neardf.ix, neardf.iy  # for load_kmni_ixiy
Notes:
    Query is usually small, Lookup big,
    but they can both be say 1M points -- scipy KDTree is fast.

Notes

If you want only say wspeed timeseries at many gridpoints, consider downloading all of them, even if that takes 24 or 48 hours, and writing small *-wspeed.nc to local disk. This is easy, and may save your time.

xkcd-automation

Links

WINS50, Winds of the North Sea in 2050
DOWA, Dutch Offshore Wind Atlas
Dask ?
Windspeeds-Netherlands-wikipedia-KNMI-GWA under gist.github

Comments welcome

cheers
— denis-bz-py t-online.de 2022-08-30 August


The purpose of computing is insight, not numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment