Keywords: Netherlands, wind, wind speed, KNMI, python, xarray, data-carpentry
Purpose: introduce Netherlands wind data from KNMI, the Royal Netherlands$ Meteorological Institute, with the steps:
- download 2
\.nc
files by hand, one day and one gridpoint - look at them in
xarray
- programs to download data near given lat / lon coordinates
Who for: people who know python / IPython well, numpy well, and some pandas or xarray .
The Netherlands is covered by a "NL grid" of points about 2 km apart,
234 north-south x 217 east-west = 50778 gridpoints, on a Lambert conformal map.
KNMI has collections of files in NetCDF (.nc
) format, organized either
- by day, an
.nc
file with data for the whole NL grid on one day - or by gridpoint, an
.nc
file with timeseries for e.g. 2019 .. 2021 at one point. (The NETS50 by-day files are about 240 MB, the by-gridpoint files about 18 MB.)
Start by downloading and looking at 2 of the WINS50 \.nc
files.
First mkdir
e.g. .../nl/data; cd .../nl/data; pip install xarray
.
Edit a short file, say nl-knmi-data.notes
, with who, what, when.
Go to https://dataplatform.knmi.nl/group/wind in your browser: "49 datasets found".
Some have identical titles, hmm.
("wins50-ctl-*" may be for internal use, don't know.)
Hover on "NetCDF" to show a link,
click to go there for a longer description, and download individual \.nc
files.
-
download 1 day, whole NL grid: https://dataplatform.knmi.nl/dataset/wins50-wfp-nl-daily-3d-1
Click "Toggle browser".
Click the "download" button on the first file:
WINS50_43h21_fERA5_WFP_ptA_NETHERLANDS.NL_20190101.nc -
download 1 gridpoint, timeseries for 2019 .. 2021: https://dataplatform.knmi.nl/dataset/wins50-wfp-nl-ts-singlepoint-3
Click "Toggle browser".
Click the "download" button on the first file:
WINS50_43h21_fERA5_WFP_ptA_NETHERLANDS.NL_ix001_iy001_2019010100-2022010100.nc
(Where's my downloaded file ? Depends on browser settings --
if you don't see it, try ls ~/*/WINS50*
and mv
it to .../data/
.)
import numpy as np
import xarray as xr
# limit long prints
xr.set_options( display_expand_attrs=False, display_values_threshold=10, display_width=120 )
np.set_printoptions( edgeitems=2, threshold=10, linewidth=120, formatter=dict(
float = lambda x: "%.3g" % x ))
# the whole NL grid on 2019-01-01 --
ncday = "data/WINS50_43h21_fERA5_WFP_ptA_NETHERLANDS.NL_20190101.nc"
xday = xr.open_dataset( ncday )
print( f"\n-- {ncday} -- \n{xday} \n" )
# Dimensions: (time: 24, y: 234, x: 217, height: 17)
# i.e. 24h x 234 lats x 217 lons x 17 heights
# timeseries 2019 .. 2021 at one point on the grid --
ncpt = "data/WINS50_43h21_fERA5_WFP_ptA_NETHERLANDS.NL_ix001_iy001_2019010100-2022010100.nc"
xpt = xr.open_dataset( ncpt )
print( f"\n-- {ncpt} -- \n{xpt} \n" )
# Dimensions: (time: 26305, y: 1, x: 1, height: 17)
# i.e. (365 + 366 + 365) * 24 + 1 hours (the last is 2022-01-01, not 2021-12-31)
# Look at the wspeed DataArray --
# (pandas Series (column), DataFrame = several columns correspond roughly to
# xarray DataArray, Dataset = several DataArrays)
wspeed = xpt.wspeed.squeeze()
print( f"\n-- xpt.wspeed: {wspeed.shape} \n{wspeed.coords} \n" )
# weekly average windspeeds --
avwspeed = (wspeed
.sel( height=150 )
.resample( time="W" ) # wspeed.resample? is help: "D" daily ...
.mean().squeeze()
)
print( f"\n-- monthly average windspeeds: {avwspeed.values}" )
avwspeed.plot()
# in IPython or notebook --
# wspeed? is help
# wspeed.<tab> lists 100 fields and functions: .x .y .lat .lon .plot ...
# wspeed.plot? is help ...
# If you're comfortable with pandas, start with that:
# df = wspeed.to_dataframe()
# or with numpy: A = wspeed.to_numpy() ~= wspeed.values
These are work in progress. (Programs go through stages: 1 user, 2 users ... more users, or no users. These are at stage 1.)
""" load_kmni.py
In: a .csv with columns ix iy
Do: download KNMI WINS50*_ix*_iy*.nc to outdir for each ix iy
looking first in a list of dirs
trim_xds e.g. only wspeed, resample 3H to e.g. 1D
Out: data/nc/*.nc in bigdir
e.g. data/wspeed/*.nc in outdir
see trim_xds( xds, var=var, resample=resample )
"""
# download a single .nc
# looking first in outdir or dirs=". $data data ../data ~/.data" with `findfile`
def load_knmi_ixiy( ix=217, iy=234, outdir="data", **kw ) -> "outdir/WINS50*_ix*_iy*.nc":
def load_knmi_day( y=2019, m=1, d=1, outdir="data", **kw ) -> "e.g. outdir/WINS50_*20190101.nc":
Say we want timeseries at a list of 100 lat, lon points which are not exactly on the NL grid of 50778 points. We have a .csv file of all the gridpoints like this:
lat lon ix iy
50.421 1.3113 1 1
...
55.656 1.4797 1 234
...
55.27 10.012 217 234
which can be used like this:
""" near_latlon.py
In: two csv files or DataFrames, both with columns "lat" and "lon"
call them Query and Lookup
Do:
for each row in Query
near = the row in Lookup with the lat lon nearest query
write pd.concat near + query rows -- keep only the query lat lon
Out:
write_csv with "# header"
len(Query) rows, ncol(Query) + ncol(Lookup) - 2 columns
Example:
neardf = near_latlon( querycsv, bigcsv with cols lat lon ix iy )
ix, iy = neardf.ix, neardf.iy # for load_kmni_ixiy
Notes:
Query is usually small, Lookup big,
but they can both be say 1M points -- scipy KDTree is fast.
If you want only say wspeed
timeseries at many gridpoints,
consider downloading all of them, even if that takes 24 or 48 hours,
and writing small *-wspeed.nc
to local disk.
This is easy, and may save your time.
WINS50, Winds of the North Sea in 2050
DOWA, Dutch Offshore Wind Atlas
Dask ?
Windspeeds-Netherlands-wikipedia-KNMI-GWA
under gist.github
cheers
— denis-bz-py t-online.de 2022-08-30 August
The purpose of computing is insight, not numbers.