denis-bz/0-Cosmopy.md

## 0-Cosmopy.md

      
    Raw
  

              0-Cosmopy.md
            
          
    Cosmopy: clip and regrid COSMO_REA6 wind data to python numpy

Keywords: Cosmo_rea6 DWD wind-data python numpy
Purpose:
DWD, the German Meteorological Service,
has wind data for Europe on a 6 x 6 km grid, from 1995 to August 2019, hourly,
at heights 10 100 .. 200m.
These are in 2 gigabyte files like 

https://opendata.dwd.de/climate_environment/REA/COSMO_REA6/converted/hourly/2D/WS_100m.2D.201908.nc4 

-- windspeed data at 100m for 2019 August.
Cosmopy is a bare-bones package to

clip out small areas of these files, e.g. Bayern, to numpy .npz files
regrid the staggered Cosmo grid to a WGS84 uniform lat-lon grid, e.g. 0.05° x 0.05°.

Its design point: small programs with lots of print statements, verbose= .
Of course, if you're happy with a particular package, be happy.
But if you value clarity above raw speed,
and like small solid packages, and like numpy, read on.
Gallery


The flow: download, clip, regrid

Files are generated by 3 shell scripts, wget-cosmo clip-cosmo regrid-cosmo,
which call python;
see their help text, clip-cosmo -h etc.
After downloading wih wget-cosmo, clip the gigabyte files to the few percent you want to look at.
Then if need be, regrid to a uniform WGS84 grid.
Example files in this flow:
wget-cosmo:   $cosmodata/raw/WS_150m.2D.201401.nc4    -- monthly 01 .. 12, 1.5 GB each
clip-cosmo:   $cosmodata/BY-clip/WS_150m.2D.2014.npz  -- numpy savez: flexible, can memmap
regrid-cosmo: $cosmodata/BY-grid/WS_150m.2D.2014.npz  -- 0.05° x 0.05° grid to nearest cosmo

Here BY is an area name, see "Parameters" below.
The COSMO_REA6 grid

If I understand correctly,
the data in opendata.dwd.de/climate_environment/REA/COSMO_REA6/.../*.nc4
is on a staggered grid, about 6 x 6 km.
The 2d RLAT RLON in those files are in an odd rotated coordinate system
which as far as I know has no CRS / no proj string.
Ignore them, and use only the RLAT RLON in
.../constant/COSMO_REA6_CONST_withOUTsponge.nc .
To clip a particular N S W E box out of the gigabyte COSMO_REA6 files, use straight numpy,
see clip2d.py.  For example, 48.5° .. 48° x 11.25° x 12°:
mask: 83 points in (10, 10)
[[0 0 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 0]
 [1 1 1 1 1 1 1 1 1 1]
 [0 1 1 1 1 1 1 1 1 1]
 [0 1 1 1 0 0 0 0 0 0]]

To find the j,k with RLAT[j,k], RLON[j,k] nearest say 48.137 N, 11.575 E,
use a KDTree, see kdtree.py also near_latlon.py.
"Nearest" means nearest in the Cosmo staggered grid, in the Euclidean metric (cos-shrink ?)
For example,
wgslat wgslon cosmoj cosmok cosmolat cosmolon
    48     12    378    443   47.973   11.969

Then wind_speed[:,378,443] in a (720, 824, 848) (time, lat, lon) array
is the timeseries at the Cosmo gridpoint nearest 48° N, 12° E.
Interpolation

Power from windmills is highly sensitive to peak windspeeds.
Interpolating timeseries usually smooths, lowers peaks, which would bias windpower estimates --
don't want to do that.
For example, consider two hourly timeseries at points 6 km apart with 6 km/h wind:
6  0  6  0 ... m/s: some windpower
0  6  0  6 ...
3  3  3  3 ... interpolate, average: 0 windpower

(Wind turbines generate no electricity below about 3 m/s).
I regrid the Cosmo staggered grid to wgs84 with Kdtree, with each timeseries unchanged: 

wgs84 gridpoint ⟶   the nearest staggered gridpoint ⟶   the windspeed timeseries there. 

Yes, nearest-neighbor is blocky, and several points on the uniform wgs84 grid
will get the same Cosmo timeseries.
Comments welcome.

Files: xarray ⟶   Netcdf4 ⟶   hdf5 vs. numpy .npz

Netcdf4 files are afaik hdf5 files, which can be really messy.
h5stat .../cosmo/...nc4 in h5py shows a multilevel tree with compressed blocks --
complicated and slow.
Xarray dask zarr etc. can't even see the complicated hdf5 way down below,
so cannot possibly speed it up.
What I really want is a clean path to numpy -- solid,
with a well-defined algebra of primitives (slicing, views, memmap).
A file tree of .npz files < 1 GB is, for me,
simpler and faster than messing with xarray etc.; ymmv.
See 

numpy indexes small amounts of data 1000 faster than xarray
(2019) 

Cyrille Rossant moving-away-hdf5 (2016) 

my little testbench to time various paths numpy <--> xarray <--> read/write
Notes

Choice of software depends on what you really want to do, what you know / who you know,
learn time.
Last not least, it's a matter of taste:

Less is more 

More is more

Comments welcome

cheers 

— denis-bz-py t-online.de  updated 2024-04-23 April