Organization: PEcAn Project
Student: Swarnalee Mazumder
Mentors: Istem Fer
PEcAn is an open-source ecoinformatics toolbox. It provides a set of workflows that wrap around process-based ecosystem models and manage the flow of information into and out of them. PEcAn currently has 17 meteorological drivers available, including Ameriflux, ERA5 and CMIP5, which supply input data to various models. ECMWF Open Data provides globally available ensemble forecasts that can be used to drive the models available in PEcAn. The aim of this project was to extend PEcAn with an ECMWF weather forecast data ingestion pipeline. Such a pipeline makes meteorological forecast data available as input variables to the ecosystem models, so that the models can be run into the future. The project focused on the 15-day ensemble forecast, which starts on the day the forecast is issued and extends 360h into the future.
The first phase focused on building the pipeline to download ECMWF Open Data. There were two ways to integrate the download into PEcAn: (1) directly querying the URL of the intended file in R (e.g., using `httr`), or (2) using the ECMWF Open Data Python package `ecmwf-opendata`. The forecast datasets are provided in GRIB2 format. The problem with the first approach was that it does not allow subsetting the datasets. The second approach gave more flexibility to select only the parameters of interest, but the files could only be downloaded at global scale.
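As a rough sketch of the second approach, a request to the `ecmwf-opendata` client might be assembled as shown below. The helper names `build_request` and `download_step`, the parameter short names, and the chosen step are illustrative assumptions; they are not taken from `download_ECMWF.py`.

```python
def build_request(step, params, forecast_type="pf", time=0):
    """Assemble keyword arguments for one ensemble forecast request."""
    return {
        "time": time,            # forecast run hour (00 UTC here)
        "stream": "enfo",        # ensemble forecast stream
        "type": forecast_type,   # "cf" = control, "pf" = perturbed members
        "step": step,            # lead time in hours
        "param": list(params),   # only the parameters of interest
    }


def download_step(step, params, target):
    """Download one global GRIB2 file.

    Requires the ecmwf-opendata package and network access.
    """
    # Imported lazily so build_request() works without the package installed.
    from ecmwf.opendata import Client

    client = Client(source="ecmwf")
    client.retrieve(target=target, **build_request(step, params))
    return target


# Hypothetical request: 2 m temperature and total precipitation at +24h.
request = build_request(step=24, params=["2t", "tp"])
print(request)
```

Keeping the request in a plain dictionary makes the subsetting choice (which parameters, which forecast type) easy to inspect before any file is fetched.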
ECMWF divides the 15-day (360h) forecast data into two sets. The first set covers 0h to 144h at a 3h time step and the second set covers 150h to 360h at a 6h time step. This gives a total of 85 GRIB2 files: 48 at 3h and 37 at 6h.
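The list of forecast steps described above can be generated with a few lines of Python. This is only a sketch; the function name `forecast_steps` and its arguments are hypothetical.

```python
def forecast_steps(short_end=144, long_end=360, short_step=3, long_step=6):
    """Return lead times in hours: 3-hourly up to 144h, then 6-hourly up to 360h."""
    first = list(range(0, short_end + 1, short_step))
    second = list(range(short_end + long_step, long_end + 1, long_step))
    return first + second


steps = forecast_steps()
print(len(steps), steps[0], steps[-1])  # prints: 85 0 360
```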
The main R function that downloads the ECMWF Open Data is `download.ECMWF.R`, which leverages `download_ECMWF.py` under the hood.
Each downloaded GRIB2 file contained step-wise forecast information divided into the control forecast (`cf`, 1 layer) and the perturbed forecast (`pf`, 50 layers or ensemble members). During the second phase, I worked on manipulating the downloaded GRIB2 files. The main goals at this stage were to:
- Extract the `cf` and `pf` datasets from the 85 GRIB2 files
- Insert NaN at the missing intermediate 3h steps in the `150h to 360h` set to obtain a uniform time step of 3h across all 121 (48 + 37*2 - 1) observations
- Convert the variables to the Climate and Forecast (CF) Metadata Conventions standard
- Extract latitude/longitude (site-specific) data from the global files
- Write PEcAn standard netCDF files

The final files were:
- 1 `cf` netCDF file consisting of 15-day forecast time series data (121 observations) at a 3-hour step
- 50 ensemble-wise `pf` netCDF files, each consisting of the 15-day forecast (121 observations) at a 3-hour step
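The NaN-filling step can be illustrated with plain Python. The sketch below is not the code in `met2CFutils.py`; the function name `to_uniform_3h` is hypothetical and the input series is a placeholder. It puts a series that is 3-hourly up to 144h and 6-hourly afterwards onto a uniform 3-hourly axis, inserting NaN wherever no forecast exists.

```python
import math


def to_uniform_3h(steps, values, last_step=360, dt=3):
    """Place values on a uniform 3-hourly axis, using NaN where no forecast exists."""
    by_step = dict(zip(steps, values))
    grid = list(range(0, last_step + 1, dt))
    return grid, [by_step.get(hour, math.nan) for hour in grid]


# Lead times as published: 3-hourly to 144h, then 6-hourly to 360h.
steps = list(range(0, 145, 3)) + list(range(150, 361, 6))
values = [float(hour) for hour in steps]  # placeholder data, one value per step

grid, filled = to_uniform_3h(steps, values)
print(len(grid), sum(math.isnan(v) for v in filled))  # prints: 121 36
```

With `xarray`, the same effect could be obtained by reindexing along the step dimension, which also fills missing labels with NaN by default.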
The main R function that handled all of these operations and returned the metadata of the 51 files is `met2CF.ECMWF.R`, which uses the R script `ECMWF_helper_functions.R` and the Python script `met2CFutils.py` under the hood.
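Extracting site-specific data from a global file comes down to finding the grid cell nearest to the site coordinates. The following is a minimal sketch of that lookup. It assumes a regular 0.4 degree latitude/longitude grid and a made-up site, and the helper names are hypothetical, so it should not be read as the implementation used in the pipeline.

```python
def nearest_index(coords, target):
    """Index of the coordinate closest to target (used for latitude)."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


def nearest_lon_index(lons, target):
    """Index of the closest longitude, measuring distance around the circle."""
    def circular_distance(i):
        d = abs(lons[i] - target) % 360.0
        return min(d, 360.0 - d)
    return min(range(len(lons)), key=circular_distance)


# Hypothetical regular 0.4 degree global grid and a hypothetical site.
lats = [round(90.0 - 0.4 * i, 1) for i in range(451)]
lons = [round(-180.0 + 0.4 * i, 1) for i in range(900)]
site_lat, site_lon = 45.3, -90.1

i = nearest_index(lats, site_lat)
j = nearest_lon_index(lons, site_lon)
print(lats[i], lons[j])
```

Measuring longitude distance around the circle means the lookup works whether the grid runs from -180 to 180 or from 0 to 360.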
The Python packages `xarray` and `ecmwf-opendata` and the R libraries `ncdf4` and `reticulate` were used extensively in this project.
- #2975 [Open]
I really appreciate my mentor's support and advice throughout the coding period. I am grateful to the PEcAn Project and Google Summer of Code for giving me the opportunity to contribute to an area in which I have a keen interest.