@swarnaleem
Last active September 14, 2022 17:33
GSoC_2022_PEcAn Project_Report.md

Organization: PEcAn Project

Student: Swarnalee Mazumder

Mentors: Istem Fer

About PEcAn

PEcAn is an open-source ecoinformatics toolbox. It provides a set of workflows that wrap around process-based ecosystem models and manage the flow of information into and out of them. PEcAn currently has 17 meteorological drivers available, including Ameriflux, ERA5 and CMIP5, which supply input data to various models. ECMWF Open Data provides globally available ensemble forecasts that can be used to drive the models available in PEcAn. The aim of this project was to extend PEcAn with an ECMWF weather forecast data ingestion pipeline. Such a pipeline makes meteorological forecast data available as input variables to the ecosystem models, which allows them to be run into the future. The focus of this project was the 15-day ensemble forecast, which starts on the day the forecast is issued and extends 360h into the future.

Coding Period

Phase 1

The first phase focused on building the pipeline to download ECMWF Open Data. There were two ways to integrate the download into PEcAn:

  1. Directly querying the URL of the intended file in R (e.g., using httr)
  2. Using the ECMWF Open Data Python package ecmwf-opendata

The forecast datasets are provided in GRIB2 format. The problem with the first approach was that it does not allow subsetting the datasets. The second approach gave more flexibility to choose only the parameters of interest, but the files could still only be downloaded at global scale.

ECMWF divides the 15-day (360h) forecast into two sets. The first set covers 0h to 144h at a 3h time step, and the second covers 150h to 360h at a 6h time step. This gives a total of 85 GRIB2 files: 49 at 3h (0h to 144h inclusive) and 36 at 6h (150h to 360h inclusive).
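The step layout described above can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `ensemble_forecast_steps` is hypothetical and is not taken from the PEcAn code base.

```python
def ensemble_forecast_steps():
    """Return the forecast lead times (in hours) of the 15-day ensemble forecast."""
    short_range = list(range(0, 144 + 1, 3))    # 0h to 144h inclusive, every 3h
    long_range = list(range(150, 360 + 1, 6))   # 150h to 360h inclusive, every 6h
    return short_range + long_range


steps = ensemble_forecast_steps()
print(len(steps))  # one GRIB2 file per step
```

Each entry in `steps` corresponds to one downloaded GRIB2 file.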

The main R function which downloads the ECMWF Open Data is download.ECMWF.R which leverages download_ECMWF.py under the hood.
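As a rough idea of what a download through the ecmwf-opendata package could look like, the sketch below builds one request per forecast step and passes it to `Client.retrieve`. It is not the actual content of download_ECMWF.py: the helper names, the parameter choice (`2t`, `tp`) and the target file naming are assumptions made for illustration.

```python
def build_request(step, params, target_dir="."):
    """Assemble keyword arguments for one forecast step (hypothetical helper)."""
    return {
        "time": 0,                # 00 UTC forecast run
        "stream": "enfo",         # ensemble forecast stream
        "type": ["cf", "pf"],     # control and perturbed members
        "step": step,             # lead time in hours
        "param": params,          # short names of the variables to keep
        "target": f"{target_dir}/ecmwf_enfo_{step:03d}h.grib2",  # assumed naming
    }


def download_step(request):
    """Download one GRIB2 file; needs network access and `pip install ecmwf-opendata`."""
    from ecmwf.opendata import Client  # imported lazily so the sketch loads without it

    client = Client(source="ecmwf")
    return client.retrieve(**request)


# Assumed parameter selection: 2 m temperature and total precipitation.
request = build_request(step=6, params=["2t", "tp"])
print(request["target"])
```

Calling `download_step(request)` for every forecast step would fetch the global files one by one; the call is left out here because it requires network access.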

Phase 2

Each downloaded GRIB2 file contains the forecast for one step, divided into the control forecast (cf), with 1 layer, and the perturbed forecast (pf), with 50 layers (ensemble members). During the second phase, I worked on manipulating the downloaded GRIB2 files. The main goals at this stage were to:

  1. Extract the cf and pf datasets from the 85 GRIB2 files
  2. Insert NaN at the missing intermediate 3h steps in the 150h to 360h set to obtain a uniform timestep of 3h across all 121 observations (0h to 360h inclusive)
  3. Convert the variables to the Climate and Forecast (CF) Metadata Conventions standard
  4. Extract site-specific data (by latitude and longitude) from the global files
  5. Write PEcAn standard netCDF files. The final files were:
    • 1 cf netCDF file containing the 15-day forecast time series (121 observations) at a 3-hour step
    • 50 ensemble-wise pf netCDF files, each containing the 15-day forecast (121 observations) at a 3-hour step
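The NaN-filling goal in the list above can be illustrated with plain Python. This is a minimal sketch with dummy values, not the logic of met2CFutils.py; the function name `fill_to_uniform_3h` is hypothetical.

```python
import math


def fill_to_uniform_3h(steps, values, last_step=360, spacing=3):
    """Place values on a uniform 3h axis, using NaN where no forecast step exists."""
    by_step = dict(zip(steps, values))
    uniform_steps = list(range(0, last_step + 1, spacing))
    filled = [by_step.get(step, math.nan) for step in uniform_steps]
    return uniform_steps, filled


# Original lead times: 3h spacing up to 144h, 6h spacing from 150h to 360h.
steps = list(range(0, 145, 3)) + list(range(150, 361, 6))
values = [float(step) for step in steps]  # dummy data standing in for a forecast variable

uniform_steps, filled = fill_to_uniform_3h(steps, values)
print(len(filled))
```

The same idea applies per variable and per ensemble member; the existing values are kept and only the missing intermediate steps (147h, 153h, ...) are set to NaN.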

The main R function that handles all of these operations and returns the metadata of the 51 files is met2CF.ECMWF.R, which uses the R script ECMWF_helper_functions.R and the Python script met2CFutils.py under the hood.
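Extracting site-specific data from a global file essentially means picking the grid cell nearest to the site coordinates. The sketch below shows this on a tiny made-up grid; the function names and the grid values are assumptions, and it presumes the site longitude uses the same convention as the grid. With xarray, a comparable selection can be made with `Dataset.sel(..., method="nearest")`.

```python
def nearest_index(coords, target):
    """Index of the coordinate value closest to the target."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


def extract_site(grid, lats, lons, site_lat, site_lon):
    """Return the value of the grid cell nearest to the site (grid indexed as [lat][lon])."""
    i = nearest_index(lats, site_lat)
    j = nearest_index(lons, site_lon)
    return grid[i][j]


# Made-up 3 x 3 excerpt of a regular grid; values encode their own position.
lats = [50.0, 49.6, 49.2]
lons = [10.0, 10.4, 10.8]
grid = [[10 * i + j for j in range(len(lons))] for i in range(len(lats))]

site_value = extract_site(grid, lats, lons, site_lat=49.5, site_lon=10.7)
print(site_value)  # 12: row 1 (49.6) and column 2 (10.8) are nearest
```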

The Python packages xarray and ecmwf-opendata and the R libraries ncdf4 and reticulate were used extensively in this project.

The pull request below has been updated to meet all the above goals:

Acknowledgements

I really appreciate my mentor's support and advice throughout the coding period. I am grateful to the PEcAn Project and Google Summer of Code for giving me the opportunity to contribute to an area I have a keen interest in.
