Organization: PEcAn Project
Student: Ayush Prasad
Mentors: Istem Fer, Shawn Serbin, Olli Nevalainen
This project aimed to develop a pipeline for ingesting remote sensing data into PEcAn. For this purpose, the data.remote module was extended to connect to Google Earth Engine (GEE) and AppEEARS. Data requests can now be submitted to these sources from the PEcAn workflow, and the output is stored in BETYdb for further analysis.
The functioning of the remote data module can be divided into two parts:
- RpTools: a Python package with the following modules:
  - gee2pecan_s2 for retrieving Sentinel 2 data from GEE
  - gee2pecan_l8 for retrieving Landsat data from GEE
  - gee2pecan_smap for retrieving SMAP soil moisture data from GEE
  - bands2lai_snap for computing Leaf Area Index using the SNAP toolbox
  - appeears2pecan for downloading data from AppEEARS
  - get_remote_data for handling the raw data download process
  - process_remote_data for processing raw data
  - rp_control, the main module for controlling the above modules

  Along with the above functions, RpTools also creates GeoJSON files from the BETYdb data and manages the merging of netCDF files of the same type. The package is designed so that, if the need arises, it can be used independently of the PEcAn workflow.
- remote_process: the main R function inside PEcAn's data.remote module, which controls RpTools and adds PEcAn dependencies. The R-Python interfacing is handled using reticulate. remote_process checks the status of the requested data in the database (BETYdb), sets the stages, calls rp_control, and finally inserts the output in BETYdb.
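The sequence described above for remote_process (check what is already stored, decide which stages to run, call rp_control, store the output) can be sketched roughly as follows. The real function is written in R and talks to BETYdb; this is only an illustration in Python, and FakeDatabase, rp_control_stub, the stage names and the output filenames are all hypothetical stand-ins, not the actual PEcAn code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDatabase:
    """Hypothetical in-memory stand-in for BETYdb, keyed by (site, source)."""
    records: dict = field(default_factory=dict)


def rp_control_stub(site, source, stages):
    """Placeholder for rp_control; returns made-up output paths per stage."""
    return {stage: f"{site}_{source}_{stage}.nc" for stage in stages}


def remote_process_sketch(db, site, source, want_processed=True):
    """Run only the stages whose output is not already recorded."""
    # 1. Check the status of the requested data in the database.
    existing = db.records.get((site, source), {})

    # 2. Set the stages: skip anything that is already stored.
    wanted = ["raw", "processed"] if want_processed else ["raw"]
    stages = [stage for stage in wanted if stage not in existing]

    if stages:
        # 3. Hand the remaining work to the Python side.
        outputs = rp_control_stub(site, source, stages)
        # 4. Insert the new outputs next to anything already present.
        db.records[(site, source)] = {**existing, **outputs}

    return stages
```

Calling remote_process_sketch twice with the same site and source runs both stages the first time and nothing the second time, which is the behaviour the status check is meant to provide.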
During the first phase, the Sentinel 2 and SNAP code provided by the PEcAn team was modified for use in this module. The initial version of the rp_control function (previously named remote_process) was then created to manage the download and processing of data.
Pull requests:
- PecanProject/pecan#2634 [Merged]
- PecanProject/pecan#2637 [Merged]
Two functions, gee2pecan_l8 and gee2pecan_smap, were implemented for retrieving Landsat and SMAP data, respectively, from Google Earth Engine. rp_control was improved so that new GEE collections can be added without changing the source code. The appeears2pecan function was then implemented to download data from AppEEARS.
Pull requests:
- PecanProject/pecan#2642 [Merged]
- PecanProject/pecan#2645 [Merged]
- PecanProject/pecan#2659 [Merged]
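One common way to let new collections be added without touching the source code is to keep their settings in a data file and look them up at run time. The sketch below illustrates that general idea only; it is not the mechanism used in rp_control. The registry layout, the keys, the scale values and the generic_gee handler are all assumptions made for the example.

```python
import json

# Hypothetical registry; in practice this could live in a separate JSON file.
REGISTRY_JSON = """
{
  "s2": {"handler": "generic_gee", "scale_m": 10},
  "l8": {"handler": "generic_gee", "scale_m": 30}
}
"""


def generic_gee(collection, scale_m):
    """Stand-in for a download function; only reports what it would request."""
    return f"would request {collection} at {scale_m} m"


# Maps handler names used in the registry to Python callables.
HANDLERS = {"generic_gee": generic_gee}


def dispatch(collection, registry):
    """Look up a collection in the registry and call its handler."""
    entry = registry.get(collection)
    if entry is None:
        raise KeyError(f"collection {collection!r} is not in the registry")
    handler = HANDLERS[entry["handler"]]
    return handler(collection, entry["scale_m"])


registry = json.loads(REGISTRY_JSON)

# Supporting another collection is a data change, not a code change
# (the scale value here is illustrative).
registry["smap"] = {"handler": "generic_gee", "scale_m": 10000}
```

With this layout, dispatch("smap", registry) works as soon as the entry exists in the registry, and an unknown collection fails with a clear error instead of silently doing nothing.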
Functions were implemented for merging remote sensing data of the same type and for creating GeoJSON files from BETYdb geometry data. All of the Python code developed was packaged into the RpTools package. Finally, the main function, remote_process, was implemented in R; it added connections to the database (BETYdb) and made it possible to submit requests from the PEcAn workflow.
Pull request:
- PecanProject/pecan#2672 [Approved, open]
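As an illustration of the GeoJSON step mentioned above, the following minimal sketch wraps a single site location in a GeoJSON FeatureCollection using only the standard library. It assumes the site geometry is available as a longitude/latitude point; the function names and the "name" property are hypothetical, and the RpTools implementation may handle other geometry types and read them from BETYdb differently.

```python
import json


def point_to_geojson(site_name, lon, lat):
    """Wrap a lon/lat point in a GeoJSON FeatureCollection.

    GeoJSON stores coordinates in longitude, latitude order.
    """
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"name": site_name},
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
            }
        ],
    }


def write_geojson(path, collection):
    """Write a GeoJSON dictionary to disk as indented JSON."""
    with open(path, "w") as fh:
        json.dump(collection, fh, indent=2)
```

For example, write_geojson("site.geojson", point_to_geojson("example_site", 24.29, 61.85)) would produce a file that can be passed to tools expecting an area of interest in GeoJSON form.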
Link to all PRs: https://github.com/PecanProject/pecan/pulls?q=is%3Apr+author%3Aayushprd
- Module for calculating uncertainties - in addition to the raw data download module and the data processing module, a third module could be developed to calculate the uncertainties in remote sensing data.
- Remote execution - PEcAn can be run on a remote server or an HPC system, and this module can be improved to support such use cases.
- Parallelization - some parts of this module can be modified to run in parallel or concurrently, which could reduce the time taken to download data for multiple sites.
I will continue to work on these in the coming weeks.
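To give a sense of what the parallelization idea could look like, here is a minimal sketch that downloads data for several sites concurrently with a thread pool, which suits work that mostly waits on the network. This is not existing PEcAn code: download_site is a hypothetical placeholder that only returns a made-up filename, and a real version would call the actual download function for each site.

```python
from concurrent.futures import ThreadPoolExecutor


def download_site(site):
    """Hypothetical placeholder for one site's download; returns a fake filename."""
    return f"{site}.nc"


def download_all(sites, max_workers=4):
    """Run download_site for every site using a pool of worker threads."""
    sites = list(sites)  # allow any iterable, since it is traversed twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input sites.
        results = pool.map(download_site, sites)
        return dict(zip(sites, results))
```

Threads are a reasonable first choice here because each download spends most of its time waiting for a remote service; CPU-heavy processing steps would more likely need separate processes.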
PEcAn has the tools to run a model anywhere on Earth. Similarly, with the development of this remote data module, it is now possible to evaluate the model anywhere, because the module can retrieve remote sensing observations from different sources for any place on Earth.
I am deeply thankful to Istem Fer for her guidance throughout the project. Your feedback has helped me improve my skills tremendously. Thank you for suggesting ideas about developing the code in such a way that it could fit into the larger scheme. I thank Olli Nevalainen and Shawn Serbin for helping me with remote sensing issues and providing ideas to develop this module. Thank you, PEcAn Project, for this wonderful experience. I hope this project is my first step towards pursuing my interest in computer and environmental sciences.
These are some of the resources from which I benefited hugely during the course of this project.
- https://github.com/ollinevalainen/satellitetools by Olli Nevalainen - in addition to providing some of the code used in this project, the code in this repository helped me a lot in learning about GEE.
- https://rabernat.github.io/research_computing_2018/xarray.html - for handling multidimensional data in Python using xarray.
- Although not directly related to this project, the materials of this ecosystem modelling course by Lund University helped me understand ecosystem modelling, while also providing some knowledge about the science behind some of the modules in PEcAn.