- Name: Nagesh Bansal
- Organisation: NumFOCUS
- Sub-Organisation: Data Retriever
- Project: neonwranglerpy - Data retrieval using neonVegWrangleR
- Mentors: Henry Senyondo, Sergio Marconi, Ethan White
- Querying the NEON-DATA-API
- Retrieval of Vegetation Structure (VST) Data
- Translation of VST-related functions from neonVegWrangleR
- Retrieval of Airborne Observation Platform (AOP) Data
- Refactor AOP Functions from neonVegWrangleR
The first task of this pipeline was to query the NEON data API to retrieve the datasets. The neonVegWrangleR package relies on the neon-utilities package, but no Python wrapper is available for that package yet, so this issue can be solved in two ways:
- Refactoring the neon-utilities package functions
- Using the R functions from the Python package with the help of the rpy2 package
This pipeline proceeds with the first method, and functions such as loadByProduct, zipsByProduct, stackByTables, and byTileAOP were refactored successfully.
For now, these functions download only the Vegetation Structure and Airborne Observation Platform data, but we have discussed generalizing them to other data products as well.
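Querying the API directly, as in the first approach, can be sketched roughly as follows. This is a minimal illustration and not the neonwranglerpy implementation: the helper names `product_url`, `fetch_product`, and `available_data_urls` are hypothetical, and the base URL and JSON field names (`siteCodes`, `availableDataUrls`) are assumptions about the public NEON API's product endpoint that should be checked against its documentation.

```python
import json
from typing import Dict, List, Optional
from urllib.request import urlopen

# Assumed base URL of the public NEON data API.
NEON_API_BASE = "https://data.neonscience.org/api/v0"


def product_url(dpid: str) -> str:
    """Build the (assumed) product endpoint URL for a data product ID."""
    return f"{NEON_API_BASE}/products/{dpid}"


def fetch_product(dpid: str) -> dict:
    """Download and decode the product metadata (requires network access)."""
    with urlopen(product_url(dpid)) as response:
        return json.load(response)


def available_data_urls(
    product: dict, sites: Optional[List[str]] = None
) -> Dict[str, List[str]]:
    """Map each requested site code to its list of monthly data URLs.

    When `sites` is None, every site listed in the response is returned.
    """
    urls: Dict[str, List[str]] = {}
    for entry in product.get("data", {}).get("siteCodes", []):
        code = entry["siteCode"]
        if sites is None or code in sites:
            urls[code] = list(entry.get("availableDataUrls", []))
    return urls
```

Under these assumptions, `available_data_urls(fetch_product("DP1.10098.001"), ["DELA"])` would list the monthly download URLs for one site; the parsing step is kept separate from the network call so it can be tested without internet access.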
Pull Requests:
- Querying the API directly without using the Neon Utilities (PR #6)
- Functions for getting recent versions of the files (PR #14)
- Utils functions (PR #13)
- Tools functions
- StackDataFiles functions for stacking files (PR #15)
- Support for multiple sites (PR #17)
- Table types functions (PR #16)
Issues:
The Vegetation Structure data can be retrieved either with the load_by_product() function or with the zips_by_product() and stack_by_table() functions used together. Tests and docs for these functions were added, and tutorials on using them were created.
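The stacking step can be pictured as concatenating the per-site, per-month tables of one table type into a single table. The sketch below only illustrates that idea with in-memory CSV text and the standard library; `stack_tables` is a hypothetical helper, not the package's stack_by_table() function, and the column names in the usage example are made up for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List


def stack_tables(csv_texts: Iterable[str]) -> List[Dict[str, str]]:
    """Concatenate the rows of several CSV texts that share one table type."""
    stacked: List[Dict[str, str]] = []
    for text in csv_texts:
        # Each text has its own header row; DictReader keys rows by it.
        reader = csv.DictReader(io.StringIO(text))
        stacked.extend(dict(row) for row in reader)
    return stacked


# Two hypothetical monthly tables for the same table type.
month_1 = "individualID,height\nA1,12.5\nA2,8.0\n"
month_2 = "individualID,height\nB7,15.1\n"
rows = stack_tables([month_1, month_2])
```

Here `rows` holds the three records from both months in download order, which is the shape a later merge step would consume.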
Pull Requests:
The VST-related functions from the neonVegWrangleR package were refactored in Python. The functions involved were retrieve_VST_Data, retrieve_coords_itc, and retrieve_dist_to_utm. These functions retrieve the VST data using the load_by_product function, add UTM coordinates to the VST entries based on azimuth and distance, and merge these coordinates with the apparent_individual entries.
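The azimuth-and-distance step amounts to a small trigonometric offset from a reference point. The following is a minimal sketch of that calculation, assuming the azimuth is given in degrees clockwise from north and the distance in metres; `stem_utm` and its parameter names are hypothetical and do not reflect the actual signature of retrieve_dist_to_utm.

```python
import math
from typing import Tuple


def stem_utm(
    ref_easting: float,
    ref_northing: float,
    stem_distance: float,
    stem_azimuth: float,
) -> Tuple[float, float]:
    """Offset a reference UTM point by a distance along an azimuth.

    Assumes the azimuth is in degrees, measured clockwise from north,
    and the distance is in metres.
    """
    theta = math.radians(stem_azimuth)
    easting = ref_easting + stem_distance * math.sin(theta)
    northing = ref_northing + stem_distance * math.cos(theta)
    return easting, northing
```

For example, a stem 10 m due east of the reference point (azimuth 90) shifts the easting by 10 m and leaves the northing essentially unchanged.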
Pull Requests:
The Airborne Observation Platform data can be downloaded using the by_tile_aop() function. I refactored the byTileAOP() function, along with the get_tile_urls() function, from the neon-utilities package.
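Downloading by tile requires working out which tiles cover a given coordinate. The sketch below assumes AOP tiles are 1 km squares identified by their lower-left UTM corner, so a coordinate is rounded down to the nearest 1000 m; `tile_corners` is a hypothetical helper, and the buffer handling is a simplification that assumes the buffer is no larger than one tile.

```python
import math
from typing import Set, Tuple

TILE_SIZE = 1000  # assumed tile edge length in metres


def tile_corners(
    easting: float, northing: float, buffer: float = 0.0
) -> Set[Tuple[int, int]]:
    """Return the lower-left corners of the tiles touched by a buffered point.

    Checks the point itself plus the edges and corners of its square
    buffer, which is sufficient when buffer <= TILE_SIZE.
    """
    tiles: Set[Tuple[int, int]] = set()
    for dx in (-buffer, 0.0, buffer):
        for dy in (-buffer, 0.0, buffer):
            tile_e = math.floor((easting + dx) / TILE_SIZE) * TILE_SIZE
            tile_n = math.floor((northing + dy) / TILE_SIZE) * TILE_SIZE
            tiles.add((tile_e, tile_n))
    return tiles
```

A point well inside a tile maps to a single corner, while a point within the buffer distance of a tile edge also picks up the neighbouring tile, so both would need to be downloaded.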
Pull Requests:
The AOP-related functions from the neonVegWrangleR package were refactored in Python. The functions involved were:
- retrieve_AOP_Data: This function retrieves the AOP data using the by_tile_aop() function for the given individual vegetation structure data coordinates.
- crop_plot_data(): This function creates a shapefile from vegetation structure data with lat/lon coordinates, and then applies the clip_plot() function to clip plots from the AOP data using this shapefile.
Pull Requests:
clip_plot function: In neonVegWrangleR, the clip_plot function clips plots around NEON VST data. It works on the following types of data: 1. raster (.tif) 2. lidar point cloud (.laz) 3. hierarchical data (.h5). It includes the following functions:
- Clip_raster()
- Clip_lidar()
- Clip-hdf
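Choosing a clipping routine by file type can be sketched as a simple dispatch on the file extension. Everything in this example is a stand-in: the three handlers only return a label rather than clipping anything, and `dispatch_clip` is a hypothetical name, not the package's clip_plot implementation.

```python
from pathlib import Path
from typing import Callable, Dict


# Placeholder handlers; real ones would clip the data around a plot.
def clip_raster(path: str) -> str:
    return f"raster:{path}"


def clip_lidar(path: str) -> str:
    return f"lidar:{path}"


def clip_hdf(path: str) -> str:
    return f"hdf:{path}"


CLIPPERS: Dict[str, Callable[[str], str]] = {
    ".tif": clip_raster,
    ".laz": clip_lidar,
    ".h5": clip_hdf,
}


def dispatch_clip(path: str) -> str:
    """Route a file to the handler registered for its extension."""
    suffix = Path(path).suffix.lower()
    try:
        clipper = CLIPPERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None
    return clipper(path)
```

Keeping the mapping in one dictionary means a new data type only needs one new entry, and unsupported files fail early with a clear error.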
Pull requests: weecology/neonwranglerpy#34
- In the crop_plot_data function, parallel processing needs to be implemented, since applying the clip_plot function over the entire dataframe synchronously is not efficient.
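One possible way to parallelize the per-row clipping is a worker pool from the standard library. This is only a sketch of the idea, not something implemented in the package: `clip_all` is a hypothetical helper, `clip_one` stands in for whatever per-row clipping call is used, and a thread pool is assumed to be adequate because the work is dominated by file I/O.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

Row = TypeVar("Row")
Result = TypeVar("Result")


def clip_all(
    rows: Iterable[Row],
    clip_one: Callable[[Row], Result],
    max_workers: int = 4,
) -> List[Result]:
    """Apply clip_one to every row concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(clip_one, rows))
```

If the clipping turns out to be CPU-bound rather than I/O-bound, `ProcessPoolExecutor` from the same module offers the same `map` interface, provided the clipping function and its arguments can be pickled.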
Unit tests for the package were created for functions such as loadByProduct, zipsByProduct, stackByTables, byTileAOP, retrieve_aop_data, and retrieve_vst_data.
Pull Requests:
The goal of the project was to implement a Python version of the neonVegWrangleR package, which is used for integrating the NEON Vegetation Structure (VST) and Airborne Observation Platform (AOP) data. So far, only the Vegetation Structure and Airborne Observation Platform data have been integrated into the package, but we have discussed generalizing it to other data products as well. I plan to do the following work in the future:
- Implementation for the CFC dataset (DP1.10026.001): We can also add support for CFC data, which is generally equivalent to VST data. To deal with CFC data, we need to set up a pipeline as we did for VST.
- Refactoring of the function get_lat_lon.R: This function will help us calculate latitude and longitude values for each stem in the VST data. For now, we are not calculating latitude and longitude separately.
- Asynchronous downloading and processing of data can be implemented to make the pipeline faster.
I plan to continue contributing to Data Retriever and neonwranglerpy after GSoC'22 and to become an active contributor to the repository.
During the GSoC period, my mentors Henry Senyondo and Sergio Marconi motivated me to write blogs and tutorials explaining my work on this project and new tech stacks such as Apache web servers and C++ packaging.
Description | Blog Link |
---|---|
GSoC’22: Community Bonding Period. | Blog |
GSoc'22 : Setting Up Project | Link |
GSoc'22 : Querying the NEON-DATA-API | Link |
GSoc'22 : Retrieval of Vegetation Structure (VST) Data | Link |
GSoc'22 : Translation VST related functions from neonVegWrangleR | Link |
Configuration of Apache Web Server | Link |
Creating a C++ Package | Link |
For me, the last three months have been an incredible learning experience, and I am grateful for everything I've learned. I learnt CI/CD using Docker and GitHub Actions, interfacing between R and Python, and using REST APIs. The entire experience has really aided my overall development as a developer, and I can confidently state that this has been the most fruitful summer of my life!