Final Work Product Report - GSoC 2022

Project: Refactoring of erddapy into separate core and object layers

Organization: Integrated Ocean Observing System (IOOS)

Contributor: Vini Salazar

Mentors:

Mathew Biddle
Filipe Fernandes
Alex Kerney

Please see our mid-project blog post.

erddapy is a Python client for the ERDDAP data server. With erddapy, you can quickly and efficiently query and download oceanographic data from ERDDAP servers, and integrate this data with other Python data libraries for analysis and visualization.
This project focused on refactoring erddapy into two layers: a core layer containing erddapy's primary functions for building URLs and parsing responses, which can be used by erddapy itself and by downstream libraries, and an object (or opinionated) layer with high-level classes that enable more advanced manipulation and subsetting.
We successfully refactored the core layer of erddapy while preserving full backwards compatibility. This exposes a set of basic erddapy functions that can each be called on their own, so downstream libraries such as gliderpy and argopy can reuse them as needed without depending on other parts of erddapy.
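To make the idea of a small, stateless core function concrete, here is a minimal sketch of assembling a tabledap request URL from plain inputs. It is not erddapy's implementation: the function name `build_tabledap_url`, the server address, and the dataset ID are hypothetical, and details such as quoting string-valued constraints are left out. It only assumes ERDDAP's tabledap URL layout (dataset ID and response type in the path, then a comma-separated variable list followed by `&`-joined constraints).

```python
from urllib.parse import quote


def build_tabledap_url(server, dataset_id, response="csv", variables=None, constraints=None):
    """Build a tabledap URL from plain values (hypothetical helper, not erddapy API)."""
    base = f"{server.rstrip('/')}/tabledap/{dataset_id}.{response}"
    # Requested variables form a comma-separated list at the start of the query.
    parts = [",".join(variables)] if variables else []
    # Each constraint maps "<name><operator>" to a value, e.g. {"time>=": "now-7days"}.
    for key, value in (constraints or {}).items():
        # Percent-encode characters such as ">" while keeping "=" readable.
        parts.append(quote(f"{key}{value}", safe="="))
    return f"{base}?{'&'.join(parts)}"


url = build_tabledap_url(
    server="https://example.org/erddap",  # placeholder server, not a real ERDDAP
    dataset_id="someDataset",  # placeholder dataset ID
    variables=["depth", "time"],
    constraints={"time>=": "now-7days"},
)
print(url)
```

Because the function keeps no state, a downstream library could call it with its own server and dataset values without creating any erddapy object first.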
erddapy itself will use this basic functionality to implement an opinionated layer (under the erddapy.array_like subpackage, see PR#267) that allows manipulation and subsetting of datasets. Users will be able to subset and query a dataset as an object using Python's "array-like" syntax, making erddapy much more flexible for data wrangling. Moreover, the core layer will make it easier to maintain erddapy's integration with third-party libraries such as pandas, xarray, iris, and netCDF4.
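As a rough illustration of what "array-like" access could look like, the sketch below wraps a few rows in a small class that supports indexing by column name and slicing by row. The class name `TableSubset` and its behaviour are purely hypothetical and are not taken from PR#267 or from the opinionated layer itself; the rows are made-up placeholders standing in for a server response.

```python
class TableSubset:
    """Hypothetical array-like view over tabular rows (not erddapy's API)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, key):
        if isinstance(key, str):
            # Column access: subset["depth"] returns that column's values.
            return [row[key] for row in self.rows]
        if isinstance(key, slice):
            # Row slicing: subset[1:] returns a new, smaller TableSubset.
            return TableSubset(self.rows[key])
        # Integer access returns a single row.
        return self.rows[key]


# Placeholder rows, invented for this example only.
rows = [
    {"depth": 0.0, "temperature": None},
    {"depth": 10.0, "temperature": 12.5},
    {"depth": 20.0, "temperature": 11.0},
]
subset = TableSubset(rows)
deeper = subset[1:]
print(len(deeper), deeper["depth"])
```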
The opinionated layer contains high-level objects that represent the ERDDAP server being queried, the network connection, and different types of ERDDAP datasets. These high-level objects, such as the one representing an ERDDAP server, hold state: they record previous events and user interactions, which can be useful but also limiting. Under the hood, however, they operate through the stateless functions of the core layer, which improves the reliability, visibility, and scalability of the requests they make. Keeping the interface between the basic erddapy functions and the high-level objects stateless will also make it easier to scale erddapy to parallel and/or asynchronous operation, for both searching and downloading datasets.
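The split between stateful objects and stateless functions can be sketched as follows. This is an assumption-laden illustration rather than erddapy code: `search_url` and `Server` are hypothetical names, the server address is a placeholder, and the only external fact relied on is the layout of ERDDAP's search endpoint (`/search/index.<response>` with `page`, `itemsPerPage`, and `searchFor` parameters).

```python
from urllib.parse import urlencode


def search_url(server, search_for, response="csv"):
    """Stateless: the result depends only on the arguments passed in."""
    query = urlencode({"page": 1, "itemsPerPage": 1000, "searchFor": search_for})
    return f"{server.rstrip('/')}/search/index.{response}?{query}"


class Server:
    """Stateful wrapper (hypothetical): remembers the searches it was asked for."""

    def __init__(self, url):
        self.url = url
        self.history = []

    def search(self, search_for):
        # The object records the interaction, then delegates to the pure function.
        self.history.append(search_for)
        return search_url(self.url, search_for)


server = Server("https://example.org/erddap")  # placeholder server
first = server.search("salinity")
second = server.search("glider")
print(first)
print(server.history)
```

The object accumulates history, while every URL it returns could be reproduced by calling `search_url` directly with the same arguments.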
Future directions for this project include finishing the implementation of the subsetting interface of the dataset objects of the opinionated layer, adding asynchronous support to server search and dataset download, and preparing a major version release with all the changes.
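One possible shape for the asynchronous direction, sketched with the standard library only: because a stateless function shares nothing between calls, several calls can run concurrently and be collected in order. Nothing here reflects a planned erddapy interface. `fetch` is a stand-in that performs no network I/O, and the URLs are placeholders.

```python
import asyncio


def fetch(url):
    """Stand-in for a blocking download; returns a label instead of doing network I/O."""
    return f"fetched {url}"


async def fetch_all(urls):
    # Run each blocking call in a worker thread; gather returns results in input order.
    return await asyncio.gather(*(asyncio.to_thread(fetch, url) for url in urls))


# Placeholder URLs, not real datasets.
urls = [
    "https://example.org/erddap/tabledap/datasetA.csv",
    "https://example.org/erddap/tabledap/datasetB.csv",
]
results = asyncio.run(fetch_all(urls))
print(results)
```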
Working on GSoC 2022 was a great experience that boosted my confidence to contribute to open-source projects. I look forward to continuing to work on erddapy and to interacting further with the open-source community.
Code example

This is what erddapy looks like after the refactor:
import erddapy

# erddapy.servers is a dictionary with ERDDAP servers
server = erddapy.servers["uaf"].url

# Choose a list of variables you'd like to select
variables = [
    "depth",
    "latitude",
    "longitude",
    "salinity",
    "temperature",
    "time",
]
# Choose a dict of constraints (only time here; latitude and longitude bounds can be added the same way)
constraints = {
    "time>=": "now-7days",
}

# Generate a URL
url = erddapy.core.get_download_url(
    server=server,
    dataset_id="scrippsGliders",
    protocol="tabledap",
    response="csv",
    variables=variables,
    constraints=constraints
)

# Convert the URL response to a pandas DataFrame
df = erddapy.core.to_pandas(url)
df.head()
   depth      latitude            longitude            salinity  temperature  time
0  m          degrees_north       degrees_east         PSU       Celsius      UTC
1  0.0        35.530199999999994  -124.76329999999999  NaN       NaN          2025-08-27T04:06:00Z
2  501.5118   35.530199999999994  -124.76329999999999  34.191    6.176        2025-08-27T04:06:00Z
3  499.76883  35.530199999999994  -124.76329999999999  34.192    6.2          2025-08-27T04:06:00Z
4  497.66934  35.530199999999994  -124.76329999999999  34.191    6.202        2025-08-27T04:06:00Z
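The first row of the output above holds units rather than measurements, because ERDDAP's csv response places a units line directly under the column names. A minimal, dependency-free sketch of separating that line from the data follows. The helper name `split_units` is hypothetical, and the sample text simply copies the header, the units row, and one data row from the output above.

```python
import csv
import io


def split_units(csv_text):
    """Return (units, rows) from CSV text whose second line holds units (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # The line right after the header maps each column to its unit.
    units = dict(zip(header, next(reader)))
    # Remaining lines are data rows; values are kept as strings here.
    rows = [dict(zip(header, row)) for row in reader]
    return units, rows


sample = (
    "depth,latitude,longitude,salinity,temperature,time\n"
    "m,degrees_north,degrees_east,PSU,Celsius,UTC\n"
    "501.5118,35.530199999999994,-124.76329999999999,34.191,6.176,2025-08-27T04:06:00Z\n"
)
units, rows = split_units(sample)
print(units["salinity"], rows[0]["salinity"])
```

When reading the same text with pandas, passing `skiprows=[1]` to `read_csv` drops the units line so that numeric columns are parsed as numbers.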
  

My Pull Requests
My Issues
My Commits

Selected contributions to erddapy


Create core subpackage
Refactor tests to use gold standard ERDDAP servers
Refactor of the main erddapy.py module
Refactor URL modules
Create interfaces module and refactor interface methods
Create opinionated (array-like) layer (WIP)

Other contributions


Document search using 'allDatasets' method
Remove pagination on CSV, JSON searches
Use lazy loading with OPeNDAP responses
Refactor import in gliderpy library
Improvements to docstrings, error handling, warnings: 1, 2, 3.


Plotting with erddapy

Below is an example of plotting data retrieved with erddapy.

Data from Aquarius Sea Surface Salinity, L3 SMI, Version 5, 1.0°, Global, requested from the CoastWatch West Coast node with erddapy and plotted with Cartopy and Matplotlib. See the erddapy docs, notebook 01a-griddap for details.