Organization: Integrated Ocean Observing System (IOOS)
Contributor: Vini Salazar
Mentors:
- Mathew Biddle
- Filipe Fernandes
- Alex Kerney
Please see our mid-project blog post.
erddapy is a Python client for the ERDDAP data server. With erddapy, you can quickly and efficiently query and download oceanographic data from ERDDAP servers, and integrate this data with other Python data libraries for analysis and visualization.
This project focused on refactoring erddapy into two different layers: a core layer containing the primary erddapy functions of URL and response parsing, which may then be used by erddapy itself and downstream libraries, and an object (or opinionated) layer with high-level classes that enable more advanced manipulation and subsetting.
We successfully refactored the core layer of erddapy while preserving full backwards compatibility. This unlocks a number of basic erddapy functions that can be called individually for their specific task, allowing downstream libraries such as gliderpy and argopy to reuse these functions ad hoc without having to rely on other parts of erddapy.
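To illustrate what a small, individually callable function in the core layer looks like in spirit, here is a minimal sketch of a stateless URL builder for a tabledap request. The function name `build_tabledap_url` and the placeholder server address are assumptions made for this example; this is not erddapy's implementation, and it skips the quoting and percent-encoding of constraint values that a real client needs.

```python
def build_tabledap_url(server, dataset_id, variables, constraints, response="csv"):
    """Return a tabledap request URL; the result depends only on the arguments."""
    base = f"{server.rstrip('/')}/tabledap/{dataset_id}.{response}"
    # Requested variables are comma-separated; each constraint is appended with "&".
    query = ",".join(variables)
    query += "".join(f"&{key}{value}" for key, value in constraints.items())
    return f"{base}?{query}"


url = build_tabledap_url(
    server="https://example.org/erddap",  # placeholder, not a real ERDDAP server
    dataset_id="scrippsGliders",
    variables=["depth", "temperature", "time"],
    constraints={"time>=": "now-7days"},
)
print(url)
```

Because the function holds no state, a downstream library could call it on its own without instantiating any erddapy object.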
erddapy itself will use this basic functionality to implement an opinionated layer (under the `erddapy.array-like` subpackage, see PR#267) that allows manipulation and subsetting of datasets. Users will be able to subset and query a dataset as an object using Python "array-like" syntax, making erddapy much more flexible for data wrangling. Moreover, the core layer will make it easier to maintain erddapy's integrations with third-party libraries such as pandas, xarray, iris, and netCDF4.
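Since the opinionated layer is still a work in progress, the following is only a sketch of how "array-like" subsetting could map onto ERDDAP constraints. The class `TableSubset` and its indexing convention are hypothetical and are not taken from erddapy or PR#267.

```python
from __future__ import annotations


class TableSubset:
    """Hypothetical dataset object; not part of erddapy's API."""

    def __init__(self, dataset_id: str, constraints: dict[str, str] | None = None):
        self.dataset_id = dataset_id
        self.constraints = dict(constraints or {})

    def __getitem__(self, key: tuple[str, slice]) -> TableSubset:
        # key looks like ("depth", slice(0, 100)), written as ds["depth", 0:100]
        variable, bounds = key
        new = dict(self.constraints)
        if bounds.start is not None:
            new[f"{variable}>="] = str(bounds.start)
        if bounds.stop is not None:
            new[f"{variable}<="] = str(bounds.stop)
        # Return a new object so the original dataset is left untouched.
        return TableSubset(self.dataset_id, new)


gliders = TableSubset("scrippsGliders")
recent_shallow = gliders["time", "now-7days":]["depth", 0:100]
print(recent_shallow.constraints)
```

Each indexing step returns a new object, so subsets can be chained without mutating the dataset they came from.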
The opinionated layer contains high-level objects that represent the ERDDAP server being queried, the network connection, and different types of ERDDAP datasets. Having a stateless interface from the basic erddapy functions to these high-level objects will facilitate scaling erddapy to parallel and/or asynchronous functionality, for both searching and downloading datasets. In other words, these high-level objects, such as the one representing an ERDDAP server, hold state: they record previous events and user interactions, which can be useful but also limiting. Under the hood, however, they operate through stateless functions, which improves the reliability, visibility, and scalability of the requests they make.
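The split between stateful objects and stateless functions can be sketched as follows. The names `Server` and `search_url` and the placeholder address are assumptions made for this illustration and do not reproduce erddapy's classes; the query string follows the general shape of an ERDDAP full-text search request.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import quote_plus


def search_url(server: str, search_for: str, response: str = "csv") -> str:
    """Stateless: the same arguments always produce the same URL."""
    return (
        f"{server}/search/index.{response}"
        f"?page=1&itemsPerPage=1000&searchFor={quote_plus(search_for)}"
    )


@dataclass
class Server:
    """Hypothetical stateful wrapper that remembers the searches it has built."""

    url: str
    history: list[str] = field(default_factory=list)

    def search(self, search_for: str) -> str:
        result = search_url(self.url, search_for)  # delegate to the stateless function
        self.history.append(result)  # the only state lives in the object
        return result


server = Server("https://example.org/erddap")  # placeholder address
first = server.search("sea surface")
print(first)
print(server.history)
```

Because `search_url` depends only on its arguments, it can be called from several threads or tasks without coordination, while the object layer decides what, if anything, to remember.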
Future directions for this project include finishing the implementation of the subsetting interface of the dataset objects of the opinionated layer, adding asynchronous support to server search and dataset download, and preparing a major version release with all the changes.
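As a rough idea of what asynchronous downloads could look like, the sketch below runs several requests concurrently with `asyncio`. The coroutine `fake_download` is a stand-in that only simulates latency; it performs no network access and is not part of erddapy, and the dataset URLs are placeholders.

```python
from __future__ import annotations

import asyncio


async def fake_download(url: str) -> str:
    """Stand-in for a real HTTP request; it only simulates latency."""
    await asyncio.sleep(0.01)
    return f"payload from {url}"


async def download_all(urls: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and keeps the input order.
    return await asyncio.gather(*(fake_download(u) for u in urls))


urls = [f"https://example.org/erddap/tabledap/dataset{i}.csv" for i in range(3)]
results = asyncio.run(download_all(urls))
print(results)
```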
Working on GSoC 2022 was a great experience that boosted my confidence in contributing to open-source projects. I look forward to continuing to work on erddapy and interacting with the open-source community.
This is what erddapy looks like after the refactor:
```python
import erddapy

# erddapy.servers is a dictionary with ERDDAP servers
server = erddapy.servers["uaf"].url

# Choose a list of variables you'd like to select
variables = [
    "depth",
    "latitude",
    "longitude",
    "salinity",
    "temperature",
    "time",
]

# Choose a dict of dimensional constraints (latitude, longitude, time);
# only time is constrained here
constraints = {
    "time>=": "now-7days",
}

# Generate a URL
url = erddapy.core.get_download_url(
    server=server,
    dataset_id="scrippsGliders",
    protocol="tabledap",
    response="csv",
    variables=variables,
    constraints=constraints,
)

# Convert the URL to a pandas DataFrame
df = erddapy.core.to_pandas(url)
df.head()
```
| | depth | latitude | longitude | salinity | temperature | time |
|---|---|---|---|---|---|---|
| 0 | m | degrees_north | degrees_east | PSU | Celsius | UTC |
| 1 | 0.0 | 35.530199999999994 | -124.76329999999999 | NaN | NaN | 2025-08-27T04:06:00Z |
| 2 | 501.5118 | 35.530199999999994 | -124.76329999999999 | 34.191 | 6.176 | 2025-08-27T04:06:00Z |
| 3 | 499.76883 | 35.530199999999994 | -124.76329999999999 | 34.192 | 6.2 | 2025-08-27T04:06:00Z |
| 4 | 497.66934 | 35.530199999999994 | -124.76329999999999 | 34.191 | 6.202 | 2025-08-27T04:06:00Z |
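Note that row 0 of the output holds units rather than data, because the CSV response carries a units line directly below the column names. As a minimal sketch of separating the two, the example below parses a short excerpt shaped like that response using only the standard library; the inline text reuses values from the table above, and the variable names are chosen for this illustration.

```python
import csv
import io

# A short excerpt shaped like the CSV response: names, then units, then data.
raw = """depth,latitude,longitude,salinity,temperature,time
m,degrees_north,degrees_east,PSU,Celsius,UTC
501.5118,35.530199999999994,-124.76329999999999,34.191,6.176,2025-08-27T04:06:00Z
499.76883,35.530199999999994,-124.76329999999999,34.192,6.2,2025-08-27T04:06:00Z
"""

reader = csv.reader(io.StringIO(raw))
names = next(reader)                      # first line: column names
units = dict(zip(names, next(reader)))    # second line: one unit per column
rows = [dict(zip(names, values)) for values in reader]

# With the units line set aside, numeric columns can be converted safely.
temperatures = [float(row["temperature"]) for row in rows]
print(units["temperature"], temperatures)
```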
- Create core subpackage
- Refactor tests to use gold standard ERDDAP servers
- Refactor of the main `erddapy.py` module
- Refactor URL modules
- Create `interfaces` module and refactor interface methods
- Create opinionated (`array-like`) layer (WIP)
- Document search using `'allDatasets'` method
- Remove pagination on CSV, JSON searches
- Use lazy loading with OPeNDAP responses
- Refactor import in gliderpy library
- Improvements to docstrings, error handling, warnings: 1, 2, 3.
Below is an example of plotting data from erddapy.
Data from Aquarius Sea Surface Salinity, L3 SMI, Version 5, 1.0°, Global, requested from the CoastWatch West Coast node with erddapy and plotted with Cartopy and Matplotlib. See the erddapy docs, notebook `01a-griddap`, for details.