@vinisalazar
Last active November 22, 2022 01:10
Report for Google Summer of Code 2022 - IOOS erddapy Project

Final Work Product Report - GSoC 2022

Organization: Integrated Ocean Observing System (IOOS)
Contributor: Vini Salazar
Mentors:

  • Mathew Biddle
  • Filipe Fernandes
  • Alex Kerney

Please see our mid-project blog post.

erddapy is a Python client for the ERDDAP data server. With erddapy, you can quickly and efficiently query and download oceanographic data from ERDDAP servers, and integrate this data with other Python data libraries for analysis and visualization.

This project focused on refactoring erddapy into two different layers: a core layer containing the primary erddapy functions of URL and response parsing, which may then be used by erddapy itself and downstream libraries, and an object (or opinionated) layer with high-level classes that enable more advanced manipulation and subsetting.

We successfully refactored the core layer of erddapy while preserving full backwards compatibility. This unlocks a number of basic erddapy functions that can be called individually for their specific task, allowing downstream libraries such as gliderpy and argopy to reuse these functions ad hoc without having to rely on other parts of erddapy.
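To illustrate what a standalone core function means in practice, here is a minimal sketch of a stateless URL builder. `build_tabledap_url` is a hypothetical name written for this report, not erddapy's implementation. It assumes only the tabledap URL layout (`server/tabledap/dataset_id.response?variables&constraints`) and skips the quoting and validation a real client would need.

```python
def build_tabledap_url(server, dataset_id, response, variables=None, constraints=None):
    """Return a tabledap download URL (hypothetical helper, simplified)."""
    # Base: <server>/tabledap/<dataset_id>.<response>?
    url = f"{server.rstrip('/')}/tabledap/{dataset_id}.{response}?"
    # Requested variables are comma-separated.
    url += ",".join(variables or [])
    # Each constraint is appended as &<name><operator><value>, e.g. &time>=now-7days.
    for key, value in (constraints or {}).items():
        url += f"&{key}{value}"
    return url


example_url = build_tabledap_url(
    server="https://example.org/erddap",  # placeholder server, not a real endpoint
    dataset_id="scrippsGliders",
    response="csv",
    variables=["time", "temperature"],
    constraints={"time>=": "now-7days"},
)
print(example_url)
```

Because the function depends only on its arguments, a downstream library could call it without creating any erddapy object first.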

erddapy itself will use this basic functionality to implement an opinionated layer (under the erddapy.array_like subpackage, see PR#267) that allows manipulation and subsetting of datasets. Users will be able to subset and query a dataset as an object accessed with Python's "array-like" syntax, making erddapy much more flexible for data wrangling. Moreover, the core layer will make it easier to maintain erddapy's integration with third-party libraries such as pandas, xarray, iris, and netCDF4.
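As a rough idea of what "array-like" access could feel like, the sketch below selects variables with square brackets. `TableSketch` is a hypothetical class made up for illustration; it is not the interface from PR#267, and it only tracks variable names rather than fetching data.

```python
class TableSketch:
    """Hypothetical stand-in for a dataset object with array-like selection."""

    def __init__(self, dataset_id, variables):
        self.dataset_id = dataset_id
        self.variables = list(variables)

    def __getitem__(self, key):
        # Accept one variable name or a list of names.
        names = [key] if isinstance(key, str) else list(key)
        missing = [name for name in names if name not in self.variables]
        if missing:
            raise KeyError(f"Unknown variables: {missing}")
        # Return a new object so the original selection is left untouched.
        return TableSketch(self.dataset_id, names)


gliders = TableSketch("scrippsGliders", ["depth", "salinity", "temperature", "time"])
subset = gliders[["time", "temperature"]]
print(subset.variables)
```

Returning a new object on each selection keeps subsets independent of one another, which is one possible design choice among several.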

The opinionated layer contains high-level objects that represent the ERDDAP server being queried, the network connection, and the different types of ERDDAP datasets. These objects are stateful: an object representing an ERDDAP server, for example, records previous events and user interactions, which is useful but can also be limiting. Under the hood, however, they call the stateless functions of the core layer, which improves the reliability, visibility, and scalability of the requests they make. Keeping that interface stateless should also make it easier to scale erddapy to parallel and/or asynchronous operation, for both searching and downloading datasets.
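The split between stateful objects and stateless functions can be sketched as follows. Both `search_url` and `ServerSketch` are hypothetical names for illustration, not erddapy's API, and the search URL layout is a simplified assumption about ERDDAP's search endpoint.

```python
from urllib.parse import quote_plus


def search_url(server, query, response="csv"):
    """Stateless: the result depends only on the arguments (hypothetical helper)."""
    return f"{server.rstrip('/')}/search/index.{response}?searchFor={quote_plus(query)}"


class ServerSketch:
    """Stateful wrapper: remembers the searches made through it."""

    def __init__(self, url):
        self.url = url
        self.history = []  # state: previous queries, in order

    def search(self, query):
        self.history.append(query)
        # The URL itself is still produced by the stateless function.
        return search_url(self.url, query)


server = ServerSketch("https://example.org/erddap")  # placeholder server
first = server.search("sea surface salinity")
second = server.search("glider")
print(first)
print(server.history)
```

The object keeps the convenience of a history, while the function underneath can be tested or reused without any object at all.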

Future directions for this project include finishing the implementation of the subsetting interface of the dataset objects of the opinionated layer, adding asynchronous support to server search and dataset download, and preparing a major version release with all the changes.
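As a sketch of how asynchronous downloads might be structured, the example below runs several requests concurrently with `asyncio`. `fake_fetch` is a stand-in that performs no network access, and the URLs are placeholders; a real implementation would need an asynchronous HTTP client, which is outside the scope of this sketch.

```python
import asyncio


async def fake_fetch(url):
    """Stand-in for a network request; yields control and echoes the URL."""
    await asyncio.sleep(0)
    return f"fetched {url}"


async def fetch_all(urls):
    # gather runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_fetch(url) for url in urls))


urls = [
    "https://example.org/erddap/tabledap/a.csv",  # placeholder URLs
    "https://example.org/erddap/tabledap/b.csv",
]
results = asyncio.run(fetch_all(urls))
print(results)
```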

Working on GSoC 2022 was a great experience that boosted my confidence to contribute to open-source projects. I look forward to continuing to work on erddapy and interacting with the open-source community.

Code example

This is what erddapy looks like after the refactor:

import erddapy

# erddapy.servers is a dictionary with ERDDAP servers
server = erddapy.servers["uaf"].url

# Choose a list of variables you'd like to select
variables = [
    "depth",
    "latitude",
    "longitude",
    "salinity",
    "temperature",
    "time",
]
# Choose a dict of constraints (e.g. on latitude, longitude, or time); here, only the last 7 days
constraints = {
    "time>=": "now-7days",
}

# Generate a URL
url = erddapy.core.get_download_url(
    server=server,
    dataset_id="scrippsGliders",
    protocol="tabledap",
    response="csv",
    variables=variables,
    constraints=constraints
)

# Converting URL to Pandas dataframe
df = erddapy.core.to_pandas(url)
df.head()
       depth            latitude            longitude  salinity  temperature                  time
0          m       degrees_north         degrees_east       PSU      Celsius                   UTC
1        0.0  35.530199999999994  -124.76329999999999       NaN          NaN  2025-08-27T04:06:00Z
2   501.5118  35.530199999999994  -124.76329999999999    34.191        6.176  2025-08-27T04:06:00Z
3  499.76883  35.530199999999994  -124.76329999999999    34.192          6.2  2025-08-27T04:06:00Z
4  497.66934  35.530199999999994  -124.76329999999999    34.191        6.202  2025-08-27T04:06:00Z
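Row 0 of the output holds units rather than data, because ERDDAP's `.csv` response puts the units on the second line of the file. One way to keep them apart is sketched below with the standard library only. The sample text reuses two records from the output above, and the variable names are chosen for illustration.

```python
import csv
import io

# A few values copied from the output above, in ERDDAP's CSV layout:
# line 1 = variable names, line 2 = units, remaining lines = records.
sample = """depth,latitude,longitude,salinity,temperature,time
m,degrees_north,degrees_east,PSU,Celsius,UTC
0.0,35.530199999999994,-124.76329999999999,NaN,NaN,2025-08-27T04:06:00Z
501.5118,35.530199999999994,-124.76329999999999,34.191,6.176,2025-08-27T04:06:00Z
"""

rows = list(csv.reader(io.StringIO(sample)))
header, unit_row, records = rows[0], rows[1], rows[2:]
units = dict(zip(header, unit_row))  # maps each variable name to its unit
depths = [float(record[header.index("depth")]) for record in records]
print(units)
print(depths)
```

With the units stored separately, the remaining rows can be converted to numeric types without the unit strings getting in the way.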

My Pull Requests

My Issues

My Commits


Selected contributions to erddapy

Other contributions


Plotting with erddapy

Below is an example of plotting data from erddapy.

Figure (aquarius_data): Data from Aquarius Sea Surface Salinity, L3 SMI, Version 5, 1.0°, Global, requested from the CoastWatch West Coast node with erddapy and plotted with Cartopy and Matplotlib. See the erddapy docs, notebook 01a-griddap for details.
