Last active February 27, 2023 06:54
2022 Google Summer of Code - Work Product Report

GSoC 2022 Final Work Product Report




When whales are out in the Pacific, it is difficult to listen to them and track their behaviour. Several hydrophones are installed along the Oregon coast and within inland waters (like Puget Sound). For this project we chose to study the NSF-funded Ocean Observatories Initiative (OOI) slope base shallow hydrophone, located at a depth of 195 m. The audio data captured by these hydrophones are stored in the OOI archive in miniSEED format and have been practically inaccessible to the community of marine scientists and researchers. The goal of this project is to make the miniSEED audio data accessible by integrating it with the Orcasound live listening app and ML pipeline.

The Orcanode repository contains audio tools and scripts for capturing data from the OOI archive, reformatting and transcoding it using FFmpeg, and uploading it to AWS S3 storage.


Overview of the work done

Phase 1

Handling gaps in the archive data

The archive data contains gaps and other inconsistencies that can hinder a user's listening experience and cause unnecessary delays while fetching data. To address this we used the OOIPY library, which comes with built-in functions for handling gaps: it uses interpolation to merge data segments separated by a gap of less than 1800 seconds. This PR also adds other functionality: transcoding miniSEED to .wav and then to HLS segments (.ts) using FFmpeg, creating a directory tree that inotify watches to track audio segments, and creating and updating the latest.txt file, which points to the most recent directory.

Pull Request: PR#40
Closes: Issue#30
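
The gap rule described above can be illustrated with a minimal sketch. This is not the OOIPY implementation or the code from the PR; it only shows which gaps would be bridged under the 1800-second threshold quoted in the report. The function name `merge_segments` and the `(start, end)` tuple representation are assumptions made for illustration, and the actual filling of missing samples by interpolation is left to the library.

```python
from typing import List, Tuple

# Threshold quoted in the report: gaps shorter than this are bridged.
MAX_GAP_SECONDS = 1800

# Hypothetical representation: (start, end) of a data segment, in seconds.
Segment = Tuple[float, float]


def merge_segments(segments: List[Segment],
                   max_gap: float = MAX_GAP_SECONDS) -> List[Segment]:
    """Merge segments whose separating gap is shorter than max_gap.

    Segments are sorted by start time first. A segment is joined to the
    previous one when the silence between them is below the threshold;
    otherwise it starts a new, separate block.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # A 400 s gap is bridged, a 2100 s gap is not.
    print(merge_segments([(0, 100), (500, 900), (3000, 3600)]))
```

In this sketch a gap is measured from the end of the previous block to the start of the next segment, so overlapping segments are merged as well.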


Ocean Observatories Initiative archive

Delayed data

As with the previous issue, delays in the upload of miniSEED data cause gaps in the archive. When a gap is longer than 30 minutes, the interpolation method described above becomes time-consuming. The data is not fetched in real time: it has a latency of 8 hours, since the audio captured by the hydrophones goes through several checks before reaching the OOI archive. If a time slot has a gap of more than 30 minutes, that slot is skipped; after 24 hours the function revisits the skipped slots and fetches the data if it has become available.

Pull Request: PR#44
Closes: Issue#41
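
The skip-and-retry behaviour described above might be sketched roughly as follows. The 30-minute and 24-hour values come from the report; the class `SkippedSlots`, its methods, and the helper `should_skip` are hypothetical names used only for illustration and are not taken from the Orcanode code.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

MAX_GAP = timedelta(minutes=30)    # slots with longer gaps are skipped
RETRY_AFTER = timedelta(hours=24)  # skipped slots are revisited after this


def should_skip(gap: timedelta) -> bool:
    """Return True when a slot's gap is too long to bridge by interpolation."""
    return gap > MAX_GAP


class SkippedSlots:
    """Remember skipped slots and hand them back once they are due for retry."""

    def __init__(self) -> None:
        # Each entry is (slot_start, time_at_which_it_was_skipped).
        self._slots: List[Tuple[datetime, datetime]] = []

    def add(self, slot_start: datetime, skipped_at: datetime) -> None:
        self._slots.append((slot_start, skipped_at))

    def due(self, now: datetime) -> List[datetime]:
        """Remove and return the slots skipped at least RETRY_AFTER ago."""
        ready = [s for s, t in self._slots if now - t >= RETRY_AFTER]
        self._slots = [(s, t) for s, t in self._slots if now - t < RETRY_AFTER]
        return ready
```

A fetch loop could call `due(now)` on each iteration and attempt the returned slots again before moving on to the current one.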

Manifest file (.m3u8)

To stream audio data in near real time we use HTTP Live Streaming (HLS). Orcasite uses the data segments (.ts format) and the .m3u8 file in the AWS S3 bucket to stream live audio. The .m3u8 file acts as a playlist: it defines the order in which the HLS segments are played, along with other metadata such as segment length and discontinuities. Initially the .m3u8 file was created by FFmpeg itself, but there were a few issues with how the .ts file names were appended and with the quality of the generated segments. We therefore decided to write the manifest manually using Python file handling.

Pull Request: PR#47
Closes: Issue#42
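
As a rough illustration of writing the manifest by hand, the sketch below builds a live-style HLS playlist as a string using standard HLS tags. It is not the code from the PR; the function name `build_playlist`, the segment file names, and the `(uri, duration, discontinuity)` tuple layout are assumptions made for this example.

```python
import math
from typing import Iterable, Tuple

# Hypothetical segment description: (uri, duration in seconds,
# whether a discontinuity precedes this segment).
SegmentInfo = Tuple[str, float, bool]


def build_playlist(segments: Iterable[SegmentInfo],
                   media_sequence: int = 0) -> str:
    """Return the text of an .m3u8 media playlist for the given segments."""
    segments = list(segments)
    # The target duration must be at least as long as every segment.
    target = max((math.ceil(d) for _, d, _ in segments), default=0)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    for uri, duration, discontinuity in segments:
        if discontinuity:
            # Marks a break in the timeline, e.g. after a gap in the data.
            lines.append("#EXT-X-DISCONTINUITY")
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    # No #EXT-X-ENDLIST tag: the playlist is treated as still growing.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_playlist([("live000.ts", 10.0, False),
                          ("live001.ts", 9.5, True)]))
```

The returned string can be written to a file with ordinary Python file handling and uploaded alongside the .ts segments.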


S3 Bucket where HLS segments along with .m3u8 are stored

Phase 2

Test with fixed time

To test the entire flow, or simply to listen to interesting sounds and notable events, we added a testing feature that lets us define a particular time slot to listen to instead of listening live. Environment variables act as a switch to toggle between live and fixed timestamps.

Pull Request: PR#50
Closes: Issue#45
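
One way such a switch could work is sketched below. The variable names `STREAM_MODE`, `FIXED_START` and `FIXED_END`, the function `resolve_window`, and the 5-minute slot length are all hypothetical; the report does not state the actual names used in Orcanode. The 8-hour offset in live mode reflects the archive latency mentioned earlier.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional, Tuple

ARCHIVE_LATENCY = timedelta(hours=8)  # latency quoted in the report
SLOT_LENGTH = timedelta(minutes=5)    # assumed slot length, for illustration


def resolve_window(env: Mapping[str, str] = os.environ,
                   now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return the (start, end) time window to fetch.

    In "fixed" mode the window is read from the environment; otherwise
    the most recent slot that should already be in the archive is used.
    """
    if env.get("STREAM_MODE", "live") == "fixed":  # hypothetical variable
        start = datetime.fromisoformat(env["FIXED_START"])
        end = datetime.fromisoformat(env["FIXED_END"])
        return start, end
    now = now or datetime.now(timezone.utc)
    end = now - ARCHIVE_LATENCY
    return end - SLOT_LENGTH, end
```

Passing the environment as a mapping keeps the function easy to exercise in tests without touching the real process environment.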

Note: Some interesting hydrophone events, along with their times of occurrence, are listed here: OOI Hydrophone Events

Integration Tests using GitHub actions

Several integration tests are still to be added to the Orcanode repository, including Docker tests (to build images and upload them to Docker Hub) and functional tests that exercise the entire flow, i.e. from fetch to upload.

Pull Request (work in progress): PR#51
Closes: Issue#46

Future Work

  • Improve continuous integration tests with GitHub Actions.
  • Test out the entire flow using AWS Lightsail.


As a contributor here at Orcasound I got the opportunity to present my work in front of various tech leaders and researchers.

  • On 1st September, 2022 I presented the work done during GSoC virtually in a workshop organized at the Simon Fraser University, Canada. Link to PPT.
  • On 22nd August, 2022 I got an opportunity to interact with Tom Denton, a researcher at Google working in the field of bioacoustics. I got some positive feedback on my progress.
scottveirs commented Feb 24, 2023

Hey @karan2704

I found a couple of broken links to now-closed issues in the Phase 2 section (specifically the links to issues #45 and #46). I think the problem is that the URL you typed in has "issue", whereas the working link has the plural, "issues".

For example: yields a 404

whereas, leads to issue #46 as expected.

Fixed that