- Name: Karan Mishra
- GitHub: @karan2704
- Organisation: @orcasound
- Project: Stream external hydrophones to Orcasound website
- Mentors: Stephen Hicks, Scott Veirs, Valentina Staneva, Val Veirs
- Tools and Technologies: Python, Docker, AWS, FFmpeg
- Blogs: Part-1 Part-2
When whales are out in the Pacific, it is difficult to listen to them and track their behaviour. Several hydrophones are installed along the Oregon coast and within inland waters (such as Puget Sound). For this project we decided to study the NSF-funded Ocean Observatories Initiative (OOI) slope base shallow hydrophone, located at a depth of 195 m. The audio data captured by these hydrophones, stored in the OOI archive in miniSEED format, have been practically inaccessible to the community of marine scientists and researchers. The goal of this project is to make miniSEED audio data accessible by integrating it with the Orcasound live listening app and ML pipeline.
The Orcanode repository contains audio tools and scripts for capturing data from the OOI archive, reformatting and transcoding them using FFmpeg, and uploading them to AWS S3 storage.
The archive data comes with some inconsistencies, which can hinder a user's listening experience and cause unnecessary delays while fetching data. To address this we used the OOIPY library, which comes with built-in functions to handle gaps. It uses interpolation to merge data segments separated by a gap of less than 1800 seconds. This PR also handles other functionality: transcoding miniSEED to .wav and then to HLS segments (.ts) using FFmpeg, creating a directory tree that inotify uses to track audio segments, and creating and updating the latest.txt file, which points to the most recent directory.
Pull Request: PR#40
Closes: Issue#30
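The 1800-second threshold mentioned above can be illustrated with a minimal sketch. This is not OOIPY's implementation and it does not interpolate any samples; it only shows the decision of which segments would be bridged into one continuous stretch. The function name `merge_segments` and the representation of segments as `(start, end)` pairs in seconds are assumptions made for illustration.

```python
from typing import List, Tuple

MAX_GAP_SECONDS = 1800  # threshold quoted in the text above

Segment = Tuple[float, float]  # (start, end) in seconds; hypothetical representation


def merge_segments(segments: List[Segment],
                   max_gap: float = MAX_GAP_SECONDS) -> List[Segment]:
    """Join neighbouring segments whose gap is shorter than max_gap.

    Illustration only: a real pipeline would also fill the bridged gap
    with interpolated samples, which is left to the library.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < max_gap:
            # Gap is short enough to bridge: extend the previous segment.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # First segment, or the gap is too long: start a new stretch.
            merged.append((start, end))
    return merged
```

For example, segments ending at 300 s and starting at 3000 s are 2700 s apart, so they stay separate, while a 100 s gap is bridged.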
As with the previous issue, delays in the upload of miniSEED data cause gaps in the archive. When the gaps are longer than 30 minutes, the previous method of interpolation becomes time-consuming. The data is not fetched in real time: it has a latency of 8 hours, since the audio captured by the hydrophones goes through several checks before reaching the OOI archive. If a slot has a gap of more than 30 minutes, that slot is skipped; after 24 hours the function revisits the skipped slots and fetches the data if it has become available.
Pull Request: PR#44
Closes: Issue#41
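The skip-and-retry behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the PR: the function names, the `skipped` dictionary, and the idea of passing in the largest gap found in a slot are all hypothetical. Only the three durations (8 hours, 30 minutes, 24 hours) come from the text.

```python
from datetime import datetime, timedelta
from typing import Dict, List

ARCHIVE_LATENCY = timedelta(hours=8)   # delay before data reaches the archive
MAX_GAP = timedelta(minutes=30)        # longer gaps cause the slot to be skipped
RETRY_AFTER = timedelta(hours=24)      # when skipped slots are revisited


def latest_fetchable_time(now: datetime) -> datetime:
    """Most recent timestamp that can be expected to exist in the archive."""
    return now - ARCHIVE_LATENCY


def handle_slot(slot_start: datetime, largest_gap: timedelta,
                now: datetime, skipped: Dict[datetime, datetime]) -> str:
    """Return "fetch" or "skip"; skipped slots are recorded with a retry time."""
    if largest_gap > MAX_GAP:
        skipped[slot_start] = now + RETRY_AFTER
        return "skip"
    return "fetch"


def due_retries(skipped: Dict[datetime, datetime], now: datetime) -> List[datetime]:
    """Slots whose retry time has passed, oldest first."""
    return sorted(slot for slot, retry_at in skipped.items() if retry_at <= now)
```

A caller would check `due_retries` on each pass and attempt those slots again before moving on to new data.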
To stream audio data in near real time we use HTTP Live Streaming (HLS). Orcasite uses the data segments (.ts format) and the .m3u8 file present in the AWS S3 bucket to stream live audio. The .m3u8 file acts as a playlist: it defines the order in which the HLS segments are played, along with other metadata such as segment length and discontinuities. Initially, the .m3u8 file was created by FFmpeg itself, but there were issues with appending the names of the .ts files and with the quality of the generated segments. We therefore decided to write the playlist manually using Python file handling.
Pull Request: PR#47
Closes: Issue#42
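As a rough idea of what writing the playlist by hand involves, here is a minimal sketch that builds the text of a live .m3u8 file from a list of segments. It is not the code from the PR: the function name `build_playlist`, its parameters, and the segment file names are hypothetical. The tags themselves (`#EXTM3U`, `#EXT-X-TARGETDURATION`, `#EXT-X-MEDIA-SEQUENCE`, `#EXT-X-DISCONTINUITY`, `#EXTINF`) are standard HLS playlist tags.

```python
import math
from typing import Iterable, List, Tuple


def build_playlist(segments: List[Tuple[str, float]],
                   media_sequence: int = 0,
                   discontinuities: Iterable[str] = ()) -> str:
    """Return the text of a live HLS playlist.

    segments: (file name, duration in seconds) pairs in playback order.
    discontinuities: file names that follow a break in the audio.
    """
    breaks = set(discontinuities)
    # The target duration must be at least as long as every segment.
    target = max((math.ceil(duration) for _, duration in segments), default=0)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    for name, duration in segments:
        if name in breaks:
            lines.append("#EXT-X-DISCONTINUITY")
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(name)
    # No #EXT-X-ENDLIST tag: the stream is live and more segments will follow.
    return "\n".join(lines) + "\n"
```

The returned string can be written to the playlist file with ordinary file handling, for example `open("live.m3u8", "w").write(build_playlist(segments))`, where `live.m3u8` is again a placeholder name.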
To test the entire flow, or simply to listen to interesting sounds or notable events, we added a testing environment feature that lets us define a particular time slot to listen to instead of listening live. We added environment variables that act as a switch to toggle between live and fixed timestamps.
Pull Request: PR#50
Closes: Issue#45
Note: Some interesting hydrophone events, along with their times of occurrence, are mentioned here: OOI Hydrophone Events
Multiple integration tests still need to be added for the Orcanode repository, including Docker tests (to build images and upload them to Docker Hub) and functional tests that exercise the entire flow, i.e. from fetch to upload.
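The live/fixed switch described above could look something like the sketch below. The variable names `STREAM_MODE`, `START_TIME` and `END_TIME` are hypothetical and chosen only for illustration; the names actually used in the repository may differ.

```python
import os
from datetime import datetime
from typing import Mapping, Optional, Tuple

Window = Tuple[str, Optional[datetime], Optional[datetime]]


def resolve_time_window(env: Optional[Mapping[str, str]] = None) -> Window:
    """Decide between live streaming and a fixed time slot.

    Reads hypothetical variables STREAM_MODE ("live" or "fixed"),
    START_TIME and END_TIME (ISO 8601, e.g. 2022-08-01T12:00:00).
    """
    env = os.environ if env is None else env
    mode = env.get("STREAM_MODE", "live").lower()
    if mode == "live":
        return ("live", None, None)
    if mode != "fixed":
        raise ValueError(f"unknown STREAM_MODE: {mode!r}")
    start = datetime.fromisoformat(env["START_TIME"])
    end = datetime.fromisoformat(env["END_TIME"])
    if end <= start:
        raise ValueError("END_TIME must be later than START_TIME")
    return ("fixed", start, end)
```

With no variables set, the function falls back to live mode, so the testing switch stays opt-in.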
Pull Request (work in progress): PR#51
Closes: Issue#46
- Improve continuous integration tests with GitHub Actions.
- Test out the entire flow using AWS Lightsail.
As a contributor here at Orcasound I got the opportunity to present my work in front of various tech leaders and researchers.
- On 1st September, 2022 I presented the work done during GSoC virtually in a workshop organized at the Simon Fraser University, Canada. Link to PPT.
- On 22nd August, 2022 I got an opportunity to interact with Tom Denton, a researcher at Google working in the field of bioacoustics. I got some positive feedback on my progress.
Hey @karan2704
I found a couple of broken links to now-closed issues in the Phase 2 section (specifically the links to issues #45 and #46). I think the problem is that the URL you typed has "issue" where the working link has the plural, "issues".
For example:
https://github.com/orcasound/orcanode/issue/46 yields a 404
whereas,
https://github.com/orcasound/orcanode/issues/46
leads to issue #46 as expected.