@technosaby
technosaby / gsoc2022.md
Last active March 20, 2023 20:26
GSoC 2022 [Red Hen Lab] Tagging Audio Effects Consolidated Report

Introduction

In GSoC 2022, I worked with Red Hen Lab. The objective was to develop a machine learning model that tags sound effects (such as police sirens in a news stream) in Red Hen's data streams. A single stream can contain multiple sound effects, so the model must assign every applicable label from a set of known sound effects, which makes this a multi-label classification problem. YamNet is used as the pre-trained model in this project. The video files are first converted into audio files, which YamNet then tags for sound effects; the results are dumped into several kinds of files that show how each video file was tagged.
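To make the multi-label idea concrete, here is a minimal sketch of turning per-frame class scores into tags. YamNet produces a score for each of its 521 AudioSet classes on every frame; this sketch assumes a small hypothetical class subset, a hypothetical threshold of 0.3, and hypothetical helper names (`tag_frames`, `summarize`). The report does not say how the project converts scores into labels, so treat this as an illustration, not the project's code.

```python
from typing import Dict, List, Sequence

# Hypothetical class subset; YamNet itself scores 521 AudioSet classes per frame.
CLASS_NAMES = ["Speech", "Siren", "Music", "Applause"]


def tag_frames(frame_scores: Sequence[Sequence[float]],
               class_names: Sequence[str],
               threshold: float = 0.3) -> List[List[str]]:
    """Multi-label tagging: keep every class whose score meets the threshold,
    so a frame can get zero, one, or several labels (unlike a single argmax)."""
    tags: List[List[str]] = []
    for scores in frame_scores:
        if len(scores) != len(class_names):
            raise ValueError("score vector length must match the number of classes")
        tags.append([name for name, s in zip(class_names, scores) if s >= threshold])
    return tags


def summarize(tags: List[List[str]]) -> Dict[str, int]:
    """Count in how many frames each label was detected."""
    counts: Dict[str, int] = {}
    for frame in tags:
        for label in frame:
            counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    scores = [
        [0.90, 0.05, 0.10, 0.00],  # speech only
        [0.70, 0.60, 0.05, 0.00],  # speech over a siren
        [0.10, 0.20, 0.15, 0.05],  # nothing confident
    ]
    frame_tags = tag_frames(scores, CLASS_NAMES)
    print(frame_tags)             # [['Speech'], ['Speech', 'Siren'], []]
    print(summarize(frame_tags))  # {'Speech': 2, 'Siren': 1}
```

The second frame shows why thresholding fits this problem: a siren heard under a reporter's voice yields both labels, which a single-label (argmax) classifier would collapse to one.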

[Figure: Multilabel]

Description

The project consists of several blocks, implemented as scripts, that work together to annotate Red Hen videos with sound-effect tags.
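The way the blocks chain together (video to audio, audio to tags, tags to output files) can be sketched as follows. This is a hypothetical skeleton, not the project's scripts: the function names, the CSV layout, and the stubbed `tagger` are assumptions. The ffmpeg flags are standard (`-vn` drops video, `-ac 1` downmixes to mono, `-ar` sets the sample rate), and 16 kHz mono is the input format YamNet expects. The command is only built here, not executed, so the sketch runs without ffmpeg installed.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Iterable, List, TextIO, Tuple

# Hypothetical record for one analysed frame: (start_seconds, end_seconds, labels).
Segment = Tuple[float, float, List[str]]


def ffmpeg_extract_cmd(video: Path, wav: Path, sample_rate: int = 16000) -> List[str]:
    """Block 1: build an ffmpeg command that drops the video stream (-vn)
    and writes mono (-ac 1) audio at the given sample rate (-ar)."""
    return ["ffmpeg", "-y", "-i", str(video), "-vn", "-ac", "1",
            "-ar", str(sample_rate), str(wav)]


def write_tags_csv(segments: Iterable[Segment], out: TextIO) -> None:
    """Block 3: dump tagged segments as CSV rows of start, end, labels."""
    writer = csv.writer(out)
    writer.writerow(["start", "end", "labels"])
    for start, end, labels in segments:
        writer.writerow([f"{start:.2f}", f"{end:.2f}", ";".join(labels)])


def run_pipeline(video: Path, tagger: Callable[[Path], List[Segment]],
                 out: TextIO) -> List[str]:
    """Chain the blocks. `tagger` stands in for the YamNet tagging script (block 2).
    The ffmpeg command is returned rather than run; a real run would call
    subprocess.run(cmd, check=True) before tagging."""
    wav = video.with_suffix(".wav")
    cmd = ffmpeg_extract_cmd(video, wav)
    write_tags_csv(tagger(wav), out)
    return cmd


def fake_tagger(wav: Path) -> List[Segment]:
    """Hypothetical stand-in for YamNet output (0.96 s frames, 0.48 s hop)."""
    return [(0.0, 0.96, ["Speech"]), (0.48, 1.44, ["Speech", "Siren"])]


if __name__ == "__main__":
    buf = io.StringIO()
    cmd = run_pipeline(Path("news.mp4"), fake_tagger, buf)
    print(" ".join(cmd))
    print(buf.getvalue())
```

Keeping each block as a separate function (or script) means the audio extraction can be rerun independently of tagging, and additional output formats can be added beside `write_tags_csv` without touching the other stages.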