In GSoC 2022, I worked with the Red Hen Lab. The objective was to develop a machine learning model to tag sound effects in streams of Red Hen's data (for example, a police siren in a news stream). A single stream can contain multiple sound effects, so the model must label each of them from a set of known sound effects, making this a multi-label classification problem. YamNet is used as the pre-trained model in this project. The video files are first converted into audio files; YamNet then tags the sound effects in them, and the results are written to several kinds of output files to help interpret the tagging on the original videos.
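The multi-label step can be sketched as follows. This is a minimal illustration, not the project's actual code: the class names, scores, and threshold below are mock values (the real pipeline works over YamNet's AudioSet classes and frame-level score output).

```python
import numpy as np

# Mock class vocabulary; YamNet's real vocabulary is much larger.
CLASS_NAMES = ["Siren", "Speech", "Music", "Dog"]

def tag_sound_effects(scores, class_names, threshold=0.5):
    """Multi-label tagging: report every class whose score exceeds
    the threshold in at least one frame of the clip."""
    clip_scores = scores.max(axis=0)  # best score per class across frames
    return [name for name, s in zip(class_names, clip_scores) if s >= threshold]

# Mock frame-level scores for a 3-frame clip over 4 classes.
scores = np.array([
    [0.9, 0.2, 0.1, 0.0],  # frame 1: strong "Siren"
    [0.7, 0.6, 0.1, 0.0],  # frame 2: "Siren" and "Speech" overlap
    [0.1, 0.8, 0.2, 0.0],  # frame 3: strong "Speech"
])

print(tag_sound_effects(scores, CLASS_NAMES))  # ['Siren', 'Speech']
```

Because each class is thresholded independently, overlapping sounds in the same stream can all be reported, which is exactly what distinguishes multi-label tagging from single-label classification.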
The project consists of several blocks, in the form of scripts, that work together to annotate Red Hen videos with these tags.