Vamsi Uppala (vamsiuppala)

## README.md

Created January 7, 2024 18:58, forked from veekaybee/README.md

### Using Whisper to transcribe audio

This episode of Recsperts was transcribed with Whisper from OpenAI, an open-source neural net trained on roughly 680,000 hours of audio. Whisper uses an encoder-decoder architecture: the input audio is split into 30-second chunks, each chunk is converted into a log-Mel spectrogram, and the spectrogram is passed into the encoder. The decoder is trained to predict the text caption that corresponds to the encoded audio. The same model handles transcription, timestamp-aligned transcription, and translation from many languages into English.
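The chunking step can be sketched in plain Python. This is a minimal, hedged illustration rather than Whisper's own code. The constants (16 kHz sample rate, 30-second windows, a 160-sample hop giving 3,000 log-Mel frames per window) match those used in the openai-whisper package. The helper `chunk_audio` is hypothetical, and it cuts fixed, back-to-back windows, whereas Whisper's long-form `transcribe()` advances its window based on the timestamps it decodes. The spectrogram computation itself is omitted.

```python
# Sketch of Whisper-style input windowing (assumption: simplified, fixed windows).
SAMPLE_RATE = 16_000                      # Whisper resamples audio to 16 kHz
CHUNK_SECONDS = 30                        # the encoder sees one 30-second window
HOP_LENGTH = 160                          # 10 ms between spectrogram frames
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per window
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3,000 log-Mel frames per window


def chunk_audio(samples: list[float]) -> list[list[float]]:
    """Split mono 16 kHz audio into 30-second windows.

    Hypothetical helper for illustration, not part of openai-whisper.
    The final window is zero-padded to full length, similar in spirit to
    whisper.pad_or_trim for a single window.
    """
    chunks = []
    for start in range(0, len(samples), N_SAMPLES):
        chunk = samples[start:start + N_SAMPLES]
        # Pad a short final window with silence so every window has N_SAMPLES.
        chunk = chunk + [0.0] * (N_SAMPLES - len(chunk))
        chunks.append(chunk)
    return chunks
```

For example, 75 seconds of audio yields three windows, the last holding 15 seconds of audio followed by 15 seconds of zero padding. With the openai-whisper package itself, `whisper.load_model("base").transcribe(path)` returns a dictionary whose `"text"` entry is the full transcript as one string and whose `"segments"` entry lists timestamped pieces of it.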

The transcription process outputs a single string of text, so it's up to the end user to parse out individual speakers, or run the model [through a sec