vamsiuppala / README.md
Created January 7, 2024 18:58 — forked from veekaybee/README.md
whisper.ipynb

Using Whisper to transcribe audio

This episode of Recsperts was transcribed with Whisper from OpenAI, an open-source neural net trained on almost 700,000 hours of audio. The model uses an encoder-decoder architecture: audio is split into 30-second chunks, converted to a log-Mel spectrogram, and passed into an encoder. A decoder is trained to predict the caption text that corresponds to the encoder's output, and the model supports transcription, timestamp-aligned transcription, and multilingual translation.
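To make the preprocessing step concrete, here is a minimal pure-Python sketch of the 30-second chunking described above. It is an illustration, not Whisper's actual code: the helper names `split_into_chunks` and `hz_to_mel` are hypothetical, the 16 kHz sample rate reflects how Whisper resamples its input, and the HTK-style mel formula is just one common convention that may differ from the filterbank Whisper really uses.

```python
import math

SAMPLE_RATE = 16_000   # Whisper resamples input audio to 16 kHz
CHUNK_SECONDS = 30     # window length described in the text
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples, chunk_samples=CHUNK_SAMPLES):
    """Split samples into fixed-size chunks, zero-padding the last one.

    Hypothetical helper for illustration only.
    """
    chunks = []
    for start in range(0, len(samples), chunk_samples):
        chunk = list(samples[start:start + chunk_samples])
        # Pad a short final chunk with silence so every chunk has equal length.
        chunk.extend([0.0] * (chunk_samples - len(chunk)))
        chunks.append(chunk)
    return chunks


def hz_to_mel(freq_hz):
    """Convert Hz to mels with the HTK-style formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


# 70 seconds of silence stands in for a real recording.
audio = [0.0] * (SAMPLE_RATE * 70)
chunks = split_into_chunks(audio)
print(len(chunks), len(chunks[-1]))
```

A 70-second clip does not divide evenly into 30-second windows, so the final chunk is padded with zeros up to the full length. In the real model, each chunk would then be turned into a log-Mel spectrogram before reaching the encoder.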


The transcription process outputs a single string file, so it's up to the end-user to parse out individual speakers, or run the model [through a sec