Transcribe audio files such as podcasts using OpenAI's Whisper speech recognition model.
Make sure you have Docker installed.
Then run:

```shell
./run.sh podcast_file.mp3 --model tiny --language English > transcript.txt
```

This transcribes `podcast_file.mp3` with the `tiny` model, treating the audio as English, and saves the transcript in `transcript.txt`.
To watch the text as it is being generated, run `tail -f transcript.txt` in another terminal.
Choose one of the following models based on your available GPU memory and speed requirements, as described in the Whisper documentation:
| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|---|---|---|---|---|---|
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~10x |
| base | 74 M | base.en | base | ~1 GB | ~7x |
| small | 244 M | small.en | small | ~2 GB | ~4x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1550 M | N/A | large | ~10 GB | 1x |
| turbo | 809 M | N/A | turbo | ~6 GB | ~8x |
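As a rough guide to reading the table, the Bash sketch below maps an amount of available GPU memory to the largest multilingual model (by parameter count) whose approximate VRAM requirement fits. `pick_model` is a hypothetical helper, not part of `run.sh`, and its thresholds are simply the approximate "Required VRAM" figures above, so treat its answer as a starting point rather than a guarantee.

```shell
#!/usr/bin/env bash

# pick_model: hypothetical helper (not part of run.sh).
# Takes the available GPU memory as a whole number of GB and prints the
# largest multilingual model from the table whose approximate VRAM
# requirement (~10, ~6, ~5, ~2, ~1 GB) fits. Returns 1 if nothing fits.
pick_model() {
  local vram_gb=$1
  if (( vram_gb >= 10 )); then
    echo large    # 1550 M parameters, ~10 GB
  elif (( vram_gb >= 6 )); then
    echo turbo    # 809 M parameters, ~6 GB
  elif (( vram_gb >= 5 )); then
    echo medium   # 769 M parameters, ~5 GB
  elif (( vram_gb >= 2 )); then
    echo small    # 244 M parameters, ~2 GB
  elif (( vram_gb >= 1 )); then
    echo base     # 74 M parameters, ~1 GB (tiny also fits and is faster)
  else
    echo "pick_model: less than ~1 GB of VRAM; no model in the table fits" >&2
    return 1
  fi
}
```

For example, on a GPU with about 8 GB of memory, `./run.sh podcast_file.mp3 --model "$(pick_model 8)" --language English > transcript.txt` selects `turbo`. If speed matters more than accuracy, a smaller model than the one suggested is still a valid choice.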