This is a summary of a prompt in which the user asked:
"If I run Whisper locally and choose between tiny, small, medium, and large models, how much accuracy difference should I expect between them?"
The goal was to understand how much accuracy is sacrificed by choosing a smaller Whisper model over a larger one.
| Model Size | Relative Accuracy | Estimated Word Error Rate (WER) | Notes |
|---|---|---|---|
| Large | 100% (baseline) | Lowest WER (~4–5%) | Most accurate, but computationally heavy |
| Medium | ~99% | ~5–6% | Nearly as good as large, much faster |
| Small | ~97–98% | ~6–8% | Good speed/accuracy balance for most cases |
| Tiny | ~90–95% | ~10–15%+ | Fastest, but significantly less accurate |
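The WER figures above come from comparing a model's transcript against a reference (ground-truth) transcript. As a concrete illustration of the metric itself, here is a minimal pure-Python sketch of WER as word-level edit distance divided by reference length; production tools (e.g. the `jiwer` library) add text normalization steps on top of this:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six reference words -> WER of ~16.7%
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A WER of ~5% (large) versus ~12% (tiny) therefore means roughly one error every 20 words versus one every 8, which is why the gap feels larger in practice than the headline percentages suggest.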
Approximate accuracy drop between adjacent model sizes:
- Medium vs. Large: ~0.5–1% less accurate
- Small vs. Medium: ~1–2% less accurate
- Tiny vs. Small: ~5–8% less accurate
These figures are approximations and vary depending on:
- Language
- Audio clarity
- Domain vocabulary
- Noise levels in recordings
This gist was generated with the help of OpenAI based on the user's prompt and publicly available model benchmarks (as of May 2025).