Reference for choosing Whisper model sizes when using custom fine-tuned models with FUTO Voice Input on Android.
| Model | Parameters | Approx GGML Size |
|---|---|---|
| Tiny | 39M | ~75 MB |
| Base | 74M | ~142 MB |
| Small | 244M | ~465 MB |
| Medium | 769M | ~1.5 GB |
| Large-v3 | 1.5B | ~3 GB |
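The sizes in the table line up with storing each parameter as a 16-bit float (2 bytes), which is how the unquantized whisper.cpp GGML files store their weights. Below is a minimal sketch of that arithmetic. It assumes fp16 weights and ignores the small header and vocabulary overhead. The constant and function names are illustrative only.

```python
# Rough GGML file-size estimate for Whisper models.
# Assumption: unquantized weights stored as 16-bit floats (2 bytes each);
# file header and vocabulary overhead are ignored.

BYTES_PER_PARAM_FP16 = 2.0
MIB = 1024 ** 2

# Parameter counts from the table (Large-v3's 1,550M is rounded to 1.5B there).
WHISPER_PARAMS = {
    "tiny": 39_000_000,
    "base": 74_000_000,
    "small": 244_000_000,
    "medium": 769_000_000,
    "large-v3": 1_550_000_000,
}


def estimated_size_mib(params: int, bytes_per_param: float = BYTES_PER_PARAM_FP16) -> float:
    """Approximate on-disk size in MiB for a model with `params` parameters."""
    return params * bytes_per_param / MIB


if __name__ == "__main__":
    for name, params in WHISPER_PARAMS.items():
        print(f"{name:>9}: ~{estimated_size_mib(params):,.0f} MiB")
```

The estimates come out at about 74, 141, 465, 1,467 and 2,956 MiB. Those values are close to the table's figures, so those figures are best read as MiB/GiB.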
Note:
FUTO Voice Input lists its stock models by parameter count (39, 74, 244) rather than by name. The table above shows which Whisper model each count corresponds to.
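If you script anything around these models, such as naming your own fine-tunes consistently, a small lookup from FUTO's parameter-count labels to Whisper model names can help. Below is a minimal sketch. The integer keys mirror the numbers FUTO shows for its stock models. The accepted label formats ("74" or "74M") and the function name are hypothetical.

```python
# Map the parameter-count labels FUTO Voice Input shows for its stock
# models (in millions of parameters) to Whisper model names.
STOCK_LABEL_TO_MODEL = {
    39: "tiny",
    74: "base",
    244: "small",
}


def model_name_for_label(label: str) -> str:
    """Return the Whisper model name for a stock-model label like "74" or "74M".

    Raises ValueError for labels that don't match a stock model
    (Medium and Large are not shipped as stock models).
    """
    # Normalise e.g. " 39m " -> "39M" -> "39" (str.removesuffix needs Python 3.9+).
    millions = int(label.strip().upper().removesuffix("M"))
    try:
        return STOCK_LABEL_TO_MODEL[millions]
    except KeyError:
        raise ValueError(f"{label!r} does not match a stock model") from None
```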
Here is a single data point to show what kind of on-device transcription is viable on a given class of hardware.
My current Android phone is a OnePlus Nord 3 5G:
- Operating system: OxygenOS 13.1, based on Android™ 13
- CPU: MediaTek Dimensity 9000
- GPU: Arm® Mali-G710 MC10
- RAM: 8 GB / 16 GB LPDDR5X
- Storage: 128 GB / 256 GB UFS 3.1
- Available configurations: 8 GB + 128 GB / 16 GB + 256 GB
- Vibration: Haptic motor
On this handset:
- Tiny (39M) is very fast but insufficiently accurate
- Base (74M) offers the best trade-off between speed and accuracy
- Small (244M) works but is not fast enough for everyday use
Note: I have Medium and Large fine-tunes available (FUTO does not provide these sizes as stock models), but not as ACFT fine-tunes, so I can't assess whether they would work. Still, since Small already pushes the performance envelope, there is no reason to expect them to!
Based on my testing with FUTO Voice Input:
- Tiny/Base: practical for real-time mobile use, with responsive inference
- Small: borderline; may be sluggish on older phones but workable on newer devices
- Medium/Large: expected to be too heavy for comfortable mobile inference on my device (untested, per the note above)
These recommendations are based on my specific hardware. Your mileage may vary:
- Newer/flagship phones may handle Medium reasonably well
- Devices with dedicated NPUs or better thermal management could support larger models
- Battery life impact increases significantly with larger models
Test on your own device to find the right balance between accuracy and responsiveness.
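The real-time factor (RTF) is one way to make "responsive enough" concrete. It is transcription time divided by audio duration, and values below 1.0 mean the model keeps up with speech. Below is a minimal sketch. It assumes you time a few clips per model yourself, for example with a stopwatch. The `Trial` type, the function names and the default 0.5 threshold are hypothetical choices, not FUTO settings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Trial:
    """One timed transcription of a clip on your own device."""
    model: str
    audio_seconds: float
    processing_seconds: float


def real_time_factor(trial: Trial) -> float:
    """Processing time divided by audio length (< 1.0 is faster than real time)."""
    return trial.processing_seconds / trial.audio_seconds


def largest_usable_model(trials: list[Trial], order: list[str], max_rtf: float = 0.5) -> str | None:
    """Return the largest model whose mean RTF is at most `max_rtf`.

    `order` lists model names from smallest to largest; models with no
    trials are skipped. Returns None if no model meets the threshold.
    """
    best = None
    for model in order:
        rtfs = [real_time_factor(t) for t in trials if t.model == model]
        if rtfs and sum(rtfs) / len(rtfs) <= max_rtf:
            best = model  # later entries in `order` are larger, so keep overwriting
    return best
```

Averaging over several utterances of typical length smooths out one-off delays. A stricter `max_rtf` leaves headroom for the slowdowns that longer sessions and heat can cause.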
Generated by Claude Code. Please validate this information for your specific use case.