Running two TTS architectures on the same small fine-tune corpus surfaces a real trade-off: F5-TTS commits hard to accent character at the cost of phonetic stability; StyleTTS2 stays phonetically stable at the cost of accent commitment. Neither dominates, and each has its own distinct late-epoch failure mode.
This is a write-up of the comparison, with the concrete failure modes that made the trade-off visible.