ICASSP 2024 MMITS-VC CHALLENGE SUBMISSION FOR NVIDIA: SCALING NVIDIA’S MULTI-SPEAKER MULTI-LINGUAL TTS SYSTEMS WITH VOICE-CLONING TO INDIC LANGUAGES
Speaker | Initial data points | Removed (duplicates, empty audio or transcripts) | Final data points | Initial duration of final data points (hours) | Final duration after silence removal (hours) |
---|---|---|---|---|---|
Hindi_M | 17798 | 1 | 17797 | 40.49 | 39.63 |
Hindi_F | 16512 | 29 | 16483 | 40.21 | 38.16 |
Telugu_M | 16939 | 0 | 16939 | 42.03 | 40.97 |
Telugu_F | 15933 | 143 | 15790 | 40.58 | 39.62 |
Marathi_M | 16747 | 0 | 16747 | 41.20 | 38.81 |
Marathi_F | 17874 | 0 | 17874 | 42.83 | 41.43 |
Kannada_M | 16950 | 1 | 16949 | 40.07 | 39.01 |
Kannada_F | 13310 | 0 | 13310 | 40.01 | 37.02 |
Bengali_M | 18550 | 3 | 18547 | 39.89 | 37.90 |
Bengali_F | 16850 | 1 | 16849 | 40.08 | 38.42 |
English_M | 20810 | 23 | 20787 | 39.97 | 39.87 |
English_F | 19610 | 1 | 19609 | 40.02 | 40.02 |
Chhattisgarhi_M | 18199 | 2 | 18197 | 40.03 | 38.90 |
Chhattisgarhi_F | 19949 | 1 | 19948 | 40.06 | 38.16 |
Total | 246031 | 205 | 245826 | 567.49 | 547.95 |
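
The filtering step summarized in the table (dropping duplicates and entries with empty audio or transcripts) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Utterance` record, the `clean_manifest` helper, and the definition of a duplicate as a repeated (audio ID, transcript) pair are all assumptions made for illustration, and silence removal is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical manifest record; field names are illustrative only."""
    audio_id: str
    transcript: str
    num_samples: int  # 0 is treated as empty audio


def clean_manifest(utterances):
    """Drop duplicates and entries with empty audio or transcripts.

    Assumption: a duplicate is a repeated (audio_id, transcript) pair.
    Returns the kept utterances and the number of removed entries, so that
    len(kept) + removed equals the initial number of data points.
    """
    seen = set()
    kept = []
    removed = 0
    for utt in utterances:
        text = utt.transcript.strip()
        key = (utt.audio_id, text)
        if not text or utt.num_samples <= 0 or key in seen:
            removed += 1
            continue
        seen.add(key)
        kept.append(utt)
    return kept, removed


if __name__ == "__main__":
    demo = [
        Utterance("a", "hello", 16000),
        Utterance("a", "hello", 16000),  # duplicate
        Utterance("b", "", 16000),       # empty transcript
        Utterance("c", "text", 0),       # empty audio
        Utterance("d", "ok", 8000),
    ]
    kept, removed = clean_manifest(demo)
    # Same bookkeeping as the table: final = initial - removed.
    assert len(kept) == len(demo) - removed
    print(len(demo), removed, len(kept))
```

The returned counts mirror the table's columns: for every speaker row, the final data points equal the initial data points minus the removed entries (for example, 16512 - 29 = 16483 for Hindi_F).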