Skip to content

Instantly share code, notes, and snippets.

@erogol
Last active February 12, 2024 23:46
Show Gist options
  • Star 25 You must be signed in to star a gist
  • Fork 8 You must be signed in to fork a gist
  • Save erogol/97516ad65b44dbddb8cd694953187c5b to your computer and use it in GitHub Desktop.
Save erogol/97516ad65b44dbddb8cd694953187c5b to your computer and use it in GitHub Desktop.
TTS_example.ipynb
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "TTS_example.ipynb",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/erogol/97516ad65b44dbddb8cd694953187c5b/tts_example.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cjD0xW0cEMVT"
},
"source": [
"# Hands-on example for 🐸 [Coqui TTS](https://github.com/coqui-ai/TTS)\n",
"\n",
"This notebook trains Tacotron model on LJSpeech dataset."
]
},
{
"cell_type": "markdown",
"source": [
"## Download LJSpeech"
],
"metadata": {
"id": "QPA2gbqRi9Wx"
}
},
{
"cell_type": "code",
"metadata": {
"id": "XGiNTMShZYvj"
},
"source": [
"# download LJSpeech dataset\n",
"!wget http://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2\n",
"# decompress\n",
"!tar -xjf LJSpeech-1.1.tar.bz2"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "__k0BrbfLQ-F"
},
"source": [
"# create train-val splits\n",
"!shuf LJSpeech-1.1/metadata.csv > LJSpeech-1.1/metadata_shuf.csv\n",
"!head -n 12000 LJSpeech-1.1/metadata_shuf.csv > LJSpeech-1.1/metadata_train.csv\n",
"!tail -n 1100 LJSpeech-1.1/metadata_shuf.csv > LJSpeech-1.1/metadata_val.csv"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Setup environment"
],
"metadata": {
"id": "ocmh66BqjLCF"
}
},
{
"cell_type": "code",
"metadata": {
"id": "pyJwcU9pDUE-"
},
"source": [
"!pip install TTS "
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "zV-vHTWyirQv"
},
"source": [
"# install espeak backend if you like to use phonemes instead of raw characters\n",
"!sudo apt-get install espeak-ng"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Train Tacotron DCA"
],
"metadata": {
"id": "2Af-yiyFjU-f"
}
},
{
"cell_type": "code",
"metadata": {
"id": "y7_Xao7uNOvX"
},
"source": [
"\n",
"import os\n",
"\n",
"from trainer import Trainer, TrainerArgs\n",
"\n",
"from TTS.config.shared_configs import BaseAudioConfig\n",
"from TTS.tts.configs.shared_configs import BaseDatasetConfig\n",
"from TTS.tts.configs.tacotron2_config import Tacotron2Config\n",
"from TTS.tts.datasets import load_tts_samples\n",
"from TTS.tts.models.tacotron2 import Tacotron2\n",
"from TTS.tts.utils.text.tokenizer import TTSTokenizer\n",
"from TTS.utils.audio import AudioProcessor\n",
"\n",
"# from TTS.tts.datasets.tokenizer import Tokenizer\n",
"\n",
"output_path = \"./\"\n",
"\n",
"# init configs\n",
"dataset_config = BaseDatasetConfig(\n",
" name=\"ljspeech\", meta_file_train=\"metadata.csv\", path=os.path.join(output_path, \"/content/LJSpeech-1.1\")\n",
")\n",
"\n",
"audio_config = BaseAudioConfig(\n",
" sample_rate=22050,\n",
" do_trim_silence=True,\n",
" trim_db=60.0,\n",
" signal_norm=False,\n",
" mel_fmin=0.0,\n",
" mel_fmax=8000,\n",
" spec_gain=1.0,\n",
" log_func=\"np.log\",\n",
" ref_level_db=20,\n",
" preemphasis=0.0,\n",
")\n",
"\n",
"config = Tacotron2Config( # This is the config that is saved for the future use\n",
" audio=audio_config,\n",
" batch_size=64,\n",
" eval_batch_size=16,\n",
" num_loader_workers=4,\n",
" num_eval_loader_workers=4,\n",
" run_eval=True,\n",
" test_delay_epochs=-1,\n",
" ga_alpha=0.0,\n",
" decoder_loss_alpha=0.25,\n",
" postnet_loss_alpha=0.25,\n",
" postnet_diff_spec_alpha=0,\n",
" decoder_diff_spec_alpha=0,\n",
" decoder_ssim_alpha=0,\n",
" postnet_ssim_alpha=0,\n",
" r=2,\n",
" attention_type=\"dynamic_convolution\",\n",
" double_decoder_consistency=False,\n",
" epochs=1000,\n",
" text_cleaner=\"phoneme_cleaners\",\n",
" use_phonemes=True,\n",
" phoneme_language=\"en-us\",\n",
" phoneme_cache_path=os.path.join(output_path, \"phoneme_cache\"),\n",
" print_step=25,\n",
" print_eval=True,\n",
" mixed_precision=False,\n",
" output_path=output_path,\n",
" datasets=[dataset_config],\n",
")\n",
"\n",
"# INITIALIZE THE AUDIO PROCESSOR\n",
"# Audio processor is used for feature extraction and audio I/O.\n",
"# It mainly serves to the dataloader and the training loggers.\n",
"ap = AudioProcessor.init_from_config(config)\n",
"\n",
"# INITIALIZE THE TOKENIZER\n",
"# Tokenizer is used to convert text to sequences of token IDs.\n",
"# If characters are not defined in the config, default characters are passed to the config\n",
"tokenizer, config = TTSTokenizer.init_from_config(config)\n",
"\n",
"# LOAD DATA SAMPLES\n",
"# Each sample is a list of ```[text, audio_file_path, speaker_name]```\n",
"# You can define your custom sample loader returning the list of samples.\n",
"# Or define your custom formatter and pass it to the `load_tts_samples`.\n",
"# Check `TTS.tts.datasets.load_tts_samples` for more details.\n",
"train_samples, eval_samples = load_tts_samples(\n",
" dataset_config,\n",
" eval_split=True,\n",
" eval_split_max_size=config.eval_split_max_size,\n",
" eval_split_size=config.eval_split_size,\n",
")\n",
"\n",
"# INITIALIZE THE MODEL\n",
"# Models take a config object and a speaker manager as input\n",
"# Config defines the details of the model like the number of layers, the size of the embedding, etc.\n",
"# Speaker manager is used by multi-speaker models.\n",
"model = Tacotron2(config, ap, tokenizer)\n",
"\n",
"# INITIALIZE THE TRAINER\n",
"# Trainer provides a generic API to train all the 🐸TTS models with all its perks like mixed-precision training,\n",
"# distributed training, etc.\n",
"trainer = Trainer(\n",
" TrainerArgs(), config, output_path, model=model, train_samples=train_samples, eval_samples=eval_samples\n",
")\n",
"\n",
"# AND... 3,2,1... 🚀\n",
"trainer.fit()"
],
"execution_count": null,
"outputs": []
}
]
}
@Sadam1195
Copy link

Sadam1195 commented Apr 30, 2021

Hi. I was trying to continue my tacotron training run using:
!python TTS/bin/train_tacotron.py --continue_path ../ljspeech-ddc-April-29-2021_10+43PM-e9e0784/ | tee training.log

but I got this output:
`

2021-04-30 11:22:23.043015: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
Using CUDA: True
Number of GPUs: 1
Training continues for ../ljspeech-ddc-April-29-2021_10+43PM-e9e0784/
File "TTS/bin/train_tacotron.py", line 688, in
c = load_config(args.config_path)
File "/content/drive/MyDrive/GraduationProject/utils/TTS/TTS/utils/io.py", line 46, in load_config
data = read_json_with_comments(config_path)
File "/content/drive/MyDrive/GraduationProject/utils/TTS/TTS/utils/io.py", line 30, in read_json_with_comments
data = json.loads(input_str)
File "/usr/lib/python3.7/json/init.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.7/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 622 (char 621)
`

Can anyone help?

I was getting the same kind of error. I fixed it by commenting following part in config.json

  // DISTRIBUTED TRAINING
  // "distributed":{
  //    "backend": "nccl",
  //    "url": "tcp:\/\/localhost:54321"
  // },

comment out the character at Invalid control character at: line 1 column 622 (char 621)

@ahgarawani

@ahgarawani
Copy link

Thank you. It now works. However, the training doesn't seem to continue it rather starts from epoch 0. Was that the case with you?
@Sadam1195

@Sadam1195
Copy link

Thank you. It now works. However, the training doesn't seem to continue it rather starts from epoch 0. Was that the case with you?
@Sadam1195

No, that wasn't the case for me. Do not comment the whole line instead fix the character at mentioned space line 1 column 622 (char 621).
If that doesn't fix your problem try using --restore_path checkpoint.pth.tar flag and provide it original location of your config

@SVincent
Copy link

SVincent commented Jun 4, 2021

Is it possible that the code was refactored again? If so, it broke the load_config step:

---------------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

<ipython-input-7-1cccdd43e6e8> in <module>()
      1 # load the default config file and update with the local paths and settings.
      2 import json
----> 3 from TTS.utils.io import load_config
      4 CONFIG = load_config('/content/TTS/TTS/tts/configs/config.json')
      5 CONFIG['datasets'][0]['path'] = '../LJSpeech-1.1/'  # set the target dataset to the LJSpeech

ImportError: cannot import name 'load_config' from 'TTS.utils.io' (/content/TTS/TTS/utils/io.py)


---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

I thought load_config might have been moved from TTS.utils.io to TTS.config, but since the config.json also no longer exists, I've been kind of stuck at this point.

@ahgarawani
Copy link

Is it possible that the code was refactored again? If so, it broke the load_config step:

---------------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

<ipython-input-7-1cccdd43e6e8> in <module>()
      1 # load the default config file and update with the local paths and settings.
      2 import json
----> 3 from TTS.utils.io import load_config
      4 CONFIG = load_config('/content/TTS/TTS/tts/configs/config.json')
      5 CONFIG['datasets'][0]['path'] = '../LJSpeech-1.1/'  # set the target dataset to the LJSpeech

ImportError: cannot import name 'load_config' from 'TTS.utils.io' (/content/TTS/TTS/utils/io.py)


---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

I thought load_config might have been moved from TTS.utils.io to TTS.config, but since the config.json also no longer exists, I've been kind of stuck at this point.

the load_config function is not in TTS.utils.io, but it is still in the mozilla repo maybe clone that instead.
git clone https://github.com/mozilla/TTS

@SVincent
Copy link

SVincent commented Jun 4, 2021

the load_config function is not in TTS.utils.io, but it is still in the mozilla repo maybe clone that instead.
git clone https://github.com/mozilla/TTS

I will probably give the mozilla TTS a try.
Still: "TTS.utils.io" and the "configs.json" file are both still referenced in the readme of the Coqui-ai TTS.

Edit: Well, and there's the fact that the Mozilla TTS also refers to this colab, which at the moment is broken.

@Sadam1195
Copy link

Is it possible that the code was refactored again? If so, it broke the load_config step:

---------------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

<ipython-input-7-1cccdd43e6e8> in <module>()
      1 # load the default config file and update with the local paths and settings.
      2 import json
----> 3 from TTS.utils.io import load_config
      4 CONFIG = load_config('/content/TTS/TTS/tts/configs/config.json')
      5 CONFIG['datasets'][0]['path'] = '../LJSpeech-1.1/'  # set the target dataset to the LJSpeech

ImportError: cannot import name 'load_config' from 'TTS.utils.io' (/content/TTS/TTS/utils/io.py)


---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

I thought load_config might have been moved from TTS.utils.io to TTS.config, but since the config.json also no longer exists, I've been kind of stuck at this point.

I am not sure if this would be wise as https://github.com/mozilla/TTS is not maintained anymore.

the load_config function is not in TTS.utils.io, but it is still in the mozilla repo maybe clone that instead.
git clone https://github.com/mozilla/TTS

May be you can edit the code and change the from TTS.utils.io import load_config to from TTS.config import load_config

Is it possible that the code was refactored again?

Yes. Config loading was refactored/resolved in the recent updates, may be that could be causing the break.
@SVincent

@ahgarawani
Copy link

Hi I have tried to run:
CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_tacotron.py --config_path ../tacotron2/config.json | tee ../tacotron2/training.log

on a notebook instance on gcp instead of colab and I got this even though it works fine on colab:
image

does anybody have an idea what is going on ?

@Sadam1195
Copy link

Hi I have tried to run:
CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_tacotron.py --config_path ../tacotron2/config.json | tee ../tacotron2/training.log

on a notebook instance on gcp instead of colab and I got this even though it works fine on colab:
image

does anybody have an idea what is going on ?

check file directory paths. You have provided invalid path in the config.
@ahgarawani

@turian
Copy link

turian commented Jun 10, 2021

@Sadam1195

from TTS.config import load_config

Which config.json should be used now?

These are the current options:

./TTS/speaker_encoder/configs/config.json
./tests/outputs/dummy_model_config.json
./tests/inputs/test_tacotron2_config.json
./tests/inputs/server_config.json
./tests/inputs/test_tacotron_bd_config.json
./tests/inputs/test_tacotron_config.json
./tests/inputs/test_vocoder_multiband_melgan_config.json
./tests/inputs/test_speaker_encoder_config.json
./tests/inputs/test_vocoder_audio_config.json
./tests/inputs/test_vocoder_wavernn_config.json
./tests/inputs/test_config.json

@Sadam1195
Copy link

Sadam1195 commented Jun 10, 2021

@Sadam1195

from TTS.config import load_config

Which config.json should be used now?

These are the current options:

./TTS/speaker_encoder/configs/config.json
./tests/outputs/dummy_model_config.json
./tests/inputs/test_tacotron2_config.json
./tests/inputs/server_config.json
./tests/inputs/test_tacotron_bd_config.json
./tests/inputs/test_tacotron_config.json
./tests/inputs/test_vocoder_multiband_melgan_config.json
./tests/inputs/test_speaker_encoder_config.json
./tests/inputs/test_vocoder_audio_config.json
./tests/inputs/test_vocoder_wavernn_config.json
./tests/inputs/test_config.json

You can train the model using following command where as you can use the config located in https://github.com/coqui-ai/TTS/blob/main/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json with latest changes in repo

!CUDA_VISIBLE_DEVICES="0" python TTS/TTS/bin/train_tacotron.py --config_path ./tacotron2-DDC.json \
                                                          --coqpit.output_path ./Results  \
                                                          --coqpit.datasets.0.path ./riccardo_fasol/   \
                                                          --coqpit.audio.stats_path ./scale_stats.npy \

@turian

@tomchingas
Copy link

tomchingas commented Jun 20, 2021

It seems to run if you replace the second to last cell:

# load the default config file and update with the local paths and settings.
import json
from TTS.utils.io import load_config
CONFIG = load_config('/content/TTS/TTS/tts/configs/config.json')
CONFIG['datasets'][0]['path'] = '../LJSpeech-1.1/' # set the target dataset to the LJSpeech
CONFIG['audio']['stats_path'] = None # do not use mean and variance stats to normalizat spectrograms. Mean and variance stats need to be computed separately.
CONFIG['output_path'] = '../'
with open('config.json', 'w') as fp:
json.dump(CONFIG, fp)

With:

!python /content/TTS/TTS/bin/compute_statistics.py /content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json /content/TTS/scale_stats.npy --data_path /content/LJSpeech-1.1/wavs/

And replace the last cell:

# pull the trigger
!CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_tacotron.py --config_path config.json | tee training.log

With:

!CUDA_VISIBLE_DEVICES="0" python /content/TTS/TTS/bin/train_tacotron.py --config_path /content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json \ --coqpit.output_path ./Results \ --coqpit.datasets.0.path /content/LJSpeech-1.1/ \ --coqpit.audio.stats_path /content/TTS/scale_stats.npy \

@thomasvonl
Copy link

thomasvonl commented Jun 20, 2021

It seems to run if you replace the second to last cell:

# load the default config file and update with the local paths and settings. import json from TTS.utils.io import load_config CONFIG = load_config('/content/TTS/TTS/tts/configs/config.json') CONFIG['datasets'][0]['path'] = '../LJSpeech-1.1/' # set the target dataset to the LJSpeech CONFIG['audio']['stats_path'] = None # do not use mean and variance stats to normalizat spectrograms. Mean and variance stats need to be computed separately. CONFIG['output_path'] = '../' with open('config.json', 'w') as fp: json.dump(CONFIG, fp)

With:

!python /content/TTS/TTS/bin/compute_statistics.py /content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json /content/TTS/scale_stats.npy --data_path /content/LJSpeech-1.1/wavs/

And replace the last cell:

# pull the trigger !CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_tacotron.py --config_path config.json | tee training.log

With:

!CUDA_VISIBLE_DEVICES="0" python /content/TTS/TTS/bin/train_tacotron.py --config_path /content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json \ --coqpit.output_path ./Results \ --coqpit.datasets.0.path /content/LJSpeech-1.1/ \ --coqpit.audio.stats_path /content/TTS/scale_stats.npy \

The first line seems fine, but when I run the second line something goes wrong, I have get the following error message

Using CUDA: True
Number of GPUs: 1
Mixed precision mode is ON
fatal: not a git repository (or any of the parent directories): .git
Git Hash: 0000000
Experiment folder: DEFINE THIS/ljspeech-ddc-June-20-2021_02+22PM-0000000
fatal: not a git repository (or any of the parent directories): .git
Traceback (most recent call last):
File "/content/TTS/TTS/bin/train_tacotron.py", line 737, in
args, config, OUT_PATH, AUDIO_PATH, c_logger, tb_logger = init_training(sys.argv)
File "/content/TTS/TTS/utils/arguments.py", line 182, in init_training
config, OUT_PATH, AUDIO_PATH, c_logger, tb_logger = process_args(args)
File "/content/TTS/TTS/utils/arguments.py", line 168, in process_args
copy_model_files(config, experiment_path, new_fields)
File "/content/TTS/TTS/utils/io.py", line 42, in copy_model_files
copy_stats_path,
File "/usr/lib/python3.7/shutil.py", line 120, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'scale_stats.npy'

But the file exists in the directory, how should I solve this problem?

@Sadam1195
Copy link

Sadam1195 commented Jun 20, 2021

It seems to run if you replace the second to last cell:
# load the default config file and update with the local paths and settings. import json from TTS.utils.io import load_config CONFIG = load_config('/content/TTS/TTS/tts/configs/config.json') CONFIG['datasets'][0]['path'] = '../LJSpeech-1.1/' # set the target dataset to the LJSpeech CONFIG['audio']['stats_path'] = None # do not use mean and variance stats to normalizat spectrograms. Mean and variance stats need to be computed separately. CONFIG['output_path'] = '../' with open('config.json', 'w') as fp: json.dump(CONFIG, fp)
With:
!python /content/TTS/TTS/bin/compute_statistics.py /content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json /content/TTS/scale_stats.npy --data_path /content/LJSpeech-1.1/wavs/
And replace the last cell:
# pull the trigger !CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_tacotron.py --config_path config.json | tee training.log
With:
!CUDA_VISIBLE_DEVICES="0" python /content/TTS/TTS/bin/train_tacotron.py --config_path /content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json \ --coqpit.output_path ./Results \ --coqpit.datasets.0.path /content/LJSpeech-1.1/ \ --coqpit.audio.stats_path /content/TTS/scale_stats.npy \

The first line seems fine, but when I run the second line something goes wrong, I have get the following error message

Using CUDA: True
Number of GPUs: 1
Mixed precision mode is ON
fatal: not a git repository (or any of the parent directories): .git
Git Hash: 0000000
Experiment folder: DEFINE THIS/ljspeech-ddc-June-20-2021_02+22PM-0000000
fatal: not a git repository (or any of the parent directories): .git
Traceback (most recent call last):
File "/content/TTS/TTS/bin/train_tacotron.py", line 737, in
args, config, OUT_PATH, AUDIO_PATH, c_logger, tb_logger = init_training(sys.argv)
File "/content/TTS/TTS/utils/arguments.py", line 182, in init_training
config, OUT_PATH, AUDIO_PATH, c_logger, tb_logger = process_args(args)
File "/content/TTS/TTS/utils/arguments.py", line 168, in process_args
copy_model_files(config, experiment_path, new_fields)
File "/content/TTS/TTS/utils/io.py", line 42, in copy_model_files
copy_stats_path,
File "/usr/lib/python3.7/shutil.py", line 120, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'scale_stats.npy'

But the file exists in the directory, how should I solve this problem?

Replace
!CUDA_VISIBLE_DEVICES="0" python /content/TTS/TTS/bin/train_tacotron.py --config_path /content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json \ --coqpit.output_path ./Results \ --coqpit.datasets.0.path /content/LJSpeech-1.1/ \ --coqpit.audio.stats_path /content/TTS/scale_stats.npy \
with
!CUDA_VISIBLE_DEVICES="0" python /content/TTS/TTS/bin/train_tacotron.py --config_path ./content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json \ --coqpit.output_path ./Results \ --coqpit.datasets.0.path ./content/LJSpeech-1.1/ \ --coqpit.audio.stats_path ./content/TTS/scale_stats.npy \

@gbvssd

@thomasvonl
Copy link

It seems to run if you replace the second to last cell:
# load the default config file and update with the local paths and settings. import json from TTS.utils.io import load_config CONFIG = load_config('/content/TTS/TTS/tts/configs/config.json') CONFIG['datasets'][0]['path'] = '../LJSpeech-1.1/' # set the target dataset to the LJSpeech CONFIG['audio']['stats_path'] = None # do not use mean and variance stats to normalizat spectrograms. Mean and variance stats need to be computed separately. CONFIG['output_path'] = '../' with open('config.json', 'w') as fp: json.dump(CONFIG, fp)
With:
!python /content/TTS/TTS/bin/compute_statistics.py /content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json /content/TTS/scale_stats.npy --data_path /content/LJSpeech-1.1/wavs/
And replace the last cell:
# pull the trigger !CUDA_VISIBLE_DEVICES="0" python TTS/bin/train_tacotron.py --config_path config.json | tee training.log
With:
!CUDA_VISIBLE_DEVICES="0" python /content/TTS/TTS/bin/train_tacotron.py --config_path /content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json \ --coqpit.output_path ./Results \ --coqpit.datasets.0.path /content/LJSpeech-1.1/ \ --coqpit.audio.stats_path /content/TTS/scale_stats.npy \

The first line seems fine, but when I run the second line something goes wrong, I have get the following error message

Using CUDA: True
Number of GPUs: 1
Mixed precision mode is ON
fatal: not a git repository (or any of the parent directories): .git
Git Hash: 0000000
Experiment folder: DEFINE THIS/ljspeech-ddc-June-20-2021_02+22PM-0000000
fatal: not a git repository (or any of the parent directories): .git
Traceback (most recent call last):
File "/content/TTS/TTS/bin/train_tacotron.py", line 737, in
args, config, OUT_PATH, AUDIO_PATH, c_logger, tb_logger = init_training(sys.argv)
File "/content/TTS/TTS/utils/arguments.py", line 182, in init_training
config, OUT_PATH, AUDIO_PATH, c_logger, tb_logger = process_args(args)
File "/content/TTS/TTS/utils/arguments.py", line 168, in process_args
copy_model_files(config, experiment_path, new_fields)
File "/content/TTS/TTS/utils/io.py", line 42, in copy_model_files
copy_stats_path,
File "/usr/lib/python3.7/shutil.py", line 120, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'scale_stats.npy'

But the file exists in the directory, how should I solve this problem?

Replace
!CUDA_VISIBLE_DEVICES="0" python /content/TTS/TTS/bin/train_tacotron.py --config_path /content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json \ --coqpit.output_path ./Results \ --coqpit.datasets.0.path /content/LJSpeech-1.1/ \ --coqpit.audio.stats_path /content/TTS/scale_stats.npy \
with
!CUDA_VISIBLE_DEVICES="0" python /content/TTS/TTS/bin/train_tacotron.py --config_path ./content/TTS/recipes/ljspeech/tacotron2-DDC/tacotron2-DDC.json \ --coqpit.output_path ./Results \ --coqpit.datasets.0.path ./content/LJSpeech-1.1/ \ --coqpit.audio.stats_path ./content/TTS/scale_stats.npy \

@gbvssd

I have tried the above command line but it does not work either. I think it is not the file path problem, but more like some argument parsing problem, that the audio.stats_path do not parsing correctly, because in the code the attribute "config.audio.stats_path" is "scale_stats.npy" not the "/content/TTS/scale_stats.npy ".

@Sadam1195
Copy link

I have tried the above command line but it does not work either. I think it is not the file path problem, but more like some argument parsing problem, that the audio.stats_path do not parsing correctly, because in the code the attribute "config.audio.stats_path" is "scale_stats.npy" not the "/content/TTS/scale_stats.npy ".

Sorrry, I missed that. You can double check if this "/content/TTS/scale_stats.npy exists or not if if doesn't. Don't use config.audio.stats_path aurguments.
@gbvssd

@thomasvonl
Copy link

The file scale_stats.npy exists and the file path is correct. Is the -- coqpit.addio.stats_path argument set the state file path? And I track the error to the "utils/io.py" file and test the value of "config.audio.stats_path", I assume it should be "/content/TTS/scale_stats.npy " but it is "scale_stats.npy".
@Sadam1195

@Sadam1195
Copy link

Sadam1195 commented Jun 22, 2021

The file scale_stats.npy exists and the file path is correct. Is the -- coqpit.addio.stats_path argument set the state file path? And I track the error to the "utils/io.py" file and test the value of "config.audio.stats_path", I assume it should be "/content/TTS/scale_stats.npy " but it is "scale_stats.npy".

then may be you are not using the right version of repo. Let's get in touch on discussion or gitter as this conversation will spam the gist.
@gbvssd

Copy link

ghost commented Sep 23, 2021

Hello guys , I have some errors can anyone guide me to solve this
First Try:
image

Second Try:
image

Third Try :
image

What should I do Now ?

Thanks in Advance

all the community members

@windowshopr
Copy link

Anyone figure this out yet?

@elhamadjidi
Copy link

Anyone figure this out yet?

I encountered the same issue, instead of !git clone https://github.com/coqui-ai/TTS do this !git clone https://github.com/mozilla/TTS although you can make the json file, you still gonna have another problem.

@erogol
Copy link
Author

erogol commented May 30, 2022

Updated the notebook based on the latest version of 🐸TTS

@omkarade
Copy link

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

@Mynuddin-dev
Copy link

I am receiving such an error:

  File "train.py", line 13, in <module>
    from TTS.datasets.TTSDataset import MyDataset
ModuleNotFoundError: No module named 'TTS'

Where I might be doing a mistake?

I am in TTS project directory, LJSpeech is in subfile in that directory

run it again it will be ok

@raphaelmerx
Copy link

training worked fine after replacing the name= argument to BaseDatasetConfig with formatter=:

--- a/tts_example.ipynb
+++ b/tts_example_new.ipynb
@@ -139,7 +139,7 @@
         "\n",
         "# init configs\n",
         "dataset_config = BaseDatasetConfig(\n",
-        "    name=\"ljspeech\", meta_file_train=\"metadata.csv\", path=os.path.join(output_path, \"/content/LJSpeech-1.1\")\n",
+        "    formatter=\"ljspeech\", meta_file_train=\"metadata.csv\", path=os.path.join(output_path, \"/content/LJSpeech-1.1\")\n",
         ")\n",
         "\n",
         "audio_config = BaseAudioConfig(\n",

@raphaelmerx
Copy link

Seems like this notebook is creating train-val splits in metadata_train.csv and metadata_val.csv, but then they're not using in the dataset config. Shouldn't it be

dataset_config = BaseDatasetConfig(
    formatter ="ljspeech", meta_file_train="metadata_train.csv", meta_file_val="metadata_val.csv", path=os.path.join(output_path, "/content/LJSpeech-1.1")
)

@ZaneLeo111
Copy link

When I try to install -e, it gives me an error:

import os
os.chdir('/content/coqui-TTS')
!pip install -e .[all]

I got those error:
Failed to build TTS
ERROR: Could not build wheels for TTS, which is required to install pyproject.toml-based projects

Can anyone help?

@omkarade
Copy link

@ZeLiu1 The error "Failed to build TTS" and "ERROR: Could not build wheels for TTS" during the installation with -e (editable) option indicates missing dependencies or build requirements for the TTS package. Ensure all dependencies are installed, check Python version compatibility, and have necessary build tools. Consider using a virtual environment or building from the source repository to resolve the issue.

@iaclaudioia8
Copy link

Hi all,

I have launched the google colab: TTS_example.ipynb
But when Train Tacotron DCA starts i get the follow issue:


TypeError Traceback (most recent call last)
in <cell line: 18>()
16
17 # init configs
---> 18 dataset_config = BaseDatasetConfig(
19 name="ljspeech", meta_file_train="metadata.csv", path=os.path.join(output_path, "/content/LJSpeech-1.1")
20 )

TypeError: BaseDatasetConfig.init() got an unexpected keyword argument 'name'

@tomsepe
Copy link

tomsepe commented Feb 12, 2024

try installing Visual Studio Build Tools 2022 Microsoft C++ Build Tools and make sure you check the box for the "Desktop development with C++" Install option

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment