Skip to content

Instantly share code, notes, and snippets.

@Jaid
Last active August 5, 2023 18:03
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Jaid/cad22e20d9de80cdb1a0a676cf4f722f to your computer and use it in GitHub Desktop.
Save Jaid/cad22e20d9de80cdb1a0a676cf4f722f to your computer and use it in GitHub Desktop.
Coqui TTS command line help

Retrieved with:

MSYS_NO_PATHCONV=1 docker run --rm --entrypoint /bin/bash ghcr.io/coqui-ai/tts -o errexit -o xtrace -c 'tts --help; python3 /root/TTS/server/server.py --help; python3 /root/TTS/bin/train_tts.py --help'
usage: server.py [-h] [--list_models [LIST_MODELS]] [--model_name MODEL_NAME] [--vocoder_name VOCODER_NAME] [--config_path CONFIG_PATH] [--model_path MODEL_PATH] [--vocoder_path VOCODER_PATH]
[--vocoder_config_path VOCODER_CONFIG_PATH] [--speakers_file_path SPEAKERS_FILE_PATH] [--port PORT] [--use_cuda USE_CUDA] [--debug DEBUG] [--show_details SHOW_DETAILS]
options:
-h, --help show this help message and exit
--list_models [LIST_MODELS]
list available pre-trained tts and vocoder models.
--model_name MODEL_NAME
Name of one of the pre-trained tts models in format <language>/<dataset>/<model_name>
--vocoder_name VOCODER_NAME
name of one of the released vocoder models.
--config_path CONFIG_PATH
Path to model config file.
--model_path MODEL_PATH
Path to model file.
--vocoder_path VOCODER_PATH
Path to vocoder model file. If it is not defined, model uses GL as vocoder. Please make sure that you installed vocoder library before (WaveRNN).
--vocoder_config_path VOCODER_CONFIG_PATH
Path to vocoder model config file.
--speakers_file_path SPEAKERS_FILE_PATH
JSON file for multi-speaker model.
--port PORT port to listen on.
--use_cuda USE_CUDA true to use CUDA.
--debug DEBUG true to enable Flask debug mode.
--show_details SHOW_DETAILS
Generate model detail page.
usage: train_tts.py [-h] [--continue_path CONTINUE_PATH] [--restore_path RESTORE_PATH] [--best_path BEST_PATH] [--use_ddp true/false] [--use_accelerate true/false] [--grad_accum_steps GRAD_ACCUM_STEPS]
[--overfit_batch true/false] [--skip_train_epoch true/false] [--start_with_eval true/false] [--small_run SMALL_RUN] [--gpu GPU] [--rank RANK] [--group_id GROUP_ID] [--config_path CONFIG_PATH]
options:
-h, --help show this help message and exit
--continue_path CONTINUE_PATH
Coqpit Field: Path to a training folder to continue training. Restore the model from the last checkpoint and continue training under the same folder.
--restore_path RESTORE_PATH
Coqpit Field: Path to a model checkpoit. Restore the model with the given checkpoint and start a new training.
--best_path BEST_PATH
Coqpit Field: Best model file to be used for extracting the best loss. If not specified, the latest best model in continue path is used
--use_ddp true/false Coqpit Field: Use DDP in distributed training. It is to set in `distribute.py`. Do not set manually.
--use_accelerate true/false
Coqpit Field: Use HF Accelerate as the back end for training.
--grad_accum_steps GRAD_ACCUM_STEPS
Coqpit Field: Number of gradient accumulation steps. It is used to accumulate gradients over multiple batches.
--overfit_batch true/false
Coqpit Field: Overfit a single batch for debugging.
--skip_train_epoch true/false
Coqpit Field: Skip training and only run evaluation and test.
--start_with_eval true/false
Coqpit Field: Start with evaluation and test.
--small_run SMALL_RUN
Coqpit Field: Only use a subset of the samples for debugging. Set the number of samples to use. Defaults to None.
--gpu GPU Coqpit Field: GPU ID to use if ```CUDA_VISIBLE_DEVICES``` is not set. Defaults to None.
--rank RANK Coqpit Field: Process rank in a distributed training. Don't set manually.
--group_id GROUP_ID Coqpit Field: Process group id in a distributed training. Don't set manually.
--config_path CONFIG_PATH
Coqpit Field: Path to the config file.
2023-08-05 15:35:03 usage: tts [-h] [--list_models [LIST_MODELS]]
2023-08-05 15:35:03 [--model_info_by_idx MODEL_INFO_BY_IDX]
2023-08-05 15:35:03 [--model_info_by_name MODEL_INFO_BY_NAME] [--text TEXT]
2023-08-05 15:35:03 [--model_name MODEL_NAME] [--vocoder_name VOCODER_NAME]
2023-08-05 15:35:03 [--config_path CONFIG_PATH] [--model_path MODEL_PATH]
2023-08-05 15:35:03 [--out_path OUT_PATH] [--use_cuda USE_CUDA]
2023-08-05 15:35:03 [--vocoder_path VOCODER_PATH]
2023-08-05 15:35:03 [--vocoder_config_path VOCODER_CONFIG_PATH]
2023-08-05 15:35:03 [--encoder_path ENCODER_PATH]
2023-08-05 15:35:03 [--encoder_config_path ENCODER_CONFIG_PATH] [--emotion EMOTION]
2023-08-05 15:35:03 [--speakers_file_path SPEAKERS_FILE_PATH]
2023-08-05 15:35:03 [--language_ids_file_path LANGUAGE_IDS_FILE_PATH]
2023-08-05 15:35:03 [--speaker_idx SPEAKER_IDX] [--language_idx LANGUAGE_IDX]
2023-08-05 15:35:03 [--speaker_wav SPEAKER_WAV [SPEAKER_WAV ...]]
2023-08-05 15:35:03 [--gst_style GST_STYLE]
2023-08-05 15:35:03 [--capacitron_style_wav CAPACITRON_STYLE_WAV]
2023-08-05 15:35:03 [--capacitron_style_text CAPACITRON_STYLE_TEXT]
2023-08-05 15:35:03 [--list_speaker_idxs [LIST_SPEAKER_IDXS]]
2023-08-05 15:35:03 [--list_language_idxs [LIST_LANGUAGE_IDXS]]
2023-08-05 15:35:03 [--save_spectogram SAVE_SPECTOGRAM] [--reference_wav REFERENCE_WAV]
2023-08-05 15:35:03 [--reference_speaker_idx REFERENCE_SPEAKER_IDX]
2023-08-05 15:35:03 [--progress_bar PROGRESS_BAR] [--source_wav SOURCE_WAV]
2023-08-05 15:35:03 [--target_wav TARGET_WAV] [--voice_dir VOICE_DIR]
2023-08-05 15:35:03
2023-08-05 15:35:03 Synthesize speech on command line.
2023-08-05 15:35:03
2023-08-05 15:35:03 You can either use your trained model or choose a model from the provided list.
2023-08-05 15:35:03
2023-08-05 15:35:03 If you don't specify any models, then it uses LJSpeech based English model.
2023-08-05 15:35:03
2023-08-05 15:35:03 ## Example Runs
2023-08-05 15:35:03
2023-08-05 15:35:03 ### Single Speaker Models
2023-08-05 15:35:03
2023-08-05 15:35:03 - List provided models:
2023-08-05 15:35:03
2023-08-05 15:35:03 $ tts --list_models
2023-08-05 15:35:03
2023-08-05 15:35:03 - Query info for model info by idx:
2023-08-05 15:35:03
2023-08-05 15:35:03 $ tts --model_info_by_idx "<model_type>/<model_query_idx>"
2023-08-05 15:35:03
2023-08-05 15:35:03 - Query info for model info by full name:
2023-08-05 15:35:03
2023-08-05 15:35:03 $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
2023-08-05 15:35:03
2023-08-05 15:35:03 - Run TTS with default models:
2023-08-05 15:35:03
2023-08-05 15:35:03 $ tts --text "Text for TTS"
2023-08-05 15:35:03
2023-08-05 15:35:03 - Run a TTS model with its default vocoder model:
2023-08-05 15:35:03
2023-08-05 15:35:03 $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>
2023-08-05 15:35:03
2023-08-05 15:35:03 - Run with specific TTS and vocoder models from the list:
2023-08-05 15:35:03
2023-08-05 15:35:03 $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" --output_path
2023-08-05 15:35:03
2023-08-05 15:35:03 - Run your own TTS model (Using Griffin-Lim Vocoder):
2023-08-05 15:35:03
2023-08-05 15:35:03 $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
2023-08-05 15:35:03
2023-08-05 15:35:03 - Run your own TTS and Vocoder models:
2023-08-05 15:35:03 $ tts --text "Text for TTS" --model_path path/to/config.json --config_path path/to/model.pth --out_path output/path/speech.wav
2023-08-05 15:35:03 --vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json
2023-08-05 15:35:03
2023-08-05 15:35:03 ### Multi-speaker Models
2023-08-05 15:35:03
2023-08-05 15:35:03 - List the available speakers and choose as <speaker_id> among them:
2023-08-05 15:35:03
2023-08-05 15:35:03 $ tts --model_name "<language>/<dataset>/<model_name>" --list_speaker_idxs
2023-08-05 15:35:03
2023-08-05 15:35:03 - Run the multi-speaker TTS model with the target speaker ID:
2023-08-05 15:35:03
2023-08-05 15:35:03 $ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --speaker_idx <speaker_id>
2023-08-05 15:35:03
2023-08-05 15:35:03 - Run your own multi-speaker TTS model:
2023-08-05 15:35:03
2023-08-05 15:35:03 $ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/config.json --config_path path/to/model.pth --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
2023-08-05 15:35:03
2023-08-05 15:35:03 ### Voice Conversion Models
2023-08-05 15:35:03
2023-08-05 15:35:03 $ tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --source_wav <path/to/speaker/wav> --target_wav <path/to/reference/wav>
2023-08-05 15:35:03
2023-08-05 15:35:03
2023-08-05 15:35:03 options:
2023-08-05 15:35:03 -h, --help show this help message and exit
2023-08-05 15:35:03 --list_models [LIST_MODELS]
2023-08-05 15:35:03 list available pre-trained TTS and vocoder models.
2023-08-05 15:35:03 --model_info_by_idx MODEL_INFO_BY_IDX
2023-08-05 15:35:03 model info using query format: <model_type>/<model_query_idx>
2023-08-05 15:35:03 --model_info_by_name MODEL_INFO_BY_NAME
2023-08-05 15:35:03 model info using query format: <model_type>/<language>/<dataset>/<model_name>
2023-08-05 15:35:03 --text TEXT Text to generate speech.
2023-08-05 15:35:03 --model_name MODEL_NAME
2023-08-05 15:35:03 Name of one of the pre-trained TTS models in format <language>/<dataset>/<model_name>
2023-08-05 15:35:03 --vocoder_name VOCODER_NAME
2023-08-05 15:35:03 Name of one of the pre-trained vocoder models in format <language>/<dataset>/<model_name>
2023-08-05 15:35:03 --config_path CONFIG_PATH
2023-08-05 15:35:03 Path to model config file.
2023-08-05 15:35:03 --model_path MODEL_PATH
2023-08-05 15:35:03 Path to model file.
2023-08-05 15:35:03 --out_path OUT_PATH Output wav file path.
2023-08-05 15:35:03 --use_cuda USE_CUDA Run model on CUDA.
2023-08-05 15:35:03 --vocoder_path VOCODER_PATH
2023-08-05 15:35:03 Path to vocoder model file. If it is not defined, model uses GL as vocoder. Please make sure that you installed vocoder library before (WaveRNN).
2023-08-05 15:35:03 --vocoder_config_path VOCODER_CONFIG_PATH
2023-08-05 15:35:03 Path to vocoder model config file.
2023-08-05 15:35:03 --encoder_path ENCODER_PATH
2023-08-05 15:35:03 Path to speaker encoder model file.
2023-08-05 15:35:03 --encoder_config_path ENCODER_CONFIG_PATH
2023-08-05 15:35:03 Path to speaker encoder config file.
2023-08-05 15:35:03 --emotion EMOTION Emotion to condition the model with. Only available for 🐸Coqui Studio models.
2023-08-05 15:35:03 --speakers_file_path SPEAKERS_FILE_PATH
2023-08-05 15:35:03 JSON file for multi-speaker model.
2023-08-05 15:35:03 --language_ids_file_path LANGUAGE_IDS_FILE_PATH
2023-08-05 15:35:03 JSON file for multi-lingual model.
2023-08-05 15:35:03 --speaker_idx SPEAKER_IDX
2023-08-05 15:35:03 Target speaker ID for a multi-speaker TTS model.
2023-08-05 15:35:03 --language_idx LANGUAGE_IDX
2023-08-05 15:35:03 Target language ID for a multi-lingual TTS model.
2023-08-05 15:35:03 --speaker_wav SPEAKER_WAV [SPEAKER_WAV ...]
2023-08-05 15:35:03 wav file(s) to condition a multi-speaker TTS model with a Speaker Encoder. You can give multiple file paths. The d_vectors is computed as their average.
2023-08-05 15:35:03 --gst_style GST_STYLE
2023-08-05 15:35:03 Wav path file for GST style reference.
2023-08-05 15:35:03 --capacitron_style_wav CAPACITRON_STYLE_WAV
2023-08-05 15:35:03 Wav path file for Capacitron prosody reference.
2023-08-05 15:35:03 --capacitron_style_text CAPACITRON_STYLE_TEXT
2023-08-05 15:35:03 Transcription of the reference.
2023-08-05 15:35:03 --list_speaker_idxs [LIST_SPEAKER_IDXS]
2023-08-05 15:35:03 List available speaker ids for the defined multi-speaker model.
2023-08-05 15:35:03 --list_language_idxs [LIST_LANGUAGE_IDXS]
2023-08-05 15:35:03 List available language ids for the defined multi-lingual model.
2023-08-05 15:35:03 --save_spectogram SAVE_SPECTOGRAM
2023-08-05 15:35:03 If true save raw spectogram for further (vocoder) processing in out_path.
2023-08-05 15:35:03 --reference_wav REFERENCE_WAV
2023-08-05 15:35:03 Reference wav file to convert in the voice of the speaker_idx or speaker_wav
2023-08-05 15:35:03 --reference_speaker_idx REFERENCE_SPEAKER_IDX
2023-08-05 15:35:03 speaker ID of the reference_wav speaker (If not provided the embedding will be computed using the Speaker Encoder).
2023-08-05 15:35:03 --progress_bar PROGRESS_BAR
2023-08-05 15:35:03 If true shows a progress bar for the model download. Defaults to True
2023-08-05 15:35:03 --source_wav SOURCE_WAV
2023-08-05 15:35:03 Original audio file to convert in the voice of the target_wav
2023-08-05 15:35:03 --target_wav TARGET_WAV
2023-08-05 15:35:03 Target audio file to convert in the voice of the source_wav
2023-08-05 15:35:03 --voice_dir VOICE_DIR
2023-08-05 15:35:03 Voice dir for tortoise model
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment