@davesarmoury
Created December 22, 2022 15:39
tao spectro_gen finetune -e /specs/spectro_gen/finetune.yaml -g 1 -k tlt_encode -r /results/spectro_gen/finetune -m /data/tts_en_fastpitch_v1.4.0/tts_en_fastpitch_align.nemo train_dataset=/data/GLaDOS/merged_train.json validation_dataset=/data/GLaDOS/manifest_val.json prior_folder=/results/spectro_gen/finetune/prior_folder trainer.max_epochs=200…
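
Aside: the trailing "…" above is where the pasted command was truncated; the complete override list is echoed verbatim in the "Error executing job with overrides: [...]" line near the end of this log. A hedged breakdown of the flags, assuming standard TAO 4.0 conv-AI CLI semantics:

# tao spectro_gen finetune flag sketch (values taken from the command above;
# the flag meanings are assumptions, cross-checked against the override list
# echoed at the end of this log):
#   -e  experiment spec file read inside the container  (/specs/spectro_gen/finetune.yaml)
#   -g  number of GPUs                                  -> echoed below as trainer.gpus=1
#   -k  key used to decrypt/encrypt .nemo/.tlt files    -> echoed as encryption_key
#   -r  results and log directory                       -> echoed as exp_manager.explicit_log_dir
#   -m  pretrained model to restore                     -> echoed as restore_from
# Everything after the flags is a Hydra-style key=value override merged into
# the experiment configuration printed below.
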
2022-12-22 10:36:33,950 [INFO] root: Registry: ['nvcr.io']
2022-12-22 10:36:33,984 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
2022-12-22 10:36:33,993 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/davesarmoury/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
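
Aside: the mounts file this warning refers to maps host directories into the container and carries extra docker options. A minimal sketch of the suggested fix, assuming the /data, /specs and /results mounts used in this run (the host-side source paths are illustrative, not taken from this log):

# Write ~/.tao_mounts.json with the "user" entry the warning asks for.
# The unquoted heredoc delimiter means $(id -u):$(id -g) is expanded by the
# shell when the file is written, baking in the real UID:GID.
cat > ~/.tao_mounts.json <<EOF
{
    "Mounts": [
        {"source": "/home/davesarmoury/tao/data",    "destination": "/data"},
        {"source": "/home/davesarmoury/tao/specs",   "destination": "/specs"},
        {"source": "/home/davesarmoury/tao/results", "destination": "/results"}
    ],
    "DockerOptions": {
        "user": "$(id -u):$(id -g)"
    }
}
EOF
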
[NeMo W 2022-12-22 15:36:37 experimental:27] Module <class 'nemo.collections.tts.torch.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-12-22 15:36:37 experimental:27] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-12-22 15:36:41 experimental:27] Module <class 'nemo.collections.tts.torch.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-12-22 15:36:41 experimental:27] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
[NeMo W 2022-12-22 15:36:42 nemo_logging:349] <frozen conv_ai.tts.spectro_gen.scripts.finetune>:285: UserWarning:
'finetune.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
[NeMo I 2022-12-22 15:36:42 <frozen core:20] Experiment configuration:
    restore_from: /data/tts_en_fastpitch_v1.4.0/tts_en_fastpitch_align.nemo
    exp_manager:
      explicit_log_dir: /results/spectro_gen/finetune
      exp_dir: null
      name: finetuned-model
      version: null
      use_datetime_version: true
      resume_if_exists: true
      resume_past_end: false
      resume_ignore_no_checkpoint: true
      create_tensorboard_logger: true
      summary_writer_kwargs: null
      create_wandb_logger: false
      wandb_logger_kwargs: null
      create_checkpoint_callback: true
      checkpoint_callback_params:
        filepath: null
        dirpath: null
        filename: null
        monitor: v_loss
        verbose: true
        save_last: true
        save_top_k: 3
        save_weights_only: false
        mode: min
        every_n_epochs: 1
        prefix: null
        postfix: .tlt
        save_best_model: false
        always_save_nemo: false
        save_nemo_on_train_end: true
        model_parallel_size: null
      files_to_copy: null
      log_step_timing: true
      step_timing_kwargs:
        reduction: mean
        sync_cuda: false
        buffer_size: 1
      log_local_rank_0_only: false
      log_global_rank_0_only: false
    trainer:
      logger: false
      checkpoint_callback: false
      callbacks: null
      default_root_dir: null
      gradient_clip_val: 0.0
      process_position: 0
      num_nodes: 1
      gpus: 1
      auto_select_gpus: false
      tpu_cores: null
      log_gpu_memory: null
      progress_bar_refresh_rate: 1
      enable_progress_bar: true
      overfit_batches: 0.0
      track_grad_norm: -1
      check_val_every_n_epoch: 1
      fast_dev_run: false
      accumulate_grad_batches: 1
      max_epochs: 200
      min_epochs: 1
      max_steps: null
      min_steps: null
      limit_train_batches: 1.0
      limit_val_batches: 1.0
      limit_test_batches: 1.0
      val_check_interval: 1.0
      flush_logs_every_n_steps: 100
      log_every_n_steps: 50
      accelerator: ddp
      sync_batchnorm: false
      precision: 16
      weights_summary: full
      weights_save_path: null
      num_sanity_val_steps: 2
      resume_from_checkpoint: null
      profiler: null
      benchmark: false
      deterministic: false
      auto_lr_find: false
      replace_sampler_ddp: true
      detect_anomaly: false
      terminate_on_nan: false
      auto_scale_batch_size: false
      prepare_data_per_node: true
      amp_backend: native
      amp_level: null
      plugins: null
      move_metrics_to_cpu: false
      multiple_trainloader_mode: max_size_cycle
      limit_predict_batches: 1.0
      stochastic_weight_avg: false
      gradient_clip_algorithm: norm
      max_time: null
      reload_dataloaders_every_n_epochs: 0
      ipus: null
      devices: null
      strategy: null
      enable_checkpointing: true
      enable_model_summary: true
    train_ds:
      dataset:
        _target_: nemo.collections.tts.torch.data.TTSDataset
        manifest_filepath: ${train_dataset}
        max_duration: null
        min_duration: 0.1
        int_values: false
        normalize: true
        sample_rate: ${sample_rate}
        trim: true
        trim_top_db: 50
        sup_data_path: ${prior_folder}
        sup_data_types: ${sup_data_types}
        n_window_stride: ${n_window_stride}
        n_window_size: ${n_window_size}
        pitch_fmin: ${pitch_fmin}
        pitch_fmax: ${pitch_fmax}
        pitch_mean: ${pitch_avg}
        pitch_std: ${pitch_std}
        pitch_norm: true
        n_fft: 1024
        win_length: 1024
        hop_length: 256
        window: hann
        n_mels: 80
        lowfreq: 0
        highfreq: 8000
        ignore_file: null
        text_normalizer:
          _target_: nemo_text_processing.text_normalization.normalize.Normalizer
          lang: en
          input_case: cased
          whitelist: ${whitelist_path}
        text_normalizer_call_kwargs:
          verbose: false
          punct_pre_process: true
          punct_post_process: true
        text_tokenizer:
          _target_: nemo.collections.tts.torch.tts_tokenizers.EnglishPhonemesTokenizer
          punct: true
          stresses: true
          chars: true
          apostrophe: true
          pad_with_space: true
          g2p:
            _target_: nemo.collections.tts.torch.g2ps.EnglishG2p
            phoneme_dict: ${phoneme_dict_path}
            heteronyms: ${heteronyms_path}
            phoneme_probability: 0.5
      dataloader_params:
        drop_last: false
        shuffle: true
        batch_size: 16
        num_workers: 12
    validation_ds:
      dataset:
        _target_: nemo.collections.tts.torch.data.TTSDataset
        manifest_filepath: ${validation_dataset}
        max_duration: null
        min_duration: 0.1
        int_values: false
        normalize: true
        sample_rate: ${sample_rate}
        trim: true
        sup_data_path: ${prior_folder}
        sup_data_types: ${sup_data_types}
        n_window_stride: ${n_window_stride}
        n_window_size: ${n_window_size}
        pitch_fmin: ${pitch_fmin}
        pitch_fmax: ${pitch_fmax}
        pitch_mean: ${pitch_avg}
        pitch_std: ${pitch_std}
        pitch_norm: true
        n_fft: 1024
        win_length: 1024
        hop_length: 256
        window: hann
        n_mels: 80
        lowfreq: 0
        highfreq: 8000
        ignore_file: null
        text_normalizer:
          _target_: nemo_text_processing.text_normalization.normalize.Normalizer
          lang: en
          input_case: cased
          whitelist: ${whitelist_path}
        text_normalizer_call_kwargs:
          verbose: false
          punct_pre_process: true
          punct_post_process: true
        text_tokenizer:
          _target_: nemo.collections.tts.torch.tts_tokenizers.EnglishPhonemesTokenizer
          punct: true
          stresses: true
          chars: true
          apostrophe: true
          pad_with_space: true
          g2p:
            _target_: nemo.collections.tts.torch.g2ps.EnglishG2p
            phoneme_dict: ${phoneme_dict_path}
            heteronyms: ${heteronyms_path}
            phoneme_probability: 0.5
      dataloader_params:
        drop_last: false
        shuffle: true
        batch_size: 32
        num_workers: 12
    train_dataset: /data/GLaDOS/merged_train.json
    validation_dataset: /data/GLaDOS/manifest_val.json
    phoneme_dict_path: ???
    heteronyms_path: ???
    whitelist_path: ???
    sup_data_types:
    - align_prior_matrix
    - pitch
    - speaker_id
    optim:
      name: adam
      lr: 0.0002
      betas:
      - 0.9
      - 0.98
      weight_decay: 1.0e-06
    encryption_key: '*****'
    tlt_checkpoint_interval: 0
    sample_rate: 22050
    prior_folder: /results/spectro_gen/finetune/prior_folder
    n_speakers: 2
    n_window_size: 1024
    n_window_stride: 256
    pitch_fmin: 80.0
    pitch_fmax: 2048.0
    pitch_avg: 165.458
    pitch_std: 40.1891
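
Aside: train_dataset and validation_dataset above point at NeMo-style manifests, i.e. one JSON object per line describing one utterance. A hedged illustration of a single line; the field values here are hypothetical, not taken from the actual GLaDOS data:

# One line of /data/GLaDOS/merged_train.json might look like (illustrative only):
# {"audio_filepath": "/data/GLaDOS/wavs/clip_0001.wav", "duration": 2.41, "text": "Oh. It's you."}
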
[NeMo W 2022-12-22 15:36:42 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:287: LightningDeprecationWarning: Passing `Trainer(accelerator='ddp')` has been deprecated in v1.5 and will be removed in v1.7. Use `Trainer(strategy='ddp')` instead.
rank_zero_deprecation(
Using 16bit native Automatic Mixed Precision (AMP)
[NeMo W 2022-12-22 15:36:42 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py:52: LightningDeprecationWarning: Setting `max_steps = None` is deprecated in v1.5 and will no longer be supported in v1.7. Use `max_steps = -1` instead.
rank_zero_deprecation(
[NeMo W 2022-12-22 15:36:42 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:151: LightningDeprecationWarning: Setting `Trainer(checkpoint_callback=False)` is deprecated in v1.5 and will be removed in v1.7. Please consider using `Trainer(enable_checkpointing=False)`.
rank_zero_deprecation(
[NeMo W 2022-12-22 15:36:42 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:96: LightningDeprecationWarning: Setting `Trainer(progress_bar_refresh_rate=1)` is deprecated in v1.5 and will be removed in v1.7. Please pass `pytorch_lightning.callbacks.progress.TQDMProgressBar` with `refresh_rate` directly to the Trainer's `callbacks` argument instead. Or, to disable the progress bar pass `enable_progress_bar = False` to the Trainer.
rank_zero_deprecation(
[NeMo W 2022-12-22 15:36:42 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:191: LightningDeprecationWarning: Setting `Trainer(weights_summary=full)` is deprecated in v1.5 and will be removed in v1.7. Please pass `pytorch_lightning.callbacks.model_summary.ModelSummary` with `max_depth` directly to the Trainer's `callbacks` argument instead.
rank_zero_deprecation(
[NeMo W 2022-12-22 15:36:42 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:81: LightningDeprecationWarning: Setting `prepare_data_per_node` with the trainer flag is deprecated in v1.5.0 and will be removed in v1.7.0. Please set `prepare_data_per_node` in `LightningDataModule` and/or `LightningModule` directly instead.
rank_zero_deprecation(
[NeMo W 2022-12-22 15:36:42 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:572: LightningDeprecationWarning: Trainer argument `terminate_on_nan` was deprecated in v1.5 and will be removed in 1.7. Please use `Trainer(detect_anomaly=True)` instead.
rank_zero_deprecation(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[NeMo W 2022-12-22 15:36:42 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:61: LightningDeprecationWarning: Setting `Trainer(flush_logs_every_n_steps=100)` is deprecated in v1.5 and will be removed in v1.7. Please configure flushing in the logger instead.
rank_zero_deprecation(
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(limit_test_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(limit_predict_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
[NeMo W 2022-12-22 15:36:42 exp_manager:422] There was no checkpoint folder at checkpoint_dir :/results/spectro_gen/finetune/checkpoints. Training from scratch.
[NeMo I 2022-12-22 15:36:42 exp_manager:286] Experiments will be logged at /results/spectro_gen/finetune
[NeMo I 2022-12-22 15:36:42 exp_manager:660] TensorboardLogger has been set up
[NeMo W 2022-12-22 15:36:42 nemo_logging:349] /opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:2313: LightningDeprecationWarning: `Trainer.weights_save_path` has been deprecated in v1.6 and will be removed in v1.8.
rank_zero_deprecation("`Trainer.weights_save_path` has been deprecated in v1.6 and will be removed in v1.8.")
Loading pretrained model from /data/tts_en_fastpitch_v1.4.0/tts_en_fastpitch_align.nemo.
Setting number of speakers 2
[NeMo W 2022-12-22 15:36:45 fastpitch:100] AudioToCharWithPriorAndPitchDataset class has been deprecated. No support for training or finetuning. Only inference is supported.
[NeMo W 2022-12-22 15:36:45 fastpitch:100] AudioToCharWithPriorAndPitchDataset class has been deprecated. No support for training or finetuning. Only inference is supported.
[NeMo E 2022-12-22 15:36:45 common:503] Model instantiation failed!
Target class: nemo.collections.tts.models.fastpitch.FastPitchModel
Error(s): src path does not exist or it is not a path in nemo file. src value I got was: scripts/tts_dataset_files/cmudict-0.7b_nv22.08. Absolute: /opt/nvidia/tools/scripts/tts_dataset_files/cmudict-0.7b_nv22.08
Traceback (most recent call last):
File "<frozen eff.core.archive>", line 764, in extract
File "/opt/conda/lib/python3.8/tarfile.py", line 2060, in extract
tarinfo = self.getmember(member)
File "/opt/conda/lib/python3.8/tarfile.py", line 1782, in getmember
raise KeyError("filename %r not found" % name)
KeyError: "filename 'manifest.yaml' not found"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen eff.core.archive>", line 653, in restore_manifest
File "<frozen eff.core.archive>", line 767, in extract
File "/opt/conda/lib/python3.8/tarfile.py", line 2060, in extract
tarinfo = self.getmember(member)
File "/opt/conda/lib/python3.8/tarfile.py", line 1782, in getmember
raise KeyError("filename %r not found" % name)
KeyError: "filename './manifest.yaml' not found"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen core.connectors.save_restore_connector>", line 94, in restore_from
File "<frozen core.cookbooks.nemo_cookbook>", line 404, in restore_from
File "<frozen core.cookbooks.cookbook>", line 203, in validate_archive
File "<frozen eff.core.archive>", line 657, in restore_manifest
TypeError: The indicated file '/data/tts_en_fastpitch_v1.4.0/tts_en_fastpitch_align.nemo' is not an EFF archive
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/NeMo/nemo/core/classes/common.py", line 482, in from_config_dict
instance = imported_cls(cfg=config, trainer=trainer)
File "/opt/NeMo/nemo/collections/tts/models/fastpitch.py", line 105, in __init__
self._setup_tokenizer(tokenizer_conf)
File "/opt/NeMo/nemo/collections/tts/models/fastpitch.py", line 188, in _setup_tokenizer
g2p_kwargs["phoneme_dict"] = self.register_artifact(
File "/opt/NeMo/nemo/core/classes/modelPT.py", line 222, in register_artifact
return self._save_restore_connector.register_artifact(self, config_path, src, verify_src_exists)
File "/opt/NeMo/nemo/core/connectors/save_restore_connector.py", line 378, in register_artifact
raise FileNotFoundError(
FileNotFoundError: src path does not exist or it is not a path in nemo file. src value I got was: scripts/tts_dataset_files/cmudict-0.7b_nv22.08. Absolute: /opt/nvidia/tools/scripts/tts_dataset_files/cmudict-0.7b_nv22.08
Error executing job with overrides: ['exp_manager.explicit_log_dir=/results/spectro_gen/finetune', 'trainer.gpus=1', 'restore_from=/data/tts_en_fastpitch_v1.4.0/tts_en_fastpitch_align.nemo', 'encryption_key=tlt_encode', 'train_dataset=/data/GLaDOS/merged_train.json', 'validation_dataset=/data/GLaDOS/manifest_val.json', 'prior_folder=/results/spectro_gen/finetune/prior_folder', 'trainer.max_epochs=200', 'n_speakers=2', 'pitch_fmin=80.0', 'pitch_fmax=2048.0', 'pitch_avg=165.458', 'pitch_std=40.1891', 'trainer.precision=16']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 252, in run_and_report
assert mdl is not None
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "</opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/conv_ai/tts/spectro_gen/scripts/finetune.py>", line 3, in <module>
File "<frozen conv_ai.tts.spectro_gen.scripts.finetune>", line 285, in <module>
File "/opt/NeMo/nemo/core/config/hydra_runner.py", line 104, in wrapper
_run_hydra(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 377, in _run_hydra
run_and_report(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 294, in run_and_report
raise ex
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 378, in <lambda>
lambda: hydra.run(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 111, in run
_ = ret.return_value
File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "<frozen conv_ai.tts.spectro_gen.scripts.finetune>", line 224, in main
File "/opt/NeMo/nemo/core/classes/modelPT.py", line 311, in restore_from
instance = cls._save_restore_connector.restore_from(
File "<frozen core.connectors.save_restore_connector>", line 94, in restore_from
File "/opt/NeMo/nemo/core/connectors/save_restore_connector.py", line 235, in restore_from
loaded_params = self.load_config_and_state_dict(
File "/opt/NeMo/nemo/core/connectors/save_restore_connector.py", line 158, in load_config_and_state_dict
instance = calling_cls.from_config_dict(config=conf, trainer=trainer)
File "/opt/NeMo/nemo/core/classes/common.py", line 504, in from_config_dict
raise e
File "/opt/NeMo/nemo/core/classes/common.py", line 496, in from_config_dict
instance = cls(cfg=config, trainer=trainer)
File "/opt/NeMo/nemo/collections/tts/models/fastpitch.py", line 105, in __init__
self._setup_tokenizer(tokenizer_conf)
File "/opt/NeMo/nemo/collections/tts/models/fastpitch.py", line 188, in _setup_tokenizer
g2p_kwargs["phoneme_dict"] = self.register_artifact(
File "/opt/NeMo/nemo/core/classes/modelPT.py", line 222, in register_artifact
return self._save_restore_connector.register_artifact(self, config_path, src, verify_src_exists)
File "/opt/NeMo/nemo/core/connectors/save_restore_connector.py", line 378, in register_artifact
raise FileNotFoundError(
FileNotFoundError: src path does not exist or it is not a path in nemo file. src value I got was: scripts/tts_dataset_files/cmudict-0.7b_nv22.08. Absolute: /opt/nvidia/tools/scripts/tts_dataset_files/cmudict-0.7b_nv22.08
[NeMo I 2022-12-22 15:36:46 <frozen core:166] Sending telemetry data.
[NeMo W 2022-12-22 15:36:46 <frozen core:176] Telemetry data couldn't be sent, but the command ran successfully.
[NeMo W 2022-12-22 15:36:46 <frozen core:177] [Error]: <urlopen error [Errno -2] Name or service not known>
[NeMo W 2022-12-22 15:36:46 <frozen core:181] Execution status: FAIL
2022-12-22 10:36:46,862 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
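
Post-mortem note: the restore first tries to open the .nemo as an encrypted EFF archive (hence the "manifest.yaml not found" / "is not an EFF archive" errors above), then falls back to a plain NeMo tar restore, which dies in register_artifact because the checkpoint's tokenizer config references the relative path scripts/tts_dataset_files/cmudict-0.7b_nv22.08 and that file is neither packaged inside the archive nor present at /opt/nvidia/tools/ inside the container. Note also that phoneme_dict_path, heteronyms_path and whitelist_path are still ??? in the config dump above. Two candidate remedies, both assumptions not confirmed by this log: supply the NeMo TTS dataset files at a container-visible path and fill in the three mandatory overrides, or use a FastPitch .nemo that packages the cmudict artifact inside the archive. A sketch of the first option; the URLs, file names and /data layout are assumptions based on the NeMo repository layout of that era, so check the branch matching your container:

# Fetch NeMo's TTS dataset files into a mounted directory (file names/URLs are
# assumptions; verify against the NeMo branch that matches the container)...
NEMO_RAW=https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/tts_dataset_files
wget -P /data/tts_dataset_files $NEMO_RAW/cmudict-0.7b_nv22.08
wget -P /data/tts_dataset_files $NEMO_RAW/heteronyms-052722
wget -P /data/tts_dataset_files $NEMO_RAW/lj_speech.tsv

# ...then re-run the finetune (same flags and overrides as above, elided here)
# with the three mandatory paths filled in:
tao spectro_gen finetune ... \
  phoneme_dict_path=/data/tts_dataset_files/cmudict-0.7b_nv22.08 \
  heteronyms_path=/data/tts_dataset_files/heteronyms-052722 \
  whitelist_path=/data/tts_dataset_files/lj_speech.tsv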