## eddie/tts setup
## Get an interactive session on eddie with GPU
qlogin -l h_rt=48:00:00 -pe gpu 1 -l h_vmem=32G
## on eddie/interactive make sure you set CUDA_VISIBLE_DEVICES
source /exports/applications/support/set_cuda_visible_devices.sh
echo $CUDA_VISIBLE_DEVICES
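## (optional) sanity check - assuming nvidia-smi is available on the GPU node, this lists the node's GPUs;
## the device index echoed above should correspond to one of them
nvidia-smi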
## load anaconda
module load anaconda
conda create -n tts11 python=3.8
source activate tts11
## To install nvidia apex you need a version of gcc/g++ > 5.0, and it has to be installed before
## you install pytorch and the rest.
## The current anaconda default (June 2021) is 9.3, which seems to work (so far!)
conda install gcc_linux-64 gxx_linux-64
## make these the default C and C++ compilers (you could also set the CC and CXX environment variables)
## but aliasing seems to work well enough
alias gcc=x86_64-conda_cos6-linux-gnu-cc
alias g++=x86_64-conda_cos6-linux-gnu-c++
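## alternatively, a sketch of the CC/CXX environment variable route mentioned above (I used the aliases instead,
## so these are left commented out):
#export CC=x86_64-conda_cos6-linux-gnu-cc
#export CXX=x86_64-conda_cos6-linux-gnu-c++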
## Based on some github comments, I tried cuda 11.0.2. It may work with version 10, but I'm not sure.
## The issue is that you also need nvcc. The easiest way to make it available on eddie
## is to load the cuda/11.0.2 module, but some actual cuda libraries seem to be missing there,
## so we still install cudatoolkit=11.0
module load cuda/11.0.2
## Install pytorch after updating gcc/g++ (as noted above, the order matters)
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
## Check cuda is available through pytorch - should write "True"
python -c 'import torch; print("cuda available?:", torch.cuda.is_available())'
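## (optional) also print the cuda version pytorch was built against - it should roughly match the
## cuda module / cudatoolkit version above
python -c 'import torch; print("torch built with cuda:", torch.version.cuda)'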
## Now get the apex code from github
git clone https://github.com/NVIDIA/apex
cd apex
## install apex using pip. If you don't get all your gcc/g++, pytorch, cuda versions sorted out earlier, this is where
## you'll start getting errors.
## I sent stdout and stderr to tmp.txt here to debug but you don't need to do this!
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ > tmp.txt 2>&1
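## (optional) rough check that apex imports cleanly after the build - note this only confirms the python
## package; if the fused C++/CUDA extensions didn't actually build, the later scripts will complain
python -c 'import apex; from apex import amp; print("apex ok")'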
## back up a level
cd ../
## install waveglow (get apex sorted out first!)
## This assumes you have the tts11 conda environment activated (as above)
git clone https://github.com/NVIDIA/waveglow.git
cd waveglow
## install some dependencies
conda install matplotlib tensorflow inflect scipy Unidecode pillow
pip install librosa
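## (optional, just a sketch) quick check that the main dependencies import in this environment
python -c 'import matplotlib, scipy, librosa, inflect; print("deps ok")'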
## get a pretrained model (from the fastpitch scripts) - I actually ended up using a different one to test,
## hence this is commented out.
#: ${MODEL_DIR:="pretrained_models/waveglow"}
#MODEL="nvidia_waveglow256pyt_fp16.pt"
#MODEL_ZIP="waveglow_ckpt_amp_256_20.01.0.zip"
#MODEL_URL="https://api.ngc.nvidia.com/v2/models/nvidia/waveglow_ckpt_amp_256/versions/20.01.0/zip"
#mkdir -p "$MODEL_DIR"
#wget --content-disposition -qO ${MODEL_DIR}/${MODEL_ZIP} ${MODEL_URL}
#unzip -qo ${MODEL_DIR}/${MODEL_ZIP} -d ${MODEL_DIR}
mkdir -p pretrained_models/waveglow
cd pretrained_models/waveglow/
## Actually I just downloaded the pretrained model and mel_spectrograms listed on the waveglow github (google drive)
## and rsynced them to eddie
## back up to the waveglow directory before running inference
cd ../../
## now try out inference (generating speech from mels) from the waveglow directory
mkdir output
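## (optional) quick check that one of the rsynced mel files loads (it should be a mel tensor) - the glob
## just picks whichever .pt file happens to sort first
python -c "import torch, glob; m = torch.load(sorted(glob.glob('mel_spectrograms/*.pt'))[0]); print(m.shape)"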
python inference.py -f <(ls mel_spectrograms/*.pt) -w pretrained_models/waveglow/waveglow_256channels_universal_v5.pt -o ./output/ --is_fp16 -s 0.6
## seems to work! but haven't tried training...
## Again this assumes you've installed apex and have the relevant conda environment active from the gist above
## Download flowtron
cd ../
git clone https://github.com/NVIDIA/flowtron.git
cd flowtron
## Note the tacotron2 submodule includes waveglow, so we don't actually use the waveglow install in part 2 above
## (but you could direct the python scripts to use that instead if you wanted).
git submodule update --init; cd tacotron2; git submodule update --init
## After some errors I tried adding these based on some github comments, though I'm not sure they did anything
conda install ffmpeg
conda install -c conda-forge sox
## The thing that actually worked to generate audio was to use the pretrained waveglow v3 model
## (tried v4 and v5 and got silence) - similar comments on github issues
cd models/
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/waveglow_ljs_256channels/versions/3/zip -O waveglow_ljs_256channels_3.zip
unzip waveglow_ljs_256channels_3.zip
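## (optional) check what the zip actually unpacked to - the checkpoint filename needs to match the -w
## argument used for inference below
ls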
cd ../
## Now try to generate some speech:
## You'll need to acquire the flowtron_ljs.pt model checkpoint as well as the waveglow...pt model checkpoint,
## these are linked to google drive on the flowtron github.
python inference.py -c config.json -f models/flowtron_ljs.pt -w models/waveglow_256channels_ljs_v3.pt -t "It is well known that deep generative models work sometimes" -i 0
## generated wav file, spectrogram and alignments should be in ./results
## Now try out the inference notebook example

I ran the inference style transfer notebook in ipython (interactive, but not a notebook) and it worked!

If you just copy the code out of the notebook you won't see the outputs, since they are meant to be displayed inline in the notebook.
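If you'd rather not paste cells by hand, one option (a sketch, assuming jupyter/nbconvert is installed in the environment, and that the style transfer notebook is named inference_style_transfer.ipynb - check the actual filename in the flowtron repo) is to convert the notebook to a plain script and run it in ipython:

jupyter nbconvert --to script inference_style_transfer.ipynb
ipython -i inference_style_transfer.py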

Use plt.savefig to write plots to a file: plt.savefig("results/inf-test.png")

Use scipy.io.wavfile.write to write out the audio:

## waveglow, denoiser, mel_posterior, mel_baseline and data_config all come from earlier cells in the notebook
from scipy.io.wavfile import write

with torch.no_grad():
    audio = denoiser(waveglow.infer(mel_posterior, sigma=0.75), 0.01)
audio = audio.cpu().numpy()

## filename, sampling rate, audio tensor
write("results/inf-test.wav", data_config['sampling_rate'], audio)

with torch.no_grad():
    audio_base = denoiser(waveglow.infer(mel_baseline, sigma=0.75), 0.01)
audio_base = audio_base.cpu().numpy()
write("results/inf-test-base.wav", 22050, audio_base)