@treksis
Forked from ouor/diff-svc-training.ipynb
Created February 28, 2023 04:18
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sun Jan 29 13:06:17 2023 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 470.129.06 Driver Version: 470.129.06 CUDA Version: 11.4 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 42C P0 25W / 70W | 0MiB / 15109MiB | 5% Default |\n",
"| | | N/A |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
],
"source": [
"! nvidia-smi"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ★Set variables"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"speaker_name = 'test'  # enter the speaker name here\n",
"# enter a Google Drive URL for the dataset; it must include \"uc\"\n",
"dataset_url = 'https://drive.google.com/u/0/uc?id=1sya8W09n1EauPvVVLe0xuCcME9GqkPBM'\n",
"ngrok_token = '1q2w3e4r5t6y7u8i9o0p'  # enter your ngrok token here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Clone the repository, install the requirements, and download the initial models"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cloning into 'diff-svc'...\n",
"remote: Enumerating objects: 741, done.\u001b[K\n",
"remote: Counting objects: 100% (222/222), done.\u001b[K\n",
"remote: Compressing objects: 100% (59/59), done.\u001b[K\n",
"remote: Total 741 (delta 184), reused 163 (delta 163), pack-reused 519\u001b[K\n",
"Receiving objects: 100% (741/741), 62.11 MiB | 19.87 MiB/s, done.\n",
"Resolving deltas: 100% (346/346), done.\n",
"Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease [1,581 B]\n",
"Ign:2 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64 InRelease\n",
"Hit:3 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64 Release\n",
"Hit:4 http://archive.ubuntu.com/ubuntu focal InRelease\n",
"Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]\n",
"Get:6 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]\n",
"Err:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease\n",
" The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC\n",
"Get:8 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]\n",
"Get:9 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1,290 kB]\n",
"Get:10 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1,882 kB]\n",
"Get:11 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2,920 kB]\n",
"Get:12 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [2,009 kB]\n",
"Get:13 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2,442 kB]\n",
"Get:14 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [988 kB]\n",
"Reading package lists... Done\n",
"\u001b[1;33mW: \u001b[0mGPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC\u001b[0m\n",
"\u001b[1;31mE: \u001b[0mThe repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease' is no longer signed.\u001b[0m\n",
"\u001b[33mN: \u001b[0mUpdating from such a repository can't be done securely, and is therefore disabled by default.\u001b[0m\n",
"\u001b[33mN: \u001b[0mSee apt-secure(8) manpage for repository creation and user configuration details.\u001b[0m\n",
"Reading package lists... Done\n",
"Building dependency tree\n",
"Reading state information... Done\n",
"zip is already the newest version (3.0-11build1).\n",
"build-essential is already the newest version (12.8ubuntu1.1).\n",
"unzip is already the newest version (6.0-25ubuntu1.1).\n",
"ffmpeg is already the newest version (7:4.2.7-0ubuntu0.1).\n",
"libpython3.9-dev is already the newest version (3.9.5-3ubuntu0~20.04.1).\n",
"python3.9-dev is already the newest version (3.9.5-3ubuntu0~20.04.1).\n",
"0 upgraded, 0 newly installed, 0 to remove and 177 not upgraded.\n",
"Requirement already satisfied: gdown in /usr/local/lib/python3.8/dist-packages (4.6.0)\n",
"Requirement already satisfied: tensorflow in /usr/local/lib/python3.8/dist-packages (2.8.0)\n",
"Requirement already satisfied: pyyaml in /usr/local/lib/python3.8/dist-packages (5.4.1)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.8/dist-packages (from gdown) (3.9.0)\n",
"Requirement already satisfied: six in /usr/lib/python3/dist-packages (from gdown) (1.14.0)\n",
"Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.8/dist-packages (from gdown) (4.11.1)\n",
"Requirement already satisfied: requests[socks] in /usr/lib/python3/dist-packages (from gdown) (2.22.0)\n",
"Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from gdown) (4.64.1)\n",
"Requirement already satisfied: protobuf>=3.9.2 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (3.19.4)\n",
"Requirement already satisfied: keras-preprocessing>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (1.1.2)\n",
"Requirement already satisfied: flatbuffers>=1.12 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (2.0)\n",
"Requirement already satisfied: keras<2.9,>=2.8.0rc0 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (2.8.0)\n",
"Requirement already satisfied: tf-estimator-nightly==2.8.0.dev2021122109 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (2.8.0.dev2021122109)\n",
"Requirement already satisfied: wrapt>=1.11.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (1.13.3)\n",
"Requirement already satisfied: grpcio<2.0,>=1.24.3 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (1.43.0)\n",
"Requirement already satisfied: numpy>=1.20 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (1.22.2)\n",
"Requirement already satisfied: typing-extensions>=3.6.6 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (4.1.1)\n",
"Requirement already satisfied: google-pasta>=0.1.1 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (0.2.0)\n",
"Requirement already satisfied: gast>=0.2.1 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (0.5.3)\n",
"Requirement already satisfied: tensorboard<2.9,>=2.8 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (2.8.0)\n",
"Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (1.1.0)\n",
"Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from tensorflow) (45.2.0)\n",
"Requirement already satisfied: h5py>=2.9.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (3.6.0)\n",
"Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (3.3.0)\n",
"Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (0.24.0)\n",
"Requirement already satisfied: libclang>=9.0.1 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (13.0.0)\n",
"Requirement already satisfied: absl-py>=0.4.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (1.0.0)\n",
"Requirement already satisfied: astunparse>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow) (1.6.3)\n",
"Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/lib/python3/dist-packages (from astunparse>=1.6.0->tensorflow) (0.34.2)\n",
"Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.8/dist-packages (from tensorboard<2.9,>=2.8->tensorflow) (2.0.3)\n",
"Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from tensorboard<2.9,>=2.8->tensorflow) (1.8.1)\n",
"Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from tensorboard<2.9,>=2.8->tensorflow) (0.6.1)\n",
"Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.8/dist-packages (from tensorboard<2.9,>=2.8->tensorflow) (0.4.6)\n",
"Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.8/dist-packages (from tensorboard<2.9,>=2.8->tensorflow) (2.6.0)\n",
"Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.8/dist-packages (from tensorboard<2.9,>=2.8->tensorflow) (3.3.6)\n",
"Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.8/dist-packages (from beautifulsoup4->gdown) (2.3.2.post1)\n",
"Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /usr/local/lib/python3.8/dist-packages (from requests[socks]->gdown) (1.7.1)\n",
"Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.8/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.9,>=2.8->tensorflow) (5.0.0)\n",
"Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.8/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.9,>=2.8->tensorflow) (0.2.8)\n",
"Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.8/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.9,>=2.8->tensorflow) (4.8)\n",
"Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.8/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.9,>=2.8->tensorflow) (1.3.1)\n",
"Requirement already satisfied: importlib-metadata>=4.4 in /usr/local/lib/python3.8/dist-packages (from markdown>=2.6.8->tensorboard<2.9,>=2.8->tensorflow) (4.11.1)\n",
"Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.8/dist-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.9,>=2.8->tensorflow) (3.7.0)\n",
"Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.8/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.9,>=2.8->tensorflow) (0.4.8)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.9,>=2.8->tensorflow) (3.2.0)\n",
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mWARNING: You are using pip version 22.0.3; however, version 22.3.1 is available.\n",
"You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n",
"\u001b[0mRequirement already satisfied: torchcrepe in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 1)) (0.0.17)\n",
"Requirement already satisfied: praat-parselmouth==0.4.1 in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 2)) (0.4.1)\n",
"Requirement already satisfied: scikit-image in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 3)) (0.19.3)\n",
"Requirement already satisfied: ipython in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 4)) (8.0.1)\n",
"Requirement already satisfied: ipykernel in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 5)) (6.9.1)\n",
"Requirement already satisfied: pyloudnorm in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 6)) (0.1.1)\n",
"Requirement already satisfied: webrtcvad in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 7)) (2.0.10)\n",
"Requirement already satisfied: h5py in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 8)) (3.6.0)\n",
"Requirement already satisfied: einops in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 9)) (0.6.0)\n",
"Requirement already satisfied: pycwt in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 10)) (0.3.0a22)\n",
"Requirement already satisfied: torchmetrics==0.5 in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 11)) (0.5.0)\n",
"Requirement already satisfied: pytorch_lightning==1.3.3 in /usr/local/lib/python3.8/dist-packages (from -r requirements_short.txt (line 12)) (1.3.3)\n",
"Requirement already satisfied: numpy>=1.7.0 in /usr/local/lib/python3.8/dist-packages (from praat-parselmouth==0.4.1->-r requirements_short.txt (line 2)) (1.22.2)\n",
"Requirement already satisfied: torch>=1.3.1 in /usr/local/lib/python3.8/dist-packages (from torchmetrics==0.5->-r requirements_short.txt (line 11)) (1.10.2+cu113)\n",
"Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from torchmetrics==0.5->-r requirements_short.txt (line 11)) (21.3)\n",
"Requirement already satisfied: tqdm>=4.41.0 in /usr/local/lib/python3.8/dist-packages (from pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (4.64.1)\n",
"Requirement already satisfied: tensorboard!=2.5.0,>=2.2.0 in /usr/local/lib/python3.8/dist-packages (from pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (2.8.0)\n",
"Requirement already satisfied: fsspec[http]>=2021.4.0 in /usr/local/lib/python3.8/dist-packages (from pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (2023.1.0)\n",
"Requirement already satisfied: future>=0.17.1 in /usr/local/lib/python3.8/dist-packages (from pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (0.18.3)\n",
"Requirement already satisfied: PyYAML<=5.4.1,>=5.1 in /usr/local/lib/python3.8/dist-packages (from pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (5.4.1)\n",
"Requirement already satisfied: pyDeprecate==0.3.0 in /usr/local/lib/python3.8/dist-packages (from pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (0.3.0)\n",
"Requirement already satisfied: scipy in /usr/local/lib/python3.8/dist-packages (from torchcrepe->-r requirements_short.txt (line 1)) (1.8.0)\n",
"Requirement already satisfied: librosa==0.9.1 in /usr/local/lib/python3.8/dist-packages (from torchcrepe->-r requirements_short.txt (line 1)) (0.9.1)\n",
"Requirement already satisfied: resampy in /usr/local/lib/python3.8/dist-packages (from torchcrepe->-r requirements_short.txt (line 1)) (0.4.2)\n",
"Requirement already satisfied: audioread>=2.1.5 in /usr/local/lib/python3.8/dist-packages (from librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (3.0.0)\n",
"Requirement already satisfied: scikit-learn>=0.19.1 in /usr/local/lib/python3.8/dist-packages (from librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (1.0.2)\n",
"Requirement already satisfied: numba>=0.45.1 in /usr/local/lib/python3.8/dist-packages (from librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (0.56.4)\n",
"Requirement already satisfied: pooch>=1.0 in /usr/local/lib/python3.8/dist-packages (from librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (1.6.0)\n",
"Requirement already satisfied: decorator>=4.0.10 in /usr/local/lib/python3.8/dist-packages (from librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (5.1.1)\n",
"Requirement already satisfied: soundfile>=0.10.2 in /usr/local/lib/python3.8/dist-packages (from librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (0.11.0)\n",
"Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.8/dist-packages (from librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (1.1.0)\n",
"Requirement already satisfied: pillow!=7.1.0,!=7.1.1,!=8.3.0,>=6.1.0 in /usr/local/lib/python3.8/dist-packages (from scikit-image->-r requirements_short.txt (line 3)) (9.0.1)\n",
"Requirement already satisfied: tifffile>=2019.7.26 in /usr/local/lib/python3.8/dist-packages (from scikit-image->-r requirements_short.txt (line 3)) (2023.1.23.1)\n",
"Requirement already satisfied: imageio>=2.4.1 in /usr/local/lib/python3.8/dist-packages (from scikit-image->-r requirements_short.txt (line 3)) (2.25.0)\n",
"Requirement already satisfied: PyWavelets>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from scikit-image->-r requirements_short.txt (line 3)) (1.4.1)\n",
"Requirement already satisfied: networkx>=2.2 in /usr/local/lib/python3.8/dist-packages (from scikit-image->-r requirements_short.txt (line 3)) (3.0)\n",
"Requirement already satisfied: pygments in /usr/local/lib/python3.8/dist-packages (from ipython->-r requirements_short.txt (line 4)) (2.11.2)\n",
"Requirement already satisfied: black in /usr/local/lib/python3.8/dist-packages (from ipython->-r requirements_short.txt (line 4)) (22.1.0)\n",
"Requirement already satisfied: pickleshare in /usr/local/lib/python3.8/dist-packages (from ipython->-r requirements_short.txt (line 4)) (0.7.5)\n",
"Requirement already satisfied: stack-data in /usr/local/lib/python3.8/dist-packages (from ipython->-r requirements_short.txt (line 4)) (0.2.0)\n",
"Requirement already satisfied: setuptools>=18.5 in /usr/lib/python3/dist-packages (from ipython->-r requirements_short.txt (line 4)) (45.2.0)\n",
"Requirement already satisfied: traitlets>=5 in /usr/local/lib/python3.8/dist-packages (from ipython->-r requirements_short.txt (line 4)) (5.1.1)\n",
"Requirement already satisfied: jedi>=0.16 in /usr/local/lib/python3.8/dist-packages (from ipython->-r requirements_short.txt (line 4)) (0.18.1)\n",
"Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.8/dist-packages (from ipython->-r requirements_short.txt (line 4)) (0.1.3)\n",
"Requirement already satisfied: backcall in /usr/local/lib/python3.8/dist-packages (from ipython->-r requirements_short.txt (line 4)) (0.2.0)\n",
"Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.8/dist-packages (from ipython->-r requirements_short.txt (line 4)) (4.8.0)\n",
"Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.8/dist-packages (from ipython->-r requirements_short.txt (line 4)) (3.0.28)\n",
"Requirement already satisfied: jupyter-client<8.0 in /usr/local/lib/python3.8/dist-packages (from ipykernel->-r requirements_short.txt (line 5)) (7.1.2)\n",
"Requirement already satisfied: tornado<7.0,>=4.2 in /usr/local/lib/python3.8/dist-packages (from ipykernel->-r requirements_short.txt (line 5)) (6.1)\n",
"Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from ipykernel->-r requirements_short.txt (line 5)) (1.5.4)\n",
"Requirement already satisfied: debugpy<2.0,>=1.0.0 in /usr/local/lib/python3.8/dist-packages (from ipykernel->-r requirements_short.txt (line 5)) (1.5.1)\n",
"Requirement already satisfied: matplotlib in /usr/local/lib/python3.8/dist-packages (from pycwt->-r requirements_short.txt (line 10)) (3.5.1)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.8/dist-packages (from fsspec[http]>=2021.4.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (3.8.3)\n",
"Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from fsspec[http]>=2021.4.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (2.22.0)\n",
"Requirement already satisfied: parso<0.9.0,>=0.8.0 in /usr/local/lib/python3.8/dist-packages (from jedi>=0.16->ipython->-r requirements_short.txt (line 4)) (0.8.3)\n",
"Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.8/dist-packages (from jupyter-client<8.0->ipykernel->-r requirements_short.txt (line 5)) (2.8.2)\n",
"Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client<8.0->ipykernel->-r requirements_short.txt (line 5)) (0.4)\n",
"Requirement already satisfied: pyzmq>=13 in /usr/local/lib/python3.8/dist-packages (from jupyter-client<8.0->ipykernel->-r requirements_short.txt (line 5)) (22.3.0)\n",
"Requirement already satisfied: jupyter-core>=4.6.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client<8.0->ipykernel->-r requirements_short.txt (line 5)) (4.9.2)\n",
"Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->torchmetrics==0.5->-r requirements_short.txt (line 11)) (3.0.7)\n",
"Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.8/dist-packages (from pexpect>4.3->ipython->-r requirements_short.txt (line 4)) (0.7.0)\n",
"Requirement already satisfied: wcwidth in /usr/local/lib/python3.8/dist-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython->-r requirements_short.txt (line 4)) (0.2.5)\n",
"Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.8/dist-packages (from tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (2.6.0)\n",
"Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.8/dist-packages (from tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (3.3.6)\n",
"Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (1.8.1)\n",
"Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.8/dist-packages (from tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (1.0.0)\n",
"Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (0.6.1)\n",
"Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.8/dist-packages (from tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (2.0.3)\n",
"Requirement already satisfied: protobuf>=3.6.0 in /usr/local/lib/python3.8/dist-packages (from tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (3.19.4)\n",
"Requirement already satisfied: grpcio>=1.24.3 in /usr/local/lib/python3.8/dist-packages (from tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (1.43.0)\n",
"Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.8/dist-packages (from tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (0.4.6)\n",
"Requirement already satisfied: wheel>=0.26 in /usr/lib/python3/dist-packages (from tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (0.34.2)\n",
"Requirement already satisfied: typing-extensions in /usr/local/lib/python3.8/dist-packages (from torch>=1.3.1->torchmetrics==0.5->-r requirements_short.txt (line 11)) (4.1.1)\n",
"Requirement already satisfied: click>=8.0.0 in /usr/local/lib/python3.8/dist-packages (from black->ipython->-r requirements_short.txt (line 4)) (8.0.3)\n",
"Requirement already satisfied: platformdirs>=2 in /usr/local/lib/python3.8/dist-packages (from black->ipython->-r requirements_short.txt (line 4)) (2.5.0)\n",
"Requirement already satisfied: mypy-extensions>=0.4.3 in /usr/local/lib/python3.8/dist-packages (from black->ipython->-r requirements_short.txt (line 4)) (0.4.3)\n",
"Requirement already satisfied: pathspec>=0.9.0 in /usr/local/lib/python3.8/dist-packages (from black->ipython->-r requirements_short.txt (line 4)) (0.9.0)\n",
"Requirement already satisfied: tomli>=1.1.0 in /usr/local/lib/python3.8/dist-packages (from black->ipython->-r requirements_short.txt (line 4)) (2.0.1)\n",
"Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib->pycwt->-r requirements_short.txt (line 10)) (1.3.2)\n",
"Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.8/dist-packages (from matplotlib->pycwt->-r requirements_short.txt (line 10)) (4.29.1)\n",
"Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.8/dist-packages (from matplotlib->pycwt->-r requirements_short.txt (line 10)) (0.11.0)\n",
"Requirement already satisfied: asttokens in /usr/local/lib/python3.8/dist-packages (from stack-data->ipython->-r requirements_short.txt (line 4)) (2.0.5)\n",
"Requirement already satisfied: pure-eval in /usr/local/lib/python3.8/dist-packages (from stack-data->ipython->-r requirements_short.txt (line 4)) (0.2.2)\n",
"Requirement already satisfied: executing in /usr/local/lib/python3.8/dist-packages (from stack-data->ipython->-r requirements_short.txt (line 4)) (0.8.2)\n",
"Requirement already satisfied: six in /usr/lib/python3/dist-packages (from absl-py>=0.4->tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (1.14.0)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.8/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>=2021.4.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (1.3.1)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>=2021.4.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (1.3.3)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>=2021.4.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (1.8.2)\n",
"Requirement already satisfied: charset-normalizer<3.0,>=2.0 in /usr/local/lib/python3.8/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>=2021.4.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (2.1.1)\n",
"Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.8/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>=2021.4.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (21.4.0)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.8/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>=2021.4.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (6.0.4)\n",
"Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.8/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>=2021.4.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (4.0.2)\n",
"Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.8/dist-packages (from google-auth<3,>=1.6.3->tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (5.0.0)\n",
"Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.8/dist-packages (from google-auth<3,>=1.6.3->tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (0.2.8)\n",
"Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.8/dist-packages (from google-auth<3,>=1.6.3->tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (4.8)\n",
"Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.8/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (1.3.1)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: importlib-metadata>=4.4 in /usr/local/lib/python3.8/dist-packages (from markdown>=2.6.8->tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (4.11.1)\n",
"Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /usr/local/lib/python3.8/dist-packages (from numba>=0.45.1->librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (0.39.1)\n",
"Requirement already satisfied: appdirs>=1.3.0 in /usr/local/lib/python3.8/dist-packages (from pooch>=1.0->librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (1.4.4)\n",
"Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.8/dist-packages (from scikit-learn>=0.19.1->librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (3.1.0)\n",
"Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.8/dist-packages (from soundfile>=0.10.2->librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (1.15.0)\n",
"Requirement already satisfied: pycparser in /usr/local/lib/python3.8/dist-packages (from cffi>=1.0->soundfile>=0.10.2->librosa==0.9.1->torchcrepe->-r requirements_short.txt (line 1)) (2.21)\n",
"Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.8/dist-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (3.7.0)\n",
"Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.8/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (0.4.8)\n",
"Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard!=2.5.0,>=2.2.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (3.2.0)\n",
"Requirement already satisfied: idna>=2.0 in /usr/lib/python3/dist-packages (from yarl<2.0,>=1.0->aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]>=2021.4.0->pytorch_lightning==1.3.3->-r requirements_short.txt (line 12)) (2.8)\n",
"\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
"\u001b[0m\u001b[33mWARNING: You are using pip version 22.0.3; however, version 22.3.1 is available.\n",
"You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n",
"\u001b[0m"
]
}
],
"source": [
"# clone repo\n",
"! git clone https://github.com/prophesier/diff-svc\n",
"\n",
"import os\n",
"\n",
"home_dir = os.getcwd()\n",
"repo_dir = os.path.join(home_dir, 'diff-svc')\n",
"os.chdir(repo_dir)\n",
"\n",
"# install apt packages\n",
"! apt update\n",
"! apt install build-essential python3.9-dev libpython3.9-dev zip unzip ffmpeg -y\n",
"\n",
"# install python packages\n",
"! pip install gdown tensorflow pyyaml\n",
"! pip install -r requirements_short.txt"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Downloading...\n",
"From: https://drive.google.com/u/0/uc?id=1qeAvXvrGWvpiozsin4nwRcdwNVbcf3La\n",
"To: /workspace/t4-20230125/diff-svc/checkpoint.zip\n",
"100%|██████████| 846M/846M [00:09<00:00, 87.8MB/s] "
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Archive: checkpoint.zip\r\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" creating: ./checkpoints/0102_xiaoma_pe/\n",
" inflating: ./checkpoints/0102_xiaoma_pe/config.yaml \n",
" inflating: ./checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt \n",
" creating: ./checkpoints/0109_hifigan_bigpopcs_hop128/\n",
" inflating: ./checkpoints/0109_hifigan_bigpopcs_hop128/config.yaml \n",
" inflating: ./checkpoints/0109_hifigan_bigpopcs_hop128/model_ckpt_steps_1512000.ckpt \n",
" inflating: ./checkpoints/0109_hifigan_bigpopcs_hop128/model_ckpt_steps_1512000.pth \n",
" creating: ./checkpoints/hubert/\n",
" inflating: ./checkpoints/hubert/hubert.onnx \n",
" inflating: ./checkpoints/hubert/hubert_soft.pt \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Downloading...\n",
"From: https://drive.google.com/u/0/uc?id=1z2qLq7DcInpF15EwhtL8v-IeBWSXnxAD\n",
"To: /workspace/t4-20230125/diff-svc/vocoder.zip\n",
"100%|██████████| 53.0M/53.0M [00:01<00:00, 49.4MB/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Archive: vocoder.zip\r\n",
" inflating: ./checkpoints/nsf_hifigan/config.json \r\n",
" inflating: ./checkpoints/nsf_hifigan/model "
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\r\n",
" inflating: ./checkpoints/nsf_hifigan/NOTICE.txt \r\n",
" inflating: ./checkpoints/nsf_hifigan/NOTICE.zh-CN.txt \r\n"
]
}
],
"source": [
"import gdown\n",
"\n",
"# pretrained checkpoints the model depends on (pitch extractor, HiFi-GAN, HuBERT)\n",
"gdown.download('https://drive.google.com/u/0/uc?id=1qeAvXvrGWvpiozsin4nwRcdwNVbcf3La', 'checkpoint.zip', quiet=False)\n",
"! unzip checkpoint.zip -d ./checkpoints\n",
"os.remove('checkpoint.zip')\n",
"\n",
"# pretrained NSF-HiFiGAN vocoder\n",
"gdown.download('https://drive.google.com/u/0/uc?id=1z2qLq7DcInpF15EwhtL8v-IeBWSXnxAD', 'vocoder.zip', quiet=False)\n",
"! unzip vocoder.zip -d ./checkpoints/nsf_hifigan\n",
"os.remove('vocoder.zip')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dataset"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Downloading...\n",
"From: https://drive.google.com/u/0/uc?id=1sya8W09n1EauPvVVLe0xuCcME9GqkPBM\n",
"To: /workspace/t4-20230125/diff-svc/dataset.zip\n",
"100%|██████████| 16.9M/16.9M [00:00<00:00, 41.4MB/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Archive: dataset.zip\n",
" inflating: /workspace/t4-20230125/diff-svc/data/raw/test/test-0.wav \n",
" inflating: /workspace/t4-20230125/diff-svc/data/raw/test/test-1.wav \n",
" inflating: /workspace/t4-20230125/diff-svc/data/raw/test/test-2.wav \n",
" inflating: /workspace/t4-20230125/diff-svc/data/raw/test/test-3.wav \n",
" inflating: /workspace/t4-20230125/diff-svc/data/raw/test/test-4.wav \n",
" inflating: /workspace/t4-20230125/diff-svc/data/raw/test/test-5.wav \n",
" inflating: /workspace/t4-20230125/diff-svc/data/raw/test/test-6.wav \n",
" inflating: /workspace/t4-20230125/diff-svc/data/raw/test/test-7.wav \n",
" inflating: /workspace/t4-20230125/diff-svc/data/raw/test/test-8.wav \n",
" inflating: /workspace/t4-20230125/diff-svc/data/raw/test/test-9.wav \n"
]
}
],
"source": [
"# dataset\n",
"# before running this cell, make sure that\n",
"# 1. the dataset consists of 44100 Hz WAV files\n",
"# 2. the zip file has the following flat structure:\n",
"# dataset.zip\n",
"# ├── some_wave_file_0001.wav\n",
"# ├── some_wave_file_0002.wav\n",
"# ...\n",
"# i.e. the zip file must contain NO directories\n",
"gdown.download(dataset_url, 'dataset.zip', quiet=False)\n",
"dataset_dir = os.path.join(repo_dir, 'data', 'raw', speaker_name)\n",
"os.makedirs(dataset_dir, exist_ok=True)\n",
"! unzip dataset.zip -d {dataset_dir}\n",
"os.remove('dataset.zip')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Config settings"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
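"import copy\n",
"import yaml\n",
"\n",
"# base config file (included in the repo)\n",
"config_path = os.path.join(repo_dir, 'training', 'config_nsf.yaml')\n",
"# per-speaker config path, e.g. training/config_<speaker_name>_nsf.yaml\n",
"your_config_path = os.path.join(os.path.dirname(config_path), f'config_{speaker_name}_nsf.yaml')\n",
"\n",
"with open(config_path, 'r') as f:\n",
"    config = yaml.safe_load(f)\n",
"\n",
"# edit a deep copy so that the original config stays unchanged\n",
"your_config = copy.deepcopy(config)"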
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"K_step 1000\n",
"accumulate_grad_batches 1\n",
"audio_num_mel_bins 128\n",
"audio_sample_rate 44100\n",
"binarization_args {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False}\n",
"binarizer_cls preprocessing.SVCpre.SVCBinarizer\n",
"binary_data_dir data/binary/nyaru\n",
"check_val_every_n_epoch 10\n",
"choose_test_manually False\n",
"clip_grad_norm 1\n",
"config_path training/config_nsf.yaml\n",
"content_cond_steps []\n",
"cwt_add_f0_loss False\n",
"cwt_hidden_size 128\n",
"cwt_layers 2\n",
"cwt_loss l1\n",
"cwt_std_scale 0.8\n",
"datasets ['opencpop']\n",
"debug False\n",
"dec_ffn_kernel_size 9\n",
"dec_layers 4\n",
"decay_steps 40000\n",
"decoder_type fft\n",
"dict_dir \n",
"diff_decoder_type wavenet\n",
"diff_loss_type l2\n",
"dilation_cycle_length 4\n",
"dropout 0.1\n",
"ds_workers 4\n",
"dur_enc_hidden_stride_kernel ['0,2,3', '0,2,3', '0,1,3']\n",
"dur_loss mse\n",
"dur_predictor_kernel 3\n",
"dur_predictor_layers 5\n",
"enc_ffn_kernel_size 9\n",
"enc_layers 4\n",
"encoder_K 8\n",
"encoder_type fft\n",
"endless_ds False\n",
"f0_bin 256\n",
"f0_max 1100.0\n",
"f0_min 40.0\n",
"ffn_act gelu\n",
"ffn_padding SAME\n",
"fft_size 2048\n",
"fmax 16000\n",
"fmin 40\n",
"fs2_ckpt \n",
"gaussian_start True\n",
"gen_dir_name \n",
"gen_tgt_spk_id -1\n",
"hidden_size 256\n",
"hop_size 512\n",
"hubert_path checkpoints/hubert/hubert_soft.pt\n",
"hubert_gpu True\n",
"infer False\n",
"keep_bins 128\n",
"lambda_commit 0.25\n",
"lambda_energy 0.0\n",
"lambda_f0 1.0\n",
"lambda_ph_dur 0.3\n",
"lambda_sent_dur 1.0\n",
"lambda_uv 1.0\n",
"lambda_word_dur 1.0\n",
"load_ckpt \n",
"log_interval 100\n",
"loud_norm False\n",
"lr 0.0008\n",
"max_beta 0.02\n",
"max_epochs 3000\n",
"max_eval_sentences 1\n",
"max_eval_tokens 60000\n",
"max_frames 42000\n",
"max_input_tokens 60000\n",
"max_sentences 88\n",
"max_tokens 128000\n",
"max_updates 1000000\n",
"mel_loss ssim:0.5|l1:0.5\n",
"mel_vmax 1.5\n",
"mel_vmin -6.0\n",
"min_level_db -120\n",
"norm_type gn\n",
"num_ckpt_keep 10\n",
"num_heads 2\n",
"num_sanity_val_steps 1\n",
"num_spk 1\n",
"num_test_samples 0\n",
"num_valid_plots 10\n",
"optimizer_adam_beta1 0.9\n",
"optimizer_adam_beta2 0.98\n",
"out_wav_norm False\n",
"pe_ckpt checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt\n",
"pe_enable False\n",
"perform_enhance True\n",
"pitch_ar False\n",
"pitch_enc_hidden_stride_kernel ['0,2,5', '0,2,5', '0,2,5']\n",
"pitch_extractor parselmouth\n",
"pitch_loss l2\n",
"pitch_norm log\n",
"pitch_type frame\n",
"pndm_speedup 10\n",
"pre_align_args {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}\n",
"pre_align_cls data_gen.singing.pre_align.SingingPreAlign\n",
"predictor_dropout 0.5\n",
"predictor_grad 0.1\n",
"predictor_hidden -1\n",
"predictor_kernel 5\n",
"predictor_layers 5\n",
"prenet_dropout 0.5\n",
"prenet_hidden_size 256\n",
"pretrain_fs_ckpt \n",
"processed_data_dir xxx\n",
"profile_infer False\n",
"raw_data_dir data/raw/nyaru\n",
"ref_norm_layer bn\n",
"rel_pos True\n",
"reset_phone_dict True\n",
"residual_channels 384\n",
"residual_layers 20\n",
"save_best False\n",
"save_ckpt True\n",
"save_codes ['configs', 'modules', 'src', 'utils']\n",
"save_f0 True\n",
"save_gt False\n",
"schedule_type linear\n",
"seed 1234\n",
"sort_by_len True\n",
"speaker_id nyaru\n",
"spec_max [0.0]\n",
"spec_min [-5.0]\n",
"spk_cond_steps []\n",
"stop_token_weight 5.0\n",
"task_cls training.task.SVC_task.SVCTask\n",
"test_ids []\n",
"test_input_dir \n",
"test_num 0\n",
"test_prefixes ['test']\n",
"test_set_name test\n",
"timesteps 1000\n",
"train_set_name train\n",
"use_crepe True\n",
"use_denoise False\n",
"use_energy_embed False\n",
"use_gt_dur False\n",
"use_gt_f0 False\n",
"use_midi False\n",
"use_nsf True\n",
"use_pitch_embed True\n",
"use_pos_embed True\n",
"use_spk_embed False\n",
"use_spk_id False\n",
"use_split_spk_id False\n",
"use_uv False\n",
"use_vec False\n",
"use_var_enc False\n",
"val_check_interval 2000\n",
"valid_num 0\n",
"valid_set_name valid\n",
"vocoder network.vocoders.nsf_hifigan.NsfHifiGAN\n",
"vocoder_ckpt checkpoints/nsf_hifigan/model\n",
"warmup_updates 2000\n",
"wav2spec_eps 1e-6\n",
"weight_decay 0\n",
"win_size 2048\n",
"work_dir checkpoints/nyaru\n",
"no_fs2 True\n"
]
}
],
"source": [
"# show original config list\n",
"for k, v in config.items():\n",
" print(k, v)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ★Edit config as you need"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# the settings below are the ones most frequently changed\n",
"# you can change any config item listed in the previous cell\n",
"# to restore the original config, run the Config settings cell again\n",
"\n",
"your_config['max_sentences'] = 8\n",
"# batch size: increase it if you have enough VRAM, and decrease it if you get an out-of-memory error\n",
"# with an RTX 3090 (24 GB) you can go up to 48; a larger batch tends to improve model quality, but training gets slower\n",
"\n",
"your_config['lr'] = 0.0008\n",
"# initial learning rate\n",
"\n",
"your_config['decay_steps'] = 30000\n",
"# the learning rate is halved every decay_steps steps\n",
"\n",
"your_config['val_check_interval'] = 5000\n",
"# a checkpoint is saved and validation is run every val_check_interval steps\n",
"# you can check the validation results in TensorBoard\n",
"\n",
"your_config['endless_ds'] = False\n",
"# set to True if the dataset is shorter than 1 hour\n",
"\n",
"your_config['work_dir'] = os.path.join(repo_dir, 'checkpoints', speaker_name)\n",
"# checkpoint output directory\n",
"\n",
"your_config['num_ckpt_keep'] = 9999\n",
"# number of checkpoints to keep\n",
"\n",
"your_config['ds_workers'] = 8\n",
"# number of worker processes used for loading the dataset\n",
"# decrease this number if a shared-memory error occurs\n",
"\n",
"your_config['no_fs2'] = True\n",
"your_config['enable_train'] = True\n",
"# purpose unclear; these settings were taken from a recommendation\n",
"\n",
"original_speaker_name = 'nyaru'\n",
"# speaker name in original config in repo\n",
"\n",
"# replace the original speaker name in every string-valued config item\n",
"for k, v in your_config.items():\n",
"    if isinstance(v, str) and original_speaker_name in v:\n",
"        your_config[k] = v.replace(original_speaker_name, speaker_name)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"K_step: 1000\n",
"accumulate_grad_batches: 1\n",
"audio_num_mel_bins: 128\n",
"audio_sample_rate: 44100\n",
"binarization_args: {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False}\n",
"binarizer_cls: preprocessing.SVCpre.SVCBinarizer\n",
"binary_data_dir: data/binary/test\n",
"check_val_every_n_epoch: 10\n",
"choose_test_manually: False\n",
"clip_grad_norm: 1\n",
"config_path: training/config_nsf.yaml\n",
"content_cond_steps: []\n",
"cwt_add_f0_loss: False\n",
"cwt_hidden_size: 128\n",
"cwt_layers: 2\n",
"cwt_loss: l1\n",
"cwt_std_scale: 0.8\n",
"datasets: ['opencpop']\n",
"debug: False\n",
"dec_ffn_kernel_size: 9\n",
"dec_layers: 4\n",
"decay_steps: 30000\n",
"decoder_type: fft\n",
"dict_dir: \n",
"diff_decoder_type: wavenet\n",
"diff_loss_type: l2\n",
"dilation_cycle_length: 4\n",
"dropout: 0.1\n",
"ds_workers: 2\n",
"dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3']\n",
"dur_loss: mse\n",
"dur_predictor_kernel: 3\n",
"dur_predictor_layers: 5\n",
"enc_ffn_kernel_size: 9\n",
"enc_layers: 4\n",
"encoder_K: 8\n",
"encoder_type: fft\n",
"endless_ds: False\n",
"f0_bin: 256\n",
"f0_max: 1100.0\n",
"f0_min: 40.0\n",
"ffn_act: gelu\n",
"ffn_padding: SAME\n",
"fft_size: 2048\n",
"fmax: 16000\n",
"fmin: 40\n",
"fs2_ckpt: \n",
"gaussian_start: True\n",
"gen_dir_name: \n",
"gen_tgt_spk_id: -1\n",
"hidden_size: 256\n",
"hop_size: 512\n",
"hubert_path: checkpoints/hubert/hubert_soft.pt\n",
"hubert_gpu: True\n",
"infer: False\n",
"keep_bins: 128\n",
"lambda_commit: 0.25\n",
"lambda_energy: 0.0\n",
"lambda_f0: 1.0\n",
"lambda_ph_dur: 0.3\n",
"lambda_sent_dur: 1.0\n",
"lambda_uv: 1.0\n",
"lambda_word_dur: 1.0\n",
"load_ckpt: \n",
"log_interval: 100\n",
"loud_norm: False\n",
"lr: 0.0008\n",
"max_beta: 0.02\n",
"max_epochs: 3000\n",
"max_eval_sentences: 1\n",
"max_eval_tokens: 60000\n",
"max_frames: 42000\n",
"max_input_tokens: 60000\n",
"max_sentences: 8\n",
"max_tokens: 128000\n",
"max_updates: 1000000\n",
"mel_loss: ssim:0.5|l1:0.5\n",
"mel_vmax: 1.5\n",
"mel_vmin: -6.0\n",
"min_level_db: -120\n",
"norm_type: gn\n",
"num_ckpt_keep: 9999\n",
"num_heads: 2\n",
"num_sanity_val_steps: 1\n",
"num_spk: 1\n",
"num_test_samples: 0\n",
"num_valid_plots: 10\n",
"optimizer_adam_beta1: 0.9\n",
"optimizer_adam_beta2: 0.98\n",
"out_wav_norm: False\n",
"pe_ckpt: checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt\n",
"pe_enable: False\n",
"perform_enhance: True\n",
"pitch_ar: False\n",
"pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5']\n",
"pitch_extractor: parselmouth\n",
"pitch_loss: l2\n",
"pitch_norm: log\n",
"pitch_type: frame\n",
"pndm_speedup: 10\n",
"pre_align_args: {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}\n",
"pre_align_cls: data_gen.singing.pre_align.SingingPreAlign\n",
"predictor_dropout: 0.5\n",
"predictor_grad: 0.1\n",
"predictor_hidden: -1\n",
"predictor_kernel: 5\n",
"predictor_layers: 5\n",
"prenet_dropout: 0.5\n",
"prenet_hidden_size: 256\n",
"pretrain_fs_ckpt: \n",
"processed_data_dir: xxx\n",
"profile_infer: False\n",
"raw_data_dir: data/raw/test\n",
"ref_norm_layer: bn\n",
"rel_pos: True\n",
"reset_phone_dict: True\n",
"residual_channels: 384\n",
"residual_layers: 20\n",
"save_best: False\n",
"save_ckpt: True\n",
"save_codes: ['configs', 'modules', 'src', 'utils']\n",
"save_f0: True\n",
"save_gt: False\n",
"schedule_type: linear\n",
"seed: 1234\n",
"sort_by_len: True\n",
"speaker_id: test\n",
"spec_max: [0.0]\n",
"spec_min: [-5.0]\n",
"spk_cond_steps: []\n",
"stop_token_weight: 5.0\n",
"task_cls: training.task.SVC_task.SVCTask\n",
"test_ids: []\n",
"test_input_dir: \n",
"test_num: 0\n",
"test_prefixes: ['test']\n",
"test_set_name: test\n",
"timesteps: 1000\n",
"train_set_name: train\n",
"use_crepe: True\n",
"use_denoise: False\n",
"use_energy_embed: False\n",
"use_gt_dur: False\n",
"use_gt_f0: False\n",
"use_midi: False\n",
"use_nsf: True\n",
"use_pitch_embed: True\n",
"use_pos_embed: True\n",
"use_spk_embed: False\n",
"use_spk_id: False\n",
"use_split_spk_id: False\n",
"use_uv: False\n",
"use_vec: False\n",
"use_var_enc: False\n",
"val_check_interval: 5000\n",
"valid_num: 0\n",
"valid_set_name: valid\n",
"vocoder: network.vocoders.nsf_hifigan.NsfHifiGAN\n",
"vocoder_ckpt: checkpoints/nsf_hifigan/model\n",
"warmup_updates: 2000\n",
"wav2spec_eps: 1e-6\n",
"weight_decay: 0\n",
"win_size: 2048\n",
"work_dir: /workspace/t4-20230125/diff-svc/checkpoints/test\n",
"no_fs2: True\n"
]
}
],
"source": [
"# show your config list\n",
"for k, v in your_config.items():\n",
" print(f'{k}: {v}')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# save config\n",
"with open(your_config_path, 'w') as f:\n",
" yaml.dump(your_config, f)\n",
"\n",
"# create the checkpoint output directory if it does not exist yet\n",
"os.makedirs(your_config['work_dir'], exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TensorBoard"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tensorboard command:\n",
"tensorboard --load_fast=true --reload_interval=1 --reload_multifile=true --logdir=/workspace/t4-20230125/diff-svc/checkpoints/test/lightning_logs --port=6006\n",
"ngrok command:\n",
"curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc | tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null && echo \"deb https://ngrok-agent.s3.amazonaws.com buster main\" | tee /etc/apt/sources.list.d/ngrok.list && apt update && apt install ngrok\n",
"ngrok authtoken 1q2w3e4r5t6y7u8i9o0p\n",
"ngrok http 6006\n"
]
}
],
"source": [
"import os\n",
"\n",
"log_dir = os.path.join(repo_dir, 'checkpoints', speaker_name, 'lightning_logs')\n",
"print(\"Tensorboard command:\")\n",
"print(f\"tensorboard --load_fast=true --reload_interval=1 --reload_multifile=true --logdir={log_dir} --port=6006\")\n",
"\n",
"if ngrok_token:\n",
" print(\"ngrok command:\")\n",
" print(\"\"\"curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc | tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null && echo \"deb https://ngrok-agent.s3.amazonaws.com buster main\" | tee /etc/apt/sources.list.d/ngrok.list && apt update && apt install ngrok\"\"\")\n",
" print(f\"ngrok authtoken {ngrok_token}\")\n",
" print(\"ngrok http 6006\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Preprocess"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"| Hparams chains: ['/workspace/t4-20230125/diff-svc/training/config_test_nsf.yaml']\n",
"| Hparams: \n",
"\u001b[;33;mK_step\u001b[0m: 1000, \u001b[;33;maccumulate_grad_batches\u001b[0m: 1, \u001b[;33;maudio_num_mel_bins\u001b[0m: 128, \u001b[;33;maudio_sample_rate\u001b[0m: 44100, \u001b[;33;mbinarization_args\u001b[0m: {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False}, \n",
"\u001b[;33;mbinarizer_cls\u001b[0m: preprocessing.SVCpre.SVCBinarizer, \u001b[;33;mbinary_data_dir\u001b[0m: data/binary/test, \u001b[;33;mcheck_val_every_n_epoch\u001b[0m: 10, \u001b[;33;mchoose_test_manually\u001b[0m: False, \u001b[;33;mclip_grad_norm\u001b[0m: 1, \n",
"\u001b[;33;mconfig_path\u001b[0m: training/config_nsf.yaml, \u001b[;33;mcontent_cond_steps\u001b[0m: [], \u001b[;33;mcwt_add_f0_loss\u001b[0m: False, \u001b[;33;mcwt_hidden_size\u001b[0m: 128, \u001b[;33;mcwt_layers\u001b[0m: 2, \n",
"\u001b[;33;mcwt_loss\u001b[0m: l1, \u001b[;33;mcwt_std_scale\u001b[0m: 0.8, \u001b[;33;mdatasets\u001b[0m: ['opencpop'], \u001b[;33;mdebug\u001b[0m: False, \u001b[;33;mdec_ffn_kernel_size\u001b[0m: 9, \n",
"\u001b[;33;mdec_layers\u001b[0m: 4, \u001b[;33;mdecay_steps\u001b[0m: 30000, \u001b[;33;mdecoder_type\u001b[0m: fft, \u001b[;33;mdict_dir\u001b[0m: , \u001b[;33;mdiff_decoder_type\u001b[0m: wavenet, \n",
"\u001b[;33;mdiff_loss_type\u001b[0m: l2, \u001b[;33;mdilation_cycle_length\u001b[0m: 4, \u001b[;33;mdropout\u001b[0m: 0.1, \u001b[;33;mds_workers\u001b[0m: 2, \u001b[;33;mdur_enc_hidden_stride_kernel\u001b[0m: ['0,2,3', '0,2,3', '0,1,3'], \n",
"\u001b[;33;mdur_loss\u001b[0m: mse, \u001b[;33;mdur_predictor_kernel\u001b[0m: 3, \u001b[;33;mdur_predictor_layers\u001b[0m: 5, \u001b[;33;menc_ffn_kernel_size\u001b[0m: 9, \u001b[;33;menc_layers\u001b[0m: 4, \n",
"\u001b[;33;mencoder_K\u001b[0m: 8, \u001b[;33;mencoder_type\u001b[0m: fft, \u001b[;33;mendless_ds\u001b[0m: False, \u001b[;33;mf0_bin\u001b[0m: 256, \u001b[;33;mf0_max\u001b[0m: 1100.0, \n",
"\u001b[;33;mf0_min\u001b[0m: 40.0, \u001b[;33;mffn_act\u001b[0m: gelu, \u001b[;33;mffn_padding\u001b[0m: SAME, \u001b[;33;mfft_size\u001b[0m: 2048, \u001b[;33;mfmax\u001b[0m: 16000, \n",
"\u001b[;33;mfmin\u001b[0m: 40, \u001b[;33;mfs2_ckpt\u001b[0m: , \u001b[;33;mgaussian_start\u001b[0m: True, \u001b[;33;mgen_dir_name\u001b[0m: , \u001b[;33;mgen_tgt_spk_id\u001b[0m: -1, \n",
"\u001b[;33;mhidden_size\u001b[0m: 256, \u001b[;33;mhop_size\u001b[0m: 512, \u001b[;33;mhubert_gpu\u001b[0m: True, \u001b[;33;mhubert_path\u001b[0m: checkpoints/hubert/hubert_soft.pt, \u001b[;33;minfer\u001b[0m: False, \n",
"\u001b[;33;mkeep_bins\u001b[0m: 128, \u001b[;33;mlambda_commit\u001b[0m: 0.25, \u001b[;33;mlambda_energy\u001b[0m: 0.0, \u001b[;33;mlambda_f0\u001b[0m: 1.0, \u001b[;33;mlambda_ph_dur\u001b[0m: 0.3, \n",
"\u001b[;33;mlambda_sent_dur\u001b[0m: 1.0, \u001b[;33;mlambda_uv\u001b[0m: 1.0, \u001b[;33;mlambda_word_dur\u001b[0m: 1.0, \u001b[;33;mload_ckpt\u001b[0m: , \u001b[;33;mlog_interval\u001b[0m: 100, \n",
"\u001b[;33;mloud_norm\u001b[0m: False, \u001b[;33;mlr\u001b[0m: 0.0008, \u001b[;33;mmax_beta\u001b[0m: 0.02, \u001b[;33;mmax_epochs\u001b[0m: 3000, \u001b[;33;mmax_eval_sentences\u001b[0m: 1, \n",
"\u001b[;33;mmax_eval_tokens\u001b[0m: 60000, \u001b[;33;mmax_frames\u001b[0m: 42000, \u001b[;33;mmax_input_tokens\u001b[0m: 60000, \u001b[;33;mmax_sentences\u001b[0m: 8, \u001b[;33;mmax_tokens\u001b[0m: 128000, \n",
"\u001b[;33;mmax_updates\u001b[0m: 1000000, \u001b[;33;mmel_loss\u001b[0m: ssim:0.5|l1:0.5, \u001b[;33;mmel_vmax\u001b[0m: 1.5, \u001b[;33;mmel_vmin\u001b[0m: -6.0, \u001b[;33;mmin_level_db\u001b[0m: -120, \n",
"\u001b[;33;mno_fs2\u001b[0m: True, \u001b[;33;mnorm_type\u001b[0m: gn, \u001b[;33;mnum_ckpt_keep\u001b[0m: 9999, \u001b[;33;mnum_heads\u001b[0m: 2, \u001b[;33;mnum_sanity_val_steps\u001b[0m: 1, \n",
"\u001b[;33;mnum_spk\u001b[0m: 1, \u001b[;33;mnum_test_samples\u001b[0m: 0, \u001b[;33;mnum_valid_plots\u001b[0m: 10, \u001b[;33;moptimizer_adam_beta1\u001b[0m: 0.9, \u001b[;33;moptimizer_adam_beta2\u001b[0m: 0.98, \n",
"\u001b[;33;mout_wav_norm\u001b[0m: False, \u001b[;33;mpe_ckpt\u001b[0m: checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt, \u001b[;33;mpe_enable\u001b[0m: False, \u001b[;33;mperform_enhance\u001b[0m: True, \u001b[;33;mpitch_ar\u001b[0m: False, \n",
"\u001b[;33;mpitch_enc_hidden_stride_kernel\u001b[0m: ['0,2,5', '0,2,5', '0,2,5'], \u001b[;33;mpitch_extractor\u001b[0m: parselmouth, \u001b[;33;mpitch_loss\u001b[0m: l2, \u001b[;33;mpitch_norm\u001b[0m: log, \u001b[;33;mpitch_type\u001b[0m: frame, \n",
"\u001b[;33;mpndm_speedup\u001b[0m: 10, \u001b[;33;mpre_align_args\u001b[0m: {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}, \u001b[;33;mpre_align_cls\u001b[0m: data_gen.singing.pre_align.SingingPreAlign, \u001b[;33;mpredictor_dropout\u001b[0m: 0.5, \u001b[;33;mpredictor_grad\u001b[0m: 0.1, \n",
"\u001b[;33;mpredictor_hidden\u001b[0m: -1, \u001b[;33;mpredictor_kernel\u001b[0m: 5, \u001b[;33;mpredictor_layers\u001b[0m: 5, \u001b[;33;mprenet_dropout\u001b[0m: 0.5, \u001b[;33;mprenet_hidden_size\u001b[0m: 256, \n",
"\u001b[;33;mpretrain_fs_ckpt\u001b[0m: , \u001b[;33;mprocessed_data_dir\u001b[0m: xxx, \u001b[;33;mprofile_infer\u001b[0m: False, \u001b[;33;mraw_data_dir\u001b[0m: data/raw/test, \u001b[;33;mref_norm_layer\u001b[0m: bn, \n",
"\u001b[;33;mrel_pos\u001b[0m: True, \u001b[;33;mreset_phone_dict\u001b[0m: True, \u001b[;33;mresidual_channels\u001b[0m: 384, \u001b[;33;mresidual_layers\u001b[0m: 20, \u001b[;33;msave_best\u001b[0m: False, \n",
"\u001b[;33;msave_ckpt\u001b[0m: True, \u001b[;33;msave_codes\u001b[0m: ['configs', 'modules', 'src', 'utils'], \u001b[;33;msave_f0\u001b[0m: True, \u001b[;33;msave_gt\u001b[0m: False, \u001b[;33;mschedule_type\u001b[0m: linear, \n",
"\u001b[;33;mseed\u001b[0m: 1234, \u001b[;33;msort_by_len\u001b[0m: True, \u001b[;33;mspeaker_id\u001b[0m: test, \u001b[;33;mspec_max\u001b[0m: [0.0], \u001b[;33;mspec_min\u001b[0m: [-5.0], \n",
"\u001b[;33;mspk_cond_steps\u001b[0m: [], \u001b[;33;mstop_token_weight\u001b[0m: 5.0, \u001b[;33;mtask_cls\u001b[0m: training.task.SVC_task.SVCTask, \u001b[;33;mtest_ids\u001b[0m: [], \u001b[;33;mtest_input_dir\u001b[0m: , \n",
"\u001b[;33;mtest_num\u001b[0m: 0, \u001b[;33;mtest_prefixes\u001b[0m: ['test'], \u001b[;33;mtest_set_name\u001b[0m: test, \u001b[;33;mtimesteps\u001b[0m: 1000, \u001b[;33;mtrain_set_name\u001b[0m: train, \n",
"\u001b[;33;muse_crepe\u001b[0m: True, \u001b[;33;muse_denoise\u001b[0m: False, \u001b[;33;muse_energy_embed\u001b[0m: False, \u001b[;33;muse_gt_dur\u001b[0m: False, \u001b[;33;muse_gt_f0\u001b[0m: False, \n",
"\u001b[;33;muse_midi\u001b[0m: False, \u001b[;33;muse_nsf\u001b[0m: True, \u001b[;33;muse_pitch_embed\u001b[0m: True, \u001b[;33;muse_pos_embed\u001b[0m: True, \u001b[;33;muse_spk_embed\u001b[0m: False, \n",
"\u001b[;33;muse_spk_id\u001b[0m: False, \u001b[;33;muse_split_spk_id\u001b[0m: False, \u001b[;33;muse_uv\u001b[0m: False, \u001b[;33;muse_var_enc\u001b[0m: False, \u001b[;33;muse_vec\u001b[0m: False, \n",
"\u001b[;33;mval_check_interval\u001b[0m: 5000, \u001b[;33;mvalid_num\u001b[0m: 0, \u001b[;33;mvalid_set_name\u001b[0m: valid, \u001b[;33;mvalidate\u001b[0m: False, \u001b[;33;mvocoder\u001b[0m: network.vocoders.nsf_hifigan.NsfHifiGAN, \n",
"\u001b[;33;mvocoder_ckpt\u001b[0m: checkpoints/nsf_hifigan/model, \u001b[;33;mwarmup_updates\u001b[0m: 2000, \u001b[;33;mwav2spec_eps\u001b[0m: 1e-6, \u001b[;33;mweight_decay\u001b[0m: 0, \u001b[;33;mwin_size\u001b[0m: 2048, \n",
"\u001b[;33;mwork_dir\u001b[0m: , \n",
"| Binarizer: <class 'preprocessing.SVCpre.SVCBinarizer'>\n",
"spkers: {'test'}\n",
"| spk_map: {'test': 0}\n",
"100%|█████████████████████████████████████████████| 5/5 [00:38<00:00, 7.63s/it]\n",
"| valid total duration: 69.731s\n",
"100%|█████████████████████████████████████████████| 5/5 [00:33<00:00, 6.65s/it]\n",
"| test total duration: 69.731s\n",
"100%|█████████████████████████████████████████████| 5/5 [00:26<00:00, 5.30s/it]\n",
"(128,)\n",
"| train total duration: 55.318s\n"
]
}
],
"source": [
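"# make the repo's packages importable by the subprocess (the working directory is repo_dir)\n",
"os.environ['PYTHONPATH'] = '.'\n",
"binarize_py = os.path.join(repo_dir, 'preprocessing', 'binarize.py')\n",
"# expose only the first GPU to the preprocessing script\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"\n",
"# binarize the raw dataset using the per-speaker config\n",
"! python {binarize_py} --config {your_config_path}"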
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Train"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"| Hparams chains: ['/workspace/t4-20230125/diff-svc/training/config_test_nsf.yaml']\n",
"| Hparams: \n",
"\u001b[;33;mK_step\u001b[0m: 1000, \u001b[;33;maccumulate_grad_batches\u001b[0m: 1, \u001b[;33;maudio_num_mel_bins\u001b[0m: 128, \u001b[;33;maudio_sample_rate\u001b[0m: 44100, \u001b[;33;mbinarization_args\u001b[0m: {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False}, \n",
"\u001b[;33;mbinarizer_cls\u001b[0m: preprocessing.SVCpre.SVCBinarizer, \u001b[;33;mbinary_data_dir\u001b[0m: data/binary/test, \u001b[;33;mcheck_val_every_n_epoch\u001b[0m: 10, \u001b[;33;mchoose_test_manually\u001b[0m: False, \u001b[;33;mclip_grad_norm\u001b[0m: 1, \n",
"\u001b[;33;mconfig_path\u001b[0m: training/config_nsf.yaml, \u001b[;33;mcontent_cond_steps\u001b[0m: [], \u001b[;33;mcwt_add_f0_loss\u001b[0m: False, \u001b[;33;mcwt_hidden_size\u001b[0m: 128, \u001b[;33;mcwt_layers\u001b[0m: 2, \n",
"\u001b[;33;mcwt_loss\u001b[0m: l1, \u001b[;33;mcwt_std_scale\u001b[0m: 0.8, \u001b[;33;mdatasets\u001b[0m: ['opencpop'], \u001b[;33;mdebug\u001b[0m: False, \u001b[;33;mdec_ffn_kernel_size\u001b[0m: 9, \n",
"\u001b[;33;mdec_layers\u001b[0m: 4, \u001b[;33;mdecay_steps\u001b[0m: 30000, \u001b[;33;mdecoder_type\u001b[0m: fft, \u001b[;33;mdict_dir\u001b[0m: , \u001b[;33;mdiff_decoder_type\u001b[0m: wavenet, \n",
"\u001b[;33;mdiff_loss_type\u001b[0m: l2, \u001b[;33;mdilation_cycle_length\u001b[0m: 4, \u001b[;33;mdropout\u001b[0m: 0.1, \u001b[;33;mds_workers\u001b[0m: 2, \u001b[;33;mdur_enc_hidden_stride_kernel\u001b[0m: ['0,2,3', '0,2,3', '0,1,3'], \n",
"\u001b[;33;mdur_loss\u001b[0m: mse, \u001b[;33;mdur_predictor_kernel\u001b[0m: 3, \u001b[;33;mdur_predictor_layers\u001b[0m: 5, \u001b[;33;menc_ffn_kernel_size\u001b[0m: 9, \u001b[;33;menc_layers\u001b[0m: 4, \n",
"\u001b[;33;mencoder_K\u001b[0m: 8, \u001b[;33;mencoder_type\u001b[0m: fft, \u001b[;33;mendless_ds\u001b[0m: False, \u001b[;33;mf0_bin\u001b[0m: 256, \u001b[;33;mf0_max\u001b[0m: 1100.0, \n",
"\u001b[;33;mf0_min\u001b[0m: 40.0, \u001b[;33;mffn_act\u001b[0m: gelu, \u001b[;33;mffn_padding\u001b[0m: SAME, \u001b[;33;mfft_size\u001b[0m: 2048, \u001b[;33;mfmax\u001b[0m: 16000, \n",
"\u001b[;33;mfmin\u001b[0m: 40, \u001b[;33;mfs2_ckpt\u001b[0m: , \u001b[;33;mgaussian_start\u001b[0m: True, \u001b[;33;mgen_dir_name\u001b[0m: , \u001b[;33;mgen_tgt_spk_id\u001b[0m: -1, \n",
"\u001b[;33;mhidden_size\u001b[0m: 256, \u001b[;33;mhop_size\u001b[0m: 512, \u001b[;33;mhubert_gpu\u001b[0m: True, \u001b[;33;mhubert_path\u001b[0m: checkpoints/hubert/hubert_soft.pt, \u001b[;33;minfer\u001b[0m: False, \n",
"\u001b[;33;mkeep_bins\u001b[0m: 128, \u001b[;33;mlambda_commit\u001b[0m: 0.25, \u001b[;33;mlambda_energy\u001b[0m: 0.0, \u001b[;33;mlambda_f0\u001b[0m: 1.0, \u001b[;33;mlambda_ph_dur\u001b[0m: 0.3, \n",
"\u001b[;33;mlambda_sent_dur\u001b[0m: 1.0, \u001b[;33;mlambda_uv\u001b[0m: 1.0, \u001b[;33;mlambda_word_dur\u001b[0m: 1.0, \u001b[;33;mload_ckpt\u001b[0m: , \u001b[;33;mlog_interval\u001b[0m: 100, \n",
"\u001b[;33;mloud_norm\u001b[0m: False, \u001b[;33;mlr\u001b[0m: 0.0008, \u001b[;33;mmax_beta\u001b[0m: 0.02, \u001b[;33;mmax_epochs\u001b[0m: 3000, \u001b[;33;mmax_eval_sentences\u001b[0m: 1, \n",
"\u001b[;33;mmax_eval_tokens\u001b[0m: 60000, \u001b[;33;mmax_frames\u001b[0m: 42000, \u001b[;33;mmax_input_tokens\u001b[0m: 60000, \u001b[;33;mmax_sentences\u001b[0m: 8, \u001b[;33;mmax_tokens\u001b[0m: 128000, \n",
"\u001b[;33;mmax_updates\u001b[0m: 1000000, \u001b[;33;mmel_loss\u001b[0m: ssim:0.5|l1:0.5, \u001b[;33;mmel_vmax\u001b[0m: 1.5, \u001b[;33;mmel_vmin\u001b[0m: -6.0, \u001b[;33;mmin_level_db\u001b[0m: -120, \n",
"\u001b[;33;mno_fs2\u001b[0m: True, \u001b[;33;mnorm_type\u001b[0m: gn, \u001b[;33;mnum_ckpt_keep\u001b[0m: 9999, \u001b[;33;mnum_heads\u001b[0m: 2, \u001b[;33;mnum_sanity_val_steps\u001b[0m: 1, \n",
"\u001b[;33;mnum_spk\u001b[0m: 1, \u001b[;33;mnum_test_samples\u001b[0m: 0, \u001b[;33;mnum_valid_plots\u001b[0m: 10, \u001b[;33;moptimizer_adam_beta1\u001b[0m: 0.9, \u001b[;33;moptimizer_adam_beta2\u001b[0m: 0.98, \n",
"\u001b[;33;mout_wav_norm\u001b[0m: False, \u001b[;33;mpe_ckpt\u001b[0m: checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt, \u001b[;33;mpe_enable\u001b[0m: False, \u001b[;33;mperform_enhance\u001b[0m: True, \u001b[;33;mpitch_ar\u001b[0m: False, \n",
"\u001b[;33;mpitch_enc_hidden_stride_kernel\u001b[0m: ['0,2,5', '0,2,5', '0,2,5'], \u001b[;33;mpitch_extractor\u001b[0m: parselmouth, \u001b[;33;mpitch_loss\u001b[0m: l2, \u001b[;33;mpitch_norm\u001b[0m: log, \u001b[;33;mpitch_type\u001b[0m: frame, \n",
"\u001b[;33;mpndm_speedup\u001b[0m: 10, \u001b[;33;mpre_align_args\u001b[0m: {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}, \u001b[;33;mpre_align_cls\u001b[0m: data_gen.singing.pre_align.SingingPreAlign, \u001b[;33;mpredictor_dropout\u001b[0m: 0.5, \u001b[;33;mpredictor_grad\u001b[0m: 0.1, \n",
"\u001b[;33;mpredictor_hidden\u001b[0m: -1, \u001b[;33;mpredictor_kernel\u001b[0m: 5, \u001b[;33;mpredictor_layers\u001b[0m: 5, \u001b[;33;mprenet_dropout\u001b[0m: 0.5, \u001b[;33;mprenet_hidden_size\u001b[0m: 256, \n",
"\u001b[;33;mpretrain_fs_ckpt\u001b[0m: , \u001b[;33;mprocessed_data_dir\u001b[0m: xxx, \u001b[;33;mprofile_infer\u001b[0m: False, \u001b[;33;mraw_data_dir\u001b[0m: data/raw/test, \u001b[;33;mref_norm_layer\u001b[0m: bn, \n",
"\u001b[;33;mrel_pos\u001b[0m: True, \u001b[;33;mreset_phone_dict\u001b[0m: True, \u001b[;33;mresidual_channels\u001b[0m: 384, \u001b[;33;mresidual_layers\u001b[0m: 20, \u001b[;33;msave_best\u001b[0m: False, \n",
"\u001b[;33;msave_ckpt\u001b[0m: True, \u001b[;33;msave_codes\u001b[0m: ['configs', 'modules', 'src', 'utils'], \u001b[;33;msave_f0\u001b[0m: True, \u001b[;33;msave_gt\u001b[0m: False, \u001b[;33;mschedule_type\u001b[0m: linear, \n",
"\u001b[;33;mseed\u001b[0m: 1234, \u001b[;33;msort_by_len\u001b[0m: True, \u001b[;33;mspeaker_id\u001b[0m: test, \u001b[;33;mspec_max\u001b[0m: [0.0], \u001b[;33;mspec_min\u001b[0m: [-5.0], \n",
"\u001b[;33;mspk_cond_steps\u001b[0m: [], \u001b[;33;mstop_token_weight\u001b[0m: 5.0, \u001b[;33;mtask_cls\u001b[0m: training.task.SVC_task.SVCTask, \u001b[;33;mtest_ids\u001b[0m: [], \u001b[;33;mtest_input_dir\u001b[0m: , \n",
"\u001b[;33;mtest_num\u001b[0m: 0, \u001b[;33;mtest_prefixes\u001b[0m: ['test'], \u001b[;33;mtest_set_name\u001b[0m: test, \u001b[;33;mtimesteps\u001b[0m: 1000, \u001b[;33;mtrain_set_name\u001b[0m: train, \n",
"\u001b[;33;muse_crepe\u001b[0m: True, \u001b[;33;muse_denoise\u001b[0m: False, \u001b[;33;muse_energy_embed\u001b[0m: False, \u001b[;33;muse_gt_dur\u001b[0m: False, \u001b[;33;muse_gt_f0\u001b[0m: False, \n",
"\u001b[;33;muse_midi\u001b[0m: False, \u001b[;33;muse_nsf\u001b[0m: True, \u001b[;33;muse_pitch_embed\u001b[0m: True, \u001b[;33;muse_pos_embed\u001b[0m: True, \u001b[;33;muse_spk_embed\u001b[0m: False, \n",
"\u001b[;33;muse_spk_id\u001b[0m: False, \u001b[;33;muse_split_spk_id\u001b[0m: False, \u001b[;33;muse_uv\u001b[0m: False, \u001b[;33;muse_var_enc\u001b[0m: False, \u001b[;33;muse_vec\u001b[0m: False, \n",
"\u001b[;33;mval_check_interval\u001b[0m: 5000, \u001b[;33;mvalid_num\u001b[0m: 0, \u001b[;33;mvalid_set_name\u001b[0m: valid, \u001b[;33;mvalidate\u001b[0m: False, \u001b[;33;mvocoder\u001b[0m: network.vocoders.nsf_hifigan.NsfHifiGAN, \n",
"\u001b[;33;mvocoder_ckpt\u001b[0m: checkpoints/nsf_hifigan/model, \u001b[;33;mwarmup_updates\u001b[0m: 2000, \u001b[;33;mwav2spec_eps\u001b[0m: 1e-6, \u001b[;33;mweight_decay\u001b[0m: 0, \u001b[;33;mwin_size\u001b[0m: 2048, \n",
"\u001b[;33;mwork_dir\u001b[0m: checkpoints/test, \n",
"| Mel losses: {'ssim': 0.5, 'l1': 0.5}\n",
"| Load HifiGAN: checkpoints/nsf_hifigan/model\n",
"Removing weight norm...\n",
"01/29 01:13:25 PM gpu available: True, used: True\n",
"| model Trainable Parameters: 33.709M\n",
"Validation sanity check: 0%| | 0/1 [00:00<?, ?batch/s]\n",
"sample time step: 0%| | 0/100 [00:00<?, ?it/s]\u001b[A\n",
"sample time step: 1%|▎ | 1/100 [00:00<00:14, 6.79it/s]\u001b[A\n",
"sample time step: 5%|█▎ | 5/100 [00:00<00:04, 21.48it/s]\u001b[A\n",
"sample time step: 9%|██▎ | 9/100 [00:00<00:03, 28.39it/s]\u001b[A\n",
"sample time step: 13%|███ | 13/100 [00:00<00:02, 32.18it/s]\u001b[A\n",
"sample time step: 17%|████ | 17/100 [00:00<00:02, 34.64it/s]\u001b[A\n",
"sample time step: 21%|█████ | 21/100 [00:00<00:02, 35.53it/s]\u001b[A\n",
"sample time step: 25%|██████ | 25/100 [00:00<00:02, 36.71it/s]\u001b[A\n",
"sample time step: 29%|██████▉ | 29/100 [00:00<00:01, 37.32it/s]\u001b[A\n",
"sample time step: 34%|████████▏ | 34/100 [00:01<00:01, 38.54it/s]\u001b[A\n",
"sample time step: 39%|█████████▎ | 39/100 [00:01<00:01, 39.62it/s]\u001b[A\n",
"sample time step: 43%|██████████▎ | 43/100 [00:01<00:01, 39.46it/s]\u001b[A\n",
"sample time step: 47%|███████████▎ | 47/100 [00:01<00:01, 39.42it/s]\u001b[A\n",
"sample time step: 51%|████████████▏ | 51/100 [00:01<00:01, 39.17it/s]\u001b[A\n",
"sample time step: 55%|█████████████▏ | 55/100 [00:01<00:01, 39.39it/s]\u001b[A\n",
"sample time step: 60%|██████████████▍ | 60/100 [00:01<00:01, 39.86it/s]\u001b[A\n",
"sample time step: 65%|███████████████▌ | 65/100 [00:01<00:00, 40.32it/s]\u001b[A\n",
"sample time step: 70%|████████████████▊ | 70/100 [00:01<00:00, 40.65it/s]\u001b[A\n",
"sample time step: 75%|██████████████████ | 75/100 [00:02<00:00, 40.33it/s]\u001b[A\n",
"sample time step: 80%|███████████████████▏ | 80/100 [00:02<00:00, 39.63it/s]\u001b[A\n",
"sample time step: 84%|████████████████████▏ | 84/100 [00:02<00:00, 39.29it/s]\u001b[A\n",
"sample time step: 88%|█████████████████████ | 88/100 [00:02<00:00, 38.12it/s]\u001b[A\n",
"sample time step: 92%|██████████████████████ | 92/100 [00:02<00:00, 38.33it/s]\u001b[A\n",
"sample time step: 96%|███████████████████████ | 96/100 [00:02<00:00, 35.34it/s]\u001b[A\n",
"sample time step: 100%|███████████████████████| 100/100 [00:02<00:00, 36.86it/s]\u001b[A\n",
"\n",
"==============\n",
" valid results: {'total_loss': 1.0046, 'mel': 1.0046}\n",
"==============\n",
"\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1: : 1batch [00:02, 2.28s/batch, batch_size=5, lr=0.0008, mel=1, step=0] \n",
"==============\n",
" Epoch 0 ended. Steps: 0. {'total_loss': 1.0014, 'mel': 1.0014, 'batch_size': 5.0, 'lr': 0.0008}\n",
"==============\n",
"\n",
"Epoch 2: : 2batch [00:03, 1.44s/batch, batch_size=5, lr=0.0008, mel=1, step=1]\n",
"==============\n",
" Epoch 1 ended. Steps: 1. {'total_loss': 1.0013, 'mel': 1.0013, 'batch_size': 5.0, 'lr': 0.0008}\n",
"==============\n",
"\n",
"Epoch 3: : 3batch [00:03, 1.16s/batch, batch_size=5, lr=0.0008, mel=1, step=2]\n",
"==============\n",
" Epoch 2 ended. Steps: 2. {'total_loss': 1.0019, 'mel': 1.0019, 'batch_size': 5.0, 'lr': 0.0008}\n",
"==============\n",
"\n",
"Epoch 4: : 4batch [00:04, 1.02s/batch, batch_size=5, lr=0.0008, mel=0.997, step=3]\n",
"==============\n",
" Epoch 3 ended. Steps: 3. {'total_loss': 0.9974, 'mel': 0.9974, 'batch_size': 5.0, 'lr': 0.0008}\n",
"==============\n",
"\n",
"Epoch 5: : 5batch [00:05, 1.05batch/s, batch_size=5, lr=0.0008, mel=0.995, step=4]\n",
"==============\n",
" Epoch 4 ended. Steps: 4. {'total_loss': 0.9952, 'mel': 0.9952, 'batch_size': 5.0, 'lr': 0.0008}\n",
"==============\n",
"\n",
"Epoch 6: : 6batch [00:06, 1.06batch/s, batch_size=5, lr=0.0008, mel=0.993, step=5]\n",
"==============\n",
" Epoch 5 ended. Steps: 5. {'total_loss': 0.9933, 'mel': 0.9933, 'batch_size': 5.0, 'lr': 0.0008}\n",
"==============\n",
"\n",
"Epoch 7: : 7batch [00:07, 1.06batch/s, batch_size=5, lr=0.0008, mel=0.996, step=6]\n",
"==============\n",
" Epoch 6 ended. Steps: 6. {'total_loss': 0.9956, 'mel': 0.9956, 'batch_size': 5.0, 'lr': 0.0008}\n",
"==============\n",
"\n",
"Epoch 8: : 8batch [00:08, 1.10batch/s, batch_size=5, lr=0.0008, mel=0.987, step=7]\n",
"==============\n",
" Epoch 7 ended. Steps: 7. {'total_loss': 0.9865, 'mel': 0.9865, 'batch_size': 5.0, 'lr': 0.0008}\n",
"==============\n",
"\n",
"Epoch 9: : 8batch [00:08, 1.10batch/s, batch_size=5, lr=0.0008, mel=0.987, step=7]^C\n",
"Traceback (most recent call last):\n",
" File \"/workspace/t4-20230125/diff-svc/run.py\", line 15, in <module>\n",
" run_task()\n",
" File \"/workspace/t4-20230125/diff-svc/run.py\", line 11, in run_task\n",
" task_cls.start()\n",
" File \"/workspace/t4-20230125/diff-svc/training/task/base_task.py\", line 234, in start\n",
" trainer.fit(task)\n",
" File \"/workspace/t4-20230125/diff-svc/utils/pl_utils.py\", line 495, in fit\n",
" self.run_pretrain_routine(model)\n",
" File \"/workspace/t4-20230125/diff-svc/utils/pl_utils.py\", line 588, in run_pretrain_routine\n",
" self.train()\n",
" File \"/workspace/t4-20230125/diff-svc/utils/pl_utils.py\", line 1364, in train\n",
" self.run_training_epoch()\n",
" File \"/workspace/t4-20230125/diff-svc/utils/pl_utils.py\", line 1398, in run_training_epoch\n",
" output = self.run_training_batch(batch, batch_idx)\n",
" File \"/workspace/t4-20230125/diff-svc/utils/pl_utils.py\", line 1520, in run_training_batch\n",
" loss = optimizer_closure()\n",
" File \"/workspace/t4-20230125/diff-svc/utils/pl_utils.py\", line 1503, in optimizer_closure\n",
" model_ref.backward(closure_loss, optimizer)\n",
" File \"/workspace/t4-20230125/diff-svc/training/task/base_task.py\", line 316, in backward\n",
" loss.backward()\n",
" File \"/usr/local/lib/python3.8/dist-packages/torch/_tensor.py\", line 307, in backward\n"
]
}
],
"source": [
"# If an error occurs, edit the config file and rerun this cell; binarize.py does not need to be run again.\n",
"os.environ['PYTHONPATH'] = '.'  # add the current directory to Python's module search path\n",
"run_path = os.path.join(repo_dir, 'run.py')\n",
"os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # expose only GPU 0 to the training process\n",
"! python {run_path} --config {your_config_path} --exp_name {speaker_name} --reset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"vscode": {
"interpreter": {
"hash": "d3355b554e33c79ba315c6a34d2d5bc309be1808e07ad4360a975b20076fde3d"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}