cpc.ipynb
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"name": "cpc.ipynb", | |
"provenance": [], | |
"collapsed_sections": [], | |
"authorship_tag": "ABX9TyP7qzvU+VoKBwuZq64ObVL+", | |
"include_colab_link": true | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"language_info": { | |
"name": "python" | |
}, | |
"accelerator": "GPU" | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "view-in-github", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/gist/tarepan/3f6fc2d956b91ea82768ded795b8c802/cpc.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "0ArOk3jdepOT" | |
}, | |
"source": [ | |
"# CPC inference\n", | |
"Infer discrete representation from audio with modified CPC & K-means. \n", | |
"Origin: [Fairseq/GSLM/speech2unit](https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/speech2unit) " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "xUrWuJqRe7rm" | |
}, | |
"source": [ | |
"## File preparation\n", | |
"You can skip this step. \n", | |
"Prepare audio files and pretraining data ([modified CPC](https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/speech2unit#acoustic-model), [K-means](https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/speech2unit#quantization-model)).\n", | |
"\n", | |
"### Example file placement\n", | |
"```\n", | |
"km.bin (Dumped K-means model)\n", | |
"cpc_big_ll6kh_top_cpc.pt (Modified CPC weight)\n", | |
"source.txt (audio manifest file)\n", | |
"/wav_source\n", | |
" hello.wav (48kHz 3sec .wav file)\n", | |
" world.wav (48kHz 4sec .wav file)\n", | |
"```" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"!wget https://dl.fbaipublicfiles.com/textless_nlp/gslm/cpc/cpc_big_ll6kh_top_ctc.pt\n", | |
"!wget https://dl.fbaipublicfiles.com/textless_nlp/gslm/cpc/km100/km.bin" | |
], | |
"metadata": { | |
"id": "sSV5egiJ46NJ", | |
"outputId": "d17ad8d0-a91b-422f-c6cb-c32b03a5d322", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"execution_count": 1, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"--2022-07-01 13:12:10-- https://dl.fbaipublicfiles.com/textless_nlp/gslm/cpc/cpc_big_ll6kh_top_ctc.pt\n", | |
"Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...\n", | |
"Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.\n", | |
"HTTP request sent, awaiting response... 200 OK\n", | |
"Length: 54650850 (52M) [application/octet-stream]\n", | |
"Saving to: ‘cpc_big_ll6kh_top_ctc.pt’\n", | |
"\n", | |
"cpc_big_ll6kh_top_c 100%[===================>] 52.12M 233KB/s in 82s \n", | |
"\n", | |
"2022-07-01 13:13:33 (647 KB/s) - ‘cpc_big_ll6kh_top_ctc.pt’ saved [54650850/54650850]\n", | |
"\n", | |
"--2022-07-01 13:13:33-- https://dl.fbaipublicfiles.com/textless_nlp/gslm/cpc/km100/km.bin\n", | |
"Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...\n", | |
"Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.\n", | |
"HTTP request sent, awaiting response... 200 OK\n", | |
"Length: 205375 (201K) [application/octet-stream]\n", | |
"Saving to: ‘km.bin’\n", | |
"\n", | |
"km.bin 100%[===================>] 200.56K 396KB/s in 0.5s \n", | |
"\n", | |
"2022-07-01 13:13:35 (396 KB/s) - ‘km.bin’ saved [205375/205375]\n", | |
"\n" | |
] | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "5k1rmJoGie2y" | |
}, | |
"source": [ | |
"### File listup\n", | |
"Manifest file specify target audio files. Below is example.\n", | |
"\n", | |
"```\n", | |
"# manifest.txt\n", | |
"../wav_source\n", | |
"hello_16k.wav\t300\n", | |
"world_16k.wav\t400\n", | |
"```" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"!echo \"\" > manifest.txt" | |
], | |
"metadata": { | |
"id": "tdGx--cy5HSj" | |
}, | |
"execution_count": 3, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "3zqKLYgYkhP3" | |
}, | |
"source": [ | |
"### Sampling rate adjustment" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "0I7u9fgwwDr9" | |
}, | |
"source": [ | |
"import os\n", | |
"import librosa\n", | |
"import soundfile\n", | |
"\n", | |
"def resample_16k(dir:str, name:str, extension: str):\n", | |
" \"\"\"Override a audio file with 16kHz sampling rate.\n", | |
" \"\"\"\n", | |
" wave, sr = librosa.load(f\"{dir}/{name}.{extension}\")\n", | |
" target_sr = 16000\n", | |
" wave_16k = librosa.resample(wave, sr, target_sr)\n", | |
" soundfile.write(f\"{dir}/{name}_16k.{extension}\", wave_16k, target_sr)\n", | |
"\n", | |
"resample_16k(\"wav_source\", \"spl\", \"wav\")\n", | |
"# resample_16k(\"wav_source\", \"world\", \"wav\")" | |
], | |
"execution_count": 4, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"import librosa\n", | |
"\n", | |
"librosa.load(\"../wav_source/spl_16k.wav\", sr=None)" | |
], | |
"metadata": { | |
"id": "19LLRsaG6d6n", | |
"outputId": "0f166cc4-d9f0-4aa2-a076-d6d55464fb69", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"execution_count": 10, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": [ | |
"(array([ 9.9182129e-03, 1.7791748e-02, 1.6113281e-02, ...,\n", | |
" 0.0000000e+00, -3.0517578e-05, 0.0000000e+00], dtype=float32), 16000)" | |
] | |
}, | |
"metadata": {}, | |
"execution_count": 10 | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "dG2XnhF9eeMk" | |
}, | |
"source": [ | |
"## Fairseq setup" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "ymxLY6P3nQVz", | |
"outputId": "4e1674e2-ea9b-41c2-eb37-55189e27a86e", | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 1000 | |
} | |
}, | |
"source": [ | |
"!git clone https://github.com/pytorch/fairseq\n", | |
"%cd fairseq\n", | |
"!pip install --editable ./" | |
], | |
"execution_count": 5, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"Cloning into 'fairseq'...\n", | |
"remote: Enumerating objects: 31566, done.\u001b[K\n", | |
"remote: Counting objects: 100% (114/114), done.\u001b[K\n", | |
"remote: Compressing objects: 100% (70/70), done.\u001b[K\n", | |
"remote: Total 31566 (delta 55), reused 86 (delta 42), pack-reused 31452\u001b[K\n", | |
"Receiving objects: 100% (31566/31566), 21.84 MiB | 12.13 MiB/s, done.\n", | |
"Resolving deltas: 100% (23211/23211), done.\n", | |
"/content/fairseq\n", | |
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", | |
"Obtaining file:///content/fairseq\n", | |
" Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", | |
" Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", | |
" Installing backend dependencies ... \u001b[?25l\u001b[?25hdone\n", | |
" Preparing wheel metadata ... \u001b[?25l\u001b[?25hdone\n", | |
"Collecting hydra-core<1.1,>=1.0.7\n", | |
" Downloading hydra_core-1.0.7-py3-none-any.whl (123 kB)\n", | |
"\u001b[K |████████████████████████████████| 123 kB 4.3 MB/s \n", | |
"\u001b[?25hRequirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (2022.6.2)\n", | |
"Requirement already satisfied: torch in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (1.11.0+cu113)\n", | |
"Requirement already satisfied: cython in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (0.29.30)\n", | |
"Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (1.21.6)\n", | |
"Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (4.64.0)\n", | |
"Collecting bitarray\n", | |
" Downloading bitarray-2.5.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (236 kB)\n", | |
"\u001b[K |████████████████████████████████| 236 kB 32.6 MB/s \n", | |
"\u001b[?25hCollecting omegaconf<2.1\n", | |
" Downloading omegaconf-2.0.6-py3-none-any.whl (36 kB)\n", | |
"Requirement already satisfied: cffi in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (1.15.0)\n", | |
"Collecting sacrebleu>=1.4.12\n", | |
" Downloading sacrebleu-2.1.0-py3-none-any.whl (92 kB)\n", | |
"\u001b[K |████████████████████████████████| 92 kB 11.0 MB/s \n", | |
"\u001b[?25hRequirement already satisfied: torchaudio>=0.8.0 in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (0.11.0+cu113)\n", | |
"Requirement already satisfied: importlib-resources in /usr/local/lib/python3.7/dist-packages (from hydra-core<1.1,>=1.0.7->fairseq==0.12.2) (5.7.1)\n", | |
"Collecting antlr4-python3-runtime==4.8\n", | |
" Downloading antlr4-python3-runtime-4.8.tar.gz (112 kB)\n", | |
"\u001b[K |████████████████████████████████| 112 kB 51.3 MB/s \n", | |
"\u001b[?25hCollecting PyYAML>=5.1.*\n", | |
" Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)\n", | |
"\u001b[K |████████████████████████████████| 596 kB 44.2 MB/s \n", | |
"\u001b[?25hRequirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from omegaconf<2.1->fairseq==0.12.2) (4.1.1)\n", | |
"Collecting colorama\n", | |
" Downloading colorama-0.4.5-py2.py3-none-any.whl (16 kB)\n", | |
"Collecting portalocker\n", | |
" Downloading portalocker-2.4.0-py2.py3-none-any.whl (16 kB)\n", | |
"Requirement already satisfied: tabulate>=0.8.9 in /usr/local/lib/python3.7/dist-packages (from sacrebleu>=1.4.12->fairseq==0.12.2) (0.8.9)\n", | |
"Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from cffi->fairseq==0.12.2) (2.21)\n", | |
"Requirement already satisfied: zipp>=3.1.0 in /usr/local/lib/python3.7/dist-packages (from importlib-resources->hydra-core<1.1,>=1.0.7->fairseq==0.12.2) (3.8.0)\n", | |
"Building wheels for collected packages: antlr4-python3-runtime\n", | |
" Building wheel for antlr4-python3-runtime (setup.py) ... \u001b[?25l\u001b[?25hdone\n", | |
" Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.8-py3-none-any.whl size=141230 sha256=e03b6fd55f861ef90d2d52a6cca428e356cd02d089d9ebe874c2c9ac38ea39f2\n", | |
" Stored in directory: /root/.cache/pip/wheels/ca/33/b7/336836125fc9bb4ceaa4376d8abca10ca8bc84ddc824baea6c\n", | |
"Successfully built antlr4-python3-runtime\n", | |
"Installing collected packages: PyYAML, portalocker, omegaconf, colorama, antlr4-python3-runtime, sacrebleu, hydra-core, bitarray, fairseq\n", | |
" Attempting uninstall: PyYAML\n", | |
" Found existing installation: PyYAML 3.13\n", | |
" Uninstalling PyYAML-3.13:\n", | |
" Successfully uninstalled PyYAML-3.13\n", | |
" Running setup.py develop for fairseq\n", | |
"Successfully installed PyYAML-6.0 antlr4-python3-runtime-4.8 bitarray-2.5.1 colorama-0.4.5 fairseq hydra-core-1.0.7 omegaconf-2.0.6 portalocker-2.4.0 sacrebleu-2.1.0\n" | |
] | |
}, | |
{ | |
"output_type": "display_data", | |
"data": { | |
"application/vnd.colab-display-data+json": { | |
"pip_warning": { | |
"packages": [ | |
"yaml" | |
] | |
} | |
} | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "VyrVQ0XsfJ8v" | |
}, | |
"source": [ | |
"## Inference execution" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "42Dn_zzPnkQW", | |
"outputId": "5deea1c1-ed73-4233-991e-8c7acd879e98" | |
}, | |
"source": [ | |
"! PYTHONPATH=. python examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py \\\n", | |
" --feature_type cpc \\\n", | |
" --acoustic_model_path \"../cpc_big_ll6kh_top_ctc.pt\" \\\n", | |
" --kmeans_model_path \"../km.bin\" \\\n", | |
" --layer 1 \\\n", | |
" --manifest_path \"../manifest.txt\" \\\n", | |
" --out_quantized_file_path \"../out.txt\" \\\n", | |
" --extension \".wav\"" | |
], | |
"execution_count": 3, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"2022-07-01 13:24:06 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX\n", | |
"2022-07-01 13:24:07 | INFO | __main__ | Namespace(acoustic_model_path='../cpc_big_ll6kh_top_ctc.pt', extension='.wav', feature_type='cpc', features_path=None, kmeans_model_path='../km.bin', layer=1, manifest_path='../manifest.txt', out_quantized_file_path='../out.txt')\n", | |
"2022-07-01 13:24:07 | INFO | __main__ | Extracting cpc acoustic features...\n", | |
" 0% 0/1 [00:00<?, ?it/s]\n", | |
"Traceback (most recent call last):\n", | |
" File \"examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py\", line 125, in <module>\n", | |
" main(args, logger)\n", | |
" File \"examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py\", line 94, in main\n", | |
" flatten=False,\n", | |
" File \"/content/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py\", line 83, in get_features\n", | |
" for features in tqdm.tqdm(iterator, total=num_files):\n", | |
" File \"/usr/local/lib/python3.7/dist-packages/tqdm/std.py\", line 1195, in __iter__\n", | |
" for obj in iterable:\n", | |
" File \"/content/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py\", line 64, in iterate\n", | |
" feats = reader.get_feats(file_path)\n", | |
" File \"/content/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/cpc_feature_reader.py\", line 39, in get_feats\n", | |
" x = self.read_audio(file_path, ref_len)\n", | |
" File \"/content/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/cpc_feature_reader.py\", line 29, in read_audio\n", | |
" wav, sr = sf.read(path)\n", | |
" File \"/usr/local/lib/python3.7/dist-packages/soundfile.py\", line 257, in read\n", | |
" subtype, endian, format, closefd) as f:\n", | |
" File \"/usr/local/lib/python3.7/dist-packages/soundfile.py\", line 629, in __init__\n", | |
" self._file = self._open(file, mode_int, closefd)\n", | |
" File \"/usr/local/lib/python3.7/dist-packages/soundfile.py\", line 1184, in _open\n", | |
" \"Error opening {0!r}: \".format(self.name))\n", | |
" File \"/usr/local/lib/python3.7/dist-packages/soundfile.py\", line 1357, in _error_check\n", | |
" raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))\n", | |
"RuntimeError: Error opening '../wav_source/spl_16k.wav 900': System error.\n" | |
] | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"%cd fairseq" | |
], | |
"metadata": { | |
"id": "X2J2NpzV7q3e", | |
"outputId": "97e62cd2-6dc8-4185-b910-e86a2eaaa3f3", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"execution_count": 2, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"/content/fairseq\n" | |
] | |
} | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "5wbbym--fUPi", | |
"outputId": "c752eba8-fc48-4d63-aab5-41e83c8d2741" | |
}, | |
"source": [ | |
"!cat ../out.txt" | |
], | |
"execution_count": 7, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"cat: ../out.txt: No such file or directory\n" | |
] | |
} | |
] | |
} | |
] | |
} |
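
Note on the failed inference run: the traceback reports the path as `'../wav_source/spl_16k.wav 900'`, with the file name and the frame count fused into one string. That suggests the manifest fields were separated by a space rather than a tab. The following is a minimal sketch of generating the manifest programmatically so the separator is always a tab. The function name `build_manifest` and its parameters are hypothetical and not part of fairseq. It assumes the audio files are PCM `.wav` files that Python's standard `wave` module can read.

```python
import os
import wave


def build_manifest(wav_dir: str, manifest_root: str, manifest_path: str) -> list:
    """Write a manifest: the root directory on the first line, then one
    "<file name><TAB><number of frames>" line per .wav file in wav_dir.

    wav_dir       : directory to scan, relative to the current directory
    manifest_root : root path to record on the first line (it may differ from
                    wav_dir if the manifest is consumed from another directory)
    manifest_path : where to write the manifest
    """
    lines = [manifest_root]
    for name in sorted(os.listdir(wav_dir)):
        if not name.endswith(".wav"):
            continue
        # Read the frame count from the WAV header (PCM files only).
        with wave.open(os.path.join(wav_dir, name), "rb") as wav_file:
            n_frames = wav_file.getnframes()
        # The separator must be a tab, not a space.
        lines.append(f"{name}\t{n_frames}")
    with open(manifest_path, "w") as fp:
        fp.write("\n".join(lines) + "\n")
    return lines
```

With the file layout used in this notebook, a call such as `build_manifest("wav_source", "../wav_source", "manifest.txt")`, run before changing into the `fairseq` directory, would record `../wav_source` as the root, matching the relative paths passed to `quantize_with_kmeans.py`. It lists every `.wav` file in the directory, so the original 48 kHz files should be moved elsewhere or filtered out if only the `_16k` copies are wanted.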