@tarepan
Last active July 1, 2022 13:24
cpc.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "cpc.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyP7qzvU+VoKBwuZq64ObVL+",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/tarepan/3f6fc2d956b91ea82768ded795b8c802/cpc.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0ArOk3jdepOT"
},
"source": [
"# CPC inference\n",
"Infer discrete representations (units) from audio with modified CPC and K-means.  \n",
"Origin: [Fairseq/GSLM/speech2unit](https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/speech2unit) "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xUrWuJqRe7rm"
},
"source": [
"## File preparation\n",
"You can skip this step if the files are already in place.  \n",
"Prepare the audio files and the pretrained models ([modified CPC](https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/speech2unit#acoustic-model), [K-means](https://github.com/pytorch/fairseq/tree/main/examples/textless_nlp/gslm/speech2unit#quantization-model)).\n",
"\n",
"### Example file placement\n",
"```\n",
"km.bin (Dumped K-means model)\n",
"cpc_big_ll6kh_top_ctc.pt (Modified CPC weights)\n",
"manifest.txt (Audio manifest file)\n",
"wav_source/\n",
" hello.wav (48kHz 3sec .wav file)\n",
" world.wav (48kHz 4sec .wav file)\n",
"```"
]
},
{
"cell_type": "code",
"source": [
"!wget https://dl.fbaipublicfiles.com/textless_nlp/gslm/cpc/cpc_big_ll6kh_top_ctc.pt\n",
"!wget https://dl.fbaipublicfiles.com/textless_nlp/gslm/cpc/km100/km.bin"
],
"metadata": {
"id": "sSV5egiJ46NJ",
"outputId": "d17ad8d0-a91b-422f-c6cb-c32b03a5d322",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"--2022-07-01 13:12:10-- https://dl.fbaipublicfiles.com/textless_nlp/gslm/cpc/cpc_big_ll6kh_top_ctc.pt\n",
"Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...\n",
"Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 54650850 (52M) [application/octet-stream]\n",
"Saving to: ‘cpc_big_ll6kh_top_ctc.pt’\n",
"\n",
"cpc_big_ll6kh_top_c 100%[===================>] 52.12M 233KB/s in 82s \n",
"\n",
"2022-07-01 13:13:33 (647 KB/s) - ‘cpc_big_ll6kh_top_ctc.pt’ saved [54650850/54650850]\n",
"\n",
"--2022-07-01 13:13:33-- https://dl.fbaipublicfiles.com/textless_nlp/gslm/cpc/km100/km.bin\n",
"Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...\n",
"Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 205375 (201K) [application/octet-stream]\n",
"Saving to: ‘km.bin’\n",
"\n",
"km.bin 100%[===================>] 200.56K 396KB/s in 0.5s \n",
"\n",
"2022-07-01 13:13:35 (396 KB/s) - ‘km.bin’ saved [205375/205375]\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5k1rmJoGie2y"
},
"source": [
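"### File list\n",
"The manifest file specifies the target audio files. The first line is the root directory; each following line holds a file path relative to that root and a number (the frame count in fairseq manifests), separated by a tab. An example:\n",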
"\n",
"```\n",
"# manifest.txt\n",
"../wav_source\n",
"hello_16k.wav\t300\n",
"world_16k.wav\t400\n",
"```"
]
},
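{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of generating the manifest instead of typing it by hand, so that the separator is a real tab and the frame counts come from the files themselves. The helper name `write_manifest` and its parameters are hypothetical, and it assumes PCM `.wav` files that the standard-library `wave` module can read.\n",
"\n",
"```python\n",
"import os\n",
"import wave\n",
"\n",
"def write_manifest(audio_dir, manifest_root, manifest_path, suffix=\"_16k.wav\"):\n",
"    \"\"\"Write a root line, then one tab-separated `name, frame count` line per file.\"\"\"\n",
"    # Only files ending in `suffix` are listed (here: the 16 kHz copies).\n",
"    names = sorted(n for n in os.listdir(audio_dir) if n.endswith(suffix))\n",
"    with open(manifest_path, \"w\") as f:\n",
"        f.write(manifest_root + \"\\n\")\n",
"        for name in names:\n",
"            # Frame count is read from the WAV header.\n",
"            with wave.open(os.path.join(audio_dir, name), \"rb\") as w:\n",
"                f.write(f\"{name}\\t{w.getnframes()}\\n\")\n",
"    return names\n",
"```\n",
"\n",
"For example, `write_manifest(\"wav_source\", \"../wav_source\", \"manifest.txt\")` reads the files from `wav_source` and records `../wav_source` as the root, because the inference script is run from inside the `fairseq` directory."
]
},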
{
"cell_type": "code",
"source": [
"!printf \"../wav_source\\nspl_16k.wav\\t900\\n\" > manifest.txt"
],
"metadata": {
"id": "tdGx--cy5HSj"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "3zqKLYgYkhP3"
},
"source": [
"### Sampling rate adjustment"
]
},
{
"cell_type": "code",
"metadata": {
"id": "0I7u9fgwwDr9"
},
"source": [
"import os\n",
"import librosa\n",
"import soundfile\n",
"\n",
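"def resample_16k(dir: str, name: str, extension: str):\n",
"    \"\"\"Save a 16 kHz copy of an audio file as `{name}_16k.{extension}`.\n",
"    \"\"\"\n",
"    # sr=None keeps the file's native rate; by default librosa resamples to 22.05 kHz on load.\n",
"    wave, sr = librosa.load(f\"{dir}/{name}.{extension}\", sr=None)\n",
"    target_sr = 16000\n",
"    # Keyword arguments are required by recent librosa releases.\n",
"    wave_16k = librosa.resample(wave, orig_sr=sr, target_sr=target_sr)\n",
"    soundfile.write(f\"{dir}/{name}_16k.{extension}\", wave_16k, target_sr)\n",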
"\n",
"resample_16k(\"wav_source\", \"spl\", \"wav\")\n",
"# resample_16k(\"wav_source\", \"world\", \"wav\")"
],
"execution_count": 4,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import librosa\n",
"\n",
"librosa.load(\"../wav_source/spl_16k.wav\", sr=None)"
],
"metadata": {
"id": "19LLRsaG6d6n",
"outputId": "0f166cc4-d9f0-4aa2-a076-d6d55464fb69",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": 10,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(array([ 9.9182129e-03, 1.7791748e-02, 1.6113281e-02, ...,\n",
" 0.0000000e+00, -3.0517578e-05, 0.0000000e+00], dtype=float32), 16000)"
]
},
"metadata": {},
"execution_count": 10
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dG2XnhF9eeMk"
},
"source": [
"## Fairseq setup"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ymxLY6P3nQVz",
"outputId": "4e1674e2-ea9b-41c2-eb37-55189e27a86e",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"!git clone https://github.com/pytorch/fairseq\n",
"%cd fairseq\n",
"!pip install --editable ./"
],
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Cloning into 'fairseq'...\n",
"remote: Enumerating objects: 31566, done.\u001b[K\n",
"remote: Counting objects: 100% (114/114), done.\u001b[K\n",
"remote: Compressing objects: 100% (70/70), done.\u001b[K\n",
"remote: Total 31566 (delta 55), reused 86 (delta 42), pack-reused 31452\u001b[K\n",
"Receiving objects: 100% (31566/31566), 21.84 MiB | 12.13 MiB/s, done.\n",
"Resolving deltas: 100% (23211/23211), done.\n",
"/content/fairseq\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Obtaining file:///content/fairseq\n",
" Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
" Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
" Installing backend dependencies ... \u001b[?25l\u001b[?25hdone\n",
" Preparing wheel metadata ... \u001b[?25l\u001b[?25hdone\n",
"Collecting hydra-core<1.1,>=1.0.7\n",
" Downloading hydra_core-1.0.7-py3-none-any.whl (123 kB)\n",
"\u001b[K |████████████████████████████████| 123 kB 4.3 MB/s \n",
"\u001b[?25hRequirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (2022.6.2)\n",
"Requirement already satisfied: torch in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (1.11.0+cu113)\n",
"Requirement already satisfied: cython in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (0.29.30)\n",
"Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (1.21.6)\n",
"Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (4.64.0)\n",
"Collecting bitarray\n",
" Downloading bitarray-2.5.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (236 kB)\n",
"\u001b[K |████████████████████████████████| 236 kB 32.6 MB/s \n",
"\u001b[?25hCollecting omegaconf<2.1\n",
" Downloading omegaconf-2.0.6-py3-none-any.whl (36 kB)\n",
"Requirement already satisfied: cffi in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (1.15.0)\n",
"Collecting sacrebleu>=1.4.12\n",
" Downloading sacrebleu-2.1.0-py3-none-any.whl (92 kB)\n",
"\u001b[K |████████████████████████████████| 92 kB 11.0 MB/s \n",
"\u001b[?25hRequirement already satisfied: torchaudio>=0.8.0 in /usr/local/lib/python3.7/dist-packages (from fairseq==0.12.2) (0.11.0+cu113)\n",
"Requirement already satisfied: importlib-resources in /usr/local/lib/python3.7/dist-packages (from hydra-core<1.1,>=1.0.7->fairseq==0.12.2) (5.7.1)\n",
"Collecting antlr4-python3-runtime==4.8\n",
" Downloading antlr4-python3-runtime-4.8.tar.gz (112 kB)\n",
"\u001b[K |████████████████████████████████| 112 kB 51.3 MB/s \n",
"\u001b[?25hCollecting PyYAML>=5.1.*\n",
" Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)\n",
"\u001b[K |████████████████████████████████| 596 kB 44.2 MB/s \n",
"\u001b[?25hRequirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from omegaconf<2.1->fairseq==0.12.2) (4.1.1)\n",
"Collecting colorama\n",
" Downloading colorama-0.4.5-py2.py3-none-any.whl (16 kB)\n",
"Collecting portalocker\n",
" Downloading portalocker-2.4.0-py2.py3-none-any.whl (16 kB)\n",
"Requirement already satisfied: tabulate>=0.8.9 in /usr/local/lib/python3.7/dist-packages (from sacrebleu>=1.4.12->fairseq==0.12.2) (0.8.9)\n",
"Requirement already satisfied: pycparser in /usr/local/lib/python3.7/dist-packages (from cffi->fairseq==0.12.2) (2.21)\n",
"Requirement already satisfied: zipp>=3.1.0 in /usr/local/lib/python3.7/dist-packages (from importlib-resources->hydra-core<1.1,>=1.0.7->fairseq==0.12.2) (3.8.0)\n",
"Building wheels for collected packages: antlr4-python3-runtime\n",
" Building wheel for antlr4-python3-runtime (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.8-py3-none-any.whl size=141230 sha256=e03b6fd55f861ef90d2d52a6cca428e356cd02d089d9ebe874c2c9ac38ea39f2\n",
" Stored in directory: /root/.cache/pip/wheels/ca/33/b7/336836125fc9bb4ceaa4376d8abca10ca8bc84ddc824baea6c\n",
"Successfully built antlr4-python3-runtime\n",
"Installing collected packages: PyYAML, portalocker, omegaconf, colorama, antlr4-python3-runtime, sacrebleu, hydra-core, bitarray, fairseq\n",
" Attempting uninstall: PyYAML\n",
" Found existing installation: PyYAML 3.13\n",
" Uninstalling PyYAML-3.13:\n",
" Successfully uninstalled PyYAML-3.13\n",
" Running setup.py develop for fairseq\n",
"Successfully installed PyYAML-6.0 antlr4-python3-runtime-4.8 bitarray-2.5.1 colorama-0.4.5 fairseq hydra-core-1.0.7 omegaconf-2.0.6 portalocker-2.4.0 sacrebleu-2.1.0\n"
]
},
{
"output_type": "display_data",
"data": {
"application/vnd.colab-display-data+json": {
"pip_warning": {
"packages": [
"yaml"
]
}
}
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VyrVQ0XsfJ8v"
},
"source": [
"## Inference execution"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "42Dn_zzPnkQW",
"outputId": "5deea1c1-ed73-4233-991e-8c7acd879e98"
},
"source": [
"! PYTHONPATH=. python examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py \\\n",
" --feature_type cpc \\\n",
" --acoustic_model_path \"../cpc_big_ll6kh_top_ctc.pt\" \\\n",
" --kmeans_model_path \"../km.bin\" \\\n",
" --layer 1 \\\n",
" --manifest_path \"../manifest.txt\" \\\n",
" --out_quantized_file_path \"../out.txt\" \\\n",
" --extension \".wav\""
],
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"2022-07-01 13:24:06 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX\n",
"2022-07-01 13:24:07 | INFO | __main__ | Namespace(acoustic_model_path='../cpc_big_ll6kh_top_ctc.pt', extension='.wav', feature_type='cpc', features_path=None, kmeans_model_path='../km.bin', layer=1, manifest_path='../manifest.txt', out_quantized_file_path='../out.txt')\n",
"2022-07-01 13:24:07 | INFO | __main__ | Extracting cpc acoustic features...\n",
" 0% 0/1 [00:00<?, ?it/s]\n",
"Traceback (most recent call last):\n",
" File \"examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py\", line 125, in <module>\n",
" main(args, logger)\n",
" File \"examples/textless_nlp/gslm/speech2unit/clustering/quantize_with_kmeans.py\", line 94, in main\n",
" flatten=False,\n",
" File \"/content/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py\", line 83, in get_features\n",
" for features in tqdm.tqdm(iterator, total=num_files):\n",
" File \"/usr/local/lib/python3.7/dist-packages/tqdm/std.py\", line 1195, in __iter__\n",
" for obj in iterable:\n",
" File \"/content/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/utils.py\", line 64, in iterate\n",
" feats = reader.get_feats(file_path)\n",
" File \"/content/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/cpc_feature_reader.py\", line 39, in get_feats\n",
" x = self.read_audio(file_path, ref_len)\n",
" File \"/content/fairseq/examples/textless_nlp/gslm/speech2unit/pretrained/cpc_feature_reader.py\", line 29, in read_audio\n",
" wav, sr = sf.read(path)\n",
" File \"/usr/local/lib/python3.7/dist-packages/soundfile.py\", line 257, in read\n",
" subtype, endian, format, closefd) as f:\n",
" File \"/usr/local/lib/python3.7/dist-packages/soundfile.py\", line 629, in __init__\n",
" self._file = self._open(file, mode_int, closefd)\n",
" File \"/usr/local/lib/python3.7/dist-packages/soundfile.py\", line 1184, in _open\n",
" \"Error opening {0!r}: \".format(self.name))\n",
" File \"/usr/local/lib/python3.7/dist-packages/soundfile.py\", line 1357, in _error_check\n",
" raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))\n",
"RuntimeError: Error opening '../wav_source/spl_16k.wav 900': System error.\n"
]
}
]
},
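{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `RuntimeError` in the run above reports the path `'../wav_source/spl_16k.wav 900'`: the file name still carries the trailing number, which suggests the manifest line was separated by a space rather than a tab. A minimal sketch of a check to run before inference follows. The helper name `check_manifest` is hypothetical, and the expected layout is assumed from the manifest example in this notebook.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"def check_manifest(manifest_path, base_dir=\".\"):\n",
"    \"\"\"Return a list of problems found in a manifest file (empty if none).\"\"\"\n",
"    with open(manifest_path) as f:\n",
"        lines = [line for line in f.read().split(\"\\n\") if line]\n",
"    if not lines:\n",
"        return [\"manifest is empty\"]\n",
"    root, problems = lines[0].strip(), []\n",
"    for line in lines[1:]:\n",
"        fields = line.split(\"\\t\")\n",
"        if len(fields) != 2 or not fields[1].strip().isdigit():\n",
"            problems.append(f\"expected '<path><TAB><frames>': {line!r}\")\n",
"        # The root is resolved relative to the directory the script runs in.\n",
"        elif not os.path.isfile(os.path.join(base_dir, root, fields[0])):\n",
"            problems.append(f\"audio file not found: {fields[0]!r}\")\n",
"    return problems\n",
"```\n",
"\n",
"Calling `check_manifest(\"../manifest.txt\")` from inside the `fairseq` directory should return an empty list before the inference command is started."
]
},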
{
"cell_type": "code",
"source": [
"%cd fairseq"
],
"metadata": {
"id": "X2J2NpzV7q3e",
"outputId": "97e62cd2-6dc8-4185-b910-e86a2eaaa3f3",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"/content/fairseq\n"
]
}
]
},
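{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once inference succeeds, the quantized output (`../out.txt` in the inference command) can be loaded for further processing. A minimal sketch, assuming each line has the form `<file name>|<space-separated unit ids>`; this layout is an assumption about the fairseq script and is not confirmed by this notebook, whose run produced no output file. The helper name `read_units` is hypothetical.\n",
"\n",
"```python\n",
"def read_units(path):\n",
"    \"\"\"Map each file name to its list of integer unit ids.\"\"\"\n",
"    units = {}\n",
"    with open(path) as f:\n",
"        for line in f:\n",
"            line = line.strip()\n",
"            if not line:\n",
"                continue  # skip blank lines\n",
"            # Split at the first '|': name on the left, unit ids on the right.\n",
"            name, _, ids = line.partition(\"|\")\n",
"            units[name] = [int(u) for u in ids.split()]\n",
"    return units\n",
"```\n",
"\n",
"With the 100-cluster `km.bin` downloaded earlier, the ids would be expected to fall in the range 0 to 99."
]
},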
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5wbbym--fUPi",
"outputId": "c752eba8-fc48-4d63-aab5-41e83c8d2741"
},
"source": [
"!cat ../out.txt"
],
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"cat: ../out.txt: No such file or directory\n"
]
}
]
}
]
}