pszemraj/rpunct-vid2cleantext-single-demo.ipynb

## rpunct-vid2cleantext-single-demo.ipynb
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/pszemraj/3416fee817a4045eb67a67c7c6c17aed/rpunct-vid2cleantext-single-demo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oiurzXGg8DyC"
      },
      "source": [
        "# vid2cleantxt - Demo (single file)\n",
        "\n",
        "> PURPOSE: MVP demo of vid2cleantxt that transcribes a single media file downloaded from a URL\n",
        "\n",
        "- developed as part of the [vid2cleantxt](https://github.com/pszemraj/vid2cleantxt) repo\n",
        "- by [Peter Szemraj](https://github.com/pszemraj)\n",
        "\n",
        "---\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "cellView": "form",
        "id": "mIZn1aQe1rYk",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "f3c40a29-d191-4eb5-fa49-d03e4942aed8"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Sat Feb  5 15:01:29 2022       \n",
            "+-----------------------------------------------------------------------------+\n",
            "| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |\n",
            "|-------------------------------+----------------------+----------------------+\n",
            "| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
            "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
            "|                               |                      |               MIG M. |\n",
            "|===============================+======================+======================|\n",
            "|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |\n",
            "| N/A   34C    P0    25W / 300W |      0MiB / 16160MiB |      0%      Default |\n",
            "|                               |                      |                  N/A |\n",
            "+-------------------------------+----------------------+----------------------+\n",
            "                                                                               \n",
            "+-----------------------------------------------------------------------------+\n",
            "| Processes:                                                                  |\n",
            "|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n",
            "|        ID   ID                                                   Usage      |\n",
            "|=============================================================================|\n",
            "|  No running processes found                                                 |\n",
            "+-----------------------------------------------------------------------------+\n"
          ]
        }
      ],
      "source": [
        "#@title print out GPU info\n",
        "#@markdown This is the Colab-allocated GPU.\n",
        "#@markdown - <font color=\"orange\"> If the command below fails, no\n",
        "#@markdown GPU is attached to this runtime.\n",
        "#@markdown - To change the runtime, go to Runtime->Change Runtime Type and set Hardware Acceleration to GPU </font>\n",
        "\n",
        "\n",
        "!nvidia-smi"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2sVrAUaTCCWx"
      },
      "source": [
        "# setup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "cellView": "form",
        "id": "LrDWdEzv3LaX"
      },
      "outputs": [],
      "source": [
        "#@markdown add auto-Colab formatting with `IPython.display`\n",
        "from IPython.display import HTML, display\n",
        "# colab formatting\n",
        "def set_css():\n",
        "    display(\n",
        "        HTML(\n",
        "            \"\"\"\n",
        "  <style>\n",
        "    pre {\n",
        "        white-space: pre-wrap;\n",
        "    }\n",
        "  </style>\n",
        "  \"\"\"\n",
        "        )\n",
        "    )\n",
        "\n",
        "get_ipython().events.register(\"pre_run_cell\", set_css)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "cellView": "form",
        "id": "edpWDhRuYSNr",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 106
        },
        "outputId": "b4c969c1-d04f-46c8-95ce-ba91d8c686c6"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Will use the following as directory/file: \n",
            "/content/vid2cleantxt_demo\n",
            "Using the URL as the source for video files.\n",
            "Videos will be saved here: \n",
            "/content/downloaded_media\n"
          ]
        }
      ],
      "source": [
        "#@title setup inputs and outputs\n",
        "#@markdown - <font color=\"orange\"> **This is where you set up the input media.**</font>\n",
        "\n",
        "#@markdown - optionally, set `download_output_files` to `True` to download everything\n",
        "#@markdown as a zip file. \n",
        "import os\n",
        "from os.path import join\n",
        "from google.colab import files\n",
        "directory = \"/content/vid2cleantxt_demo\"  \n",
        "# set to false if you don't want it to download a zipped file of all the text\n",
        "download_output_files = True  # @param {type:\"boolean\"}\n",
        "use_url = True  \n",
        "URL_of_media = \"https://www.dropbox.com/s/ffh1olzclzoludv/Borat%20Cultural%20Learnings%20of%20America%20for%20Make%20Benefit%20G-2.mp4?dl=1\"  # @param {type:\"string\"}\n",
        "#@markdown for the sake of **this notebook** the URL is assumed to point to a **single media file**\n",
        "#@markdown i.e., if the link is pasted into your browser, it will immediately download a media file\n",
        "print(\"Will use the following as directory/file: \")\n",
        "print(directory)\n",
        "\n",
        "URL_save_folder = join(os.getcwd(), \"downloaded_media\")\n",
        "os.makedirs(URL_save_folder, exist_ok=True)\n",
        "\n",
        "print(\"Using the URL as the source for video files.\")\n",
        "print(\"Videos will be saved here: \\n{}\".format(URL_save_folder))\n",
        "from datetime import datetime\n",
        "\n",
        "run_start = datetime.now()\n",
        "tag_date = \"started_\" + run_start.strftime(\"%m/%d/%Y, %H-%M\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "STyM3PrkCPzk"
      },
      "source": [
        "# Install, Import \n",
        "\n",
        "- imports and installs may take several minutes."
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "#@markdown pandas\n",
        "!pip install -U -q pandas\n",
        "import pandas as pd"
      ],
      "metadata": {
        "cellView": "form",
        "id": "FNJdFWTRG_Xy",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 142
        },
        "outputId": "59b47c6a-7735-454b-8d4c-4d40653ba74f"
      },
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\u001b[K     |████████████████████████████████| 11.3 MB 13.9 MB/s \n",
            "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "rpunct 1.0.2 requires pandas==1.2.4, but you have pandas 1.3.5 which is incompatible.\n",
            "google-colab 1.0.0 requires ipykernel~=4.10, but you have ipykernel 6.8.0 which is incompatible.\n",
            "google-colab 1.0.0 requires ipython~=5.5.0, but you have ipython 7.31.1 which is incompatible.\n",
            "google-colab 1.0.0 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.\u001b[0m\n",
            "\u001b[?25h"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "cellView": "form",
        "id": "aU0vhImmKeOc",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 911
        },
        "outputId": "f5f3ceaa-a810-4c74-c566-2298b7ac67da"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Reading package lists... Done\n",
            "Building dependency tree       \n",
            "Reading state information... Done\n",
            "ffmpeg is already the newest version (7:3.4.8-0ubuntu0.2).\n",
            "The following packages were automatically installed and are no longer required:\n",
            "  cuda-command-line-tools-10-0 cuda-command-line-tools-10-1\n",
            "  cuda-command-line-tools-11-0 cuda-compiler-10-0 cuda-compiler-10-1\n",
            "  cuda-compiler-11-0 cuda-cuobjdump-10-0 cuda-cuobjdump-10-1\n",
            "  cuda-cuobjdump-11-0 cuda-cupti-10-0 cuda-cupti-10-1 cuda-cupti-11-0\n",
            "  cuda-cupti-dev-11-0 cuda-documentation-10-0 cuda-documentation-10-1\n",
            "  cuda-documentation-11-0 cuda-documentation-11-1 cuda-gdb-10-0 cuda-gdb-10-1\n",
            "  cuda-gdb-11-0 cuda-gpu-library-advisor-10-0 cuda-gpu-library-advisor-10-1\n",
            "  cuda-libraries-10-0 cuda-libraries-10-1 cuda-libraries-11-0\n",
            "  cuda-memcheck-10-0 cuda-memcheck-10-1 cuda-memcheck-11-0 cuda-nsight-10-0\n",
            "  cuda-nsight-10-1 cuda-nsight-11-0 cuda-nsight-11-1 cuda-nsight-compute-10-0\n",
            "  cuda-nsight-compute-10-1 cuda-nsight-compute-11-0 cuda-nsight-compute-11-1\n",
            "  cuda-nsight-systems-10-1 cuda-nsight-systems-11-0 cuda-nsight-systems-11-1\n",
            "  cuda-nvcc-10-0 cuda-nvcc-10-1 cuda-nvcc-11-0 cuda-nvdisasm-10-0\n",
            "  cuda-nvdisasm-10-1 cuda-nvdisasm-11-0 cuda-nvml-dev-10-0 cuda-nvml-dev-10-1\n",
            "  cuda-nvml-dev-11-0 cuda-nvprof-10-0 cuda-nvprof-10-1 cuda-nvprof-11-0\n",
            "  cuda-nvprune-10-0 cuda-nvprune-10-1 cuda-nvprune-11-0 cuda-nvtx-10-0\n",
            "  cuda-nvtx-10-1 cuda-nvtx-11-0 cuda-nvvp-10-0 cuda-nvvp-10-1 cuda-nvvp-11-0\n",
            "  cuda-nvvp-11-1 cuda-samples-10-0 cuda-samples-10-1 cuda-samples-11-0\n",
            "  cuda-samples-11-1 cuda-sanitizer-11-0 cuda-sanitizer-api-10-1\n",
            "  cuda-toolkit-10-0 cuda-toolkit-10-1 cuda-toolkit-11-0 cuda-toolkit-11-1\n",
            "  cuda-tools-10-0 cuda-tools-10-1 cuda-tools-11-0 cuda-tools-11-1\n",
            "  cuda-visual-tools-10-0 cuda-visual-tools-10-1 cuda-visual-tools-11-0\n",
            "  cuda-visual-tools-11-1 default-jre dkms freeglut3 freeglut3-dev\n",
            "  keyboard-configuration libargon2-0 libcap2 libcryptsetup12\n",
            "  libdevmapper1.02.1 libfontenc1 libidn11 libip4tc0 libjansson4\n",
            "  libnvidia-cfg1-510 libnvidia-common-460 libnvidia-common-510\n",
            "  libnvidia-extra-510 libnvidia-fbc1-510 libnvidia-gl-510 libpam-systemd\n",
            "  libpolkit-agent-1-0 libpolkit-backend-1-0 libpolkit-gobject-1-0 libxfont2\n",
            "  libxi-dev libxkbfile1 libxmu-dev libxmu-headers libxnvctrl0 libxtst6\n",
            "  nsight-compute-2020.2.1 nsight-compute-2022.1.0 nsight-systems-2020.3.2\n",
            "  nsight-systems-2020.3.4 nsight-systems-2021.5.2 nvidia-dkms-510\n",
            "  nvidia-kernel-common-510 nvidia-kernel-source-510 nvidia-modprobe\n",
            "  nvidia-settings openjdk-11-jre policykit-1 policykit-1-gnome python3-xkit\n",
            "  screen-resolution-extra systemd systemd-sysv udev x11-xkb-utils\n",
            "  xserver-common xserver-xorg-core-hwe-18.04 xserver-xorg-video-nvidia-510\n",
            "Use 'apt autoremove' to remove them.\n",
            "0 upgraded, 0 newly installed, 0 to remove and 39 not upgraded.\n",
            "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "google-colab 1.0.0 requires ipykernel~=4.10, but you have ipykernel 6.8.0 which is incompatible.\n",
            "google-colab 1.0.0 requires ipython~=5.5.0, but you have ipython 7.31.1 which is incompatible.\n",
            "google-colab 1.0.0 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.\u001b[0m\n",
            "data folder is set to `/usr/local/lib/python3.7/dist-packages/neuspell/../data` script\n",
            "CPU times: user 3.41 s, sys: 940 ms, total: 4.35 s\n",
            "Wall time: 56 s\n"
          ]
        }
      ],
      "source": [
        "%%time\n",
        "#@markdown import / install libraries as needed\n",
        "!pip install pysbd -q\n",
        "!pip install -U transformers -q\n",
        "!pip install wordninja -q\n",
        "!pip install yake -q\n",
        "!pip install symspellpy -q\n",
        "!pip install gputil -q\n",
        "!pip install humanize -q\n",
        "!pip install -U plotly -q\n",
        "!pip install moviepy --pre --upgrade -q\n",
        "!apt install ffmpeg\n",
        "!pip install -U tqdm -q\n",
        "!pip install -U neuspell -q\n",
        "!pip install clean-text[gpl] -q\n",
        "!pip install rpunct -q\n",
        "# !apt-get install ffmpeg # if you get ffmpeg errors\n",
        "\n",
        "import math, re\n",
        "import os, shutil, time, gc\n",
        "import pprint as pp\n",
        "from datetime import datetime\n",
        "from os import listdir\n",
        "from os.path import isfile, join\n",
        "\n",
        "import librosa\n",
        "import moviepy.editor as mp\n",
        "import moviepy\n",
        "import pandas as pd\n",
        "import pkg_resources\n",
        "import pysbd\n",
        "import torch\n",
        "import wordninja\n",
        "import yake\n",
        "from natsort import natsorted\n",
        "from symspellpy import SymSpell\n",
        "import transformers\n",
        "from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer\n",
        "import psutil\n",
        "import humanize\n",
        "import GPUtil\n",
        "import GPUtil as GPU\n",
        "import neuspell\n",
        "from tqdm.auto import tqdm\n",
        "from cleantext import clean"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9gqce-JlruPm"
      },
      "source": [
        "# Function Definitions\n",
        "\n",
        "- There is **a lot** of code in here, and it is only loosely organized.\n",
        "- You should only need to open or adjust it to debug errors or implement improvements, or if you love reading Python code."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CqAOMIuzGtUx"
      },
      "source": [
        "## generic functions "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "GgmDOcLH2Apm",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "175bf32f-5164-4c24-e799-01d8cfa1ec51"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "# define user functions\n",
        "\n",
        "\n",
        "def increase_font():\n",
        "    from IPython.display import Javascript\n",
        "\n",
        "    display(\n",
        "        Javascript(\n",
        "            \"\"\"\n",
        "  for (rule of document.styleSheets[0].cssRules){\n",
        "    if (rule.selectorText=='body') {\n",
        "      rule.style.fontSize = '24px'\n",
        "      break\n",
        "    }\n",
        "  }\n",
        "  \"\"\"\n",
        "        )\n",
        "    )\n",
        "\n",
        "\n",
        "def reset_font():\n",
        "    from IPython.display import Javascript\n",
        "\n",
        "    display(\n",
        "        Javascript(\n",
        "            \"\"\"\n",
        "  for (rule of document.styleSheets[0].cssRules){\n",
        "    if (rule.selectorText=='body') {\n",
        "      rule.style.fontSize = '14px'\n",
        "      break\n",
        "    }\n",
        "  }\n",
        "  \"\"\"\n",
        "        )\n",
        "    )\n",
        "\n",
        "\n",
        "def corr(s):\n",
        "    # adds space after period if there isn't one\n",
        "    # removes extra spaces\n",
        "    return re.sub(r\"\\.(?! )\", \". \", re.sub(r\" +\", \" \", s))\n",
        "\n",
        "\n",
        "def shorten_title(title_text, max_no=20):\n",
        "    if len(title_text) < max_no:\n",
        "        return title_text\n",
        "    else:\n",
        "        return title_text[:max_no] + \"...\"\n",
        "\n",
        "\n",
        "def digest_txt_directory(file_directory, identifer=\"\", verbose=False, make_folder=True):\n",
        "    run_date = datetime.now()\n",
        "    files_to_merge = natsorted(\n",
        "        [\n",
        "            f\n",
        "            for f in listdir(file_directory)\n",
        "            if isfile(join(file_directory, f)) and f.endswith(\".txt\")\n",
        "        ]\n",
        "    )\n",
        "    outfilename = (\n",
        "        \"Zealous_MERGED_words_\" + identifer + run_date.strftime(\"_%d%m%Y_%H\") + \".txt\"\n",
        "    )\n",
        "\n",
        "    og_wd = os.getcwd()\n",
        "    os.chdir(file_directory)\n",
        "\n",
        "    if make_folder:\n",
        "        folder_name = \"merged_txt_files\"\n",
        "        output_loc = join(file_directory, folder_name)\n",
        "        # make a place to store outputs if one does not exist\n",
        "        os.makedirs(output_loc, exist_ok=True)\n",
        "\n",
        "        outfilename = join(folder_name, outfilename)\n",
        "\n",
        "        if verbose:\n",
        "            print(\"merged files will be saved in: \\n\", output_loc)\n",
        "\n",
        "    count = 0\n",
        "    with open(outfilename, \"w\") as outfile:\n",
        "\n",
        "        for names in files_to_merge:\n",
        "\n",
        "            with open(names) as infile:\n",
        "                count += 1\n",
        "                outfile.write(\"Start of: \" + names + \"\\n\")\n",
        "                outfile.writelines(infile.readlines())\n",
        "\n",
        "            outfile.write(\"\\n\")\n",
        "\n",
        "    print(\"Merged {} text files together.\".format(count))\n",
        "    if verbose:\n",
        "        print(\"the merged file is located at: \\n\", os.path.abspath(outfilename))\n",
        "    os.chdir(og_wd)\n",
        "\n",
        "\n",
        "def validate_output_directories(directory, verbose=False):\n",
        "\n",
        "    # check that the transcription and metadata folders exist; create them if not\n",
        "\n",
        "    t_folder_name = \"wav2vec2_sf_transcript\"\n",
        "    m_folder_name = \"wav2vec2_sf_metadata\"\n",
        "\n",
        "    t_path_full = join(directory, t_folder_name)\n",
        "    m_path_full = join(directory, m_folder_name)\n",
        "    create_folder(t_path_full)\n",
        "    create_folder(m_path_full)\n",
        "\n",
        "    output_locs = {\"t_out\": t_path_full, \"m_out\": m_path_full}\n",
        "\n",
        "    return output_locs\n",
        "\n",
        "\n",
        "def move2completed(from_dir, filename, new_folder=\"completed\", verbose=False):\n",
        "\n",
        "    # move `filename` from `from_dir` into the `new_folder` subdirectory\n",
        "    old_filepath = join(from_dir, filename)\n",
        "\n",
        "    new_filedirectory = join(from_dir, new_folder)\n",
        "    create_folder(new_filedirectory)\n",
        "\n",
        "    new_filepath = join(new_filedirectory, filename)\n",
        "\n",
        "    try:\n",
        "        shutil.move(old_filepath, new_filepath)\n",
        "        if verbose:\n",
        "            print(\"moved {} to */completed.\".format(filename))\n",
        "    except (OSError, shutil.Error):\n",
        "        print(\n",
        "            \"Warning! unable to move file to \\n{}. Please investigate\".format(\n",
        "                new_filepath\n",
        "            )\n",
        "        )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mFTrOyIk9Um7"
      },
      "source": [
        "### clean filenames"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "HOKvRjls9S_D",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "932fef3b-ff46-4b78-d96a-6e0cf8f934ca"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "def cleantxt_wrap(ugly_text):\n",
        "    # a wrapper for clean text with options different than default\n",
        "\n",
        "    # https://pypi.org/project/clean-text/\n",
        "    cleaned_text = clean(\n",
        "        ugly_text,\n",
        "        fix_unicode=True,  # fix various unicode errors\n",
        "        to_ascii=True,  # transliterate to closest ASCII representation\n",
        "        lower=True,  # lowercase text\n",
        "        no_line_breaks=True,  # fully strip line breaks as opposed to only normalizing them\n",
        "        no_urls=True,  # replace all URLs with a special token\n",
        "        no_emails=True,  # replace all email addresses with a special token\n",
        "        no_phone_numbers=True,  # replace all phone numbers with a special token\n",
        "        no_numbers=False,  # replace all numbers with a special token\n",
        "        no_digits=False,  # replace all digits with a special token\n",
        "        no_currency_symbols=True,  # replace all currency symbols with a special token\n",
        "        no_punct=True,  # remove punctuations\n",
        "        replace_with_punct=\"\",  # instead of removing punctuations you may replace them\n",
        "        replace_with_url=\"<URL>\",\n",
        "        replace_with_email=\"<EMAIL>\",\n",
        "        replace_with_phone_number=\"<PHONE>\",\n",
        "        replace_with_number=\"<NUM>\",\n",
        "        replace_with_digit=\"0\",\n",
        "        replace_with_currency_symbol=\"<CUR>\",\n",
        "        lang=\"en\",  # set to 'de' for German special handling\n",
        "    )\n",
        "\n",
        "    return cleaned_text\n",
        "\n",
        "\n",
        "def beautify_filename(filename, num_words=20, start_reverse=False, word_separator=\"_\"):\n",
        "    # takes a filename stored as text, removes the extension, splits it into up to num_words words\n",
        "    # and returns a clean filename with the words joined by word_separator\n",
        "    # useful for when you are reading files, doing things to them, and making new files\n",
        "\n",
        "    filename = str(filename)\n",
        "    current_name = os.path.splitext(filename)[0]  # get rid of extension, if any\n",
        "    if current_name and current_name[-1].isnumeric():\n",
        "        current_name = current_name + \"V2CT\"\n",
        "    clean_name = cleantxt_wrap(current_name)  # wrapper with custom defs\n",
        "    file_words = wordninja.split(clean_name)\n",
        "    # splits concatenated text into a list of words based on common word freq\n",
        "    if len(file_words) <= num_words:\n",
        "        num_words = len(file_words)\n",
        "\n",
        "    if start_reverse:\n",
        "        t_file_words = file_words[-num_words:]\n",
        "    else:\n",
        "        t_file_words = file_words[:num_words]\n",
        "\n",
        "    pretty_name = word_separator.join(t_file_words)  # see function argument\n",
        "\n",
        "    # NOTE: the extension is NOT returned\n",
        "    # str.join() adds no trailing separator, so only stray whitespace is stripped\n",
        "    return pretty_name.strip()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "id": "-BQmJNeb-5in",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "1c1ee595-02f4-427b-cd17-fd3c3710075e"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "def fast_scandir(dirname):\n",
        "    # return all subfolders in a given filepath\n",
        "\n",
        "    subfolders = [f.path for f in os.scandir(dirname) if f.is_dir()]\n",
        "    for dirname in list(subfolders):\n",
        "        subfolders.extend(fast_scandir(dirname))\n",
        "    return subfolders  # list\n",
        "\n",
        "\n",
        "def create_folder(directory):\n",
        "    os.makedirs(directory, exist_ok=True)\n",
        "\n",
        "\n",
        "def chunks(lst, n):\n",
        "    \"\"\"Yield successive n-sized chunks from lst.\"\"\"\n",
        "    for i in range(0, len(lst), n):\n",
        "        yield lst[i : i + n]\n",
        "\n",
        "\n",
        "def chunky_pandas(my_df, num_chunks=4):\n",
        "    # chunk size; at least 1 so range() never gets a zero step on small frames\n",
        "    n = max(1, len(my_df) // num_chunks)\n",
        "    list_df = [my_df[i : i + n] for i in range(0, my_df.shape[0], n)]\n",
        "\n",
        "    return list_df"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "id": "oWQYAWyo_RFq",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "e7411c68-feb1-42ef-84fb-d61bd83c0397"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "import os\n",
        "from os.path import basename\n",
        "from natsort import natsorted\n",
        "import pprint as pp\n",
        "\n",
        "\n",
        "def load_dir_files(directory, req_extension=\".txt\", return_type=\"list\", verbose=False):\n",
        "    appr_files = []\n",
        "    # r=root, d=directories, f = files\n",
        "    for r, d, f in os.walk(directory):\n",
        "        for prefile in f:\n",
        "            if prefile.endswith(req_extension):\n",
        "                fullpath = join(r, prefile)\n",
        "                appr_files.append(fullpath)\n",
        "\n",
        "    appr_files = natsorted(appr_files)\n",
        "\n",
        "    if verbose:\n",
        "        print(\"The files in the {} directory are: \\n\".format(directory))\n",
        "        if len(appr_files) < 10:\n",
        "            pp.pprint(appr_files)\n",
        "        else:\n",
        "            pp.pprint(appr_files[:10])\n",
        "            print(\"\\n and more. There are a total of {} files\".format(len(appr_files)))\n",
        "\n",
        "    if return_type.lower() == \"list\":\n",
        "        return appr_files\n",
        "    else:\n",
        "        if verbose:\n",
        "            print(\"returning dictionary\")\n",
        "\n",
        "        appr_file_dict = {}\n",
        "        for this_file in appr_files:\n",
        "            appr_file_dict[basename(this_file)] = this_file\n",
        "\n",
        "        return appr_file_dict"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "glWI8J88XIt8"
      },
      "source": [
        "### time log"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "id": "3vC3p0g-8R3K",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "3241cfff-03e6-4baf-fe3a-3ad3a5c634c5"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "def get_timestamp(exact=False):\n",
        "    \"\"\"\n",
        "    get_timestamp - return a timestamp in the format Mon-DD-YYYY_-HH (exact=False)\n",
        "        or Mon-DD-YYYY_-HH-MM-SS (exact=True), e.g. Feb-05-2022_-15-01-29\n",
        "    exact : bool, optional, by default False; if True, the timestamp also includes minutes and seconds\n",
        "    \"\"\"\n",
        "    ts = (\n",
        "        datetime.now().strftime(\"%b-%d-%Y_-%H-%M-%S\")\n",
        "        if exact\n",
        "        else datetime.now().strftime(\"%b-%d-%Y_-%H\")\n",
        "    )\n",
        "    return ts\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mqtG5izgiOng"
      },
      "source": [
        "### download functions"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "cellView": "form",
        "id": "l-obPW6BiRbb",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "28e35477-9013-4b4a-f711-5c321700b51d"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "#@markdown define `download_single_file()` for downloading one media file of unknown type\n",
        "def download_single_file(file_url, verbose=False):\n",
        "    \"\"\"\n",
        "    download_single_file - Download a single file from a remote server. The file is saved in the current directory and the filename is the same as the remote file.\n",
        "\n",
        "    Parameters\n",
        "    ----------\n",
        "    file_url : str, required, the URL of the file to download.\n",
        "    verbose : bool, optional, by default False\n",
        "\n",
        "    Returns\n",
        "    -------\n",
        "    str, the full local path of the downloaded file.\n",
        "    \"\"\"\n",
        "\n",
        "\n",
        "    # get the file extension from the URL\n",
        "    _extension = file_url.split(\".\")[-1]\n",
        "    _extension = _extension.replace(\"?dl=1\", \"\")\n",
        "    print(f\"Found extension: {_extension}\")\n",
        "    # get the file name from the URL\n",
        "    file_name = file_url.split(\"/\")[-1]\n",
        "    file_name_ext = file_name.replace(\"?dl=1\", \"\") # remove the ?dl=1 parameter from the file name, relevant for dropbox links\n",
        "    # get the local file name\n",
        "    local_name = join(os.getcwd(), file_name_ext)\n",
        "    if os.path.exists(local_name):\n",
        "        if verbose:\n",
        "            print(f\"File {file_name_ext} already exists. Skipping download.\")\n",
        "        return local_name\n",
        "    if verbose:\n",
        "        print(\"Downloading file...\")\n",
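        "    response = requests.get(file_url)\n",
        "    response.raise_for_status()  # fail early instead of saving an HTTP error page to disk\n",
        "    with open(local_name, \"wb\") as f:\n",
        "        f.write(response.content)\n",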
        "    if verbose:\n",
        "        print(\"Download complete.\")\n",
        "    return local_name"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "cellView": "form",
        "id": "z4KE8dcNVg5b",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "57d45fc7-fdde-46b1-eecc-1495b00a4390"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "#@markdown colab download alias\n",
        "\n",
        "from google.colab import files\n",
        "\n",
        "download = files.download"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bjFiyKzZgM0j"
      },
      "source": [
        "## check hardware\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "id": "_2Km3l-G3ngO",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "21d0a08e-2d36-4cfc-9e54-547e9e8526b0"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "def gpu_mem_total():\n",
        "    # Returns the total memory of the first available GPU\n",
        "    try:\n",
        "        gpus = GPUtil.getGPUs()\n",
        "    except Exception:\n",
        "        LOGGER.warning(\n",
        "            \"Unable to detect GPU model. Is your GPU configured? Is Colab Runtime set to GPU?\"\n",
        "        )\n",
        "        return np.nan\n",
        "    if len(gpus) == 0:\n",
        "        raise ValueError(\"No GPUs detected in the system\")\n",
        "    return gpus[0].memoryTotal"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lgxosi5s3ocK"
      },
      "source": [
        "checks and resets"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "id": "4xm8WRBxgK5t",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "74737d4d-952b-4f61-fa51-944f5c6d1e4c"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "loaded all hardware functions at:  2022-02-05 15:02:33.973275\n"
          ]
        }
      ],
      "source": [
        "def clear_GPU_cache(verbose=False):\n",
        "\n",
        "    GPUs = GPU.getGPUs()\n",
        "\n",
        "    if len(GPUs) > 0:\n",
        "        check_runhardware_torch()\n",
        "        torch.cuda.empty_cache()\n",
        "        print(\"\\nchecked and cleared cache\")\n",
        "    else:\n",
        "        print(\"\\nNo GPU being used :( time = \", datetime.now())\n",
        "    if verbose:\n",
        "        print(\"-----------End of Cache Clear----------------\")\n",
        "\n",
        "\n",
        "print(\"loaded all hardware functions at: \", datetime.now())\n",
        "\n",
        "\n",
        "def check_runhardware_torch(verbose=False):\n",
        "    # https://www.run.ai/guides/gpu-deep-learning/pytorch-gpu/\n",
        "\n",
        "    GPUs = GPU.getGPUs()\n",
        "\n",
        "    if len(GPUs) > 0:\n",
        "        if verbose:\n",
        "            print(\"\\n ------------------------------\")\n",
        "            print(\"Checking CUDA status for PyTorch\")\n",
        "\n",
        "        torch.cuda.init()\n",
        "\n",
        "        print(\"Cuda availability (PyTorch): \", torch.cuda.is_available())\n",
        "\n",
        "        # Get Id of default device\n",
        "        torch.cuda.current_device()\n",
        "        if verbose:\n",
        "            print(\n",
        "                \"Name of GPU: \", torch.cuda.get_device_name(device=0)\n",
        "            )  # '0' is the id of your GPU\n",
        "            print(\"------------------------------\\n\")\n",
        "        return True\n",
        "\n",
        "    else:\n",
        "        print(\"No GPU being used :(\")\n",
        "        return False\n",
        "\n",
        "\n",
        "def torch_validate_cuda(verbose=False):\n",
        "    GPUs = GPU.getGPUs()\n",
        "    num_gpus = len(GPUs)\n",
        "    try:\n",
        "        torch.cuda.init()\n",
        "        if not torch.cuda.is_available():\n",
        "            print(\n",
        "                \"WARNING - CUDA is not being used in processing - expect longer runtime\"\n",
        "            )\n",
        "            if verbose:\n",
        "                print(\"GPU util detects {} GPUs on your system\".format(num_gpus))\n",
        "    except Exception:\n",
        "        print(\n",
        "            \"WARNING - unable to start CUDA. If you wanted to use a GPU, exit and check hardware.\"\n",
        "        )\n",
        "\n",
        "\n",
        "def check_runhardware(verbose=False):\n",
        "    # ML package agnostic hardware check\n",
        "    GPUs = GPU.getGPUs()\n",
        "\n",
        "    if verbose:\n",
        "        print(\"\\n ------------------------------\")\n",
        "        print(\"Checking hardware with psutil\")\n",
        "    try:\n",
        "        gpu = GPUs[0]\n",
        "    except IndexError:\n",
        "        if verbose:\n",
        "            print(\"GPU not available - \", datetime.now())\n",
        "        gpu = None\n",
        "    process = psutil.Process(os.getpid())\n",
        "\n",
        "    CPU_load = psutil.cpu_percent()\n",
        "    if CPU_load > 0:\n",
        "        cpu_load_string = \"loaded at {} % |\".format(CPU_load)\n",
        "    else:\n",
        "        # the first call to psutil.cpu_percent() returns a meaningless 0.0, so the load is omitted\n",
        "        cpu_load_string = \"|\"\n",
        "    print(\n",
        "        \"\\nGen RAM Free: \" + humanize.naturalsize(psutil.virtual_memory().available),\n",
        "        \" | Proc size: \" + humanize.naturalsize(process.memory_info().rss),\n",
        "        \" | {} CPUs \".format(psutil.cpu_count()),\n",
        "        cpu_load_string,\n",
        "    )\n",
        "\n",
        "    if gpu is not None:\n",
        "        print(\n",
        "            \"GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB\\n\".format(\n",
        "                gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil * 100, gpu.memoryTotal\n",
        "            )\n",
        "        )\n",
        "    else:\n",
        "        print(\"No GPU being used :(\", \"\\n-----------------\\n\")\n",
        "\n",
        "\n",
        "def only_clear_GPU_cache(verbose=False):\n",
        "\n",
        "    GPUs = GPU.getGPUs()\n",
        "\n",
        "    if len(GPUs) > 0:\n",
        "        torch.cuda.empty_cache()\n",
        "        if verbose:\n",
        "            print(\"\\nchecked and cleared cache\")\n",
        "    else:\n",
        "        print(\"\\nClearCache - No GPU being used :( time = \", datetime.now())\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "msVwVwC9aY1C"
      },
      "source": [
        "## spell correction"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jnenTXRW8fs-"
      },
      "source": [
        "symspell is defined as a fallback. It is faster than neuspell and reasonably accurate, but it only corrects spelling, not grammar."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "id": "GYIg5oMlaau2",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "4377210a-7c88-4782-d132-4c3887032fcd"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "%%capture\n",
        "\n",
        "def symspell_file(filepath, filename, dist=2, keep_numb_words=True, create_folder=True, save_metrics=False,\n",
        "                  verbose=False):\n",
        "    # given a plain-text file, reads it, autocorrects any words it deems misspelled, and saves the result as a new file\n",
        "    # the new file can be stored in a sub-folder that is created as needed\n",
        "    # dist is the maximum edit distance searched for a better spelling; higher dist = longer runtime\n",
        "    # https://github.com/mammothb/symspellpy\n",
        "\n",
        "    script_start_time = time.time()\n",
        "    sym_spell = SymSpell(max_dictionary_edit_distance=dist, prefix_length=7)\n",
        "    print(\"\\nPySymSpell - Starting to correct the file: \", filename)\n",
        "    # ------------------------------------\n",
        "\n",
        "    dictionary_path = pkg_resources.resource_filename(\n",
        "        \"symspellpy\", \"frequency_dictionary_en_82_765.txt\")\n",
        "    bigram_path = pkg_resources.resource_filename(\n",
        "        \"symspellpy\", \"frequency_bigramdictionary_en_243_342.txt\")\n",
        "    # term_index is the column of the term and count_index is the\n",
        "    # column of the term frequency\n",
        "    sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)\n",
        "    sym_spell.load_bigram_dictionary(bigram_path, term_index=0, count_index=2)\n",
        "\n",
        "    # ------------------------------------\n",
        "    with open(join(filepath, filename), 'r', encoding=\"utf-8\", errors='ignore') as file:\n",
        "        textlines = file.readlines()  # return a list\n",
        "\n",
        "    if create_folder:\n",
        "        # create a folder\n",
        "        output_folder_name = \"auto-corrected\" \n",
        "        if not os.path.isdir(join(filepath, output_folder_name)):\n",
        "            os.mkdir(join(filepath, output_folder_name))  # make a place to store outputs if one does not exist\n",
        "        filepath = join(filepath, output_folder_name)\n",
        "\n",
        "    if verbose:\n",
        "        print(\"loaded text with {0:6d} lines \".format(len(textlines)))\n",
        "\n",
        "    corrected_list = []\n",
        "\n",
        "    # iterate through list of lines. Pass each line to be corrected. \n",
        "    #Append / sum results from each line till done\n",
        "    for line in textlines:\n",
        "        if not line.strip():\n",
        "            # blank line (readlines() keeps the trailing newline), skip to next run\n",
        "            continue\n",
        "\n",
        "        # correct the line with lookup_compound(), which returns a list of suggestion items\n",
        "        suggestions = sym_spell.lookup_compound(phrase=line, max_edit_distance=dist, \n",
        "                                                ignore_non_words=keep_numb_words,\n",
        "                                                ignore_term_with_digits=keep_numb_words)\n",
        "        all_sugg_for_line = []\n",
        "        for suggestion in suggestions:\n",
        "            all_sugg_for_line.append(suggestion.term)\n",
        "\n",
        "        # append / sum / log results from correcting the line\n",
        "\n",
        "        corrected_list.append(' '.join(all_sugg_for_line) + \"\\n\")\n",
        "\n",
        "    # finished iterating through lines. Now join the corrected lines\n",
        "\n",
        "    corrected_doc = \"\".join(corrected_list)\n",
        "    corrected_fname = \"Corrected_SSP_\" + beautify_filename(filename, \n",
        "                                                           num_words=10, start_reverse=False) + \".txt\"\n",
        "\n",
        "    # proceed to saving\n",
        "    with open(join(filepath, corrected_fname), 'w',\n",
        "              encoding=\"utf-8\", errors='ignore') as file_out:\n",
        "        file_out.write(corrected_doc)  # corrected_doc is a single string\n",
        "\n",
        "    # report RT\n",
        "    if verbose:\n",
        "        script_rt_m = (time.time() - script_start_time) / 60\n",
        "        print(\"RT for this file was {0:5f} minutes\".format(script_rt_m))\n",
        "        print(\"output folder for this transcription is: \\n\", \n",
        "              filepath)\n",
        "\n",
        "    print(\"Done correcting \", filename, \" at time: \", \n",
        "          datetime.now().strftime(\"%H:%M:%S\"), \"\\n\")\n",
        "\n",
        "    corr_file_Data = {\n",
        "        \"corrected_ssp_text\": corrected_doc,\n",
        "        \"corrected_ssp_fname\": corrected_fname,\n",
        "        \"output_path\": filepath,\n",
        "    }\n",
        "    return corr_file_Data\n",
        "\n",
        "\n",
        "# preload defaults\n",
        "sym_spell = SymSpell(max_dictionary_edit_distance=3, prefix_length=7)\n",
        "\n",
        "dictionary_path = pkg_resources.resource_filename(\n",
        "        \"symspellpy\", \"frequency_dictionary_en_82_765.txt\")\n",
        "bigram_path = pkg_resources.resource_filename(\n",
        "        \"symspellpy\", \"frequency_bigramdictionary_en_243_342.txt\")\n",
        "# term_index is the column of the term and count_index is the\n",
        "# column of the term frequency\n",
        "sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)\n",
        "sym_spell.load_bigram_dictionary(bigram_path, term_index=0, count_index=2)\n",
        "\n",
        "print(\"loaded defaults - \", datetime.now())\n",
        "\n",
        "def symspell_freetext(textlines, dist=3, keep_numb_words=True, verbose=False,\n",
        "                      d_path=dictionary_path, b_path=bigram_path, default=sym_spell):\n",
        "    # https://github.com/mammothb/symspellpy\n",
        "\n",
        "    if dist != 3:\n",
        "\n",
        "        # have to recreate object each time because doesn't match pre-built\n",
        "\n",
        "        sym_spell = SymSpell(max_dictionary_edit_distance=dist, prefix_length=7)\n",
        "        sym_spell.load_dictionary(d_path, term_index=0, count_index=1)\n",
        "        sym_spell.load_bigram_dictionary(b_path, term_index=0, count_index=2)\n",
        "    else:\n",
        "        sym_spell=default\n",
        "\n",
        "    corrected_list = []\n",
        "\n",
        "    if isinstance(textlines, str):\n",
        "        textlines = [textlines] # put in a list if a string\n",
        "\n",
        "    if verbose:\n",
        "        print(\"\\nStarting to correct text with {0:6d} lines \".format(len(textlines)))\n",
        "        print(\"the type of textlines var is \",type(textlines))\n",
        "\n",
        "    # iterate through list of lines. Pass each line to be corrected. Append / sum results from each line till done\n",
        "    for line_idx, line_obj in enumerate(textlines):\n",
        "        line = ''.join(line_obj)\n",
        "        if verbose:\n",
        "            print(\"line {} in the text is: \".format(line_idx))\n",
        "            pp.pprint(line)\n",
        "        if not line.strip():\n",
        "            # blank or whitespace-only line, skip to next run\n",
        "            continue\n",
        "\n",
        "        suggestions = sym_spell.lookup_compound(phrase=line, max_edit_distance=dist, \n",
        "                                                ignore_non_words=keep_numb_words,\n",
        "                                                ignore_term_with_digits=keep_numb_words)\n",
        "        all_sugg_for_line = []\n",
        "        for suggestion in suggestions:\n",
        "            all_sugg_for_line.append(suggestion.term)\n",
        "\n",
        "        # append / sum / log results from correcting the line\n",
        "\n",
        "        corrected_list.append(' '.join(all_sugg_for_line) + \"\\n\")\n",
        "\n",
        "    # join corrected text\n",
        "\n",
        "    corrected_text = \"\".join(corrected_list)\n",
        "\n",
        "    if verbose:\n",
        "        print(\"Finished correcting w/ symspell at time: \", datetime.now(), \"\\n\")\n",
        "\n",
        "    return corrected_text\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "u1oEeuGu8V9X"
      },
      "source": [
        "neuspell\n",
        "\n",
        "- a more accurate spellchecker (but more compute-intensive)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "id": "OlEcmCfw8Lol",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "1d24c1a7-5ac6-4597-ece2-cf39a320f8ef"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias']\n",
            "- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
            "- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n"
          ]
        }
      ],
      "source": [
        "%%capture\n",
        "# START OF NEUSPELL\n",
        "\n",
        "checker = neuspell.BertChecker()\n",
        "checker.from_pretrained()\n",
        "\n",
        "\n",
        "def neuspell_freetext(textlines, verbose=False):\n",
        "\n",
        "    corrected_list = []\n",
        "\n",
        "    if isinstance(textlines, str):\n",
        "        textlines = [textlines]  # put in a list if a string\n",
        "\n",
        "    # iterate through list of lines. Pass each line to be corrected. Append / sum results from each line till done\n",
        "    for line_idx, line_obj in enumerate(textlines):\n",
        "        line = \"\".join(line_obj)\n",
        "\n",
        "        if verbose:\n",
        "            print(\"line {} in the text is: \".format(line_idx))\n",
        "            pp.pprint(line)\n",
        "        if len(line) <= 5:\n",
        "            # blank or very short line, skip to next run\n",
        "            continue\n",
        "\n",
        "        line = line.lower()\n",
        "        corrected_text = checker.correct_strings([line])\n",
        "        corrected_text_f = \" \".join(corrected_text)\n",
        "\n",
        "        corrected_list.append(corrected_text_f + \"\\n\")\n",
        "\n",
        "    # join corrected text\n",
        "\n",
        "    corrected_text = \"\".join(corrected_list)  # each line already ends with a newline\n",
        "\n",
        "    if verbose:\n",
        "        print(\"Finished correcting w/ neuspell at time: \", datetime.now(), \"\\n\")\n",
        "\n",
        "    return corrected_text"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jOOdismL8TWd"
      },
      "source": [
        "sentence boundary disambiguation"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "id": "dDlWwLD18MHo",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "e9c3c76b-e69e-4927-872d-081cf9790efd"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "def SBD_freetext(text, verbose=False):\n",
        "    # input should be STRING\n",
        "    # use pysbd to segment\n",
        "\n",
        "    if isinstance(text, list):\n",
        "        print(\n",
        "            \"Warning, input ~text~ has type {}. Will convert to str\".format(type(text))\n",
        "        )\n",
        "        text = \" \".join(text)\n",
        "\n",
        "    seg = pysbd.Segmenter(language=\"en\", clean=True)\n",
        "    sentences = seg.segment(text)\n",
        "\n",
        "    if verbose:\n",
        "        print(\n",
        "            \"input text of {} words was split into \".format(len(text.split(\" \"))),\n",
        "            len(sentences),\n",
        "            \"sentences\",\n",
        "        )\n",
        "\n",
        "    # take segments and make them sentences\n",
        "\n",
        "    capitalized = []\n",
        "    for sentence in sentences:\n",
        "        if sentence and sentence.strip():\n",
        "            # ensure that the line is not all spaces\n",
        "            first_letter = sentence[0].upper()\n",
        "            rest = sentence[1:]\n",
        "            capitalized.append(first_letter + rest)\n",
        "\n",
        "    seg_and_capital = \". \".join(capitalized)\n",
        "\n",
        "    return seg_and_capital"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "rpunct"
      ],
      "metadata": {
        "id": "Ncmh0FRY9bDk"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import torch\n",
        "from rpunct import RestorePuncts\n",
        "\n",
        "if torch.cuda.is_available():\n",
        "    punctfixr = RestorePuncts()\n",
        "else:\n",
        "    punctfixr = None # due to fn def below\n",
        "\n",
        "def repunctuate_grammar(input_text, rpunct_obj=punctfixr, verbose=False):\n",
        "    \"\"\"\n",
        "    repunctuate_grammar - uses the rpunct module to repunctuate a string after stripping all existing punctuation except apostrophes\n",
        "\n",
        "    Args:\n",
        "        input_text (str): string to be repunctuated\n",
        "        rpunct_obj (rpunct.RestorePuncts): rpunct object; defaults to punctfixr, which is None when CUDA is unavailable\n",
        "        verbose (bool, optional): whether to print the input, output, and runtime. Defaults to False.\n",
        "\n",
        "    Returns:\n",
        "        str: repunctuated string\n",
        "    \"\"\"\n",
        "    if verbose:\n",
        "        print(f\"repunctuating:\\n\\t{input_text}\")\n",
        "    # strip all punctuation on the input text, except for apostrophes\n",
        "    input_text = re.sub(r\"[^\\w\\s\\']\", \"\", input_text)\n",
        "    st = time.perf_counter()\n",
        "    ptext = rpunct_obj.punctuate(input_text, lang=\"en\")\n",
        "    rt = time.perf_counter() - st\n",
        "    if verbose:\n",
        "        print(f\"the new string is:\\n\\t{ptext}\")\n",
        "        print(\"repunctuation took {} seconds\".format(rt))\n",
        "    return ptext"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "id": "20sALwVz9cd6",
        "outputId": "4305a0b7-9c4a-4a45-c2af-ca608eac824a"
      },
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### restore grammar"
      ],
      "metadata": {
        "id": "OrMEo1pC_GS4"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "def restore_grammar(input_text, rpunct_obj=punctfixr, verbose=False):\n",
        "    \"\"\"\n",
        "    helper function that repunctuates with rpunct when CUDA is available, and otherwise falls back to SBD_freetext\n",
        "    \"\"\"\n",
        "    if torch.cuda.is_available():\n",
        "        return repunctuate_grammar(input_text, \n",
        "                                   rpunct_obj=rpunct_obj, \n",
        "                                   verbose=verbose)\n",
        "    else:\n",
        "        return SBD_freetext(input_text, verbose=verbose)\n",
        "\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "id": "N_5Xsbva_IpH",
        "outputId": "a3c80611-a06d-44ab-fac5-5d3baffd7848"
      },
      "execution_count": 19,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QeNIcsUbifDn"
      },
      "source": [
        "### pipeline"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "id": "kmvV2gFMieUw",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "eca17123-5bc6-4bb0-816a-eea29a7124a3"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "def spellcorrect_pipeline(filepath, filename, verbose=False):\n",
        "    # uses two functions (neuspell_freetext, SBD_freetext)\n",
        "    # in a pipeline\n",
        "\n",
        "    file = open(join(filepath, filename), \"r\", encoding=\"utf-8\", errors=\"ignore\")\n",
        "    textlines = file.readlines()  # return a list\n",
        "    file.close()\n",
        "\n",
        "    sc_textlines = neuspell_freetext(textlines, verbose=verbose)\n",
        "\n",
        "    loc_SC = \"spell_corrected\"\n",
        "    if not os.path.isdir(join(filepath, loc_SC)):\n",
        "        os.mkdir(\n",
        "            join(filepath, loc_SC)\n",
        "        )  # make a place to store outputs if one does not exist\n",
        "\n",
        "    sc_outname = (\n",
        "        \"NSC_\" + beautify_filename(filename, num_words=15, start_reverse=False) + \".txt\"\n",
        "    )\n",
        "\n",
        "    file_sc = open(\n",
        "        join(filepath, loc_SC, sc_outname), \"w\", encoding=\"utf-8\", errors=\"replace\"\n",
        "    )\n",
        "    file_sc.writelines(sc_textlines)\n",
        "    file_sc.close()\n",
        "    quick_sc_fixes = {\n",
        "        \" ' \": \"'\",\n",
        "    }\n",
        "    if isinstance(sc_textlines, list):\n",
        "        SBD_sc_textlines = []\n",
        "        for line in sc_textlines:\n",
        "            if isinstance(line, list):\n",
        "                # handles weird corner cases\n",
        "                line = \" \".join(line)\n",
        "\n",
        "            sentenced = restore_grammar(line, verbose=verbose)\n",
        "            for key, value in quick_sc_fixes.items():\n",
        "\n",
        "                sentenced = sentenced.replace(key, value)\n",
        "\n",
        "            SBD_sc_textlines.append(sentenced)\n",
        "    else:\n",
        "        SBD_sc_textlines = restore_grammar(sc_textlines, verbose=verbose)\n",
        "\n",
        "        for key, value in quick_sc_fixes.items():\n",
        "\n",
        "            SBD_sc_textlines = SBD_sc_textlines.replace(key, value)\n",
        "\n",
        "    # SBD_text = \" \".join(SBD_sc_textlines)\n",
        "\n",
        "    loc_SBD = \"FULLY_COMPLETE\"\n",
        "    if not os.path.isdir(join(filepath, loc_SBD)):\n",
        "        os.mkdir(\n",
        "            join(filepath, loc_SBD)\n",
        "        )  # make a place to store outputs if one does not exist\n",
        "\n",
        "    SBD_outname = (\n",
        "        \"FIN_\" + beautify_filename(filename, num_words=15, start_reverse=False) + \".txt\"\n",
        "    )\n",
        "    ncsbd_path = join(filepath, loc_SBD, SBD_outname)\n",
        "    file_sc = open(ncsbd_path, \"w\", encoding=\"utf-8\", errors=\"replace\")\n",
        "    file_sc.writelines(SBD_sc_textlines)\n",
        "    file_sc.close()\n",
        "    pipelineout = {\n",
        "        \"original_transcript_text\": \" \".join(textlines),\n",
        "        \"spellcorrected_text\": \" \".join(sc_textlines),\n",
        "        \"final_text\": \" \".join(SBD_sc_textlines),\n",
        "        \"spell_corrected_dir\": join(filepath, loc_SC),\n",
        "        \"sc_filename\": sc_outname,\n",
        "        \"SBD_dir\": join(filepath, loc_SBD),\n",
        "        \"SBD_filename\": SBD_outname,\n",
        "    }\n",
        "\n",
        "    return pipelineout"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {
        "id": "u-8DHqFG8HVZ",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "85d8b53e-89e1-440d-8db6-f550de9ce394"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "loaded all spell correction functions at:  2022-02-05 15:02:54.962984\n"
          ]
        }
      ],
      "source": [
        "print(\"loaded all spell correction functions at: \", datetime.now())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "X-ZxcOJeCB6I"
      },
      "source": [
        "## vid2cleantext specific\n",
        "\n",
        "things that are more or less unique to video conversion / audio transcription."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DHiS5c94_VCB"
      },
      "source": [
        "### convert media"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "cellView": "form",
        "id": "lOmw4BYZ7ehH",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "0e6faa95-55f2-4de6-9185-d882786cfed3"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "!pip install -U -q pydub\n",
        "from natsort import natsorted\n",
        "from pydub import AudioSegment\n",
        "\n",
        "#@title new function `prep_transc_pydub`\n",
        "def prep_transc_pydub(\n",
        "    _vid2beconv,\n",
        "    in_dir,\n",
        "    out_dir,\n",
        "    len_chunks=15,\n",
        "    verbose=False,\n",
        "):\n",
        "    \"\"\"\n",
        "    prep_transc_pydub - prepares audio files for transcription using pydub\n",
        "\n",
        "    Parameters\n",
        "    ----------\n",
        "    _vid2beconv : str, the name of the video file to be converted\n",
        "    in_dir : str or Path, the path to the video file directory\n",
        "    out_dir : str or Path, the path to the output audio file directory\n",
        "    len_chunks : int, optional, by default 15, the length of the audio chunks in seconds\n",
        "    verbose : bool, optional, by default False\n",
        "        [description], by default False\n",
        "\n",
        "    Returns\n",
        "    -------\n",
        "    list, the list of audio filepaths created\n",
        "    \"\"\"\n",
        "\n",
        "    load_path = join(in_dir, _vid2beconv) if in_dir is not None else _vid2beconv\n",
        "    vid_audio = AudioSegment.from_file(load_path)\n",
        "    sound = AudioSegment.set_channels(vid_audio, 1)\n",
        "\n",
        "    create_folder(out_dir)  # create the output directory if it doesn't exist\n",
        "    dur_seconds = len(sound) / 1000\n",
        "    n_chunks = math.ceil(dur_seconds / len_chunks)  # to get in minutes, round up\n",
        "    preamble = shorten_title(_vid2beconv)\n",
        "    chunk_fnames = []\n",
        "    # split sound in 5-second slices and export\n",
        "    slicer = 1000 * len_chunks  # in milliseconds\n",
        "    st = time.perf_counter()\n",
        "    for i, chunk in enumerate(sound[::slicer]):\n",
        "        chunk_name = f\"{preamble}_clipaudio_{i}.wav\"\n",
        "        with open(join(out_dir, chunk_name), \"wb\") as f:\n",
        "            chunk.export(f, format=\"wav\")\n",
        "        chunk_fnames.append(chunk_name)\n",
        "    rt = round(time.perf_counter() -st, 5)\n",
        "    print(f\"\\ncreated audio chunks in {rt} seconds - {get_timestamp()}\")\n",
        "    if verbose:\n",
        "        print(f\" files saved to {out_dir}\")\n",
        "\n",
        "    return natsorted(chunk_fnames)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-BjICySo_SKd"
      },
      "source": [
        "### transcribe (main)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "id": "0OsO-NNt_Qea",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "ae83ce54-88e3-4432-8de6-72e813b0a105"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "import warnings \n",
        "warnings.filterwarnings(\"ignore\", message=\"It is strongly recommended to pass the\")\n",
        "import transformers\n",
        "transformers.utils.logging.set_verbosity(40)\n",
        "def transcribe_wav2vec(\n",
        "    transcription_model, directory, vid_clip_name, \n",
        "    chunk_length_seconds, verbose=False\n",
        "):\n",
        "    # this is the same process as used in the single video transcription, now as a function. Note that spell correction\n",
        "    # and keyword extraction are now done separately in the script\n",
        "    # user needs to pass in: the model, the folder the video is in, and the name of the video\n",
        "    output_path_full = directory\n",
        "\n",
        "    # Split Video into Audio Chunks-----------------------------------------------\n",
        "\n",
        "    print(\"Starting file: \", vid_clip_name)\n",
        "\n",
        "    # create audio chunk folder\n",
        "    output_folder_name = \"audio_chunks\"\n",
        "    path2audiochunks = join(directory, output_folder_name)\n",
        "    os.makedirs(path2audiochunks, exist_ok=True)\n",
        "    chunk_directory = prep_transc_pydub(    \n",
        "                                        vid_clip_name,\n",
        "                                        in_dir=directory,\n",
        "                                        out_dir=path2audiochunks,\n",
        "                                        len_chunks=chunk_length_seconds,\n",
        "                                        verbose=verbose,\n",
        "                                    )\n",
        "   \n",
        "\n",
        "    if verbose:\n",
        "        print(\n",
        "            \"converted video to audio. About to start transcription loop for file: \",\n",
        "            vid_clip_name,\n",
        "        )\n",
        "    torch_validate_cuda()\n",
        "    check_runhardware()\n",
        "    full_transcription = []\n",
        "    before_loop_st = time.time()\n",
        "    GPU_update_incr = math.ceil(len(chunk_directory) / 2)\n",
        "    _need_update = True\n",
        "    # Load audio chunks by name, pass into model, append output text-----------------------------------------------\n",
        "\n",
        "    for audio_chunk in tqdm(\n",
        "        chunk_directory,\n",
        "        total=len(chunk_directory),\n",
        "        desc= f\"transcribing {shorten_title(vid_clip_name)}:\\t\",\n",
        "    ):\n",
        "\n",
        "\n",
        "        current_loc = chunk_directory.index(audio_chunk)\n",
        "\n",
        "        if (current_loc % GPU_update_incr == 0) and (GPU_update_incr != 0) and _need_update:\n",
        "            # provide update on GPU usage\n",
        "            check_runhardware()\n",
        "            _need_update = False\n",
        "\n",
        "        # load dat chunk\n",
        "        audio_input, rate = librosa.load(\n",
        "                                        join(path2audiochunks,\n",
        "                                            audio_chunk),\n",
        "                                        sr=16000\n",
        "                                    )\n",
        "        # MODEL\n",
        "        device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n",
        "        input_values = tokenizer(\n",
        "                    audio_input, return_tensors=\"pt\", padding=\"longest\", \n",
        "                ).input_values.to(\n",
        "                    device\n",
        "            )\n",
        "        transcription_model = transcription_model.to(device)\n",
        "        logits = transcription_model(input_values).logits\n",
        "        predicted_ids = torch.argmax(logits, dim=-1)\n",
        "        transcription = str(tokenizer.batch_decode(predicted_ids)[0])\n",
        "        full_transcription.append(transcription + \"\\n\")\n",
        "        # empty memory so you don't overload the GPU\n",
        "        del input_values, logits, predicted_ids, audio_input\n",
        "        torch.cuda.empty_cache()\n",
        "\n",
        "    if verbose:\n",
        "        print(\"\\nFinished transc. of {}, saving metrics \".format(vid_clip_name))\n",
        "\n",
        "    # build metadata log -------------------------------------------------\n",
        "    mdata = []\n",
        "    mdata.append(\"original file name: \" + vid_clip_name + \"\\n\")\n",
        "    mdata.append(\n",
        "        \"number of recorded audio chunks: \"\n",
        "        + str(len(chunk_directory))\n",
        "        + \" of lengths seconds each\"\n",
        "        + str(chunk_length_seconds)\n",
        "        + \"\\n\"\n",
        "    )\n",
        "    approx_input_len = (len(chunk_directory) * chunk_length_seconds) / 60\n",
        "    mdata.append(\n",
        "        \"approx {0:3f}\".format(approx_input_len) + \" minutes of input audio \\n\"\n",
        "    )\n",
        "    mdata.append(\n",
        "        \"transcription date: \"\n",
        "        + datetime.now().strftime(\"date_%d_%m_%Y_time_%H-%M-%S\")\n",
        "        + \"\\n\"\n",
        "    )\n",
        "    full_text = \" \".join(full_transcription)\n",
        "    transcript_length = len(full_text)\n",
        "    mdata.append(\n",
        "        \"length of transcribed text: \" + str(transcript_length) + \" characters \\n\"\n",
        "    )\n",
        "    t_word_count = len(full_text.split(\" \"))\n",
        "    mdata.append(\n",
        "        \"total word count: \" + str(t_word_count) + \" words (based on spaces) \\n\"\n",
        "    )\n",
        "\n",
        "    # delete audio chunks in folder -------------------------------------------------\n",
        "    try:\n",
        "        shutil.rmtree(path2audiochunks)\n",
        "        if verbose:\n",
        "            print(\"\\nDeleted Audio Chunk Folder + Files\")\n",
        "    except:\n",
        "        print(\"warning - could not delete the audio chunk folder on VM\")\n",
        "    # compile results -------------------------------------------------\n",
        "    transcription_results = {\n",
        "        \"audio_transcription\": full_transcription,\n",
        "        \"metadata\": mdata,\n",
        "    }\n",
        "\n",
        "    return transcription_results"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {
        "id": "cDWUIQodL85m",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "17a070e5-5c8e-455f-a2c6-809013d14475"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "def save_transcript_outputs(fileheader, file_id, output_dict):\n",
        "\n",
        "    full_transcription = output_dict.get(\"audio_transcription\")\n",
        "    metadata = output_dict.get(\"metadata\")\n",
        "\n",
        "    # check if directories for output exist. If not, create them\n",
        "    storage_locs = validate_output_directories(directory)\n",
        "    output_path_transcript = storage_locs.get(\"t_out\")\n",
        "    output_path_metadata = storage_locs.get(\"m_out\")\n",
        "\n",
        "    transcribed_filename = fileheader + \"_transcription_\" + file_id + \".txt\"\n",
        "    transcribed_file = open(\n",
        "        join(output_path_transcript, transcribed_filename),\n",
        "        \"w\",\n",
        "        encoding=\"utf-8\",\n",
        "        errors=\"ignore\",\n",
        "    )\n",
        "    transcribed_file.writelines(full_transcription)\n",
        "    transcribed_file.close()\n",
        "    # metadata\n",
        "    metadata_filename = (\n",
        "        \"metadata - \" + fileheader + \"_transcription_\" + file_id + \".txt\"\n",
        "    )\n",
        "    metadata_file = open(\n",
        "        join(output_path_metadata, metadata_filename),\n",
        "        \"w\",\n",
        "        encoding=\"utf-8\",\n",
        "        errors=\"ignore\",\n",
        "    )\n",
        "    metadata_file.writelines(metadata)\n",
        "    metadata_file.close()\n",
        "\n",
        "    print(\"saved outputs for file ID {} - \".format(file_id), datetime.now())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TVc71e0R9fa-"
      },
      "source": [
        "### keyword extraction"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 25,
      "metadata": {
        "id": "78rJKtvP9d-T",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 0
        },
        "outputId": "407d89a7-7304-419f-86ef-ce813e0f3d0d"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "def quick_keys(\n",
        "    filename, filepath, max_ngrams=3, num_keywords=20, save_db=False, verbose=False\n",
        "):\n",
        "    # uses YAKE to quickly determine keywords in a text file. Saves Keywords and YAKE score (0 means very important) in\n",
        "    # an excel file (from a dataframe)\n",
        "    # yes, the double entendre is intended.\n",
        "    file = open(join(filepath, filename), \"r\", encoding=\"utf-8\", errors=\"ignore\")\n",
        "    text = file.read()\n",
        "    file.close()\n",
        "\n",
        "    language = \"en\"\n",
        "    deduplication_threshold = 0.6  # technically a hyperparameter\n",
        "    custom_kw_extractor = yake.KeywordExtractor(\n",
        "        lan=language,\n",
        "        n=max_ngrams,\n",
        "        dedupLim=deduplication_threshold,\n",
        "        top=num_keywords,\n",
        "        features=None,\n",
        "    )\n",
        "    yake_keywords = custom_kw_extractor.extract_keywords(text)\n",
        "    phrase_db = pd.DataFrame(yake_keywords)\n",
        "    if verbose:\n",
        "        print(\"YAKE keywords are: \\n\", yake_keywords)\n",
        "        print(\"dataframe looks like: \\n\")\n",
        "        pp.pprint(phrase_db.head())\n",
        "\n",
        "    if len(phrase_db) == 0:\n",
        "        print(\"warning - no phrases were able to be extracted... \")\n",
        "        return None\n",
        "\n",
        "    phrase_db.columns = [\"key_phrase\", \"YAKE_score\"]\n",
        "\n",
        "    # add a column for how many words the phrases contain\n",
        "    yake_kw_len = []\n",
        "    yake_kw_freq = []\n",
        "    for entry in yake_keywords:\n",
        "        entry_wordcount = len(str(entry).split(\" \")) - 1\n",
        "        yake_kw_len.append(entry_wordcount)\n",
        "\n",
        "    for index, row in phrase_db.iterrows():\n",
        "        search_term = row[\"key_phrase\"]\n",
        "        entry_freq = text.count(str(search_term))\n",
        "        yake_kw_freq.append(entry_freq)\n",
        "\n",
        "    word_len_series = pd.Series(yake_kw_len, name=\"No. Words in Phrase\")\n",
        "    word_freq_series = pd.Series(yake_kw_freq, name=\"Phrase Freq. in Text\")\n",
        "    phrase_db2 = pd.concat([phrase_db, word_len_series, word_freq_series], axis=1)\n",
        "    # add column names and save file as excel because CSVs suck\n",
        "    phrase_db2.columns = [\n",
        "        \"key_phrase\",\n",
        "        \"YAKE Score (Lower = More Important)\",\n",
        "        \"num_words\",\n",
        "        \"freq_in_text\",\n",
        "    ]\n",
        "\n",
        "    if save_db:\n",
        "        # saves individual file if user asks\n",
        "        yake_fname = (\n",
        "            beautify_filename(filename=filename, start_reverse=False)\n",
        "            + \"_top_phrases_YAKE.xlsx\"\n",
        "        )\n",
        "        phrase_db2.to_excel(join(filepath, yake_fname), index=False)\n",
        "\n",
        "    # print out top 10 keywords, or if desired num keywords less than 10, all of them\n",
        "    max_no_disp = 10\n",
        "    if num_keywords > max_no_disp:\n",
        "        num_phrases_disp = max_no_disp\n",
        "    else:\n",
        "        num_phrases_disp = num_keywords\n",
        "\n",
        "    if verbose:\n",
        "        print(\"Top Key Phrases from YAKE, with max n-gram length: \", max_ngrams, \"\\n\")\n",
        "        pp.pprint(phrase_db2.head(n=num_phrases_disp))\n",
        "    else:\n",
        "        list_o_words = phrase_db2[\"key_phrase\"].to_list()\n",
        "        print(\"top 5 phrases are: \\n\")\n",
        "        if len(list_o_words) < 5:\n",
        "            pp.pprint(list_o_words)\n",
        "        else:\n",
        "            pp.pprint(list_o_words[:5])\n",
        "\n",
        "    return phrase_db2"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hZjjohmBr7Oa"
      },
      "source": [
        "# Specify Key Parameters"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ph6f5Bo0Yt3q"
      },
      "source": [
        "## Load & Validate Source Files"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 26,
      "metadata": {
        "cellView": "form",
        "id": "pusCMJhVpLD7",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 53
        },
        "outputId": "86dd4efd-24e4-4867-8c03-3ea5ea69ebc1"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "currently allowed media types are:\n",
            "['.mp4', '.mov', '.mkv', '.mp3', '.wav', '.ogg', '.m4a']\n"
          ]
        }
      ],
      "source": [
        "from pathlib import Path\n",
        "#@markdown define allowed extension types and a function to validate if media present in user folder\n",
        "\n",
        "extensions = [\".mp4\",\".mov\",\".mkv\",\n",
        "                  \".mp3\",\".wav\",\".ogg\", \".m4a\",\n",
        "                  ]\n",
        "def is_media_empty(dir):\n",
        "    _dir = Path(dir)\n",
        "    dirfiles = [f for f in _dir.iterdir() if f.is_file()]\n",
        "    if len(dirfiles) < 1: return True\n",
        "\n",
        "    for df in dirfiles:\n",
        "        this_ext = df.suffix\n",
        "        if any([m in this_ext for m in extensions]):\n",
        "            return False\n",
        "    return True\n",
        "\n",
        "print(f'currently allowed media types are:\\n{extensions}')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 27,
      "metadata": {
        "cellView": "form",
        "id": "9xPJyuR6XVFc",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 124
        },
        "outputId": "e2130b97-c5d2-4bbd-84d3-aae0333658f1"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Found extension: mp4\n",
            "File Borat%20Cultural%20Learnings%20of%20America%20for%20Make%20Benefit%20G-2.mp4 already exists. Skipping download.\n",
            "Will transcribe the following: \n",
            "\n",
            "['/content/Borat%20Cultural%20Learnings%20of%20America%20for%20Make%20Benefit%20G-2.mp4']\n"
          ]
        }
      ],
      "source": [
        "#@title load media files \n",
        "#@markdown allowed extensions defined in cell above\n",
        "# iterate through and grab files:\n",
        "import requests\n",
        "files_to_munch = []\n",
        "_target_file = download_single_file(\n",
        "    URL_of_media, \n",
        "    verbose=True,\n",
        ")\n",
        "files_to_munch.append(_target_file)\n",
        "total_files_1 = len(files_to_munch)\n",
        "removed_count_1 = 0\n",
        "approved_files = []\n",
        "# remove non-media files\n",
        "for prefile in files_to_munch:\n",
        "    if any([m in prefile for m in extensions]):\n",
        "        approved_files.append(prefile)\n",
        "    else:\n",
        "        files_to_munch.remove(prefile)\n",
        "        removed_count_1 += 1\n",
        "\n",
        "print(\"Will transcribe the following: \\n\")\n",
        "pp.pprint(approved_files)"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Original video can be viewed [here](https://www.c-span.org/classroom/document/?7986)"
      ],
      "metadata": {
        "id": "lQHuf3zEDn7D"
      }
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bV3ov-0Z_PXf"
      },
      "source": [
        "## Load wav2vec2 model (#3)\n",
        "\n",
        "- <font color = \"orange\">enter chunk length & choose a model. defaults should work fine for most cases \n",
        "- with recent upgrades made on the function to convert media to `.wav` audio chunks there is no \"runtime penalty\" of a smaller chunk length as before. Just keep in mind as the number decreases **too low** then the model won't have the relevant auditory context to determine what is being said effectively.\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {
        "cellView": "form",
        "id": "yTmbBo39ZUkI",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 17
        },
        "outputId": "f06e2cb7-087c-4c06-fb27-4c1322e5a7b5"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        }
      ],
      "source": [
        "chunk_length =   15# @param {type:\"integer\"}\n",
        "_model = \"facebook/hubert-xlarge-ls960-ft\" #@param [\"facebook/wav2vec2-large-960h-lv60-self\", \"facebook/hubert-large-ls960-ft\", \"facebook/hubert-xlarge-ls960-ft\", \"facebook/wav2vec2-base-960h\"] {allow-input: true}\n",
        "#@markdown - model name is the tag on huggingface.co\n",
        "#@markdown - demo output uses `facebook/hubert-xlarge-ls960-ft` but this is pretty GPU intensive; try `large` first.\n",
        "#@markdown - if experiencing GPU memory usage issues, try `facebook/wav2vec2-base-960h`"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 29,
      "metadata": {
        "cellView": "form",
        "id": "k_gAUG_9aK-d",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 298,
          "referenced_widgets": [
            "31f98d21aaed4a87a4ccd63bdbf01ead",
            "09ece41f2d6a4743aaecf5517bed2ea1",
            "3467f71380c34575bec160ebfe823238",
            "206a23908ad54504b31f826b37314dea",
            "9cfaf034ca424fe39090d1b37b74ed82",
            "76ab8c2b27434d8d83d47668a69d5976",
            "92bbd54f7ce3400192004bf58d9bdfff",
            "72399605b0ca46e992eecaaa2a9550ec",
            "b792901613a94125b501e22dd5f75628",
            "da485b1f80804186bb591c13d79aa247",
            "7e67132b68ce453ab4b8834c6e07c428",
            "4fe9b0d2a6c64c0595c44426b5f39837",
            "49e34a76d39d42e29c9f43773858ebcd",
            "03b4542be9c14e93855c2ec29e49ac1e",
            "c2ed413a6d0b4e3aa6355cfbdae0b671",
            "770da7de020b43b099c6dd2e0ec672de",
            "6b870d609c1e4858beb8c10bce86b326",
            "9af5988a93a3439c828011ec6776bbec",
            "4eb6de456cab454582018b0737cb1037",
            "127113fa48ec425ea61f2ce5336a6304",
            "f871f9e4240a4d24a6c2aff0b271d899",
            "0142e3dbe0094966a293f21278e9fdec",
            "165bdf8275f948529e1064426d3c452c",
            "9335982b07694e008df39007e677ce61",
            "223f754aabc144caaf561448da376408",
            "2108c2ac1d8f4c9b87d3a45aa77174e4",
            "0e4ff28bdc25466e90376086424dde9a",
            "74eb167f5fab41419655b5dfc00c95ba",
            "7bc10533dec647499216a18f4ff26fb1",
            "48ea302cad404bb384a28f0e8f0c9c50",
            "6468a57a5a57402a8bb1f332e476852d",
            "b916ea576bf048b68dc88bfb8a63dab0",
            "73b406b0c74b4f59962e96f438d56422",
            "878f1fa01d534c4791aa578066d14c54",
            "0d9839362fa54ee6af35cfc82f7623e3",
            "9afb42e9e4d14cbdbeda0cad2b5262fe",
            "77c9354d0c5c4cd2ae2e887e386405d2",
            "31e9de950ea84d6b9f03525f5711ae27",
            "02e957ea74674c62be68be9191b831f4",
            "14b6eb78e78d431789b3dfb0db643c51",
            "0dcb882e268541c588a85d4e55a816db",
            "c19992212e9145a9b8c2d7c79771a202",
            "4cc37056b58c4d8ebcd1f6df32e02c38",
            "3d51b3ec19da416ca3d0ab3cba72a95c",
            "665e535c7f6a4d64a3611de1385e452a",
            "a614dca8ab6a43b5b66ed0d34a5f139b",
            "258ca760305f4e49925ff3dc7547b131",
            "2421b997a20741bd8293ae45dbbd15fc",
            "2954a506d2a34bab82206a86b46fc797",
            "008fbc8313e54087b7c66feee4ee5a62",
            "8a83e460cd88418abc7c9cec7c18bee7",
            "ada316855dc9488d870209945179b864",
            "533285b01cd54a57ac9c0e00d784b959",
            "7d75f7a490d5465482b7bc8f45cf89fc",
            "256be28c5fe147969ed8fa29838d5c28",
            "d81e5dc024bf402588dfb0b369788337",
            "3631c9c75b6841b3937e3d0664787892",
            "5aac2516a9d14e028370dca43063fae6",
            "515065138b1d4ba29a777eced14720d1",
            "5fc9e7f3ee6d4d1fb5fd26239810f432",
            "96c10e1c32eb4be4a062cc59bcc4a52b",
            "e38d31ab8ced4717b783a4ed536533aa",
            "2651917c8caa41deb59b391026303884",
            "c5b0a68ebd8d4f66a80b51e06d6c7d45",
            "ac6841c42dc24fac94ddf2e42dae5358",
            "8867dfc07002499ca5a7fa11a6cd5c4b"
          ]
        },
        "outputId": "be971da5-1cbd-4ee4-e5cd-0f950fb0591c"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "Preparing to load model: facebook/hubert-xlarge-ls960-ft\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "31f98d21aaed4a87a4ccd63bdbf01ead",
              "version_minor": 0,
              "version_major": 2
            },
            "text/plain": [
              "Downloading:   0%|          | 0.00/212 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "4fe9b0d2a6c64c0595c44426b5f39837",
              "version_minor": 0,
              "version_major": 2
            },
            "text/plain": [
              "Downloading:   0%|          | 0.00/138 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "165bdf8275f948529e1064426d3c452c",
              "version_minor": 0,
              "version_major": 2
            },
            "text/plain": [
              "Downloading:   0%|          | 0.00/1.45k [00:00<?, ?B/s]"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "878f1fa01d534c4791aa578066d14c54",
              "version_minor": 0,
              "version_major": 2
            },
            "text/plain": [
              "Downloading:   0%|          | 0.00/292 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "665e535c7f6a4d64a3611de1385e452a",
              "version_minor": 0,
              "version_major": 2
            },
            "text/plain": [
              "Downloading:   0%|          | 0.00/85.0 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "d81e5dc024bf402588dfb0b369788337",
              "version_minor": 0,
              "version_major": 2
            },
            "text/plain": [
              "Downloading:   0%|          | 0.00/3.59G [00:00<?, ?B/s]"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "loaded the following model: facebook/hubert-xlarge-ls960-ft  at  2022-02-05 15:04:24.049677\n",
            "CPU times: user 1min 9s, sys: 11 s, total: 1min 20s\n",
            "Wall time: 1min 25s\n"
          ]
        }
      ],
      "source": [
        "#@title load model from huggingface hub\n",
        "%%time\n",
        "from transformers import (\n",
        "    Wav2Vec2ForMaskedLM, \n",
        "    Wav2Vec2CTCTokenizer, \n",
        "    Wav2Vec2Processor,\n",
        "    HubertForCTC,\n",
        ")\n",
        "\n",
        "gpu_mem = round(gpu_mem_total() / 1024, 2)\n",
        "\n",
        "if gpu_mem < 15 and chunk_length > 20:\n",
        "    print(\"GPU memory of {} is too low.. setting chunk length to 20\".format(gpu_mem))\n",
        "    chunk_length = 20  # automatically adjust down to avoid issues\n",
        "\n",
        "\n",
        "print(\"\\nPreparing to load model: \" + _model)\n",
        "tokenizer = Wav2Vec2Processor.from_pretrained(_model)\n",
        "\n",
        "if \"hubert\" in _model.lower():\n",
        "    model = HubertForCTC.from_pretrained(_model,\n",
        "                                         gradient_checkpointing=True,\n",
        "                                         low_cpu_mem_usage=False, # set to true if issues\n",
        "                                    )\n",
        "else:\n",
        "    model = Wav2Vec2ForCTC.from_pretrained(_model)\n",
        "# (in seconds) if model fails to work or errors out (and there isn't some other\n",
        "# obvious error, reduce this number. 20-25 is a good start.\n",
        "print(\"loaded the following model:\", _model, \" at \", datetime.now())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2P4LaEB-ecL3"
      },
      "source": [
        "---"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zybapeYW_4uw"
      },
      "source": [
        "# Run Transformers Model "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 30,
      "metadata": {
        "cellView": "form",
        "id": "PigSuPnG_qP6",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 88
        },
        "outputId": "0af7d2c7-e527-40ca-b386-5bd8a1748aa8"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "Gen RAM Free: 48.0 GB  | Proc size: 10.3 GB  | 8 CPUs  loaded at 19.1 % |\n",
            "GPU RAM Free: 16158MB | Used: 2MB | Util   0% | Total 16160MB\n",
            "\n"
          ]
        }
      ],
      "source": [
        "import gc\n",
        "#@markdown initial check - hardware\n",
        "gc.collect()\n",
        "check_runhardware()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 31,
      "metadata": {
        "cellView": "form",
        "collapsed": true,
        "id": "7EkpWuk3bmpP",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 313,
          "referenced_widgets": [
            "340c14d11c2d4d34842d6141d665b7be",
            "eb5bec651ba14e9880dc130a9e7a7d74",
            "6bcb31cf856449a78a2581ba003de6e5",
            "bc9068b4129d4951b2eab47ce3537895",
            "78344bd4fe31446f9587939877eda929",
            "9aa182af520d4ae6905861ff9b9328fa",
            "8b22d9eb337e4c988b2c8ff4320be5a2",
            "a93fd56343754ff8992b90cadecf7e20",
            "495deefc6f4e4cdea6acc148a8f445b8",
            "a41e102af7b84ca1a91c3284ee618393",
            "8c28eec252324950ac0a8306b84764e4",
            "a3cbe27c93e942d7902f9f2c7cfb7fc3",
            "06f724939ec241afb5d6d2d4a047d0fc",
            "c8ab64b0c8c34881ac77c28f0a1b548e",
            "6f333b1f92a44532a5e7c9afe3d5a56c",
            "44d7ef781f974853873e60b28a8ccaf4",
            "7062a93bea06440792ca2192908a1f75",
            "eed4f35ba61a4a74aa19390d9cef813f",
            "c6a0abc706ef41dc8315a8ef1d41a903",
            "a142afad3c21410094e463939bf346b6",
            "0fd569696a4b46ce9836c348c686f456",
            "a796474b311648e48ceb0e9174841608"
          ]
        },
        "outputId": "b31bbab9-2b78-4849-b025-7fe9fce89d6a"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "340c14d11c2d4d34842d6141d665b7be",
              "version_minor": 0,
              "version_major": 2
            },
            "text/plain": [
              "Main Proc: \t:   0%|          | 0/1 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Starting file:  /content/Borat%20Cultural%20Learnings%20of%20America%20for%20Make%20Benefit%20G-2.mp4\n",
            "\n",
            "created audio chunks in 0.63891 seconds - Feb-05-2022_-15\n",
            "\n",
            "Gen RAM Free: 47.6 GB  | Proc size: 10.3 GB  | 8 CPUs  loaded at 37.4 % |\n",
            "GPU RAM Free: 16158MB | Used: 2MB | Util   0% | Total 16160MB\n",
            "\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "a3cbe27c93e942d7902f9f2c7cfb7fc3",
              "version_minor": 0,
              "version_major": 2
            },
            "text/plain": [
              "transcribing /content/Borat%20Cul...:\t:   0%|          | 0/336 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "Gen RAM Free: 47.6 GB  | Proc size: 10.3 GB  | 8 CPUs  loaded at 21.7 % |\n",
            "GPU RAM Free: 16158MB | Used: 2MB | Util   0% | Total 16160MB\n",
            "\n",
            "saved outputs for file ID 1 -  2022-02-05 15:10:21.196460\n"
          ]
        }
      ],
      "source": [
        "#@title Transcription Loop\n",
        "#@markdown - here's where the transformer model is applied.\n",
        "#@markdown - transcription speed depends on a lot of things, most notably what \n",
        "#@markdown GPU the runtime was assigned (check at the top of notebook)\n",
        "\n",
        "\n",
        "if use_url:\n",
        "    vid_src_folder = URL_save_folder\n",
        "else:\n",
        "    vid_src_folder = directory\n",
        "\n",
        "storage_locs = validate_output_directories(directory)\n",
        "output_path_transcript = storage_locs.get(\"t_out\")\n",
        "output_path_metadata = storage_locs.get(\"m_out\")\n",
        "\n",
        "for filename in tqdm(\n",
        "    approved_files, \n",
        "    total=len(approved_files), \n",
        "    desc=\"Main Proc: \\t\"\n",
        "):\n",
        "\n",
        "    t_results = transcribe_wav2vec(\n",
        "        transcription_model=model,\n",
        "        directory=vid_src_folder,\n",
        "        vid_clip_name=filename,\n",
        "        chunk_length_seconds=chunk_length,\n",
        "    )\n",
        "    # t_results is a dictonary containing the transcript and associated metadata\n",
        "    # label and store this transcription\n",
        "    vid_preamble = beautify_filename(\n",
        "        filename, num_words=30, start_reverse=False\n",
        "    )  # gets a nice phrase from filename\n",
        "    # transcription\n",
        "    ID = str(1 + approved_files.index(filename))\n",
        "    save_transcript_outputs(vid_preamble, ID, t_results)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZpwGP9EEVWFX"
      },
      "source": [
        "---"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KAlsOjR8AD7R"
      },
      "source": [
        "# Post Model Processing\n",
        "\n",
        "If you got to here, your colab file was able to run the model and transcribe it. Now a little cleaning up, then done."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 32,
      "metadata": {
        "cellView": "form",
        "id": "5ECdmK3tjP4L",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 73
        },
        "outputId": "2e77d761-a9a6-4d6e-85aa-c2f30b1b234a"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "out of   1 file(s) originally in the folder,    0 non-txt files were removed\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['content_borat_20_cultural_20_learning_s_20_of_20_america_20_for_20_make_20_benefit_20_g_2_v_2_c_transcription_1.txt']"
            ]
          },
          "metadata": {},
          "execution_count": 32
        }
      ],
      "source": [
        "#@title Validate text files to spell-check\n",
        "#@markdown reload everything from the text directory in case of changes\n",
        "# first, you need to go through the output directory of transcripts and make sure that all those files are gucci\n",
        "transcripts_to_munch = natsorted(\n",
        "    [\n",
        "        f\n",
        "        for f in listdir(output_path_transcript)\n",
        "        if isfile(join(output_path_transcript, f))\n",
        "    ]\n",
        ")\n",
        "t_files = len(transcripts_to_munch)\n",
        "removed_count_t = 0\n",
        "approved_txt_files = []\n",
        "# remove non-.txt files\n",
        "for tfile in transcripts_to_munch:\n",
        "    if tfile.endswith(\".txt\"):\n",
        "        approved_txt_files.append(tfile)\n",
        "    else:\n",
        "        transcripts_to_munch.remove(tfile)\n",
        "        removed_count_t += 1\n",
        "\n",
        "print(\n",
        "    \"out of {0:3d} file(s) originally in the folder, \".format(t_files),\n",
        "    \"{0:3d} non-txt files were removed\".format(removed_count_t),\n",
        ")\n",
        "\n",
        "approved_txt_files"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 33,
      "metadata": {
        "id": "9VVqBJnxdb7Y",
        "cellView": "form",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 265,
          "referenced_widgets": [
            "75aa4b6a3ea54c3d988844c3273de063",
            "444ce3a2deab41ce8b66c9159523e681",
            "f4fcb28aa63d44dd85098aa010296637",
            "9684e2e83c9f4197849004e9fb70a643",
            "1fa74bf1f8074a76a81cb8867b8e35ee",
            "a5a14bce22e845b0a6db048fc1b5c7c6",
            "e4cd028b52fc4c0fa821bb67e0e0ae6d",
            "7cbae2ece5b34951b86a01cd77cf7c51",
            "1f364835b62c46e3a58141d21230f89a",
            "77331dbc2ce047c99ba41ac8272fe394",
            "f8dc12262bc44094898d0eeffda14386"
          ]
        },
        "outputId": "93782af6-94b6-4d24-e4e5-4cf9b3c76343"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "75aa4b6a3ea54c3d988844c3273de063",
              "version_minor": 0,
              "version_major": 2
            },
            "text/plain": [
              "spellcorrect_pipeline on transcriptions:   0%|          | 0/1 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "Starting file 1 of  1  |  content_borat_20_cultural_20_learning_s_20_of_20_america_20_for_20_make_20_benefit_20_g_2_v_2_c_transcription_1.txt\n",
            "top 5 phrases are: \n",
            "\n",
            "['don', 'guzaks don nice', 'make', 'time', 'nice']\n",
            "completed keywords\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "application/javascript": [
              "\n",
              "    async function download(id, filename, size) {\n",
              "      if (!google.colab.kernel.accessAllowed) {\n",
              "        return;\n",
              "      }\n",
              "      const div = document.createElement('div');\n",
              "      const label = document.createElement('label');\n",
              "      label.textContent = `Downloading \"${filename}\": `;\n",
              "      div.appendChild(label);\n",
              "      const progress = document.createElement('progress');\n",
              "      progress.max = size;\n",
              "      div.appendChild(progress);\n",
              "      document.body.appendChild(div);\n",
              "\n",
              "      const buffers = [];\n",
              "      let downloaded = 0;\n",
              "\n",
              "      const channel = await google.colab.kernel.comms.open(id);\n",
              "      // Send a message to notify the kernel that we're ready.\n",
              "      channel.send({})\n",
              "\n",
              "      for await (const message of channel.messages) {\n",
              "        // Send a message to notify the kernel that we're ready.\n",
              "        channel.send({})\n",
              "        if (message.buffers) {\n",
              "          for (const buffer of message.buffers) {\n",
              "            buffers.push(buffer);\n",
              "            downloaded += buffer.byteLength;\n",
              "            progress.value = downloaded;\n",
              "          }\n",
              "        }\n",
              "      }\n",
              "      const blob = new Blob(buffers, {type: 'application/binary'});\n",
              "      const a = document.createElement('a');\n",
              "      a.href = window.URL.createObjectURL(blob);\n",
              "      a.download = filename;\n",
              "      div.appendChild(a);\n",
              "      a.click();\n",
              "      div.remove();\n",
              "    }\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/javascript": [
              "download(\"download_92677fae-496a-4623-8d55-00fd0def9044\", \"YAKE - all keywords - vid2cleantxt_demo - 05022022.csv\", 1012)"
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Transcription files used to extract KW can be found in: \n",
            "  /content/vid2cleantxt_demo/wav2vec2_sf_transcript/spell_corrected\n",
            "A file with keyword results is in /content/vid2cleantxt_demo/wav2vec2_sf_transcript \n",
            "titled YAKE - all keywords - vid2cleantxt_demo - 05022022.csv\n"
          ]
        }
      ],
      "source": [
        "#@title Spellcorrect Pipeline\n",
        "#@markdown 1. lower output text and correct with Neuspell\n",
        "#@markdown 2. apply pySBD for sentence boundary detection\n",
        "#@markdown 3. extract keywords with the YAKE library\n",
        "\n",
        "transcript_run_qk = pd.DataFrame()  # empty df to hold all the keywords\n",
        "\n",
        "for orig_tscript in tqdm(\n",
        "    approved_txt_files,\n",
        "    total=len(approved_txt_files),\n",
        "    desc=\"spellcorrect_pipeline on transcriptions\",\n",
        "):\n",
        "\n",
        "    current_loc = approved_txt_files.index(orig_tscript) + 1  # add 1 bc start at 0\n",
        "    print(\n",
        "        \"\\nStarting file {} of \".format(current_loc),\n",
        "        len(approved_txt_files),\n",
        "        \" | \",\n",
        "        orig_tscript,\n",
        "    )\n",
        "\n",
        "    PL_out = spellcorrect_pipeline(\n",
        "        output_path_transcript, orig_tscript, verbose=False\n",
        "    )  # verbose is just for debug\n",
        "    directory_for_keywords = PL_out.get(\"spell_corrected_dir\")\n",
        "    filename_for_keywords = PL_out.get(\"sc_filename\")\n",
        "\n",
        "    qk_df = quick_keys(\n",
        "        filepath=directory_for_keywords,\n",
        "        filename=filename_for_keywords,\n",
        "        num_keywords=25,\n",
        "        max_ngrams=3,\n",
        "        save_db=False,\n",
        "        verbose=False,\n",
        "    )\n",
        "\n",
        "    print(\"completed keywords\")\n",
        "    transcript_run_qk = pd.concat([transcript_run_qk, qk_df], axis=1)\n",
        "\n",
        "# save overall transcription file\n",
        "date_field = datetime.now().strftime(\"%d%m%Y\")\n",
        "folder_desc = basename(directory)\n",
        "keyword_db_name = \"YAKE - all keywords - {} - {}.csv\".format(folder_desc, date_field)\n",
        "keywords_total_path = join(output_path_transcript, keyword_db_name)\n",
        "transcript_run_qk.to_csv(keywords_total_path, index=True)\n",
        "download(keywords_total_path)\n",
        "\n",
        "# print results\n",
        "print(\n",
        "    \"Transcription files used to extract KW can be found in: \\n \",\n",
        "    directory_for_keywords,\n",
        ")\n",
        "print(\n",
        "    \"A file with keyword results is in {} \\ntitled {}\".format(\n",
        "        output_path_transcript, keyword_db_name\n",
        "    )\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3A9BykjRbOpc"
      },
      "source": [
        "---"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "a9qbVc4PAcFZ"
      },
      "source": [
        "# Save, Download, Exit\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 34,
      "metadata": {
        "cellView": "form",
        "id": "sHfbbCPcKVZp",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 35
        },
        "outputId": "598d1dc2-1925-4821-b0c9-0f6d64170022"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/javascript": [
              "\n",
              "    async function download(id, filename, size) {\n",
              "      if (!google.colab.kernel.accessAllowed) {\n",
              "        return;\n",
              "      }\n",
              "      const div = document.createElement('div');\n",
              "      const label = document.createElement('label');\n",
              "      label.textContent = `Downloading \"${filename}\": `;\n",
              "      div.appendChild(label);\n",
              "      const progress = document.createElement('progress');\n",
              "      progress.max = size;\n",
              "      div.appendChild(progress);\n",
              "      document.body.appendChild(div);\n",
              "\n",
              "      const buffers = [];\n",
              "      let downloaded = 0;\n",
              "\n",
              "      const channel = await google.colab.kernel.comms.open(id);\n",
              "      // Send a message to notify the kernel that we're ready.\n",
              "      channel.send({})\n",
              "\n",
              "      for await (const message of channel.messages) {\n",
              "        // Send a message to notify the kernel that we're ready.\n",
              "        channel.send({})\n",
              "        if (message.buffers) {\n",
              "          for (const buffer of message.buffers) {\n",
              "            buffers.push(buffer);\n",
              "            downloaded += buffer.byteLength;\n",
              "            progress.value = downloaded;\n",
              "          }\n",
              "        }\n",
              "      }\n",
              "      const blob = new Blob(buffers, {type: 'application/binary'});\n",
              "      const a = document.createElement('a');\n",
              "      a.href = window.URL.createObjectURL(blob);\n",
              "      a.download = filename;\n",
              "      div.appendChild(a);\n",
              "      a.click();\n",
              "      div.remove();\n",
              "    }\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/javascript": [
              "download(\"download_e5957792-6fcf-47d4-804e-0d89917c0049\", \"vid2clntxt_transcripts_archive05022022vid2cleantxt_demo.zip\", 56596)"
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/javascript": [
              "\n",
              "    async function download(id, filename, size) {\n",
              "      if (!google.colab.kernel.accessAllowed) {\n",
              "        return;\n",
              "      }\n",
              "      const div = document.createElement('div');\n",
              "      const label = document.createElement('label');\n",
              "      label.textContent = `Downloading \"${filename}\": `;\n",
              "      div.appendChild(label);\n",
              "      const progress = document.createElement('progress');\n",
              "      progress.max = size;\n",
              "      div.appendChild(progress);\n",
              "      document.body.appendChild(div);\n",
              "\n",
              "      const buffers = [];\n",
              "      let downloaded = 0;\n",
              "\n",
              "      const channel = await google.colab.kernel.comms.open(id);\n",
              "      // Send a message to notify the kernel that we're ready.\n",
              "      channel.send({})\n",
              "\n",
              "      for await (const message of channel.messages) {\n",
              "        // Send a message to notify the kernel that we're ready.\n",
              "        channel.send({})\n",
              "        if (message.buffers) {\n",
              "          for (const buffer of message.buffers) {\n",
              "            buffers.push(buffer);\n",
              "            downloaded += buffer.byteLength;\n",
              "            progress.value = downloaded;\n",
              "          }\n",
              "        }\n",
              "      }\n",
              "      const blob = new Blob(buffers, {type: 'application/binary'});\n",
              "      const a = document.createElement('a');\n",
              "      a.href = window.URL.createObjectURL(blob);\n",
              "      a.download = filename;\n",
              "      div.appendChild(a);\n",
              "      a.click();\n",
              "      div.remove();\n",
              "    }\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/javascript": [
              "download(\"download_375c5007-072b-489e-b941-8ca1a70bbacf\", \"vid2clntxt_metadata_archive05022022vid2cleantxt_demo.zip\", 602)"
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "downloaded files -  2022-02-05 15:11:51.190448\n"
          ]
        }
      ],
      "source": [
        "#@title Download All Files in .zip\n",
        "#@markdown `download_output_files` must be set in _Setup_ at the beginning of the notebook\n",
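        "import os\n",
        "import shutil\n",
        "import time\n",
        "from datetime import datetime\n",
        "from os.path import basename, join\n",
        "\n",
        "from google.colab import files  # Colab-only helper for browser downloads\n",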
        "\n",
        "zip_dir = join(directory, \"zipped_outputs\")\n",
        "os.makedirs(zip_dir, exist_ok=True)\n",
        "\n",
        "date_field = datetime.now().strftime(\"%d%m%Y\")\n",
        "folder_desc = basename(directory)\n",
        "base_header = date_field + folder_desc\n",
        "# transcriptions\n",
        "transc_header = \"vid2clntxt_transcripts_archive\" + base_header\n",
        "zip_path_t = join(zip_dir, transc_header)\n",
        "shutil.make_archive(zip_path_t, \"zip\", output_path_transcript)\n",
        "# metadata\n",
        "meta_header = \"vid2clntxt_metadata_archive\" + base_header\n",
        "zip_path_m = join(zip_dir, meta_header)\n",
        "shutil.make_archive(zip_path_m, \"zip\", output_path_metadata)\n",
        "\n",
        "if download_output_files:\n",
        "    # zip_path_t / zip_path_m already include zip_dir, so do not join it again\n",
        "    files.download(zip_path_t + \".zip\")\n",
        "    time.sleep(5)  # browsers may block two downloads that start at the same instant\n",
        "    files.download(zip_path_m + \".zip\")\n",
        "    print(\"downloaded files - \", datetime.now())\n",
        "else:\n",
        "    print(\"download_output_files is set to: \", download_output_files)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 35,
      "metadata": {
        "cellView": "form",
        "id": "e8ekwnvWk9rb",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "outputId": "2ea4d71a-0f37-467e-bc72-b75193bc3149"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "\n",
            "the contents of transcribed file FIN_content_borat_20_cultural_20_learning_s_20_of_20_america_20_for_20_mak.txt:\n",
            "\n",
            "('Sa no te sotety i d mo de oa mma maamoi a mama m si te sa to te i s betead '\n",
            " 'mo deo a ma ma mo o ma ai cas ia a cat big yanman a is o cay an morte de '\n",
            " 'lancase it on body a ai e is the big yan man as is o cay n mote de ttppely '\n",
            " 'may smikyek to match like my name a bottle I like i you I like segs is nice '\n",
            " 'there owe my canty of a C extra. It locates between the Tasikstan at '\n",
            " 'Gergistan and Ashhash usbekistan that my town of kulsek they sir orkin the '\n",
            " 'town repeat not to not of a year of town a kilnegate and a hyer live mute '\n",
            " 'volcano, a town mechanic and a berchonist is my heels an place he is my '\n",
            " 'neighbor nusutam to walk by he is panning my ass house I get a window from a '\n",
            " 'glass you must get the window from a glass and get a step. He must get the '\n",
            " 'step. I get a cloak ready he cannot afford great success. This is Natalia. '\n",
            " 'she is my sister she is before prostitute in all of guzaks don nice This is '\n",
            " 'my mother the oldest woman in the hall of type. She is a forty three. I love '\n",
            " \"her and this is my wife oxana. She's a bori god cumer mate. he looks a god \"\n",
            " \"aspo yona we will di just get up with him I'll guess I dont come in here \"\n",
            " 'please but tin no i i i ot tick this where I live my bed this is an derisory '\n",
            " 'recorded and these her placasets now I show you outside from my trousers de '\n",
            " 'jeu a mad hobbers pin pong UNK The said ball ppeepppptidtis got the answer '\n",
            " 'open ppeppand on wecan a to capital city and watch a ladies while they make '\n",
            " 'at tiland by profession work as a television reporter for Cusaksta because '\n",
            " 'you see then i a much o t in a i a m e a sme b o ta sam n se o e e ee t eea '\n",
            " 'es a e go o o e o again i e ei e e aie e e i go e e an o bod and ro gane to '\n",
            " 'the Ges now ring of a time to although Kasaksdan a glorious country it have '\n",
            " 'a problem to economic social and know this my ministry of information have '\n",
            " 'decided to send me to you s and a great best country in the world to learn a '\n",
            " 'lesson for Kassaksdan in wild travel with most venerable producer as a mad '\n",
            " 'bagatov azamot yobri of the garita before we e ye to this to be and o go to '\n",
            " 'putting the pi teeai o demo I go to America I e e e e e e eo mint time I e '\n",
            " 'on te the eeetet tte tto tete o tetetepen tetepeon tetepe te t te e e '\n",
            " 'tetepeon te tepeoment ee e te bndrte e ertt tn dertnt undetae te e e ben '\n",
            " 'dertee un deeeaeteete etee tt can derte u e ertee an deten e an derte e '\n",
            " \"detate e e et I Arrive in America's airport with clothing just as dollars \"\n",
            " 'and a jar of gipsy tears to protect me from Aids said immediately wio me '\n",
            " 'hallow my name u bor i i a yer not american i er knew it down I sumit o '\n",
            " \"hallo nice mother as ye what'your name my name you only for ten we to the \"\n",
            " \"open door when coetffo really nice malto in my name and up save mana that's \"\n",
            " \"it. You'll get the fog out. the benefits break your jaw or no go get the fog \"\n",
            " 'or you go. What would the wog want Then do get sorry to fog you go o and age '\n",
            " \"get them but won't really memcareful in come place when eco B or go relax to \"\n",
            " 'go Go get or go go relax. On for going dont get no problem sir and welcome '\n",
            " 'to the Wellington Hotel. Do you want to pay for the entire store? Now I pay '\n",
            " \"you for one night. If I'll arise, one night is one hundred seventeen dollars \"\n",
            " \"and seven and thirteen cents. We'll call it eighty five. So we can call it \"\n",
            " 'ten hundred and seventeen. Let me get the door up for you. Come on in and '\n",
            " 'very nice. Very nice or not in the room yet. your hold on might want to '\n",
            " \"repack your things but I'll be moving again shortly. Will not move to a \"\n",
            " 'smaller room so this is your floor I Can t take you to your room This is not '\n",
            " 'your room this is the elevator it takes you to the floor where your room is '\n",
            " 'on way. Wow a time A will a live on Will king in the castle king in the '\n",
            " 'castle have a chair and have a chair to do it. Go do his king in the castle '\n",
            " \"aclon I smile on when i'm a bot up that the ion new in ton i you wise I a \"\n",
            " \"choose yourself on everybody stalking at me I don't hear a word saying only \"\n",
            " \"there goes up my mind the out of people stop em any other I i can't see all \"\n",
            " 'men in a oonn to till agi I I say hallola not touch me ikin in her my face I '\n",
            " 'kiss you her you kissed me in popping the pumpkin ball A on whether the sun '\n",
            " 'keep shining my night through the pouring rain suits may no li o may go '\n",
            " 'alone Nights later I netted down my Am me in the North East Wind overnsaid '\n",
            " 'is on Selvabre is a baboon skipping over the ocean like a storm. This has '\n",
            " 'been the most happiest day of my life. I Was very excited to start my '\n",
            " 'reporting amid Pidita out utae at otiritait amipaa bosi a or the Comoysan '\n",
            " 'uckin halico a think I tell yotenrok to know gi e total silver gets o see if '\n",
            " \"she has has ie. er's the sos yo i to la la illa corp is a poitistissesei fa \"\n",
            " \"in mat tttetao me got hello ho namis my shame's pat haggard nice with a \"\n",
            " 'bulat. nice to meet you should I make her jumbo to my mother in law? Yes In '\n",
            " \"America that's a very popular joke. You have another in law joke. Yes, all \"\n",
            " 'right I e her the sixth time with my mother in law at what time Six time I '\n",
            " 'made the sixth time with my mother in law. You had sacks with your mother in '\n",
            " \"law? Yes. I Don't think that Americans would find that funny. No, it is not \"\n",
            " \"a joke yet. we're talking about a human. Oh yes, you ask me about my mother \"\n",
            " 'in law. Do you have a joke about your mother in law? No, we make a joke on a '\n",
            " 'mother in law. and do you ever laugh on people with Arita fashion? Here in '\n",
            " \"America we try not to make fun of or be funny with things that people don't \"\n",
            " 'choose but perhaps you have not seen a some one with a very funny '\n",
            " 'retardation and may brother bile have a very funny and retardation of and '\n",
            " 'mental retardation you know causes a lot of pain and hardship for a lot of '\n",
            " 'fat. Sometimes my sister she show her virgins to my brother Billo and say '\n",
            " 'you will never get this You will never get this la la la aa la here behind '\n",
            " 'this cage. crazy crazy everybody alave shego you never get this but at one '\n",
            " 'time he breaks a cage and he gets this and then a week a laugh of five ha ha '\n",
            " 'hanow and and no that would not funny in America aka what he is or not '\n",
            " 'jokes. A hate joke is when we try to make fun of something and what we do is '\n",
            " 'we make a statement that we pretend is true but in the end we say not. Which '\n",
            " \"means it's not true to teach me how to make one if what colour is your suit \"\n",
            " \"the Asutis grey gray I would call it blue or whose green if it's blue gray \"\n",
            " \"but it's. It's certainly not gray or let's say it's gray but it's not. Ger \"\n",
            " 'me is a not jokers I would say that suit is black Not this suit is not black '\n",
            " 'so no not has to be the end o car oque this suit black nut this suit is '\n",
            " 'black. Pause in other pauses. Yes this suit is black but this suit is black '\n",
            " \"Paus not no you don't say plus this suit is black. That's a pause that this \"\n",
            " \"suit is black like and I don't. I don't undo you. Everybody says you essay \"\n",
            " 'television. much better but this I watched for three hours. Do not change as '\n",
            " 'a remote control right here. Push these two arrows to change again. Go Go A '\n",
            " 'woo You got in my got my gotti to come out in the back yard with me. I have '\n",
            " \"the urge to bury something I want. Oh I love you. don't eat me and me love \"\n",
            " 'you. You believe the magic to miss Huge feeding is a pleasure to meet you '\n",
            " 'see may be be careful be careful. So gay. This the gay was like no Kasak '\n",
            " 'woman I had ever seen. She had golden hair, teeth as white as pearls and the '\n",
            " 'assault of a seven year old. For the first time in my life. I was in love. '\n",
            " 'Speeches stood, stood still. Ya elilibdietacushatsaka Ari mote joma ada busy '\n",
            " \"story is sure. Gugena is answered foshit it's a pilot poshe gana that has \"\n",
            " 'lava lava O K Jen dobreg in Kassakstan. It is illegal for more than five '\n",
            " 'women to be in the same lace except for in brothel or in grave by us and '\n",
            " 'many women. Meet some groups called the Feminists. I find them all chance. '\n",
            " \"So what a means is feminism. It's the theory that women should be equal to \"\n",
            " 'men in matters economic. No, you are laughing at toilet. That is the '\n",
            " 'problem. Do you think a woman should be educated? Definitely that. Is it. '\n",
            " 'Not a problem that a woman has a smaller brain than a man. That is wrong. '\n",
            " 'But the government scientists Doctor Yamak have proved it his size of '\n",
            " 'squirrel year government scientist And yes o Doctor Yamaka wrong. Give me a '\n",
            " 'smile baby. Why paint her face Well what youare saying is very demeaning. Do '\n",
            " 'you know the word demeaning? No we are saying to you I could not concentrate '\n",
            " 'on what this old man was saying. The sad not to all I could think about was '\n",
            " 'this lovely woman in her red water panties woman who was this as gay last '\n",
            " 'night I saw in my hotel room with a woman called the the Gay on a television '\n",
            " \"so you know her no as she is from a town called the Baywatches. she's just \"\n",
            " 'on television. her name is Paella. does she live here in New Yak and lives '\n",
            " \"in California or in the California. He's going to look her up so know. Can \"\n",
            " \"we finish now? Listen pussiqeto smile a bit. All right and that's it. Okay \"\n",
            " \"I'm. as we've finished we have to leave. Although I was obsessed by this sea \"\n",
            " 'way I could not pursue or else my wife would. Snapped my cook soldier as I '\n",
            " 'have a telegram for you and you marry again under park said devout your wife '\n",
            " 'as Oxnas was walking your retarded bike in the woods when a bear attacked '\n",
            " 'and violated and break her. She is now dead a night you say my wife is dead '\n",
            " \"This is what it's a yes sir I'm sorry to inform you but that's what the \"\n",
            " 'telegram says his fy might t to see ee t t e e e e ia e e e o the hand in at '\n",
            " 'a mak venture bunchkit on California here is ter the women I fellanother '\n",
            " 'erig a go unikan in luesta to t shma ratias is divided bytelevis for the '\n",
            " \"Masia and California They are married to the womepoor won't you butd y a tan \"\n",
            " 'Califor di shator wale tem or a parlil beshama yeshit ter a Texas soe iven '\n",
            " 'tally I Persuaded Azamad that we would travel to California and make our '\n",
            " 'reporting along the way. He insists we not fly in case the news repeated '\n",
            " 'their attack of Nine Eleven That are be you Gentleman Luksom My name is like '\n",
            " \"I'm going to be your driving instructor. Welcome to your country of my name \"\n",
            " \"A bodat of ok good good.'M not used to that but that's fine Now you do know \"\n",
            " 'how to drive a little bit. Why eyes would it indeed What drive away so way '\n",
            " \"second Have you driven a car before? Yes a mammoth have's go this way. wait \"\n",
            " \"a minute. I don't want you go hit any one. You'use two hands now with two \"\n",
            " \"hands but then it look like I am holding a gipsy while we get my cra O'care \"\n",
            " 'what looks like you use two hands when you drive Uka o know watch the '\n",
            " 'children go stay in. The problem must not hit the children. Look there is a '\n",
            " 'woman in a town. can we follow her and may me make this sex inmna na na '\n",
            " 'naanana na na noima Because a woman has a right to choose who she has sex '\n",
            " 'with And how about that in Tilasawork there must be consent about a a A '\n",
            " \"that's good. Why is not good for me? So it is. Go the car man. You want to \"\n",
            " \"have a drink. You can't drink them while you drive in. It's was. It's \"\n",
            " \"against the law. fat who is discarded. Plus I wish it didn't fall us in and \"\n",
            " \"I don'know if we will them. We better not lose them. They don't look at me, \"\n",
            " \"eat my pants. We'll make a right turn there. Don't look at me like that. \"\n",
            " \"That will get your shape. You fuck my mother hecon'you do be know before he \"\n",
            " \"looks on me. So say her, don't throw us in jail Me with you. You are in to \"\n",
            " \"jail. He'll look on me a man behind you. Can '. t say that I like you. Do \"\n",
            " 'you like me? I do like you. You are my friend. Your a nice young man and I '\n",
            " \"am your friend. You will be my boy friend. Ya I won't be your boy friend. \"\n",
            " 'Why not? You do hit me yet. I can be deten no boy friend Ye I care. Great '\n",
            " 'success Now time to make purchase of motor cars. I want to have a car '\n",
            " \"that'attract a woman with a shave down below. Well that would be a corvett \"\n",
            " \"or a hammer. We'll try to help you out here. A man a yesterday and tell me \"\n",
            " 'if I buy a car I must buy one with a pussy magnet. He means a car that women '\n",
            " 'will like Yes but where you keep this place as yo magnate knows just that. '\n",
            " 'He means the vehicle. Women love the hummers. Do this abuse magnet know the '\n",
            " 'vehicle shop will be a bag If I give you good price will you please put in '\n",
            " \"pussy magnet? Reader'sense there's no such thing in this country as as a \"\n",
            " 'magnet. If this car dry to a group of guys will there be any damage to the '\n",
            " \"car? It depends on how hard you hit him and all that. It's hard hard you \"\n",
            " 'might. If somebody rolls on the windshield they could crack your windshield. '\n",
            " 'How fast do I need to go to guarantee a time? Tellyou some with this vehicle '\n",
            " 'here probably doing thirty five forty miles an hour would do it great. And '\n",
            " 'when I are by my wife and at the start she was a cool good her vagina work '\n",
            " 'and and she strong on plough but after three years when she was fifteen then '\n",
            " 'she became weak. her voice became a deep borat marked as she a received a '\n",
            " 'hair on her chest and her virgin crown like slave of wizard who shes How do '\n",
            " 'I know that this will not happen with the can? Chevarlais guarantees you '\n",
            " 'that with a warrantee. I like her very much A Ba these homers how much is '\n",
            " 'it? Fifty Two thousand. I am looking for something between an six hundred er '\n",
            " \"to six hundred fifty dollars. We don't have any cards for six fifty that you \"\n",
            " 'can buy and might be a able to sell wholesale car. A car with a lot of miles '\n",
            " 'for seven hundred with no warranty to a come on a London a new get out on '\n",
            " 'thehigway up to a coming but poradventure in what ever comes the way to do '\n",
            " 'mass headfirst s on our journey visit Washington D is home of mighty S S '\n",
            " 'Warlord Premier Bush Each gohosis oda wa way we we was backistan fakheyo '\n",
            " 'malefacors. We arrive here to learn of American politics as I might arrange '\n",
            " 'interviews with party officials from ruling regime. We are good friends. '\n",
            " 'Abobba yes I hope so it is the custom have a cheese et her starved Thank you '\n",
            " 'my wife and she a make this cheese very nice she make it and for my milk '\n",
            " 'from her teat panel can be huffing after interview and encounters '\n",
            " 'traditional American street festivals. People here were much more friendly '\n",
            " 'than in New York. Next morning I interviewed politician who is a genuine '\n",
            " \"chocolate face to make up. On a Sunday I arrived in Washington's and there \"\n",
            " 'was an parade time marker. Two friends from this parade and I invited back '\n",
            " 'my hotel room. After we drink like a normal in a cesarean, we wrestle like a '\n",
            " 'normal in a casestan and then they say if a wash you in a shower and he puts '\n",
            " 'me in a shower it sounds like that you met somebody who is from what is '\n",
            " 'called in America the gay community. That did mean a Gaythis or the '\n",
            " 'homosexual homosexual you mean we Are You Telling me the man who tried to '\n",
            " 'put a rubber fist in my mouth was a homosexual even though my anus was '\n",
            " 'broken I Knew that the rest of our journey would be great success. We left '\n",
            " \"Washington's and continued towards California. Hardy Barnes a more his \"\n",
            " 'tenmoney Monne to anyone Mondes makes the final pips of Tancin E bile that '\n",
            " 'Categan Tewiski date once me his omega pose word you as a move nearly over '\n",
            " 'the car and the gaspeeper nnobuilding you as stationed around you should see '\n",
            " 'it see between you we named best news case in this state base the tea. Well '\n",
            " 'this morning we have a very special guest here in the studio that is borahat '\n",
            " 'satisfied. He is travelling across America to get a taste of life here in '\n",
            " 'the United States and he spent the last few days here in our God bless my am '\n",
            " 'a Bora holding hello like you Oh before we start can you tell me because I '\n",
            " 'want my Ayurins and then I come back here and am if you tell me one not '\n",
            " 'before we stay where when we started we are actually live on the air right '\n",
            " \"now are very excited at Air's too he said i all ye and am very excited over \"\n",
            " 'excited me around Hello hello to you as well. Now you route or quickly. why '\n",
            " 'are you here in the United States? Because I want you to learn from a one S '\n",
            " 'to be a your counter at the derstand me from having using Heppim and to take '\n",
            " 'this lesson back to my country. All right would Rujock have a seat? Yes '\n",
            " 'please. Please said please said please said. Now one of the things that '\n",
            " \"you've enjoyed so much about a have a microphone so Pupo can't see them if \"\n",
            " \"they can hear you right now and make up this little thing right here. that's \"\n",
            " 'the microphone What thing? Blog so long as it might you all welcome to the '\n",
            " 'United States. Thank you very much or I ni E R E All right when you come out '\n",
            " \"to Kusukston, you cannestan my house. If I had you can't sleep my house and \"\n",
            " 'I owe you my fantastic eyes and meteorologist Can Johnson on the latest on '\n",
            " 'Tropical Storm Emmily. When we return onto and anti as sixteen, hunt traffic '\n",
            " 'is flowing along smoothly along Interstate Fifty Five dry road conditions. '\n",
            " \"However, if you're heading to the North not too far away from homes in \"\n",
            " \"Atallic County, there's some showers up there. Took that out on one point. \"\n",
            " \"Dopler's sixteen area this morning is a do A Very nice for even mum What one \"\n",
            " \"Mam and we're all here right now doing the weather way my mam of they are \"\n",
            " 'doing the weather right now almost go up here with Adrian to her address '\n",
            " \"heads she's calling you to go over here isn't she? For a four miss s to urge \"\n",
            " 'a go over here with a few men was going to dothe be on the the weather got '\n",
            " \"hecwit to o k a hh ha and let's go over to the weather they can see the rate \"\n",
            " 'right all right. Next shower some showers and storms up to the north again '\n",
            " 'to go but will let me up nice to meet you showers and storms north of the '\n",
            " 'major city up towards caziyesco at thank you me my iiteitil cenati hit de '\n",
            " 'oforsati hatch bon chassi to rodier ama my conde of short and coming time is '\n",
            " 'he he may make the de we chanan in ita cog byla ganat gone piti gwe wa u by '\n",
            " 'to ba to row bol tal and two to vote ban and on a to mad miles to long to '\n",
            " 'sea do better on lies can do table or got one base to do it than he can an '\n",
            " 'on do an on by a cab to a bid and to be to be of course every picture that '\n",
            " \"we get backfrom the stars anything else A maslons they'd refer at black hair \"\n",
            " \"and black mustache often else shake that day young mustache off so you'not \"\n",
            " 'so conspicuous or so you look like maybe a nicanan or something yes or from '\n",
            " 'people looking at you I see a lot of people and I think foot that gun making '\n",
            " \"on some kind of bag he's got strap to a yes and you probably normal now \"\n",
            " \"maybe that's known yokno iam but was that can all of the all which you look \"\n",
            " 'like this thing gets over with an fan way when it can kick down nuts of tar '\n",
            " 'eyes all of them some of the bucks hanging from the gallows ye as he at a we '\n",
            " 'see was proving several really understand theercipmen take care since they '\n",
            " \"don't kiss you why not The people that do the kissing other her are on a bar \"\n",
            " 'on time. that as they all in learning stay away from them to kiss look at no '\n",
            " 'burkis in my country they ye take them and they take them to jail and '\n",
            " \"finally take them out and hang em yes but we're trying to get done here half \"\n",
            " 'live thank ye ha e otin o pertin you will you please ye o most for well to a '\n",
            " 'gentleman who comes all the way from Mantistan and we are honored to have '\n",
            " 'singing our National Other Ladies into Grand Sachia. My name a Borat I come '\n",
            " 'at from Kasakston can I say at first we support a war of terror May we show '\n",
            " 'our support to our boys And never may year say we kill every sit on the '\n",
            " 'territories A may a jogs Boto drink the blob of every single human woman and '\n",
            " 'chob of era may you destroy this country so that for the next thousand years '\n",
            " 'not even a single lizard will survive in their desert. E o no friendship I '\n",
            " 'now will seeing the our Kayak national answer to the one of your national '\n",
            " 'anthem. Listen Kazastan is the greatest country in the world all last as '\n",
            " 'trees are run by little girls. Constant is number one exporter of potassium '\n",
            " 'or the central Asian countries have inferior potassium cause a stay is a is '\n",
            " 'centre of otontris is it as I academic. This did not sponsor insecurity and '\n",
            " 'was back on Esfor I Was sad the idiotic peoples did not like mewhat if '\n",
            " 'formula did not like me too we needed something to change our fortunes. '\n",
            " 'Sacelazamat in a cursian si bonnetso pulayesemes do not fear me. Gypsy All I '\n",
            " \"want from you is your tears. Please give them to me or I will take them. I'm \"\n",
            " \"not a gypsy, I'm mind turn farmer's daughter Americana. You have many \"\n",
            " \"treasures. Who did you rob for this? We didn't rob and they all came from \"\n",
            " 'inside the house. I will look at your treasures gipsy If this understood I '\n",
            " 'will look on them. Please do. Who is this lady you have shrunk? Was she the '\n",
            " \"owner of this house that you camp in front of? There's a people born shouted \"\n",
            " \"or do not try Shrink me Gipsy and the Rio these are your spells so there's a \"\n",
            " 'good one to millionaire finds that no go way to a way. Be what shecasa on I '\n",
            " 'muchwo ado gei o a i ye tant of e e do e do i te do p t i te go e o the a ta '\n",
            " 'o do i shu e a o di e he he he milked ly o e i e e e i e e e e ei e t o e e '\n",
            " 'i it i i e ein e ca to and if thilapat go me marpe fellaway they lay the '\n",
            " 'asking on the she i theginle was it to you I a e go go write i ma e m a k '\n",
            " 'eaa a ea ter ter it is and the oe fe en e fel fo so i cu r eat the napkin up '\n",
            " 'then up to earn the direction to California please I your now away from home '\n",
            " \"da yo na a we place oh you w we see wawwe we where you would be doing I've \"\n",
            " 'friends with my friend then as am at the bagatoar of terfo croo you case me '\n",
            " 'to talking a out it you a o gray way you look like a Michael Jackson a bit '\n",
            " 'me you look a like you pipples know can you teach me how to dress how can I '\n",
            " 'be like you do son that you need loosen up the bird let them squeeze them '\n",
            " \"pull them down don't hold em it like a funde oat like ya go onota like a tee \"\n",
            " 'he you and you can do shut your hurkies down a a ha a if a you no o no in my '\n",
            " 'antithe you know your i woc i us you listen to i are like her very much a '\n",
            " 'corky bouquet you know coke bouquet being bung bing bung bing di le ten then '\n",
            " 'the nbing can bin an min the then then they are poo t to ye can you teach me '\n",
            " 'speaker like you I raniside bretkin or yiner how you say how do you do was a '\n",
            " \"wig and was sat with it by seen the softeno on s what's up with his vanilla \"\n",
            " \"face A mere mahomeasamat just parked a slab outside. We're looking for some \"\n",
            " 'way to post up a black asses for the night. So the bang bang a skit skid '\n",
            " 'nigger when just a couple of pimps on how you get leave on her leave now in '\n",
            " \"kalkavs might be taken out good morning we may do Men'er tiu shuteimmon bon \"\n",
            " 'giasgo I rater thinkyou have a romboo dony yes ye yes definitely a Come on '\n",
            " 'in see to my own moni for a around aldi a yard of the house all the '\n",
            " \"paintings in the house I did what does this ma'am this is a Yama jew and \"\n",
            " \"he's working on a piece of jewelry. The Yemanites were also jewellers. Why \"\n",
            " \"you have a picture of Duke Because I'm selfish to a lots of pictures on jobs \"\n",
            " 'and this is the room and as do you need two pills yes great I thank you. A '\n",
            " 'lovely place bholinch was how you you know the sante moh misi iscornin '\n",
            " 'elected bad a thinks a if way down and you go hollo how are you are a '\n",
            " \"sitting sano lon This is an especially sandwich for you and yes you're not \"\n",
            " 'so hungry there Can I eat this? The fat so low you go you bone a hit at a '\n",
            " \"half you'll take a ha a to can to hold and then you'll see no I'm not so \"\n",
            " \"hungry. if he eats a little bit you'have to eat something because you're \"\n",
            " \"hungry and as your guest and I don't want to see you go hungry Us: What is \"\n",
            " 'this picture over here who yes it is Three in the morning I am in a nest of '\n",
            " 'news. They have cleverly shifted their shapes. One of them has taken the '\n",
            " 'form of a little old woman you can barely see her horns she have tried to '\n",
            " 'poison already. These rats are very clever esin note osby Mona deservinghopi '\n",
            " 'most eivitive be if i same mysho an im me to ye the way make et docto geto '\n",
            " 'vichur that if all e en t a time a to go to pay baa p tepo o bo o a a ooe o '\n",
            " 'e teo e e e e e e e satyt mige or yor duties to regulate to bottom Montacaa '\n",
            " 'California trades California as California Timota cama catifornia an e red '\n",
            " 'caras trri alldagaza has the ceraon limitation. What is the gun to defend me '\n",
            " 'from a few? I would a recommend either a nine millimeter or a forty five. '\n",
            " 'very nice Wawwowo it like as a movie star a dirty herald you come on and '\n",
            " \"make my day you but he would not sell me gun since I'm not american thing \"\n",
            " 'more so I look for the protection. What type of dog is that? This is a '\n",
            " \"tortoise? Is this a cat in a hat? Know it's a tortoise and a shell. Yes I \"\n",
            " 'needed the animal for protection. What do you have for me We a let eteeop to '\n",
            " 'e golden eiiisao d sai e am then or she was sitting one day o a tee where to '\n",
            " 'write a a i bag at wund e gue we see the ete this man a o me the risk that '\n",
            " \"it's to a curious do did or e happy times We were safe and well on our way \"\n",
            " 'to Paella it was time to get back to work. Hima gidisul manketant menketant '\n",
            " 'zartcaza joofurtitulo hima du bidietas e sotet hachtonimobipangielsko jaman '\n",
            " 'but indulges to suicides that answered consider Beshman at a nice meet. You '\n",
            " \"know it's so nice to meet you. Welcome to America. Will you please teach me \"\n",
            " \"how to Done like gentlemen? Of course I'll be happy to. Is it polite to \"\n",
            " 'greet people when I make entry? Yes it is. Let me introduce you ye a round a '\n",
            " \"yugan I say they like Jerod Hello I'm Bethany Western Lovenes How you do how \"\n",
            " \"you do mym bed Should I pay interest in a people's around the table. Sides: \"\n",
            " 'Yes. and if it is a big table a very long table you might want to restrict '\n",
            " 'your conversation yes to people right in your vicinity. Very nice so you are '\n",
            " \"not yelling at what do you do. I'm the pastor of a church yes what do you \"\n",
            " \"do? I have tried s in construction and I've recently retired. You are to a \"\n",
            " \"yes and have musical orment though I also do not retort i i don't work in if \"\n",
            " 'not toward stop working is a very good you man allowed rats to eat a with '\n",
            " \"you in the same place. That's not what we're saying about this man. He is \"\n",
            " 'not what you would referred to as reward. No no no lo not at all. Do you '\n",
            " 'have a telephone in this village and should I show photos of my family you '\n",
            " 'have photos of your family wonderful at this and my favorite son a Huelois. '\n",
            " \"oh you yes he looked eyes he have'very strong my goodness is that him \"\n",
            " 'holding yes over he grew a three centimetre he now a seventeen a centimetre '\n",
            " \"long away I'M not sure I would show these photos of him without clothes on \"\n",
            " 'Should I pay compliments to the peoples yes but only if you truly agree with '\n",
            " 'that compliment. You have very gentle face and a very erotic physique. '\n",
            " \"You're correct, yes that's a very good observation. She is your wife so \"\n",
            " \"that's my one in my country they would go crazy alafor these two enjoy so \"\n",
            " 'much Yewawa Wilar what would I say if I need to go to the school and you '\n",
            " 'mean to the restaurant to the place to make the rim to go wet the not a bus '\n",
            " 'to make her dirty no does rip below the toilet when you may a yes you '\n",
            " \"understand her bad ye bad's bad siing for yes and what you do is you say \"\n",
            " 'excuse me I need to go to the restaurant an excuse me Is it possible to to '\n",
            " 'do a yun all beauty how you say what you not ears and you owe you a say s me '\n",
            " 'a most goes behind me to see on goes you say hank as you go upstairs? Yes '\n",
            " 'thank you I think that the cultural differences are vast he said and I think '\n",
            " \"he's a delightful man and it wouldn't take very much time for him to really \"\n",
            " 'become Americanized very much I feel much better sell why but I shall like '\n",
            " \"to doesity just t let my don't for this ol can be maybe in the other rat in \"\n",
            " 'the other rooms from being in there war excuse me excuse me for just a '\n",
            " 'moment please you row loss like this and you wipe your bottom and you put '\n",
            " \"the paper by you. Why I know I don't you do this is as a very sthankg the \"\n",
            " 'host to clear you a pass to the father. no no no no no no o no nobody '\n",
            " 'touches you except you can I bring a guest to dinner if you have been '\n",
            " 'invited to a home or to a party? Yes it is acceptable to bring a guest if '\n",
            " 'you ask your host in advance. Yes to jack go a wing away if you please my '\n",
            " 'friend or are you looking for both Ye s is me oh ho sir a lunout a ho here '\n",
            " 'this at my friend. The lunch an oh o ky an u o O o did huge Well we were I I '\n",
            " \"don't know ea way oh that we're doing it are getting very very late. Excuse \"\n",
            " \"me a moment ago on key it's getting it very late and it it's time that the a \"\n",
            " \"you know that we were ending our dinner party and everything. Ah but can't \"\n",
            " 'she come for desserts? Absolutely not and neither can you assure another '\n",
            " 'name teas or a al you and a I o epac whayocoll boys have the act that escape '\n",
            " \"I Must say you're very sorry if they to to these folks thank you. I was \"\n",
            " \"thinking maybe I'd just take the night after. don't we just go out and have \"\n",
            " 'some fun. What do you think about that? There was the bobor time to be so '\n",
            " 'aee ei me m mwi sti sa te e is to study easpec. My mother Mamoran leads my '\n",
            " 'brother me me as she is a brush juke wrote how a funny on themselves '\n",
            " 'everybody almost sees her on the pens never rode a bull before and will you '\n",
            " 'want and you want to come here for a little while? I would like her very '\n",
            " 'much but that you love me so may inmellabl it would not be nice to her '\n",
            " \"favorite pay Well if you're ever in town again that's why you know Oglia \"\n",
            " 'forgive her. In town again alone I Would very much like to pay you for sex A '\n",
            " 'shy in tat len es goo night poor rat go her otebyou say my name r Dorrat '\n",
            " 'House Borraco Billy Obob Met Bob by Idi late Tottealaaas if fairly simply '\n",
            " \"called. She frequently explained there's not the whole load of logic either \"\n",
            " 'way I lose my life or very spontaneous A very spontaneous to to a baabo tabo '\n",
            " 'taa o the souaabababo baby bee the saro mate me I needed a gift to give to '\n",
            " 'Paella so that she would grant me entry into her won. Therefore a convinced '\n",
            " \"Azamat to let me file a report in an American store. Here's the Governance: \"\n",
            " 'Administrative: A Visitor shops. Why? This is my antique shop. Why do you '\n",
            " \"have up so many things with a flag when we're honouring our heritage Elm? \"\n",
            " 'What A male here that does this. Recognize a number of collectibles. A mean '\n",
            " 'this is a this is a lamp you know just use in your home. This is a Chinese '\n",
            " 'closing a bell and this is a just a a little decorative duck. And the doyou '\n",
            " \"is can you know when they said only that we need help at back? No's hooky \"\n",
            " \"and I search so I will er can with this. I don't worry my friend, he can \"\n",
            " \"make a glue and don't think you're going to be able to glue stuff back. I \"\n",
            " \"think you're going to have to pay for it or k I have a digital watcher from \"\n",
            " 'a future. I will give you esam Worse more than all of this, You broke four '\n",
            " 'hundred twenty five hours worth of stuff. Hundred and sixty undredand '\n",
            " \"seventy hundred and eighty. It's not enough. Do you want the hair so I don't \"\n",
            " 'want any damn hair? This is the best hair in the cusacs that pull bolli I '\n",
            " \"don't want your dam hair this a hair from pubi. we don't get to do two \"\n",
            " \"thousand bags by next Sia we don't use that s t suffer in this country nt is \"\n",
            " \"te teineoal. Why the look on this? ye give me another twenty that's good \"\n",
            " 'enough Has it Zi bahomona av a dom me onte I say teres a bacomanee in the '\n",
            " 'island mash o er o if he jes o o es o ha e la i e e i ho a e all it you look '\n",
            " 'at to go ththe now has to e to t chug up the tendeti i ati ed little fosete '\n",
            " 'pussy there be a g la ha me me te e be blany bandy is I it If Bill of Focus '\n",
            " 'site dancot Nishprianik is connected with the Bowl of Drama Go too Mixed '\n",
            " \"yagyvitch Lithium anyone? aao you've heard that Peter A is ju si ten e thi e \"\n",
            " 'o te eo e a fo e e a i wit ete the a o o tey o te t t can yes to one each '\n",
            " 'cover as an bn of fotosa n e bies an a o to fe to ten on of lutosima '\n",
            " \"california each to lie a an a t I forget tat es be eson'it wel war is en \"\n",
            " 'some bio to te e e e i bak di er yes ci ather i i do t a tti de go an o ea t '\n",
            " 'e e s te e te la la to at b by by and find giiaa fio fo a let em go go o '\n",
            " 'ahead o the igm ae i place a liee ei e ocu eea live o a o a ane a an ot gaal '\n",
            " 'ous cobar a a cunam couna cne made as crok pi smell bos each goes on his '\n",
            " 'crate and a lardis a pacenta heas cheboset by unchanty benzene in his bed of '\n",
            " 'yecca bin one that he as cohelca pone ewila ea coelc coner o a aoer why he '\n",
            " 'had smoke hot at it on food suckol costs alive get bit about of it got to a '\n",
            " 'dish u i u o i o in tai am tots all you must go out by little sea foul the '\n",
            " 'it as we come here and from o t at t you been in me ya we have a special '\n",
            " 'guest here this evening now Truth vaner is wer pe e e de d nu jus yet wo de '\n",
            " 'e es e aa a e te aa o te o i the clack e y ooea e e la a g goo o o do i to '\n",
            " 'bot yani i i ada so im business as a man also awake up he disappear and he '\n",
            " 'takes cancer beer army and he also decided to take common and also my '\n",
            " 'passport and then he leave me all this bag misses him and the ticket '\n",
            " \"cossacks can with no passports but at least he's a man enough to leave me \"\n",
            " 'near my beautiful eye which I can have cleaned since last night and the I '\n",
            " 'decide to continue making the comments make it without ashamed I think we '\n",
            " 'will be better and will have a more successor without him. Spin queen she '\n",
            " 'saw and I really want the seven fine cents in I I had no car, no money and '\n",
            " 'no azamat. The only thing keep me going was my dream of one day holding '\n",
            " 'Paella in my arms and then making room explosion on her stomach eventually '\n",
            " 'and managed to hike a kitchen with group of young scholars also travelling '\n",
            " 'across a country I place you av aaw you n and how you know in go to a ton i '\n",
            " 'hear from makasaks then a half a sea lock in ho and a as ye go as I is know '\n",
            " \"what's your name we anybody Antonijev anatomy an question fashion and divide \"\n",
            " 'everyday avoid a very nice A can you open this place or thank you ers owe ya '\n",
            " \"like the bitches out there in the Afakina oldsrush up there. Why mine's an \"\n",
            " \"old robber of the water mad and ran to Imemman's the faket whose baby after \"\n",
            " \"the girl see who too shut out of her and a time call em again why you don't \"\n",
            " \"call them because they do n't have a telephone desk no not cause or that \"\n",
            " \"they don've never done us at all I mean so so what are you doing it in \"\n",
            " 'America Why areyou wantare you doing I say you find me to travel across a '\n",
            " \"yet you is said they do No one is a at why that tell let's go course an \"\n",
            " 'place and break that do u or The olbai baby on me time years old is to the '\n",
            " \"baby to go at soak Sa when we tell you game we play you'no here gain you if \"\n",
            " 'we play game Code: When the snake eat the pig when the snake eat the half '\n",
            " 'the snake eat the pig you get a baby mouse in no and you are ever put a bit '\n",
            " 'of cheese in the whole of your crumb until let it go inside the graty for me '\n",
            " \"I'll do it I'also a park I'll do it. Let me ask you this. Our women or women \"\n",
            " 'your slaves in Russia No do you have a slaves here We marry slaves know if '\n",
            " 'it is a shame and bodac bot asham he exchamed it would be better Carasaiv '\n",
            " 'ter yes have a body and r do to be sure slaves our country the minorities '\n",
            " 'actually have more power. Any one that is minority has the the E upper hand '\n",
            " \"way. the yes we've anybon that's against the main stream they want to see. \"\n",
            " 'The new wife lies is my new wife so no now for long I will take her vantin '\n",
            " \"for the far. I'm going to put the ship on board to boa lo wi so we are I \"\n",
            " 'will take her mertin and will uncock her but but I no she is no version yes '\n",
            " 'not so the old version. Onto a liar liar lie your pancis as if I got a sea '\n",
            " \"at small tacos the other small yacht what's she think she's stuck in some \"\n",
            " \"deck master sucgon duck this's not her I guarantee that she men has no man \"\n",
            " \"hats around for us but I ask She's not to come to her sorry I'm sure sir and \"\n",
            " \"go to do so. if if I aren't going over wiedi you'd mama di Oman your mamae \"\n",
            " \"no mesis t on born no er was a body on horse and if you know we'll remember \"\n",
            " \"you always but I you want ye wahoo hay why you're an American now you'll \"\n",
            " 'make it you you keep going. You are bigger than a woman. You know better '\n",
            " 'than a woman here. we will always be behind you. Do not let a woman will '\n",
            " 'ever I begin ever make ye who you are good for my friend Gofoter e do he go '\n",
            " \"on this He made the fuza it even grocantofralia Go don't live early perhaps \"\n",
            " 'aaatat to be an noisy day baa bee ferry it is big if this is my tent that '\n",
            " \"o'coastal camp me in a decade but mine is we're a christian nicen now we \"\n",
            " 'were christian nicen in the beginning and we can always be a christian nicen '\n",
            " \"in till good lord return no I diny vall laughed bumbleginn you didn't use to \"\n",
            " 'be a tadpole is what I is there word than it sit iiad n i a astan sa bele o '\n",
            " \"ce po sees as es to a eve o a e e v sne e e I don't care what the devil's \"\n",
            " \"done to you or what he's trying to do, all you got to do is step out of that \"\n",
            " \"owl nal and make your way down to this of what's have a little old town \"\n",
            " 'church right now finding somebody and by to it to come on sir and the mot to '\n",
            " 'my name o e o get the mot to my surname we e to be sad by plate Ladies and '\n",
            " 'gentlemen the gentlemen here standing long your next to me his name is black '\n",
            " 'would you greet him with a great big jees this night for just a couple of '\n",
            " 'digs. L Tetet I have no friends I am alone in this country. Nobody like me. '\n",
            " \"my only friend aasamat. He taken my money and my bear and he'd leave me \"\n",
            " 'alone. Not only then the woman I loved the reason I travel across the '\n",
            " 'country. she never do something terrible on a boat and now I can never '\n",
            " 'forgive her. Is there anybody who can help me one that can help you? If we '\n",
            " 'preached about the no cansus Do what Jesus like me absolutely Jesus loves '\n",
            " 'you. Do it less like you my sons. Jesus loves your soul. Do Jesus love him '\n",
            " 'that brother Bile He loves your brother bilo. Do a Jesus love my neighbour '\n",
            " 'Nusultan tuliakbai Yes he loves everybody. Nobody loves my neighbor Nusultan '\n",
            " 'tuliakbai. If can just heal pain that is in my heart So I die in the pain '\n",
            " 'that is in my heart. lift your hands and then in no worse of what you take '\n",
            " 'die if they is a bad that is a mention of give me o Massil I te go te pla t '\n",
            " 'ik i o a g a o e ba e a beat out ot ot o g t o a go o go dotgo go go a t be '\n",
            " 'go see o A to go I go a B feet a o a a a get o o u i i i ni o w o be b o to '\n",
            " \"a a a ini eio see I won't forgive baaa and if we go togae bonya and go to \"\n",
            " 'manetot go can a few t two eyes and together we will talk our do I Took a '\n",
            " 'bus to lose Angeles with some friends of Mister Jesus finally I had arrived '\n",
            " \"Happy times we a among mam almen baamenemen even i tribe o him e e e what's \"\n",
            " 'the mtt ats te mato foo i cu pel a g b b a e boti i ba te gt can te te bo '\n",
            " 'haan taboo ol tabo tea at the chat b lov o years again a fort and yes to it '\n",
            " 'so is how this a tary delava he got and have to i to the fat the villa to '\n",
            " 'iwi so she helon a day that the yon a tee ain might die go no goge beware '\n",
            " 'mov menaa had no mean else converts cacky mic to rescue four years ago but I '\n",
            " 'had not come to Hollywood to fight a man dressed as Hitler I had come to '\n",
            " 'make parallel afternoons my wife so i ga vazamat yes ki dave on me orum '\n",
            " 'doesta pucarenti gabernays yes his umnor bone o some mor bonim boterus teoe '\n",
            " 'this this i sinchlav bonus polorit korchiga Pamela Bane for vice genasula '\n",
            " 'drug for a jovunitive or gintaninerun to charge her in negatutunteri a often '\n",
            " 'exec ex cep sei i except eating a a o can a success why ispeticidis or inka '\n",
            " 'gilkegere va e gilkilgere had love another because they said they al or in '\n",
            " 'doctea magla will me of you in mrivetanileev maniet katena Tomella avolbev '\n",
            " 'signal her couscous selita seta either is designed to roviter gave of the '\n",
            " 'globiy a make at in time I the shape rock chap yet shumaghe having learnt a '\n",
            " 'many lessons from you s and as I will now teach America how to have a wooden '\n",
            " 'Kazaki style you find more chegi ecideekcide or a robot you know only about '\n",
            " 'o think nothi being tey you e a ea is chief Ioliver Folilla Anderson saito s '\n",
            " 'I e semen hallo a mane ganin in my name a borada sagteervaya son of baten '\n",
            " 'bala sagdio ante not of the ripest for my former husband of foxana sagdiavo '\n",
            " 'was daughter of Maliam Taraqa and as she bot of the ripest I make a is for '\n",
            " 'you that is a time or name and my name as a young name Paella Anderson A for '\n",
            " \"it's like the her this two day's do dear sir say ye I date v visit to the \"\n",
            " 'date to wedding is inside is a sarcab Paella will you marry me he a e her '\n",
            " 'her he know are ageing with la bessoic o o t to go as if s do it to ttosave '\n",
            " 'your own life he went to gutter twas the meal go to my ripes I never stole '\n",
            " \"you go away hike no mylie I'll give you your own flow ice or go o a a degree \"\n",
            " 'o up to see an anything on your bad look here her look her still inner '\n",
            " 'jumping your knees apart your knees up barbella I am not attracted to you '\n",
            " 'any more not as I was humiliated It was time for me to return to new work '\n",
            " 'where a ticket was waiting for me to fly back home. While I sat on the bus I '\n",
            " 'thought of my journey over the past three weeks, the great times, the good '\n",
            " 'times and the wet times. Mainly they were sweet times. I Had come to America '\n",
            " 'to learn lessons for Kasak stands. but what had a learned suddenly I '\n",
            " 'realized as I had learnt that if you chase a dream especially one with '\n",
            " 'plastic chests you can miss the real beauty in front of your eyes all come '\n",
            " 'back to my town of Cusek. Since I return there have much improvements. We no '\n",
            " \"longer hear the running of the new. it's cruel where christians now aaaaaa \"\n",
            " \"Gotan improved too pay don't dance and titeni tune and grade or go maous \"\n",
            " 'tokit it the titietietik ternu sutan duliaglie give a still car ho I get to '\n",
            " 'that she only gets it but will Everybody knows it was women to '\n",
            " \"gositattsand's s wife beautiful's to take forward of my film automobile lie \"\n",
            " 'suddenly teeeetetetetetee ettete ttetete I even desitate so petes county wo '\n",
            " 'a ta is a iin deges alexan as a ttacian at Go Tisa in the days a incin in '\n",
            " 'tons is left the two meat in six meters on these systems to die aainobobeyti '\n",
            " 'but simple inbu silly do tosta tosto obey as pa tea to shan duno taks o ti '\n",
            " 'to bos and sta tba o bois an e ta i stai aa o i o e o i asa i best in world '\n",
            " 'is in the dark by mantos ofes tatesas or in its eyes in the region in and '\n",
            " 'the go through as an a a a a tewe di tda ta tea o tin te o das o tin down o '\n",
            " 'my body of body is to be the best be of is is see i e i pe p pe p peepte t '\n",
            " 'come of evy good e i a et e no e t ti i go wood o the water but in ite ine i '\n",
            " 'e e e ni een anie e ei i e eei e en e ei ac in iite o ei eten e e en e ani e '\n",
            " 'enneeie e ei i e e o e eetete e elon eoetettetee i ooi o on o o e teteteee '\n",
            " 'ote i no ai pe ten an a t n on the to mai in n te o a n tten e t e ei e to '\n",
            " 'te to i en e ten e to o e e e e e ei e ton te to i e t e te on te on the toe '\n",
            " 'o tee to see t to write to ai e e te they do te en we an end bente and pe '\n",
            " 'ppe pi e p pa pe e pe i p e fed undestandaan o see ef fecof letters n a e e '\n",
            " 'et be e e go i e e ssipli e e e o e i e i e i e e e e o ei e e e e e e e e e '\n",
            " 'e e e ee an o an an never be i i e e e de i e e a doei e e n a oe o e e ei e '\n",
            " 'ena de e e en e ar mertrd aeteitiny I abo it do e i tey tiny tia e e tote et '\n",
            " 'y n e o e i e e e o eny tite elbet and in antho er i e te e e te to te e '\n",
            " 'tete te e e e te e e e e te e t e to te e te e te e te pi ye.')\n"
          ]
        }
      ],
      "source": [
        "#@title print contents of final transcript\n",
        "\n",
        "#@markdown Mostly for demo purposes; the full transcript usually takes up too much space.\n",
        "from pathlib import Path\n",
        "\n",
        "# folder the pipeline reports as holding the best (final) transcriptions\n",
        "_final_out = Path(\"/content/vid2cleantxt_demo/wav2vec2_sf_transcript/FULLY_COMPLETE\")\n",
        "text_files = [f for f in _final_out.iterdir() if f.is_file() and f.suffix == \".txt\"]\n",
        "\n",
        "for _txt in text_files:\n",
        "    print(f\"\\n\\nthe contents of transcribed file {_txt.name}:\\n\")\n",
        "    # errors='ignore' drops any bytes that are not valid UTF-8 instead of raising\n",
        "    with open(_txt, 'r', encoding='utf-8', errors='ignore') as fi:\n",
        "        _text = fi.read()\n",
        "    pp.pprint(_text, indent=4)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "d5xXtKzB3BaT"
      },
      "source": [
        "Not bad, eh, given the video quality? Note that the transcript above was generated with `facebook/hubert-xlarge-ls960-ft`, but the official version will use just the `large` model, as the XL model has heavy memory requirements.\n",
        "\n",
        "---"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 36,
      "metadata": {
        "cellView": "form",
        "id": "OOoaNN5f5efD",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 394
        },
        "outputId": "741b7f6b-169f-4e9d-bad1-0e178dab9b64"
      },
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "  <style>\n",
              "    pre {\n",
              "        white-space: pre-wrap;\n",
              "    }\n",
              "  </style>\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "application/javascript": [
              "\n",
              "  for (rule of document.styleSheets[0].cssRules){\n",
              "    if (rule.selectorText=='body') {\n",
              "      rule.style.fontSize = '24px'\n",
              "      break\n",
              "    }\n",
              "  }\n",
              "  "
            ],
            "text/plain": [
              "<IPython.core.display.Javascript object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "\n",
            "----------------------------------- Script Complete -------------------------------\n",
            "2022-02-05 15:11:51.232372\n",
            "Transcription files + more in folder: \n",
            " /content/vid2cleantxt_demo/wav2vec2_sf_transcript\n",
            "More specifically, best transcriptions in: \n",
            " /content/vid2cleantxt_demo/wav2vec2_sf_transcript/FULLY_COMPLETE\n",
            "Metadata for each transcription located @ \n",
            " /content/vid2cleantxt_demo/wav2vec2_sf_metadata\n"
          ]
        }
      ],
      "source": [
        "#@title Exit block\n",
        "increase_font()\n",
        "print(\n",
        "    \"\\n\\n----------------------------------- Script Complete -------------------------------\"\n",
        ")\n",
        "print(datetime.now())\n",
        "print(\"Transcription files + more in folder: \\n\", output_path_transcript)\n",
        "print(\"More specifically, best transcriptions in: \\n\", PL_out.get(\"SBD_dir\"))\n",
        "print(\"Metadata for each transcription located @ \\n\", output_path_metadata)"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "background_execution": "on",
      "collapsed_sections": [
        "9gqce-JlruPm",
        "mqtG5izgiOng"
      ],
      "machine_shape": "hm",
      "name": "rpunct + vid2cleantext-single-demo.ipynb",
      "provenance": [],
      "toc_visible": true,
      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "widgets": {
      "application/vnd.jupyter.widget-state+json": {
        "31f98d21aaed4a87a4ccd63bdbf01ead": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HBoxView",
            "_dom_classes": [],
            "_model_name": "HBoxModel",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "box_style": "",
            "layout": "IPY_MODEL_09ece41f2d6a4743aaecf5517bed2ea1",
            "_model_module": "@jupyter-widgets/controls",
            "children": [
              "IPY_MODEL_3467f71380c34575bec160ebfe823238",
              "IPY_MODEL_206a23908ad54504b31f826b37314dea",
              "IPY_MODEL_9cfaf034ca424fe39090d1b37b74ed82"
            ]
          }
        },
        "09ece41f2d6a4743aaecf5517bed2ea1": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "3467f71380c34575bec160ebfe823238": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_76ab8c2b27434d8d83d47668a69d5976",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": "Downloading: 100%",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_92bbd54f7ce3400192004bf58d9bdfff"
          }
        },
        "206a23908ad54504b31f826b37314dea": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "ProgressView",
            "style": "IPY_MODEL_72399605b0ca46e992eecaaa2a9550ec",
            "_dom_classes": [],
            "description": "",
            "_model_name": "FloatProgressModel",
            "bar_style": "success",
            "max": 212,
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": 212,
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "orientation": "horizontal",
            "min": 0,
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_b792901613a94125b501e22dd5f75628"
          }
        },
        "9cfaf034ca424fe39090d1b37b74ed82": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_da485b1f80804186bb591c13d79aa247",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": " 212/212 [00:00&lt;00:00, 8.52kB/s]",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_7e67132b68ce453ab4b8834c6e07c428"
          }
        },
        "76ab8c2b27434d8d83d47668a69d5976": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "92bbd54f7ce3400192004bf58d9bdfff": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "72399605b0ca46e992eecaaa2a9550ec": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "ProgressStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "bar_color": null,
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "b792901613a94125b501e22dd5f75628": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "da485b1f80804186bb591c13d79aa247": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "7e67132b68ce453ab4b8834c6e07c428": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "4fe9b0d2a6c64c0595c44426b5f39837": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HBoxView",
            "_dom_classes": [],
            "_model_name": "HBoxModel",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "box_style": "",
            "layout": "IPY_MODEL_49e34a76d39d42e29c9f43773858ebcd",
            "_model_module": "@jupyter-widgets/controls",
            "children": [
              "IPY_MODEL_03b4542be9c14e93855c2ec29e49ac1e",
              "IPY_MODEL_c2ed413a6d0b4e3aa6355cfbdae0b671",
              "IPY_MODEL_770da7de020b43b099c6dd2e0ec672de"
            ]
          }
        },
        "49e34a76d39d42e29c9f43773858ebcd": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "03b4542be9c14e93855c2ec29e49ac1e": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_6b870d609c1e4858beb8c10bce86b326",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": "Downloading: 100%",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_9af5988a93a3439c828011ec6776bbec"
          }
        },
        "c2ed413a6d0b4e3aa6355cfbdae0b671": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "ProgressView",
            "style": "IPY_MODEL_4eb6de456cab454582018b0737cb1037",
            "_dom_classes": [],
            "description": "",
            "_model_name": "FloatProgressModel",
            "bar_style": "success",
            "max": 138,
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": 138,
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "orientation": "horizontal",
            "min": 0,
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_127113fa48ec425ea61f2ce5336a6304"
          }
        },
        "770da7de020b43b099c6dd2e0ec672de": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_f871f9e4240a4d24a6c2aff0b271d899",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": " 138/138 [00:00&lt;00:00, 5.74kB/s]",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_0142e3dbe0094966a293f21278e9fdec"
          }
        },
        "6b870d609c1e4858beb8c10bce86b326": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "9af5988a93a3439c828011ec6776bbec": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "4eb6de456cab454582018b0737cb1037": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "ProgressStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "bar_color": null,
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "127113fa48ec425ea61f2ce5336a6304": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "f871f9e4240a4d24a6c2aff0b271d899": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "0142e3dbe0094966a293f21278e9fdec": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "165bdf8275f948529e1064426d3c452c": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HBoxView",
            "_dom_classes": [],
            "_model_name": "HBoxModel",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "box_style": "",
            "layout": "IPY_MODEL_9335982b07694e008df39007e677ce61",
            "_model_module": "@jupyter-widgets/controls",
            "children": [
              "IPY_MODEL_223f754aabc144caaf561448da376408",
              "IPY_MODEL_2108c2ac1d8f4c9b87d3a45aa77174e4",
              "IPY_MODEL_0e4ff28bdc25466e90376086424dde9a"
            ]
          }
        },
        "9335982b07694e008df39007e677ce61": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "223f754aabc144caaf561448da376408": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_74eb167f5fab41419655b5dfc00c95ba",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": "Downloading: 100%",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_7bc10533dec647499216a18f4ff26fb1"
          }
        },
        "2108c2ac1d8f4c9b87d3a45aa77174e4": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "ProgressView",
            "style": "IPY_MODEL_48ea302cad404bb384a28f0e8f0c9c50",
            "_dom_classes": [],
            "description": "",
            "_model_name": "FloatProgressModel",
            "bar_style": "success",
            "max": 1487,
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": 1487,
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "orientation": "horizontal",
            "min": 0,
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_6468a57a5a57402a8bb1f332e476852d"
          }
        },
        "0e4ff28bdc25466e90376086424dde9a": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_b916ea576bf048b68dc88bfb8a63dab0",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": " 1.45k/1.45k [00:00&lt;00:00, 61.6kB/s]",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_73b406b0c74b4f59962e96f438d56422"
          }
        },
        "74eb167f5fab41419655b5dfc00c95ba": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "7bc10533dec647499216a18f4ff26fb1": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "48ea302cad404bb384a28f0e8f0c9c50": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "ProgressStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "bar_color": null,
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "6468a57a5a57402a8bb1f332e476852d": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "b916ea576bf048b68dc88bfb8a63dab0": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "73b406b0c74b4f59962e96f438d56422": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "878f1fa01d534c4791aa578066d14c54": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HBoxView",
            "_dom_classes": [],
            "_model_name": "HBoxModel",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "box_style": "",
            "layout": "IPY_MODEL_0d9839362fa54ee6af35cfc82f7623e3",
            "_model_module": "@jupyter-widgets/controls",
            "children": [
              "IPY_MODEL_9afb42e9e4d14cbdbeda0cad2b5262fe",
              "IPY_MODEL_77c9354d0c5c4cd2ae2e887e386405d2",
              "IPY_MODEL_31e9de950ea84d6b9f03525f5711ae27"
            ]
          }
        },
        "0d9839362fa54ee6af35cfc82f7623e3": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "9afb42e9e4d14cbdbeda0cad2b5262fe": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_02e957ea74674c62be68be9191b831f4",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": "Downloading: 100%",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_14b6eb78e78d431789b3dfb0db643c51"
          }
        },
        "77c9354d0c5c4cd2ae2e887e386405d2": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "ProgressView",
            "style": "IPY_MODEL_0dcb882e268541c588a85d4e55a816db",
            "_dom_classes": [],
            "description": "",
            "_model_name": "FloatProgressModel",
            "bar_style": "success",
            "max": 292,
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": 292,
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "orientation": "horizontal",
            "min": 0,
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_c19992212e9145a9b8c2d7c79771a202"
          }
        },
        "31e9de950ea84d6b9f03525f5711ae27": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_4cc37056b58c4d8ebcd1f6df32e02c38",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": " 292/292 [00:00&lt;00:00, 12.1kB/s]",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_3d51b3ec19da416ca3d0ab3cba72a95c"
          }
        },
        "02e957ea74674c62be68be9191b831f4": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "14b6eb78e78d431789b3dfb0db643c51": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "0dcb882e268541c588a85d4e55a816db": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "ProgressStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "bar_color": null,
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "c19992212e9145a9b8c2d7c79771a202": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "4cc37056b58c4d8ebcd1f6df32e02c38": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "3d51b3ec19da416ca3d0ab3cba72a95c": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "665e535c7f6a4d64a3611de1385e452a": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HBoxView",
            "_dom_classes": [],
            "_model_name": "HBoxModel",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "box_style": "",
            "layout": "IPY_MODEL_a614dca8ab6a43b5b66ed0d34a5f139b",
            "_model_module": "@jupyter-widgets/controls",
            "children": [
              "IPY_MODEL_258ca760305f4e49925ff3dc7547b131",
              "IPY_MODEL_2421b997a20741bd8293ae45dbbd15fc",
              "IPY_MODEL_2954a506d2a34bab82206a86b46fc797"
            ]
          }
        },
        "a614dca8ab6a43b5b66ed0d34a5f139b": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "258ca760305f4e49925ff3dc7547b131": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_008fbc8313e54087b7c66feee4ee5a62",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": "Downloading: 100%",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_8a83e460cd88418abc7c9cec7c18bee7"
          }
        },
        "2421b997a20741bd8293ae45dbbd15fc": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "ProgressView",
            "style": "IPY_MODEL_ada316855dc9488d870209945179b864",
            "_dom_classes": [],
            "description": "",
            "_model_name": "FloatProgressModel",
            "bar_style": "success",
            "max": 85,
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": 85,
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "orientation": "horizontal",
            "min": 0,
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_533285b01cd54a57ac9c0e00d784b959"
          }
        },
        "2954a506d2a34bab82206a86b46fc797": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_7d75f7a490d5465482b7bc8f45cf89fc",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": " 85.0/85.0 [00:00&lt;00:00, 2.17kB/s]",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_256be28c5fe147969ed8fa29838d5c28"
          }
        },
        "008fbc8313e54087b7c66feee4ee5a62": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "8a83e460cd88418abc7c9cec7c18bee7": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "ada316855dc9488d870209945179b864": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "ProgressStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "bar_color": null,
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "533285b01cd54a57ac9c0e00d784b959": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "7d75f7a490d5465482b7bc8f45cf89fc": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "256be28c5fe147969ed8fa29838d5c28": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "d81e5dc024bf402588dfb0b369788337": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HBoxView",
            "_dom_classes": [],
            "_model_name": "HBoxModel",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "box_style": "",
            "layout": "IPY_MODEL_3631c9c75b6841b3937e3d0664787892",
            "_model_module": "@jupyter-widgets/controls",
            "children": [
              "IPY_MODEL_5aac2516a9d14e028370dca43063fae6",
              "IPY_MODEL_515065138b1d4ba29a777eced14720d1",
              "IPY_MODEL_5fc9e7f3ee6d4d1fb5fd26239810f432"
            ]
          }
        },
        "3631c9c75b6841b3937e3d0664787892": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "5aac2516a9d14e028370dca43063fae6": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_96c10e1c32eb4be4a062cc59bcc4a52b",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": "Downloading: 100%",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_e38d31ab8ced4717b783a4ed536533aa"
          }
        },
        "515065138b1d4ba29a777eced14720d1": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "ProgressView",
            "style": "IPY_MODEL_2651917c8caa41deb59b391026303884",
            "_dom_classes": [],
            "description": "",
            "_model_name": "FloatProgressModel",
            "bar_style": "success",
            "max": 3850480983,
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": 3850480983,
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "orientation": "horizontal",
            "min": 0,
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_c5b0a68ebd8d4f66a80b51e06d6c7d45"
          }
        },
        "5fc9e7f3ee6d4d1fb5fd26239810f432": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_ac6841c42dc24fac94ddf2e42dae5358",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": " 3.59G/3.59G [01:09&lt;00:00, 58.9MB/s]",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_8867dfc07002499ca5a7fa11a6cd5c4b"
          }
        },
        "96c10e1c32eb4be4a062cc59bcc4a52b": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "e38d31ab8ced4717b783a4ed536533aa": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "2651917c8caa41deb59b391026303884": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "ProgressStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "bar_color": null,
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "c5b0a68ebd8d4f66a80b51e06d6c7d45": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "ac6841c42dc24fac94ddf2e42dae5358": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "8867dfc07002499ca5a7fa11a6cd5c4b": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "340c14d11c2d4d34842d6141d665b7be": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HBoxView",
            "_dom_classes": [],
            "_model_name": "HBoxModel",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "box_style": "",
            "layout": "IPY_MODEL_eb5bec651ba14e9880dc130a9e7a7d74",
            "_model_module": "@jupyter-widgets/controls",
            "children": [
              "IPY_MODEL_6bcb31cf856449a78a2581ba003de6e5",
              "IPY_MODEL_bc9068b4129d4951b2eab47ce3537895",
              "IPY_MODEL_78344bd4fe31446f9587939877eda929"
            ]
          }
        },
        "eb5bec651ba14e9880dc130a9e7a7d74": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "6bcb31cf856449a78a2581ba003de6e5": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_9aa182af520d4ae6905861ff9b9328fa",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": "Main Proc: \t: 100%",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_8b22d9eb337e4c988b2c8ff4320be5a2"
          }
        },
        "bc9068b4129d4951b2eab47ce3537895": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "ProgressView",
            "style": "IPY_MODEL_a93fd56343754ff8992b90cadecf7e20",
            "_dom_classes": [],
            "description": "",
            "_model_name": "FloatProgressModel",
            "bar_style": "success",
            "max": 1,
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": 1,
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "orientation": "horizontal",
            "min": 0,
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_495deefc6f4e4cdea6acc148a8f445b8"
          }
        },
        "78344bd4fe31446f9587939877eda929": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_a41e102af7b84ca1a91c3284ee618393",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": " 1/1 [05:55&lt;00:00, 355.46s/it]",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_8c28eec252324950ac0a8306b84764e4"
          }
        },
        "9aa182af520d4ae6905861ff9b9328fa": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "8b22d9eb337e4c988b2c8ff4320be5a2": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "a93fd56343754ff8992b90cadecf7e20": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "ProgressStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "bar_color": null,
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "495deefc6f4e4cdea6acc148a8f445b8": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "a41e102af7b84ca1a91c3284ee618393": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "8c28eec252324950ac0a8306b84764e4": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "a3cbe27c93e942d7902f9f2c7cfb7fc3": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HBoxView",
            "_dom_classes": [],
            "_model_name": "HBoxModel",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "box_style": "",
            "layout": "IPY_MODEL_06f724939ec241afb5d6d2d4a047d0fc",
            "_model_module": "@jupyter-widgets/controls",
            "children": [
              "IPY_MODEL_c8ab64b0c8c34881ac77c28f0a1b548e",
              "IPY_MODEL_6f333b1f92a44532a5e7c9afe3d5a56c",
              "IPY_MODEL_44d7ef781f974853873e60b28a8ccaf4"
            ]
          }
        },
        "06f724939ec241afb5d6d2d4a047d0fc": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "c8ab64b0c8c34881ac77c28f0a1b548e": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_7062a93bea06440792ca2192908a1f75",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": "transcribing /content/Borat%20Cul...:\t: 100%",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_eed4f35ba61a4a74aa19390d9cef813f"
          }
        },
        "6f333b1f92a44532a5e7c9afe3d5a56c": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "ProgressView",
            "style": "IPY_MODEL_c6a0abc706ef41dc8315a8ef1d41a903",
            "_dom_classes": [],
            "description": "",
            "_model_name": "FloatProgressModel",
            "bar_style": "success",
            "max": 336,
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": 336,
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "orientation": "horizontal",
            "min": 0,
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_a142afad3c21410094e463939bf346b6"
          }
        },
        "44d7ef781f974853873e60b28a8ccaf4": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_0fd569696a4b46ce9836c348c686f456",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": " 336/336 [05:43&lt;00:00,  1.28it/s]",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_a796474b311648e48ceb0e9174841608"
          }
        },
        "7062a93bea06440792ca2192908a1f75": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "eed4f35ba61a4a74aa19390d9cef813f": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "c6a0abc706ef41dc8315a8ef1d41a903": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "ProgressStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "bar_color": null,
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "a142afad3c21410094e463939bf346b6": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "0fd569696a4b46ce9836c348c686f456": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "a796474b311648e48ceb0e9174841608": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "75aa4b6a3ea54c3d988844c3273de063": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HBoxView",
            "_dom_classes": [],
            "_model_name": "HBoxModel",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "box_style": "",
            "layout": "IPY_MODEL_444ce3a2deab41ce8b66c9159523e681",
            "_model_module": "@jupyter-widgets/controls",
            "children": [
              "IPY_MODEL_f4fcb28aa63d44dd85098aa010296637",
              "IPY_MODEL_9684e2e83c9f4197849004e9fb70a643",
              "IPY_MODEL_1fa74bf1f8074a76a81cb8867b8e35ee"
            ]
          }
        },
        "444ce3a2deab41ce8b66c9159523e681": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "f4fcb28aa63d44dd85098aa010296637": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_a5a14bce22e845b0a6db048fc1b5c7c6",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": "spellcorrect_pipeline on transcriptions: 100%",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_e4cd028b52fc4c0fa821bb67e0e0ae6d"
          }
        },
        "9684e2e83c9f4197849004e9fb70a643": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "ProgressView",
            "style": "IPY_MODEL_7cbae2ece5b34951b86a01cd77cf7c51",
            "_dom_classes": [],
            "description": "",
            "_model_name": "FloatProgressModel",
            "bar_style": "success",
            "max": 1,
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": 1,
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "orientation": "horizontal",
            "min": 0,
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_1f364835b62c46e3a58141d21230f89a"
          }
        },
        "1fa74bf1f8074a76a81cb8867b8e35ee": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "HTMLView",
            "style": "IPY_MODEL_77331dbc2ce047c99ba41ac8272fe394",
            "_dom_classes": [],
            "description": "",
            "_model_name": "HTMLModel",
            "placeholder": "",
            "_view_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "value": " 1/1 [01:24&lt;00:00, 84.90s/it]",
            "_view_count": null,
            "_view_module_version": "1.5.0",
            "description_tooltip": null,
            "_model_module": "@jupyter-widgets/controls",
            "layout": "IPY_MODEL_f8dc12262bc44094898d0eeffda14386"
          }
        },
        "a5a14bce22e845b0a6db048fc1b5c7c6": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "e4cd028b52fc4c0fa821bb67e0e0ae6d": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "7cbae2ece5b34951b86a01cd77cf7c51": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "ProgressStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "bar_color": null,
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "1f364835b62c46e3a58141d21230f89a": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        },
        "77331dbc2ce047c99ba41ac8272fe394": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_view_name": "StyleView",
            "_model_name": "DescriptionStyleModel",
            "description_width": "",
            "_view_module": "@jupyter-widgets/base",
            "_model_module_version": "1.5.0",
            "_view_count": null,
            "_view_module_version": "1.2.0",
            "_model_module": "@jupyter-widgets/controls"
          }
        },
        "f8dc12262bc44094898d0eeffda14386": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_view_name": "LayoutView",
            "grid_template_rows": null,
            "right": null,
            "justify_content": null,
            "_view_module": "@jupyter-widgets/base",
            "overflow": null,
            "_model_module_version": "1.2.0",
            "_view_count": null,
            "flex_flow": null,
            "width": null,
            "min_width": null,
            "border": null,
            "align_items": null,
            "bottom": null,
            "_model_module": "@jupyter-widgets/base",
            "top": null,
            "grid_column": null,
            "overflow_y": null,
            "overflow_x": null,
            "grid_auto_flow": null,
            "grid_area": null,
            "grid_template_columns": null,
            "flex": null,
            "_model_name": "LayoutModel",
            "justify_items": null,
            "grid_row": null,
            "max_height": null,
            "align_content": null,
            "visibility": null,
            "align_self": null,
            "height": null,
            "min_height": null,
            "padding": null,
            "grid_auto_rows": null,
            "grid_gap": null,
            "max_width": null,
            "order": null,
            "_view_module_version": "1.2.0",
            "grid_template_areas": null,
            "object_position": null,
            "object_fit": null,
            "grid_auto_columns": null,
            "margin": null,
            "display": null,
            "left": null
          }
        }
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}