smallBERTa_Pretraining.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "smallBERTa_Pretraining.ipynb",
"provenance": [],
"collapsed_sections": [],
"toc_visible": true,
"authorship_tag": "ABX9TyPqbqeA0VB70ho26cHWVp02",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/aditya-malte/2d4f896f471be9c38eb4d723a710768b/smallberta_pretraining.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "V4OynugZvMG2",
"colab_type": "text"
},
"source": [
"# Pre-training SmallBERTa - A tiny model to train on a tiny dataset\n",
"(Using HuggingFace Transformers)<br>\n",
"Admittedly, while language modeling is associated with terabytes of data, not all of use have either the processing power nor the resources to train huge models on such huge amounts of data.\n",
"In this example, we are going to train a relatively small neural net on a small dataset (which still happens to have over 2M rows).\n",
"<br>\n",
"\n",
"The ***main purpose*** of this blog is not to achieve state-of-the-art performance on LM tasks but to show a simple idea of how the recent language_modeling.py script can be used to train a Transformer model from scratch.\n",
"\n",
"This very notebook can be extended to various esoteric use cases where general purpose pre-trained models fail to perform well. Examples include medical dataset, scientific literature, legal documentation, etc.\n",
"\n",
"Input:\n",
" 1. To the Tokenizer:<br>\n",
" LM data in a directory containing all samples in separate *.txt files.\n",
" \n",
" 2. To the Model:<br>\n",
" LM data split into:<br>\n",
" 1. train.txt <br>\n",
" 2. eval.txt \n",
" \n",
"Output:<br>\n",
" Trained Model weights(that can be used elsewhere) and Tensorboard logs"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5sHQ_tWig474",
"colab_type": "text"
},
"source": [
"## Install Dependencies"
]
},
{
"cell_type": "code",
"metadata": {
"id": "hPxoElNugaMu",
"colab_type": "code",
"outputId": "705e0776-70b3-4d51-a50c-b67e5f639997",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"#tokenizer working version --- 0.5.0\n",
"#transformer working version --- 2.5.0\n",
"!pip install transformers\n",
"!pip install tokenizers\n",
"!pip install tensorboard==2.1.0"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Collecting transformers\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/04/58/3d789b98923da6485f376be1e04d59ad7003a63bdb2b04b5eea7e02857e5/transformers-2.5.0-py3-none-any.whl (481kB)\n",
"\r\u001b[K |▊ | 10kB 21.9MB/s eta 0:00:01\r\u001b[K |█▍ | 20kB 29.1MB/s eta 0:00:01\r\u001b[K |██ | 30kB 24.6MB/s eta 0:00:01\r\u001b[K |██▊ | 40kB 19.1MB/s eta 0:00:01\r\u001b[K |███▍ | 51kB 15.6MB/s eta 0:00:01\r\u001b[K |████ | 61kB 15.4MB/s eta 0:00:01\r\u001b[K |████▊ | 71kB 13.4MB/s eta 0:00:01\r\u001b[K |█████▍ | 81kB 12.9MB/s eta 0:00:01\r\u001b[K |██████▏ | 92kB 12.7MB/s eta 0:00:01\r\u001b[K |██████▉ | 102kB 12.9MB/s eta 0:00:01\r\u001b[K |███████▌ | 112kB 12.9MB/s eta 0:00:01\r\u001b[K |████████▏ | 122kB 12.9MB/s eta 0:00:01\r\u001b[K |████████▉ | 133kB 12.9MB/s eta 0:00:01\r\u001b[K |█████████▌ | 143kB 12.9MB/s eta 0:00:01\r\u001b[K |██████████▏ | 153kB 12.9MB/s eta 0:00:01\r\u001b[K |██████████▉ | 163kB 12.9MB/s eta 0:00:01\r\u001b[K |███████████▋ | 174kB 12.9MB/s eta 0:00:01\r\u001b[K |████████████▎ | 184kB 12.9MB/s eta 0:00:01\r\u001b[K |█████████████ | 194kB 12.9MB/s eta 0:00:01\r\u001b[K |█████████████▋ | 204kB 12.9MB/s eta 0:00:01\r\u001b[K |██████████████▎ | 215kB 12.9MB/s eta 0:00:01\r\u001b[K |███████████████ | 225kB 12.9MB/s eta 0:00:01\r\u001b[K |███████████████▋ | 235kB 12.9MB/s eta 0:00:01\r\u001b[K |████████████████▎ | 245kB 12.9MB/s eta 0:00:01\r\u001b[K |█████████████████ | 256kB 12.9MB/s eta 0:00:01\r\u001b[K |█████████████████▊ | 266kB 12.9MB/s eta 0:00:01\r\u001b[K |██████████████████▍ | 276kB 12.9MB/s eta 0:00:01\r\u001b[K |███████████████████ | 286kB 12.9MB/s eta 0:00:01\r\u001b[K |███████████████████▊ | 296kB 12.9MB/s eta 0:00:01\r\u001b[K |████████████████████▍ | 307kB 12.9MB/s eta 0:00:01\r\u001b[K |█████████████████████ | 317kB 12.9MB/s eta 0:00:01\r\u001b[K |█████████████████████▊ | 327kB 12.9MB/s eta 0:00:01\r\u001b[K |██████████████████████▌ | 337kB 12.9MB/s eta 0:00:01\r\u001b[K |███████████████████████▏ | 348kB 12.9MB/s eta 0:00:01\r\u001b[K |███████████████████████▉ | 358kB 12.9MB/s eta 0:00:01\r\u001b[K |████████████████████████▌ | 368kB 12.9MB/s eta 0:00:01\r\u001b[K |█████████████████████████▏ | 378kB 12.9MB/s eta 
0:00:01\r\u001b[K |█████████████████████████▉ | 389kB 12.9MB/s eta 0:00:01\r\u001b[K |██████████████████████████▌ | 399kB 12.9MB/s eta 0:00:01\r\u001b[K |███████████████████████████▏ | 409kB 12.9MB/s eta 0:00:01\r\u001b[K |████████████████████████████ | 419kB 12.9MB/s eta 0:00:01\r\u001b[K |████████████████████████████▋ | 430kB 12.9MB/s eta 0:00:01\r\u001b[K |█████████████████████████████▎ | 440kB 12.9MB/s eta 0:00:01\r\u001b[K |██████████████████████████████ | 450kB 12.9MB/s eta 0:00:01\r\u001b[K |██████████████████████████████▋ | 460kB 12.9MB/s eta 0:00:01\r\u001b[K |███████████████████████████████▎| 471kB 12.9MB/s eta 0:00:01\r\u001b[K |████████████████████████████████| 481kB 12.9MB/s eta 0:00:01\r\u001b[K |████████████████████████████████| 491kB 12.9MB/s \n",
"\u001b[?25hCollecting tokenizers==0.5.0\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/7e/1d/ea7e2c628942e686595736f73678348272120d026b7acd54fe43e5211bb1/tokenizers-0.5.0-cp36-cp36m-manylinux1_x86_64.whl (3.8MB)\n",
"\u001b[K |████████████████████████████████| 3.8MB 51.0MB/s \n",
"\u001b[?25hCollecting sacremoses\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/a6/b4/7a41d630547a4afd58143597d5a49e07bfd4c42914d8335b2a5657efc14b/sacremoses-0.0.38.tar.gz (860kB)\n",
"\u001b[K |████████████████████████████████| 870kB 51.0MB/s \n",
"\u001b[?25hRequirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from transformers) (2.21.0)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.6/dist-packages (from transformers) (3.0.12)\n",
"Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from transformers) (1.17.5)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.6/dist-packages (from transformers) (4.28.1)\n",
"Requirement already satisfied: boto3 in /usr/local/lib/python3.6/dist-packages (from transformers) (1.11.15)\n",
"Collecting sentencepiece\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/74/f4/2d5214cbf13d06e7cb2c20d84115ca25b53ea76fa1f0ade0e3c9749de214/sentencepiece-0.1.85-cp36-cp36m-manylinux1_x86_64.whl (1.0MB)\n",
"\u001b[K |████████████████████████████████| 1.0MB 50.1MB/s \n",
"\u001b[?25hRequirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.6/dist-packages (from transformers) (2019.12.20)\n",
"Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (1.12.0)\n",
"Requirement already satisfied: click in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (7.0)\n",
"Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (0.14.1)\n",
"Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (1.24.3)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2019.11.28)\n",
"Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2.8)\n",
"Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (3.0.4)\n",
"Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/dist-packages (from boto3->transformers) (0.9.4)\n",
"Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /usr/local/lib/python3.6/dist-packages (from boto3->transformers) (0.3.3)\n",
"Requirement already satisfied: botocore<1.15.0,>=1.14.15 in /usr/local/lib/python3.6/dist-packages (from boto3->transformers) (1.14.15)\n",
"Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.6/dist-packages (from botocore<1.15.0,>=1.14.15->boto3->transformers) (2.6.1)\n",
"Requirement already satisfied: docutils<0.16,>=0.10 in /usr/local/lib/python3.6/dist-packages (from botocore<1.15.0,>=1.14.15->boto3->transformers) (0.15.2)\n",
"Building wheels for collected packages: sacremoses\n",
" Building wheel for sacremoses (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for sacremoses: filename=sacremoses-0.0.38-cp36-none-any.whl size=884628 sha256=bfd64cc598a7e475f655abf031d4190a57d3ca64431f51d59dfb570f216a77f8\n",
" Stored in directory: /root/.cache/pip/wheels/6d/ec/1a/21b8912e35e02741306f35f66c785f3afe94de754a0eaf1422\n",
"Successfully built sacremoses\n",
"Installing collected packages: tokenizers, sacremoses, sentencepiece, transformers\n",
"Successfully installed sacremoses-0.0.38 sentencepiece-0.1.85 tokenizers-0.5.0 transformers-2.5.0\n",
"Requirement already satisfied: tokenizers in /usr/local/lib/python3.6/dist-packages (0.5.0)\n",
"Collecting tensorboard==2.1.0\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/40/23/53ffe290341cd0855d595b0a2e7485932f473798af173bbe3a584b99bb06/tensorboard-2.1.0-py3-none-any.whl (3.8MB)\n",
"\u001b[K |████████████████████████████████| 3.8MB 27.7MB/s \n",
"\u001b[?25hRequirement already satisfied: grpcio>=1.24.3 in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (1.27.1)\n",
"Requirement already satisfied: numpy>=1.12.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (1.17.5)\n",
"Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (0.4.1)\n",
"Requirement already satisfied: protobuf>=3.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (3.10.0)\n",
"Requirement already satisfied: wheel>=0.26; python_version >= \"3\" in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (0.34.2)\n",
"Requirement already satisfied: absl-py>=0.4 in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (0.9.0)\n",
"Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (1.12.0)\n",
"Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (45.1.0)\n",
"Requirement already satisfied: google-auth<2,>=1.6.3 in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (1.7.2)\n",
"Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (2.21.0)\n",
"Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (1.0.0)\n",
"Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard==2.1.0) (3.2.1)\n",
"Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard==2.1.0) (1.3.0)\n",
"Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.6/dist-packages (from google-auth<2,>=1.6.3->tensorboard==2.1.0) (0.2.8)\n",
"Requirement already satisfied: cachetools<3.2,>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from google-auth<2,>=1.6.3->tensorboard==2.1.0) (3.1.1)\n",
"Requirement already satisfied: rsa<4.1,>=3.1.4 in /usr/local/lib/python3.6/dist-packages (from google-auth<2,>=1.6.3->tensorboard==2.1.0) (4.0)\n",
"Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests<3,>=2.21.0->tensorboard==2.1.0) (2.8)\n",
"Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests<3,>=2.21.0->tensorboard==2.1.0) (1.24.3)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests<3,>=2.21.0->tensorboard==2.1.0) (2019.11.28)\n",
"Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests<3,>=2.21.0->tensorboard==2.1.0) (3.0.4)\n",
"Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.6/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard==2.1.0) (3.1.0)\n",
"Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.6/dist-packages (from pyasn1-modules>=0.2.1->google-auth<2,>=1.6.3->tensorboard==2.1.0) (0.4.8)\n",
"\u001b[31mERROR: tensorflow 1.15.0 has requirement tensorboard<1.16.0,>=1.15.0, but you'll have tensorboard 2.1.0 which is incompatible.\u001b[0m\n",
"Installing collected packages: tensorboard\n",
" Found existing installation: tensorboard 1.15.0\n",
" Uninstalling tensorboard-1.15.0:\n",
" Successfully uninstalled tensorboard-1.15.0\n",
"Successfully installed tensorboard-2.1.0\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cBcbCQoEg9cT",
"colab_type": "text"
},
"source": [
"## Fetch Data\n",
"We will be using a tiny dataset(The Examiner - SpamClickBait News) of around 3M rows from kaggle to train our model. The dataset also contains output labels which will be dropped and only the text shall be used. For convenience we are using the Kaggle API to direcltly download the data from Kaggle to save our time and efforts. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "AtFnApKwiGUb",
"colab_type": "code",
"outputId": "99c4c4e6-147a-46ae-91da-d89a148a6c0c",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 169
}
},
"source": [
"import os\n",
"import getpass\n",
"\n",
"#For a kaggle username & key, just go to your kaggle account and generate key\n",
"#The JSON file so downloaded contains both of them\n",
"if(\"examine-the-examiner.zip\" not in os.listdir()):\n",
" print(\"Copy these two values from the JSON file so generated\")\n",
" os.environ['KAGGLE_USERNAME'] = getpass.getpass(prompt='Kaggle username: ') \n",
" os.environ['KAGGLE_KEY'] = getpass.getpass(prompt='Kaggle key: ')\n",
" !kaggle datasets download -d therohk/examine-the-examiner\n",
" !unzip /content/examine-the-examiner.zip"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Copy these two values from the JSON file so generated\n",
"Kaggle username: ··········\n",
"Kaggle key: ··········\n",
"Downloading examine-the-examiner.zip to /content\n",
" 86% 123M/142M [00:00<00:00, 132MB/s]\n",
"100% 142M/142M [00:00<00:00, 163MB/s]\n",
"Archive: /content/examine-the-examiner.zip\n",
" inflating: examiner-date-text.csv \n",
" inflating: examiner-date-tokens.csv \n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IQ7hj9kuhBIj",
"colab_type": "text"
},
"source": [
"## Load and Preprocess data"
]
},
{
"cell_type": "code",
"metadata": {
"id": "HOG-fl1cGhJ4",
"colab_type": "code",
"colab": {}
},
"source": [
"import regex as re\n",
"def basicPreprocess(text):\n",
" try:\n",
" processed_text = text.lower()\n",
" processed_text = re.sub(r'\\W +', ' ', processed_text)\n",
" except Exception as e:\n",
" print(\"Exception:\",e,\",on text:\", text)\n",
" return None\n",
" return processed_text"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Fn68O17MsqYp",
"colab_type": "code",
"colab": {}
},
"source": [
"import pandas as pd\n",
"from tqdm import tqdm"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "iUtf-gZ_hWEE",
"colab_type": "text"
},
"source": [
"## Read and Prune the data\n",
"For our purpose we are going to read a subset (~200,000 samples) to train, just to see results quickly. Feel free to increase (or remove) this limitation. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "bj7Bo6hMiySr",
"colab_type": "code",
"outputId": "0886be29-864c-4e12-b4b0-28c4e23f88f1",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 253
}
},
"source": [
"data = pd.read_csv(\"/content/examiner-date-text.csv\")\n",
"print(data)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
" publish_date headline_text\n",
"0 20100101 100 Most Anticipated books releasing in 2010\n",
"1 20100101 10 best films of 2009 - What's on your list?\n",
"2 20100101 10 days of free admission at Lan Su Chinese Ga...\n",
"3 20100101 10 PlayStation games to watch out for in 2010\n",
"4 20100101 10 resolutions for a Happy New Year for you an...\n",
"... ... ...\n",
"3089776 20151231 Which is better investment, Lego bricks or gol...\n",
"3089777 20151231 Wild score three unanswered goals to defeat th...\n",
"3089778 20151231 With NASA and Russia on the sidelines, Europe ...\n",
"3089779 20151231 Wolf Pack battling opponents, officials on the...\n",
"3089780 20151231 Writespace hosts all genre open mic night\n",
"\n",
"[3089781 rows x 2 columns]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "1JaLDYtAnZIP",
"colab_type": "code",
"colab": {}
},
"source": [
"data = data.sample(frac=1).sample(frac=1)\n",
"data = data[:200000]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "qYYUOhiXhHP8",
"colab_type": "text"
},
"source": [
"### Before Preprocessing "
]
},
{
"cell_type": "code",
"metadata": {
"id": "8STrareTIxox",
"colab_type": "code",
"outputId": "3553e801-b2b4-463a-c7c2-03e1d1b6c500",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 253
}
},
"source": [
"print(data)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
" publish_date headline_text\n",
"618246 20100816 Triangle UFO low and silent over rural Deansbo...\n",
"1794117 20120420 Kevin Hart and 'Think Like a Man' co-stars lea...\n",
"3053438 20150920 Uma Thurman custody battle finally settled wit...\n",
"180273 20100313 Legislator confident of Health Care bill\n",
"938083 20101228 McDonald's ad in Spanish, provoking sparks\n",
"... ... ...\n",
"1737672 20120319 Washington Post: Obama has been lying to Ameri...\n",
"1780904 20120413 California retiree collects $227k Mega Million...\n",
"1614310 20120105 This Weekend at Miami Science Museum Laser Show\n",
"1565925 20111205 December 12th is National Poinsettia Day\n",
"1358212 20110731 Spartans' Cousins gives stirring, thought-prov...\n",
"\n",
"[200000 rows x 2 columns]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "Z8md5U5tGx1J",
"colab_type": "code",
"colab": {}
},
"source": [
"data[\"headline_text\"] = data[\"headline_text\"].apply(basicPreprocess).dropna() #ignore exception if for empty/nan values"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "DCzsk_sVhLsi",
"colab_type": "text"
},
"source": [
"### After Preprocessing"
]
},
{
"cell_type": "code",
"metadata": {
"id": "wV8ysU3cI1a-",
"colab_type": "code",
"outputId": "b3962d3a-6d6a-468b-9534-d9d1dd6f457b",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 253
}
},
"source": [
"print(data)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
" publish_date headline_text\n",
"618246 20100816 triangle ufo low and silent over rural deansbo...\n",
"1794117 20120420 kevin hart and 'think like a man co-stars lear...\n",
"3053438 20150920 uma thurman custody battle finally settled wit...\n",
"180273 20100313 legislator confident of health care bill\n",
"938083 20101228 mcdonald's ad in spanish provoking sparks\n",
"... ... ...\n",
"1737672 20120319 washington post obama has been lying to americ...\n",
"1780904 20120413 california retiree collects $227k mega million...\n",
"1614310 20120105 this weekend at miami science museum laser show\n",
"1565925 20111205 december 12th is national poinsettia day\n",
"1358212 20110731 spartans cousins gives stirring thought-provok...\n",
"\n",
"[200000 rows x 2 columns]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dbp40Xkrhs8l",
"colab_type": "text"
},
"source": [
"Removing newline characters just in case the input text has them. This is because the LineByLine class that we are going to use later assumes that samples are separated by newline"
]
},
{
"cell_type": "code",
"metadata": {
"id": "9dBFTDQnjXnE",
"colab_type": "code",
"colab": {}
},
"source": [
"data = data[\"headline_text\"]\n",
"data = data.replace(\"\\n\",\" \")"
],
"execution_count": 0,
"outputs": []
},
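{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"Why this matters: a line-by-line reader treats every line as one sample, so a newline inside a headline would split it into two samples. The cell below is a minimal sketch on a toy Series (the two strings are made up, not taken from the dataset) showing substring replacement with `Series.str.replace`."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"import pandas as pd\n",
"\n",
"# Toy data (hypothetical): the first entry contains an embedded newline\n",
"toy = pd.Series([\"first line\\nsecond line\", \"no newline here\"])\n",
"\n",
"# .str.replace works on substrings inside each string\n",
"cleaned = toy.str.replace(\"\\n\", \" \", regex=False)\n",
"print(cleaned.tolist())"
],
"execution_count": 0,
"outputs": []
},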
{
"cell_type": "markdown",
"metadata": {
"id": "gI1Tp54IiVBj",
"colab_type": "text"
},
"source": [
"## Train a custom tokenizer\n",
"I have used a ByteLevelBPETokenizer just to prevent \\<unk> tokens entirely.\n",
"Furthermore, the function used to train the tokenizer assumes that each sample is stored in a different text file."
]
},
{
"cell_type": "code",
"metadata": {
"id": "rs-wK-N1EACp",
"colab_type": "code",
"colab": {}
},
"source": [
"txt_files_dir = \"/tmp/text_split\"\n",
"!mkdir {txt_files_dir}"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "QIvCE_svi7sQ",
"colab_type": "text"
},
"source": [
"Split LM data into individual files. These files are stored in /tmp/text_split and are used to train the tokenizer **only**."
]
},
{
"cell_type": "code",
"metadata": {
"id": "_2oI92Z0tyAp",
"colab_type": "code",
"outputId": "022fc930-6312-4e83-eb4f-24b68e0b0394",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"i=0\n",
"for row in tqdm(data.to_list()):\n",
" file_name = os.path.join(txt_files_dir, str(i)+'.txt')\n",
" try:\n",
" f = open(file_name, 'w')\n",
" f.write(row)\n",
" f.close()\n",
" except Exception as e: #catch exceptions(for eg. empty rows)\n",
" print(row, e) \n",
" i+=1"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"100%|██████████| 200000/200000 [00:09<00:00, 20693.63it/s]\n"
],
"name": "stderr"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "3r6RuiCBXIJy",
"colab_type": "code",
"colab": {}
},
"source": [
"from pathlib import Path\n",
"from tokenizers import ByteLevelBPETokenizer\n",
"from tokenizers.processors import BertProcessing\n",
"\n",
"\n",
"paths = [str(x) for x in Path(txt_files_dir).glob(\"**/*.txt\")]\n",
"\n",
"# Initialize a tokenizer\n",
"tokenizer = ByteLevelBPETokenizer()\n",
"\n",
"vocab_size=5000\n",
"# Customize training\n",
"tokenizer.train(files=paths, vocab_size=vocab_size, min_frequency=5, special_tokens=[\n",
" \"<s>\",\n",
" \"<pad>\",\n",
" \"</s>\",\n",
" \"<unk>\",\n",
" \"<mask>\",\n",
"])"
],
"execution_count": 0,
"outputs": []
},
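{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"As a quick sanity check of the tokenizer trained above, the cell below encodes one sentence and prints the resulting tokens and ids. The sentence is made up for illustration; since the tokenizer is byte-level, unseen words should be broken into smaller pieces rather than mapped to \\<unk>."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"# Hypothetical sample sentence, not taken from the dataset\n",
"sample = \"local team wins the championship game\"\n",
"\n",
"encoding = tokenizer.encode(sample)\n",
"print(encoding.tokens) # sub-word pieces\n",
"print(encoding.ids) # their integer ids in the learned vocabulary"
],
"execution_count": 0,
"outputs": []
},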
{
"cell_type": "code",
"metadata": {
"id": "0bv78Z2UjIci",
"colab_type": "code",
"colab": {}
},
"source": [
"lm_data_dir = \"/tmp/lm_data\"\n",
"!mkdir {lm_data_dir}"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "sI5kEwUojOQo",
"colab_type": "text"
},
"source": [
"## Split into Valdation and Train set\n",
"We split the train data into validation and train. These two files are used to train and evaluate our model"
]
},
{
"cell_type": "code",
"metadata": {
"id": "2nWv7Yuki66k",
"colab_type": "code",
"colab": {}
},
"source": [
"train_split = 0.9\n",
"train_data_size = int(len(data)*train_split)\n",
"\n",
"with open(os.path.join(lm_data_dir,'train.txt') , 'w') as f:\n",
" for item in data[:train_data_size].tolist():\n",
" f.write(\"%s\\n\" % item)\n",
"\n",
"with open(os.path.join(lm_data_dir,'eval.txt') , 'w') as f:\n",
" for item in data[train_data_size:].tolist():\n",
" f.write(\"%s\\n\" % item)"
],
"execution_count": 0,
"outputs": []
},
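{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"A quick way to check the split is to count the lines in the two files: with `train_split = 0.9` they should be in a ratio of roughly 9:1, and the total should equal the number of samples (a larger total would hint at stray newlines inside samples). This is only a sketch that reuses `lm_data_dir` from the cell above."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"import os\n",
"\n",
"# Count the lines (i.e. samples) written to each split\n",
"for name in (\"train.txt\", \"eval.txt\"):\n",
"    with open(os.path.join(lm_data_dir, name)) as f:\n",
"        n_lines = sum(1 for _ in f)\n",
"    print(name, n_lines)"
],
"execution_count": 0,
"outputs": []
},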
{
"cell_type": "code",
"metadata": {
"id": "UKaVWBiVTtEO",
"colab_type": "code",
"colab": {}
},
"source": [
"!mkdir /content/models\n",
"!mkdir /content/models/smallBERTa"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "noQfBUkhJmFC",
"colab_type": "code",
"outputId": "9deb334f-d4ec-45e3-df14-1c048a4890ba",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 50
}
},
"source": [
"tokenizer.save(\"/content/models/smallBERTa\", \"smallBERTa\")"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['/content/models/smallBERTa/smallBERTa-vocab.json',\n",
" '/content/models/smallBERTa/smallBERTa-merges.txt']"
]
},
"metadata": {
"tags": []
},
"execution_count": 17
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "odSTiCM--4_p",
"colab_type": "code",
"colab": {}
},
"source": [
"!mv /content/models/smallBERTa/smallBERTa-vocab.json /content/models/smallBERTa/vocab.json\n",
"!mv /content/models/smallBERTa/smallBERTa-merges.txt /content/models/smallBERTa/merges.txt"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "naEJbZDjFnNo",
"colab_type": "code",
"colab": {}
},
"source": [
"train_path = os.path.join(lm_data_dir,\"train.txt\")\n",
"eval_path = os.path.join(lm_data_dir,\"eval.txt\")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "P91yVQkXj9rc",
"colab_type": "text"
},
"source": [
"## Set Model Configuration\n",
"For our purpose, we are training a very small model for demo purposes"
]
},
{
"cell_type": "code",
"metadata": {
"id": "XS4q1YtxZ2GW",
"colab_type": "code",
"colab": {}
},
"source": [
"import json\n",
"config = {\n",
" \"attention_probs_dropout_prob\": 0.1,\n",
" \"hidden_act\": \"gelu\",\n",
" \"hidden_dropout_prob\": 0.3,\n",
" \"hidden_size\": 128,\n",
" \"initializer_range\": 0.02,\n",
" \"num_attention_heads\": 1,\n",
" \"num_hidden_layers\": 1,\n",
" \"vocab_size\": vocab_size,\n",
" \"intermediate_size\": 256,\n",
" \"max_position_embeddings\": 256\n",
"}\n",
"with open(\"/content/models/smallBERTa/config.json\", 'w') as fp:\n",
" json.dump(config, fp)"
],
"execution_count": 0,
"outputs": []
},
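{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"To get a feel for how small this model is, here is a back-of-the-envelope count of the token embedding table alone, which has `vocab_size` x `hidden_size` entries. This is only a rough illustration; the full model has additional parameters (position embeddings, the attention and feed-forward layers, and so on)."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"# Token embedding table: one hidden_size-dimensional vector per vocabulary entry\n",
"embedding_params = config[\"vocab_size\"] * config[\"hidden_size\"]\n",
"print(\"Token embedding parameters:\", embedding_params)"
],
"execution_count": 0,
"outputs": []
},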
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "CbVBgrDbmVJ2",
"outputId": "160bd4f1-ae4b-474e-bb4f-19a8907d05e3",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 135
}
},
"source": [
"#%cd /content\n",
"!git clone https://github.com/huggingface/transformers.git"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Cloning into 'transformers'...\n",
"remote: Enumerating objects: 24, done.\u001b[K\n",
"remote: Counting objects: 100% (24/24), done.\u001b[K\n",
"remote: Compressing objects: 100% (23/23), done.\u001b[K\n",
"remote: Total 19858 (delta 5), reused 6 (delta 0), pack-reused 19834\u001b[K\n",
"Receiving objects: 100% (19858/19858), 11.95 MiB | 4.05 MiB/s, done.\n",
"Resolving deltas: 100% (14423/14423), done.\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EZMJ0zMxDIyc",
"colab_type": "text"
},
"source": [
"## Run training using the run_language_modeling.py examples script"
]
},
{
"cell_type": "code",
"metadata": {
"id": "4kvkxHIk2Vgn",
"colab_type": "code",
"outputId": "7dbd97f4-e05b-4158-86a5-083818c57082",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 304
}
},
"source": [
"!nvidia-smi #just to confirm that you are on a GPU, if not go to Runtime->Change Runtime"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Fri Feb 21 12:17:21 2020 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 440.48.02 Driver Version: 418.67 CUDA Version: 10.1 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla P4 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 41C P8 7W / 75W | 0MiB / 7611MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: GPU Memory |\n",
"| GPU PID Type Process name Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "Hk2MUnKFV58z",
"colab_type": "code",
"colab": {}
},
"source": [
"#Setting environment variables\n",
"os.environ[\"train_path\"] = train_path\n",
"os.environ[\"eval_path\"] = eval_path\n",
"os.environ[\"CUDA_LAUNCH_BLOCKING\"]='1' #Makes for easier debugging (just in case)\n",
"weights_dir = \"/content/models/smallBERTa/weights\"\n",
"!mkdir {weights_dir}"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "6UJ_BSAlmccq",
"colab_type": "code",
"colab": {}
},
"source": [
"cmd = '''python /content/transformers/examples/run_language_modeling.py --output_dir {0} \\\n",
" --model_type roberta \\\n",
" --mlm \\\n",
" --train_data_file {1} \\\n",
" --eval_data_file {2} \\\n",
" --config_name /content/models/smallBERTa \\\n",
" --tokenizer_name /content/models/smallBERTa \\\n",
" --do_train \\\n",
" --line_by_line \\\n",
" --overwrite_output_dir \\\n",
" --do_eval \\\n",
" --block_size 256 \\\n",
" --learning_rate 1e-4 \\\n",
" --num_train_epochs 5 \\\n",
" --save_total_limit 2 \\\n",
" --save_steps 2000 \\\n",
" --logging_steps 500 \\\n",
" --per_gpu_eval_batch_size 32 \\\n",
" --per_gpu_train_batch_size 32 \\\n",
" --evaluate_during_training \\\n",
" --seed 42 \\\n",
" '''.format(weights_dir, train_path, eval_path)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "jqhJzq03Fc15",
"colab_type": "code",
"outputId": "3a02319a-1040-457b-baf8-f5e4ed3c1e0e",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"!{cmd}"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"\u001b[1;30;43mStreaming output truncated to the last 5000 lines.\u001b[0m\n",
"Evaluating: 96% 598/625 [00:04<00:00, 124.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 98% 611/625 [00:04<00:00, 124.94it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:05<00:00, 126.93it/s]\u001b[A\u001b[A\n",
"\n",
"\u001b[A\u001b[A02/21/2020 12:30:10 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:30:10 - INFO - __main__ - perplexity = tensor(873.4072)\n",
"\n",
"Iteration: 11% 628/5625 [00:31<44:27, 1.87it/s]\u001b[A\n",
"Iteration: 11% 632/5625 [00:31<31:46, 2.62it/s]\u001b[A\n",
"Iteration: 11% 636/5625 [00:31<22:55, 3.63it/s]\u001b[A\n",
"Iteration: 11% 640/5625 [00:31<16:43, 4.97it/s]\u001b[A\n",
"Iteration: 11% 644/5625 [00:31<12:21, 6.72it/s]\u001b[A\n",
"Iteration: 12% 648/5625 [00:31<09:18, 8.91it/s]\u001b[A\n",
"Iteration: 12% 652/5625 [00:31<07:10, 11.54it/s]\u001b[A\n",
"Iteration: 12% 656/5625 [00:31<05:41, 14.56it/s]\u001b[A\n",
"Iteration: 12% 660/5625 [00:31<04:41, 17.63it/s]\u001b[A\n",
"Iteration: 12% 664/5625 [00:32<03:56, 20.97it/s]\u001b[A\n",
"Iteration: 12% 668/5625 [00:32<03:24, 24.25it/s]\u001b[A\n",
"Iteration: 12% 672/5625 [00:32<03:05, 26.70it/s]\u001b[A\n",
"Iteration: 12% 676/5625 [00:32<02:52, 28.77it/s]\u001b[A\n",
"Iteration: 12% 680/5625 [00:32<02:40, 30.77it/s]\u001b[A\n",
"Iteration: 12% 684/5625 [00:32<02:32, 32.44it/s]\u001b[A\n",
"Iteration: 12% 688/5625 [00:32<02:26, 33.77it/s]\u001b[A\n",
"Iteration: 12% 692/5625 [00:32<02:21, 34.88it/s]\u001b[A\n",
"Iteration: 12% 696/5625 [00:32<02:20, 35.01it/s]\u001b[A\n",
"Iteration: 12% 700/5625 [00:33<02:18, 35.67it/s]\u001b[A\n",
"Iteration: 13% 704/5625 [00:33<02:16, 36.10it/s]\u001b[A\n",
"Iteration: 13% 708/5625 [00:33<02:16, 36.05it/s]\u001b[A\n",
"Iteration: 13% 712/5625 [00:33<02:15, 36.21it/s]\u001b[A\n",
"Iteration: 13% 716/5625 [00:33<02:14, 36.62it/s]\u001b[A\n",
"Iteration: 13% 720/5625 [00:33<02:14, 36.59it/s]\u001b[A\n",
"Iteration: 13% 724/5625 [00:33<02:13, 36.62it/s]\u001b[A\n",
"Iteration: 13% 728/5625 [00:33<02:14, 36.52it/s]\u001b[A\n",
"Iteration: 13% 732/5625 [00:33<02:16, 35.84it/s]\u001b[A\n",
"Iteration: 13% 736/5625 [00:34<02:18, 35.26it/s]\u001b[A\n",
"Iteration: 13% 740/5625 [00:34<02:17, 35.49it/s]\u001b[A\n",
"Iteration: 13% 744/5625 [00:34<02:16, 35.69it/s]\u001b[A\n",
"Iteration: 13% 748/5625 [00:34<02:15, 35.91it/s]\u001b[A\n",
"Iteration: 13% 752/5625 [00:34<02:14, 36.30it/s]\u001b[A\n",
"Iteration: 13% 756/5625 [00:34<02:14, 36.32it/s]\u001b[A\n",
"Iteration: 14% 760/5625 [00:34<02:11, 36.88it/s]\u001b[A\n",
"Iteration: 14% 764/5625 [00:34<02:12, 36.70it/s]\u001b[A\n",
"Iteration: 14% 768/5625 [00:34<02:13, 36.30it/s]\u001b[A\n",
"Iteration: 14% 772/5625 [00:35<02:12, 36.57it/s]\u001b[A\n",
"Iteration: 14% 776/5625 [00:35<02:13, 36.41it/s]\u001b[A\n",
"Iteration: 14% 780/5625 [00:35<02:11, 36.85it/s]\u001b[A\n",
"Iteration: 14% 784/5625 [00:35<02:11, 36.80it/s]\u001b[A\n",
"Iteration: 14% 788/5625 [00:35<02:11, 36.88it/s]\u001b[A\n",
"Iteration: 14% 792/5625 [00:35<02:10, 37.05it/s]\u001b[A\n",
"Iteration: 14% 796/5625 [00:35<02:10, 36.89it/s]\u001b[A\n",
"Iteration: 14% 800/5625 [00:35<02:08, 37.52it/s]\u001b[A\n",
"Iteration: 14% 804/5625 [00:35<02:09, 37.30it/s]\u001b[A\n",
"Iteration: 14% 808/5625 [00:35<02:11, 36.55it/s]\u001b[A\n",
"Iteration: 14% 812/5625 [00:36<02:11, 36.55it/s]\u001b[A\n",
"Iteration: 15% 816/5625 [00:36<02:15, 35.37it/s]\u001b[A\n",
"Iteration: 15% 820/5625 [00:36<02:15, 35.55it/s]\u001b[A\n",
"Iteration: 15% 824/5625 [00:36<02:15, 35.38it/s]\u001b[A\n",
"Iteration: 15% 828/5625 [00:36<02:16, 35.25it/s]\u001b[A\n",
"Iteration: 15% 832/5625 [00:36<02:13, 35.92it/s]\u001b[A\n",
"Iteration: 15% 836/5625 [00:36<02:13, 35.95it/s]\u001b[A\n",
"Iteration: 15% 840/5625 [00:36<02:11, 36.50it/s]\u001b[A\n",
"Iteration: 15% 844/5625 [00:36<02:13, 35.94it/s]\u001b[A\n",
"Iteration: 15% 848/5625 [00:37<02:11, 36.22it/s]\u001b[A\n",
"Iteration: 15% 852/5625 [00:37<02:11, 36.36it/s]\u001b[A\n",
"Iteration: 15% 856/5625 [00:37<02:11, 36.16it/s]\u001b[A\n",
"Iteration: 15% 860/5625 [00:37<02:10, 36.45it/s]\u001b[A\n",
"Iteration: 15% 864/5625 [00:37<02:11, 36.28it/s]\u001b[A\n",
"Iteration: 15% 868/5625 [00:37<02:10, 36.35it/s]\u001b[A\n",
"Iteration: 16% 872/5625 [00:37<02:08, 36.87it/s]\u001b[A\n",
"Iteration: 16% 876/5625 [00:37<02:08, 36.85it/s]\u001b[A\n",
"Iteration: 16% 880/5625 [00:37<02:11, 36.18it/s]\u001b[A\n",
"Iteration: 16% 884/5625 [00:38<02:09, 36.48it/s]\u001b[A\n",
"Iteration: 16% 888/5625 [00:38<02:11, 36.00it/s]\u001b[A\n",
"Iteration: 16% 892/5625 [00:38<02:10, 36.25it/s]\u001b[A\n",
"Iteration: 16% 896/5625 [00:38<02:08, 36.74it/s]\u001b[A\n",
"Iteration: 16% 900/5625 [00:38<02:07, 37.16it/s]\u001b[A\n",
"Iteration: 16% 904/5625 [00:38<02:08, 36.83it/s]\u001b[A\n",
"Iteration: 16% 908/5625 [00:38<02:06, 37.18it/s]\u001b[A\n",
"Iteration: 16% 912/5625 [00:38<02:10, 36.04it/s]\u001b[A\n",
"Iteration: 16% 916/5625 [00:38<02:14, 34.93it/s]\u001b[A\n",
"Iteration: 16% 920/5625 [00:39<02:13, 35.19it/s]\u001b[A\n",
"Iteration: 16% 924/5625 [00:39<02:12, 35.48it/s]\u001b[A\n",
"Iteration: 16% 928/5625 [00:39<02:12, 35.33it/s]\u001b[A\n",
"Iteration: 17% 932/5625 [00:39<02:11, 35.67it/s]\u001b[A\n",
"Iteration: 17% 936/5625 [00:39<02:10, 36.05it/s]\u001b[A\n",
"Iteration: 17% 940/5625 [00:39<02:10, 35.85it/s]\u001b[A\n",
"Iteration: 17% 944/5625 [00:39<02:09, 36.24it/s]\u001b[A\n",
"Iteration: 17% 948/5625 [00:39<02:07, 36.64it/s]\u001b[A\n",
"Iteration: 17% 952/5625 [00:39<02:06, 36.95it/s]\u001b[A\n",
"Iteration: 17% 956/5625 [00:40<02:08, 36.32it/s]\u001b[A\n",
"Iteration: 17% 960/5625 [00:40<02:07, 36.52it/s]\u001b[A\n",
"Iteration: 17% 964/5625 [00:40<02:07, 36.66it/s]\u001b[A\n",
"Iteration: 17% 968/5625 [00:40<02:07, 36.46it/s]\u001b[A\n",
"Iteration: 17% 972/5625 [00:40<02:07, 36.61it/s]\u001b[A\n",
"Iteration: 17% 976/5625 [00:40<02:07, 36.45it/s]\u001b[A\n",
"Iteration: 17% 980/5625 [00:40<02:06, 36.71it/s]\u001b[A\n",
"Iteration: 17% 984/5625 [00:40<02:06, 36.71it/s]\u001b[A\n",
"Iteration: 18% 988/5625 [00:40<02:05, 36.98it/s]\u001b[A\n",
"Iteration: 18% 992/5625 [00:41<02:06, 36.68it/s]\u001b[A\n",
"Iteration: 18% 996/5625 [00:41<02:08, 36.11it/s]\u001b[A\n",
"Iteration: 18% 1000/5625 [00:41<02:07, 36.21it/s]\u001b[A\n",
"Iteration: 18% 1004/5625 [00:41<02:10, 35.32it/s]\u001b[A\n",
"Iteration: 18% 1008/5625 [00:41<02:09, 35.54it/s]\u001b[A\n",
"Iteration: 18% 1012/5625 [00:41<02:08, 35.81it/s]\u001b[A\n",
"Iteration: 18% 1016/5625 [00:41<02:09, 35.66it/s]\u001b[A\n",
"Iteration: 18% 1020/5625 [00:41<02:07, 36.16it/s]\u001b[A\n",
"Iteration: 18% 1024/5625 [00:41<02:04, 36.81it/s]\u001b[A\n",
"Iteration: 18% 1028/5625 [00:42<02:06, 36.45it/s]\u001b[A\n",
"Iteration: 18% 1032/5625 [00:42<02:05, 36.71it/s]\u001b[A\n",
"Iteration: 18% 1036/5625 [00:42<02:06, 36.38it/s]\u001b[A\n",
"Iteration: 18% 1040/5625 [00:42<02:04, 36.89it/s]\u001b[A\n",
"Iteration: 19% 1044/5625 [00:42<02:02, 37.27it/s]\u001b[A\n",
"Iteration: 19% 1048/5625 [00:42<02:03, 37.02it/s]\u001b[A\n",
"Iteration: 19% 1052/5625 [00:42<02:03, 37.00it/s]\u001b[A\n",
"Iteration: 19% 1056/5625 [00:42<02:03, 36.96it/s]\u001b[A\n",
"Iteration: 19% 1060/5625 [00:42<02:03, 36.92it/s]\u001b[A\n",
"Iteration: 19% 1064/5625 [00:43<02:02, 37.09it/s]\u001b[A\n",
"Iteration: 19% 1068/5625 [00:43<02:04, 36.64it/s]\u001b[A\n",
"Iteration: 19% 1072/5625 [00:43<02:03, 36.96it/s]\u001b[A\n",
"Iteration: 19% 1076/5625 [00:43<02:05, 36.30it/s]\u001b[A\n",
"Iteration: 19% 1080/5625 [00:43<02:10, 34.87it/s]\u001b[A\n",
"Iteration: 19% 1084/5625 [00:43<02:06, 35.80it/s]\u001b[A\n",
"Iteration: 19% 1088/5625 [00:43<02:04, 36.37it/s]\u001b[A\n",
"Iteration: 19% 1092/5625 [00:43<02:03, 36.57it/s]\u001b[A\n",
"Iteration: 19% 1096/5625 [00:43<02:04, 36.45it/s]\u001b[A\n",
"Iteration: 20% 1100/5625 [00:44<02:04, 36.40it/s]\u001b[A\n",
"Iteration: 20% 1104/5625 [00:44<02:05, 36.03it/s]\u001b[A\n",
"Iteration: 20% 1108/5625 [00:44<02:05, 35.93it/s]\u001b[A\n",
"Iteration: 20% 1112/5625 [00:44<02:05, 35.87it/s]\u001b[A\n",
"Iteration: 20% 1116/5625 [00:44<02:06, 35.75it/s]\u001b[A\n",
"Iteration: 20% 1120/5625 [00:44<02:05, 35.82it/s]\u001b[A\n",
"Iteration: 20% 1124/5625 [00:44<02:06, 35.64it/s]\u001b[A02/21/2020 12:30:24 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:30:25 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:30:25 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:30:25 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 13/625 [00:00<00:05, 122.13it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 26/625 [00:00<00:04, 124.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 39/625 [00:00<00:04, 123.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 53/625 [00:00<00:04, 125.80it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 67/625 [00:00<00:04, 127.48it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 81/625 [00:00<00:04, 128.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 93/625 [00:00<00:04, 123.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 106/625 [00:00<00:04, 124.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 119/625 [00:00<00:04, 124.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 133/625 [00:01<00:03, 127.71it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 146/625 [00:01<00:03, 127.05it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 159/625 [00:01<00:03, 126.18it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 172/625 [00:01<00:03, 125.90it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 186/625 [00:01<00:03, 127.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 199/625 [00:01<00:03, 127.53it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 213/625 [00:01<00:03, 128.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 226/625 [00:01<00:03, 127.24it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 239/625 [00:01<00:03, 125.24it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 252/625 [00:01<00:02, 125.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 265/625 [00:02<00:02, 126.89it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 278/625 [00:02<00:02, 127.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 47% 291/625 [00:02<00:02, 123.78it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 49% 304/625 [00:02<00:02, 125.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 318/625 [00:02<00:02, 127.12it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 332/625 [00:02<00:02, 128.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 346/625 [00:02<00:02, 129.41it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 359/625 [00:02<00:02, 128.89it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 60% 372/625 [00:02<00:01, 127.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 62% 385/625 [00:03<00:01, 127.35it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 399/625 [00:03<00:01, 128.78it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 412/625 [00:03<00:01, 128.94it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 426/625 [00:03<00:01, 130.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 440/625 [00:03<00:01, 127.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 72% 453/625 [00:03<00:01, 128.22it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 466/625 [00:03<00:01, 127.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 480/625 [00:03<00:01, 128.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 79% 493/625 [00:03<00:01, 125.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 506/625 [00:03<00:00, 123.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 519/625 [00:04<00:00, 124.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 533/625 [00:04<00:00, 126.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 546/625 [00:04<00:00, 127.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 90% 560/625 [00:04<00:00, 129.02it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 92% 573/625 [00:04<00:00, 127.66it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 94% 587/625 [00:04<00:00, 128.05it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 600/625 [00:04<00:00, 127.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 98% 614/625 [00:04<00:00, 130.00it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 127.47it/s]\u001b[A\u001b[A02/21/2020 12:30:30 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:30:30 - INFO - __main__ - perplexity = tensor(867.5541)\n",
"02/21/2020 12:30:30 - INFO - transformers.configuration_utils - Configuration saved in /content/models/smallBERTa/weights/checkpoint-18000/config.json\n",
"02/21/2020 12:30:30 - INFO - transformers.modeling_utils - Model weights saved in /content/models/smallBERTa/weights/checkpoint-18000/pytorch_model.bin\n",
"02/21/2020 12:30:30 - INFO - __main__ - Saving model checkpoint to /content/models/smallBERTa/weights/checkpoint-18000\n",
"02/21/2020 12:30:30 - INFO - __main__ - Deleting older checkpoint [/content/models/smallBERTa/weights/checkpoint-14000] due to args.save_total_limit\n",
"02/21/2020 12:30:30 - INFO - __main__ - Saving optimizer and scheduler states to /content/models/smallBERTa/weights/checkpoint-18000\n",
"\n",
"Iteration: 20% 1128/5625 [00:51<38:38, 1.94it/s]\u001b[A\n",
"Iteration: 20% 1132/5625 [00:51<27:38, 2.71it/s]\u001b[A\n",
"Iteration: 20% 1136/5625 [00:51<19:56, 3.75it/s]\u001b[A\n",
"Iteration: 20% 1140/5625 [00:51<14:32, 5.14it/s]\u001b[A\n",
"Iteration: 20% 1144/5625 [00:51<10:47, 6.92it/s]\u001b[A\n",
"Iteration: 20% 1148/5625 [00:51<08:09, 9.14it/s]\u001b[A\n",
"Iteration: 20% 1152/5625 [00:51<06:17, 11.85it/s]\u001b[A\n",
"Iteration: 21% 1156/5625 [00:52<04:59, 14.91it/s]\u001b[A\n",
"Iteration: 21% 1160/5625 [00:52<04:06, 18.14it/s]\u001b[A\n",
"Iteration: 21% 1164/5625 [00:52<03:30, 21.16it/s]\u001b[A\n",
"Iteration: 21% 1168/5625 [00:52<03:05, 23.99it/s]\u001b[A\n",
"Iteration: 21% 1172/5625 [00:52<02:45, 26.93it/s]\u001b[A\n",
"Iteration: 21% 1176/5625 [00:52<02:34, 28.88it/s]\u001b[A\n",
"Iteration: 21% 1180/5625 [00:52<02:23, 30.97it/s]\u001b[A\n",
"Iteration: 21% 1184/5625 [00:52<02:16, 32.65it/s]\u001b[A\n",
"Iteration: 21% 1188/5625 [00:52<02:11, 33.62it/s]\u001b[A\n",
"Iteration: 21% 1192/5625 [00:53<02:08, 34.44it/s]\u001b[A\n",
"Iteration: 21% 1196/5625 [00:53<02:09, 34.13it/s]\u001b[A\n",
"Iteration: 21% 1200/5625 [00:53<02:09, 34.15it/s]\u001b[A\n",
"Iteration: 21% 1204/5625 [00:53<02:06, 34.90it/s]\u001b[A\n",
"Iteration: 21% 1208/5625 [00:53<02:04, 35.55it/s]\u001b[A\n",
"Iteration: 22% 1212/5625 [00:53<02:02, 35.99it/s]\u001b[A\n",
"Iteration: 22% 1216/5625 [00:53<02:01, 36.42it/s]\u001b[A\n",
"Iteration: 22% 1220/5625 [00:53<02:00, 36.65it/s]\u001b[A\n",
"Iteration: 22% 1224/5625 [00:53<02:00, 36.58it/s]\u001b[A\n",
"Iteration: 22% 1228/5625 [00:54<02:00, 36.48it/s]\u001b[A\n",
"Iteration: 22% 1232/5625 [00:54<01:59, 36.79it/s]\u001b[A\n",
"Iteration: 22% 1236/5625 [00:54<02:00, 36.50it/s]\u001b[A\n",
"Iteration: 22% 1240/5625 [00:54<02:00, 36.51it/s]\u001b[A\n",
"Iteration: 22% 1244/5625 [00:54<02:00, 36.37it/s]\u001b[A\n",
"Iteration: 22% 1248/5625 [00:54<02:01, 35.99it/s]\u001b[A\n",
"Iteration: 22% 1252/5625 [00:54<02:00, 36.23it/s]\u001b[A\n",
"Iteration: 22% 1256/5625 [00:54<01:59, 36.51it/s]\u001b[A\n",
"Iteration: 22% 1260/5625 [00:54<01:58, 36.69it/s]\u001b[A\n",
"Iteration: 22% 1264/5625 [00:55<01:59, 36.45it/s]\u001b[A\n",
"Iteration: 23% 1268/5625 [00:55<01:58, 36.69it/s]\u001b[A\n",
"Iteration: 23% 1272/5625 [00:55<01:57, 37.02it/s]\u001b[A\n",
"Iteration: 23% 1276/5625 [00:55<02:01, 35.78it/s]\u001b[A\n",
"Iteration: 23% 1280/5625 [00:55<02:01, 35.90it/s]\u001b[A\n",
"Iteration: 23% 1284/5625 [00:55<01:59, 36.33it/s]\u001b[A\n",
"Iteration: 23% 1288/5625 [00:55<01:59, 36.15it/s]\u001b[A\n",
"Iteration: 23% 1292/5625 [00:55<01:58, 36.56it/s]\u001b[A\n",
"Iteration: 23% 1296/5625 [00:55<01:58, 36.62it/s]\u001b[A\n",
"Iteration: 23% 1300/5625 [00:56<02:01, 35.47it/s]\u001b[A\n",
"Iteration: 23% 1304/5625 [00:56<02:00, 35.99it/s]\u001b[A\n",
"Iteration: 23% 1308/5625 [00:56<01:59, 36.26it/s]\u001b[A\n",
"Iteration: 23% 1312/5625 [00:56<02:01, 35.54it/s]\u001b[A\n",
"Iteration: 23% 1316/5625 [00:56<01:59, 35.95it/s]\u001b[A\n",
"Iteration: 23% 1320/5625 [00:56<02:01, 35.54it/s]\u001b[A\n",
"Iteration: 24% 1324/5625 [00:56<01:59, 35.88it/s]\u001b[A\n",
"Iteration: 24% 1328/5625 [00:56<01:58, 36.40it/s]\u001b[A\n",
"Iteration: 24% 1332/5625 [00:56<01:57, 36.66it/s]\u001b[A\n",
"Iteration: 24% 1336/5625 [00:57<01:55, 36.99it/s]\u001b[A\n",
"Iteration: 24% 1340/5625 [00:57<01:55, 37.09it/s]\u001b[A\n",
"Iteration: 24% 1344/5625 [00:57<01:55, 37.01it/s]\u001b[A\n",
"Iteration: 24% 1348/5625 [00:57<01:59, 35.75it/s]\u001b[A\n",
"Iteration: 24% 1352/5625 [00:57<01:58, 35.93it/s]\u001b[A\n",
"Iteration: 24% 1356/5625 [00:57<01:58, 35.97it/s]\u001b[A\n",
"Iteration: 24% 1360/5625 [00:57<01:57, 36.30it/s]\u001b[A\n",
"Iteration: 24% 1364/5625 [00:57<01:55, 36.84it/s]\u001b[A\n",
"Iteration: 24% 1368/5625 [00:57<01:55, 36.77it/s]\u001b[A\n",
"Iteration: 24% 1372/5625 [00:58<01:56, 36.42it/s]\u001b[A\n",
"Iteration: 24% 1376/5625 [00:58<01:56, 36.39it/s]\u001b[A\n",
"Iteration: 25% 1380/5625 [00:58<01:56, 36.56it/s]\u001b[A\n",
"Iteration: 25% 1384/5625 [00:58<01:58, 35.70it/s]\u001b[A\n",
"Iteration: 25% 1388/5625 [00:58<01:56, 36.29it/s]\u001b[A\n",
"Iteration: 25% 1392/5625 [00:58<01:55, 36.59it/s]\u001b[A\n",
"Iteration: 25% 1396/5625 [00:58<01:57, 36.13it/s]\u001b[A\n",
"Iteration: 25% 1400/5625 [00:58<01:56, 36.40it/s]\u001b[A\n",
"Iteration: 25% 1404/5625 [00:58<01:54, 36.77it/s]\u001b[A\n",
"Iteration: 25% 1408/5625 [00:59<01:54, 36.90it/s]\u001b[A\n",
"Iteration: 25% 1412/5625 [00:59<01:58, 35.68it/s]\u001b[A\n",
"Iteration: 25% 1416/5625 [00:59<01:57, 35.93it/s]\u001b[A\n",
"Iteration: 25% 1420/5625 [00:59<01:56, 36.04it/s]\u001b[A\n",
"Iteration: 25% 1424/5625 [00:59<01:59, 35.28it/s]\u001b[A\n",
"Iteration: 25% 1428/5625 [00:59<01:58, 35.55it/s]\u001b[A\n",
"Iteration: 25% 1432/5625 [00:59<01:56, 35.94it/s]\u001b[A\n",
"Iteration: 26% 1436/5625 [00:59<01:56, 35.91it/s]\u001b[A\n",
"Iteration: 26% 1440/5625 [00:59<01:55, 36.16it/s]\u001b[A\n",
"Iteration: 26% 1444/5625 [01:00<01:54, 36.43it/s]\u001b[A\n",
"Iteration: 26% 1448/5625 [01:00<01:54, 36.37it/s]\u001b[A\n",
"Iteration: 26% 1452/5625 [01:00<01:53, 36.66it/s]\u001b[A\n",
"Iteration: 26% 1456/5625 [01:00<01:53, 36.69it/s]\u001b[A\n",
"Iteration: 26% 1460/5625 [01:00<01:53, 36.58it/s]\u001b[A\n",
"Iteration: 26% 1464/5625 [01:00<01:52, 36.87it/s]\u001b[A\n",
"Iteration: 26% 1468/5625 [01:00<01:53, 36.62it/s]\u001b[A\n",
"Iteration: 26% 1472/5625 [01:00<01:54, 36.21it/s]\u001b[A\n",
"Iteration: 26% 1476/5625 [01:00<01:53, 36.61it/s]\u001b[A\n",
"Iteration: 26% 1480/5625 [01:01<01:52, 36.83it/s]\u001b[A\n",
"Iteration: 26% 1484/5625 [01:01<01:53, 36.60it/s]\u001b[A\n",
"Iteration: 26% 1488/5625 [01:01<01:52, 36.62it/s]\u001b[A\n",
"Iteration: 27% 1492/5625 [01:01<01:53, 36.51it/s]\u001b[A\n",
"Iteration: 27% 1496/5625 [01:01<01:54, 36.01it/s]\u001b[A\n",
"Iteration: 27% 1500/5625 [01:01<01:53, 36.42it/s]\u001b[A\n",
"Iteration: 27% 1504/5625 [01:01<01:53, 36.36it/s]\u001b[A\n",
"Iteration: 27% 1508/5625 [01:01<01:53, 36.43it/s]\u001b[A\n",
"Iteration: 27% 1512/5625 [01:01<01:54, 35.81it/s]\u001b[A\n",
"Iteration: 27% 1516/5625 [01:01<01:52, 36.46it/s]\u001b[A\n",
"Iteration: 27% 1520/5625 [01:02<01:51, 36.74it/s]\u001b[A\n",
"Iteration: 27% 1524/5625 [01:02<01:52, 36.61it/s]\u001b[A\n",
"Iteration: 27% 1528/5625 [01:02<01:50, 36.94it/s]\u001b[A\n",
"Iteration: 27% 1532/5625 [01:02<01:52, 36.43it/s]\u001b[A\n",
"Iteration: 27% 1536/5625 [01:02<01:51, 36.72it/s]\u001b[A\n",
"Iteration: 27% 1540/5625 [01:02<01:51, 36.64it/s]\u001b[A\n",
"Iteration: 27% 1544/5625 [01:02<01:51, 36.76it/s]\u001b[A\n",
"Iteration: 28% 1548/5625 [01:02<01:51, 36.48it/s]\u001b[A\n",
"Iteration: 28% 1552/5625 [01:02<01:52, 36.36it/s]\u001b[A\n",
"Iteration: 28% 1556/5625 [01:03<01:51, 36.38it/s]\u001b[A\n",
"Iteration: 28% 1560/5625 [01:03<01:51, 36.44it/s]\u001b[A\n",
"Iteration: 28% 1564/5625 [01:03<01:52, 36.22it/s]\u001b[A\n",
"Iteration: 28% 1568/5625 [01:03<01:51, 36.52it/s]\u001b[A\n",
"Iteration: 28% 1572/5625 [01:03<01:54, 35.39it/s]\u001b[A\n",
"Iteration: 28% 1576/5625 [01:03<01:52, 36.11it/s]\u001b[A\n",
"Iteration: 28% 1580/5625 [01:03<01:49, 36.83it/s]\u001b[A\n",
"Iteration: 28% 1584/5625 [01:03<01:51, 36.39it/s]\u001b[A\n",
"Iteration: 28% 1588/5625 [01:03<01:52, 35.79it/s]\u001b[A\n",
"Iteration: 28% 1592/5625 [01:04<01:53, 35.52it/s]\u001b[A\n",
"Iteration: 28% 1596/5625 [01:04<01:53, 35.60it/s]\u001b[A\n",
"Iteration: 28% 1600/5625 [01:04<01:52, 35.92it/s]\u001b[A\n",
"Iteration: 29% 1604/5625 [01:04<01:50, 36.31it/s]\u001b[A\n",
"Iteration: 29% 1608/5625 [01:04<01:51, 35.95it/s]\u001b[A\n",
"Iteration: 29% 1612/5625 [01:04<01:50, 36.21it/s]\u001b[A\n",
"Iteration: 29% 1616/5625 [01:04<01:49, 36.63it/s]\u001b[A\n",
"Iteration: 29% 1620/5625 [01:04<01:49, 36.61it/s]\u001b[A\n",
"Iteration: 29% 1624/5625 [01:04<01:48, 37.02it/s]\u001b[A02/21/2020 12:30:44 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:30:46 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:30:46 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:30:46 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 14/625 [00:00<00:04, 130.71it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 28/625 [00:00<00:04, 130.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 7% 41/625 [00:00<00:04, 129.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 9% 54/625 [00:00<00:04, 129.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 68/625 [00:00<00:04, 129.28it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 81/625 [00:00<00:04, 129.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 95/625 [00:00<00:04, 129.83it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 108/625 [00:00<00:04, 127.12it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 20% 122/625 [00:00<00:03, 128.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 22% 135/625 [00:01<00:03, 128.56it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 24% 148/625 [00:01<00:03, 128.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 26% 161/625 [00:01<00:03, 127.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 175/625 [00:01<00:03, 128.55it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 188/625 [00:01<00:03, 127.43it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 201/625 [00:01<00:03, 127.46it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 214/625 [00:01<00:03, 125.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 227/625 [00:01<00:03, 126.93it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 240/625 [00:01<00:03, 125.51it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 41% 254/625 [00:01<00:02, 127.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 43% 267/625 [00:02<00:02, 123.54it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 45% 280/625 [00:02<00:02, 121.18it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 47% 293/625 [00:02<00:02, 122.19it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 49% 307/625 [00:02<00:02, 124.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 320/625 [00:02<00:02, 123.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 334/625 [00:02<00:02, 126.67it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 56% 348/625 [00:02<00:02, 127.94it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 58% 362/625 [00:02<00:02, 128.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 60% 375/625 [00:02<00:01, 127.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 62% 389/625 [00:03<00:01, 128.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 402/625 [00:03<00:01, 128.37it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 67% 416/625 [00:03<00:01, 129.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 69% 430/625 [00:03<00:01, 130.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 71% 444/625 [00:03<00:01, 127.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 73% 457/625 [00:03<00:01, 123.91it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 470/625 [00:03<00:01, 125.04it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 484/625 [00:03<00:01, 126.76it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 80% 497/625 [00:03<00:01, 125.05it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 82% 510/625 [00:04<00:00, 126.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 84% 524/625 [00:04<00:00, 127.97it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 86% 537/625 [00:04<00:00, 128.02it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 88% 550/625 [00:04<00:00, 128.32it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 90% 563/625 [00:04<00:00, 128.48it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 92% 576/625 [00:04<00:00, 124.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 94% 589/625 [00:04<00:00, 124.63it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 602/625 [00:04<00:00, 125.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 99% 616/625 [00:04<00:00, 127.98it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 127.01it/s]\u001b[A\u001b[A02/21/2020 12:30:51 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:30:51 - INFO - __main__ - perplexity = tensor(863.2436)\n",
"\n",
"Iteration: 29% 1628/5625 [01:11<35:01, 1.90it/s]\u001b[A\n",
"Iteration: 29% 1632/5625 [01:11<25:01, 2.66it/s]\u001b[A\n",
"Iteration: 29% 1636/5625 [01:11<18:02, 3.69it/s]\u001b[A\n",
"Iteration: 29% 1640/5625 [01:12<13:10, 5.04it/s]\u001b[A\n",
"Iteration: 29% 1644/5625 [01:12<09:44, 6.81it/s]\u001b[A\n",
"Iteration: 29% 1648/5625 [01:12<07:21, 9.01it/s]\u001b[A\n",
"Iteration: 29% 1652/5625 [01:12<05:40, 11.68it/s]\u001b[A\n",
"Iteration: 29% 1656/5625 [01:12<04:31, 14.63it/s]\u001b[A\n",
"Iteration: 30% 1660/5625 [01:12<03:42, 17.82it/s]\u001b[A\n",
"Iteration: 30% 1664/5625 [01:12<03:08, 20.96it/s]\u001b[A\n",
"Iteration: 30% 1668/5625 [01:12<02:43, 24.15it/s]\u001b[A\n",
"Iteration: 30% 1672/5625 [01:12<02:25, 27.08it/s]\u001b[A\n",
"Iteration: 30% 1676/5625 [01:13<02:14, 29.27it/s]\u001b[A\n",
"Iteration: 30% 1680/5625 [01:13<02:06, 31.15it/s]\u001b[A\n",
"Iteration: 30% 1684/5625 [01:13<02:00, 32.76it/s]\u001b[A\n",
"Iteration: 30% 1688/5625 [01:13<01:57, 33.48it/s]\u001b[A\n",
"Iteration: 30% 1692/5625 [01:13<01:58, 33.16it/s]\u001b[A\n",
"Iteration: 30% 1696/5625 [01:13<01:55, 33.99it/s]\u001b[A\n",
"Iteration: 30% 1700/5625 [01:13<01:54, 34.41it/s]\u001b[A\n",
"Iteration: 30% 1704/5625 [01:13<01:52, 34.82it/s]\u001b[A\n",
"Iteration: 30% 1708/5625 [01:13<01:51, 35.27it/s]\u001b[A\n",
"Iteration: 30% 1712/5625 [01:14<01:53, 34.51it/s]\u001b[A\n",
"Iteration: 31% 1716/5625 [01:14<01:56, 33.52it/s]\u001b[A\n",
"Iteration: 31% 1720/5625 [01:14<01:59, 32.69it/s]\u001b[A\n",
"Iteration: 31% 1724/5625 [01:14<02:01, 32.00it/s]\u001b[A\n",
"Iteration: 31% 1728/5625 [01:14<02:02, 31.77it/s]\u001b[A\n",
"Iteration: 31% 1732/5625 [01:14<02:04, 31.36it/s]\u001b[A\n",
"Iteration: 31% 1736/5625 [01:14<02:01, 31.94it/s]\u001b[A\n",
"Iteration: 31% 1740/5625 [01:14<01:55, 33.53it/s]\u001b[A\n",
"Iteration: 31% 1744/5625 [01:15<01:54, 33.82it/s]\u001b[A\n",
"Iteration: 31% 1748/5625 [01:15<01:57, 33.12it/s]\u001b[A\n",
"Iteration: 31% 1752/5625 [01:15<01:58, 32.72it/s]\u001b[A\n",
"Iteration: 31% 1756/5625 [01:15<02:01, 31.97it/s]\u001b[A\n",
"Iteration: 31% 1760/5625 [01:15<02:00, 32.02it/s]\u001b[A\n",
"Iteration: 31% 1764/5625 [01:15<02:00, 31.92it/s]\u001b[A\n",
"Iteration: 31% 1768/5625 [01:15<02:02, 31.57it/s]\u001b[A\n",
"Iteration: 32% 1772/5625 [01:15<01:59, 32.28it/s]\u001b[A\n",
"Iteration: 32% 1776/5625 [01:16<01:54, 33.75it/s]\u001b[A\n",
"Iteration: 32% 1780/5625 [01:16<01:50, 34.69it/s]\u001b[A\n",
"Iteration: 32% 1784/5625 [01:16<01:49, 35.11it/s]\u001b[A\n",
"Iteration: 32% 1788/5625 [01:16<01:47, 35.68it/s]\u001b[A\n",
"Iteration: 32% 1792/5625 [01:16<01:46, 36.01it/s]\u001b[A\n",
"Iteration: 32% 1796/5625 [01:16<01:45, 36.24it/s]\u001b[A\n",
"Iteration: 32% 1800/5625 [01:16<01:46, 36.03it/s]\u001b[A\n",
"Iteration: 32% 1804/5625 [01:16<01:46, 35.95it/s]\u001b[A\n",
"Iteration: 32% 1808/5625 [01:16<01:45, 36.22it/s]\u001b[A\n",
"Iteration: 32% 1812/5625 [01:17<01:48, 35.05it/s]\u001b[A\n",
"Iteration: 32% 1816/5625 [01:17<01:47, 35.43it/s]\u001b[A\n",
"Iteration: 32% 1820/5625 [01:17<01:46, 35.72it/s]\u001b[A\n",
"Iteration: 32% 1824/5625 [01:17<01:46, 35.74it/s]\u001b[A\n",
"Iteration: 32% 1828/5625 [01:17<01:44, 36.20it/s]\u001b[A\n",
"Iteration: 33% 1832/5625 [01:17<01:47, 35.38it/s]\u001b[A\n",
"Iteration: 33% 1836/5625 [01:17<01:47, 35.34it/s]\u001b[A\n",
"Iteration: 33% 1840/5625 [01:17<01:48, 34.88it/s]\u001b[A\n",
"Iteration: 33% 1844/5625 [01:17<01:49, 34.55it/s]\u001b[A\n",
"Iteration: 33% 1848/5625 [01:18<01:47, 35.27it/s]\u001b[A\n",
"Iteration: 33% 1852/5625 [01:18<01:44, 35.96it/s]\u001b[A\n",
"Iteration: 33% 1856/5625 [01:18<01:43, 36.39it/s]\u001b[A\n",
"Iteration: 33% 1860/5625 [01:18<01:43, 36.47it/s]\u001b[A\n",
"Iteration: 33% 1864/5625 [01:18<01:44, 35.86it/s]\u001b[A\n",
"Iteration: 33% 1868/5625 [01:18<01:44, 35.87it/s]\u001b[A\n",
"Iteration: 33% 1872/5625 [01:18<01:43, 36.22it/s]\u001b[A\n",
"Iteration: 33% 1876/5625 [01:18<01:45, 35.66it/s]\u001b[A\n",
"Iteration: 33% 1880/5625 [01:18<01:43, 36.24it/s]\u001b[A\n",
"Iteration: 33% 1884/5625 [01:19<01:43, 36.22it/s]\u001b[A\n",
"Iteration: 34% 1888/5625 [01:19<01:43, 36.24it/s]\u001b[A\n",
"Iteration: 34% 1892/5625 [01:19<01:42, 36.41it/s]\u001b[A\n",
"Iteration: 34% 1896/5625 [01:19<01:42, 36.21it/s]\u001b[A\n",
"Iteration: 34% 1900/5625 [01:19<01:42, 36.37it/s]\u001b[A\n",
"Iteration: 34% 1904/5625 [01:19<01:41, 36.67it/s]\u001b[A\n",
"Iteration: 34% 1908/5625 [01:19<01:41, 36.63it/s]\u001b[A\n",
"Iteration: 34% 1912/5625 [01:19<01:42, 36.05it/s]\u001b[A\n",
"Iteration: 34% 1916/5625 [01:19<01:41, 36.46it/s]\u001b[A\n",
"Iteration: 34% 1920/5625 [01:20<01:41, 36.66it/s]\u001b[A\n",
"Iteration: 34% 1924/5625 [01:20<01:41, 36.39it/s]\u001b[A\n",
"Iteration: 34% 1928/5625 [01:20<01:41, 36.32it/s]\u001b[A\n",
"Iteration: 34% 1932/5625 [01:20<01:41, 36.56it/s]\u001b[A\n",
"Iteration: 34% 1936/5625 [01:20<01:41, 36.30it/s]\u001b[A\n",
"Iteration: 34% 1940/5625 [01:20<01:40, 36.53it/s]\u001b[A\n",
"Iteration: 35% 1944/5625 [01:20<01:40, 36.66it/s]\u001b[A\n",
"Iteration: 35% 1948/5625 [01:20<01:43, 35.36it/s]\u001b[A\n",
"Iteration: 35% 1952/5625 [01:20<01:44, 35.03it/s]\u001b[A\n",
"Iteration: 35% 1956/5625 [01:21<01:44, 35.12it/s]\u001b[A\n",
"Iteration: 35% 1960/5625 [01:21<01:43, 35.29it/s]\u001b[A\n",
"Iteration: 35% 1964/5625 [01:21<01:43, 35.29it/s]\u001b[A\n",
"Iteration: 35% 1968/5625 [01:21<01:42, 35.69it/s]\u001b[A\n",
"Iteration: 35% 1972/5625 [01:21<01:41, 36.04it/s]\u001b[A\n",
"Iteration: 35% 1976/5625 [01:21<01:41, 36.06it/s]\u001b[A\n",
"Iteration: 35% 1980/5625 [01:21<01:40, 36.36it/s]\u001b[A\n",
"Iteration: 35% 1984/5625 [01:21<01:39, 36.63it/s]\u001b[A\n",
"Iteration: 35% 1988/5625 [01:21<01:41, 35.95it/s]\u001b[A\n",
"Iteration: 35% 1992/5625 [01:22<01:39, 36.34it/s]\u001b[A\n",
"Iteration: 35% 1996/5625 [01:22<01:39, 36.45it/s]\u001b[A\n",
"Iteration: 36% 2000/5625 [01:22<01:40, 36.25it/s]\u001b[A\n",
"Iteration: 36% 2004/5625 [01:22<01:38, 36.60it/s]\u001b[A\n",
"Iteration: 36% 2008/5625 [01:22<01:40, 35.81it/s]\u001b[A\n",
"Iteration: 36% 2012/5625 [01:22<01:41, 35.45it/s]\u001b[A\n",
"Iteration: 36% 2016/5625 [01:22<01:41, 35.50it/s]\u001b[A\n",
"Iteration: 36% 2020/5625 [01:22<01:40, 35.84it/s]\u001b[A\n",
"Iteration: 36% 2024/5625 [01:22<01:42, 35.20it/s]\u001b[A\n",
"Iteration: 36% 2028/5625 [01:23<01:41, 35.27it/s]\u001b[A\n",
"Iteration: 36% 2032/5625 [01:23<01:43, 34.86it/s]\u001b[A\n",
"Iteration: 36% 2036/5625 [01:23<01:42, 35.13it/s]\u001b[A\n",
"Iteration: 36% 2040/5625 [01:23<01:42, 35.05it/s]\u001b[A\n",
"Iteration: 36% 2044/5625 [01:23<01:43, 34.49it/s]\u001b[A\n",
"Iteration: 36% 2048/5625 [01:23<01:43, 34.45it/s]\u001b[A\n",
"Iteration: 36% 2052/5625 [01:23<01:41, 35.06it/s]\u001b[A\n",
"Iteration: 37% 2056/5625 [01:23<01:40, 35.37it/s]\u001b[A\n",
"Iteration: 37% 2060/5625 [01:23<01:41, 35.28it/s]\u001b[A\n",
"Iteration: 37% 2064/5625 [01:24<01:41, 35.01it/s]\u001b[A\n",
"Iteration: 37% 2068/5625 [01:24<01:40, 35.56it/s]\u001b[A\n",
"Iteration: 37% 2072/5625 [01:24<01:38, 35.97it/s]\u001b[A\n",
"Iteration: 37% 2076/5625 [01:24<01:38, 35.97it/s]\u001b[A\n",
"Iteration: 37% 2080/5625 [01:24<01:37, 36.37it/s]\u001b[A\n",
"Iteration: 37% 2084/5625 [01:24<01:36, 36.55it/s]\u001b[A\n",
"Iteration: 37% 2088/5625 [01:24<01:37, 36.35it/s]\u001b[A\n",
"Iteration: 37% 2092/5625 [01:24<01:36, 36.73it/s]\u001b[A\n",
"Iteration: 37% 2096/5625 [01:24<01:38, 35.95it/s]\u001b[A\n",
"Iteration: 37% 2100/5625 [01:25<01:37, 36.31it/s]\u001b[A\n",
"Iteration: 37% 2104/5625 [01:25<01:36, 36.45it/s]\u001b[A\n",
"Iteration: 37% 2108/5625 [01:25<01:36, 36.58it/s]\u001b[A\n",
"Iteration: 38% 2112/5625 [01:25<01:35, 36.71it/s]\u001b[A\n",
"Iteration: 38% 2116/5625 [01:25<01:35, 36.86it/s]\u001b[A\n",
"Iteration: 38% 2120/5625 [01:25<01:37, 35.90it/s]\u001b[A\n",
"Iteration: 38% 2124/5625 [01:25<01:41, 34.33it/s]\u001b[A02/21/2020 12:31:05 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:31:07 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:31:07 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:31:07 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 13/625 [00:00<00:04, 128.07it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 27/625 [00:00<00:04, 129.24it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 40/625 [00:00<00:04, 127.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 9% 54/625 [00:00<00:04, 128.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 67/625 [00:00<00:04, 127.04it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 80/625 [00:00<00:04, 127.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 93/625 [00:00<00:04, 127.32it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 106/625 [00:00<00:04, 126.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 119/625 [00:00<00:03, 126.70it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 133/625 [00:01<00:03, 128.43it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 146/625 [00:01<00:03, 127.50it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 159/625 [00:01<00:03, 127.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 172/625 [00:01<00:03, 127.07it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 185/625 [00:01<00:03, 126.97it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 198/625 [00:01<00:03, 125.22it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 211/625 [00:01<00:03, 125.94it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 224/625 [00:01<00:03, 126.51it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 237/625 [00:01<00:03, 121.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 250/625 [00:01<00:03, 117.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 262/625 [00:02<00:03, 113.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 274/625 [00:02<00:03, 111.67it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 46% 286/625 [00:02<00:03, 109.80it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 48% 298/625 [00:02<00:02, 110.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 50% 310/625 [00:02<00:02, 108.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 321/625 [00:02<00:02, 108.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 334/625 [00:02<00:02, 113.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 56% 347/625 [00:02<00:02, 117.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 58% 360/625 [00:02<00:02, 120.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 60% 374/625 [00:03<00:02, 123.56it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 62% 387/625 [00:03<00:01, 124.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 401/625 [00:03<00:01, 126.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 414/625 [00:03<00:01, 127.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 428/625 [00:03<00:01, 128.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 71% 441/625 [00:03<00:01, 126.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 73% 454/625 [00:03<00:01, 126.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 467/625 [00:03<00:01, 125.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 480/625 [00:03<00:01, 124.99it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 79% 493/625 [00:04<00:01, 119.66it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 506/625 [00:04<00:00, 121.93it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 519/625 [00:04<00:00, 123.58it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 532/625 [00:04<00:00, 124.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 545/625 [00:04<00:00, 125.55it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 89% 558/625 [00:04<00:00, 125.94it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 91% 571/625 [00:04<00:00, 123.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 93% 584/625 [00:04<00:00, 124.37it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 597/625 [00:04<00:00, 125.48it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 98% 610/625 [00:04<00:00, 125.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 624/625 [00:05<00:00, 127.34it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:05<00:00, 123.28it/s]\u001b[A\u001b[A02/21/2020 12:31:12 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:31:12 - INFO - __main__ - perplexity = tensor(852.3326)\n",
"\n",
"Iteration: 38% 2128/5625 [01:32<31:26, 1.85it/s]\u001b[A\n",
"Iteration: 38% 2132/5625 [01:32<22:27, 2.59it/s]\u001b[A\n",
"Iteration: 38% 2136/5625 [01:32<16:09, 3.60it/s]\u001b[A\n",
"Iteration: 38% 2140/5625 [01:32<11:46, 4.93it/s]\u001b[A\n",
"Iteration: 38% 2144/5625 [01:33<08:42, 6.66it/s]\u001b[A\n",
"Iteration: 38% 2148/5625 [01:33<06:34, 8.81it/s]\u001b[A\n",
"Iteration: 38% 2152/5625 [01:33<05:04, 11.42it/s]\u001b[A\n",
"Iteration: 38% 2156/5625 [01:33<04:01, 14.37it/s]\u001b[A\n",
"Iteration: 38% 2160/5625 [01:33<03:17, 17.59it/s]\u001b[A\n",
"Iteration: 38% 2164/5625 [01:33<02:45, 20.87it/s]\u001b[A\n",
"Iteration: 39% 2168/5625 [01:33<02:24, 23.96it/s]\u001b[A\n",
"Iteration: 39% 2172/5625 [01:33<02:09, 26.74it/s]\u001b[A\n",
"Iteration: 39% 2176/5625 [01:33<01:58, 29.02it/s]\u001b[A\n",
"Iteration: 39% 2180/5625 [01:34<01:54, 30.09it/s]\u001b[A\n",
"Iteration: 39% 2184/5625 [01:34<01:48, 31.60it/s]\u001b[A\n",
"Iteration: 39% 2188/5625 [01:34<01:43, 33.29it/s]\u001b[A\n",
"Iteration: 39% 2192/5625 [01:34<01:40, 34.14it/s]\u001b[A\n",
"Iteration: 39% 2196/5625 [01:34<01:38, 34.88it/s]\u001b[A\n",
"Iteration: 39% 2200/5625 [01:34<01:37, 35.09it/s]\u001b[A\n",
"Iteration: 39% 2204/5625 [01:34<01:35, 35.88it/s]\u001b[A\n",
"Iteration: 39% 2208/5625 [01:34<01:36, 35.41it/s]\u001b[A\n",
"Iteration: 39% 2212/5625 [01:34<01:34, 36.25it/s]\u001b[A\n",
"Iteration: 39% 2216/5625 [01:35<01:33, 36.56it/s]\u001b[A\n",
"Iteration: 39% 2220/5625 [01:35<01:33, 36.27it/s]\u001b[A\n",
"Iteration: 40% 2224/5625 [01:35<01:33, 36.46it/s]\u001b[A\n",
"Iteration: 40% 2228/5625 [01:35<01:33, 36.52it/s]\u001b[A\n",
"Iteration: 40% 2232/5625 [01:35<01:32, 36.71it/s]\u001b[A\n",
"Iteration: 40% 2236/5625 [01:35<01:31, 36.98it/s]\u001b[A\n",
"Iteration: 40% 2240/5625 [01:35<01:31, 36.97it/s]\u001b[A\n",
"Iteration: 40% 2244/5625 [01:35<01:32, 36.54it/s]\u001b[A\n",
"Iteration: 40% 2248/5625 [01:35<01:31, 36.93it/s]\u001b[A\n",
"Iteration: 40% 2252/5625 [01:36<01:31, 36.99it/s]\u001b[A\n",
"Iteration: 40% 2256/5625 [01:36<01:32, 36.36it/s]\u001b[A\n",
"Iteration: 40% 2260/5625 [01:36<01:31, 36.92it/s]\u001b[A\n",
"Iteration: 40% 2264/5625 [01:36<01:31, 36.78it/s]\u001b[A\n",
"Iteration: 40% 2268/5625 [01:36<01:32, 36.23it/s]\u001b[A\n",
"Iteration: 40% 2272/5625 [01:36<01:32, 36.24it/s]\u001b[A\n",
"Iteration: 40% 2276/5625 [01:36<01:32, 36.01it/s]\u001b[A\n",
"Iteration: 41% 2280/5625 [01:36<01:32, 36.28it/s]\u001b[A\n",
"Iteration: 41% 2284/5625 [01:36<01:31, 36.44it/s]\u001b[A\n",
"Iteration: 41% 2288/5625 [01:37<01:30, 36.94it/s]\u001b[A\n",
"Iteration: 41% 2292/5625 [01:37<01:30, 36.63it/s]\u001b[A\n",
"Iteration: 41% 2296/5625 [01:37<01:31, 36.20it/s]\u001b[A\n",
"Iteration: 41% 2300/5625 [01:37<01:31, 36.31it/s]\u001b[A\n",
"Iteration: 41% 2304/5625 [01:37<01:31, 36.23it/s]\u001b[A\n",
"Iteration: 41% 2308/5625 [01:37<01:31, 36.40it/s]\u001b[A\n",
"Iteration: 41% 2312/5625 [01:37<01:30, 36.63it/s]\u001b[A\n",
"Iteration: 41% 2316/5625 [01:37<01:30, 36.46it/s]\u001b[A\n",
"Iteration: 41% 2320/5625 [01:37<01:30, 36.60it/s]\u001b[A\n",
"Iteration: 41% 2324/5625 [01:38<01:29, 36.97it/s]\u001b[A\n",
"Iteration: 41% 2328/5625 [01:38<01:29, 36.71it/s]\u001b[A\n",
"Iteration: 41% 2332/5625 [01:38<01:30, 36.31it/s]\u001b[A\n",
"Iteration: 42% 2336/5625 [01:38<01:29, 36.61it/s]\u001b[A\n",
"Iteration: 42% 2340/5625 [01:38<01:29, 36.78it/s]\u001b[A\n",
"Iteration: 42% 2344/5625 [01:38<01:31, 36.04it/s]\u001b[A\n",
"Iteration: 42% 2348/5625 [01:38<01:31, 35.88it/s]\u001b[A\n",
"Iteration: 42% 2352/5625 [01:38<01:30, 36.08it/s]\u001b[A\n",
"Iteration: 42% 2356/5625 [01:38<01:33, 34.88it/s]\u001b[A\n",
"Iteration: 42% 2360/5625 [01:39<01:32, 35.25it/s]\u001b[A\n",
"Iteration: 42% 2364/5625 [01:39<01:31, 35.60it/s]\u001b[A\n",
"Iteration: 42% 2368/5625 [01:39<01:33, 34.90it/s]\u001b[A\n",
"Iteration: 42% 2372/5625 [01:39<01:31, 35.64it/s]\u001b[A\n",
"Iteration: 42% 2376/5625 [01:39<01:29, 36.10it/s]\u001b[A\n",
"Iteration: 42% 2380/5625 [01:39<01:28, 36.70it/s]\u001b[A\n",
"Iteration: 42% 2384/5625 [01:39<01:28, 36.61it/s]\u001b[A\n",
"Iteration: 42% 2388/5625 [01:39<01:27, 37.01it/s]\u001b[A\n",
"Iteration: 43% 2392/5625 [01:39<01:27, 37.02it/s]\u001b[A\n",
"Iteration: 43% 2396/5625 [01:40<01:26, 37.14it/s]\u001b[A\n",
"Iteration: 43% 2400/5625 [01:40<01:26, 37.24it/s]\u001b[A\n",
"Iteration: 43% 2404/5625 [01:40<01:28, 36.30it/s]\u001b[A\n",
"Iteration: 43% 2408/5625 [01:40<01:29, 36.01it/s]\u001b[A\n",
"Iteration: 43% 2412/5625 [01:40<01:28, 36.44it/s]\u001b[A\n",
"Iteration: 43% 2416/5625 [01:40<01:28, 36.38it/s]\u001b[A\n",
"Iteration: 43% 2420/5625 [01:40<01:28, 36.33it/s]\u001b[A\n",
"Iteration: 43% 2424/5625 [01:40<01:27, 36.72it/s]\u001b[A\n",
"Iteration: 43% 2428/5625 [01:40<01:27, 36.62it/s]\u001b[A\n",
"Iteration: 43% 2432/5625 [01:40<01:27, 36.37it/s]\u001b[A\n",
"Iteration: 43% 2436/5625 [01:41<01:27, 36.43it/s]\u001b[A\n",
"Iteration: 43% 2440/5625 [01:41<01:29, 35.44it/s]\u001b[A\n",
"Iteration: 43% 2444/5625 [01:41<01:28, 35.80it/s]\u001b[A\n",
"Iteration: 44% 2448/5625 [01:41<01:30, 35.22it/s]\u001b[A\n",
"Iteration: 44% 2452/5625 [01:41<01:29, 35.53it/s]\u001b[A\n",
"Iteration: 44% 2456/5625 [01:41<01:29, 35.41it/s]\u001b[A\n",
"Iteration: 44% 2460/5625 [01:41<01:27, 35.98it/s]\u001b[A\n",
"Iteration: 44% 2464/5625 [01:41<01:27, 36.19it/s]\u001b[A\n",
"Iteration: 44% 2468/5625 [01:42<01:27, 36.08it/s]\u001b[A\n",
"Iteration: 44% 2472/5625 [01:42<01:29, 35.42it/s]\u001b[A\n",
"Iteration: 44% 2476/5625 [01:42<01:27, 36.04it/s]\u001b[A\n",
"Iteration: 44% 2480/5625 [01:42<01:28, 35.74it/s]\u001b[A\n",
"Iteration: 44% 2484/5625 [01:42<01:26, 36.12it/s]\u001b[A\n",
"Iteration: 44% 2488/5625 [01:42<01:26, 36.24it/s]\u001b[A\n",
"Iteration: 44% 2492/5625 [01:42<01:25, 36.54it/s]\u001b[A\n",
"Iteration: 44% 2496/5625 [01:42<01:24, 36.91it/s]\u001b[A\n",
"Iteration: 44% 2500/5625 [01:42<01:24, 36.87it/s]\u001b[A\n",
"Iteration: 45% 2504/5625 [01:42<01:24, 36.94it/s]\u001b[A\n",
"Iteration: 45% 2508/5625 [01:43<01:24, 36.86it/s]\u001b[A\n",
"Iteration: 45% 2512/5625 [01:43<01:23, 37.11it/s]\u001b[A\n",
"Iteration: 45% 2516/5625 [01:43<01:25, 36.45it/s]\u001b[A\n",
"Iteration: 45% 2520/5625 [01:43<01:27, 35.67it/s]\u001b[A\n",
"Iteration: 45% 2524/5625 [01:43<01:27, 35.59it/s]\u001b[A\n",
"Iteration: 45% 2528/5625 [01:43<01:25, 36.23it/s]\u001b[A\n",
"Iteration: 45% 2532/5625 [01:43<01:24, 36.48it/s]\u001b[A\n",
"Iteration: 45% 2536/5625 [01:43<01:24, 36.54it/s]\u001b[A\n",
"Iteration: 45% 2540/5625 [01:43<01:23, 36.73it/s]\u001b[A\n",
"Iteration: 45% 2544/5625 [01:44<01:24, 36.57it/s]\u001b[A\n",
"Iteration: 45% 2548/5625 [01:44<01:24, 36.53it/s]\u001b[A\n",
"Iteration: 45% 2552/5625 [01:44<01:26, 35.71it/s]\u001b[A\n",
"Iteration: 45% 2556/5625 [01:44<01:25, 35.97it/s]\u001b[A\n",
"Iteration: 46% 2560/5625 [01:44<01:25, 36.01it/s]\u001b[A\n",
"Iteration: 46% 2564/5625 [01:44<01:26, 35.24it/s]\u001b[A\n",
"Iteration: 46% 2568/5625 [01:44<01:25, 35.73it/s]\u001b[A\n",
"Iteration: 46% 2572/5625 [01:44<01:24, 36.27it/s]\u001b[A\n",
"Iteration: 46% 2576/5625 [01:44<01:23, 36.43it/s]\u001b[A\n",
"Iteration: 46% 2580/5625 [01:45<01:23, 36.25it/s]\u001b[A\n",
"Iteration: 46% 2584/5625 [01:45<01:24, 36.16it/s]\u001b[A\n",
"Iteration: 46% 2588/5625 [01:45<01:24, 35.80it/s]\u001b[A\n",
"Iteration: 46% 2592/5625 [01:45<01:24, 35.86it/s]\u001b[A\n",
"Iteration: 46% 2596/5625 [01:45<01:24, 35.91it/s]\u001b[A\n",
"Iteration: 46% 2600/5625 [01:45<01:24, 35.99it/s]\u001b[A\n",
"Iteration: 46% 2604/5625 [01:45<01:23, 36.28it/s]\u001b[A\n",
"Iteration: 46% 2608/5625 [01:45<01:22, 36.50it/s]\u001b[A\n",
"Iteration: 46% 2612/5625 [01:45<01:21, 36.79it/s]\u001b[A\n",
"Iteration: 47% 2616/5625 [01:46<01:21, 37.10it/s]\u001b[A\n",
"Iteration: 47% 2620/5625 [01:46<01:22, 36.42it/s]\u001b[A\n",
"Iteration: 47% 2624/5625 [01:46<01:22, 36.30it/s]\u001b[A02/21/2020 12:31:26 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:31:27 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:31:27 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:31:27 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 14/625 [00:00<00:04, 129.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 28/625 [00:00<00:04, 130.90it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 40/625 [00:00<00:04, 125.90it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 9% 54/625 [00:00<00:04, 128.22it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 68/625 [00:00<00:04, 128.72it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 82/625 [00:00<00:04, 129.76it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 95/625 [00:00<00:04, 129.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 108/625 [00:00<00:03, 129.49it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 121/625 [00:00<00:03, 129.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 134/625 [00:01<00:03, 127.76it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 24% 147/625 [00:01<00:03, 126.06it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 26% 160/625 [00:01<00:03, 125.98it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 173/625 [00:01<00:03, 120.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 187/625 [00:01<00:03, 123.24it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 200/625 [00:01<00:03, 124.57it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 214/625 [00:01<00:03, 126.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 227/625 [00:01<00:03, 126.90it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 39% 241/625 [00:01<00:02, 128.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 41% 254/625 [00:01<00:02, 126.60it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 43% 267/625 [00:02<00:02, 126.64it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 45% 280/625 [00:02<00:02, 125.43it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 47% 293/625 [00:02<00:02, 126.58it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 49% 306/625 [00:02<00:02, 124.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 320/625 [00:02<00:02, 126.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 334/625 [00:02<00:02, 127.53it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 56% 348/625 [00:02<00:02, 129.06it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 58% 361/625 [00:02<00:02, 129.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 60% 375/625 [00:02<00:01, 129.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 62% 389/625 [00:03<00:01, 130.00it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 403/625 [00:03<00:01, 129.43it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 67% 417/625 [00:03<00:01, 129.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 69% 430/625 [00:03<00:01, 128.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 71% 443/625 [00:03<00:01, 126.91it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 73% 456/625 [00:03<00:01, 127.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 469/625 [00:03<00:01, 124.92it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 483/625 [00:03<00:01, 126.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 79% 496/625 [00:03<00:01, 127.05it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 509/625 [00:03<00:00, 126.97it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 84% 523/625 [00:04<00:00, 128.23it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 86% 536/625 [00:04<00:00, 127.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 88% 549/625 [00:04<00:00, 127.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 90% 562/625 [00:04<00:00, 126.72it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 92% 575/625 [00:04<00:00, 126.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 94% 588/625 [00:04<00:00, 126.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 602/625 [00:04<00:00, 127.71it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 99% 616/625 [00:04<00:00, 129.34it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 127.53it/s]\u001b[A\u001b[A02/21/2020 12:31:32 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:31:32 - INFO - __main__ - perplexity = tensor(831.1484)\n",
"\n",
"Iteration: 47% 2628/5625 [01:53<26:13, 1.90it/s]\u001b[A\n",
"Iteration: 47% 2632/5625 [01:53<18:47, 2.65it/s]\u001b[A\n",
"Iteration: 47% 2636/5625 [01:53<13:32, 3.68it/s]\u001b[A\n",
"Iteration: 47% 2640/5625 [01:53<09:53, 5.03it/s]\u001b[A\n",
"Iteration: 47% 2644/5625 [01:53<07:20, 6.77it/s]\u001b[A\n",
"Iteration: 47% 2648/5625 [01:53<05:36, 8.86it/s]\u001b[A\n",
"Iteration: 47% 2652/5625 [01:53<04:18, 11.50it/s]\u001b[A\n",
"Iteration: 47% 2656/5625 [01:53<03:24, 14.55it/s]\u001b[A\n",
"Iteration: 47% 2660/5625 [01:53<02:49, 17.47it/s]\u001b[A\n",
"Iteration: 47% 2664/5625 [01:54<02:21, 20.86it/s]\u001b[A\n",
"Iteration: 47% 2668/5625 [01:54<02:03, 23.97it/s]\u001b[A\n",
"Iteration: 48% 2672/5625 [01:54<01:50, 26.68it/s]\u001b[A\n",
"Iteration: 48% 2676/5625 [01:54<01:41, 29.05it/s]\u001b[A\n",
"Iteration: 48% 2680/5625 [01:54<01:35, 30.71it/s]\u001b[A\n",
"Iteration: 48% 2684/5625 [01:54<01:31, 32.07it/s]\u001b[A\n",
"Iteration: 48% 2688/5625 [01:54<01:28, 33.23it/s]\u001b[A\n",
"Iteration: 48% 2692/5625 [01:54<01:25, 34.43it/s]\u001b[A\n",
"Iteration: 48% 2696/5625 [01:54<01:23, 35.01it/s]\u001b[A\n",
"Iteration: 48% 2700/5625 [01:55<01:21, 35.74it/s]\u001b[A\n",
"Iteration: 48% 2704/5625 [01:55<01:21, 35.93it/s]\u001b[A\n",
"Iteration: 48% 2708/5625 [01:55<01:20, 36.35it/s]\u001b[A\n",
"Iteration: 48% 2712/5625 [01:55<01:20, 36.24it/s]\u001b[A\n",
"Iteration: 48% 2716/5625 [01:55<01:18, 36.92it/s]\u001b[A\n",
"Iteration: 48% 2720/5625 [01:55<01:20, 36.14it/s]\u001b[A\n",
"Iteration: 48% 2724/5625 [01:55<01:19, 36.27it/s]\u001b[A\n",
"Iteration: 48% 2728/5625 [01:55<01:19, 36.34it/s]\u001b[A\n",
"Iteration: 49% 2732/5625 [01:55<01:19, 36.25it/s]\u001b[A\n",
"Iteration: 49% 2736/5625 [01:56<01:22, 35.21it/s]\u001b[A\n",
"Iteration: 49% 2740/5625 [01:56<01:22, 34.96it/s]\u001b[A\n",
"Iteration: 49% 2744/5625 [01:56<01:21, 35.47it/s]\u001b[A\n",
"Iteration: 49% 2748/5625 [01:56<01:20, 35.96it/s]\u001b[A\n",
"Iteration: 49% 2752/5625 [01:56<01:19, 36.23it/s]\u001b[A\n",
"Iteration: 49% 2756/5625 [01:56<01:20, 35.80it/s]\u001b[A\n",
"Iteration: 49% 2760/5625 [01:56<01:19, 35.82it/s]\u001b[A\n",
"Iteration: 49% 2764/5625 [01:56<01:19, 35.93it/s]\u001b[A\n",
"Iteration: 49% 2768/5625 [01:56<01:18, 36.38it/s]\u001b[A\n",
"Iteration: 49% 2772/5625 [01:57<01:17, 36.73it/s]\u001b[A\n",
"Iteration: 49% 2776/5625 [01:57<01:17, 36.84it/s]\u001b[A\n",
"Iteration: 49% 2780/5625 [01:57<01:17, 36.59it/s]\u001b[A\n",
"Iteration: 49% 2784/5625 [01:57<01:20, 35.48it/s]\u001b[A\n",
"Iteration: 50% 2788/5625 [01:57<01:20, 35.46it/s]\u001b[A\n",
"Iteration: 50% 2792/5625 [01:57<01:20, 35.33it/s]\u001b[A\n",
"Iteration: 50% 2796/5625 [01:57<01:18, 35.84it/s]\u001b[A\n",
"Iteration: 50% 2800/5625 [01:57<01:19, 35.33it/s]\u001b[A\n",
"Iteration: 50% 2804/5625 [01:57<01:18, 35.80it/s]\u001b[A\n",
"Iteration: 50% 2808/5625 [01:58<01:17, 36.29it/s]\u001b[A\n",
"Iteration: 50% 2812/5625 [01:58<01:16, 36.83it/s]\u001b[A\n",
"Iteration: 50% 2816/5625 [01:58<01:16, 36.82it/s]\u001b[A\n",
"Iteration: 50% 2820/5625 [01:58<01:15, 37.10it/s]\u001b[A\n",
"Iteration: 50% 2824/5625 [01:58<01:14, 37.47it/s]\u001b[A\n",
"Iteration: 50% 2828/5625 [01:58<01:15, 36.96it/s]\u001b[A\n",
"Iteration: 50% 2832/5625 [01:58<01:15, 37.02it/s]\u001b[A\n",
"Iteration: 50% 2836/5625 [01:58<01:15, 36.82it/s]\u001b[A\n",
"Iteration: 50% 2840/5625 [01:58<01:16, 36.52it/s]\u001b[A\n",
"Iteration: 51% 2844/5625 [01:59<01:16, 36.26it/s]\u001b[A\n",
"Iteration: 51% 2848/5625 [01:59<01:15, 36.56it/s]\u001b[A\n",
"Iteration: 51% 2852/5625 [01:59<01:16, 36.45it/s]\u001b[A\n",
"Iteration: 51% 2856/5625 [01:59<01:15, 36.65it/s]\u001b[A\n",
"Iteration: 51% 2860/5625 [01:59<01:16, 36.24it/s]\u001b[A\n",
"Iteration: 51% 2864/5625 [01:59<01:17, 35.68it/s]\u001b[A\n",
"Iteration: 51% 2868/5625 [01:59<01:16, 36.05it/s]\u001b[A\n",
"Iteration: 51% 2872/5625 [01:59<01:15, 36.46it/s]\u001b[A\n",
"Iteration: 51% 2876/5625 [01:59<01:15, 36.47it/s]\u001b[A\n",
"Iteration: 51% 2880/5625 [02:00<01:14, 36.66it/s]\u001b[A\n",
"Iteration: 51% 2884/5625 [02:00<01:14, 36.79it/s]\u001b[A\n",
"Iteration: 51% 2888/5625 [02:00<01:14, 36.72it/s]\u001b[A\n",
"Iteration: 51% 2892/5625 [02:00<01:13, 36.98it/s]\u001b[A\n",
"Iteration: 51% 2896/5625 [02:00<01:13, 37.37it/s]\u001b[A\n",
"Iteration: 52% 2900/5625 [02:00<01:13, 36.95it/s]\u001b[A\n",
"Iteration: 52% 2904/5625 [02:00<01:14, 36.57it/s]\u001b[A\n",
"Iteration: 52% 2908/5625 [02:00<01:14, 36.54it/s]\u001b[A\n",
"Iteration: 52% 2912/5625 [02:00<01:14, 36.37it/s]\u001b[A\n",
"Iteration: 52% 2916/5625 [02:00<01:13, 36.70it/s]\u001b[A\n",
"Iteration: 52% 2920/5625 [02:01<01:13, 36.60it/s]\u001b[A\n",
"Iteration: 52% 2924/5625 [02:01<01:13, 36.97it/s]\u001b[A\n",
"Iteration: 52% 2928/5625 [02:01<01:12, 37.00it/s]\u001b[A\n",
"Iteration: 52% 2932/5625 [02:01<01:12, 37.31it/s]\u001b[A\n",
"Iteration: 52% 2936/5625 [02:01<01:12, 36.86it/s]\u001b[A\n",
"Iteration: 52% 2940/5625 [02:01<01:14, 36.26it/s]\u001b[A\n",
"Iteration: 52% 2944/5625 [02:01<01:13, 36.41it/s]\u001b[A\n",
"Iteration: 52% 2948/5625 [02:01<01:13, 36.27it/s]\u001b[A\n",
"Iteration: 52% 2952/5625 [02:01<01:12, 36.72it/s]\u001b[A\n",
"Iteration: 53% 2956/5625 [02:02<01:12, 37.03it/s]\u001b[A\n",
"Iteration: 53% 2960/5625 [02:02<01:12, 36.83it/s]\u001b[A\n",
"Iteration: 53% 2964/5625 [02:02<01:11, 37.03it/s]\u001b[A\n",
"Iteration: 53% 2968/5625 [02:02<01:12, 36.86it/s]\u001b[A\n",
"Iteration: 53% 2972/5625 [02:02<01:11, 37.03it/s]\u001b[A\n",
"Iteration: 53% 2976/5625 [02:02<01:11, 37.06it/s]\u001b[A\n",
"Iteration: 53% 2980/5625 [02:02<01:12, 36.58it/s]\u001b[A\n",
"Iteration: 53% 2984/5625 [02:02<01:11, 36.82it/s]\u001b[A\n",
"Iteration: 53% 2988/5625 [02:02<01:10, 37.25it/s]\u001b[A\n",
"Iteration: 53% 2992/5625 [02:03<01:11, 36.97it/s]\u001b[A\n",
"Iteration: 53% 2996/5625 [02:03<01:11, 36.86it/s]\u001b[A\n",
"Iteration: 53% 3000/5625 [02:03<01:12, 36.36it/s]\u001b[A\n",
"Iteration: 53% 3004/5625 [02:03<01:11, 36.55it/s]\u001b[A\n",
"Iteration: 53% 3008/5625 [02:03<01:11, 36.66it/s]\u001b[A\n",
"Iteration: 54% 3012/5625 [02:03<01:10, 36.97it/s]\u001b[A\n",
"Iteration: 54% 3016/5625 [02:03<01:11, 36.66it/s]\u001b[A\n",
"Iteration: 54% 3020/5625 [02:03<01:11, 36.26it/s]\u001b[A\n",
"Iteration: 54% 3024/5625 [02:03<01:12, 36.12it/s]\u001b[A\n",
"Iteration: 54% 3028/5625 [02:04<01:13, 35.29it/s]\u001b[A\n",
"Iteration: 54% 3032/5625 [02:04<01:13, 35.27it/s]\u001b[A\n",
"Iteration: 54% 3036/5625 [02:04<01:12, 35.71it/s]\u001b[A\n",
"Iteration: 54% 3040/5625 [02:04<01:12, 35.63it/s]\u001b[A\n",
"Iteration: 54% 3044/5625 [02:04<01:11, 36.15it/s]\u001b[A\n",
"Iteration: 54% 3048/5625 [02:04<01:10, 36.44it/s]\u001b[A\n",
"Iteration: 54% 3052/5625 [02:04<01:11, 36.16it/s]\u001b[A\n",
"Iteration: 54% 3056/5625 [02:04<01:10, 36.47it/s]\u001b[A\n",
"Iteration: 54% 3060/5625 [02:04<01:10, 36.60it/s]\u001b[A\n",
"Iteration: 54% 3064/5625 [02:05<01:11, 35.72it/s]\u001b[A\n",
"Iteration: 55% 3068/5625 [02:05<01:11, 35.75it/s]\u001b[A\n",
"Iteration: 55% 3072/5625 [02:05<01:10, 36.31it/s]\u001b[A\n",
"Iteration: 55% 3076/5625 [02:05<01:10, 36.15it/s]\u001b[A\n",
"Iteration: 55% 3080/5625 [02:05<01:09, 36.36it/s]\u001b[A\n",
"Iteration: 55% 3084/5625 [02:05<01:09, 36.54it/s]\u001b[A\n",
"Iteration: 55% 3088/5625 [02:05<01:11, 35.45it/s]\u001b[A\n",
"Iteration: 55% 3092/5625 [02:05<01:10, 35.95it/s]\u001b[A\n",
"Iteration: 55% 3096/5625 [02:05<01:10, 36.04it/s]\u001b[A\n",
"Iteration: 55% 3100/5625 [02:06<01:10, 36.05it/s]\u001b[A\n",
"Iteration: 55% 3104/5625 [02:06<01:08, 36.62it/s]\u001b[A\n",
"Iteration: 55% 3108/5625 [02:06<01:08, 36.57it/s]\u001b[A\n",
"Iteration: 55% 3112/5625 [02:06<01:08, 36.62it/s]\u001b[A\n",
"Iteration: 55% 3116/5625 [02:06<01:08, 36.59it/s]\u001b[A\n",
"Iteration: 55% 3120/5625 [02:06<01:08, 36.61it/s]\u001b[A\n",
"Iteration: 56% 3124/5625 [02:06<01:07, 37.17it/s]\u001b[A02/21/2020 12:31:46 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:31:47 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:31:47 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:31:47 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 13/625 [00:00<00:04, 129.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 26/625 [00:00<00:04, 129.15it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 39/625 [00:00<00:04, 127.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 53/625 [00:00<00:04, 129.53it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 10% 65/625 [00:00<00:04, 124.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 12% 76/625 [00:00<00:04, 117.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 14% 89/625 [00:00<00:04, 119.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 16% 102/625 [00:00<00:04, 120.83it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 18% 114/625 [00:00<00:04, 120.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 20% 127/625 [00:01<00:04, 122.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 141/625 [00:01<00:03, 126.10it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 154/625 [00:01<00:03, 124.30it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 27% 167/625 [00:01<00:03, 125.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 29% 180/625 [00:01<00:03, 125.64it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 31% 193/625 [00:01<00:03, 125.10it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 33% 206/625 [00:01<00:03, 125.50it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 35% 220/625 [00:01<00:03, 127.80it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 37% 234/625 [00:01<00:03, 128.92it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 247/625 [00:01<00:02, 128.10it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 261/625 [00:02<00:02, 128.71it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 274/625 [00:02<00:02, 128.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 46% 287/625 [00:02<00:02, 127.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 48% 301/625 [00:02<00:02, 129.10it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 50% 315/625 [00:02<00:02, 129.79it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 52% 328/625 [00:02<00:02, 128.46it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 342/625 [00:02<00:02, 129.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 356/625 [00:02<00:02, 129.98it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 59% 370/625 [00:02<00:02, 126.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 61% 383/625 [00:03<00:01, 126.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 397/625 [00:03<00:01, 127.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 410/625 [00:03<00:01, 128.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 423/625 [00:03<00:01, 127.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 437/625 [00:03<00:01, 128.37it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 72% 450/625 [00:03<00:01, 127.66it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 74% 463/625 [00:03<00:01, 127.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 76% 476/625 [00:03<00:01, 128.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 78% 490/625 [00:03<00:01, 129.02it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 504/625 [00:03<00:00, 129.53it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 517/625 [00:04<00:00, 128.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 531/625 [00:04<00:00, 129.35it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 544/625 [00:04<00:00, 128.80it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 89% 557/625 [00:04<00:00, 128.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 91% 571/625 [00:04<00:00, 129.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 93% 584/625 [00:04<00:00, 127.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 597/625 [00:04<00:00, 127.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 98% 611/625 [00:04<00:00, 128.41it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 129.19it/s]\u001b[A\u001b[A\n",
"\n",
"\u001b[A\u001b[A02/21/2020 12:31:52 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:31:52 - INFO - __main__ - perplexity = tensor(833.0645)\n",
"02/21/2020 12:31:52 - INFO - transformers.configuration_utils - Configuration saved in /content/models/smallBERTa/weights/checkpoint-20000/config.json\n",
"02/21/2020 12:31:52 - INFO - transformers.modeling_utils - Model weights saved in /content/models/smallBERTa/weights/checkpoint-20000/pytorch_model.bin\n",
"02/21/2020 12:31:52 - INFO - __main__ - Saving model checkpoint to /content/models/smallBERTa/weights/checkpoint-20000\n",
"02/21/2020 12:31:52 - INFO - __main__ - Deleting older checkpoint [/content/models/smallBERTa/weights/checkpoint-16000] due to args.save_total_limit\n",
"02/21/2020 12:31:52 - INFO - __main__ - Saving optimizer and scheduler states to /content/models/smallBERTa/weights/checkpoint-20000\n",
"\n",
"Iteration: 56% 3128/5625 [02:13<21:23, 1.95it/s]\u001b[A\n",
"Iteration: 56% 3132/5625 [02:13<15:19, 2.71it/s]\u001b[A\n",
"Iteration: 56% 3136/5625 [02:13<11:02, 3.76it/s]\u001b[A\n",
"Iteration: 56% 3140/5625 [02:13<08:03, 5.14it/s]\u001b[A\n",
"Iteration: 56% 3144/5625 [02:13<05:59, 6.91it/s]\u001b[A\n",
"Iteration: 56% 3148/5625 [02:13<04:32, 9.10it/s]\u001b[A\n",
"Iteration: 56% 3152/5625 [02:13<03:31, 11.71it/s]\u001b[A\n",
"Iteration: 56% 3156/5625 [02:14<02:51, 14.37it/s]\u001b[A\n",
"Iteration: 56% 3160/5625 [02:14<02:22, 17.30it/s]\u001b[A\n",
"Iteration: 56% 3164/5625 [02:14<02:03, 19.98it/s]\u001b[A\n",
"Iteration: 56% 3168/5625 [02:14<01:49, 22.45it/s]\u001b[A\n",
"Iteration: 56% 3172/5625 [02:14<01:40, 24.45it/s]\u001b[A\n",
"Iteration: 56% 3176/5625 [02:14<01:34, 25.99it/s]\u001b[A\n",
"Iteration: 57% 3180/5625 [02:14<01:28, 27.74it/s]\u001b[A\n",
"Iteration: 57% 3184/5625 [02:14<01:22, 29.42it/s]\u001b[A\n",
"Iteration: 57% 3188/5625 [02:15<01:21, 29.88it/s]\u001b[A\n",
"Iteration: 57% 3192/5625 [02:15<01:20, 30.20it/s]\u001b[A\n",
"Iteration: 57% 3196/5625 [02:15<01:19, 30.71it/s]\u001b[A\n",
"Iteration: 57% 3200/5625 [02:15<01:18, 30.74it/s]\u001b[A\n",
"Iteration: 57% 3204/5625 [02:15<01:20, 30.19it/s]\u001b[A\n",
"Iteration: 57% 3208/5625 [02:15<01:19, 30.48it/s]\u001b[A\n",
"Iteration: 57% 3212/5625 [02:15<01:17, 31.33it/s]\u001b[A\n",
"Iteration: 57% 3216/5625 [02:15<01:14, 32.44it/s]\u001b[A\n",
"Iteration: 57% 3220/5625 [02:16<01:12, 33.33it/s]\u001b[A\n",
"Iteration: 57% 3224/5625 [02:16<01:10, 34.09it/s]\u001b[A\n",
"Iteration: 57% 3228/5625 [02:16<01:10, 34.03it/s]\u001b[A\n",
"Iteration: 57% 3232/5625 [02:16<01:09, 34.47it/s]\u001b[A\n",
"Iteration: 58% 3236/5625 [02:16<01:07, 35.40it/s]\u001b[A\n",
"Iteration: 58% 3240/5625 [02:16<01:07, 35.37it/s]\u001b[A\n",
"Iteration: 58% 3244/5625 [02:16<01:07, 35.39it/s]\u001b[A\n",
"Iteration: 58% 3248/5625 [02:16<01:07, 35.00it/s]\u001b[A\n",
"Iteration: 58% 3252/5625 [02:17<01:10, 33.47it/s]\u001b[A\n",
"Iteration: 58% 3256/5625 [02:17<01:09, 33.92it/s]\u001b[A\n",
"Iteration: 58% 3260/5625 [02:17<01:09, 34.04it/s]\u001b[A\n",
"Iteration: 58% 3264/5625 [02:17<01:08, 34.22it/s]\u001b[A\n",
"Iteration: 58% 3268/5625 [02:17<01:07, 34.78it/s]\u001b[A\n",
"Iteration: 58% 3272/5625 [02:17<01:07, 34.79it/s]\u001b[A\n",
"Iteration: 58% 3276/5625 [02:17<01:07, 34.70it/s]\u001b[A\n",
"Iteration: 58% 3280/5625 [02:17<01:07, 34.96it/s]\u001b[A\n",
"Iteration: 58% 3284/5625 [02:17<01:07, 34.59it/s]\u001b[A\n",
"Iteration: 58% 3288/5625 [02:18<01:07, 34.46it/s]\u001b[A\n",
"Iteration: 59% 3292/5625 [02:18<01:06, 35.33it/s]\u001b[A\n",
"Iteration: 59% 3296/5625 [02:18<01:05, 35.83it/s]\u001b[A\n",
"Iteration: 59% 3300/5625 [02:18<01:04, 36.18it/s]\u001b[A\n",
"Iteration: 59% 3304/5625 [02:18<01:03, 36.30it/s]\u001b[A\n",
"Iteration: 59% 3308/5625 [02:18<01:03, 36.33it/s]\u001b[A\n",
"Iteration: 59% 3312/5625 [02:18<01:04, 36.10it/s]\u001b[A\n",
"Iteration: 59% 3316/5625 [02:18<01:03, 36.28it/s]\u001b[A\n",
"Iteration: 59% 3320/5625 [02:18<01:04, 35.87it/s]\u001b[A\n",
"Iteration: 59% 3324/5625 [02:19<01:04, 35.50it/s]\u001b[A\n",
"Iteration: 59% 3328/5625 [02:19<01:03, 35.96it/s]\u001b[A\n",
"Iteration: 59% 3332/5625 [02:19<01:02, 36.41it/s]\u001b[A\n",
"Iteration: 59% 3336/5625 [02:19<01:02, 36.64it/s]\u001b[A\n",
"Iteration: 59% 3340/5625 [02:19<01:03, 36.14it/s]\u001b[A\n",
"Iteration: 59% 3344/5625 [02:19<01:02, 36.43it/s]\u001b[A\n",
"Iteration: 60% 3348/5625 [02:19<01:01, 36.96it/s]\u001b[A\n",
"Iteration: 60% 3352/5625 [02:19<01:02, 36.45it/s]\u001b[A\n",
"Iteration: 60% 3356/5625 [02:19<01:01, 36.77it/s]\u001b[A\n",
"Iteration: 60% 3360/5625 [02:20<01:02, 36.09it/s]\u001b[A\n",
"Iteration: 60% 3364/5625 [02:20<01:02, 36.13it/s]\u001b[A\n",
"Iteration: 60% 3368/5625 [02:20<01:02, 36.11it/s]\u001b[A\n",
"Iteration: 60% 3372/5625 [02:20<01:02, 36.20it/s]\u001b[A\n",
"Iteration: 60% 3376/5625 [02:20<01:01, 36.34it/s]\u001b[A\n",
"Iteration: 60% 3380/5625 [02:20<01:02, 36.13it/s]\u001b[A\n",
"Iteration: 60% 3384/5625 [02:20<01:00, 36.85it/s]\u001b[A\n",
"Iteration: 60% 3388/5625 [02:20<01:00, 36.77it/s]\u001b[A\n",
"Iteration: 60% 3392/5625 [02:20<01:02, 35.79it/s]\u001b[A\n",
"Iteration: 60% 3396/5625 [02:21<01:02, 35.40it/s]\u001b[A\n",
"Iteration: 60% 3400/5625 [02:21<01:02, 35.66it/s]\u001b[A\n",
"Iteration: 61% 3404/5625 [02:21<01:01, 36.07it/s]\u001b[A\n",
"Iteration: 61% 3408/5625 [02:21<01:00, 36.58it/s]\u001b[A\n",
"Iteration: 61% 3412/5625 [02:21<01:01, 36.23it/s]\u001b[A\n",
"Iteration: 61% 3416/5625 [02:21<01:01, 36.19it/s]\u001b[A\n",
"Iteration: 61% 3420/5625 [02:21<01:01, 36.08it/s]\u001b[A\n",
"Iteration: 61% 3424/5625 [02:21<01:00, 36.45it/s]\u001b[A\n",
"Iteration: 61% 3428/5625 [02:21<00:59, 36.72it/s]\u001b[A\n",
"Iteration: 61% 3432/5625 [02:22<00:59, 36.66it/s]\u001b[A\n",
"Iteration: 61% 3436/5625 [02:22<00:59, 36.58it/s]\u001b[A\n",
"Iteration: 61% 3440/5625 [02:22<01:00, 36.38it/s]\u001b[A\n",
"Iteration: 61% 3444/5625 [02:22<01:00, 36.16it/s]\u001b[A\n",
"Iteration: 61% 3448/5625 [02:22<01:00, 35.82it/s]\u001b[A\n",
"Iteration: 61% 3452/5625 [02:22<01:00, 35.77it/s]\u001b[A\n",
"Iteration: 61% 3456/5625 [02:22<01:00, 36.09it/s]\u001b[A\n",
"Iteration: 62% 3460/5625 [02:22<00:59, 36.35it/s]\u001b[A\n",
"Iteration: 62% 3464/5625 [02:22<00:59, 36.41it/s]\u001b[A\n",
"Iteration: 62% 3468/5625 [02:23<00:59, 36.32it/s]\u001b[A\n",
"Iteration: 62% 3472/5625 [02:23<01:01, 34.95it/s]\u001b[A\n",
"Iteration: 62% 3476/5625 [02:23<01:00, 35.31it/s]\u001b[A\n",
"Iteration: 62% 3480/5625 [02:23<00:59, 35.79it/s]\u001b[A\n",
"Iteration: 62% 3484/5625 [02:23<00:59, 35.89it/s]\u001b[A\n",
"Iteration: 62% 3488/5625 [02:23<00:59, 35.87it/s]\u001b[A\n",
"Iteration: 62% 3492/5625 [02:23<00:59, 35.72it/s]\u001b[A\n",
"Iteration: 62% 3496/5625 [02:23<01:00, 34.98it/s]\u001b[A\n",
"Iteration: 62% 3500/5625 [02:23<01:00, 35.16it/s]\u001b[A\n",
"Iteration: 62% 3504/5625 [02:24<00:59, 35.93it/s]\u001b[A\n",
"Iteration: 62% 3508/5625 [02:24<00:59, 35.73it/s]\u001b[A\n",
"Iteration: 62% 3512/5625 [02:24<00:59, 35.76it/s]\u001b[A\n",
"Iteration: 63% 3516/5625 [02:24<00:58, 36.01it/s]\u001b[A\n",
"Iteration: 63% 3520/5625 [02:24<00:58, 36.09it/s]\u001b[A\n",
"Iteration: 63% 3524/5625 [02:24<00:57, 36.39it/s]\u001b[A\n",
"Iteration: 63% 3528/5625 [02:24<00:57, 36.43it/s]\u001b[A\n",
"Iteration: 63% 3532/5625 [02:24<00:57, 36.53it/s]\u001b[A\n",
"Iteration: 63% 3536/5625 [02:24<00:57, 36.25it/s]\u001b[A\n",
"Iteration: 63% 3540/5625 [02:25<00:57, 36.48it/s]\u001b[A\n",
"Iteration: 63% 3544/5625 [02:25<00:58, 35.83it/s]\u001b[A\n",
"Iteration: 63% 3548/5625 [02:25<00:57, 35.81it/s]\u001b[A\n",
"Iteration: 63% 3552/5625 [02:25<00:57, 36.01it/s]\u001b[A\n",
"Iteration: 63% 3556/5625 [02:25<00:56, 36.33it/s]\u001b[A\n",
"Iteration: 63% 3560/5625 [02:25<00:57, 35.64it/s]\u001b[A\n",
"Iteration: 63% 3564/5625 [02:25<00:59, 34.71it/s]\u001b[A\n",
"Iteration: 63% 3568/5625 [02:25<01:01, 33.65it/s]\u001b[A\n",
"Iteration: 64% 3572/5625 [02:25<01:01, 33.26it/s]\u001b[A\n",
"Iteration: 64% 3576/5625 [02:26<01:02, 32.57it/s]\u001b[A\n",
"Iteration: 64% 3580/5625 [02:26<01:03, 32.03it/s]\u001b[A\n",
"Iteration: 64% 3584/5625 [02:26<01:04, 31.58it/s]\u001b[A\n",
"Iteration: 64% 3588/5625 [02:26<01:02, 32.64it/s]\u001b[A\n",
"Iteration: 64% 3592/5625 [02:26<01:02, 32.78it/s]\u001b[A\n",
"Iteration: 64% 3596/5625 [02:26<01:00, 33.46it/s]\u001b[A\n",
"Iteration: 64% 3600/5625 [02:26<00:58, 34.40it/s]\u001b[A\n",
"Iteration: 64% 3604/5625 [02:26<00:59, 33.73it/s]\u001b[A\n",
"Iteration: 64% 3608/5625 [02:27<00:58, 34.31it/s]\u001b[A\n",
"Iteration: 64% 3612/5625 [02:27<00:58, 34.51it/s]\u001b[A\n",
"Iteration: 64% 3616/5625 [02:27<00:57, 34.89it/s]\u001b[A\n",
"Iteration: 64% 3620/5625 [02:27<00:56, 35.77it/s]\u001b[A\n",
"Iteration: 64% 3624/5625 [02:27<00:55, 35.81it/s]\u001b[A\n",
"02/21/2020 12:32:07 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:32:08 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:32:08 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:32:08 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 13/625 [00:00<00:05, 120.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 25/625 [00:00<00:05, 117.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 36/625 [00:00<00:05, 113.47it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 47/625 [00:00<00:05, 112.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 9% 58/625 [00:00<00:05, 110.70it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 70/625 [00:00<00:05, 110.81it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 82/625 [00:00<00:04, 110.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 93/625 [00:00<00:04, 109.76it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 105/625 [00:00<00:04, 112.25it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 118/625 [00:01<00:04, 115.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 131/625 [00:01<00:04, 118.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 145/625 [00:01<00:03, 121.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 158/625 [00:01<00:03, 123.46it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 27% 171/625 [00:01<00:03, 123.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 185/625 [00:01<00:03, 125.71it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 199/625 [00:01<00:03, 127.50it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 212/625 [00:01<00:03, 127.88it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 226/625 [00:01<00:03, 128.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 239/625 [00:01<00:03, 128.31it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 252/625 [00:02<00:02, 127.27it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 265/625 [00:02<00:02, 126.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 45% 279/625 [00:02<00:02, 128.13it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 47% 292/625 [00:02<00:02, 128.05it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 49% 305/625 [00:02<00:02, 128.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 319/625 [00:02<00:02, 128.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 333/625 [00:02<00:02, 130.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 56% 347/625 [00:02<00:02, 130.37it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 58% 361/625 [00:02<00:02, 131.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 60% 375/625 [00:03<00:01, 129.39it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 62% 389/625 [00:03<00:01, 130.47it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 403/625 [00:03<00:01, 129.58it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 67% 417/625 [00:03<00:01, 130.50it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 69% 431/625 [00:03<00:01, 130.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 71% 445/625 [00:03<00:01, 126.31it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 73% 458/625 [00:03<00:01, 126.41it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 471/625 [00:03<00:01, 126.98it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 484/625 [00:03<00:01, 126.25it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 80% 497/625 [00:03<00:01, 127.21it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 82% 510/625 [00:04<00:00, 126.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 84% 523/625 [00:04<00:00, 125.67it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 86% 536/625 [00:04<00:00, 122.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 88% 549/625 [00:04<00:00, 120.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 90% 563/625 [00:04<00:00, 123.81it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 92% 576/625 [00:04<00:00, 123.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 94% 589/625 [00:04<00:00, 122.41it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 602/625 [00:04<00:00, 118.21it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 99% 616/625 [00:04<00:00, 122.30it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:05<00:00, 123.85it/s]\u001b[A\u001b[A\n",
"02/21/2020 12:32:13 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:32:13 - INFO - __main__ - perplexity = tensor(831.8867)\n",
"\n",
"Iteration: 64% 3628/5625 [02:34<17:47, 1.87it/s]\u001b[A\n",
"Iteration: 65% 3632/5625 [02:34<12:42, 2.61it/s]\u001b[A\n",
"Iteration: 65% 3636/5625 [02:34<09:09, 3.62it/s]\u001b[A\n",
"Iteration: 65% 3640/5625 [02:34<06:40, 4.96it/s]\u001b[A\n",
"Iteration: 65% 3644/5625 [02:34<04:56, 6.68it/s]\u001b[A\n",
"Iteration: 65% 3648/5625 [02:34<03:43, 8.85it/s]\u001b[A\n",
"Iteration: 65% 3652/5625 [02:35<02:52, 11.43it/s]\u001b[A\n",
"Iteration: 65% 3656/5625 [02:35<02:16, 14.42it/s]\u001b[A\n",
"Iteration: 65% 3660/5625 [02:35<01:51, 17.66it/s]\u001b[A\n",
"Iteration: 65% 3664/5625 [02:35<01:34, 20.67it/s]\u001b[A\n",
"Iteration: 65% 3668/5625 [02:35<01:22, 23.75it/s]\u001b[A\n",
"Iteration: 65% 3672/5625 [02:35<01:13, 26.43it/s]\u001b[A\n",
"Iteration: 65% 3676/5625 [02:35<01:08, 28.62it/s]\u001b[A\n",
"Iteration: 65% 3680/5625 [02:35<01:03, 30.72it/s]\u001b[A\n",
"Iteration: 65% 3684/5625 [02:35<01:00, 32.15it/s]\u001b[A\n",
"Iteration: 66% 3688/5625 [02:36<00:58, 33.33it/s]\u001b[A\n",
"Iteration: 66% 3692/5625 [02:36<00:56, 34.22it/s]\u001b[A\n",
"Iteration: 66% 3696/5625 [02:36<00:55, 34.96it/s]\u001b[A\n",
"Iteration: 66% 3700/5625 [02:36<00:55, 34.57it/s]\u001b[A\n",
"Iteration: 66% 3704/5625 [02:36<00:54, 35.34it/s]\u001b[A\n",
"Iteration: 66% 3708/5625 [02:36<00:53, 35.98it/s]\u001b[A\n",
"Iteration: 66% 3712/5625 [02:36<00:53, 35.63it/s]\u001b[A\n",
"Iteration: 66% 3716/5625 [02:36<00:54, 35.12it/s]\u001b[A\n",
"Iteration: 66% 3720/5625 [02:36<00:53, 35.62it/s]\u001b[A\n",
"Iteration: 66% 3724/5625 [02:37<00:53, 35.44it/s]\u001b[A\n",
"Iteration: 66% 3728/5625 [02:37<00:54, 34.99it/s]\u001b[A\n",
"Iteration: 66% 3732/5625 [02:37<00:54, 35.05it/s]\u001b[A\n",
"Iteration: 66% 3736/5625 [02:37<00:54, 34.85it/s]\u001b[A\n",
"Iteration: 66% 3740/5625 [02:37<00:53, 35.37it/s]\u001b[A\n",
"Iteration: 67% 3744/5625 [02:37<00:52, 35.81it/s]\u001b[A\n",
"Iteration: 67% 3748/5625 [02:37<00:53, 35.33it/s]\u001b[A\n",
"Iteration: 67% 3752/5625 [02:37<00:53, 35.21it/s]\u001b[A\n",
"Iteration: 67% 3756/5625 [02:37<00:52, 35.63it/s]\u001b[A\n",
"Iteration: 67% 3760/5625 [02:38<00:52, 35.63it/s]\u001b[A\n",
"Iteration: 67% 3764/5625 [02:38<00:51, 36.09it/s]\u001b[A\n",
"Iteration: 67% 3768/5625 [02:38<00:51, 36.26it/s]\u001b[A\n",
"Iteration: 67% 3772/5625 [02:38<00:51, 36.13it/s]\u001b[A\n",
"Iteration: 67% 3776/5625 [02:38<00:51, 36.21it/s]\u001b[A\n",
"Iteration: 67% 3780/5625 [02:38<00:51, 36.08it/s]\u001b[A\n",
"Iteration: 67% 3784/5625 [02:38<00:51, 36.02it/s]\u001b[A\n",
"Iteration: 67% 3788/5625 [02:38<00:52, 35.02it/s]\u001b[A\n",
"Iteration: 67% 3792/5625 [02:38<00:53, 34.15it/s]\u001b[A\n",
"Iteration: 67% 3796/5625 [02:39<00:52, 34.88it/s]\u001b[A\n",
"Iteration: 68% 3800/5625 [02:39<00:52, 35.09it/s]\u001b[A\n",
"Iteration: 68% 3804/5625 [02:39<00:52, 34.76it/s]\u001b[A\n",
"Iteration: 68% 3808/5625 [02:39<00:52, 34.47it/s]\u001b[A\n",
"Iteration: 68% 3812/5625 [02:39<00:51, 35.21it/s]\u001b[A\n",
"Iteration: 68% 3816/5625 [02:39<00:50, 35.50it/s]\u001b[A\n",
"Iteration: 68% 3820/5625 [02:39<00:50, 36.00it/s]\u001b[A\n",
"Iteration: 68% 3824/5625 [02:39<00:49, 36.48it/s]\u001b[A\n",
"Iteration: 68% 3828/5625 [02:39<00:51, 35.01it/s]\u001b[A\n",
"Iteration: 68% 3832/5625 [02:40<00:50, 35.65it/s]\u001b[A\n",
"Iteration: 68% 3836/5625 [02:40<00:49, 35.92it/s]\u001b[A\n",
"Iteration: 68% 3840/5625 [02:40<00:49, 35.98it/s]\u001b[A\n",
"Iteration: 68% 3844/5625 [02:40<00:49, 35.71it/s]\u001b[A\n",
"Iteration: 68% 3848/5625 [02:40<00:49, 35.64it/s]\u001b[A\n",
"Iteration: 68% 3852/5625 [02:40<00:49, 35.99it/s]\u001b[A\n",
"Iteration: 69% 3856/5625 [02:40<00:48, 36.22it/s]\u001b[A\n",
"Iteration: 69% 3860/5625 [02:40<00:48, 36.24it/s]\u001b[A\n",
"Iteration: 69% 3864/5625 [02:40<00:48, 36.38it/s]\u001b[A\n",
"Iteration: 69% 3868/5625 [02:41<00:48, 36.05it/s]\u001b[A\n",
"Iteration: 69% 3872/5625 [02:41<00:49, 35.43it/s]\u001b[A\n",
"Iteration: 69% 3876/5625 [02:41<00:49, 35.08it/s]\u001b[A\n",
"Iteration: 69% 3880/5625 [02:41<00:51, 33.61it/s]\u001b[A\n",
"Iteration: 69% 3884/5625 [02:41<00:50, 34.42it/s]\u001b[A\n",
"Iteration: 69% 3888/5625 [02:41<00:50, 34.74it/s]\u001b[A\n",
"Iteration: 69% 3892/5625 [02:41<00:49, 34.90it/s]\u001b[A\n",
"Iteration: 69% 3896/5625 [02:41<00:48, 35.33it/s]\u001b[A\n",
"Iteration: 69% 3900/5625 [02:41<00:48, 35.72it/s]\u001b[A\n",
"Iteration: 69% 3904/5625 [02:42<00:48, 35.67it/s]\u001b[A\n",
"Iteration: 69% 3908/5625 [02:42<00:48, 35.77it/s]\u001b[A\n",
"Iteration: 70% 3912/5625 [02:42<00:47, 35.82it/s]\u001b[A\n",
"Iteration: 70% 3916/5625 [02:42<00:47, 36.28it/s]\u001b[A\n",
"Iteration: 70% 3920/5625 [02:42<00:47, 35.72it/s]\u001b[A\n",
"Iteration: 70% 3924/5625 [02:42<00:47, 35.60it/s]\u001b[A\n",
"Iteration: 70% 3928/5625 [02:42<00:47, 35.87it/s]\u001b[A\n",
"Iteration: 70% 3932/5625 [02:42<00:46, 36.08it/s]\u001b[A\n",
"Iteration: 70% 3936/5625 [02:42<00:46, 35.95it/s]\u001b[A\n",
"Iteration: 70% 3940/5625 [02:43<00:47, 35.24it/s]\u001b[A\n",
"Iteration: 70% 3944/5625 [02:43<00:46, 36.04it/s]\u001b[A\n",
"Iteration: 70% 3948/5625 [02:43<00:46, 36.35it/s]\u001b[A\n",
"Iteration: 70% 3952/5625 [02:43<00:48, 34.78it/s]\u001b[A\n",
"Iteration: 70% 3956/5625 [02:43<00:48, 34.50it/s]\u001b[A\n",
"Iteration: 70% 3960/5625 [02:43<00:48, 34.37it/s]\u001b[A\n",
"Iteration: 70% 3964/5625 [02:43<00:47, 34.61it/s]\u001b[A\n",
"Iteration: 71% 3968/5625 [02:43<00:47, 35.03it/s]\u001b[A\n",
"Iteration: 71% 3972/5625 [02:44<00:46, 35.37it/s]\u001b[A\n",
"Iteration: 71% 3976/5625 [02:44<00:46, 35.61it/s]\u001b[A\n",
"Iteration: 71% 3980/5625 [02:44<00:45, 36.04it/s]\u001b[A\n",
"Iteration: 71% 3984/5625 [02:44<00:45, 35.90it/s]\u001b[A\n",
"Iteration: 71% 3988/5625 [02:44<00:46, 35.43it/s]\u001b[A\n",
"Iteration: 71% 3992/5625 [02:44<00:47, 34.63it/s]\u001b[A\n",
"Iteration: 71% 3996/5625 [02:44<00:47, 34.38it/s]\u001b[A\n",
"Iteration: 71% 4000/5625 [02:44<00:46, 35.14it/s]\u001b[A\n",
"Iteration: 71% 4004/5625 [02:44<00:45, 35.68it/s]\u001b[A\n",
"Iteration: 71% 4008/5625 [02:45<00:44, 36.02it/s]\u001b[A\n",
"Iteration: 71% 4012/5625 [02:45<00:44, 36.43it/s]\u001b[A\n",
"Iteration: 71% 4016/5625 [02:45<00:44, 36.04it/s]\u001b[A\n",
"Iteration: 71% 4020/5625 [02:45<00:44, 36.37it/s]\u001b[A\n",
"Iteration: 72% 4024/5625 [02:45<00:45, 34.95it/s]\u001b[A\n",
"Iteration: 72% 4028/5625 [02:45<00:45, 35.12it/s]\u001b[A\n",
"Iteration: 72% 4032/5625 [02:45<00:44, 35.59it/s]\u001b[A\n",
"Iteration: 72% 4036/5625 [02:45<00:44, 35.72it/s]\u001b[A\n",
"Iteration: 72% 4040/5625 [02:45<00:43, 36.25it/s]\u001b[A\n",
"Iteration: 72% 4044/5625 [02:46<00:43, 36.04it/s]\u001b[A\n",
"Iteration: 72% 4048/5625 [02:46<00:43, 36.45it/s]\u001b[A\n",
"Iteration: 72% 4052/5625 [02:46<00:43, 36.27it/s]\u001b[A\n",
"Iteration: 72% 4056/5625 [02:46<00:43, 36.41it/s]\u001b[A\n",
"Iteration: 72% 4060/5625 [02:46<00:42, 36.58it/s]\u001b[A\n",
"Iteration: 72% 4064/5625 [02:46<00:44, 35.29it/s]\u001b[A\n",
"Iteration: 72% 4068/5625 [02:46<00:43, 36.06it/s]\u001b[A\n",
"Iteration: 72% 4072/5625 [02:46<00:42, 36.59it/s]\u001b[A\n",
"Iteration: 72% 4076/5625 [02:46<00:42, 36.44it/s]\u001b[A\n",
"Iteration: 73% 4080/5625 [02:47<00:43, 35.81it/s]\u001b[A\n",
"Iteration: 73% 4084/5625 [02:47<00:42, 35.99it/s]\u001b[A\n",
"Iteration: 73% 4088/5625 [02:47<00:42, 36.17it/s]\u001b[A\n",
"Iteration: 73% 4092/5625 [02:47<00:42, 36.26it/s]\u001b[A\n",
"Iteration: 73% 4096/5625 [02:47<00:42, 36.33it/s]\u001b[A\n",
"Iteration: 73% 4100/5625 [02:47<00:43, 35.39it/s]\u001b[A\n",
"Iteration: 73% 4104/5625 [02:47<00:42, 35.54it/s]\u001b[A\n",
"Iteration: 73% 4108/5625 [02:47<00:42, 35.59it/s]\u001b[A\n",
"Iteration: 73% 4112/5625 [02:47<00:43, 35.01it/s]\u001b[A\n",
"Iteration: 73% 4116/5625 [02:48<00:42, 35.59it/s]\u001b[A\n",
"Iteration: 73% 4120/5625 [02:48<00:41, 36.01it/s]\u001b[A\n",
"Iteration: 73% 4124/5625 [02:48<00:41, 36.06it/s]\u001b[A\n",
"02/21/2020 12:32:27 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:32:29 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:32:29 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:32:29 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 14/625 [00:00<00:04, 129.89it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 27/625 [00:00<00:04, 129.34it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 40/625 [00:00<00:04, 127.16it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 53/625 [00:00<00:04, 125.54it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 66/625 [00:00<00:04, 126.63it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 79/625 [00:00<00:04, 126.64it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 92/625 [00:00<00:04, 126.04it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 104/625 [00:00<00:04, 123.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 117/625 [00:00<00:04, 122.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 131/625 [00:01<00:03, 125.39it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 144/625 [00:01<00:03, 126.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 157/625 [00:01<00:03, 126.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 27% 170/625 [00:01<00:03, 125.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 29% 184/625 [00:01<00:03, 126.71it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 197/625 [00:01<00:03, 127.53it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 210/625 [00:01<00:03, 127.78it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 223/625 [00:01<00:03, 124.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 236/625 [00:01<00:03, 125.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 249/625 [00:01<00:03, 125.28it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 262/625 [00:02<00:02, 126.56it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 275/625 [00:02<00:02, 125.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 46% 288/625 [00:02<00:02, 126.79it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 48% 302/625 [00:02<00:02, 128.22it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 50% 315/625 [00:02<00:02, 125.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 52% 328/625 [00:02<00:02, 124.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 341/625 [00:02<00:02, 126.00it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 354/625 [00:02<00:02, 123.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 59% 368/625 [00:02<00:02, 125.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 61% 381/625 [00:03<00:01, 125.89it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 63% 395/625 [00:03<00:01, 127.29it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 65% 408/625 [00:03<00:01, 125.06it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 67% 421/625 [00:03<00:01, 124.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 435/625 [00:03<00:01, 126.06it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 72% 448/625 [00:03<00:01, 126.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 74% 461/625 [00:03<00:01, 127.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 76% 474/625 [00:03<00:01, 126.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 78% 487/625 [00:03<00:01, 125.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 80% 500/625 [00:03<00:00, 126.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 82% 513/625 [00:04<00:00, 125.81it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 84% 526/625 [00:04<00:00, 126.24it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 86% 539/625 [00:04<00:00, 127.15it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 88% 553/625 [00:04<00:00, 128.29it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 91% 567/625 [00:04<00:00, 128.64it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 93% 580/625 [00:04<00:00, 129.04it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 95% 593/625 [00:04<00:00, 129.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 97% 606/625 [00:04<00:00, 127.70it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 99% 619/625 [00:04<00:00, 126.30it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 126.33it/s]\u001b[A\u001b[A\n",
"02/21/2020 12:32:34 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:32:34 - INFO - __main__ - perplexity = tensor(825.2130)\n",
"\n",
"Iteration: 73% 4128/5625 [02:54<12:55, 1.93it/s]\u001b[A\n",
"Iteration: 73% 4132/5625 [02:55<09:14, 2.69it/s]\u001b[A\n",
"Iteration: 74% 4136/5625 [02:55<06:38, 3.73it/s]\u001b[A\n",
"Iteration: 74% 4140/5625 [02:55<04:50, 5.11it/s]\u001b[A\n",
"Iteration: 74% 4144/5625 [02:55<03:34, 6.90it/s]\u001b[A\n",
"Iteration: 74% 4148/5625 [02:55<02:41, 9.14it/s]\u001b[A\n",
"Iteration: 74% 4152/5625 [02:55<02:04, 11.79it/s]\u001b[A\n",
"Iteration: 74% 4156/5625 [02:55<01:40, 14.64it/s]\u001b[A\n",
"Iteration: 74% 4160/5625 [02:55<01:22, 17.71it/s]\u001b[A\n",
"Iteration: 74% 4164/5625 [02:55<01:09, 21.04it/s]\u001b[A\n",
"Iteration: 74% 4168/5625 [02:56<01:01, 23.78it/s]\u001b[A\n",
"Iteration: 74% 4172/5625 [02:56<00:54, 26.57it/s]\u001b[A\n",
"Iteration: 74% 4176/5625 [02:56<00:49, 29.26it/s]\u001b[A\n",
"Iteration: 74% 4180/5625 [02:56<00:46, 31.16it/s]\u001b[A\n",
"Iteration: 74% 4184/5625 [02:56<00:44, 32.62it/s]\u001b[A\n",
"Iteration: 74% 4188/5625 [02:56<00:42, 33.97it/s]\u001b[A\n",
"Iteration: 75% 4192/5625 [02:56<00:41, 34.21it/s]\u001b[A\n",
"Iteration: 75% 4196/5625 [02:56<00:41, 34.32it/s]\u001b[A\n",
"Iteration: 75% 4200/5625 [02:56<00:40, 35.14it/s]\u001b[A\n",
"Iteration: 75% 4204/5625 [02:57<00:39, 35.63it/s]\u001b[A\n",
"Iteration: 75% 4208/5625 [02:57<00:39, 35.98it/s]\u001b[A\n",
"Iteration: 75% 4212/5625 [02:57<00:38, 36.34it/s]\u001b[A\n",
"Iteration: 75% 4216/5625 [02:57<00:39, 35.87it/s]\u001b[A\n",
"Iteration: 75% 4220/5625 [02:57<00:39, 35.62it/s]\u001b[A\n",
"Iteration: 75% 4224/5625 [02:57<00:39, 35.87it/s]\u001b[A\n",
"Iteration: 75% 4228/5625 [02:57<00:38, 36.36it/s]\u001b[A\n",
"Iteration: 75% 4232/5625 [02:57<00:38, 36.01it/s]\u001b[A\n",
"Iteration: 75% 4236/5625 [02:57<00:38, 36.27it/s]\u001b[A\n",
"Iteration: 75% 4240/5625 [02:58<00:38, 36.34it/s]\u001b[A\n",
"Iteration: 75% 4244/5625 [02:58<00:37, 36.79it/s]\u001b[A\n",
"Iteration: 76% 4248/5625 [02:58<00:37, 36.52it/s]\u001b[A\n",
"Iteration: 76% 4252/5625 [02:58<00:37, 36.41it/s]\u001b[A\n",
"Iteration: 76% 4256/5625 [02:58<00:37, 36.63it/s]\u001b[A\n",
"Iteration: 76% 4260/5625 [02:58<00:37, 36.40it/s]\u001b[A\n",
"Iteration: 76% 4264/5625 [02:58<00:38, 35.81it/s]\u001b[A\n",
"Iteration: 76% 4268/5625 [02:58<00:38, 35.63it/s]\u001b[A\n",
"Iteration: 76% 4272/5625 [02:58<00:37, 36.15it/s]\u001b[A\n",
"Iteration: 76% 4276/5625 [02:59<00:37, 35.67it/s]\u001b[A\n",
"Iteration: 76% 4280/5625 [02:59<00:37, 35.86it/s]\u001b[A\n",
"Iteration: 76% 4284/5625 [02:59<00:37, 36.18it/s]\u001b[A\n",
"Iteration: 76% 4288/5625 [02:59<00:36, 36.39it/s]\u001b[A\n",
"Iteration: 76% 4292/5625 [02:59<00:36, 36.81it/s]\u001b[A\n",
"Iteration: 76% 4296/5625 [02:59<00:35, 37.13it/s]\u001b[A\n",
"Iteration: 76% 4300/5625 [02:59<00:36, 36.79it/s]\u001b[A\n",
"Iteration: 77% 4304/5625 [02:59<00:35, 37.02it/s]\u001b[A\n",
"Iteration: 77% 4308/5625 [02:59<00:36, 36.46it/s]\u001b[A\n",
"Iteration: 77% 4312/5625 [02:59<00:36, 36.37it/s]\u001b[A\n",
"Iteration: 77% 4316/5625 [03:00<00:35, 36.78it/s]\u001b[A\n",
"Iteration: 77% 4320/5625 [03:00<00:35, 36.90it/s]\u001b[A\n",
"Iteration: 77% 4324/5625 [03:00<00:35, 36.79it/s]\u001b[A\n",
"Iteration: 77% 4328/5625 [03:00<00:35, 36.82it/s]\u001b[A\n",
"Iteration: 77% 4332/5625 [03:00<00:35, 36.82it/s]\u001b[A\n",
"Iteration: 77% 4336/5625 [03:00<00:34, 36.89it/s]\u001b[A\n",
"Iteration: 77% 4340/5625 [03:00<00:34, 36.90it/s]\u001b[A\n",
"Iteration: 77% 4344/5625 [03:00<00:34, 36.72it/s]\u001b[A\n",
"Iteration: 77% 4348/5625 [03:00<00:34, 36.86it/s]\u001b[A\n",
"Iteration: 77% 4352/5625 [03:01<00:34, 36.83it/s]\u001b[A\n",
"Iteration: 77% 4356/5625 [03:01<00:34, 36.70it/s]\u001b[A\n",
"Iteration: 78% 4360/5625 [03:01<00:34, 36.82it/s]\u001b[A\n",
"Iteration: 78% 4364/5625 [03:01<00:34, 36.81it/s]\u001b[A\n",
"Iteration: 78% 4368/5625 [03:01<00:34, 36.78it/s]\u001b[A\n",
"Iteration: 78% 4372/5625 [03:01<00:34, 36.64it/s]\u001b[A\n",
"Iteration: 78% 4376/5625 [03:01<00:34, 36.59it/s]\u001b[A\n",
"Iteration: 78% 4380/5625 [03:01<00:34, 35.85it/s]\u001b[A\n",
"Iteration: 78% 4384/5625 [03:01<00:34, 35.84it/s]\u001b[A\n",
"Iteration: 78% 4388/5625 [03:02<00:34, 35.40it/s]\u001b[A\n",
"Iteration: 78% 4392/5625 [03:02<00:34, 35.81it/s]\u001b[A\n",
"Iteration: 78% 4396/5625 [03:02<00:33, 36.32it/s]\u001b[A\n",
"Iteration: 78% 4400/5625 [03:02<00:33, 36.10it/s]\u001b[A\n",
"Iteration: 78% 4404/5625 [03:02<00:33, 36.74it/s]\u001b[A\n",
"Iteration: 78% 4408/5625 [03:02<00:33, 36.72it/s]\u001b[A\n",
"Iteration: 78% 4412/5625 [03:02<00:32, 37.17it/s]\u001b[A\n",
"Iteration: 79% 4416/5625 [03:02<00:32, 37.11it/s]\u001b[A\n",
"Iteration: 79% 4420/5625 [03:02<00:32, 36.83it/s]\u001b[A\n",
"Iteration: 79% 4424/5625 [03:03<00:32, 36.78it/s]\u001b[A\n",
"Iteration: 79% 4428/5625 [03:03<00:32, 36.84it/s]\u001b[A\n",
"Iteration: 79% 4432/5625 [03:03<00:32, 36.70it/s]\u001b[A\n",
"Iteration: 79% 4436/5625 [03:03<00:32, 36.79it/s]\u001b[A\n",
"Iteration: 79% 4440/5625 [03:03<00:32, 36.63it/s]\u001b[A\n",
"Iteration: 79% 4444/5625 [03:03<00:32, 36.72it/s]\u001b[A\n",
"Iteration: 79% 4448/5625 [03:03<00:32, 36.67it/s]\u001b[A\n",
"Iteration: 79% 4452/5625 [03:03<00:31, 37.27it/s]\u001b[A\n",
"Iteration: 79% 4456/5625 [03:03<00:31, 36.68it/s]\u001b[A\n",
"Iteration: 79% 4460/5625 [03:04<00:32, 35.90it/s]\u001b[A\n",
"Iteration: 79% 4464/5625 [03:04<00:32, 35.72it/s]\u001b[A\n",
"Iteration: 79% 4468/5625 [03:04<00:32, 36.15it/s]\u001b[A\n",
"Iteration: 80% 4472/5625 [03:04<00:31, 36.04it/s]\u001b[A\n",
"Iteration: 80% 4476/5625 [03:04<00:31, 36.51it/s]\u001b[A\n",
"Iteration: 80% 4480/5625 [03:04<00:31, 36.55it/s]\u001b[A\n",
"Iteration: 80% 4484/5625 [03:04<00:31, 36.51it/s]\u001b[A\n",
"Iteration: 80% 4488/5625 [03:04<00:31, 36.63it/s]\u001b[A\n",
"Iteration: 80% 4492/5625 [03:04<00:31, 36.33it/s]\u001b[A\n",
"Iteration: 80% 4496/5625 [03:05<00:30, 36.60it/s]\u001b[A\n",
"Iteration: 80% 4500/5625 [03:05<00:30, 36.42it/s]\u001b[A\n",
"Iteration: 80% 4504/5625 [03:05<00:30, 36.60it/s]\u001b[A\n",
"Iteration: 80% 4508/5625 [03:05<00:30, 36.78it/s]\u001b[A\n",
"Iteration: 80% 4512/5625 [03:05<00:29, 37.23it/s]\u001b[A\n",
"Iteration: 80% 4516/5625 [03:05<00:30, 36.85it/s]\u001b[A\n",
"Iteration: 80% 4520/5625 [03:05<00:30, 36.18it/s]\u001b[A\n",
"Iteration: 80% 4524/5625 [03:05<00:30, 36.60it/s]\u001b[A\n",
"Iteration: 80% 4528/5625 [03:05<00:29, 36.88it/s]\u001b[A\n",
"Iteration: 81% 4532/5625 [03:05<00:30, 36.41it/s]\u001b[A\n",
"Iteration: 81% 4536/5625 [03:06<00:29, 36.64it/s]\u001b[A\n",
"Iteration: 81% 4540/5625 [03:06<00:29, 37.10it/s]\u001b[A\n",
"Iteration: 81% 4544/5625 [03:06<00:29, 36.82it/s]\u001b[A\n",
"Iteration: 81% 4548/5625 [03:06<00:29, 36.74it/s]\u001b[A\n",
"Iteration: 81% 4552/5625 [03:06<00:29, 36.78it/s]\u001b[A\n",
"Iteration: 81% 4556/5625 [03:06<00:29, 36.59it/s]\u001b[A\n",
"Iteration: 81% 4560/5625 [03:06<00:28, 36.84it/s]\u001b[A\n",
"Iteration: 81% 4564/5625 [03:06<00:28, 36.84it/s]\u001b[A\n",
"Iteration: 81% 4568/5625 [03:06<00:29, 36.31it/s]\u001b[A\n",
"Iteration: 81% 4572/5625 [03:07<00:28, 36.48it/s]\u001b[A\n",
"Iteration: 81% 4576/5625 [03:07<00:29, 35.59it/s]\u001b[A\n",
"Iteration: 81% 4580/5625 [03:07<00:29, 35.44it/s]\u001b[A\n",
"Iteration: 81% 4584/5625 [03:07<00:28, 35.99it/s]\u001b[A\n",
"Iteration: 82% 4588/5625 [03:07<00:28, 36.24it/s]\u001b[A\n",
"Iteration: 82% 4592/5625 [03:07<00:28, 36.46it/s]\u001b[A\n",
"Iteration: 82% 4596/5625 [03:07<00:28, 36.69it/s]\u001b[A\n",
"Iteration: 82% 4600/5625 [03:07<00:27, 36.73it/s]\u001b[A\n",
"Iteration: 82% 4604/5625 [03:07<00:27, 36.64it/s]\u001b[A\n",
"Iteration: 82% 4608/5625 [03:08<00:28, 35.53it/s]\u001b[A\n",
"Iteration: 82% 4612/5625 [03:08<00:28, 36.06it/s]\u001b[A\n",
"Iteration: 82% 4616/5625 [03:08<00:27, 36.27it/s]\u001b[A\n",
"Iteration: 82% 4620/5625 [03:08<00:27, 36.74it/s]\u001b[A\n",
"Iteration: 82% 4624/5625 [03:08<00:27, 36.55it/s]\u001b[A\n",
"02/21/2020 12:32:48 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:32:50 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:32:50 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:32:50 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 13/625 [00:00<00:04, 129.21it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 27/625 [00:00<00:04, 129.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 39/625 [00:00<00:04, 125.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 53/625 [00:00<00:04, 127.00it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 66/625 [00:00<00:04, 127.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 79/625 [00:00<00:04, 127.27it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 92/625 [00:00<00:04, 125.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 104/625 [00:00<00:04, 121.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 116/625 [00:00<00:04, 121.20it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 130/625 [00:01<00:03, 124.31it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 143/625 [00:01<00:03, 125.57it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 156/625 [00:01<00:03, 125.30it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 27% 169/625 [00:01<00:03, 123.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 29% 182/625 [00:01<00:03, 125.28it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 31% 195/625 [00:01<00:03, 124.89it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 33% 208/625 [00:01<00:03, 125.43it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 35% 221/625 [00:01<00:03, 124.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 37% 234/625 [00:01<00:03, 126.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 247/625 [00:01<00:03, 124.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 260/625 [00:02<00:02, 126.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 273/625 [00:02<00:02, 127.20it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 46% 286/625 [00:02<00:02, 127.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 48% 300/625 [00:02<00:02, 128.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 50% 313/625 [00:02<00:02, 127.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 52% 327/625 [00:02<00:02, 128.90it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 54% 340/625 [00:02<00:02, 127.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 56% 353/625 [00:02<00:02, 126.67it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 59% 366/625 [00:02<00:02, 127.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 61% 379/625 [00:03<00:01, 126.52it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 63% 392/625 [00:03<00:01, 124.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 65% 406/625 [00:03<00:01, 126.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 67% 419/625 [00:03<00:01, 127.27it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 69% 433/625 [00:03<00:01, 128.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 71% 446/625 [00:03<00:01, 127.87it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 73% 459/625 [00:03<00:01, 124.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 76% 472/625 [00:03<00:01, 116.53it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 484/625 [00:03<00:01, 115.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 79% 496/625 [00:03<00:01, 112.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 508/625 [00:04<00:01, 111.20it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 520/625 [00:04<00:00, 109.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 532/625 [00:04<00:00, 110.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 544/625 [00:04<00:00, 110.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 89% 556/625 [00:04<00:00, 112.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 91% 570/625 [00:04<00:00, 117.24it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 93% 582/625 [00:04<00:00, 112.56it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 95% 594/625 [00:04<00:00, 109.94it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 97% 606/625 [00:04<00:00, 110.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 99% 618/625 [00:05<00:00, 108.78it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:05<00:00, 121.26it/s]\u001b[A\u001b[A02/21/2020 12:32:55 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:32:55 - INFO - __main__ - perplexity = tensor(827.0211)\n",
"\n",
"Iteration: 82% 4628/5625 [03:15<09:09, 1.81it/s]\u001b[A\n",
"Iteration: 82% 4632/5625 [03:15<06:32, 2.53it/s]\u001b[A\n",
"Iteration: 82% 4636/5625 [03:15<04:43, 3.49it/s]\u001b[A\n",
"Iteration: 82% 4640/5625 [03:15<03:25, 4.79it/s]\u001b[A\n",
"Iteration: 83% 4644/5625 [03:16<02:31, 6.48it/s]\u001b[A\n",
"Iteration: 83% 4648/5625 [03:16<01:54, 8.53it/s]\u001b[A\n",
"Iteration: 83% 4652/5625 [03:16<01:27, 11.08it/s]\u001b[A\n",
"Iteration: 83% 4656/5625 [03:16<01:08, 14.07it/s]\u001b[A\n",
"Iteration: 83% 4660/5625 [03:16<00:55, 17.30it/s]\u001b[A\n",
"Iteration: 83% 4664/5625 [03:16<00:46, 20.52it/s]\u001b[A\n",
"Iteration: 83% 4668/5625 [03:16<00:40, 23.73it/s]\u001b[A\n",
"Iteration: 83% 4672/5625 [03:16<00:35, 26.67it/s]\u001b[A\n",
"Iteration: 83% 4676/5625 [03:16<00:32, 28.92it/s]\u001b[A\n",
"Iteration: 83% 4680/5625 [03:17<00:31, 30.38it/s]\u001b[A\n",
"Iteration: 83% 4684/5625 [03:17<00:30, 31.35it/s]\u001b[A\n",
"Iteration: 83% 4688/5625 [03:17<00:28, 32.66it/s]\u001b[A\n",
"Iteration: 83% 4692/5625 [03:17<00:27, 33.95it/s]\u001b[A\n",
"Iteration: 83% 4696/5625 [03:17<00:26, 34.56it/s]\u001b[A\n",
"Iteration: 84% 4700/5625 [03:17<00:26, 35.28it/s]\u001b[A\n",
"Iteration: 84% 4704/5625 [03:17<00:25, 35.77it/s]\u001b[A\n",
"Iteration: 84% 4708/5625 [03:17<00:25, 35.83it/s]\u001b[A\n",
"Iteration: 84% 4712/5625 [03:17<00:25, 36.45it/s]\u001b[A\n",
"Iteration: 84% 4716/5625 [03:18<00:24, 36.73it/s]\u001b[A\n",
"Iteration: 84% 4720/5625 [03:18<00:24, 36.23it/s]\u001b[A\n",
"Iteration: 84% 4724/5625 [03:18<00:24, 36.38it/s]\u001b[A\n",
"Iteration: 84% 4728/5625 [03:18<00:24, 36.12it/s]\u001b[A\n",
"Iteration: 84% 4732/5625 [03:18<00:24, 36.54it/s]\u001b[A\n",
"Iteration: 84% 4736/5625 [03:18<00:24, 36.56it/s]\u001b[A\n",
"Iteration: 84% 4740/5625 [03:18<00:24, 36.28it/s]\u001b[A\n",
"Iteration: 84% 4744/5625 [03:18<00:24, 36.66it/s]\u001b[A\n",
"Iteration: 84% 4748/5625 [03:18<00:23, 37.04it/s]\u001b[A\n",
"Iteration: 84% 4752/5625 [03:19<00:23, 37.04it/s]\u001b[A\n",
"Iteration: 85% 4756/5625 [03:19<00:23, 36.66it/s]\u001b[A\n",
"Iteration: 85% 4760/5625 [03:19<00:23, 36.51it/s]\u001b[A\n",
"Iteration: 85% 4764/5625 [03:19<00:23, 36.23it/s]\u001b[A\n",
"Iteration: 85% 4768/5625 [03:19<00:23, 36.38it/s]\u001b[A\n",
"Iteration: 85% 4772/5625 [03:19<00:23, 36.63it/s]\u001b[A\n",
"Iteration: 85% 4776/5625 [03:19<00:23, 36.54it/s]\u001b[A\n",
"Iteration: 85% 4780/5625 [03:19<00:22, 36.88it/s]\u001b[A\n",
"Iteration: 85% 4784/5625 [03:19<00:22, 37.16it/s]\u001b[A\n",
"Iteration: 85% 4788/5625 [03:20<00:22, 37.24it/s]\u001b[A\n",
"Iteration: 85% 4792/5625 [03:20<00:22, 37.15it/s]\u001b[A\n",
"Iteration: 85% 4796/5625 [03:20<00:22, 36.53it/s]\u001b[A\n",
"Iteration: 85% 4800/5625 [03:20<00:22, 36.81it/s]\u001b[A\n",
"Iteration: 85% 4804/5625 [03:20<00:22, 36.26it/s]\u001b[A\n",
"Iteration: 85% 4808/5625 [03:20<00:22, 36.78it/s]\u001b[A\n",
"Iteration: 86% 4812/5625 [03:20<00:21, 37.05it/s]\u001b[A\n",
"Iteration: 86% 4816/5625 [03:20<00:21, 37.16it/s]\u001b[A\n",
"Iteration: 86% 4820/5625 [03:20<00:21, 37.24it/s]\u001b[A\n",
"Iteration: 86% 4824/5625 [03:21<00:21, 36.93it/s]\u001b[A\n",
"Iteration: 86% 4828/5625 [03:21<00:21, 37.01it/s]\u001b[A\n",
"Iteration: 86% 4832/5625 [03:21<00:21, 36.77it/s]\u001b[A\n",
"Iteration: 86% 4836/5625 [03:21<00:21, 36.45it/s]\u001b[A\n",
"Iteration: 86% 4840/5625 [03:21<00:21, 36.63it/s]\u001b[A\n",
"Iteration: 86% 4844/5625 [03:21<00:22, 35.39it/s]\u001b[A\n",
"Iteration: 86% 4848/5625 [03:21<00:22, 35.24it/s]\u001b[A\n",
"Iteration: 86% 4852/5625 [03:21<00:21, 35.56it/s]\u001b[A\n",
"Iteration: 86% 4856/5625 [03:21<00:21, 35.89it/s]\u001b[A\n",
"Iteration: 86% 4860/5625 [03:22<00:21, 36.14it/s]\u001b[A\n",
"Iteration: 86% 4864/5625 [03:22<00:20, 36.48it/s]\u001b[A\n",
"Iteration: 87% 4868/5625 [03:22<00:20, 36.63it/s]\u001b[A\n",
"Iteration: 87% 4872/5625 [03:22<00:21, 35.75it/s]\u001b[A\n",
"Iteration: 87% 4876/5625 [03:22<00:21, 35.54it/s]\u001b[A\n",
"Iteration: 87% 4880/5625 [03:22<00:21, 35.30it/s]\u001b[A\n",
"Iteration: 87% 4884/5625 [03:22<00:20, 35.49it/s]\u001b[A\n",
"Iteration: 87% 4888/5625 [03:22<00:20, 35.94it/s]\u001b[A\n",
"Iteration: 87% 4892/5625 [03:22<00:20, 36.24it/s]\u001b[A\n",
"Iteration: 87% 4896/5625 [03:23<00:19, 36.69it/s]\u001b[A\n",
"Iteration: 87% 4900/5625 [03:23<00:20, 36.17it/s]\u001b[A\n",
"Iteration: 87% 4904/5625 [03:23<00:20, 36.00it/s]\u001b[A\n",
"Iteration: 87% 4908/5625 [03:23<00:20, 35.29it/s]\u001b[A\n",
"Iteration: 87% 4912/5625 [03:23<00:20, 35.44it/s]\u001b[A\n",
"Iteration: 87% 4916/5625 [03:23<00:20, 34.11it/s]\u001b[A\n",
"Iteration: 87% 4920/5625 [03:23<00:20, 34.95it/s]\u001b[A\n",
"Iteration: 88% 4924/5625 [03:23<00:19, 35.51it/s]\u001b[A\n",
"Iteration: 88% 4928/5625 [03:23<00:19, 36.01it/s]\u001b[A\n",
"Iteration: 88% 4932/5625 [03:24<00:19, 36.13it/s]\u001b[A\n",
"Iteration: 88% 4936/5625 [03:24<00:18, 36.68it/s]\u001b[A\n",
"Iteration: 88% 4940/5625 [03:24<00:18, 36.98it/s]\u001b[A\n",
"Iteration: 88% 4944/5625 [03:24<00:18, 36.10it/s]\u001b[A\n",
"Iteration: 88% 4948/5625 [03:24<00:18, 36.12it/s]\u001b[A\n",
"Iteration: 88% 4952/5625 [03:24<00:18, 36.55it/s]\u001b[A\n",
"Iteration: 88% 4956/5625 [03:24<00:18, 35.89it/s]\u001b[A\n",
"Iteration: 88% 4960/5625 [03:24<00:18, 35.99it/s]\u001b[A\n",
"Iteration: 88% 4964/5625 [03:24<00:18, 35.65it/s]\u001b[A\n",
"Iteration: 88% 4968/5625 [03:25<00:18, 36.20it/s]\u001b[A\n",
"Iteration: 88% 4972/5625 [03:25<00:17, 36.47it/s]\u001b[A\n",
"Iteration: 88% 4976/5625 [03:25<00:17, 37.07it/s]\u001b[A\n",
"Iteration: 89% 4980/5625 [03:25<00:17, 36.49it/s]\u001b[A\n",
"Iteration: 89% 4984/5625 [03:25<00:17, 36.92it/s]\u001b[A\n",
"Iteration: 89% 4988/5625 [03:25<00:17, 36.26it/s]\u001b[A\n",
"Iteration: 89% 4992/5625 [03:25<00:18, 34.80it/s]\u001b[A\n",
"Iteration: 89% 4996/5625 [03:25<00:18, 33.34it/s]\u001b[A\n",
"Iteration: 89% 5000/5625 [03:25<00:19, 32.56it/s]\u001b[A\n",
"Iteration: 89% 5004/5625 [03:26<00:19, 32.16it/s]\u001b[A\n",
"Iteration: 89% 5008/5625 [03:26<00:19, 32.24it/s]\u001b[A\n",
"Iteration: 89% 5012/5625 [03:26<00:18, 32.29it/s]\u001b[A\n",
"Iteration: 89% 5016/5625 [03:26<00:18, 32.36it/s]\u001b[A\n",
"Iteration: 89% 5020/5625 [03:26<00:18, 33.45it/s]\u001b[A\n",
"Iteration: 89% 5024/5625 [03:26<00:17, 34.63it/s]\u001b[A\n",
"Iteration: 89% 5028/5625 [03:26<00:17, 34.93it/s]\u001b[A\n",
"Iteration: 89% 5032/5625 [03:26<00:16, 35.22it/s]\u001b[A\n",
"Iteration: 90% 5036/5625 [03:27<00:16, 35.49it/s]\u001b[A\n",
"Iteration: 90% 5040/5625 [03:27<00:16, 36.04it/s]\u001b[A\n",
"Iteration: 90% 5044/5625 [03:27<00:15, 36.39it/s]\u001b[A\n",
"Iteration: 90% 5048/5625 [03:27<00:15, 36.85it/s]\u001b[A\n",
"Iteration: 90% 5052/5625 [03:27<00:15, 36.79it/s]\u001b[A\n",
"Iteration: 90% 5056/5625 [03:27<00:15, 36.29it/s]\u001b[A\n",
"Iteration: 90% 5060/5625 [03:27<00:15, 36.28it/s]\u001b[A\n",
"Iteration: 90% 5064/5625 [03:27<00:15, 36.65it/s]\u001b[A\n",
"Iteration: 90% 5068/5625 [03:27<00:15, 36.55it/s]\u001b[A\n",
"Iteration: 90% 5072/5625 [03:27<00:15, 36.29it/s]\u001b[A\n",
"Iteration: 90% 5076/5625 [03:28<00:15, 36.08it/s]\u001b[A\n",
"Iteration: 90% 5080/5625 [03:28<00:14, 36.45it/s]\u001b[A\n",
"Iteration: 90% 5084/5625 [03:28<00:14, 36.35it/s]\u001b[A\n",
"Iteration: 90% 5088/5625 [03:28<00:14, 36.40it/s]\u001b[A\n",
"Iteration: 91% 5092/5625 [03:28<00:14, 36.43it/s]\u001b[A\n",
"Iteration: 91% 5096/5625 [03:28<00:14, 36.16it/s]\u001b[A\n",
"Iteration: 91% 5100/5625 [03:28<00:14, 35.97it/s]\u001b[A\n",
"Iteration: 91% 5104/5625 [03:28<00:14, 36.42it/s]\u001b[A\n",
"Iteration: 91% 5108/5625 [03:28<00:14, 36.35it/s]\u001b[A\n",
"Iteration: 91% 5112/5625 [03:29<00:14, 36.25it/s]\u001b[A\n",
"Iteration: 91% 5116/5625 [03:29<00:13, 36.48it/s]\u001b[A\n",
"Iteration: 91% 5120/5625 [03:29<00:13, 36.15it/s]\u001b[A\n",
"Iteration: 91% 5124/5625 [03:29<00:14, 33.92it/s]\u001b[A02/21/2020 12:33:09 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:33:10 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:33:10 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:33:10 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 13/625 [00:00<00:04, 128.13it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 26/625 [00:00<00:04, 128.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 39/625 [00:00<00:04, 126.76it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 53/625 [00:00<00:04, 128.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 67/625 [00:00<00:04, 129.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 81/625 [00:00<00:04, 129.97it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 94/625 [00:00<00:04, 128.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 107/625 [00:00<00:04, 127.57it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 120/625 [00:00<00:03, 127.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 133/625 [00:01<00:03, 128.48it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 146/625 [00:01<00:03, 124.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 159/625 [00:01<00:03, 124.97it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 172/625 [00:01<00:03, 123.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 185/625 [00:01<00:03, 124.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 199/625 [00:01<00:03, 126.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 212/625 [00:01<00:03, 127.06it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 225/625 [00:01<00:03, 127.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 238/625 [00:01<00:03, 125.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 251/625 [00:01<00:02, 126.54it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 264/625 [00:02<00:02, 127.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 277/625 [00:02<00:02, 127.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 47% 291/625 [00:02<00:02, 128.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 49% 304/625 [00:02<00:02, 127.32it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 318/625 [00:02<00:02, 128.39it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 332/625 [00:02<00:02, 129.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 345/625 [00:02<00:02, 129.49it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 358/625 [00:02<00:02, 125.54it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 59% 371/625 [00:02<00:02, 124.31it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 61% 384/625 [00:03<00:01, 125.72it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 398/625 [00:03<00:01, 127.88it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 412/625 [00:03<00:01, 129.07it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 425/625 [00:03<00:01, 127.18it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 438/625 [00:03<00:01, 126.27it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 72% 451/625 [00:03<00:01, 127.34it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 74% 464/625 [00:03<00:01, 127.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 76% 477/625 [00:03<00:01, 127.52it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 78% 490/625 [00:03<00:01, 125.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 80% 503/625 [00:03<00:00, 126.32it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 516/625 [00:04<00:00, 126.53it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 529/625 [00:04<00:00, 126.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 542/625 [00:04<00:00, 127.06it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 89% 555/625 [00:04<00:00, 126.72it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 91% 568/625 [00:04<00:00, 127.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 93% 581/625 [00:04<00:00, 127.32it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 95% 594/625 [00:04<00:00, 127.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 97% 607/625 [00:04<00:00, 127.19it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 99% 621/625 [00:04<00:00, 128.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 127.26it/s]\u001b[A\u001b[A02/21/2020 12:33:15 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:33:15 - INFO - __main__ - perplexity = tensor(828.3847)\n",
"02/21/2020 12:33:15 - INFO - transformers.configuration_utils - Configuration saved in /content/models/smallBERTa/weights/checkpoint-22000/config.json\n",
"02/21/2020 12:33:15 - INFO - transformers.modeling_utils - Model weights saved in /content/models/smallBERTa/weights/checkpoint-22000/pytorch_model.bin\n",
"02/21/2020 12:33:15 - INFO - __main__ - Saving model checkpoint to /content/models/smallBERTa/weights/checkpoint-22000\n",
"02/21/2020 12:33:15 - INFO - __main__ - Deleting older checkpoint [/content/models/smallBERTa/weights/checkpoint-18000] due to args.save_total_limit\n",
"02/21/2020 12:33:15 - INFO - __main__ - Saving optimizer and scheduler states to /content/models/smallBERTa/weights/checkpoint-22000\n",
"\n",
"Iteration: 91% 5128/5625 [03:36<04:22, 1.90it/s]\u001b[A\n",
"Iteration: 91% 5132/5625 [03:36<03:06, 2.65it/s]\u001b[A\n",
"Iteration: 91% 5136/5625 [03:36<02:13, 3.67it/s]\u001b[A\n",
"Iteration: 91% 5140/5625 [03:36<01:36, 5.03it/s]\u001b[A\n",
"Iteration: 91% 5144/5625 [03:36<01:11, 6.76it/s]\u001b[A\n",
"Iteration: 92% 5148/5625 [03:36<00:53, 8.96it/s]\u001b[A\n",
"Iteration: 92% 5152/5625 [03:36<00:40, 11.57it/s]\u001b[A\n",
"Iteration: 92% 5156/5625 [03:36<00:32, 14.53it/s]\u001b[A\n",
"Iteration: 92% 5160/5625 [03:37<00:25, 17.89it/s]\u001b[A\n",
"Iteration: 92% 5164/5625 [03:37<00:21, 21.12it/s]\u001b[A\n",
"Iteration: 92% 5168/5625 [03:37<00:18, 24.28it/s]\u001b[A\n",
"Iteration: 92% 5172/5625 [03:37<00:16, 27.09it/s]\u001b[A\n",
"Iteration: 92% 5176/5625 [03:37<00:15, 29.36it/s]\u001b[A\n",
"Iteration: 92% 5180/5625 [03:37<00:14, 30.90it/s]\u001b[A\n",
"Iteration: 92% 5184/5625 [03:37<00:13, 32.65it/s]\u001b[A\n",
"Iteration: 92% 5188/5625 [03:37<00:12, 33.85it/s]\u001b[A\n",
"Iteration: 92% 5192/5625 [03:37<00:12, 34.90it/s]\u001b[A\n",
"Iteration: 92% 5196/5625 [03:38<00:12, 35.53it/s]\u001b[A\n",
"Iteration: 92% 5200/5625 [03:38<00:11, 35.75it/s]\u001b[A\n",
"Iteration: 93% 5204/5625 [03:38<00:11, 35.24it/s]\u001b[A\n",
"Iteration: 93% 5208/5625 [03:38<00:11, 35.40it/s]\u001b[A\n",
"Iteration: 93% 5212/5625 [03:38<00:11, 35.68it/s]\u001b[A\n",
"Iteration: 93% 5216/5625 [03:38<00:11, 35.37it/s]\u001b[A\n",
"Iteration: 93% 5220/5625 [03:38<00:11, 35.97it/s]\u001b[A\n",
"Iteration: 93% 5224/5625 [03:38<00:11, 35.41it/s]\u001b[A\n",
"Iteration: 93% 5228/5625 [03:38<00:11, 34.98it/s]\u001b[A\n",
"Iteration: 93% 5232/5625 [03:39<00:11, 35.66it/s]\u001b[A\n",
"Iteration: 93% 5236/5625 [03:39<00:10, 35.81it/s]\u001b[A\n",
"Iteration: 93% 5240/5625 [03:39<00:10, 35.51it/s]\u001b[A\n",
"Iteration: 93% 5244/5625 [03:39<00:10, 35.98it/s]\u001b[A\n",
"Iteration: 93% 5248/5625 [03:39<00:10, 36.38it/s]\u001b[A\n",
"Iteration: 93% 5252/5625 [03:39<00:10, 35.84it/s]\u001b[A\n",
"Iteration: 93% 5256/5625 [03:39<00:10, 35.98it/s]\u001b[A\n",
"Iteration: 94% 5260/5625 [03:39<00:10, 35.79it/s]\u001b[A\n",
"Iteration: 94% 5264/5625 [03:39<00:09, 36.61it/s]\u001b[A\n",
"Iteration: 94% 5268/5625 [03:40<00:09, 36.64it/s]\u001b[A\n",
"Iteration: 94% 5272/5625 [03:40<00:09, 36.77it/s]\u001b[A\n",
"Iteration: 94% 5276/5625 [03:40<00:09, 36.77it/s]\u001b[A\n",
"Iteration: 94% 5280/5625 [03:40<00:09, 35.97it/s]\u001b[A\n",
"Iteration: 94% 5284/5625 [03:40<00:09, 36.05it/s]\u001b[A\n",
"Iteration: 94% 5288/5625 [03:40<00:09, 35.24it/s]\u001b[A\n",
"Iteration: 94% 5292/5625 [03:40<00:09, 35.42it/s]\u001b[A\n",
"Iteration: 94% 5296/5625 [03:40<00:09, 36.11it/s]\u001b[A\n",
"Iteration: 94% 5300/5625 [03:40<00:08, 36.64it/s]\u001b[A\n",
"Iteration: 94% 5304/5625 [03:41<00:08, 36.70it/s]\u001b[A\n",
"Iteration: 94% 5308/5625 [03:41<00:08, 36.19it/s]\u001b[A\n",
"Iteration: 94% 5312/5625 [03:41<00:08, 36.29it/s]\u001b[A\n",
"Iteration: 95% 5316/5625 [03:41<00:08, 35.53it/s]\u001b[A\n",
"Iteration: 95% 5320/5625 [03:41<00:08, 35.75it/s]\u001b[A\n",
"Iteration: 95% 5324/5625 [03:41<00:08, 34.87it/s]\u001b[A\n",
"Iteration: 95% 5328/5625 [03:41<00:08, 34.57it/s]\u001b[A\n",
"Iteration: 95% 5332/5625 [03:41<00:08, 35.22it/s]\u001b[A\n",
"Iteration: 95% 5336/5625 [03:41<00:08, 35.84it/s]\u001b[A\n",
"Iteration: 95% 5340/5625 [03:42<00:07, 35.76it/s]\u001b[A\n",
"Iteration: 95% 5344/5625 [03:42<00:07, 36.37it/s]\u001b[A\n",
"Iteration: 95% 5348/5625 [03:42<00:07, 36.65it/s]\u001b[A\n",
"Iteration: 95% 5352/5625 [03:42<00:07, 36.52it/s]\u001b[A\n",
"Iteration: 95% 5356/5625 [03:42<00:07, 36.27it/s]\u001b[A\n",
"Iteration: 95% 5360/5625 [03:42<00:07, 36.11it/s]\u001b[A\n",
"Iteration: 95% 5364/5625 [03:42<00:07, 36.01it/s]\u001b[A\n",
"Iteration: 95% 5368/5625 [03:42<00:07, 36.42it/s]\u001b[A\n",
"Iteration: 96% 5372/5625 [03:42<00:07, 36.12it/s]\u001b[A\n",
"Iteration: 96% 5376/5625 [03:43<00:06, 35.59it/s]\u001b[A\n",
"Iteration: 96% 5380/5625 [03:43<00:06, 36.10it/s]\u001b[A\n",
"Iteration: 96% 5384/5625 [03:43<00:06, 36.32it/s]\u001b[A\n",
"Iteration: 96% 5388/5625 [03:43<00:06, 35.29it/s]\u001b[A\n",
"Iteration: 96% 5392/5625 [03:43<00:06, 35.49it/s]\u001b[A\n",
"Iteration: 96% 5396/5625 [03:43<00:06, 36.01it/s]\u001b[A\n",
"Iteration: 96% 5400/5625 [03:43<00:06, 35.51it/s]\u001b[A\n",
"Iteration: 96% 5404/5625 [03:43<00:06, 35.91it/s]\u001b[A\n",
"Iteration: 96% 5408/5625 [03:43<00:05, 36.30it/s]\u001b[A\n",
"Iteration: 96% 5412/5625 [03:44<00:05, 36.42it/s]\u001b[A\n",
"Iteration: 96% 5416/5625 [03:44<00:05, 36.66it/s]\u001b[A\n",
"Iteration: 96% 5420/5625 [03:44<00:05, 36.87it/s]\u001b[A\n",
"Iteration: 96% 5424/5625 [03:44<00:05, 36.60it/s]\u001b[A\n",
"Iteration: 96% 5428/5625 [03:44<00:05, 36.45it/s]\u001b[A\n",
"Iteration: 97% 5432/5625 [03:44<00:05, 35.33it/s]\u001b[A\n",
"Iteration: 97% 5436/5625 [03:44<00:05, 35.15it/s]\u001b[A\n",
"Iteration: 97% 5440/5625 [03:44<00:05, 35.70it/s]\u001b[A\n",
"Iteration: 97% 5444/5625 [03:44<00:05, 36.13it/s]\u001b[A\n",
"Iteration: 97% 5448/5625 [03:45<00:04, 36.48it/s]\u001b[A\n",
"Iteration: 97% 5452/5625 [03:45<00:04, 36.72it/s]\u001b[A\n",
"Iteration: 97% 5456/5625 [03:45<00:04, 36.91it/s]\u001b[A\n",
"Iteration: 97% 5460/5625 [03:45<00:04, 37.05it/s]\u001b[A\n",
"Iteration: 97% 5464/5625 [03:45<00:04, 37.13it/s]\u001b[A\n",
"Iteration: 97% 5468/5625 [03:45<00:04, 36.30it/s]\u001b[A\n",
"Iteration: 97% 5472/5625 [03:45<00:04, 35.49it/s]\u001b[A\n",
"Iteration: 97% 5476/5625 [03:45<00:04, 35.33it/s]\u001b[A\n",
"Iteration: 97% 5480/5625 [03:45<00:04, 35.83it/s]\u001b[A\n",
"Iteration: 97% 5484/5625 [03:46<00:03, 35.93it/s]\u001b[A\n",
"Iteration: 98% 5488/5625 [03:46<00:03, 36.52it/s]\u001b[A\n",
"Iteration: 98% 5492/5625 [03:46<00:03, 36.65it/s]\u001b[A\n",
"Iteration: 98% 5496/5625 [03:46<00:03, 36.85it/s]\u001b[A\n",
"Iteration: 98% 5500/5625 [03:46<00:03, 36.82it/s]\u001b[A\n",
"Iteration: 98% 5504/5625 [03:46<00:03, 36.90it/s]\u001b[A\n",
"Iteration: 98% 5508/5625 [03:46<00:03, 36.99it/s]\u001b[A\n",
"Iteration: 98% 5512/5625 [03:46<00:03, 36.34it/s]\u001b[A\n",
"Iteration: 98% 5516/5625 [03:46<00:03, 36.28it/s]\u001b[A\n",
"Iteration: 98% 5520/5625 [03:47<00:02, 35.42it/s]\u001b[A\n",
"Iteration: 98% 5524/5625 [03:47<00:02, 35.73it/s]\u001b[A\n",
"Iteration: 98% 5528/5625 [03:47<00:02, 36.24it/s]\u001b[A\n",
"Iteration: 98% 5532/5625 [03:47<00:02, 36.63it/s]\u001b[A\n",
"Iteration: 98% 5536/5625 [03:47<00:02, 36.77it/s]\u001b[A\n",
"Iteration: 98% 5540/5625 [03:47<00:02, 36.69it/s]\u001b[A\n",
"Iteration: 99% 5544/5625 [03:47<00:02, 36.94it/s]\u001b[A\n",
"Iteration: 99% 5548/5625 [03:47<00:02, 36.57it/s]\u001b[A\n",
"Iteration: 99% 5552/5625 [03:47<00:02, 36.41it/s]\u001b[A\n",
"Iteration: 99% 5556/5625 [03:48<00:01, 36.51it/s]\u001b[A\n",
"Iteration: 99% 5560/5625 [03:48<00:01, 36.83it/s]\u001b[A\n",
"Iteration: 99% 5564/5625 [03:48<00:01, 36.75it/s]\u001b[A\n",
"Iteration: 99% 5568/5625 [03:48<00:01, 37.08it/s]\u001b[A\n",
"Iteration: 99% 5572/5625 [03:48<00:01, 37.17it/s]\u001b[A\n",
"Iteration: 99% 5576/5625 [03:48<00:01, 36.97it/s]\u001b[A\n",
"Iteration: 99% 5580/5625 [03:48<00:01, 36.49it/s]\u001b[A\n",
"Iteration: 99% 5584/5625 [03:48<00:01, 36.11it/s]\u001b[A\n",
"Iteration: 99% 5588/5625 [03:48<00:01, 36.04it/s]\u001b[A\n",
"Iteration: 99% 5592/5625 [03:49<00:00, 35.92it/s]\u001b[A\n",
"Iteration: 99% 5596/5625 [03:49<00:00, 36.53it/s]\u001b[A\n",
"Iteration: 100% 5600/5625 [03:49<00:00, 36.63it/s]\u001b[A\n",
"Iteration: 100% 5604/5625 [03:49<00:00, 36.69it/s]\u001b[A\n",
"Iteration: 100% 5608/5625 [03:49<00:00, 37.30it/s]\u001b[A\n",
"Iteration: 100% 5612/5625 [03:49<00:00, 36.59it/s]\u001b[A\n",
"Iteration: 100% 5616/5625 [03:49<00:00, 36.88it/s]\u001b[A\n",
"Iteration: 100% 5620/5625 [03:49<00:00, 37.07it/s]\u001b[A\n",
"Iteration: 100% 5624/5625 [03:49<00:00, 36.19it/s]\u001b[A02/21/2020 12:33:29 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:33:31 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:33:31 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:33:31 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 13/625 [00:00<00:04, 129.00it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 27/625 [00:00<00:04, 129.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 40/625 [00:00<00:04, 127.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 53/625 [00:00<00:04, 127.98it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 66/625 [00:00<00:04, 128.37it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 80/625 [00:00<00:04, 129.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 93/625 [00:00<00:04, 129.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 106/625 [00:00<00:04, 128.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 119/625 [00:00<00:03, 128.72it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 133/625 [00:01<00:03, 130.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 146/625 [00:01<00:03, 129.35it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 159/625 [00:01<00:03, 128.47it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 172/625 [00:01<00:03, 126.98it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 185/625 [00:01<00:03, 126.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 198/625 [00:01<00:03, 121.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 211/625 [00:01<00:03, 122.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 224/625 [00:01<00:03, 124.64it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 237/625 [00:01<00:03, 125.81it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 250/625 [00:01<00:02, 126.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 263/625 [00:02<00:02, 126.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 276/625 [00:02<00:02, 127.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 46% 289/625 [00:02<00:02, 125.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 48% 302/625 [00:02<00:02, 125.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 50% 315/625 [00:02<00:02, 124.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 52% 328/625 [00:02<00:02, 125.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 341/625 [00:02<00:02, 123.92it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 354/625 [00:02<00:02, 124.10it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 59% 368/625 [00:02<00:02, 126.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 61% 382/625 [00:03<00:01, 127.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 63% 396/625 [00:03<00:01, 129.49it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 65% 409/625 [00:03<00:01, 129.21it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 422/625 [00:03<00:01, 126.57it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 436/625 [00:03<00:01, 128.29it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 72% 449/625 [00:03<00:01, 128.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 74% 462/625 [00:03<00:01, 127.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 76% 475/625 [00:03<00:01, 125.98it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 78% 488/625 [00:03<00:01, 124.49it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 80% 501/625 [00:03<00:00, 125.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 82% 514/625 [00:04<00:00, 126.25it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 84% 527/625 [00:04<00:00, 125.07it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 541/625 [00:04<00:00, 126.87it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 89% 554/625 [00:04<00:00, 123.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 91% 567/625 [00:04<00:00, 122.49it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 93% 580/625 [00:04<00:00, 123.60it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 95% 593/625 [00:04<00:00, 123.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 97% 606/625 [00:04<00:00, 124.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 99% 620/625 [00:04<00:00, 127.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 126.49it/s]\u001b[A\u001b[A02/21/2020 12:33:36 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:33:36 - INFO - __main__ - perplexity = tensor(811.6390)\n",
"\n",
"Epoch: 80% 4/5 [15:30<03:53, 233.49s/it]\n",
"Iteration: 0% 0/5625 [00:00<?, ?it/s]\u001b[A\n",
"Iteration: 0% 4/5625 [00:00<02:48, 33.35it/s]\u001b[A\n",
"Iteration: 0% 8/5625 [00:00<02:43, 34.45it/s]\u001b[A\n",
"Iteration: 0% 12/5625 [00:00<02:40, 35.04it/s]\u001b[A\n",
"Iteration: 0% 16/5625 [00:00<02:39, 35.22it/s]\u001b[A\n",
"Iteration: 0% 20/5625 [00:00<02:37, 35.54it/s]\u001b[A\n",
"Iteration: 0% 24/5625 [00:00<02:35, 35.98it/s]\u001b[A\n",
"Iteration: 0% 28/5625 [00:00<02:39, 35.05it/s]\u001b[A\n",
"Iteration: 1% 32/5625 [00:00<02:38, 35.30it/s]\u001b[A\n",
"Iteration: 1% 36/5625 [00:01<02:38, 35.20it/s]\u001b[A\n",
"Iteration: 1% 40/5625 [00:01<02:37, 35.35it/s]\u001b[A\n",
"Iteration: 1% 44/5625 [00:01<02:36, 35.72it/s]\u001b[A\n",
"Iteration: 1% 48/5625 [00:01<02:35, 35.85it/s]\u001b[A\n",
"Iteration: 1% 52/5625 [00:01<02:38, 35.27it/s]\u001b[A\n",
"Iteration: 1% 56/5625 [00:01<02:33, 36.18it/s]\u001b[A\n",
"Iteration: 1% 60/5625 [00:01<02:32, 36.48it/s]\u001b[A\n",
"Iteration: 1% 64/5625 [00:01<02:31, 36.66it/s]\u001b[A\n",
"Iteration: 1% 68/5625 [00:01<02:31, 36.73it/s]\u001b[A\n",
"Iteration: 1% 72/5625 [00:01<02:30, 36.87it/s]\u001b[A\n",
"Iteration: 1% 76/5625 [00:02<02:30, 36.91it/s]\u001b[A\n",
"Iteration: 1% 80/5625 [00:02<02:30, 36.76it/s]\u001b[A\n",
"Iteration: 1% 84/5625 [00:02<02:29, 37.02it/s]\u001b[A\n",
"Iteration: 2% 88/5625 [00:02<02:31, 36.44it/s]\u001b[A\n",
"Iteration: 2% 92/5625 [00:02<02:30, 36.66it/s]\u001b[A\n",
"Iteration: 2% 96/5625 [00:02<02:31, 36.49it/s]\u001b[A\n",
"Iteration: 2% 100/5625 [00:02<02:29, 36.87it/s]\u001b[A\n",
"Iteration: 2% 104/5625 [00:02<02:30, 36.78it/s]\u001b[A\n",
"Iteration: 2% 108/5625 [00:02<02:32, 36.22it/s]\u001b[A\n",
"Iteration: 2% 112/5625 [00:03<02:31, 36.40it/s]\u001b[A\n",
"Iteration: 2% 116/5625 [00:03<02:29, 36.73it/s]\u001b[A\n",
"Iteration: 2% 120/5625 [00:03<02:28, 36.99it/s]\u001b[A\n",
"Iteration: 2% 124/5625 [00:03<02:28, 37.10it/s]\u001b[A\n",
"Iteration: 2% 128/5625 [00:03<02:29, 36.69it/s]\u001b[A\n",
"Iteration: 2% 132/5625 [00:03<02:28, 36.95it/s]\u001b[A\n",
"Iteration: 2% 136/5625 [00:03<02:29, 36.83it/s]\u001b[A\n",
"Iteration: 2% 140/5625 [00:03<02:30, 36.38it/s]\u001b[A\n",
"Iteration: 3% 144/5625 [00:03<02:30, 36.37it/s]\u001b[A\n",
"Iteration: 3% 148/5625 [00:04<02:30, 36.28it/s]\u001b[A\n",
"Iteration: 3% 152/5625 [00:04<02:29, 36.57it/s]\u001b[A\n",
"Iteration: 3% 156/5625 [00:04<02:28, 36.75it/s]\u001b[A\n",
"Iteration: 3% 160/5625 [00:04<02:27, 36.98it/s]\u001b[A\n",
"Iteration: 3% 164/5625 [00:04<02:30, 36.22it/s]\u001b[A\n",
"Iteration: 3% 168/5625 [00:04<02:31, 35.98it/s]\u001b[A\n",
"Iteration: 3% 172/5625 [00:04<02:31, 36.01it/s]\u001b[A\n",
"Iteration: 3% 176/5625 [00:04<02:29, 36.55it/s]\u001b[A\n",
"Iteration: 3% 180/5625 [00:04<02:30, 36.30it/s]\u001b[A\n",
"Iteration: 3% 184/5625 [00:05<02:29, 36.37it/s]\u001b[A\n",
"Iteration: 3% 188/5625 [00:05<02:29, 36.45it/s]\u001b[A\n",
"Iteration: 3% 192/5625 [00:05<02:30, 36.13it/s]\u001b[A\n",
"Iteration: 3% 196/5625 [00:05<02:28, 36.61it/s]\u001b[A\n",
"Iteration: 4% 200/5625 [00:05<02:31, 35.82it/s]\u001b[A\n",
"Iteration: 4% 204/5625 [00:05<02:29, 36.15it/s]\u001b[A\n",
"Iteration: 4% 208/5625 [00:05<02:27, 36.72it/s]\u001b[A\n",
"Iteration: 4% 212/5625 [00:05<02:28, 36.56it/s]\u001b[A\n",
"Iteration: 4% 216/5625 [00:05<02:26, 37.02it/s]\u001b[A\n",
"Iteration: 4% 220/5625 [00:06<02:26, 36.77it/s]\u001b[A\n",
"Iteration: 4% 224/5625 [00:06<02:29, 36.25it/s]\u001b[A\n",
"Iteration: 4% 228/5625 [00:06<02:27, 36.62it/s]\u001b[A\n",
"Iteration: 4% 232/5625 [00:06<02:27, 36.60it/s]\u001b[A\n",
"Iteration: 4% 236/5625 [00:06<02:25, 37.10it/s]\u001b[A\n",
"Iteration: 4% 240/5625 [00:06<02:25, 36.92it/s]\u001b[A\n",
"Iteration: 4% 244/5625 [00:06<02:27, 36.60it/s]\u001b[A\n",
"Iteration: 4% 248/5625 [00:06<02:26, 36.59it/s]\u001b[A\n",
"Iteration: 4% 252/5625 [00:06<02:27, 36.52it/s]\u001b[A\n",
"Iteration: 5% 256/5625 [00:07<02:27, 36.44it/s]\u001b[A\n",
"Iteration: 5% 260/5625 [00:07<02:28, 36.15it/s]\u001b[A\n",
"Iteration: 5% 264/5625 [00:07<02:28, 36.06it/s]\u001b[A\n",
"Iteration: 5% 268/5625 [00:07<02:27, 36.30it/s]\u001b[A\n",
"Iteration: 5% 272/5625 [00:07<02:33, 34.84it/s]\u001b[A\n",
"Iteration: 5% 276/5625 [00:07<02:31, 35.36it/s]\u001b[A\n",
"Iteration: 5% 280/5625 [00:07<02:29, 35.81it/s]\u001b[A\n",
"Iteration: 5% 284/5625 [00:07<02:30, 35.54it/s]\u001b[A\n",
"Iteration: 5% 288/5625 [00:07<02:26, 36.39it/s]\u001b[A\n",
"Iteration: 5% 292/5625 [00:08<02:25, 36.60it/s]\u001b[A\n",
"Iteration: 5% 296/5625 [00:08<02:25, 36.51it/s]\u001b[A\n",
"Iteration: 5% 300/5625 [00:08<02:24, 36.73it/s]\u001b[A\n",
"Iteration: 5% 304/5625 [00:08<02:29, 35.69it/s]\u001b[A\n",
"Iteration: 5% 308/5625 [00:08<02:27, 35.98it/s]\u001b[A\n",
"Iteration: 6% 312/5625 [00:08<02:28, 35.82it/s]\u001b[A\n",
"Iteration: 6% 316/5625 [00:08<02:29, 35.62it/s]\u001b[A\n",
"Iteration: 6% 320/5625 [00:08<02:27, 35.98it/s]\u001b[A\n",
"Iteration: 6% 324/5625 [00:08<02:27, 35.89it/s]\u001b[A\n",
"Iteration: 6% 328/5625 [00:09<02:29, 35.50it/s]\u001b[A\n",
"Iteration: 6% 332/5625 [00:09<02:28, 35.67it/s]\u001b[A\n",
"Iteration: 6% 336/5625 [00:09<02:29, 35.28it/s]\u001b[A\n",
"Iteration: 6% 340/5625 [00:09<02:27, 35.85it/s]\u001b[A\n",
"Iteration: 6% 344/5625 [00:09<02:25, 36.41it/s]\u001b[A\n",
"Iteration: 6% 348/5625 [00:09<02:26, 35.90it/s]\u001b[A\n",
"Iteration: 6% 352/5625 [00:09<02:26, 35.93it/s]\u001b[A\n",
"Iteration: 6% 356/5625 [00:09<02:24, 36.35it/s]\u001b[A\n",
"Iteration: 6% 360/5625 [00:09<02:24, 36.46it/s]\u001b[A\n",
"Iteration: 6% 364/5625 [00:10<02:25, 36.12it/s]\u001b[A\n",
"Iteration: 7% 368/5625 [00:10<02:23, 36.56it/s]\u001b[A\n",
"Iteration: 7% 372/5625 [00:10<02:24, 36.41it/s]\u001b[A\n",
"Iteration: 7% 376/5625 [00:10<02:25, 36.01it/s]\u001b[A\n",
"Iteration: 7% 380/5625 [00:10<02:25, 35.96it/s]\u001b[A\n",
"Iteration: 7% 384/5625 [00:10<02:28, 35.21it/s]\u001b[A\n",
"Iteration: 7% 388/5625 [00:10<02:28, 35.19it/s]\u001b[A\n",
"Iteration: 7% 392/5625 [00:10<02:26, 35.68it/s]\u001b[A\n",
"Iteration: 7% 396/5625 [00:10<02:24, 36.09it/s]\u001b[A\n",
"Iteration: 7% 400/5625 [00:11<02:23, 36.30it/s]\u001b[A\n",
"Iteration: 7% 404/5625 [00:11<02:24, 36.14it/s]\u001b[A\n",
"Iteration: 7% 408/5625 [00:11<02:23, 36.25it/s]\u001b[A\n",
"Iteration: 7% 412/5625 [00:11<02:23, 36.22it/s]\u001b[A\n",
"Iteration: 7% 416/5625 [00:11<02:22, 36.47it/s]\u001b[A\n",
"Iteration: 7% 420/5625 [00:11<02:22, 36.40it/s]\u001b[A\n",
"Iteration: 8% 424/5625 [00:11<02:24, 35.90it/s]\u001b[A\n",
"Iteration: 8% 428/5625 [00:11<02:23, 36.16it/s]\u001b[A\n",
"Iteration: 8% 432/5625 [00:11<02:23, 36.23it/s]\u001b[A\n",
"Iteration: 8% 436/5625 [00:12<02:23, 36.23it/s]\u001b[A\n",
"Iteration: 8% 440/5625 [00:12<02:22, 36.42it/s]\u001b[A\n",
"Iteration: 8% 444/5625 [00:12<02:24, 35.78it/s]\u001b[A\n",
"Iteration: 8% 448/5625 [00:12<02:29, 34.73it/s]\u001b[A\n",
"Iteration: 8% 452/5625 [00:12<02:25, 35.47it/s]\u001b[A\n",
"Iteration: 8% 456/5625 [00:12<02:24, 35.75it/s]\u001b[A\n",
"Iteration: 8% 460/5625 [00:12<02:25, 35.42it/s]\u001b[A\n",
"Iteration: 8% 464/5625 [00:12<02:24, 35.60it/s]\u001b[A\n",
"Iteration: 8% 468/5625 [00:12<02:23, 36.00it/s]\u001b[A\n",
"Iteration: 8% 472/5625 [00:13<02:22, 36.12it/s]\u001b[A\n",
"Iteration: 8% 476/5625 [00:13<02:20, 36.54it/s]\u001b[A\n",
"Iteration: 9% 480/5625 [00:13<02:21, 36.44it/s]\u001b[A\n",
"Iteration: 9% 484/5625 [00:13<02:20, 36.53it/s]\u001b[A\n",
"Iteration: 9% 488/5625 [00:13<02:20, 36.68it/s]\u001b[A\n",
"Iteration: 9% 492/5625 [00:13<02:20, 36.62it/s]\u001b[A\n",
"Iteration: 9% 496/5625 [00:13<02:22, 35.98it/s]\u001b[A\n",
"02/21/2020 12:33:50 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:33:51 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:33:51 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:33:51 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 14/625 [00:00<00:04, 129.63it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 27/625 [00:00<00:04, 128.90it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 39/625 [00:00<00:04, 125.71it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 53/625 [00:00<00:04, 127.55it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 67/625 [00:00<00:04, 128.39it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 80/625 [00:00<00:04, 127.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 93/625 [00:00<00:04, 127.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 106/625 [00:00<00:04, 127.16it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 119/625 [00:00<00:03, 127.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 133/625 [00:01<00:03, 128.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 146/625 [00:01<00:03, 128.94it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 159/625 [00:01<00:03, 126.83it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 172/625 [00:01<00:03, 123.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 185/625 [00:01<00:03, 121.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 199/625 [00:01<00:03, 124.24it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 212/625 [00:01<00:03, 125.27it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 226/625 [00:01<00:03, 127.00it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 239/625 [00:01<00:03, 127.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 252/625 [00:01<00:02, 126.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 265/625 [00:02<00:02, 121.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 278/625 [00:02<00:02, 117.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 46% 290/625 [00:02<00:03, 111.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 48% 302/625 [00:02<00:02, 111.22it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 50% 314/625 [00:02<00:02, 110.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 52% 326/625 [00:02<00:02, 110.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 54% 338/625 [00:02<00:02, 110.12it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 56% 351/625 [00:02<00:02, 113.49it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 58% 364/625 [00:02<00:02, 117.89it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 60% 376/625 [00:03<00:02, 117.32it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 62% 388/625 [00:03<00:02, 114.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 400/625 [00:03<00:01, 114.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 412/625 [00:03<00:01, 109.25it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 423/625 [00:03<00:01, 104.87it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 69% 434/625 [00:03<00:01, 106.35it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 71% 446/625 [00:03<00:01, 108.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 73% 457/625 [00:03<00:01, 108.46it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 470/625 [00:03<00:01, 112.06it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 483/625 [00:04<00:01, 116.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 79% 496/625 [00:04<00:01, 120.46it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 509/625 [00:04<00:00, 122.80it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 84% 522/625 [00:04<00:00, 124.02it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 86% 535/625 [00:04<00:00, 123.34it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 88% 549/625 [00:04<00:00, 125.66it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 90% 562/625 [00:04<00:00, 125.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 92% 575/625 [00:04<00:00, 126.89it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 94% 588/625 [00:04<00:00, 126.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 601/625 [00:04<00:00, 124.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 98% 614/625 [00:05<00:00, 125.20it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:05<00:00, 120.63it/s]\u001b[A\u001b[A\n",
"02/21/2020 12:33:56 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:33:56 - INFO - __main__ - perplexity = tensor(818.6373)\n",
"\n",
"Iteration: 9% 500/5625 [00:20<45:47, 1.87it/s]\u001b[A\n",
"Iteration: 9% 504/5625 [00:20<32:42, 2.61it/s]\u001b[A\n",
"Iteration: 9% 508/5625 [00:20<23:36, 3.61it/s]\u001b[A\n",
"Iteration: 9% 512/5625 [00:20<17:12, 4.95it/s]\u001b[A\n",
"Iteration: 9% 516/5625 [00:21<12:43, 6.69it/s]\u001b[A\n",
"Iteration: 9% 520/5625 [00:21<09:35, 8.86it/s]\u001b[A\n",
"Iteration: 9% 524/5625 [00:21<07:23, 11.50it/s]\u001b[A\n",
"Iteration: 9% 528/5625 [00:21<05:50, 14.53it/s]\u001b[A\n",
"Iteration: 9% 532/5625 [00:21<04:47, 17.69it/s]\u001b[A\n",
"Iteration: 10% 536/5625 [00:21<04:02, 20.96it/s]\u001b[A\n",
"Iteration: 10% 540/5625 [00:21<03:30, 24.18it/s]\u001b[A\n",
"Iteration: 10% 544/5625 [00:21<03:07, 27.13it/s]\u001b[A\n",
"Iteration: 10% 548/5625 [00:21<02:54, 29.17it/s]\u001b[A\n",
"Iteration: 10% 552/5625 [00:22<02:43, 30.97it/s]\u001b[A\n",
"Iteration: 10% 556/5625 [00:22<02:35, 32.55it/s]\u001b[A\n",
"Iteration: 10% 560/5625 [00:22<02:29, 33.82it/s]\u001b[A\n",
"Iteration: 10% 564/5625 [00:22<02:26, 34.58it/s]\u001b[A\n",
"Iteration: 10% 568/5625 [00:22<02:22, 35.49it/s]\u001b[A\n",
"Iteration: 10% 572/5625 [00:22<02:19, 36.13it/s]\u001b[A\n",
"Iteration: 10% 576/5625 [00:22<02:18, 36.45it/s]\u001b[A\n",
"Iteration: 10% 580/5625 [00:22<02:19, 36.09it/s]\u001b[A\n",
"Iteration: 10% 584/5625 [00:22<02:18, 36.30it/s]\u001b[A\n",
"Iteration: 10% 588/5625 [00:22<02:18, 36.45it/s]\u001b[A\n",
"Iteration: 11% 592/5625 [00:23<02:17, 36.48it/s]\u001b[A\n",
"Iteration: 11% 596/5625 [00:23<02:17, 36.62it/s]\u001b[A\n",
"Iteration: 11% 600/5625 [00:23<02:16, 36.70it/s]\u001b[A\n",
"Iteration: 11% 604/5625 [00:23<02:16, 36.88it/s]\u001b[A\n",
"Iteration: 11% 608/5625 [00:23<02:15, 36.96it/s]\u001b[A\n",
"Iteration: 11% 612/5625 [00:23<02:14, 37.29it/s]\u001b[A\n",
"Iteration: 11% 616/5625 [00:23<02:15, 36.90it/s]\u001b[A\n",
"Iteration: 11% 620/5625 [00:23<02:19, 35.91it/s]\u001b[A\n",
"Iteration: 11% 624/5625 [00:23<02:19, 35.97it/s]\u001b[A\n",
"Iteration: 11% 628/5625 [00:24<02:18, 36.10it/s]\u001b[A\n",
"Iteration: 11% 632/5625 [00:24<02:18, 36.15it/s]\u001b[A\n",
"Iteration: 11% 636/5625 [00:24<02:16, 36.47it/s]\u001b[A\n",
"Iteration: 11% 640/5625 [00:24<02:16, 36.59it/s]\u001b[A\n",
"Iteration: 11% 644/5625 [00:24<02:14, 36.99it/s]\u001b[A\n",
"Iteration: 12% 648/5625 [00:24<02:13, 37.29it/s]\u001b[A\n",
"Iteration: 12% 652/5625 [00:24<02:13, 37.27it/s]\u001b[A\n",
"Iteration: 12% 656/5625 [00:24<02:13, 37.17it/s]\u001b[A\n",
"Iteration: 12% 660/5625 [00:24<02:16, 36.34it/s]\u001b[A\n",
"Iteration: 12% 664/5625 [00:25<02:14, 36.89it/s]\u001b[A\n",
"Iteration: 12% 668/5625 [00:25<02:15, 36.50it/s]\u001b[A\n",
"Iteration: 12% 672/5625 [00:25<02:15, 36.63it/s]\u001b[A\n",
"Iteration: 12% 676/5625 [00:25<02:14, 36.75it/s]\u001b[A\n",
"Iteration: 12% 680/5625 [00:25<02:13, 37.07it/s]\u001b[A\n",
"Iteration: 12% 684/5625 [00:25<02:12, 37.41it/s]\u001b[A\n",
"Iteration: 12% 688/5625 [00:25<02:12, 37.29it/s]\u001b[A\n",
"Iteration: 12% 692/5625 [00:25<02:12, 37.10it/s]\u001b[A\n",
"Iteration: 12% 696/5625 [00:25<02:19, 35.39it/s]\u001b[A\n",
"Iteration: 12% 700/5625 [00:26<02:18, 35.52it/s]\u001b[A\n",
"Iteration: 13% 704/5625 [00:26<02:19, 35.32it/s]\u001b[A\n",
"Iteration: 13% 708/5625 [00:26<02:17, 35.87it/s]\u001b[A\n",
"Iteration: 13% 712/5625 [00:26<02:15, 36.33it/s]\u001b[A\n",
"Iteration: 13% 716/5625 [00:26<02:14, 36.61it/s]\u001b[A\n",
"Iteration: 13% 720/5625 [00:26<02:17, 35.56it/s]\u001b[A\n",
"Iteration: 13% 724/5625 [00:26<02:16, 35.82it/s]\u001b[A\n",
"Iteration: 13% 728/5625 [00:26<02:15, 36.21it/s]\u001b[A\n",
"Iteration: 13% 732/5625 [00:26<02:16, 35.95it/s]\u001b[A\n",
"Iteration: 13% 736/5625 [00:27<02:16, 35.70it/s]\u001b[A\n",
"Iteration: 13% 740/5625 [00:27<02:16, 35.82it/s]\u001b[A\n",
"Iteration: 13% 744/5625 [00:27<02:16, 35.64it/s]\u001b[A\n",
"Iteration: 13% 748/5625 [00:27<02:15, 36.04it/s]\u001b[A\n",
"Iteration: 13% 752/5625 [00:27<02:13, 36.43it/s]\u001b[A\n",
"Iteration: 13% 756/5625 [00:27<02:11, 36.93it/s]\u001b[A\n",
"Iteration: 14% 760/5625 [00:27<02:11, 37.10it/s]\u001b[A\n",
"Iteration: 14% 764/5625 [00:27<02:10, 37.11it/s]\u001b[A\n",
"Iteration: 14% 768/5625 [00:27<02:14, 36.15it/s]\u001b[A\n",
"Iteration: 14% 772/5625 [00:28<02:15, 35.83it/s]\u001b[A\n",
"Iteration: 14% 776/5625 [00:28<02:12, 36.50it/s]\u001b[A\n",
"Iteration: 14% 780/5625 [00:28<02:12, 36.47it/s]\u001b[A\n",
"Iteration: 14% 784/5625 [00:28<02:13, 36.37it/s]\u001b[A\n",
"Iteration: 14% 788/5625 [00:28<02:12, 36.53it/s]\u001b[A\n",
"Iteration: 14% 792/5625 [00:28<02:12, 36.60it/s]\u001b[A\n",
"Iteration: 14% 796/5625 [00:28<02:10, 36.87it/s]\u001b[A\n",
"Iteration: 14% 800/5625 [00:28<02:11, 36.60it/s]\u001b[A\n",
"Iteration: 14% 804/5625 [00:28<02:11, 36.79it/s]\u001b[A\n",
"Iteration: 14% 808/5625 [00:29<02:14, 35.88it/s]\u001b[A\n",
"Iteration: 14% 812/5625 [00:29<02:21, 34.06it/s]\u001b[A\n",
"Iteration: 15% 816/5625 [00:29<02:24, 33.20it/s]\u001b[A\n",
"Iteration: 15% 820/5625 [00:29<02:26, 32.76it/s]\u001b[A\n",
"Iteration: 15% 824/5625 [00:29<02:33, 31.34it/s]\u001b[A\n",
"Iteration: 15% 828/5625 [00:29<02:32, 31.48it/s]\u001b[A\n",
"Iteration: 15% 832/5625 [00:29<02:32, 31.49it/s]\u001b[A\n",
"Iteration: 15% 836/5625 [00:29<02:27, 32.41it/s]\u001b[A\n",
"Iteration: 15% 840/5625 [00:30<02:23, 33.39it/s]\u001b[A\n",
"Iteration: 15% 844/5625 [00:30<02:19, 34.25it/s]\u001b[A\n",
"Iteration: 15% 848/5625 [00:30<02:15, 35.30it/s]\u001b[A\n",
"Iteration: 15% 852/5625 [00:30<02:17, 34.61it/s]\u001b[A\n",
"Iteration: 15% 856/5625 [00:30<02:17, 34.78it/s]\u001b[A\n",
"Iteration: 15% 860/5625 [00:30<02:13, 35.64it/s]\u001b[A\n",
"Iteration: 15% 864/5625 [00:30<02:10, 36.42it/s]\u001b[A\n",
"Iteration: 15% 868/5625 [00:30<02:10, 36.45it/s]\u001b[A\n",
"Iteration: 16% 872/5625 [00:30<02:10, 36.53it/s]\u001b[A\n",
"Iteration: 16% 876/5625 [00:31<02:12, 35.97it/s]\u001b[A\n",
"Iteration: 16% 880/5625 [00:31<02:11, 35.98it/s]\u001b[A\n",
"Iteration: 16% 884/5625 [00:31<02:09, 36.63it/s]\u001b[A\n",
"Iteration: 16% 888/5625 [00:31<02:09, 36.57it/s]\u001b[A\n",
"Iteration: 16% 892/5625 [00:31<02:10, 36.26it/s]\u001b[A\n",
"Iteration: 16% 896/5625 [00:31<02:10, 36.27it/s]\u001b[A\n",
"Iteration: 16% 900/5625 [00:31<02:08, 36.65it/s]\u001b[A\n",
"Iteration: 16% 904/5625 [00:31<02:08, 36.80it/s]\u001b[A\n",
"Iteration: 16% 908/5625 [00:31<02:07, 36.90it/s]\u001b[A\n",
"Iteration: 16% 912/5625 [00:31<02:06, 37.20it/s]\u001b[A\n",
"Iteration: 16% 916/5625 [00:32<02:10, 36.10it/s]\u001b[A\n",
"Iteration: 16% 920/5625 [00:32<02:07, 36.93it/s]\u001b[A\n",
"Iteration: 16% 924/5625 [00:32<02:06, 37.20it/s]\u001b[A\n",
"Iteration: 16% 928/5625 [00:32<02:05, 37.47it/s]\u001b[A\n",
"Iteration: 17% 932/5625 [00:32<02:06, 37.08it/s]\u001b[A\n",
"Iteration: 17% 936/5625 [00:32<02:06, 36.98it/s]\u001b[A\n",
"Iteration: 17% 940/5625 [00:32<02:12, 35.49it/s]\u001b[A\n",
"Iteration: 17% 944/5625 [00:32<02:16, 34.30it/s]\u001b[A\n",
"Iteration: 17% 948/5625 [00:33<02:20, 33.18it/s]\u001b[A\n",
"Iteration: 17% 952/5625 [00:33<02:25, 32.15it/s]\u001b[A\n",
"Iteration: 17% 956/5625 [00:33<02:24, 32.24it/s]\u001b[A\n",
"Iteration: 17% 960/5625 [00:33<02:24, 32.30it/s]\u001b[A\n",
"Iteration: 17% 964/5625 [00:33<02:25, 31.97it/s]\u001b[A\n",
"Iteration: 17% 968/5625 [00:33<02:21, 32.94it/s]\u001b[A\n",
"Iteration: 17% 972/5625 [00:33<02:17, 33.89it/s]\u001b[A\n",
"Iteration: 17% 976/5625 [00:33<02:14, 34.65it/s]\u001b[A\n",
"Iteration: 17% 980/5625 [00:33<02:10, 35.63it/s]\u001b[A\n",
"Iteration: 17% 984/5625 [00:34<02:08, 36.05it/s]\u001b[A\n",
"Iteration: 18% 988/5625 [00:34<02:07, 36.42it/s]\u001b[A\n",
"Iteration: 18% 992/5625 [00:34<02:05, 36.78it/s]\u001b[A\n",
"Iteration: 18% 996/5625 [00:34<02:06, 36.50it/s]\u001b[A\n",
"02/21/2020 12:34:10 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:34:12 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:34:12 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:34:12 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 14/625 [00:00<00:04, 130.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 28/625 [00:00<00:04, 131.64it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 7% 41/625 [00:00<00:04, 129.83it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 9% 55/625 [00:00<00:04, 130.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 68/625 [00:00<00:04, 129.43it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 82/625 [00:00<00:04, 130.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 95/625 [00:00<00:04, 129.50it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 108/625 [00:00<00:04, 128.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 121/625 [00:00<00:03, 126.92it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 22% 135/625 [00:01<00:03, 128.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 24% 148/625 [00:01<00:03, 127.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 26% 161/625 [00:01<00:03, 122.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 174/625 [00:01<00:03, 124.51it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 187/625 [00:01<00:03, 125.21it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 200/625 [00:01<00:03, 125.52it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 213/625 [00:01<00:03, 126.00it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 226/625 [00:01<00:03, 127.07it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 239/625 [00:01<00:03, 127.57it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 252/625 [00:01<00:02, 124.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 265/625 [00:02<00:02, 125.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 45% 279/625 [00:02<00:02, 127.31it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 47% 292/625 [00:02<00:02, 127.66it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 49% 305/625 [00:02<00:02, 128.10it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 318/625 [00:02<00:02, 128.46it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 332/625 [00:02<00:02, 129.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 346/625 [00:02<00:02, 127.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 359/625 [00:02<00:02, 127.72it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 60% 373/625 [00:02<00:01, 129.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 62% 386/625 [00:03<00:01, 127.37it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 400/625 [00:03<00:01, 128.02it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 413/625 [00:03<00:01, 128.46it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 427/625 [00:03<00:01, 129.39it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 71% 441/625 [00:03<00:01, 129.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 73% 454/625 [00:03<00:01, 129.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 467/625 [00:03<00:01, 129.15it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 480/625 [00:03<00:01, 127.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 79% 493/625 [00:03<00:01, 125.79it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 506/625 [00:03<00:00, 124.41it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 519/625 [00:04<00:00, 125.25it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 532/625 [00:04<00:00, 126.60it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 545/625 [00:04<00:00, 127.24it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 89% 559/625 [00:04<00:00, 128.70it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 92% 572/625 [00:04<00:00, 128.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 94% 585/625 [00:04<00:00, 128.12it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 598/625 [00:04<00:00, 127.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 98% 611/625 [00:04<00:00, 126.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 128.55it/s]\u001b[A\u001b[A\n",
"\n",
"02/21/2020 12:34:17 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:34:17 - INFO - __main__ - perplexity = tensor(814.9326)\n",
"\n",
"Iteration: 18% 1000/5625 [00:41<40:18, 1.91it/s]\u001b[A\n",
"Iteration: 18% 1004/5625 [00:41<28:50, 2.67it/s]\u001b[A\n",
"Iteration: 18% 1008/5625 [00:41<20:48, 3.70it/s]\u001b[A\n",
"Iteration: 18% 1012/5625 [00:41<15:10, 5.07it/s]\u001b[A\n",
"Iteration: 18% 1016/5625 [00:41<11:13, 6.84it/s]\u001b[A\n",
"Iteration: 18% 1020/5625 [00:41<08:28, 9.06it/s]\u001b[A\n",
"Iteration: 18% 1024/5625 [00:41<06:32, 11.72it/s]\u001b[A\n",
"Iteration: 18% 1028/5625 [00:41<05:10, 14.79it/s]\u001b[A\n",
"Iteration: 18% 1032/5625 [00:41<04:15, 18.01it/s]\u001b[A\n",
"Iteration: 18% 1036/5625 [00:42<03:39, 20.89it/s]\u001b[A\n",
"Iteration: 18% 1040/5625 [00:42<03:15, 23.43it/s]\u001b[A\n",
"Iteration: 19% 1044/5625 [00:42<02:58, 25.61it/s]\u001b[A\n",
"Iteration: 19% 1048/5625 [00:42<02:42, 28.13it/s]\u001b[A\n",
"Iteration: 19% 1052/5625 [00:42<02:32, 30.09it/s]\u001b[A\n",
"Iteration: 19% 1056/5625 [00:42<02:25, 31.39it/s]\u001b[A\n",
"Iteration: 19% 1060/5625 [00:42<02:18, 32.94it/s]\u001b[A\n",
"Iteration: 19% 1064/5625 [00:42<02:14, 33.85it/s]\u001b[A\n",
"Iteration: 19% 1068/5625 [00:43<02:10, 34.86it/s]\u001b[A\n",
"Iteration: 19% 1072/5625 [00:43<02:09, 35.05it/s]\u001b[A\n",
"Iteration: 19% 1076/5625 [00:43<02:07, 35.67it/s]\u001b[A\n",
"Iteration: 19% 1080/5625 [00:43<02:06, 35.86it/s]\u001b[A\n",
"Iteration: 19% 1084/5625 [00:43<02:05, 36.12it/s]\u001b[A\n",
"Iteration: 19% 1088/5625 [00:43<02:03, 36.66it/s]\u001b[A\n",
"Iteration: 19% 1092/5625 [00:43<02:02, 37.07it/s]\u001b[A\n",
"Iteration: 19% 1096/5625 [00:43<02:02, 37.10it/s]\u001b[A\n",
"Iteration: 20% 1100/5625 [00:43<02:01, 37.25it/s]\u001b[A\n",
"Iteration: 20% 1104/5625 [00:43<02:00, 37.52it/s]\u001b[A\n",
"Iteration: 20% 1108/5625 [00:44<02:00, 37.35it/s]\u001b[A\n",
"Iteration: 20% 1112/5625 [00:44<02:03, 36.55it/s]\u001b[A\n",
"Iteration: 20% 1116/5625 [00:44<02:05, 35.91it/s]\u001b[A\n",
"Iteration: 20% 1120/5625 [00:44<02:04, 36.07it/s]\u001b[A\n",
"Iteration: 20% 1124/5625 [00:44<02:03, 36.53it/s]\u001b[A\n",
"Iteration: 20% 1128/5625 [00:44<02:06, 35.66it/s]\u001b[A\n",
"Iteration: 20% 1132/5625 [00:44<02:07, 35.26it/s]\u001b[A\n",
"Iteration: 20% 1136/5625 [00:44<02:08, 34.98it/s]\u001b[A\n",
"Iteration: 20% 1140/5625 [00:44<02:07, 35.31it/s]\u001b[A\n",
"Iteration: 20% 1144/5625 [00:45<02:06, 35.30it/s]\u001b[A\n",
"Iteration: 20% 1148/5625 [00:45<02:06, 35.32it/s]\u001b[A\n",
"Iteration: 20% 1152/5625 [00:45<02:07, 35.16it/s]\u001b[A\n",
"Iteration: 21% 1156/5625 [00:45<02:04, 35.79it/s]\u001b[A\n",
"Iteration: 21% 1160/5625 [00:45<02:03, 36.01it/s]\u001b[A\n",
"Iteration: 21% 1164/5625 [00:45<02:03, 36.23it/s]\u001b[A\n",
"Iteration: 21% 1168/5625 [00:45<02:03, 36.21it/s]\u001b[A\n",
"Iteration: 21% 1172/5625 [00:45<02:04, 35.90it/s]\u001b[A\n",
"Iteration: 21% 1176/5625 [00:45<02:03, 36.02it/s]\u001b[A\n",
"Iteration: 21% 1180/5625 [00:46<02:01, 36.57it/s]\u001b[A\n",
"Iteration: 21% 1184/5625 [00:46<02:01, 36.60it/s]\u001b[A\n",
"Iteration: 21% 1188/5625 [00:46<02:02, 36.23it/s]\u001b[A\n",
"Iteration: 21% 1192/5625 [00:46<02:02, 36.05it/s]\u001b[A\n",
"Iteration: 21% 1196/5625 [00:46<02:02, 36.23it/s]\u001b[A\n",
"Iteration: 21% 1200/5625 [00:46<02:01, 36.40it/s]\u001b[A\n",
"Iteration: 21% 1204/5625 [00:46<02:04, 35.56it/s]\u001b[A\n",
"Iteration: 21% 1208/5625 [00:46<02:05, 35.23it/s]\u001b[A\n",
"Iteration: 22% 1212/5625 [00:46<02:02, 35.90it/s]\u001b[A\n",
"Iteration: 22% 1216/5625 [00:47<02:02, 35.99it/s]\u001b[A\n",
"Iteration: 22% 1220/5625 [00:47<02:00, 36.67it/s]\u001b[A\n",
"Iteration: 22% 1224/5625 [00:47<02:01, 36.21it/s]\u001b[A\n",
"Iteration: 22% 1228/5625 [00:47<02:02, 35.99it/s]\u001b[A\n",
"Iteration: 22% 1232/5625 [00:47<02:00, 36.35it/s]\u001b[A\n",
"Iteration: 22% 1236/5625 [00:47<02:01, 36.23it/s]\u001b[A\n",
"Iteration: 22% 1240/5625 [00:47<02:00, 36.30it/s]\u001b[A\n",
"Iteration: 22% 1244/5625 [00:47<02:00, 36.36it/s]\u001b[A\n",
"Iteration: 22% 1248/5625 [00:47<02:00, 36.31it/s]\u001b[A\n",
"Iteration: 22% 1252/5625 [00:48<02:01, 36.06it/s]\u001b[A\n",
"Iteration: 22% 1256/5625 [00:48<01:59, 36.41it/s]\u001b[A\n",
"Iteration: 22% 1260/5625 [00:48<01:59, 36.55it/s]\u001b[A\n",
"Iteration: 22% 1264/5625 [00:48<02:00, 36.22it/s]\u001b[A\n",
"Iteration: 23% 1268/5625 [00:48<01:59, 36.32it/s]\u001b[A\n",
"Iteration: 23% 1272/5625 [00:48<01:59, 36.54it/s]\u001b[A\n",
"Iteration: 23% 1276/5625 [00:48<01:59, 36.38it/s]\u001b[A\n",
"Iteration: 23% 1280/5625 [00:48<01:59, 36.50it/s]\u001b[A\n",
"Iteration: 23% 1284/5625 [00:48<01:57, 36.93it/s]\u001b[A\n",
"Iteration: 23% 1288/5625 [00:49<01:57, 37.04it/s]\u001b[A\n",
"Iteration: 23% 1292/5625 [00:49<01:59, 36.12it/s]\u001b[A\n",
"Iteration: 23% 1296/5625 [00:49<01:58, 36.61it/s]\u001b[A\n",
"Iteration: 23% 1300/5625 [00:49<01:59, 36.24it/s]\u001b[A\n",
"Iteration: 23% 1304/5625 [00:49<01:58, 36.36it/s]\u001b[A\n",
"Iteration: 23% 1308/5625 [00:49<01:57, 36.68it/s]\u001b[A\n",
"Iteration: 23% 1312/5625 [00:49<01:57, 36.61it/s]\u001b[A\n",
"Iteration: 23% 1316/5625 [00:49<01:57, 36.81it/s]\u001b[A\n",
"Iteration: 23% 1320/5625 [00:49<01:56, 36.80it/s]\u001b[A\n",
"Iteration: 24% 1324/5625 [00:50<01:58, 36.43it/s]\u001b[A\n",
"Iteration: 24% 1328/5625 [00:50<01:58, 36.35it/s]\u001b[A\n",
"Iteration: 24% 1332/5625 [00:50<01:57, 36.66it/s]\u001b[A\n",
"Iteration: 24% 1336/5625 [00:50<01:57, 36.48it/s]\u001b[A\n",
"Iteration: 24% 1340/5625 [00:50<02:01, 35.19it/s]\u001b[A\n",
"Iteration: 24% 1344/5625 [00:50<02:03, 34.63it/s]\u001b[A\n",
"Iteration: 24% 1348/5625 [00:50<02:04, 34.46it/s]\u001b[A\n",
"Iteration: 24% 1352/5625 [00:50<02:01, 35.14it/s]\u001b[A\n",
"Iteration: 24% 1356/5625 [00:50<02:01, 35.24it/s]\u001b[A\n",
"Iteration: 24% 1360/5625 [00:51<01:58, 36.09it/s]\u001b[A\n",
"Iteration: 24% 1364/5625 [00:51<01:56, 36.53it/s]\u001b[A\n",
"Iteration: 24% 1368/5625 [00:51<01:55, 36.84it/s]\u001b[A\n",
"Iteration: 24% 1372/5625 [00:51<01:56, 36.38it/s]\u001b[A\n",
"Iteration: 24% 1376/5625 [00:51<01:57, 36.30it/s]\u001b[A\n",
"Iteration: 25% 1380/5625 [00:51<01:56, 36.57it/s]\u001b[A\n",
"Iteration: 25% 1384/5625 [00:51<01:54, 37.05it/s]\u001b[A\n",
"Iteration: 25% 1388/5625 [00:51<01:54, 36.97it/s]\u001b[A\n",
"Iteration: 25% 1392/5625 [00:51<01:54, 37.06it/s]\u001b[A\n",
"Iteration: 25% 1396/5625 [00:52<01:53, 37.23it/s]\u001b[A\n",
"Iteration: 25% 1400/5625 [00:52<01:53, 37.26it/s]\u001b[A\n",
"Iteration: 25% 1404/5625 [00:52<01:51, 37.70it/s]\u001b[A\n",
"Iteration: 25% 1408/5625 [00:52<01:53, 37.07it/s]\u001b[A\n",
"Iteration: 25% 1412/5625 [00:52<01:54, 36.79it/s]\u001b[A\n",
"Iteration: 25% 1416/5625 [00:52<01:52, 37.39it/s]\u001b[A\n",
"Iteration: 25% 1420/5625 [00:52<01:53, 36.89it/s]\u001b[A\n",
"Iteration: 25% 1424/5625 [00:52<01:52, 37.28it/s]\u001b[A\n",
"Iteration: 25% 1428/5625 [00:52<01:51, 37.62it/s]\u001b[A\n",
"Iteration: 25% 1432/5625 [00:53<01:52, 37.31it/s]\u001b[A\n",
"Iteration: 26% 1436/5625 [00:53<01:51, 37.48it/s]\u001b[A\n",
"Iteration: 26% 1440/5625 [00:53<01:52, 37.24it/s]\u001b[A\n",
"Iteration: 26% 1444/5625 [00:53<01:52, 37.14it/s]\u001b[A\n",
"Iteration: 26% 1448/5625 [00:53<01:53, 36.92it/s]\u001b[A\n",
"Iteration: 26% 1452/5625 [00:53<01:52, 36.95it/s]\u001b[A\n",
"Iteration: 26% 1456/5625 [00:53<01:54, 36.38it/s]\u001b[A\n",
"Iteration: 26% 1460/5625 [00:53<01:53, 36.68it/s]\u001b[A\n",
"Iteration: 26% 1464/5625 [00:53<01:53, 36.73it/s]\u001b[A\n",
"Iteration: 26% 1468/5625 [00:53<01:52, 36.80it/s]\u001b[A\n",
"Iteration: 26% 1472/5625 [00:54<01:53, 36.70it/s]\u001b[A\n",
"Iteration: 26% 1476/5625 [00:54<01:51, 37.16it/s]\u001b[A\n",
"Iteration: 26% 1480/5625 [00:54<01:52, 36.95it/s]\u001b[A\n",
"Iteration: 26% 1484/5625 [00:54<01:52, 36.91it/s]\u001b[A\n",
"Iteration: 26% 1488/5625 [00:54<01:54, 36.28it/s]\u001b[A\n",
"Iteration: 27% 1492/5625 [00:54<01:54, 36.24it/s]\u001b[A\n",
"Iteration: 27% 1496/5625 [00:54<01:57, 35.14it/s]\u001b[A\n",
"02/21/2020 12:34:31 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:34:32 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:34:32 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:34:32 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 13/625 [00:00<00:05, 119.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 26/625 [00:00<00:04, 120.58it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 39/625 [00:00<00:04, 122.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 52/625 [00:00<00:04, 123.97it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 66/625 [00:00<00:04, 125.71it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 79/625 [00:00<00:04, 126.29it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 93/625 [00:00<00:04, 127.72it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 106/625 [00:00<00:04, 126.79it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 119/625 [00:00<00:03, 126.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 133/625 [00:01<00:03, 128.00it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 146/625 [00:01<00:03, 126.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 159/625 [00:01<00:03, 126.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 173/625 [00:01<00:03, 127.64it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 186/625 [00:01<00:03, 127.20it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 199/625 [00:01<00:03, 127.67it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 212/625 [00:01<00:03, 127.35it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 226/625 [00:01<00:03, 128.41it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 239/625 [00:01<00:03, 127.72it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 252/625 [00:01<00:02, 128.34it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 265/625 [00:02<00:02, 127.78it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 278/625 [00:02<00:02, 126.22it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 47% 291/625 [00:02<00:02, 126.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 49% 305/625 [00:02<00:02, 127.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 319/625 [00:02<00:02, 129.34it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 332/625 [00:02<00:02, 129.25it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 345/625 [00:02<00:02, 127.20it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 359/625 [00:02<00:02, 128.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 60% 373/625 [00:02<00:01, 129.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 62% 386/625 [00:03<00:01, 126.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 399/625 [00:03<00:01, 126.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 412/625 [00:03<00:01, 126.23it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 426/625 [00:03<00:01, 127.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 440/625 [00:03<00:01, 128.56it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 72% 453/625 [00:03<00:01, 128.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 466/625 [00:03<00:01, 127.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 479/625 [00:03<00:01, 127.02it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 79% 492/625 [00:03<00:01, 125.49it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 505/625 [00:03<00:00, 126.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 519/625 [00:04<00:00, 127.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 532/625 [00:04<00:00, 126.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 545/625 [00:04<00:00, 126.79it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 89% 558/625 [00:04<00:00, 122.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 91% 571/625 [00:04<00:00, 124.21it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 93% 584/625 [00:04<00:00, 124.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 597/625 [00:04<00:00, 124.94it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 98% 610/625 [00:04<00:00, 126.37it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 624/625 [00:04<00:00, 128.39it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 127.04it/s]\u001b[A\u001b[A\n",
"02/21/2020 12:34:37 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:34:37 - INFO - __main__ - perplexity = tensor(803.7330)\n",
"02/21/2020 12:34:37 - INFO - transformers.configuration_utils - Configuration saved in /content/models/smallBERTa/weights/checkpoint-24000/config.json\n",
"02/21/2020 12:34:37 - INFO - transformers.modeling_utils - Model weights saved in /content/models/smallBERTa/weights/checkpoint-24000/pytorch_model.bin\n",
"02/21/2020 12:34:37 - INFO - __main__ - Saving model checkpoint to /content/models/smallBERTa/weights/checkpoint-24000\n",
"02/21/2020 12:34:37 - INFO - __main__ - Deleting older checkpoint [/content/models/smallBERTa/weights/checkpoint-20000] due to args.save_total_limit\n",
"02/21/2020 12:34:37 - INFO - __main__ - Saving optimizer and scheduler states to /content/models/smallBERTa/weights/checkpoint-24000\n",
"\n",
"Iteration: 27% 1500/5625 [01:01<35:21, 1.94it/s]\u001b[A\n",
"Iteration: 27% 1504/5625 [01:01<25:22, 2.71it/s]\u001b[A\n",
"Iteration: 27% 1508/5625 [01:01<18:18, 3.75it/s]\u001b[A\n",
"Iteration: 27% 1512/5625 [01:01<13:22, 5.13it/s]\u001b[A\n",
"Iteration: 27% 1516/5625 [01:01<09:53, 6.92it/s]\u001b[A\n",
"Iteration: 27% 1520/5625 [01:01<07:31, 9.10it/s]\u001b[A\n",
"Iteration: 27% 1524/5625 [01:02<05:48, 11.76it/s]\u001b[A\n",
"Iteration: 27% 1528/5625 [01:02<04:36, 14.81it/s]\u001b[A\n",
"Iteration: 27% 1532/5625 [01:02<03:45, 18.12it/s]\u001b[A\n",
"Iteration: 27% 1536/5625 [01:02<03:10, 21.41it/s]\u001b[A\n",
"Iteration: 27% 1540/5625 [01:02<02:46, 24.58it/s]\u001b[A\n",
"Iteration: 27% 1544/5625 [01:02<02:28, 27.44it/s]\u001b[A\n",
"Iteration: 28% 1548/5625 [01:02<02:18, 29.36it/s]\u001b[A\n",
"Iteration: 28% 1552/5625 [01:02<02:09, 31.40it/s]\u001b[A\n",
"Iteration: 28% 1556/5625 [01:02<02:03, 32.83it/s]\u001b[A\n",
"Iteration: 28% 1560/5625 [01:03<01:59, 33.98it/s]\u001b[A\n",
"Iteration: 28% 1564/5625 [01:03<01:56, 34.99it/s]\u001b[A\n",
"Iteration: 28% 1568/5625 [01:03<01:55, 35.15it/s]\u001b[A\n",
"Iteration: 28% 1572/5625 [01:03<01:55, 35.22it/s]\u001b[A\n",
"Iteration: 28% 1576/5625 [01:03<01:53, 35.64it/s]\u001b[A\n",
"Iteration: 28% 1580/5625 [01:03<01:51, 36.18it/s]\u001b[A\n",
"Iteration: 28% 1584/5625 [01:03<01:51, 36.36it/s]\u001b[A\n",
"Iteration: 28% 1588/5625 [01:03<01:53, 35.71it/s]\u001b[A\n",
"Iteration: 28% 1592/5625 [01:03<01:51, 36.31it/s]\u001b[A\n",
"Iteration: 28% 1596/5625 [01:04<01:51, 36.06it/s]\u001b[A\n",
"Iteration: 28% 1600/5625 [01:04<01:50, 36.40it/s]\u001b[A\n",
"Iteration: 29% 1604/5625 [01:04<01:50, 36.48it/s]\u001b[A\n",
"Iteration: 29% 1608/5625 [01:04<01:49, 36.55it/s]\u001b[A\n",
"Iteration: 29% 1612/5625 [01:04<01:48, 36.87it/s]\u001b[A\n",
"Iteration: 29% 1616/5625 [01:04<01:48, 36.84it/s]\u001b[A\n",
"Iteration: 29% 1620/5625 [01:04<01:47, 37.15it/s]\u001b[A\n",
"Iteration: 29% 1624/5625 [01:04<01:49, 36.61it/s]\u001b[A\n",
"Iteration: 29% 1628/5625 [01:04<01:50, 36.08it/s]\u001b[A\n",
"Iteration: 29% 1632/5625 [01:04<01:50, 36.28it/s]\u001b[A\n",
"Iteration: 29% 1636/5625 [01:05<01:49, 36.55it/s]\u001b[A\n",
"Iteration: 29% 1640/5625 [01:05<01:48, 36.68it/s]\u001b[A\n",
"Iteration: 29% 1644/5625 [01:05<01:49, 36.30it/s]\u001b[A\n",
"Iteration: 29% 1648/5625 [01:05<01:48, 36.59it/s]\u001b[A\n",
"Iteration: 29% 1652/5625 [01:05<01:48, 36.56it/s]\u001b[A\n",
"Iteration: 29% 1656/5625 [01:05<01:48, 36.64it/s]\u001b[A\n",
"Iteration: 30% 1660/5625 [01:05<01:49, 36.22it/s]\u001b[A\n",
"Iteration: 30% 1664/5625 [01:05<01:48, 36.48it/s]\u001b[A\n",
"Iteration: 30% 1668/5625 [01:05<01:47, 36.75it/s]\u001b[A\n",
"Iteration: 30% 1672/5625 [01:06<01:47, 36.63it/s]\u001b[A\n",
"Iteration: 30% 1676/5625 [01:06<01:47, 36.85it/s]\u001b[A\n",
"Iteration: 30% 1680/5625 [01:06<01:46, 36.88it/s]\u001b[A\n",
"Iteration: 30% 1684/5625 [01:06<01:47, 36.55it/s]\u001b[A\n",
"Iteration: 30% 1688/5625 [01:06<01:46, 36.80it/s]\u001b[A\n",
"Iteration: 30% 1692/5625 [01:06<01:46, 37.08it/s]\u001b[A\n",
"Iteration: 30% 1696/5625 [01:06<01:47, 36.43it/s]\u001b[A\n",
"Iteration: 30% 1700/5625 [01:06<01:46, 36.92it/s]\u001b[A\n",
"Iteration: 30% 1704/5625 [01:06<01:46, 36.90it/s]\u001b[A\n",
"Iteration: 30% 1708/5625 [01:07<01:46, 36.73it/s]\u001b[A\n",
"Iteration: 30% 1712/5625 [01:07<01:46, 36.78it/s]\u001b[A\n",
"Iteration: 31% 1716/5625 [01:07<01:45, 36.98it/s]\u001b[A\n",
"Iteration: 31% 1720/5625 [01:07<01:46, 36.71it/s]\u001b[A\n",
"Iteration: 31% 1724/5625 [01:07<01:49, 35.67it/s]\u001b[A\n",
"Iteration: 31% 1728/5625 [01:07<01:47, 36.19it/s]\u001b[A\n",
"Iteration: 31% 1732/5625 [01:07<01:47, 36.32it/s]\u001b[A\n",
"Iteration: 31% 1736/5625 [01:07<01:47, 36.09it/s]\u001b[A\n",
"Iteration: 31% 1740/5625 [01:07<01:46, 36.38it/s]\u001b[A\n",
"Iteration: 31% 1744/5625 [01:08<01:46, 36.49it/s]\u001b[A\n",
"Iteration: 31% 1748/5625 [01:08<01:44, 37.04it/s]\u001b[A\n",
"Iteration: 31% 1752/5625 [01:08<01:44, 37.20it/s]\u001b[A\n",
"Iteration: 31% 1756/5625 [01:08<01:45, 36.81it/s]\u001b[A\n",
"Iteration: 31% 1760/5625 [01:08<01:45, 36.54it/s]\u001b[A\n",
"Iteration: 31% 1764/5625 [01:08<01:45, 36.67it/s]\u001b[A\n",
"Iteration: 31% 1768/5625 [01:08<01:45, 36.52it/s]\u001b[A\n",
"Iteration: 32% 1772/5625 [01:08<01:46, 36.19it/s]\u001b[A\n",
"Iteration: 32% 1776/5625 [01:08<01:46, 36.30it/s]\u001b[A\n",
"Iteration: 32% 1780/5625 [01:09<01:47, 35.80it/s]\u001b[A\n",
"Iteration: 32% 1784/5625 [01:09<01:46, 36.10it/s]\u001b[A\n",
"Iteration: 32% 1788/5625 [01:09<01:44, 36.66it/s]\u001b[A\n",
"Iteration: 32% 1792/5625 [01:09<01:43, 36.99it/s]\u001b[A\n",
"Iteration: 32% 1796/5625 [01:09<01:42, 37.49it/s]\u001b[A\n",
"Iteration: 32% 1800/5625 [01:09<01:43, 37.05it/s]\u001b[A\n",
"Iteration: 32% 1804/5625 [01:09<01:43, 36.97it/s]\u001b[A\n",
"Iteration: 32% 1808/5625 [01:09<01:44, 36.38it/s]\u001b[A\n",
"Iteration: 32% 1812/5625 [01:09<01:44, 36.41it/s]\u001b[A\n",
"Iteration: 32% 1816/5625 [01:10<01:44, 36.54it/s]\u001b[A\n",
"Iteration: 32% 1820/5625 [01:10<01:43, 36.93it/s]\u001b[A\n",
"Iteration: 32% 1824/5625 [01:10<01:43, 36.72it/s]\u001b[A\n",
"Iteration: 32% 1828/5625 [01:10<01:43, 36.58it/s]\u001b[A\n",
"Iteration: 33% 1832/5625 [01:10<01:43, 36.48it/s]\u001b[A\n",
"Iteration: 33% 1836/5625 [01:10<01:42, 36.86it/s]\u001b[A\n",
"Iteration: 33% 1840/5625 [01:10<01:43, 36.49it/s]\u001b[A\n",
"Iteration: 33% 1844/5625 [01:10<01:43, 36.54it/s]\u001b[A\n",
"Iteration: 33% 1848/5625 [01:10<01:44, 35.98it/s]\u001b[A\n",
"Iteration: 33% 1852/5625 [01:10<01:43, 36.45it/s]\u001b[A\n",
"Iteration: 33% 1856/5625 [01:11<01:43, 36.49it/s]\u001b[A\n",
"Iteration: 33% 1860/5625 [01:11<01:42, 36.81it/s]\u001b[A\n",
"Iteration: 33% 1864/5625 [01:11<01:42, 36.74it/s]\u001b[A\n",
"Iteration: 33% 1868/5625 [01:11<01:41, 37.04it/s]\u001b[A\n",
"Iteration: 33% 1872/5625 [01:11<01:41, 36.84it/s]\u001b[A\n",
"Iteration: 33% 1876/5625 [01:11<01:42, 36.46it/s]\u001b[A\n",
"Iteration: 33% 1880/5625 [01:11<01:41, 36.76it/s]\u001b[A\n",
"Iteration: 33% 1884/5625 [01:11<01:43, 36.25it/s]\u001b[A\n",
"Iteration: 34% 1888/5625 [01:11<01:41, 36.65it/s]\u001b[A\n",
"Iteration: 34% 1892/5625 [01:12<01:41, 36.73it/s]\u001b[A\n",
"Iteration: 34% 1896/5625 [01:12<01:47, 34.56it/s]\u001b[A\n",
"Iteration: 34% 1900/5625 [01:12<01:49, 34.00it/s]\u001b[A\n",
"Iteration: 34% 1904/5625 [01:12<01:46, 35.06it/s]\u001b[A\n",
"Iteration: 34% 1908/5625 [01:12<01:43, 35.85it/s]\u001b[A\n",
"Iteration: 34% 1912/5625 [01:12<01:43, 35.84it/s]\u001b[A\n",
"Iteration: 34% 1916/5625 [01:12<01:42, 36.19it/s]\u001b[A\n",
"Iteration: 34% 1920/5625 [01:12<01:43, 35.69it/s]\u001b[A\n",
"Iteration: 34% 1924/5625 [01:12<01:42, 36.15it/s]\u001b[A\n",
"Iteration: 34% 1928/5625 [01:13<01:41, 36.60it/s]\u001b[A\n",
"Iteration: 34% 1932/5625 [01:13<01:40, 36.71it/s]\u001b[A\n",
"Iteration: 34% 1936/5625 [01:13<01:39, 37.14it/s]\u001b[A\n",
"Iteration: 34% 1940/5625 [01:13<01:39, 37.08it/s]\u001b[A\n",
"Iteration: 35% 1944/5625 [01:13<01:39, 37.14it/s]\u001b[A\n",
"Iteration: 35% 1948/5625 [01:13<01:39, 37.05it/s]\u001b[A\n",
"Iteration: 35% 1952/5625 [01:13<01:40, 36.69it/s]\u001b[A\n",
"Iteration: 35% 1956/5625 [01:13<01:41, 36.25it/s]\u001b[A\n",
"Iteration: 35% 1960/5625 [01:13<01:39, 36.93it/s]\u001b[A\n",
"Iteration: 35% 1964/5625 [01:14<01:39, 36.82it/s]\u001b[A\n",
"Iteration: 35% 1968/5625 [01:14<01:38, 37.15it/s]\u001b[A\n",
"Iteration: 35% 1972/5625 [01:14<01:38, 37.04it/s]\u001b[A\n",
"Iteration: 35% 1976/5625 [01:14<01:39, 36.74it/s]\u001b[A\n",
"Iteration: 35% 1980/5625 [01:14<01:37, 37.22it/s]\u001b[A\n",
"Iteration: 35% 1984/5625 [01:14<01:41, 35.93it/s]\u001b[A\n",
"Iteration: 35% 1988/5625 [01:14<01:40, 36.06it/s]\u001b[A\n",
"Iteration: 35% 1992/5625 [01:14<01:40, 36.12it/s]\u001b[A\n",
"Iteration: 35% 1996/5625 [01:14<01:40, 36.10it/s]\u001b[A02/21/2020 12:34:51 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:34:52 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:34:52 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:34:52 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 14/625 [00:00<00:04, 130.28it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 26/625 [00:00<00:04, 126.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 38/625 [00:00<00:04, 120.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 52/625 [00:00<00:04, 123.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 66/625 [00:00<00:04, 125.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 79/625 [00:00<00:04, 126.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 92/625 [00:00<00:04, 127.15it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 104/625 [00:00<00:04, 121.36it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 116/625 [00:00<00:04, 116.31it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 20% 128/625 [00:01<00:04, 114.39it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 22% 140/625 [00:01<00:04, 113.58it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 24% 152/625 [00:01<00:04, 111.31it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 26% 163/625 [00:01<00:04, 107.97it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 174/625 [00:01<00:04, 108.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 185/625 [00:01<00:04, 108.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 197/625 [00:01<00:03, 111.19it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 210/625 [00:01<00:03, 115.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 222/625 [00:01<00:03, 113.58it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 37% 234/625 [00:02<00:03, 111.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 39% 246/625 [00:02<00:03, 110.55it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 41% 258/625 [00:02<00:03, 110.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 43% 270/625 [00:02<00:03, 107.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 45% 281/625 [00:02<00:03, 105.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 47% 293/625 [00:02<00:03, 106.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 49% 305/625 [00:02<00:02, 109.18it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 319/625 [00:02<00:02, 114.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 333/625 [00:02<00:02, 119.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 56% 347/625 [00:03<00:02, 122.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 58% 361/625 [00:03<00:02, 125.16it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 60% 375/625 [00:03<00:01, 127.31it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 62% 388/625 [00:03<00:01, 128.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 401/625 [00:03<00:01, 125.91it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 414/625 [00:03<00:01, 125.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 428/625 [00:03<00:01, 127.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 71% 441/625 [00:03<00:01, 125.23it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 73% 454/625 [00:03<00:01, 121.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 467/625 [00:03<00:01, 123.19it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 480/625 [00:04<00:01, 124.70it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 79% 493/625 [00:04<00:01, 125.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 506/625 [00:04<00:00, 126.35it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 519/625 [00:04<00:00, 127.31it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 532/625 [00:04<00:00, 125.22it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 545/625 [00:04<00:00, 125.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 89% 558/625 [00:04<00:00, 124.49it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 92% 572/625 [00:04<00:00, 126.70it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 94% 585/625 [00:04<00:00, 126.32it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 598/625 [00:04<00:00, 127.19it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 98% 612/625 [00:05<00:00, 128.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:05<00:00, 120.35it/s]\u001b[A\u001b[A02/21/2020 12:34:58 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:34:58 - INFO - __main__ - perplexity = tensor(799.2672)\n",
"\n",
"Iteration: 36% 2000/5625 [01:21<32:09, 1.88it/s]\u001b[A\n",
"Iteration: 36% 2004/5625 [01:21<22:58, 2.63it/s]\u001b[A\n",
"Iteration: 36% 2008/5625 [01:22<16:32, 3.64it/s]\u001b[A\n",
"Iteration: 36% 2012/5625 [01:22<12:05, 4.98it/s]\u001b[A\n",
"Iteration: 36% 2016/5625 [01:22<08:56, 6.73it/s]\u001b[A\n",
"Iteration: 36% 2020/5625 [01:22<06:44, 8.92it/s]\u001b[A\n",
"Iteration: 36% 2024/5625 [01:22<05:12, 11.52it/s]\u001b[A\n",
"Iteration: 36% 2028/5625 [01:22<04:07, 14.52it/s]\u001b[A\n",
"Iteration: 36% 2032/5625 [01:22<03:22, 17.72it/s]\u001b[A\n",
"Iteration: 36% 2036/5625 [01:22<02:53, 20.67it/s]\u001b[A\n",
"Iteration: 36% 2040/5625 [01:22<02:32, 23.48it/s]\u001b[A\n",
"Iteration: 36% 2044/5625 [01:23<02:15, 26.37it/s]\u001b[A\n",
"Iteration: 36% 2048/5625 [01:23<02:05, 28.48it/s]\u001b[A\n",
"Iteration: 36% 2052/5625 [01:23<01:57, 30.31it/s]\u001b[A\n",
"Iteration: 37% 2056/5625 [01:23<01:51, 31.93it/s]\u001b[A\n",
"Iteration: 37% 2060/5625 [01:23<01:46, 33.41it/s]\u001b[A\n",
"Iteration: 37% 2064/5625 [01:23<01:45, 33.88it/s]\u001b[A\n",
"Iteration: 37% 2068/5625 [01:23<01:43, 34.26it/s]\u001b[A\n",
"Iteration: 37% 2072/5625 [01:23<01:42, 34.75it/s]\u001b[A\n",
"Iteration: 37% 2076/5625 [01:23<01:40, 35.20it/s]\u001b[A\n",
"Iteration: 37% 2080/5625 [01:24<01:39, 35.79it/s]\u001b[A\n",
"Iteration: 37% 2084/5625 [01:24<01:39, 35.63it/s]\u001b[A\n",
"Iteration: 37% 2088/5625 [01:24<01:37, 36.25it/s]\u001b[A\n",
"Iteration: 37% 2092/5625 [01:24<02:22, 24.75it/s]\u001b[A\n",
"Iteration: 37% 2096/5625 [01:24<02:08, 27.46it/s]\u001b[A\n",
"Iteration: 37% 2100/5625 [01:24<01:58, 29.79it/s]\u001b[A\n",
"Iteration: 37% 2104/5625 [01:24<01:52, 31.37it/s]\u001b[A\n",
"Iteration: 37% 2108/5625 [01:24<01:46, 32.99it/s]\u001b[A\n",
"Iteration: 38% 2112/5625 [01:25<01:43, 34.10it/s]\u001b[A\n",
"Iteration: 38% 2116/5625 [01:25<01:41, 34.49it/s]\u001b[A\n",
"Iteration: 38% 2120/5625 [01:25<01:39, 35.19it/s]\u001b[A\n",
"Iteration: 38% 2124/5625 [01:25<01:39, 35.28it/s]\u001b[A\n",
"Iteration: 38% 2128/5625 [01:25<01:39, 34.99it/s]\u001b[A\n",
"Iteration: 38% 2132/5625 [01:25<01:39, 35.02it/s]\u001b[A\n",
"Iteration: 38% 2136/5625 [01:25<01:38, 35.28it/s]\u001b[A\n",
"Iteration: 38% 2140/5625 [01:25<01:37, 35.65it/s]\u001b[A\n",
"Iteration: 38% 2144/5625 [01:25<01:40, 34.74it/s]\u001b[A\n",
"Iteration: 38% 2148/5625 [01:26<01:38, 35.38it/s]\u001b[A\n",
"Iteration: 38% 2152/5625 [01:26<01:37, 35.51it/s]\u001b[A\n",
"Iteration: 38% 2156/5625 [01:26<01:36, 35.85it/s]\u001b[A\n",
"Iteration: 38% 2160/5625 [01:26<01:36, 36.04it/s]\u001b[A\n",
"Iteration: 38% 2164/5625 [01:26<01:37, 35.47it/s]\u001b[A\n",
"Iteration: 39% 2168/5625 [01:26<01:37, 35.29it/s]\u001b[A\n",
"Iteration: 39% 2172/5625 [01:26<01:37, 35.30it/s]\u001b[A\n",
"Iteration: 39% 2176/5625 [01:26<01:35, 35.94it/s]\u001b[A\n",
"Iteration: 39% 2180/5625 [01:26<01:37, 35.51it/s]\u001b[A\n",
"Iteration: 39% 2184/5625 [01:27<01:36, 35.55it/s]\u001b[A\n",
"Iteration: 39% 2188/5625 [01:27<01:37, 35.29it/s]\u001b[A\n",
"Iteration: 39% 2192/5625 [01:27<01:35, 35.82it/s]\u001b[A\n",
"Iteration: 39% 2196/5625 [01:27<01:35, 36.04it/s]\u001b[A\n",
"Iteration: 39% 2200/5625 [01:27<01:35, 35.69it/s]\u001b[A\n",
"Iteration: 39% 2204/5625 [01:27<01:35, 35.89it/s]\u001b[A\n",
"Iteration: 39% 2208/5625 [01:27<01:35, 35.81it/s]\u001b[A\n",
"Iteration: 39% 2212/5625 [01:27<01:34, 36.04it/s]\u001b[A\n",
"Iteration: 39% 2216/5625 [01:27<01:33, 36.60it/s]\u001b[A\n",
"Iteration: 39% 2220/5625 [01:28<01:32, 36.87it/s]\u001b[A\n",
"Iteration: 40% 2224/5625 [01:28<01:33, 36.28it/s]\u001b[A\n",
"Iteration: 40% 2228/5625 [01:28<01:36, 35.13it/s]\u001b[A\n",
"Iteration: 40% 2232/5625 [01:28<01:35, 35.71it/s]\u001b[A\n",
"Iteration: 40% 2236/5625 [01:28<01:34, 36.02it/s]\u001b[A\n",
"Iteration: 40% 2240/5625 [01:28<01:35, 35.55it/s]\u001b[A\n",
"Iteration: 40% 2244/5625 [01:28<01:33, 36.01it/s]\u001b[A\n",
"Iteration: 40% 2248/5625 [01:28<01:33, 36.27it/s]\u001b[A\n",
"Iteration: 40% 2252/5625 [01:28<01:33, 35.94it/s]\u001b[A\n",
"Iteration: 40% 2256/5625 [01:29<01:37, 34.56it/s]\u001b[A\n",
"Iteration: 40% 2260/5625 [01:29<01:42, 32.87it/s]\u001b[A\n",
"Iteration: 40% 2264/5625 [01:29<01:43, 32.44it/s]\u001b[A\n",
"Iteration: 40% 2268/5625 [01:29<01:43, 32.35it/s]\u001b[A\n",
"Iteration: 40% 2272/5625 [01:29<01:44, 32.02it/s]\u001b[A\n",
"Iteration: 40% 2276/5625 [01:29<01:44, 32.10it/s]\u001b[A\n",
"Iteration: 41% 2280/5625 [01:29<01:45, 31.72it/s]\u001b[A\n",
"Iteration: 41% 2284/5625 [01:29<01:41, 32.89it/s]\u001b[A\n",
"Iteration: 41% 2288/5625 [01:30<01:37, 34.14it/s]\u001b[A\n",
"Iteration: 41% 2292/5625 [01:30<01:37, 34.21it/s]\u001b[A\n",
"Iteration: 41% 2296/5625 [01:30<01:36, 34.52it/s]\u001b[A\n",
"Iteration: 41% 2300/5625 [01:30<01:35, 34.90it/s]\u001b[A\n",
"Iteration: 41% 2304/5625 [01:30<01:33, 35.50it/s]\u001b[A\n",
"Iteration: 41% 2308/5625 [01:30<01:32, 35.96it/s]\u001b[A\n",
"Iteration: 41% 2312/5625 [01:30<01:33, 35.61it/s]\u001b[A\n",
"Iteration: 41% 2316/5625 [01:30<01:31, 36.11it/s]\u001b[A\n",
"Iteration: 41% 2320/5625 [01:30<01:32, 35.88it/s]\u001b[A\n",
"Iteration: 41% 2324/5625 [01:31<01:31, 36.13it/s]\u001b[A\n",
"Iteration: 41% 2328/5625 [01:31<01:30, 36.41it/s]\u001b[A\n",
"Iteration: 41% 2332/5625 [01:31<01:32, 35.75it/s]\u001b[A\n",
"Iteration: 42% 2336/5625 [01:31<01:32, 35.75it/s]\u001b[A\n",
"Iteration: 42% 2340/5625 [01:31<01:31, 35.97it/s]\u001b[A\n",
"Iteration: 42% 2344/5625 [01:31<01:30, 36.24it/s]\u001b[A\n",
"Iteration: 42% 2348/5625 [01:31<01:30, 36.17it/s]\u001b[A\n",
"Iteration: 42% 2352/5625 [01:31<01:30, 36.15it/s]\u001b[A\n",
"Iteration: 42% 2356/5625 [01:31<01:29, 36.36it/s]\u001b[A\n",
"Iteration: 42% 2360/5625 [01:32<01:29, 36.40it/s]\u001b[A\n",
"Iteration: 42% 2364/5625 [01:32<01:28, 36.80it/s]\u001b[A\n",
"Iteration: 42% 2368/5625 [01:32<01:28, 36.77it/s]\u001b[A\n",
"Iteration: 42% 2372/5625 [01:32<01:28, 36.84it/s]\u001b[A\n",
"Iteration: 42% 2376/5625 [01:32<01:28, 36.67it/s]\u001b[A\n",
"Iteration: 42% 2380/5625 [01:32<01:28, 36.61it/s]\u001b[A\n",
"Iteration: 42% 2384/5625 [01:32<01:29, 36.15it/s]\u001b[A\n",
"Iteration: 42% 2388/5625 [01:32<01:34, 34.31it/s]\u001b[A\n",
"Iteration: 43% 2392/5625 [01:32<01:36, 33.50it/s]\u001b[A\n",
"Iteration: 43% 2396/5625 [01:33<01:39, 32.59it/s]\u001b[A\n",
"Iteration: 43% 2400/5625 [01:33<01:43, 31.21it/s]\u001b[A\n",
"Iteration: 43% 2404/5625 [01:33<01:43, 31.20it/s]\u001b[A\n",
"Iteration: 43% 2408/5625 [01:33<01:46, 30.30it/s]\u001b[A\n",
"Iteration: 43% 2412/5625 [01:33<01:41, 31.66it/s]\u001b[A\n",
"Iteration: 43% 2416/5625 [01:33<01:37, 32.77it/s]\u001b[A\n",
"Iteration: 43% 2420/5625 [01:33<01:36, 33.36it/s]\u001b[A\n",
"Iteration: 43% 2424/5625 [01:33<01:33, 34.08it/s]\u001b[A\n",
"Iteration: 43% 2428/5625 [01:34<01:32, 34.59it/s]\u001b[A\n",
"Iteration: 43% 2432/5625 [01:34<01:31, 34.93it/s]\u001b[A\n",
"Iteration: 43% 2436/5625 [01:34<01:30, 35.27it/s]\u001b[A\n",
"Iteration: 43% 2440/5625 [01:34<01:29, 35.50it/s]\u001b[A\n",
"Iteration: 43% 2444/5625 [01:34<01:28, 35.99it/s]\u001b[A\n",
"Iteration: 44% 2448/5625 [01:34<01:27, 36.15it/s]\u001b[A\n",
"Iteration: 44% 2452/5625 [01:34<01:27, 36.43it/s]\u001b[A\n",
"Iteration: 44% 2456/5625 [01:34<01:27, 36.08it/s]\u001b[A\n",
"Iteration: 44% 2460/5625 [01:34<01:28, 35.69it/s]\u001b[A\n",
"Iteration: 44% 2464/5625 [01:35<01:27, 36.31it/s]\u001b[A\n",
"Iteration: 44% 2468/5625 [01:35<01:26, 36.66it/s]\u001b[A\n",
"Iteration: 44% 2472/5625 [01:35<01:27, 35.87it/s]\u001b[A\n",
"Iteration: 44% 2476/5625 [01:35<01:27, 36.10it/s]\u001b[A\n",
"Iteration: 44% 2480/5625 [01:35<01:26, 36.48it/s]\u001b[A\n",
"Iteration: 44% 2484/5625 [01:35<01:25, 36.55it/s]\u001b[A\n",
"Iteration: 44% 2488/5625 [01:35<01:25, 36.69it/s]\u001b[A\n",
"Iteration: 44% 2492/5625 [01:35<01:25, 36.43it/s]\u001b[A\n",
"Iteration: 44% 2496/5625 [01:35<01:26, 36.04it/s]\u001b[A02/21/2020 12:35:12 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:35:13 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:35:13 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:35:13 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 14/625 [00:00<00:04, 129.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 28/625 [00:00<00:04, 130.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 7% 41/625 [00:00<00:04, 129.55it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 9% 54/625 [00:00<00:04, 128.39it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 68/625 [00:00<00:04, 129.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 82/625 [00:00<00:04, 130.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 94/625 [00:00<00:04, 126.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 107/625 [00:00<00:04, 127.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 120/625 [00:00<00:04, 125.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 134/625 [00:01<00:03, 127.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 24% 147/625 [00:01<00:03, 127.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 26% 160/625 [00:01<00:03, 127.56it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 173/625 [00:01<00:03, 127.99it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 187/625 [00:01<00:03, 127.88it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 200/625 [00:01<00:03, 127.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 213/625 [00:01<00:03, 128.02it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 226/625 [00:01<00:03, 126.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 239/625 [00:01<00:03, 126.79it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 253/625 [00:01<00:02, 127.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 43% 266/625 [00:02<00:02, 128.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 45% 279/625 [00:02<00:02, 126.60it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 47% 292/625 [00:02<00:02, 126.98it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 49% 305/625 [00:02<00:02, 126.97it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 318/625 [00:02<00:02, 127.83it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 331/625 [00:02<00:02, 128.23it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 344/625 [00:02<00:02, 127.48it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 357/625 [00:02<00:02, 126.22it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 59% 371/625 [00:02<00:01, 128.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 61% 384/625 [00:03<00:01, 126.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 398/625 [00:03<00:01, 128.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 411/625 [00:03<00:01, 128.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 425/625 [00:03<00:01, 129.48it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 438/625 [00:03<00:01, 129.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 72% 452/625 [00:03<00:01, 129.80it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 74% 465/625 [00:03<00:01, 128.21it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 76% 478/625 [00:03<00:01, 128.31it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 79% 491/625 [00:03<00:01, 125.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 504/625 [00:03<00:00, 125.49it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 518/625 [00:04<00:00, 127.12it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 532/625 [00:04<00:00, 127.76it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 546/625 [00:04<00:00, 128.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 90% 560/625 [00:04<00:00, 129.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 92% 573/625 [00:04<00:00, 128.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 94% 586/625 [00:04<00:00, 124.55it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 599/625 [00:04<00:00, 121.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 98% 612/625 [00:04<00:00, 123.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 127.48it/s]\u001b[A\u001b[A02/21/2020 12:35:18 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:35:18 - INFO - __main__ - perplexity = tensor(799.5935)\n",
"\n",
"Iteration: 44% 2500/5625 [01:42<26:47, 1.94it/s]\u001b[A\n",
"Iteration: 45% 2504/5625 [01:42<19:11, 2.71it/s]\u001b[A\n",
"Iteration: 45% 2508/5625 [01:42<13:50, 3.75it/s]\u001b[A\n",
"Iteration: 45% 2512/5625 [01:42<10:05, 5.14it/s]\u001b[A\n",
"Iteration: 45% 2516/5625 [01:43<07:28, 6.93it/s]\u001b[A\n",
"Iteration: 45% 2520/5625 [01:43<05:39, 9.15it/s]\u001b[A\n",
"Iteration: 45% 2524/5625 [01:43<04:21, 11.86it/s]\u001b[A\n",
"Iteration: 45% 2528/5625 [01:43<03:27, 14.95it/s]\u001b[A\n",
"Iteration: 45% 2532/5625 [01:43<02:51, 18.04it/s]\u001b[A\n",
"Iteration: 45% 2536/5625 [01:43<02:25, 21.29it/s]\u001b[A\n",
"Iteration: 45% 2540/5625 [01:43<02:05, 24.52it/s]\u001b[A\n",
"Iteration: 45% 2544/5625 [01:43<01:52, 27.31it/s]\u001b[A\n",
"Iteration: 45% 2548/5625 [01:43<01:43, 29.79it/s]\u001b[A\n",
"Iteration: 45% 2552/5625 [01:43<01:36, 31.72it/s]\u001b[A\n",
"Iteration: 45% 2556/5625 [01:44<01:36, 31.92it/s]\u001b[A\n",
"Iteration: 46% 2560/5625 [01:44<01:32, 33.20it/s]\u001b[A\n",
"Iteration: 46% 2564/5625 [01:44<01:30, 33.98it/s]\u001b[A\n",
"Iteration: 46% 2568/5625 [01:44<01:28, 34.63it/s]\u001b[A\n",
"Iteration: 46% 2572/5625 [01:44<01:28, 34.69it/s]\u001b[A\n",
"Iteration: 46% 2576/5625 [01:44<01:28, 34.44it/s]\u001b[A\n",
"Iteration: 46% 2580/5625 [01:44<01:27, 34.89it/s]\u001b[A\n",
"Iteration: 46% 2584/5625 [01:44<01:27, 34.95it/s]\u001b[A\n",
"Iteration: 46% 2588/5625 [01:45<01:25, 35.42it/s]\u001b[A\n",
"Iteration: 46% 2592/5625 [01:45<01:24, 35.71it/s]\u001b[A\n",
"Iteration: 46% 2596/5625 [01:45<01:26, 35.05it/s]\u001b[A\n",
"Iteration: 46% 2600/5625 [01:45<01:24, 35.74it/s]\u001b[A\n",
"Iteration: 46% 2604/5625 [01:45<01:23, 36.17it/s]\u001b[A\n",
"Iteration: 46% 2608/5625 [01:45<01:24, 35.68it/s]\u001b[A\n",
"Iteration: 46% 2612/5625 [01:45<01:23, 36.28it/s]\u001b[A\n",
"Iteration: 47% 2616/5625 [01:45<01:22, 36.52it/s]\u001b[A\n",
"Iteration: 47% 2620/5625 [01:45<01:22, 36.54it/s]\u001b[A\n",
"Iteration: 47% 2624/5625 [01:45<01:22, 36.59it/s]\u001b[A\n",
"Iteration: 47% 2628/5625 [01:46<01:20, 37.05it/s]\u001b[A\n",
"Iteration: 47% 2632/5625 [01:46<01:21, 36.91it/s]\u001b[A\n",
"Iteration: 47% 2636/5625 [01:46<01:20, 37.20it/s]\u001b[A\n",
"Iteration: 47% 2640/5625 [01:46<01:20, 36.97it/s]\u001b[A\n",
"Iteration: 47% 2644/5625 [01:46<01:22, 36.23it/s]\u001b[A\n",
"Iteration: 47% 2648/5625 [01:46<01:21, 36.35it/s]\u001b[A\n",
"Iteration: 47% 2652/5625 [01:46<01:21, 36.39it/s]\u001b[A\n",
"Iteration: 47% 2656/5625 [01:46<01:23, 35.46it/s]\u001b[A\n",
"Iteration: 47% 2660/5625 [01:46<01:22, 35.78it/s]\u001b[A\n",
"Iteration: 47% 2664/5625 [01:47<01:22, 35.76it/s]\u001b[A\n",
"Iteration: 47% 2668/5625 [01:47<01:22, 35.90it/s]\u001b[A\n",
"Iteration: 48% 2672/5625 [01:47<01:21, 36.07it/s]\u001b[A\n",
"Iteration: 48% 2676/5625 [01:47<01:21, 36.40it/s]\u001b[A\n",
"Iteration: 48% 2680/5625 [01:47<01:22, 35.84it/s]\u001b[A\n",
"Iteration: 48% 2684/5625 [01:47<01:22, 35.82it/s]\u001b[A\n",
"Iteration: 48% 2688/5625 [01:47<01:21, 36.20it/s]\u001b[A\n",
"Iteration: 48% 2692/5625 [01:47<01:20, 36.57it/s]\u001b[A\n",
"Iteration: 48% 2696/5625 [01:47<01:20, 36.30it/s]\u001b[A\n",
"Iteration: 48% 2700/5625 [01:48<01:21, 36.06it/s]\u001b[A\n",
"Iteration: 48% 2704/5625 [01:48<01:20, 36.12it/s]\u001b[A\n",
"Iteration: 48% 2708/5625 [01:48<01:19, 36.53it/s]\u001b[A\n",
"Iteration: 48% 2712/5625 [01:48<01:19, 36.84it/s]\u001b[A\n",
"Iteration: 48% 2716/5625 [01:48<01:18, 36.90it/s]\u001b[A\n",
"Iteration: 48% 2720/5625 [01:48<01:20, 36.21it/s]\u001b[A\n",
"Iteration: 48% 2724/5625 [01:48<01:19, 36.30it/s]\u001b[A\n",
"Iteration: 48% 2728/5625 [01:48<01:20, 35.79it/s]\u001b[A\n",
"Iteration: 49% 2732/5625 [01:48<01:19, 36.50it/s]\u001b[A\n",
"Iteration: 49% 2736/5625 [01:49<01:19, 36.53it/s]\u001b[A\n",
"Iteration: 49% 2740/5625 [01:49<01:18, 36.69it/s]\u001b[A\n",
"Iteration: 49% 2744/5625 [01:49<01:18, 36.77it/s]\u001b[A\n",
"Iteration: 49% 2748/5625 [01:49<01:18, 36.69it/s]\u001b[A\n",
"Iteration: 49% 2752/5625 [01:49<01:17, 36.85it/s]\u001b[A\n",
"Iteration: 49% 2756/5625 [01:49<01:18, 36.41it/s]\u001b[A\n",
"Iteration: 49% 2760/5625 [01:49<01:18, 36.34it/s]\u001b[A\n",
"Iteration: 49% 2764/5625 [01:49<01:18, 36.63it/s]\u001b[A\n",
"Iteration: 49% 2768/5625 [01:49<01:17, 36.71it/s]\u001b[A\n",
"Iteration: 49% 2772/5625 [01:50<01:18, 36.49it/s]\u001b[A\n",
"Iteration: 49% 2776/5625 [01:50<01:17, 36.94it/s]\u001b[A\n",
"Iteration: 49% 2780/5625 [01:50<01:17, 36.86it/s]\u001b[A\n",
"Iteration: 49% 2784/5625 [01:50<01:17, 36.55it/s]\u001b[A\n",
"Iteration: 50% 2788/5625 [01:50<01:18, 35.97it/s]\u001b[A\n",
"Iteration: 50% 2792/5625 [01:50<01:19, 35.74it/s]\u001b[A\n",
"Iteration: 50% 2796/5625 [01:50<01:18, 36.03it/s]\u001b[A\n",
"Iteration: 50% 2800/5625 [01:50<01:17, 36.37it/s]\u001b[A\n",
"Iteration: 50% 2804/5625 [01:50<01:16, 36.73it/s]\u001b[A\n",
"Iteration: 50% 2808/5625 [01:51<01:17, 36.50it/s]\u001b[A\n",
"Iteration: 50% 2812/5625 [01:51<01:16, 36.83it/s]\u001b[A\n",
"Iteration: 50% 2816/5625 [01:51<01:15, 36.97it/s]\u001b[A\n",
"Iteration: 50% 2820/5625 [01:51<01:15, 37.04it/s]\u001b[A\n",
"Iteration: 50% 2824/5625 [01:51<01:14, 37.44it/s]\u001b[A\n",
"Iteration: 50% 2828/5625 [01:51<01:14, 37.55it/s]\u001b[A\n",
"Iteration: 50% 2832/5625 [01:51<01:15, 36.98it/s]\u001b[A\n",
"Iteration: 50% 2836/5625 [01:51<01:15, 36.78it/s]\u001b[A\n",
"Iteration: 50% 2840/5625 [01:51<01:16, 36.23it/s]\u001b[A\n",
"Iteration: 51% 2844/5625 [01:52<01:16, 36.37it/s]\u001b[A\n",
"Iteration: 51% 2848/5625 [01:52<01:15, 36.74it/s]\u001b[A\n",
"Iteration: 51% 2852/5625 [01:52<01:15, 36.72it/s]\u001b[A\n",
"Iteration: 51% 2856/5625 [01:52<01:15, 36.72it/s]\u001b[A\n",
"Iteration: 51% 2860/5625 [01:52<01:14, 37.16it/s]\u001b[A\n",
"Iteration: 51% 2864/5625 [01:52<01:13, 37.32it/s]\u001b[A\n",
"Iteration: 51% 2868/5625 [01:52<01:15, 36.55it/s]\u001b[A\n",
"Iteration: 51% 2872/5625 [01:52<01:15, 36.68it/s]\u001b[A\n",
"Iteration: 51% 2876/5625 [01:52<01:14, 36.68it/s]\u001b[A\n",
"Iteration: 51% 2880/5625 [01:53<01:15, 36.33it/s]\u001b[A\n",
"Iteration: 51% 2884/5625 [01:53<01:14, 36.63it/s]\u001b[A\n",
"Iteration: 51% 2888/5625 [01:53<01:14, 36.91it/s]\u001b[A\n",
"Iteration: 51% 2892/5625 [01:53<01:13, 37.06it/s]\u001b[A\n",
"Iteration: 51% 2896/5625 [01:53<01:13, 37.04it/s]\u001b[A\n",
"Iteration: 52% 2900/5625 [01:53<01:13, 37.04it/s]\u001b[A\n",
"Iteration: 52% 2904/5625 [01:53<01:14, 36.60it/s]\u001b[A\n",
"Iteration: 52% 2908/5625 [01:53<01:15, 36.06it/s]\u001b[A\n",
"Iteration: 52% 2912/5625 [01:53<01:14, 36.48it/s]\u001b[A\n",
"Iteration: 52% 2916/5625 [01:53<01:14, 36.50it/s]\u001b[A\n",
"Iteration: 52% 2920/5625 [01:54<01:13, 36.85it/s]\u001b[A\n",
"Iteration: 52% 2924/5625 [01:54<01:13, 36.94it/s]\u001b[A\n",
"Iteration: 52% 2928/5625 [01:54<01:13, 36.58it/s]\u001b[A\n",
"Iteration: 52% 2932/5625 [01:54<01:13, 36.88it/s]\u001b[A\n",
"Iteration: 52% 2936/5625 [01:54<01:12, 37.20it/s]\u001b[A\n",
"Iteration: 52% 2940/5625 [01:54<01:13, 36.71it/s]\u001b[A\n",
"Iteration: 52% 2944/5625 [01:54<01:13, 36.33it/s]\u001b[A\n",
"Iteration: 52% 2948/5625 [01:54<01:12, 36.85it/s]\u001b[A\n",
"Iteration: 52% 2952/5625 [01:54<01:12, 36.82it/s]\u001b[A\n",
"Iteration: 53% 2956/5625 [01:55<01:12, 36.87it/s]\u001b[A\n",
"Iteration: 53% 2960/5625 [01:55<01:11, 37.04it/s]\u001b[A\n",
"Iteration: 53% 2964/5625 [01:55<01:13, 36.31it/s]\u001b[A\n",
"Iteration: 53% 2968/5625 [01:55<01:13, 36.31it/s]\u001b[A\n",
"Iteration: 53% 2972/5625 [01:55<01:12, 36.75it/s]\u001b[A\n",
"Iteration: 53% 2976/5625 [01:55<01:12, 36.78it/s]\u001b[A\n",
"Iteration: 53% 2980/5625 [01:55<01:12, 36.61it/s]\u001b[A\n",
"Iteration: 53% 2984/5625 [01:55<01:12, 36.59it/s]\u001b[A\n",
"Iteration: 53% 2988/5625 [01:55<01:11, 36.72it/s]\u001b[A\n",
"Iteration: 53% 2992/5625 [01:56<01:11, 36.66it/s]\u001b[A\n",
"Iteration: 53% 2996/5625 [01:56<01:11, 36.98it/s]\u001b[A02/21/2020 12:35:32 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:35:34 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:35:34 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:35:34 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 14/625 [00:00<00:04, 131.18it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 28/625 [00:00<00:04, 131.37it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 40/625 [00:00<00:04, 127.05it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 53/625 [00:00<00:04, 127.56it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 66/625 [00:00<00:04, 127.90it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 79/625 [00:00<00:04, 128.47it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 92/625 [00:00<00:04, 126.99it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 105/625 [00:00<00:04, 127.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 118/625 [00:00<00:04, 126.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 132/625 [00:01<00:03, 127.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 145/625 [00:01<00:03, 127.57it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 158/625 [00:01<00:03, 127.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 27% 171/625 [00:01<00:03, 127.46it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 29% 184/625 [00:01<00:03, 127.56it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 197/625 [00:01<00:03, 127.25it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 210/625 [00:01<00:03, 123.66it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 223/625 [00:01<00:03, 125.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 237/625 [00:01<00:03, 126.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 250/625 [00:01<00:03, 124.83it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 264/625 [00:02<00:02, 126.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 277/625 [00:02<00:02, 126.71it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 46% 290/625 [00:02<00:02, 126.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 49% 304/625 [00:02<00:02, 127.81it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 317/625 [00:02<00:02, 127.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 330/625 [00:02<00:02, 127.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 344/625 [00:02<00:02, 128.93it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 357/625 [00:02<00:02, 129.13it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 59% 370/625 [00:02<00:02, 126.63it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 61% 383/625 [00:03<00:01, 124.57it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 397/625 [00:03<00:01, 126.87it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 411/625 [00:03<00:01, 127.79it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 425/625 [00:03<00:01, 129.51it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 438/625 [00:03<00:01, 129.25it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 72% 451/625 [00:03<00:01, 128.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 74% 464/625 [00:03<00:01, 127.12it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 76% 477/625 [00:03<00:01, 127.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 78% 490/625 [00:03<00:01, 128.28it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 80% 503/625 [00:03<00:00, 128.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 516/625 [00:04<00:00, 126.39it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 529/625 [00:04<00:00, 127.19it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 542/625 [00:04<00:00, 125.56it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 89% 556/625 [00:04<00:00, 127.93it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 91% 569/625 [00:04<00:00, 128.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 93% 582/625 [00:04<00:00, 127.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 95% 595/625 [00:04<00:00, 127.42it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 97% 609/625 [00:04<00:00, 128.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 623/625 [00:04<00:00, 129.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 127.58it/s]\u001b[A\u001b[A02/21/2020 12:35:38 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:35:38 - INFO - __main__ - perplexity = tensor(787.9789)\n",
"\n",
"Iteration: 53% 3000/5625 [02:02<22:25, 1.95it/s]\u001b[A\n",
"Iteration: 53% 3004/5625 [02:02<16:02, 2.72it/s]\u001b[A\n",
"Iteration: 53% 3008/5625 [02:02<11:34, 3.77it/s]\u001b[A\n",
"Iteration: 54% 3012/5625 [02:03<08:27, 5.15it/s]\u001b[A\n",
"Iteration: 54% 3016/5625 [02:03<06:15, 6.94it/s]\u001b[A\n",
"Iteration: 54% 3020/5625 [02:03<04:44, 9.16it/s]\u001b[A\n",
"Iteration: 54% 3024/5625 [02:03<03:39, 11.85it/s]\u001b[A\n",
"Iteration: 54% 3028/5625 [02:03<02:55, 14.81it/s]\u001b[A\n",
"Iteration: 54% 3032/5625 [02:03<02:23, 18.12it/s]\u001b[A\n",
"Iteration: 54% 3036/5625 [02:03<02:00, 21.49it/s]\u001b[A\n",
"Iteration: 54% 3040/5625 [02:03<01:45, 24.47it/s]\u001b[A\n",
"Iteration: 54% 3044/5625 [02:03<01:37, 26.54it/s]\u001b[A\n",
"Iteration: 54% 3048/5625 [02:04<01:28, 29.09it/s]\u001b[A\n",
"Iteration: 54% 3052/5625 [02:04<01:22, 31.06it/s]\u001b[A\n",
"Iteration: 54% 3056/5625 [02:04<01:18, 32.84it/s]\u001b[A\n",
"Iteration: 54% 3060/5625 [02:04<01:15, 34.19it/s]\u001b[A\n",
"Iteration: 54% 3064/5625 [02:04<01:13, 34.76it/s]\u001b[A\n",
"Iteration: 55% 3068/5625 [02:04<01:11, 35.53it/s]\u001b[A\n",
"Iteration: 55% 3072/5625 [02:04<01:10, 36.26it/s]\u001b[A\n",
"Iteration: 55% 3076/5625 [02:04<01:09, 36.49it/s]\u001b[A\n",
"Iteration: 55% 3080/5625 [02:04<01:12, 35.20it/s]\u001b[A\n",
"Iteration: 55% 3084/5625 [02:05<01:11, 35.74it/s]\u001b[A\n",
"Iteration: 55% 3088/5625 [02:05<01:10, 35.94it/s]\u001b[A\n",
"Iteration: 55% 3092/5625 [02:05<01:09, 36.22it/s]\u001b[A\n",
"Iteration: 55% 3096/5625 [02:05<01:09, 36.35it/s]\u001b[A\n",
"Iteration: 55% 3100/5625 [02:05<01:09, 36.25it/s]\u001b[A\n",
"Iteration: 55% 3104/5625 [02:05<01:09, 36.39it/s]\u001b[A\n",
"Iteration: 55% 3108/5625 [02:05<01:09, 36.45it/s]\u001b[A\n",
"Iteration: 55% 3112/5625 [02:05<01:08, 36.65it/s]\u001b[A\n",
"Iteration: 55% 3116/5625 [02:05<01:11, 34.94it/s]\u001b[A\n",
"Iteration: 55% 3120/5625 [02:06<01:11, 35.05it/s]\u001b[A\n",
"Iteration: 56% 3124/5625 [02:06<01:10, 35.49it/s]\u001b[A\n",
"Iteration: 56% 3128/5625 [02:06<01:08, 36.24it/s]\u001b[A\n",
"Iteration: 56% 3132/5625 [02:06<01:08, 36.22it/s]\u001b[A\n",
"Iteration: 56% 3136/5625 [02:06<01:08, 36.29it/s]\u001b[A\n",
"Iteration: 56% 3140/5625 [02:06<01:07, 36.87it/s]\u001b[A\n",
"Iteration: 56% 3144/5625 [02:06<01:07, 36.58it/s]\u001b[A\n",
"Iteration: 56% 3148/5625 [02:06<01:07, 36.80it/s]\u001b[A\n",
"Iteration: 56% 3152/5625 [02:06<01:07, 36.42it/s]\u001b[A\n",
"Iteration: 56% 3156/5625 [02:07<01:07, 36.32it/s]\u001b[A\n",
"Iteration: 56% 3160/5625 [02:07<01:07, 36.75it/s]\u001b[A\n",
"Iteration: 56% 3164/5625 [02:07<01:07, 36.58it/s]\u001b[A\n",
"Iteration: 56% 3168/5625 [02:07<01:06, 36.83it/s]\u001b[A\n",
"Iteration: 56% 3172/5625 [02:07<01:08, 35.83it/s]\u001b[A\n",
"Iteration: 56% 3176/5625 [02:07<01:07, 36.32it/s]\u001b[A\n",
"Iteration: 57% 3180/5625 [02:07<01:08, 35.92it/s]\u001b[A\n",
"Iteration: 57% 3184/5625 [02:07<01:07, 36.32it/s]\u001b[A\n",
"Iteration: 57% 3188/5625 [02:07<01:06, 36.71it/s]\u001b[A\n",
"Iteration: 57% 3192/5625 [02:08<01:07, 35.97it/s]\u001b[A\n",
"Iteration: 57% 3196/5625 [02:08<01:06, 36.30it/s]\u001b[A\n",
"Iteration: 57% 3200/5625 [02:08<01:05, 36.79it/s]\u001b[A\n",
"Iteration: 57% 3204/5625 [02:08<01:05, 36.82it/s]\u001b[A\n",
"Iteration: 57% 3208/5625 [02:08<01:05, 36.92it/s]\u001b[A\n",
"Iteration: 57% 3212/5625 [02:08<01:05, 37.12it/s]\u001b[A\n",
"Iteration: 57% 3216/5625 [02:08<01:05, 36.99it/s]\u001b[A\n",
"Iteration: 57% 3220/5625 [02:08<01:05, 36.61it/s]\u001b[A\n",
"Iteration: 57% 3224/5625 [02:08<01:04, 36.97it/s]\u001b[A\n",
"Iteration: 57% 3228/5625 [02:09<01:05, 36.46it/s]\u001b[A\n",
"Iteration: 57% 3232/5625 [02:09<01:06, 36.17it/s]\u001b[A\n",
"Iteration: 58% 3236/5625 [02:09<01:06, 35.83it/s]\u001b[A\n",
"Iteration: 58% 3240/5625 [02:09<01:06, 35.83it/s]\u001b[A\n",
"Iteration: 58% 3244/5625 [02:09<01:06, 36.05it/s]\u001b[A\n",
"Iteration: 58% 3248/5625 [02:09<01:05, 36.32it/s]\u001b[A\n",
"Iteration: 58% 3252/5625 [02:09<01:05, 36.45it/s]\u001b[A\n",
"Iteration: 58% 3256/5625 [02:09<01:04, 36.74it/s]\u001b[A\n",
"Iteration: 58% 3260/5625 [02:09<01:05, 36.26it/s]\u001b[A\n",
"Iteration: 58% 3264/5625 [02:10<01:05, 36.01it/s]\u001b[A\n",
"Iteration: 58% 3268/5625 [02:10<01:05, 36.08it/s]\u001b[A\n",
"Iteration: 58% 3272/5625 [02:10<01:04, 36.45it/s]\u001b[A\n",
"Iteration: 58% 3276/5625 [02:10<01:03, 36.91it/s]\u001b[A\n",
"Iteration: 58% 3280/5625 [02:10<01:03, 36.94it/s]\u001b[A\n",
"Iteration: 58% 3284/5625 [02:10<01:03, 36.91it/s]\u001b[A\n",
"Iteration: 58% 3288/5625 [02:10<01:04, 36.29it/s]\u001b[A\n",
"Iteration: 59% 3292/5625 [02:10<01:03, 36.73it/s]\u001b[A\n",
"Iteration: 59% 3296/5625 [02:10<01:03, 36.57it/s]\u001b[A\n",
"Iteration: 59% 3300/5625 [02:10<01:02, 36.91it/s]\u001b[A\n",
"Iteration: 59% 3304/5625 [02:11<01:03, 36.52it/s]\u001b[A\n",
"Iteration: 59% 3308/5625 [02:11<01:03, 36.60it/s]\u001b[A\n",
"Iteration: 59% 3312/5625 [02:11<01:02, 36.82it/s]\u001b[A\n",
"Iteration: 59% 3316/5625 [02:11<01:02, 36.85it/s]\u001b[A\n",
"Iteration: 59% 3320/5625 [02:11<01:02, 36.79it/s]\u001b[A\n",
"Iteration: 59% 3324/5625 [02:11<01:02, 36.81it/s]\u001b[A\n",
"Iteration: 59% 3328/5625 [02:11<01:02, 36.53it/s]\u001b[A\n",
"Iteration: 59% 3332/5625 [02:11<01:02, 36.52it/s]\u001b[A\n",
"Iteration: 59% 3336/5625 [02:11<01:02, 36.50it/s]\u001b[A\n",
"Iteration: 59% 3340/5625 [02:12<01:03, 36.17it/s]\u001b[A\n",
"Iteration: 59% 3344/5625 [02:12<01:03, 36.06it/s]\u001b[A\n",
"Iteration: 60% 3348/5625 [02:12<01:04, 35.19it/s]\u001b[A\n",
"Iteration: 60% 3352/5625 [02:12<01:04, 35.05it/s]\u001b[A\n",
"Iteration: 60% 3356/5625 [02:12<01:03, 35.80it/s]\u001b[A\n",
"Iteration: 60% 3360/5625 [02:12<01:02, 35.99it/s]\u001b[A\n",
"Iteration: 60% 3364/5625 [02:12<01:02, 36.05it/s]\u001b[A\n",
"Iteration: 60% 3368/5625 [02:12<01:02, 36.25it/s]\u001b[A\n",
"Iteration: 60% 3372/5625 [02:12<01:01, 36.52it/s]\u001b[A\n",
"Iteration: 60% 3376/5625 [02:13<01:02, 35.72it/s]\u001b[A\n",
"Iteration: 60% 3380/5625 [02:13<01:01, 36.38it/s]\u001b[A\n",
"Iteration: 60% 3384/5625 [02:13<01:01, 36.54it/s]\u001b[A\n",
"Iteration: 60% 3388/5625 [02:13<01:00, 36.69it/s]\u001b[A\n",
"Iteration: 60% 3392/5625 [02:13<01:00, 37.06it/s]\u001b[A\n",
"Iteration: 60% 3396/5625 [02:13<01:00, 36.95it/s]\u001b[A\n",
"Iteration: 60% 3400/5625 [02:13<00:59, 37.46it/s]\u001b[A\n",
"Iteration: 61% 3404/5625 [02:13<00:59, 37.44it/s]\u001b[A\n",
"Iteration: 61% 3408/5625 [02:13<00:59, 37.00it/s]\u001b[A\n",
"Iteration: 61% 3412/5625 [02:14<00:59, 36.90it/s]\u001b[A\n",
"Iteration: 61% 3416/5625 [02:14<01:00, 36.34it/s]\u001b[A\n",
"Iteration: 61% 3420/5625 [02:14<00:59, 36.79it/s]\u001b[A\n",
"Iteration: 61% 3424/5625 [02:14<00:59, 36.70it/s]\u001b[A\n",
"Iteration: 61% 3428/5625 [02:14<00:59, 36.67it/s]\u001b[A\n",
"Iteration: 61% 3432/5625 [02:14<01:01, 35.82it/s]\u001b[A\n",
"Iteration: 61% 3436/5625 [02:14<01:00, 35.92it/s]\u001b[A\n",
"Iteration: 61% 3440/5625 [02:14<00:59, 36.48it/s]\u001b[A\n",
"Iteration: 61% 3444/5625 [02:14<00:59, 36.74it/s]\u001b[A\n",
"Iteration: 61% 3448/5625 [02:15<01:00, 36.07it/s]\u001b[A\n",
"Iteration: 61% 3452/5625 [02:15<01:01, 35.56it/s]\u001b[A\n",
"Iteration: 61% 3456/5625 [02:15<00:59, 36.19it/s]\u001b[A\n",
"Iteration: 62% 3460/5625 [02:15<00:59, 36.16it/s]\u001b[A\n",
"Iteration: 62% 3464/5625 [02:15<00:59, 36.32it/s]\u001b[A\n",
"Iteration: 62% 3468/5625 [02:15<00:58, 36.83it/s]\u001b[A\n",
"Iteration: 62% 3472/5625 [02:15<00:58, 36.70it/s]\u001b[A\n",
"Iteration: 62% 3476/5625 [02:15<00:58, 36.99it/s]\u001b[A\n",
"Iteration: 62% 3480/5625 [02:15<00:57, 37.32it/s]\u001b[A\n",
"Iteration: 62% 3484/5625 [02:16<00:58, 36.84it/s]\u001b[A\n",
"Iteration: 62% 3488/5625 [02:16<00:58, 36.42it/s]\u001b[A\n",
"Iteration: 62% 3492/5625 [02:16<00:58, 36.59it/s]\u001b[A\n",
"Iteration: 62% 3496/5625 [02:16<00:58, 36.22it/s]\u001b[A02/21/2020 12:35:52 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:35:54 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:35:54 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:35:54 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 12/625 [00:00<00:05, 113.35it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 23/625 [00:00<00:05, 111.51it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 5% 34/625 [00:00<00:05, 110.68it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 7% 46/625 [00:00<00:05, 110.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 9% 57/625 [00:00<00:05, 110.15it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 69/625 [00:00<00:05, 110.55it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 81/625 [00:00<00:04, 110.76it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 92/625 [00:00<00:04, 108.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 104/625 [00:00<00:04, 110.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 117/625 [00:01<00:04, 115.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 131/625 [00:01<00:04, 120.13it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 144/625 [00:01<00:03, 122.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 157/625 [00:01<00:03, 121.87it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 27% 170/625 [00:01<00:03, 123.53it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 29% 183/625 [00:01<00:03, 125.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 31% 196/625 [00:01<00:03, 125.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 33% 209/625 [00:01<00:03, 125.97it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 222/625 [00:01<00:03, 125.71it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 235/625 [00:01<00:03, 125.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 248/625 [00:02<00:03, 123.52it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 261/625 [00:02<00:02, 125.04it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 274/625 [00:02<00:02, 126.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 46% 287/625 [00:02<00:02, 126.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 48% 301/625 [00:02<00:02, 128.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 50% 314/625 [00:02<00:02, 128.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 52% 328/625 [00:02<00:02, 129.78it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 342/625 [00:02<00:02, 130.23it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 356/625 [00:02<00:02, 127.52it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 59% 370/625 [00:03<00:01, 128.01it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 61% 384/625 [00:03<00:01, 129.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 398/625 [00:03<00:01, 130.24it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 412/625 [00:03<00:01, 130.93it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 426/625 [00:03<00:01, 131.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 440/625 [00:03<00:01, 131.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 73% 454/625 [00:03<00:01, 129.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 467/625 [00:03<00:01, 129.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 480/625 [00:03<00:01, 125.23it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 79% 493/625 [00:03<00:01, 126.50it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 506/625 [00:04<00:00, 126.56it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 519/625 [00:04<00:00, 127.13it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 533/625 [00:04<00:00, 128.48it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 546/625 [00:04<00:00, 128.41it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 90% 560/625 [00:04<00:00, 129.47it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 92% 573/625 [00:04<00:00, 128.51it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 94% 586/625 [00:04<00:00, 128.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 599/625 [00:04<00:00, 128.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 98% 612/625 [00:04<00:00, 127.96it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:05<00:00, 124.81it/s]\u001b[A\u001b[A02/21/2020 12:35:59 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:35:59 - INFO - __main__ - perplexity = tensor(798.7228)\n",
"02/21/2020 12:35:59 - INFO - transformers.configuration_utils - Configuration saved in /content/models/smallBERTa/weights/checkpoint-26000/config.json\n",
"02/21/2020 12:35:59 - INFO - transformers.modeling_utils - Model weights saved in /content/models/smallBERTa/weights/checkpoint-26000/pytorch_model.bin\n",
"02/21/2020 12:35:59 - INFO - __main__ - Saving model checkpoint to /content/models/smallBERTa/weights/checkpoint-26000\n",
"02/21/2020 12:35:59 - INFO - __main__ - Deleting older checkpoint [/content/models/smallBERTa/weights/checkpoint-22000] due to args.save_total_limit\n",
"02/21/2020 12:35:59 - INFO - __main__ - Saving optimizer and scheduler states to /content/models/smallBERTa/weights/checkpoint-26000\n",
"\n",
"Iteration: 62% 3500/5625 [02:23<19:28, 1.82it/s]\u001b[A\n",
"Iteration: 62% 3504/5625 [02:23<13:54, 2.54it/s]\u001b[A\n",
"Iteration: 62% 3508/5625 [02:23<10:00, 3.53it/s]\u001b[A\n",
"Iteration: 62% 3512/5625 [02:23<07:16, 4.84it/s]\u001b[A\n",
"Iteration: 63% 3516/5625 [02:23<05:22, 6.55it/s]\u001b[A\n",
"Iteration: 63% 3520/5625 [02:23<04:02, 8.69it/s]\u001b[A\n",
"Iteration: 63% 3524/5625 [02:24<03:05, 11.30it/s]\u001b[A\n",
"Iteration: 63% 3528/5625 [02:24<02:26, 14.27it/s]\u001b[A\n",
"Iteration: 63% 3532/5625 [02:24<02:00, 17.35it/s]\u001b[A\n",
"Iteration: 63% 3536/5625 [02:24<01:41, 20.58it/s]\u001b[A\n",
"Iteration: 63% 3540/5625 [02:24<01:27, 23.71it/s]\u001b[A\n",
"Iteration: 63% 3544/5625 [02:24<01:18, 26.42it/s]\u001b[A\n",
"Iteration: 63% 3548/5625 [02:24<01:12, 28.67it/s]\u001b[A\n",
"Iteration: 63% 3552/5625 [02:24<01:07, 30.71it/s]\u001b[A\n",
"Iteration: 63% 3556/5625 [02:24<01:04, 32.02it/s]\u001b[A\n",
"Iteration: 63% 3560/5625 [02:25<01:01, 33.56it/s]\u001b[A\n",
"Iteration: 63% 3564/5625 [02:25<00:59, 34.81it/s]\u001b[A\n",
"Iteration: 63% 3568/5625 [02:25<00:58, 35.35it/s]\u001b[A\n",
"Iteration: 64% 3572/5625 [02:25<00:59, 34.47it/s]\u001b[A\n",
"Iteration: 64% 3576/5625 [02:25<00:57, 35.33it/s]\u001b[A\n",
"Iteration: 64% 3580/5625 [02:25<00:57, 35.78it/s]\u001b[A\n",
"Iteration: 64% 3584/5625 [02:25<00:57, 35.67it/s]\u001b[A\n",
"Iteration: 64% 3588/5625 [02:25<00:57, 35.49it/s]\u001b[A\n",
"Iteration: 64% 3592/5625 [02:25<00:57, 35.49it/s]\u001b[A\n",
"Iteration: 64% 3596/5625 [02:26<00:56, 35.99it/s]\u001b[A\n",
"Iteration: 64% 3600/5625 [02:26<00:55, 36.55it/s]\u001b[A\n",
"Iteration: 64% 3604/5625 [02:26<00:55, 36.22it/s]\u001b[A\n",
"Iteration: 64% 3608/5625 [02:26<00:56, 35.92it/s]\u001b[A\n",
"Iteration: 64% 3612/5625 [02:26<00:55, 36.03it/s]\u001b[A\n",
"Iteration: 64% 3616/5625 [02:26<00:55, 36.02it/s]\u001b[A\n",
"Iteration: 64% 3620/5625 [02:26<00:55, 36.32it/s]\u001b[A\n",
"Iteration: 64% 3624/5625 [02:26<00:54, 36.67it/s]\u001b[A\n",
"Iteration: 64% 3628/5625 [02:26<00:55, 36.14it/s]\u001b[A\n",
"Iteration: 65% 3632/5625 [02:27<00:54, 36.29it/s]\u001b[A\n",
"Iteration: 65% 3636/5625 [02:27<00:55, 35.94it/s]\u001b[A\n",
"Iteration: 65% 3640/5625 [02:27<00:54, 36.29it/s]\u001b[A\n",
"Iteration: 65% 3644/5625 [02:27<00:55, 35.85it/s]\u001b[A\n",
"Iteration: 65% 3648/5625 [02:27<00:54, 36.14it/s]\u001b[A\n",
"Iteration: 65% 3652/5625 [02:27<00:55, 35.78it/s]\u001b[A\n",
"Iteration: 65% 3656/5625 [02:27<00:54, 36.32it/s]\u001b[A\n",
"Iteration: 65% 3660/5625 [02:27<00:54, 36.35it/s]\u001b[A\n",
"Iteration: 65% 3664/5625 [02:27<00:53, 36.66it/s]\u001b[A\n",
"Iteration: 65% 3668/5625 [02:28<00:53, 36.59it/s]\u001b[A\n",
"Iteration: 65% 3672/5625 [02:28<00:52, 37.12it/s]\u001b[A\n",
"Iteration: 65% 3676/5625 [02:28<00:52, 37.03it/s]\u001b[A\n",
"Iteration: 65% 3680/5625 [02:28<00:53, 36.54it/s]\u001b[A\n",
"Iteration: 65% 3684/5625 [02:28<00:52, 36.64it/s]\u001b[A\n",
"Iteration: 66% 3688/5625 [02:28<00:52, 37.16it/s]\u001b[A\n",
"Iteration: 66% 3692/5625 [02:28<00:53, 36.15it/s]\u001b[A\n",
"Iteration: 66% 3696/5625 [02:28<00:54, 35.38it/s]\u001b[A\n",
"Iteration: 66% 3700/5625 [02:28<00:53, 35.83it/s]\u001b[A\n",
"Iteration: 66% 3704/5625 [02:29<00:55, 34.48it/s]\u001b[A\n",
"Iteration: 66% 3708/5625 [02:29<00:57, 33.42it/s]\u001b[A\n",
"Iteration: 66% 3712/5625 [02:29<00:57, 33.06it/s]\u001b[A\n",
"Iteration: 66% 3716/5625 [02:29<00:59, 31.85it/s]\u001b[A\n",
"Iteration: 66% 3720/5625 [02:29<00:59, 31.77it/s]\u001b[A\n",
"Iteration: 66% 3724/5625 [02:29<00:59, 31.99it/s]\u001b[A\n",
"Iteration: 66% 3728/5625 [02:29<00:59, 31.95it/s]\u001b[A\n",
"Iteration: 66% 3732/5625 [02:29<00:56, 33.26it/s]\u001b[A\n",
"Iteration: 66% 3736/5625 [02:30<00:55, 34.08it/s]\u001b[A\n",
"Iteration: 66% 3740/5625 [02:30<00:53, 34.96it/s]\u001b[A\n",
"Iteration: 67% 3744/5625 [02:30<00:53, 34.96it/s]\u001b[A\n",
"Iteration: 67% 3748/5625 [02:30<00:53, 35.16it/s]\u001b[A\n",
"Iteration: 67% 3752/5625 [02:30<00:53, 35.30it/s]\u001b[A\n",
"Iteration: 67% 3756/5625 [02:30<00:51, 36.06it/s]\u001b[A\n",
"Iteration: 67% 3760/5625 [02:30<00:51, 36.35it/s]\u001b[A\n",
"Iteration: 67% 3764/5625 [02:30<00:50, 36.64it/s]\u001b[A\n",
"Iteration: 67% 3768/5625 [02:30<00:50, 36.97it/s]\u001b[A\n",
"Iteration: 67% 3772/5625 [02:31<00:50, 36.86it/s]\u001b[A\n",
"Iteration: 67% 3776/5625 [02:31<00:50, 36.71it/s]\u001b[A\n",
"Iteration: 67% 3780/5625 [02:31<00:49, 37.02it/s]\u001b[A\n",
"Iteration: 67% 3784/5625 [02:31<00:49, 37.21it/s]\u001b[A\n",
"Iteration: 67% 3788/5625 [02:31<00:50, 36.25it/s]\u001b[A\n",
"Iteration: 67% 3792/5625 [02:31<00:50, 36.56it/s]\u001b[A\n",
"Iteration: 67% 3796/5625 [02:31<00:49, 36.61it/s]\u001b[A\n",
"Iteration: 68% 3800/5625 [02:31<00:49, 36.89it/s]\u001b[A\n",
"Iteration: 68% 3804/5625 [02:31<00:49, 36.87it/s]\u001b[A\n",
"Iteration: 68% 3808/5625 [02:32<00:49, 36.38it/s]\u001b[A\n",
"Iteration: 68% 3812/5625 [02:32<00:49, 36.94it/s]\u001b[A\n",
"Iteration: 68% 3816/5625 [02:32<00:48, 37.10it/s]\u001b[A\n",
"Iteration: 68% 3820/5625 [02:32<00:48, 37.10it/s]\u001b[A\n",
"Iteration: 68% 3824/5625 [02:32<00:49, 36.60it/s]\u001b[A\n",
"Iteration: 68% 3828/5625 [02:32<00:49, 36.11it/s]\u001b[A\n",
"Iteration: 68% 3832/5625 [02:32<00:49, 36.41it/s]\u001b[A\n",
"Iteration: 68% 3836/5625 [02:32<00:51, 34.76it/s]\u001b[A\n",
"Iteration: 68% 3840/5625 [02:32<00:52, 33.95it/s]\u001b[A\n",
"Iteration: 68% 3844/5625 [02:33<00:54, 32.89it/s]\u001b[A\n",
"Iteration: 68% 3848/5625 [02:33<00:54, 32.42it/s]\u001b[A\n",
"Iteration: 68% 3852/5625 [02:33<00:55, 32.21it/s]\u001b[A\n",
"Iteration: 69% 3856/5625 [02:33<00:55, 32.15it/s]\u001b[A\n",
"Iteration: 69% 3860/5625 [02:33<00:55, 31.95it/s]\u001b[A\n",
"Iteration: 69% 3864/5625 [02:33<00:52, 33.27it/s]\u001b[A\n",
"Iteration: 69% 3868/5625 [02:33<00:50, 34.55it/s]\u001b[A\n",
"Iteration: 69% 3872/5625 [02:33<00:49, 35.61it/s]\u001b[A\n",
"Iteration: 69% 3876/5625 [02:33<00:48, 36.12it/s]\u001b[A\n",
"Iteration: 69% 3880/5625 [02:34<00:48, 36.25it/s]\u001b[A\n",
"Iteration: 69% 3884/5625 [02:34<00:47, 36.33it/s]\u001b[A\n",
"Iteration: 69% 3888/5625 [02:34<00:47, 36.30it/s]\u001b[A\n",
"Iteration: 69% 3892/5625 [02:34<00:47, 36.40it/s]\u001b[A\n",
"Iteration: 69% 3896/5625 [02:34<00:48, 35.92it/s]\u001b[A\n",
"Iteration: 69% 3900/5625 [02:34<00:47, 36.38it/s]\u001b[A\n",
"Iteration: 69% 3904/5625 [02:34<00:46, 36.85it/s]\u001b[A\n",
"Iteration: 69% 3908/5625 [02:34<00:46, 36.87it/s]\u001b[A\n",
"Iteration: 70% 3912/5625 [02:34<00:46, 36.75it/s]\u001b[A\n",
"Iteration: 70% 3916/5625 [02:35<00:46, 36.68it/s]\u001b[A\n",
"Iteration: 70% 3920/5625 [02:35<00:46, 36.48it/s]\u001b[A\n",
"Iteration: 70% 3924/5625 [02:35<00:46, 36.45it/s]\u001b[A\n",
"Iteration: 70% 3928/5625 [02:35<00:45, 36.94it/s]\u001b[A\n",
"Iteration: 70% 3932/5625 [02:35<00:46, 36.38it/s]\u001b[A\n",
"Iteration: 70% 3936/5625 [02:35<00:46, 36.65it/s]\u001b[A\n",
"Iteration: 70% 3940/5625 [02:35<00:46, 36.63it/s]\u001b[A\n",
"Iteration: 70% 3944/5625 [02:35<00:45, 36.95it/s]\u001b[A\n",
"Iteration: 70% 3948/5625 [02:35<00:45, 36.76it/s]\u001b[A\n",
"Iteration: 70% 3952/5625 [02:36<00:45, 36.90it/s]\u001b[A\n",
"Iteration: 70% 3956/5625 [02:36<00:45, 37.01it/s]\u001b[A\n",
"Iteration: 70% 3960/5625 [02:36<00:45, 36.65it/s]\u001b[A\n",
"Iteration: 70% 3964/5625 [02:36<00:45, 36.50it/s]\u001b[A\n",
"Iteration: 71% 3968/5625 [02:36<00:46, 36.01it/s]\u001b[A\n",
"Iteration: 71% 3972/5625 [02:36<00:46, 35.78it/s]\u001b[A\n",
"Iteration: 71% 3976/5625 [02:36<00:45, 35.99it/s]\u001b[A\n",
"Iteration: 71% 3980/5625 [02:36<00:45, 36.53it/s]\u001b[A\n",
"Iteration: 71% 3984/5625 [02:36<00:45, 36.43it/s]\u001b[A\n",
"Iteration: 71% 3988/5625 [02:37<00:44, 36.43it/s]\u001b[A\n",
"Iteration: 71% 3992/5625 [02:37<00:44, 36.68it/s]\u001b[A\n",
"Iteration: 71% 3996/5625 [02:37<00:44, 36.37it/s]\u001b[A02/21/2020 12:36:13 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:36:15 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:36:15 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:36:15 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 13/625 [00:00<00:04, 123.32it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 27/625 [00:00<00:04, 126.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 39/625 [00:00<00:04, 123.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 52/625 [00:00<00:04, 125.48it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 10% 65/625 [00:00<00:04, 125.56it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 12% 78/625 [00:00<00:04, 123.39it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 91/625 [00:00<00:04, 125.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 104/625 [00:00<00:04, 126.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 117/625 [00:00<00:04, 126.53it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 131/625 [00:01<00:03, 128.51it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 144/625 [00:01<00:03, 128.16it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 157/625 [00:01<00:03, 128.19it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 27% 170/625 [00:01<00:03, 127.47it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 29% 183/625 [00:01<00:03, 126.55it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 31% 196/625 [00:01<00:03, 125.60it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 33% 209/625 [00:01<00:03, 124.14it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 222/625 [00:01<00:03, 124.53it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 235/625 [00:01<00:03, 123.76it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 249/625 [00:01<00:02, 125.72it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 262/625 [00:02<00:02, 126.60it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 275/625 [00:02<00:02, 126.58it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 46% 288/625 [00:02<00:02, 127.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 48% 302/625 [00:02<00:02, 129.10it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 50% 315/625 [00:02<00:02, 128.97it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 52% 328/625 [00:02<00:02, 128.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 341/625 [00:02<00:02, 126.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 355/625 [00:02<00:02, 127.66it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 59% 368/625 [00:02<00:02, 128.29it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 61% 382/625 [00:03<00:01, 129.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 63% 396/625 [00:03<00:01, 129.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 65% 409/625 [00:03<00:01, 128.63it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 422/625 [00:03<00:01, 123.15it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 435/625 [00:03<00:01, 124.58it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 72% 448/625 [00:03<00:01, 125.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 74% 461/625 [00:03<00:01, 125.18it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 76% 474/625 [00:03<00:01, 122.05it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 78% 488/625 [00:03<00:01, 124.87it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 80% 501/625 [00:03<00:00, 125.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 82% 514/625 [00:04<00:00, 126.23it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 84% 527/625 [00:04<00:00, 126.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 86% 540/625 [00:04<00:00, 127.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 88% 553/625 [00:04<00:00, 127.51it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 91% 566/625 [00:04<00:00, 127.43it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 93% 579/625 [00:04<00:00, 127.99it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 95% 592/625 [00:04<00:00, 127.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 97% 605/625 [00:04<00:00, 125.12it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 99% 619/625 [00:04<00:00, 127.98it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 126.72it/s]\u001b[A\u001b[A02/21/2020 12:36:20 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:36:20 - INFO - __main__ - perplexity = tensor(788.5505)\n",
"\n",
"Iteration: 71% 4000/5625 [02:43<14:00, 1.93it/s]\u001b[A\n",
"Iteration: 71% 4004/5625 [02:44<09:59, 2.70it/s]\u001b[A\n",
"Iteration: 71% 4008/5625 [02:44<07:12, 3.74it/s]\u001b[A\n",
"Iteration: 71% 4012/5625 [02:44<05:14, 5.13it/s]\u001b[A\n",
"Iteration: 71% 4016/5625 [02:44<03:52, 6.93it/s]\u001b[A\n",
"Iteration: 71% 4020/5625 [02:44<02:55, 9.15it/s]\u001b[A\n",
"Iteration: 72% 4024/5625 [02:44<02:15, 11.83it/s]\u001b[A\n",
"Iteration: 72% 4028/5625 [02:44<01:48, 14.72it/s]\u001b[A\n",
"Iteration: 72% 4032/5625 [02:44<01:30, 17.57it/s]\u001b[A\n",
"Iteration: 72% 4036/5625 [02:44<01:16, 20.66it/s]\u001b[A\n",
"Iteration: 72% 4040/5625 [02:45<01:07, 23.48it/s]\u001b[A\n",
"Iteration: 72% 4044/5625 [02:45<01:00, 26.09it/s]\u001b[A\n",
"Iteration: 72% 4048/5625 [02:45<00:54, 28.76it/s]\u001b[A\n",
"Iteration: 72% 4052/5625 [02:45<00:50, 30.85it/s]\u001b[A\n",
"Iteration: 72% 4056/5625 [02:45<00:48, 32.43it/s]\u001b[A\n",
"Iteration: 72% 4060/5625 [02:45<00:47, 33.00it/s]\u001b[A\n",
"Iteration: 72% 4064/5625 [02:45<00:46, 33.43it/s]\u001b[A\n",
"Iteration: 72% 4068/5625 [02:45<00:46, 33.50it/s]\u001b[A\n",
"Iteration: 72% 4072/5625 [02:45<00:44, 34.60it/s]\u001b[A\n",
"Iteration: 72% 4076/5625 [02:46<00:44, 35.11it/s]\u001b[A\n",
"Iteration: 73% 4080/5625 [02:46<00:42, 36.04it/s]\u001b[A\n",
"Iteration: 73% 4084/5625 [02:46<00:42, 36.22it/s]\u001b[A\n",
"Iteration: 73% 4088/5625 [02:46<00:42, 36.40it/s]\u001b[A\n",
"Iteration: 73% 4092/5625 [02:46<00:41, 36.89it/s]\u001b[A\n",
"Iteration: 73% 4096/5625 [02:46<00:40, 37.30it/s]\u001b[A\n",
"Iteration: 73% 4100/5625 [02:46<00:41, 37.13it/s]\u001b[A\n",
"Iteration: 73% 4104/5625 [02:46<00:42, 36.03it/s]\u001b[A\n",
"Iteration: 73% 4108/5625 [02:46<00:43, 35.08it/s]\u001b[A\n",
"Iteration: 73% 4112/5625 [02:47<00:42, 35.52it/s]\u001b[A\n",
"Iteration: 73% 4116/5625 [02:47<00:41, 36.12it/s]\u001b[A\n",
"Iteration: 73% 4120/5625 [02:47<00:41, 35.90it/s]\u001b[A\n",
"Iteration: 73% 4124/5625 [02:47<00:42, 35.68it/s]\u001b[A\n",
"Iteration: 73% 4128/5625 [02:47<00:41, 35.92it/s]\u001b[A\n",
"Iteration: 73% 4132/5625 [02:47<00:41, 36.34it/s]\u001b[A\n",
"Iteration: 74% 4136/5625 [02:47<00:40, 36.77it/s]\u001b[A\n",
"Iteration: 74% 4140/5625 [02:47<00:41, 36.10it/s]\u001b[A\n",
"Iteration: 74% 4144/5625 [02:47<00:41, 35.73it/s]\u001b[A\n",
"Iteration: 74% 4148/5625 [02:48<00:41, 35.50it/s]\u001b[A\n",
"Iteration: 74% 4152/5625 [02:48<00:40, 35.99it/s]\u001b[A\n",
"Iteration: 74% 4156/5625 [02:48<00:40, 36.07it/s]\u001b[A\n",
"Iteration: 74% 4160/5625 [02:48<00:40, 36.08it/s]\u001b[A\n",
"Iteration: 74% 4164/5625 [02:48<00:40, 36.38it/s]\u001b[A\n",
"Iteration: 74% 4168/5625 [02:48<00:40, 36.41it/s]\u001b[A\n",
"Iteration: 74% 4172/5625 [02:48<00:39, 36.68it/s]\u001b[A\n",
"Iteration: 74% 4176/5625 [02:48<00:39, 36.27it/s]\u001b[A\n",
"Iteration: 74% 4180/5625 [02:48<00:40, 36.06it/s]\u001b[A\n",
"Iteration: 74% 4184/5625 [02:49<00:40, 35.55it/s]\u001b[A\n",
"Iteration: 74% 4188/5625 [02:49<00:40, 35.59it/s]\u001b[A\n",
"Iteration: 75% 4192/5625 [02:49<00:40, 35.71it/s]\u001b[A\n",
"Iteration: 75% 4196/5625 [02:49<00:39, 35.81it/s]\u001b[A\n",
"Iteration: 75% 4200/5625 [02:49<00:39, 36.11it/s]\u001b[A\n",
"Iteration: 75% 4204/5625 [02:49<00:39, 35.96it/s]\u001b[A\n",
"Iteration: 75% 4208/5625 [02:49<00:39, 35.82it/s]\u001b[A\n",
"Iteration: 75% 4212/5625 [02:49<00:39, 36.03it/s]\u001b[A\n",
"Iteration: 75% 4216/5625 [02:49<00:39, 35.92it/s]\u001b[A\n",
"Iteration: 75% 4220/5625 [02:50<00:39, 35.53it/s]\u001b[A\n",
"Iteration: 75% 4224/5625 [02:50<00:39, 35.51it/s]\u001b[A\n",
"Iteration: 75% 4228/5625 [02:50<00:38, 35.98it/s]\u001b[A\n",
"Iteration: 75% 4232/5625 [02:50<00:38, 36.17it/s]\u001b[A\n",
"Iteration: 75% 4236/5625 [02:50<00:38, 35.67it/s]\u001b[A\n",
"Iteration: 75% 4240/5625 [02:50<00:38, 35.80it/s]\u001b[A\n",
"Iteration: 75% 4244/5625 [02:50<00:38, 35.89it/s]\u001b[A\n",
"Iteration: 76% 4248/5625 [02:50<00:39, 34.99it/s]\u001b[A\n",
"Iteration: 76% 4252/5625 [02:50<00:39, 34.83it/s]\u001b[A\n",
"Iteration: 76% 4256/5625 [02:51<00:39, 35.09it/s]\u001b[A\n",
"Iteration: 76% 4260/5625 [02:51<00:38, 35.53it/s]\u001b[A\n",
"Iteration: 76% 4264/5625 [02:51<00:38, 35.48it/s]\u001b[A\n",
"Iteration: 76% 4268/5625 [02:51<00:37, 35.86it/s]\u001b[A\n",
"Iteration: 76% 4272/5625 [02:51<00:37, 36.28it/s]\u001b[A\n",
"Iteration: 76% 4276/5625 [02:51<00:37, 36.43it/s]\u001b[A\n",
"Iteration: 76% 4280/5625 [02:51<00:36, 36.69it/s]\u001b[A\n",
"Iteration: 76% 4284/5625 [02:51<00:36, 37.20it/s]\u001b[A\n",
"Iteration: 76% 4288/5625 [02:51<00:36, 36.61it/s]\u001b[A\n",
"Iteration: 76% 4292/5625 [02:52<00:36, 36.70it/s]\u001b[A\n",
"Iteration: 76% 4296/5625 [02:52<00:36, 36.68it/s]\u001b[A\n",
"Iteration: 76% 4300/5625 [02:52<00:36, 36.27it/s]\u001b[A\n",
"Iteration: 77% 4304/5625 [02:52<00:36, 36.34it/s]\u001b[A\n",
"Iteration: 77% 4308/5625 [02:52<00:36, 36.12it/s]\u001b[A\n",
"Iteration: 77% 4312/5625 [02:52<00:36, 36.10it/s]\u001b[A\n",
"Iteration: 77% 4316/5625 [02:52<00:35, 36.44it/s]\u001b[A\n",
"Iteration: 77% 4320/5625 [02:52<00:35, 36.81it/s]\u001b[A\n",
"Iteration: 77% 4324/5625 [02:52<00:35, 36.17it/s]\u001b[A\n",
"Iteration: 77% 4328/5625 [02:53<00:36, 35.83it/s]\u001b[A\n",
"Iteration: 77% 4332/5625 [02:53<00:35, 35.99it/s]\u001b[A\n",
"Iteration: 77% 4336/5625 [02:53<00:36, 35.51it/s]\u001b[A\n",
"Iteration: 77% 4340/5625 [02:53<00:35, 35.72it/s]\u001b[A\n",
"Iteration: 77% 4344/5625 [02:53<00:35, 36.32it/s]\u001b[A\n",
"Iteration: 77% 4348/5625 [02:53<00:35, 36.41it/s]\u001b[A\n",
"Iteration: 77% 4352/5625 [02:53<00:34, 36.53it/s]\u001b[A\n",
"Iteration: 77% 4356/5625 [02:53<00:34, 36.69it/s]\u001b[A\n",
"Iteration: 78% 4360/5625 [02:53<00:34, 36.23it/s]\u001b[A\n",
"Iteration: 78% 4364/5625 [02:53<00:34, 36.86it/s]\u001b[A\n",
"Iteration: 78% 4368/5625 [02:54<00:34, 36.73it/s]\u001b[A\n",
"Iteration: 78% 4372/5625 [02:54<00:34, 36.43it/s]\u001b[A\n",
"Iteration: 78% 4376/5625 [02:54<00:34, 36.50it/s]\u001b[A\n",
"Iteration: 78% 4380/5625 [02:54<00:33, 36.73it/s]\u001b[A\n",
"Iteration: 78% 4384/5625 [02:54<00:33, 37.06it/s]\u001b[A\n",
"Iteration: 78% 4388/5625 [02:54<00:33, 36.49it/s]\u001b[A\n",
"Iteration: 78% 4392/5625 [02:54<00:33, 36.28it/s]\u001b[A\n",
"Iteration: 78% 4396/5625 [02:54<00:33, 36.32it/s]\u001b[A\n",
"Iteration: 78% 4400/5625 [02:54<00:34, 35.24it/s]\u001b[A\n",
"Iteration: 78% 4404/5625 [02:55<00:34, 35.53it/s]\u001b[A\n",
"Iteration: 78% 4408/5625 [02:55<00:33, 36.39it/s]\u001b[A\n",
"Iteration: 78% 4412/5625 [02:55<00:33, 36.39it/s]\u001b[A\n",
"Iteration: 79% 4416/5625 [02:55<00:33, 36.37it/s]\u001b[A\n",
"Iteration: 79% 4420/5625 [02:55<00:33, 36.27it/s]\u001b[A\n",
"Iteration: 79% 4424/5625 [02:55<00:32, 36.47it/s]\u001b[A\n",
"Iteration: 79% 4428/5625 [02:55<00:32, 36.51it/s]\u001b[A\n",
"Iteration: 79% 4432/5625 [02:55<00:32, 36.72it/s]\u001b[A\n",
"Iteration: 79% 4436/5625 [02:55<00:32, 36.32it/s]\u001b[A\n",
"Iteration: 79% 4440/5625 [02:56<00:32, 36.10it/s]\u001b[A\n",
"Iteration: 79% 4444/5625 [02:56<00:32, 36.46it/s]\u001b[A\n",
"Iteration: 79% 4448/5625 [02:56<00:32, 36.38it/s]\u001b[A\n",
"Iteration: 79% 4452/5625 [02:56<00:32, 36.02it/s]\u001b[A\n",
"Iteration: 79% 4456/5625 [02:56<00:32, 35.85it/s]\u001b[A\n",
"Iteration: 79% 4460/5625 [02:56<00:32, 35.41it/s]\u001b[A\n",
"Iteration: 79% 4464/5625 [02:56<00:32, 35.97it/s]\u001b[A\n",
"Iteration: 79% 4468/5625 [02:56<00:31, 36.19it/s]\u001b[A\n",
"Iteration: 80% 4472/5625 [02:56<00:32, 35.84it/s]\u001b[A\n",
"Iteration: 80% 4476/5625 [02:57<00:31, 36.17it/s]\u001b[A\n",
"Iteration: 80% 4480/5625 [02:57<00:31, 36.58it/s]\u001b[A\n",
"Iteration: 80% 4484/5625 [02:57<00:31, 36.57it/s]\u001b[A\n",
"Iteration: 80% 4488/5625 [02:57<00:31, 35.86it/s]\u001b[A\n",
"Iteration: 80% 4492/5625 [02:57<00:31, 36.23it/s]\u001b[A\n",
"Iteration: 80% 4496/5625 [02:57<00:31, 36.31it/s]\u001b[A02/21/2020 12:36:34 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:36:35 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:36:35 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:36:35 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 13/625 [00:00<00:04, 128.13it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 27/625 [00:00<00:04, 129.27it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 40/625 [00:00<00:04, 128.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 9% 54/625 [00:00<00:04, 129.83it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 67/625 [00:00<00:04, 128.28it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 80/625 [00:00<00:04, 127.27it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 93/625 [00:00<00:04, 127.49it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 106/625 [00:00<00:04, 127.90it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 119/625 [00:00<00:04, 125.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 133/625 [00:01<00:03, 128.27it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 146/625 [00:01<00:03, 127.28it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 159/625 [00:01<00:03, 124.04it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 172/625 [00:01<00:03, 122.89it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 186/625 [00:01<00:03, 124.98it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 199/625 [00:01<00:03, 123.89it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 213/625 [00:01<00:03, 125.75it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 226/625 [00:01<00:03, 126.30it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 239/625 [00:01<00:03, 122.76it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 252/625 [00:01<00:02, 124.62it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 265/625 [00:02<00:02, 125.52it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 278/625 [00:02<00:02, 124.48it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 47% 291/625 [00:02<00:02, 125.38it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 49% 305/625 [00:02<00:02, 127.07it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 319/625 [00:02<00:02, 128.05it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 332/625 [00:02<00:02, 127.43it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 346/625 [00:02<00:02, 128.24it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 58% 360/625 [00:02<00:02, 129.60it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 60% 374/625 [00:02<00:01, 130.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 62% 388/625 [00:03<00:01, 131.06it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 402/625 [00:03<00:01, 131.83it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 67% 416/625 [00:03<00:01, 130.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 69% 430/625 [00:03<00:01, 130.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 71% 444/625 [00:03<00:01, 130.18it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 73% 458/625 [00:03<00:01, 128.20it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 471/625 [00:03<00:01, 127.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 78% 485/625 [00:03<00:01, 128.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 80% 498/625 [00:03<00:00, 127.16it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 82% 511/625 [00:04<00:00, 127.25it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 84% 524/625 [00:04<00:00, 127.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 86% 538/625 [00:04<00:00, 128.58it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 88% 551/625 [00:04<00:00, 127.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 90% 564/625 [00:04<00:00, 127.34it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 92% 577/625 [00:04<00:00, 127.50it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 94% 590/625 [00:04<00:00, 125.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 603/625 [00:04<00:00, 126.21it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 99% 617/625 [00:04<00:00, 128.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 127.62it/s]\u001b[A\u001b[A02/21/2020 12:36:40 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:36:40 - INFO - __main__ - perplexity = tensor(793.4565)\n",
"\n",
"Iteration: 80% 4500/5625 [03:04<09:51, 1.90it/s]\u001b[A\n",
"Iteration: 80% 4504/5625 [03:04<07:01, 2.66it/s]\u001b[A\n",
"Iteration: 80% 4508/5625 [03:04<05:03, 3.67it/s]\u001b[A\n",
"Iteration: 80% 4512/5625 [03:04<03:40, 5.04it/s]\u001b[A\n",
"Iteration: 80% 4516/5625 [03:04<02:43, 6.79it/s]\u001b[A\n",
"Iteration: 80% 4520/5625 [03:04<02:02, 8.99it/s]\u001b[A\n",
"Iteration: 80% 4524/5625 [03:05<01:34, 11.61it/s]\u001b[A\n",
"Iteration: 80% 4528/5625 [03:05<01:15, 14.54it/s]\u001b[A\n",
"Iteration: 81% 4532/5625 [03:05<01:01, 17.75it/s]\u001b[A\n",
"Iteration: 81% 4536/5625 [03:05<00:51, 21.03it/s]\u001b[A\n",
"Iteration: 81% 4540/5625 [03:05<00:44, 24.19it/s]\u001b[A\n",
"Iteration: 81% 4544/5625 [03:05<00:40, 27.02it/s]\u001b[A\n",
"Iteration: 81% 4548/5625 [03:05<00:36, 29.19it/s]\u001b[A\n",
"Iteration: 81% 4552/5625 [03:05<00:34, 31.05it/s]\u001b[A\n",
"Iteration: 81% 4556/5625 [03:05<00:32, 32.82it/s]\u001b[A\n",
"Iteration: 81% 4560/5625 [03:06<00:31, 33.61it/s]\u001b[A\n",
"Iteration: 81% 4564/5625 [03:06<00:31, 34.01it/s]\u001b[A\n",
"Iteration: 81% 4568/5625 [03:06<00:30, 35.21it/s]\u001b[A\n",
"Iteration: 81% 4572/5625 [03:06<00:29, 35.67it/s]\u001b[A\n",
"Iteration: 81% 4576/5625 [03:06<00:28, 36.19it/s]\u001b[A\n",
"Iteration: 81% 4580/5625 [03:06<00:28, 36.74it/s]\u001b[A\n",
"Iteration: 81% 4584/5625 [03:06<00:28, 36.59it/s]\u001b[A\n",
"Iteration: 82% 4588/5625 [03:06<00:28, 36.60it/s]\u001b[A\n",
"Iteration: 82% 4592/5625 [03:06<00:28, 36.42it/s]\u001b[A\n",
"Iteration: 82% 4596/5625 [03:07<00:28, 36.14it/s]\u001b[A\n",
"Iteration: 82% 4600/5625 [03:07<00:28, 35.98it/s]\u001b[A\n",
"Iteration: 82% 4604/5625 [03:07<00:28, 36.44it/s]\u001b[A\n",
"Iteration: 82% 4608/5625 [03:07<00:28, 35.88it/s]\u001b[A\n",
"Iteration: 82% 4612/5625 [03:07<00:28, 35.27it/s]\u001b[A\n",
"Iteration: 82% 4616/5625 [03:07<00:28, 35.65it/s]\u001b[A\n",
"Iteration: 82% 4620/5625 [03:07<00:27, 36.04it/s]\u001b[A\n",
"Iteration: 82% 4624/5625 [03:07<00:27, 36.39it/s]\u001b[A\n",
"Iteration: 82% 4628/5625 [03:07<00:27, 36.15it/s]\u001b[A\n",
"Iteration: 82% 4632/5625 [03:08<00:27, 35.73it/s]\u001b[A\n",
"Iteration: 82% 4636/5625 [03:08<00:27, 35.82it/s]\u001b[A\n",
"Iteration: 82% 4640/5625 [03:08<00:27, 35.82it/s]\u001b[A\n",
"Iteration: 83% 4644/5625 [03:08<00:27, 35.59it/s]\u001b[A\n",
"Iteration: 83% 4648/5625 [03:08<00:27, 36.18it/s]\u001b[A\n",
"Iteration: 83% 4652/5625 [03:08<00:26, 36.45it/s]\u001b[A\n",
"Iteration: 83% 4656/5625 [03:08<00:26, 36.58it/s]\u001b[A\n",
"Iteration: 83% 4660/5625 [03:08<00:26, 36.77it/s]\u001b[A\n",
"Iteration: 83% 4664/5625 [03:08<00:26, 36.57it/s]\u001b[A\n",
"Iteration: 83% 4668/5625 [03:09<00:26, 35.68it/s]\u001b[A\n",
"Iteration: 83% 4672/5625 [03:09<00:26, 36.02it/s]\u001b[A\n",
"Iteration: 83% 4676/5625 [03:09<00:26, 35.65it/s]\u001b[A\n",
"Iteration: 83% 4680/5625 [03:09<00:25, 36.38it/s]\u001b[A\n",
"Iteration: 83% 4684/5625 [03:09<00:25, 36.27it/s]\u001b[A\n",
"Iteration: 83% 4688/5625 [03:09<00:25, 36.57it/s]\u001b[A\n",
"Iteration: 83% 4692/5625 [03:09<00:25, 36.79it/s]\u001b[A\n",
"Iteration: 83% 4696/5625 [03:09<00:25, 37.06it/s]\u001b[A\n",
"Iteration: 84% 4700/5625 [03:09<00:24, 37.01it/s]\u001b[A\n",
"Iteration: 84% 4704/5625 [03:10<00:25, 36.80it/s]\u001b[A\n",
"Iteration: 84% 4708/5625 [03:10<00:25, 36.32it/s]\u001b[A\n",
"Iteration: 84% 4712/5625 [03:10<00:25, 35.78it/s]\u001b[A\n",
"Iteration: 84% 4716/5625 [03:10<00:25, 36.35it/s]\u001b[A\n",
"Iteration: 84% 4720/5625 [03:10<00:25, 36.02it/s]\u001b[A\n",
"Iteration: 84% 4724/5625 [03:10<00:24, 36.41it/s]\u001b[A\n",
"Iteration: 84% 4728/5625 [03:10<00:24, 36.03it/s]\u001b[A\n",
"Iteration: 84% 4732/5625 [03:10<00:24, 36.46it/s]\u001b[A\n",
"Iteration: 84% 4736/5625 [03:10<00:23, 37.10it/s]\u001b[A\n",
"Iteration: 84% 4740/5625 [03:10<00:24, 36.53it/s]\u001b[A\n",
"Iteration: 84% 4744/5625 [03:11<00:24, 36.52it/s]\u001b[A\n",
"Iteration: 84% 4748/5625 [03:11<00:24, 36.41it/s]\u001b[A\n",
"Iteration: 84% 4752/5625 [03:11<00:24, 35.94it/s]\u001b[A\n",
"Iteration: 85% 4756/5625 [03:11<00:23, 36.70it/s]\u001b[A\n",
"Iteration: 85% 4760/5625 [03:11<00:23, 36.73it/s]\u001b[A\n",
"Iteration: 85% 4764/5625 [03:11<00:23, 36.45it/s]\u001b[A\n",
"Iteration: 85% 4768/5625 [03:11<00:23, 36.82it/s]\u001b[A\n",
"Iteration: 85% 4772/5625 [03:11<00:22, 37.20it/s]\u001b[A\n",
"Iteration: 85% 4776/5625 [03:11<00:23, 36.37it/s]\u001b[A\n",
"Iteration: 85% 4780/5625 [03:12<00:23, 35.62it/s]\u001b[A\n",
"Iteration: 85% 4784/5625 [03:12<00:23, 35.19it/s]\u001b[A\n",
"Iteration: 85% 4788/5625 [03:12<00:24, 33.91it/s]\u001b[A\n",
"Iteration: 85% 4792/5625 [03:12<00:24, 34.56it/s]\u001b[A\n",
"Iteration: 85% 4796/5625 [03:12<00:23, 35.60it/s]\u001b[A\n",
"Iteration: 85% 4800/5625 [03:12<00:23, 35.14it/s]\u001b[A\n",
"Iteration: 85% 4804/5625 [03:12<00:23, 35.46it/s]\u001b[A\n",
"Iteration: 85% 4808/5625 [03:12<00:22, 36.29it/s]\u001b[A\n",
"Iteration: 86% 4812/5625 [03:13<00:22, 36.15it/s]\u001b[A\n",
"Iteration: 86% 4816/5625 [03:13<00:22, 36.48it/s]\u001b[A\n",
"Iteration: 86% 4820/5625 [03:13<00:21, 36.80it/s]\u001b[A\n",
"Iteration: 86% 4824/5625 [03:13<00:22, 36.06it/s]\u001b[A\n",
"Iteration: 86% 4828/5625 [03:13<00:21, 36.23it/s]\u001b[A\n",
"Iteration: 86% 4832/5625 [03:13<00:21, 36.60it/s]\u001b[A\n",
"Iteration: 86% 4836/5625 [03:13<00:21, 36.94it/s]\u001b[A\n",
"Iteration: 86% 4840/5625 [03:13<00:21, 37.14it/s]\u001b[A\n",
"Iteration: 86% 4844/5625 [03:13<00:21, 36.83it/s]\u001b[A\n",
"Iteration: 86% 4848/5625 [03:13<00:21, 36.92it/s]\u001b[A\n",
"Iteration: 86% 4852/5625 [03:14<00:21, 36.74it/s]\u001b[A\n",
"Iteration: 86% 4856/5625 [03:14<00:20, 36.79it/s]\u001b[A\n",
"Iteration: 86% 4860/5625 [03:14<00:21, 36.40it/s]\u001b[A\n",
"Iteration: 86% 4864/5625 [03:14<00:20, 36.57it/s]\u001b[A\n",
"Iteration: 87% 4868/5625 [03:14<00:20, 36.82it/s]\u001b[A\n",
"Iteration: 87% 4872/5625 [03:14<00:20, 36.05it/s]\u001b[A\n",
"Iteration: 87% 4876/5625 [03:14<00:20, 36.57it/s]\u001b[A\n",
"Iteration: 87% 4880/5625 [03:14<00:20, 36.84it/s]\u001b[A\n",
"Iteration: 87% 4884/5625 [03:14<00:20, 36.39it/s]\u001b[A\n",
"Iteration: 87% 4888/5625 [03:15<00:20, 36.41it/s]\u001b[A\n",
"Iteration: 87% 4892/5625 [03:15<00:20, 36.45it/s]\u001b[A\n",
"Iteration: 87% 4896/5625 [03:15<00:19, 36.52it/s]\u001b[A\n",
"Iteration: 87% 4900/5625 [03:15<00:20, 35.85it/s]\u001b[A\n",
"Iteration: 87% 4904/5625 [03:15<00:20, 35.92it/s]\u001b[A\n",
"Iteration: 87% 4908/5625 [03:15<00:20, 35.77it/s]\u001b[A\n",
"Iteration: 87% 4912/5625 [03:15<00:19, 36.02it/s]\u001b[A\n",
"Iteration: 87% 4916/5625 [03:15<00:19, 35.57it/s]\u001b[A\n",
"Iteration: 87% 4920/5625 [03:15<00:19, 35.78it/s]\u001b[A\n",
"Iteration: 88% 4924/5625 [03:16<00:19, 36.03it/s]\u001b[A\n",
"Iteration: 88% 4928/5625 [03:16<00:19, 36.31it/s]\u001b[A\n",
"Iteration: 88% 4932/5625 [03:16<00:18, 36.64it/s]\u001b[A\n",
"Iteration: 88% 4936/5625 [03:16<00:19, 35.90it/s]\u001b[A\n",
"Iteration: 88% 4940/5625 [03:16<00:18, 36.44it/s]\u001b[A\n",
"Iteration: 88% 4944/5625 [03:16<00:18, 36.55it/s]\u001b[A\n",
"Iteration: 88% 4948/5625 [03:16<00:18, 36.92it/s]\u001b[A\n",
"Iteration: 88% 4952/5625 [03:16<00:18, 36.12it/s]\u001b[A\n",
"Iteration: 88% 4956/5625 [03:16<00:18, 36.57it/s]\u001b[A\n",
"Iteration: 88% 4960/5625 [03:17<00:18, 36.81it/s]\u001b[A\n",
"Iteration: 88% 4964/5625 [03:17<00:17, 37.24it/s]\u001b[A\n",
"Iteration: 88% 4968/5625 [03:17<00:17, 37.28it/s]\u001b[A\n",
"Iteration: 88% 4972/5625 [03:17<00:18, 35.80it/s]\u001b[A\n",
"Iteration: 88% 4976/5625 [03:17<00:18, 34.63it/s]\u001b[A\n",
"Iteration: 89% 4980/5625 [03:17<00:19, 33.08it/s]\u001b[A\n",
"Iteration: 89% 4984/5625 [03:17<00:19, 32.50it/s]\u001b[A\n",
"Iteration: 89% 4988/5625 [03:17<00:19, 32.08it/s]\u001b[A\n",
"Iteration: 89% 4992/5625 [03:18<00:19, 32.03it/s]\u001b[A\n",
"Iteration: 89% 4996/5625 [03:18<00:19, 31.95it/s]\u001b[A02/21/2020 12:36:54 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:36:56 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:36:56 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:36:56 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 14/625 [00:00<00:04, 129.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 28/625 [00:00<00:04, 130.52it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 7% 41/625 [00:00<00:04, 129.88it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 53/625 [00:00<00:04, 123.44it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 66/625 [00:00<00:04, 123.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 79/625 [00:00<00:04, 124.19it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 92/625 [00:00<00:04, 125.59it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 105/625 [00:00<00:04, 125.50it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 118/625 [00:00<00:04, 124.82it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 132/625 [00:01<00:03, 127.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 145/625 [00:01<00:03, 127.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 158/625 [00:01<00:03, 127.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 27% 171/625 [00:01<00:03, 127.28it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 29% 184/625 [00:01<00:03, 126.26it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 197/625 [00:01<00:03, 127.19it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 210/625 [00:01<00:03, 127.49it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 223/625 [00:01<00:03, 126.93it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 236/625 [00:01<00:03, 127.05it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 249/625 [00:01<00:02, 127.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 262/625 [00:02<00:02, 127.04it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 275/625 [00:02<00:02, 127.05it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 46% 289/625 [00:02<00:02, 127.99it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 48% 303/625 [00:02<00:02, 128.94it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 316/625 [00:02<00:02, 127.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 330/625 [00:02<00:02, 128.45it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 344/625 [00:02<00:02, 129.39it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 357/625 [00:02<00:02, 124.19it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 59% 370/625 [00:02<00:02, 124.22it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 61% 383/625 [00:03<00:01, 125.69it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 397/625 [00:03<00:01, 127.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 411/625 [00:03<00:01, 128.73it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 424/625 [00:03<00:01, 127.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 437/625 [00:03<00:01, 127.66it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 72% 450/625 [00:03<00:01, 126.93it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 74% 463/625 [00:03<00:01, 126.32it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 76% 476/625 [00:03<00:01, 126.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 78% 489/625 [00:03<00:01, 126.76it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 80% 502/625 [00:03<00:00, 126.41it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 82% 515/625 [00:04<00:00, 127.16it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 84% 528/625 [00:04<00:00, 127.05it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 542/625 [00:04<00:00, 128.16it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 89% 556/625 [00:04<00:00, 129.20it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 91% 569/625 [00:04<00:00, 127.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 93% 582/625 [00:04<00:00, 127.63it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 95% 595/625 [00:04<00:00, 128.29it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 97% 608/625 [00:04<00:00, 128.48it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 622/625 [00:04<00:00, 129.08it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 127.23it/s]\u001b[A\u001b[A02/21/2020 12:37:01 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:37:01 - INFO - __main__ - perplexity = tensor(797.2593)\n",
"\n",
"Iteration: 89% 5000/5625 [03:24<05:31, 1.88it/s]\u001b[A\n",
"Iteration: 89% 5004/5625 [03:25<03:55, 2.63it/s]\u001b[A\n",
"Iteration: 89% 5008/5625 [03:25<02:49, 3.65it/s]\u001b[A\n",
"Iteration: 89% 5012/5625 [03:25<02:02, 5.01it/s]\u001b[A\n",
"Iteration: 89% 5016/5625 [03:25<01:30, 6.76it/s]\u001b[A\n",
"Iteration: 89% 5020/5625 [03:25<01:07, 8.92it/s]\u001b[A\n",
"Iteration: 89% 5024/5625 [03:25<00:52, 11.52it/s]\u001b[A\n",
"Iteration: 89% 5028/5625 [03:25<00:41, 14.54it/s]\u001b[A\n",
"Iteration: 89% 5032/5625 [03:25<00:33, 17.82it/s]\u001b[A\n",
"Iteration: 90% 5036/5625 [03:25<00:28, 20.61it/s]\u001b[A\n",
"Iteration: 90% 5040/5625 [03:26<00:24, 23.76it/s]\u001b[A\n",
"Iteration: 90% 5044/5625 [03:26<00:21, 26.53it/s]\u001b[A\n",
"Iteration: 90% 5048/5625 [03:26<00:19, 28.93it/s]\u001b[A\n",
"Iteration: 90% 5052/5625 [03:26<00:18, 31.12it/s]\u001b[A\n",
"Iteration: 90% 5056/5625 [03:26<00:17, 32.71it/s]\u001b[A\n",
"Iteration: 90% 5060/5625 [03:26<00:17, 32.69it/s]\u001b[A\n",
"Iteration: 90% 5064/5625 [03:26<00:16, 33.86it/s]\u001b[A\n",
"Iteration: 90% 5068/5625 [03:26<00:15, 35.05it/s]\u001b[A\n",
"Iteration: 90% 5072/5625 [03:26<00:15, 35.59it/s]\u001b[A\n",
"Iteration: 90% 5076/5625 [03:27<00:15, 35.72it/s]\u001b[A\n",
"Iteration: 90% 5080/5625 [03:27<00:15, 35.63it/s]\u001b[A\n",
"Iteration: 90% 5084/5625 [03:27<00:15, 35.42it/s]\u001b[A\n",
"Iteration: 90% 5088/5625 [03:27<00:15, 35.67it/s]\u001b[A\n",
"Iteration: 91% 5092/5625 [03:27<00:14, 36.43it/s]\u001b[A\n",
"Iteration: 91% 5096/5625 [03:27<00:14, 36.10it/s]\u001b[A\n",
"Iteration: 91% 5100/5625 [03:27<00:14, 36.21it/s]\u001b[A\n",
"Iteration: 91% 5104/5625 [03:27<00:14, 36.51it/s]\u001b[A\n",
"Iteration: 91% 5108/5625 [03:27<00:13, 36.96it/s]\u001b[A\n",
"Iteration: 91% 5112/5625 [03:28<00:13, 36.68it/s]\u001b[A\n",
"Iteration: 91% 5116/5625 [03:28<00:13, 36.47it/s]\u001b[A\n",
"Iteration: 91% 5120/5625 [03:28<00:13, 36.43it/s]\u001b[A\n",
"Iteration: 91% 5124/5625 [03:28<00:13, 36.36it/s]\u001b[A\n",
"Iteration: 91% 5128/5625 [03:28<00:13, 36.46it/s]\u001b[A\n",
"Iteration: 91% 5132/5625 [03:28<00:13, 36.41it/s]\u001b[A\n",
"Iteration: 91% 5136/5625 [03:28<00:13, 36.42it/s]\u001b[A\n",
"Iteration: 91% 5140/5625 [03:28<00:13, 36.41it/s]\u001b[A\n",
"Iteration: 91% 5144/5625 [03:28<00:13, 36.51it/s]\u001b[A\n",
"Iteration: 92% 5148/5625 [03:29<00:13, 35.98it/s]\u001b[A\n",
"Iteration: 92% 5152/5625 [03:29<00:13, 34.39it/s]\u001b[A\n",
"Iteration: 92% 5156/5625 [03:29<00:13, 33.59it/s]\u001b[A\n",
"Iteration: 92% 5160/5625 [03:29<00:14, 32.92it/s]\u001b[A\n",
"Iteration: 92% 5164/5625 [03:29<00:14, 32.67it/s]\u001b[A\n",
"Iteration: 92% 5168/5625 [03:29<00:14, 32.08it/s]\u001b[A\n",
"Iteration: 92% 5172/5625 [03:29<00:14, 32.23it/s]\u001b[A\n",
"Iteration: 92% 5176/5625 [03:29<00:13, 32.81it/s]\u001b[A\n",
"Iteration: 92% 5180/5625 [03:30<00:13, 33.89it/s]\u001b[A\n",
"Iteration: 92% 5184/5625 [03:30<00:12, 34.62it/s]\u001b[A\n",
"Iteration: 92% 5188/5625 [03:30<00:12, 34.74it/s]\u001b[A\n",
"Iteration: 92% 5192/5625 [03:30<00:12, 34.50it/s]\u001b[A\n",
"Iteration: 92% 5196/5625 [03:30<00:12, 34.84it/s]\u001b[A\n",
"Iteration: 92% 5200/5625 [03:30<00:11, 35.55it/s]\u001b[A\n",
"Iteration: 93% 5204/5625 [03:30<00:11, 35.43it/s]\u001b[A\n",
"Iteration: 93% 5208/5625 [03:30<00:11, 36.15it/s]\u001b[A\n",
"Iteration: 93% 5212/5625 [03:30<00:11, 34.71it/s]\u001b[A\n",
"Iteration: 93% 5216/5625 [03:31<00:11, 35.32it/s]\u001b[A\n",
"Iteration: 93% 5220/5625 [03:31<00:11, 35.82it/s]\u001b[A\n",
"Iteration: 93% 5224/5625 [03:31<00:11, 35.74it/s]\u001b[A\n",
"Iteration: 93% 5228/5625 [03:31<00:11, 36.05it/s]\u001b[A\n",
"Iteration: 93% 5232/5625 [03:31<00:10, 36.40it/s]\u001b[A\n",
"Iteration: 93% 5236/5625 [03:31<00:10, 36.21it/s]\u001b[A\n",
"Iteration: 93% 5240/5625 [03:31<00:10, 35.99it/s]\u001b[A\n",
"Iteration: 93% 5244/5625 [03:31<00:10, 36.35it/s]\u001b[A\n",
"Iteration: 93% 5248/5625 [03:31<00:10, 36.27it/s]\u001b[A\n",
"Iteration: 93% 5252/5625 [03:32<00:10, 36.56it/s]\u001b[A\n",
"Iteration: 93% 5256/5625 [03:32<00:10, 36.68it/s]\u001b[A\n",
"Iteration: 94% 5260/5625 [03:32<00:09, 36.65it/s]\u001b[A\n",
"Iteration: 94% 5264/5625 [03:32<00:09, 36.56it/s]\u001b[A\n",
"Iteration: 94% 5268/5625 [03:32<00:09, 36.46it/s]\u001b[A\n",
"Iteration: 94% 5272/5625 [03:32<00:09, 36.88it/s]\u001b[A\n",
"Iteration: 94% 5276/5625 [03:32<00:09, 36.65it/s]\u001b[A\n",
"Iteration: 94% 5280/5625 [03:32<00:09, 34.88it/s]\u001b[A\n",
"Iteration: 94% 5284/5625 [03:32<00:10, 33.47it/s]\u001b[A\n",
"Iteration: 94% 5288/5625 [03:33<00:10, 32.82it/s]\u001b[A\n",
"Iteration: 94% 5292/5625 [03:33<00:10, 32.26it/s]\u001b[A\n",
"Iteration: 94% 5296/5625 [03:33<00:10, 32.06it/s]\u001b[A\n",
"Iteration: 94% 5300/5625 [03:33<00:10, 31.88it/s]\u001b[A\n",
"Iteration: 94% 5304/5625 [03:33<00:10, 31.53it/s]\u001b[A\n",
"Iteration: 94% 5308/5625 [03:33<00:09, 32.70it/s]\u001b[A\n",
"Iteration: 94% 5312/5625 [03:33<00:09, 33.17it/s]\u001b[A\n",
"Iteration: 95% 5316/5625 [03:33<00:09, 33.66it/s]\u001b[A\n",
"Iteration: 95% 5320/5625 [03:34<00:08, 34.93it/s]\u001b[A\n",
"Iteration: 95% 5324/5625 [03:34<00:08, 35.23it/s]\u001b[A\n",
"Iteration: 95% 5328/5625 [03:34<00:08, 35.85it/s]\u001b[A\n",
"Iteration: 95% 5332/5625 [03:34<00:08, 36.09it/s]\u001b[A\n",
"Iteration: 95% 5336/5625 [03:34<00:07, 36.52it/s]\u001b[A\n",
"Iteration: 95% 5340/5625 [03:34<00:07, 36.54it/s]\u001b[A\n",
"Iteration: 95% 5344/5625 [03:34<00:07, 36.50it/s]\u001b[A\n",
"Iteration: 95% 5348/5625 [03:34<00:07, 35.90it/s]\u001b[A\n",
"Iteration: 95% 5352/5625 [03:34<00:07, 36.48it/s]\u001b[A\n",
"Iteration: 95% 5356/5625 [03:34<00:07, 36.75it/s]\u001b[A\n",
"Iteration: 95% 5360/5625 [03:35<00:07, 37.14it/s]\u001b[A\n",
"Iteration: 95% 5364/5625 [03:35<00:06, 37.32it/s]\u001b[A\n",
"Iteration: 95% 5368/5625 [03:35<00:06, 37.55it/s]\u001b[A\n",
"Iteration: 96% 5372/5625 [03:35<00:06, 37.56it/s]\u001b[A\n",
"Iteration: 96% 5376/5625 [03:35<00:06, 36.80it/s]\u001b[A\n",
"Iteration: 96% 5380/5625 [03:35<00:06, 36.79it/s]\u001b[A\n",
"Iteration: 96% 5384/5625 [03:35<00:06, 36.29it/s]\u001b[A\n",
"Iteration: 96% 5388/5625 [03:35<00:06, 36.57it/s]\u001b[A\n",
"Iteration: 96% 5392/5625 [03:35<00:06, 36.62it/s]\u001b[A\n",
"Iteration: 96% 5396/5625 [03:36<00:06, 36.85it/s]\u001b[A\n",
"Iteration: 96% 5400/5625 [03:36<00:06, 36.78it/s]\u001b[A\n",
"Iteration: 96% 5404/5625 [03:36<00:06, 36.60it/s]\u001b[A\n",
"Iteration: 96% 5408/5625 [03:36<00:05, 37.14it/s]\u001b[A\n",
"Iteration: 96% 5412/5625 [03:36<00:05, 37.20it/s]\u001b[A\n",
"Iteration: 96% 5416/5625 [03:36<00:05, 37.06it/s]\u001b[A\n",
"Iteration: 96% 5420/5625 [03:36<00:05, 36.73it/s]\u001b[A\n",
"Iteration: 96% 5424/5625 [03:36<00:05, 36.92it/s]\u001b[A\n",
"Iteration: 96% 5428/5625 [03:36<00:05, 36.60it/s]\u001b[A\n",
"Iteration: 97% 5432/5625 [03:37<00:05, 36.80it/s]\u001b[A\n",
"Iteration: 97% 5436/5625 [03:37<00:05, 37.01it/s]\u001b[A\n",
"Iteration: 97% 5440/5625 [03:37<00:05, 36.82it/s]\u001b[A\n",
"Iteration: 97% 5444/5625 [03:37<00:04, 36.73it/s]\u001b[A\n",
"Iteration: 97% 5448/5625 [03:37<00:04, 35.65it/s]\u001b[A\n",
"Iteration: 97% 5452/5625 [03:37<00:04, 35.56it/s]\u001b[A\n",
"Iteration: 97% 5456/5625 [03:37<00:04, 35.80it/s]\u001b[A\n",
"Iteration: 97% 5460/5625 [03:37<00:04, 35.45it/s]\u001b[A\n",
"Iteration: 97% 5464/5625 [03:37<00:04, 36.23it/s]\u001b[A\n",
"Iteration: 97% 5468/5625 [03:38<00:04, 36.80it/s]\u001b[A\n",
"Iteration: 97% 5472/5625 [03:38<00:04, 36.44it/s]\u001b[A\n",
"Iteration: 97% 5476/5625 [03:38<00:04, 36.77it/s]\u001b[A\n",
"Iteration: 97% 5480/5625 [03:38<00:03, 36.95it/s]\u001b[A\n",
"Iteration: 97% 5484/5625 [03:38<00:03, 36.95it/s]\u001b[A\n",
"Iteration: 98% 5488/5625 [03:38<00:03, 37.25it/s]\u001b[A\n",
"Iteration: 98% 5492/5625 [03:38<00:03, 37.11it/s]\u001b[A\n",
"Iteration: 98% 5496/5625 [03:38<00:03, 36.16it/s]\u001b[A02/21/2020 12:37:15 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:37:16 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:37:16 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:37:16 - INFO - __main__ - Batch size = 32\n",
"\n",
"\n",
"Evaluating: 0% 0/625 [00:00<?, ?it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 2% 13/625 [00:00<00:04, 126.30it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 4% 26/625 [00:00<00:04, 125.03it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 6% 38/625 [00:00<00:04, 122.00it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 8% 52/625 [00:00<00:04, 125.12it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 11% 66/625 [00:00<00:04, 127.07it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 13% 80/625 [00:00<00:04, 128.43it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 15% 93/625 [00:00<00:04, 128.87it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 17% 106/625 [00:00<00:04, 129.15it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 19% 119/625 [00:00<00:03, 129.06it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 21% 133/625 [00:01<00:03, 130.79it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 23% 146/625 [00:01<00:03, 127.67it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 25% 159/625 [00:01<00:03, 124.80it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 28% 172/625 [00:01<00:03, 124.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 30% 185/625 [00:01<00:03, 124.92it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 32% 198/625 [00:01<00:03, 124.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 34% 211/625 [00:01<00:03, 120.43it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 36% 224/625 [00:01<00:03, 121.47it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 38% 237/625 [00:01<00:03, 123.53it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 40% 250/625 [00:01<00:03, 124.60it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 42% 263/625 [00:02<00:02, 123.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 44% 276/625 [00:02<00:02, 125.61it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 46% 289/625 [00:02<00:02, 122.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 48% 303/625 [00:02<00:02, 124.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 51% 316/625 [00:02<00:02, 125.85it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 53% 330/625 [00:02<00:02, 128.19it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 55% 344/625 [00:02<00:02, 128.97it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 57% 357/625 [00:02<00:02, 129.17it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 59% 371/625 [00:02<00:01, 130.63it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 62% 385/625 [00:03<00:01, 129.95it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 64% 399/625 [00:03<00:01, 131.09it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 66% 413/625 [00:03<00:01, 129.23it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 68% 426/625 [00:03<00:01, 129.12it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 70% 439/625 [00:03<00:01, 126.25it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 72% 453/625 [00:03<00:01, 127.40it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 75% 466/625 [00:03<00:01, 126.84it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 77% 480/625 [00:03<00:01, 127.74it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 79% 493/625 [00:03<00:01, 127.33it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 81% 506/625 [00:04<00:00, 121.71it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 83% 519/625 [00:04<00:00, 123.77it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 85% 532/625 [00:04<00:00, 121.86it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 87% 545/625 [00:04<00:00, 122.11it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 89% 558/625 [00:04<00:00, 118.65it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 91% 571/625 [00:04<00:00, 119.80it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 93% 584/625 [00:04<00:00, 122.16it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 96% 597/625 [00:04<00:00, 123.91it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 98% 611/625 [00:04<00:00, 126.25it/s]\u001b[A\u001b[A\n",
"\n",
"Evaluating: 100% 625/625 [00:04<00:00, 128.41it/s]\u001b[A\u001b[A\n",
"\n",
"\u001b[A\u001b[A02/21/2020 12:37:21 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:37:21 - INFO - __main__ - perplexity = tensor(787.6224)\n",
"02/21/2020 12:37:21 - INFO - transformers.configuration_utils - Configuration saved in /content/models/smallBERTa/weights/checkpoint-28000/config.json\n",
"02/21/2020 12:37:21 - INFO - transformers.modeling_utils - Model weights saved in /content/models/smallBERTa/weights/checkpoint-28000/pytorch_model.bin\n",
"02/21/2020 12:37:21 - INFO - __main__ - Saving model checkpoint to /content/models/smallBERTa/weights/checkpoint-28000\n",
"02/21/2020 12:37:21 - INFO - __main__ - Deleting older checkpoint [/content/models/smallBERTa/weights/checkpoint-24000] due to args.save_total_limit\n",
"02/21/2020 12:37:21 - INFO - __main__ - Saving optimizer and scheduler states to /content/models/smallBERTa/weights/checkpoint-28000\n",
"\n",
"Iteration: 98% 5500/5625 [03:45<01:06, 1.89it/s]\u001b[A\n",
"Iteration: 98% 5504/5625 [03:45<00:45, 2.64it/s]\u001b[A\n",
"Iteration: 98% 5508/5625 [03:45<00:32, 3.66it/s]\u001b[A\n",
"Iteration: 98% 5512/5625 [03:45<00:22, 5.00it/s]\u001b[A\n",
"Iteration: 98% 5516/5625 [03:46<00:16, 6.72it/s]\u001b[A\n",
"Iteration: 98% 5520/5625 [03:46<00:11, 8.92it/s]\u001b[A\n",
"Iteration: 98% 5524/5625 [03:46<00:08, 11.53it/s]\u001b[A\n",
"Iteration: 98% 5528/5625 [03:46<00:06, 14.57it/s]\u001b[A\n",
"Iteration: 98% 5532/5625 [03:46<00:05, 17.84it/s]\u001b[A\n",
"Iteration: 98% 5536/5625 [03:46<00:04, 21.14it/s]\u001b[A\n",
"Iteration: 98% 5540/5625 [03:46<00:03, 24.31it/s]\u001b[A\n",
"Iteration: 99% 5544/5625 [03:46<00:03, 26.41it/s]\u001b[A\n",
"Iteration: 99% 5548/5625 [03:46<00:02, 28.68it/s]\u001b[A\n",
"Iteration: 99% 5552/5625 [03:47<00:02, 30.54it/s]\u001b[A\n",
"Iteration: 99% 5556/5625 [03:47<00:02, 31.89it/s]\u001b[A\n",
"Iteration: 99% 5560/5625 [03:47<00:01, 33.02it/s]\u001b[A\n",
"Iteration: 99% 5564/5625 [03:47<00:01, 33.75it/s]\u001b[A\n",
"Iteration: 99% 5568/5625 [03:47<00:01, 33.29it/s]\u001b[A\n",
"Iteration: 99% 5572/5625 [03:47<00:01, 33.61it/s]\u001b[A\n",
"Iteration: 99% 5576/5625 [03:47<00:01, 33.91it/s]\u001b[A\n",
"Iteration: 99% 5580/5625 [03:47<00:01, 34.73it/s]\u001b[A\n",
"Iteration: 99% 5584/5625 [03:47<00:01, 34.29it/s]\u001b[A\n",
"Iteration: 99% 5588/5625 [03:48<00:01, 34.15it/s]\u001b[A\n",
"Iteration: 99% 5592/5625 [03:48<00:00, 34.74it/s]\u001b[A\n",
"Iteration: 99% 5596/5625 [03:48<00:00, 35.24it/s]\u001b[A\n",
"Iteration: 100% 5600/5625 [03:48<00:00, 35.71it/s]\u001b[A\n",
"Iteration: 100% 5604/5625 [03:48<00:00, 36.07it/s]\u001b[A\n",
"Iteration: 100% 5608/5625 [03:48<00:00, 36.22it/s]\u001b[A\n",
"Iteration: 100% 5612/5625 [03:48<00:00, 36.41it/s]\u001b[A\n",
"Iteration: 100% 5616/5625 [03:48<00:00, 36.52it/s]\u001b[A\n",
"Iteration: 100% 5620/5625 [03:48<00:00, 36.48it/s]\u001b[A\n",
"Iteration: 100% 5624/5625 [03:49<00:00, 35.76it/s]\u001b[A\n",
"Epoch: 100% 5/5 [19:20<00:00, 232.18s/it]\n",
"02/21/2020 12:37:25 - INFO - __main__ - global_step = 28125, average loss = 6.974579660627577\n",
"02/21/2020 12:37:25 - INFO - __main__ - Saving model checkpoint to /content/models/smallBERTa/weights\n",
"02/21/2020 12:37:25 - INFO - transformers.configuration_utils - Configuration saved in /content/models/smallBERTa/weights/config.json\n",
"02/21/2020 12:37:25 - INFO - transformers.modeling_utils - Model weights saved in /content/models/smallBERTa/weights/pytorch_model.bin\n",
"02/21/2020 12:37:25 - INFO - transformers.configuration_utils - loading configuration file /content/models/smallBERTa/weights/config.json\n",
"02/21/2020 12:37:25 - INFO - transformers.configuration_utils - Model config RobertaConfig {\n",
" \"architectures\": [\n",
" \"RobertaForMaskedLM\"\n",
" ],\n",
" \"attention_probs_dropout_prob\": 0.1,\n",
" \"bos_token_id\": 0,\n",
" \"do_sample\": false,\n",
" \"eos_token_ids\": 0,\n",
" \"finetuning_task\": null,\n",
" \"hidden_act\": \"gelu\",\n",
" \"hidden_dropout_prob\": 0.3,\n",
" \"hidden_size\": 128,\n",
" \"id2label\": {\n",
" \"0\": \"LABEL_0\",\n",
" \"1\": \"LABEL_1\"\n",
" },\n",
" \"initializer_range\": 0.02,\n",
" \"intermediate_size\": 256,\n",
" \"is_decoder\": false,\n",
" \"label2id\": {\n",
" \"LABEL_0\": 0,\n",
" \"LABEL_1\": 1\n",
" },\n",
" \"layer_norm_eps\": 1e-12,\n",
" \"length_penalty\": 1.0,\n",
" \"max_length\": 20,\n",
" \"max_position_embeddings\": 256,\n",
" \"model_type\": \"roberta\",\n",
" \"num_attention_heads\": 4,\n",
" \"num_beams\": 1,\n",
" \"num_hidden_layers\": 2,\n",
" \"num_labels\": 2,\n",
" \"num_return_sequences\": 1,\n",
" \"output_attentions\": false,\n",
" \"output_hidden_states\": false,\n",
" \"output_past\": true,\n",
" \"pad_token_id\": 0,\n",
" \"pruned_heads\": {},\n",
" \"repetition_penalty\": 1.0,\n",
" \"temperature\": 1.0,\n",
" \"top_k\": 50,\n",
" \"top_p\": 1.0,\n",
" \"torchscript\": false,\n",
" \"type_vocab_size\": 2,\n",
" \"use_bfloat16\": false,\n",
" \"vocab_size\": 5000\n",
"}\n",
"\n",
"02/21/2020 12:37:25 - INFO - transformers.modeling_utils - loading weights file /content/models/smallBERTa/weights/pytorch_model.bin\n",
"02/21/2020 12:37:25 - INFO - transformers.tokenization_utils - Model name '/content/models/smallBERTa/weights' not found in model shortcut name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). Assuming '/content/models/smallBERTa/weights' is a path, a model identifier, or url to a directory containing tokenizer files.\n",
"02/21/2020 12:37:25 - INFO - transformers.tokenization_utils - Didn't find file /content/models/smallBERTa/weights/added_tokens.json. We won't load it.\n",
"02/21/2020 12:37:25 - INFO - transformers.tokenization_utils - loading file /content/models/smallBERTa/weights/vocab.json\n",
"02/21/2020 12:37:25 - INFO - transformers.tokenization_utils - loading file /content/models/smallBERTa/weights/merges.txt\n",
"02/21/2020 12:37:25 - INFO - transformers.tokenization_utils - loading file None\n",
"02/21/2020 12:37:25 - INFO - transformers.tokenization_utils - loading file /content/models/smallBERTa/weights/special_tokens_map.json\n",
"02/21/2020 12:37:25 - INFO - transformers.tokenization_utils - loading file /content/models/smallBERTa/weights/tokenizer_config.json\n",
"02/21/2020 12:37:25 - INFO - __main__ - Evaluate the following checkpoints: ['/content/models/smallBERTa/weights']\n",
"02/21/2020 12:37:25 - INFO - transformers.configuration_utils - loading configuration file /content/models/smallBERTa/weights/config.json\n",
"02/21/2020 12:37:25 - INFO - transformers.configuration_utils - Model config RobertaConfig {\n",
" \"architectures\": [\n",
" \"RobertaForMaskedLM\"\n",
" ],\n",
" \"attention_probs_dropout_prob\": 0.1,\n",
" \"bos_token_id\": 0,\n",
" \"do_sample\": false,\n",
" \"eos_token_ids\": 0,\n",
" \"finetuning_task\": null,\n",
" \"hidden_act\": \"gelu\",\n",
" \"hidden_dropout_prob\": 0.3,\n",
" \"hidden_size\": 128,\n",
" \"id2label\": {\n",
" \"0\": \"LABEL_0\",\n",
" \"1\": \"LABEL_1\"\n",
" },\n",
" \"initializer_range\": 0.02,\n",
" \"intermediate_size\": 256,\n",
" \"is_decoder\": false,\n",
" \"label2id\": {\n",
" \"LABEL_0\": 0,\n",
" \"LABEL_1\": 1\n",
" },\n",
" \"layer_norm_eps\": 1e-12,\n",
" \"length_penalty\": 1.0,\n",
" \"max_length\": 20,\n",
" \"max_position_embeddings\": 256,\n",
" \"model_type\": \"roberta\",\n",
" \"num_attention_heads\": 4,\n",
" \"num_beams\": 1,\n",
" \"num_hidden_layers\": 2,\n",
" \"num_labels\": 2,\n",
" \"num_return_sequences\": 1,\n",
" \"output_attentions\": false,\n",
" \"output_hidden_states\": false,\n",
" \"output_past\": true,\n",
" \"pad_token_id\": 0,\n",
" \"pruned_heads\": {},\n",
" \"repetition_penalty\": 1.0,\n",
" \"temperature\": 1.0,\n",
" \"top_k\": 50,\n",
" \"top_p\": 1.0,\n",
" \"torchscript\": false,\n",
" \"type_vocab_size\": 2,\n",
" \"use_bfloat16\": false,\n",
" \"vocab_size\": 5000\n",
"}\n",
"\n",
"02/21/2020 12:37:25 - INFO - transformers.modeling_utils - loading weights file /content/models/smallBERTa/weights/pytorch_model.bin\n",
"02/21/2020 12:37:25 - INFO - __main__ - Creating features from dataset file at /tmp/lm_data/eval.txt\n",
"02/21/2020 12:37:28 - INFO - __main__ - ***** Running evaluation *****\n",
"02/21/2020 12:37:28 - INFO - __main__ - Num examples = 20000\n",
"02/21/2020 12:37:28 - INFO - __main__ - Batch size = 32\n",
"Evaluating: 100% 625/625 [00:04<00:00, 126.10it/s]\n",
"02/21/2020 12:37:33 - INFO - __main__ - ***** Eval results *****\n",
"02/21/2020 12:37:33 - INFO - __main__ - perplexity = tensor(788.3641)\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Pirlhhohke8T",
"colab_type": "text"
},
"source": [
"## View Results on Tensorboard"
]
},
{
"cell_type": "code",
"metadata": {
"id": "siSNrtODEqUV",
"colab_type": "code",
"outputId": "01e91567-abde-47ee-eea2-e674fab6292b",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"!tensorboard dev upload --logdir /content/runs"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"\n",
"***** TensorBoard Uploader *****\n",
"\n",
"This will upload your TensorBoard logs to https://tensorboard.dev/ from\n",
"the following directory:\n",
"\n",
"/content/runs\n",
"\n",
"This TensorBoard will be visible to everyone. Do not upload sensitive\n",
"data.\n",
"\n",
"Your use of this service is subject to Google's Terms of Service\n",
"<https://policies.google.com/terms> and Privacy Policy\n",
"<https://policies.google.com/privacy>, and TensorBoard.dev's Terms of Service\n",
"<https://tensorboard.dev/policy/terms/>.\n",
"\n",
"This notice will not be shown again while you are logged into the uploader.\n",
"To log out, run `tensorboard dev auth revoke`.\n",
"\n",
"Continue? (yes/NO) yes\n",
"\n",
"Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=373649185512-8v619h5kft38l4456nm2dj4ubeqsrvh6.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email&state=kgAdxJj3xxL6gDgTUoUWbPVrkXeIzl&prompt=consent&access_type=offline\n",
"Enter the authorization code: 4/wwHWmLi7O1avExJ9mp5Ka_Bbo3lSCOsRUHS1r2a5lqOiyIAllUK6KpY\n",
"\n",
"Upload started and will continue reading any new data as it's added\n",
"to the logdir. To stop uploading, press Ctrl-C.\n",
"View your TensorBoard live at: https://tensorboard.dev/experiment/wKOIBs5zRgCb0MY8KGi7Sg/\n",
"\n",
"Traceback (most recent call last):\n",
" File \"/usr/local/lib/python3.6/dist-packages/tensorboard/uploader/uploader_main.py\", line 426, in execute\n",
" uploader.start_uploading()\n",
" File \"/usr/local/lib/python3.6/dist-packages/tensorboard/uploader/uploader.py\", line 111, in start_uploading\n",
" self._upload_once()\n",
" File \"/usr/local/lib/python3.6/dist-packages/tensorboard/uploader/uploader.py\", line 116, in _upload_once\n",
" self._rate_limiter.tick()\n",
" File \"/usr/local/lib/python3.6/dist-packages/tensorboard/uploader/util.py\", line 41, in tick\n",
" self._time.sleep(wait_secs)\n",
"KeyboardInterrupt\n",
"\n",
"During handling of the above exception, another exception occurred:\n",
"\n",
"Traceback (most recent call last):\n",
" File \"/usr/local/bin/tensorboard\", line 8, in <module>\n",
" sys.exit(run_main())\n",
" File \"/usr/local/lib/python3.6/dist-packages/tensorboard/main.py\", line 66, in run_main\n",
" app.run(tensorboard.main, flags_parser=tensorboard.configure)\n",
" File \"/usr/local/lib/python3.6/dist-packages/absl/app.py\", line 299, in run\n",
" _run_main(main, args)\n",
" File \"/usr/local/lib/python3.6/dist-packages/absl/app.py\", line 250, in _run_main\n",
" sys.exit(main(argv))\n",
" File \"/usr/local/lib/python3.6/dist-packages/tensorboard/program.py\", line 268, in main\n",
" return runner(self.flags) or 0\n",
" File \"/usr/local/lib/python3.6/dist-packages/tensorboard/uploader/uploader_main.py\", line 579, in run\n",
" return _run(flags)\n",
" File \"/usr/local/lib/python3.6/dist-packages/tensorboard/uploader/uploader_main.py\", line 259, in _run\n",
" intent.execute(server_info, channel)\n",
" File \"/usr/local/lib/python3.6/dist-packages/tensorboard/uploader/uploader_main.py\", line 431, in execute\n",
" print()\n",
"KeyboardInterrupt\n",
"^C\n"
],
"name": "stdout"
}
]
}
]
}