@dgunning
Created October 29, 2019 17:43
FinetuningBertOnWikiText
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "FinetuningBertOnWikiText",
"provenance": [],
"private_outputs": true,
"collapsed_sections": [],
"toc_visible": true,
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/dgunning/72f7684dc55365ae2753d52bdc61c1ae/finetuningbertonwikitext.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oC3n3O3sDcxs",
"colab_type": "text"
},
"source": [
"# BERT Finetuning on Wikitext\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TLkFN-B-eQH3",
"colab_type": "text"
},
"source": [
"In this notebook we will finetune a **HuggingFace** BERT model on Wikitext data. It is based on the [Language Model Finetuning](https://huggingface.co/transformers/examples.html#language-model-fine-tuning) example in the Huggingface Transformers documentation.\n",
"\n",
"Huggingface provides tools that simplify working with NLP models, including the latest BERT, GPT-2, and XLM models. Using Huggingface makes it easy to keep up with cutting-edge NLP.\n",
"\n",
"The most useful of these tools is the Transformers library, which provides **Pytorch** and **Tensorflow** wrappers around the latest transformer-based models. https://huggingface.co/\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jBkcXjyreRn8",
"colab_type": "text"
},
"source": [
"# Download Wikitext\n",
"\n",
"First, download the Wikitext Raw files. \n",
"\n",
"The [Wikitext](https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/) dataset is built from a set of Good and Featured articles on Wikipedia, and is commonly used for NLP model development."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ORQPll3obTOx",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 286
},
"outputId": "83beb46f-d840-4cf6-8d8f-9a5f9387e3e5"
},
"source": [
"%%shell\n",
"if [ ! -d \"wikitext-2-raw\" ]; \n",
" then\n",
" wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip\n",
" unzip wikitext-2-raw-v1.zip\n",
"fi "
],
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"text": [
"--2019-10-29 17:37:58-- https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip\n",
"Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.16.147\n",
"Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.16.147|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 4721645 (4.5M) [application/zip]\n",
"Saving to: ‘wikitext-2-raw-v1.zip’\n",
"\n",
"wikitext-2-raw-v1.z 100%[===================>] 4.50M 6.96MB/s in 0.6s \n",
"\n",
"2019-10-29 17:37:59 (6.96 MB/s) - ‘wikitext-2-raw-v1.zip’ saved [4721645/4721645]\n",
"\n",
"Archive: wikitext-2-raw-v1.zip\n",
" creating: wikitext-2-raw/\n",
" inflating: wikitext-2-raw/wiki.test.raw \n",
" inflating: wikitext-2-raw/wiki.valid.raw \n",
" inflating: wikitext-2-raw/wiki.train.raw \n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
""
]
},
"metadata": {
"tags": []
},
"execution_count": 2
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "n6WyrLAh1nNh",
"colab_type": "text"
},
"source": [
"We can see the files by listing the contents of the `wikitext-2-raw` directory."
]
},
{
"cell_type": "code",
"metadata": {
"id": "13HLgPcvb5Mh",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 118
},
"outputId": "2506c5d9-b049-4faa-fc7f-fe1a683d0efd"
},
"source": [
"!ls -alh wikitext-2-raw"
],
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"text": [
"total 13M\n",
"drwxrwx--- 2 root root 4.0K Sep 27 2016 .\n",
"drwxr-xr-x 1 root root 4.0K Oct 29 17:37 ..\n",
"-rw-rw---- 1 root root 1.3M Aug 15 2016 wiki.test.raw\n",
"-rw-rw---- 1 root root 11M Aug 15 2016 wiki.train.raw\n",
"-rw-rw---- 1 root root 1.1M Aug 15 2016 wiki.valid.raw\n"
],
"name": "stdout"
}
]
},
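{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (an added cell, not part of the original example), we can preview a few lines of the training file with Python. The Wikitext raw files are plain UTF-8 text, with blank lines and ` = Heading = ` markers separating articles."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Preview the first few non-empty lines of the raw training text\n",
"with open('wikitext-2-raw/wiki.train.raw', encoding='utf-8') as f:\n",
"    lines = [line.strip() for line in f.readlines()[:20] if line.strip()]\n",
"for line in lines:\n",
"    print(line[:120])"
],
"execution_count": 0,
"outputs": []
},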
{
"cell_type": "markdown",
"metadata": {
"id": "RC_3x0YyhZNu",
"colab_type": "text"
},
"source": [
"# Create an output directory\n",
"\n",
"We will use the **mkdir** command with the `-p` flag to create an output directory if it does not already exist."
]
},
{
"cell_type": "code",
"metadata": {
"id": "B6mjj1wif6_W",
"colab_type": "code",
"colab": {}
},
"source": [
"!mkdir -p output"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "XJanxu8cAQTp",
"colab_type": "text"
},
"source": [
"# Install Huggingface Transformers"
]
},
{
"cell_type": "code",
"metadata": {
"id": "nnKZwJ-qAz1d",
"colab_type": "code",
"outputId": "2a79ac61-1de0-413f-ac0e-3399b6e96977",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 625
}
},
"source": [
"!pip install transformers"
],
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": [
"Collecting transformers\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/fd/f9/51824e40f0a23a49eab4fcaa45c1c797cbf9761adedd0b558dab7c958b34/transformers-2.1.1-py3-none-any.whl (311kB)\n",
"\u001b[?25hCollecting regex\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/ff/60/d9782c56ceefa76033a00e1f84cd8c586c75e6e7fea2cd45ee8b46a386c5/regex-2019.08.19-cp36-cp36m-manylinux1_x86_64.whl (643kB)\n",
"\u001b[?25hRequirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages (from transformers) (4.28.1)\n",
"Requirement already satisfied: boto3 in /usr/local/lib/python3.6/dist-packages (from transformers) (1.10.2)\n",
"Collecting sentencepiece\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/14/3d/efb655a670b98f62ec32d66954e1109f403db4d937c50d779a75b9763a29/sentencepiece-0.1.83-cp36-cp36m-manylinux1_x86_64.whl (1.0MB)\n",
"\u001b[K |████████████████████████████████| 1.0MB 36kB/s \n",
"\u001b[?25hCollecting sacremoses\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/1f/8e/ed5364a06a9ba720fddd9820155cc57300d28f5f43a6fd7b7e817177e642/sacremoses-0.0.35.tar.gz (859kB)\n",
"\u001b[K |████████████████████████████████| 860kB 46.0MB/s \n",
"\u001b[?25hRequirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from transformers) (2.21.0)\n",
"Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from transformers) (1.17.3)\n",
"Requirement already satisfied: s3transfer<0.3.0,>=0.2.0 in /usr/local/lib/python3.6/dist-packages (from boto3->transformers) (0.2.1)\n",
"Requirement already satisfied: botocore<1.14.0,>=1.13.2 in /usr/local/lib/python3.6/dist-packages (from boto3->transformers) (1.13.2)\n",
"Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.6/dist-packages (from boto3->transformers) (0.9.4)\n",
"Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (1.12.0)\n",
"Requirement already satisfied: click in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (7.0)\n",
"Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (0.14.0)\n",
"Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (3.0.4)\n",
"Requirement already satisfied: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (1.24.3)\n",
"Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2.8)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2019.9.11)\n",
"Requirement already satisfied: docutils<0.16,>=0.10 in /usr/local/lib/python3.6/dist-packages (from botocore<1.14.0,>=1.13.2->boto3->transformers) (0.15.2)\n",
"Requirement already satisfied: python-dateutil<3.0.0,>=2.1; python_version >= \"2.7\" in /usr/local/lib/python3.6/dist-packages (from botocore<1.14.0,>=1.13.2->boto3->transformers) (2.6.1)\n",
"Building wheels for collected packages: sacremoses\n",
" Building wheel for sacremoses (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for sacremoses: filename=sacremoses-0.0.35-cp36-none-any.whl size=883999 sha256=6007a20a0d7bbe5a84efa0ad626f1fbb39988546e8b905480090141edabfa854\n",
" Stored in directory: /root/.cache/pip/wheels/63/2a/db/63e2909042c634ef551d0d9ac825b2b0b32dede4a6d87ddc94\n",
"Successfully built sacremoses\n",
"Installing collected packages: regex, sentencepiece, sacremoses, transformers\n",
"Successfully installed regex-2019.8.19 sacremoses-0.0.35 sentencepiece-0.1.83 transformers-2.1.1\n"
],
"name": "stdout"
}
]
},
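{
"cell_type": "markdown",
"metadata": {},
"source": [
"To confirm the install (an added check, not from the original notebook), we can import the library, print its version, and tokenize a short sentence with the `bert-base-cased` tokenizer. Loading the tokenizer downloads its vocabulary file on first use."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import transformers\n",
"from transformers import BertTokenizer\n",
"\n",
"print(transformers.__version__)\n",
"\n",
"# Load the cased BERT tokenizer and tokenize a sample sentence\n",
"tokenizer = BertTokenizer.from_pretrained('bert-base-cased')\n",
"print(tokenizer.tokenize('Finetuning BERT on Wikitext'))"
],
"execution_count": 0,
"outputs": []
},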
{
"cell_type": "markdown",
"metadata": {
"id": "2sXnZCjOh1j2",
"colab_type": "text"
},
"source": [
"# Get Finetuning Script"
]
},
{
"cell_type": "code",
"metadata": {
"id": "qv0n725Lh6x-",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 202
},
"outputId": "d37715a2-dc8e-43a7-c090-fea6882df089"
},
"source": [
"%%shell\n",
"if [ ! -f \"run_lm_finetuning.py\" ]; \n",
" then\n",
" wget https://raw.githubusercontent.com/huggingface/transformers/master/examples/run_lm_finetuning.py\n",
"fi"
],
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": [
"--2019-10-29 16:09:28-- https://raw.githubusercontent.com/huggingface/transformers/master/examples/run_lm_finetuning.py\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 28255 (28K) [text/plain]\n",
"Saving to: ‘run_lm_finetuning.py’\n",
"\n",
"\rrun_lm_finetuning.p 0%[ ] 0 --.-KB/s \rrun_lm_finetuning.p 100%[===================>] 27.59K --.-KB/s in 0.005s \n",
"\n",
"2019-10-29 16:09:29 (5.30 MB/s) - ‘run_lm_finetuning.py’ saved [28255/28255]\n",
"\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
""
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
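{
"cell_type": "markdown",
"metadata": {},
"source": [
"The finetuning script uses `argparse`, so we can list its supported flags with `--help` before running it (an added convenience cell, not part of the original example)."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"!python run_lm_finetuning.py --help"
],
"execution_count": 0,
"outputs": []
},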
{
"cell_type": "markdown",
"metadata": {
"id": "7p4OHCvShu_l",
"colab_type": "text"
},
"source": [
"# Run Finetuning\n",
"Now we can run the finetuning script using the wikitext data as input. This takes about 20 minutes to train on the Google Colab GPU runtime."
]
},
{
"cell_type": "code",
"metadata": {
"id": "5V8Pa-f_A_e1",
"colab_type": "code",
"outputId": "6825115a-4b4f-46c4-de8c-0476e3028cba",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"%%shell\n",
"python run_lm_finetuning.py \\\n",
" --output_dir=output \\\n",
" --overwrite_output_dir \\\n",
" --mlm \\\n",
" --model_type=bert \\\n",
" --model_name_or_path=bert-base-cased \\\n",
" --do_train --train_data_file=wikitext-2-raw/wiki.train.raw \\\n",
" --do_eval --eval_data_file=wikitext-2-raw/wiki.test.raw "
],
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"text": [
"10/29/2019 16:09:48 - WARNING - __main__ - Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False\n",
"10/29/2019 16:09:48 - INFO - transformers.file_utils - https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-config.json not found in cache or force_download set to True, downloading to /tmp/tmppc12d333\n",
"100% 313/313 [00:00<00:00, 193089.74B/s]\n",
"10/29/2019 16:09:49 - INFO - transformers.file_utils - copying /tmp/tmppc12d333 to cache at /root/.cache/torch/transformers/b945b69218e98b3e2c95acf911789741307dec43c698d35fad11c1ae28bda352.d7a3af18ce3a2ab7c0f48f04dc8daff45ed9a3ed333b9e9a79d012a0dedf87a6\n",
"10/29/2019 16:09:49 - INFO - transformers.file_utils - creating metadata file for /root/.cache/torch/transformers/b945b69218e98b3e2c95acf911789741307dec43c698d35fad11c1ae28bda352.d7a3af18ce3a2ab7c0f48f04dc8daff45ed9a3ed333b9e9a79d012a0dedf87a6\n",
"10/29/2019 16:09:49 - INFO - transformers.file_utils - removing temp file /tmp/tmppc12d333\n",
"10/29/2019 16:09:49 - INFO - transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-config.json from cache at /root/.cache/torch/transformers/b945b69218e98b3e2c95acf911789741307dec43c698d35fad11c1ae28bda352.d7a3af18ce3a2ab7c0f48f04dc8daff45ed9a3ed333b9e9a79d012a0dedf87a6\n",
"10/29/2019 16:09:49 - INFO - transformers.configuration_utils - Model config {\n",
" \"attention_probs_dropout_prob\": 0.1,\n",
" \"finetuning_task\": null,\n",
" \"hidden_act\": \"gelu\",\n",
" \"hidden_dropout_prob\": 0.1,\n",
" \"hidden_size\": 768,\n",
" \"initializer_range\": 0.02,\n",
" \"intermediate_size\": 3072,\n",
" \"layer_norm_eps\": 1e-12,\n",
" \"max_position_embeddings\": 512,\n",
" \"num_attention_heads\": 12,\n",
" \"num_hidden_layers\": 12,\n",
" \"num_labels\": 2,\n",
" \"output_attentions\": false,\n",
" \"output_hidden_states\": false,\n",
" \"output_past\": true,\n",
" \"pruned_heads\": {},\n",
" \"torchscript\": false,\n",
" \"type_vocab_size\": 2,\n",
" \"use_bfloat16\": false,\n",
" \"vocab_size\": 28996\n",
"}\n",
"\n",
"10/29/2019 16:09:49 - INFO - transformers.file_utils - https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt not found in cache or force_download set to True, downloading to /tmp/tmpqc0f5avx\n",
"100% 213450/213450 [00:00<00:00, 856371.79B/s]\n",
"10/29/2019 16:09:50 - INFO - transformers.file_utils - copying /tmp/tmpqc0f5avx to cache at /root/.cache/torch/transformers/5e8a2b4893d13790ed4150ca1906be5f7a03d6c4ddf62296c383f6db42814db2.e13dbb970cb325137104fb2e5f36fe865f27746c6b526f6352861b1980eb80b1\n",
"10/29/2019 16:09:50 - INFO - transformers.file_utils - creating metadata file for /root/.cache/torch/transformers/5e8a2b4893d13790ed4150ca1906be5f7a03d6c4ddf62296c383f6db42814db2.e13dbb970cb325137104fb2e5f36fe865f27746c6b526f6352861b1980eb80b1\n",
"10/29/2019 16:09:50 - INFO - transformers.file_utils - removing temp file /tmp/tmpqc0f5avx\n",
"10/29/2019 16:09:50 - INFO - transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt from cache at /root/.cache/torch/transformers/5e8a2b4893d13790ed4150ca1906be5f7a03d6c4ddf62296c383f6db42814db2.e13dbb970cb325137104fb2e5f36fe865f27746c6b526f6352861b1980eb80b1\n",
"10/29/2019 16:09:50 - INFO - transformers.file_utils - https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-pytorch_model.bin not found in cache or force_download set to True, downloading to /tmp/tmpk2ke91ge\n",
"100% 435779157/435779157 [00:15<00:00, 27493696.88B/s]\n",
"10/29/2019 16:10:06 - INFO - transformers.file_utils - copying /tmp/tmpk2ke91ge to cache at /root/.cache/torch/transformers/35d8b9d36faaf46728a0192d82bf7d00137490cd6074e8500778afed552a67e5.3fadbea36527ae472139fe84cddaa65454d7429f12d543d80bfc3ad70de55ac2\n",
"10/29/2019 16:10:08 - INFO - transformers.file_utils - creating metadata file for /root/.cache/torch/transformers/35d8b9d36faaf46728a0192d82bf7d00137490cd6074e8500778afed552a67e5.3fadbea36527ae472139fe84cddaa65454d7429f12d543d80bfc3ad70de55ac2\n",
"10/29/2019 16:10:08 - INFO - transformers.file_utils - removing temp file /tmp/tmpk2ke91ge\n",
"10/29/2019 16:10:08 - INFO - transformers.modeling_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-pytorch_model.bin from cache at /root/.cache/torch/transformers/35d8b9d36faaf46728a0192d82bf7d00137490cd6074e8500778afed552a67e5.3fadbea36527ae472139fe84cddaa65454d7429f12d543d80bfc3ad70de55ac2\n",
"10/29/2019 16:10:12 - INFO - transformers.modeling_utils - Weights from pretrained model not used in BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']\n",
"10/29/2019 16:10:17 - INFO - __main__ - Training/evaluation parameters Namespace(adam_epsilon=1e-08, block_size=510, cache_dir='', config_name='', device=device(type='cuda'), do_eval=True, do_lower_case=False, do_train=True, eval_all_checkpoints=False, eval_data_file='wikitext-2-raw/wiki.test.raw', evaluate_during_training=False, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, learning_rate=5e-05, local_rank=-1, logging_steps=50, max_grad_norm=1.0, max_steps=-1, mlm=True, mlm_probability=0.15, model_name_or_path='bert-base-cased', model_type='bert', n_gpu=1, no_cuda=False, num_train_epochs=1.0, output_dir='output', overwrite_cache=False, overwrite_output_dir=True, per_gpu_eval_batch_size=4, per_gpu_train_batch_size=4, save_steps=50, save_total_limit=None, seed=42, server_ip='', server_port='', tokenizer_name='', train_data_file='wikitext-2-raw/wiki.train.raw', warmup_steps=0, weight_decay=0.0)\n",
"10/29/2019 16:10:17 - INFO - __main__ - Creating features from dataset file at wikitext-2-raw\n",
"10/29/2019 16:10:47 - WARNING - transformers.tokenization_utils - Token indices sequence length is longer than the specified maximum sequence length for this model (2378951 > 512). Running this sequence through the model will result in indexing errors\n",
"10/29/2019 16:10:47 - INFO - __main__ - Saving features into cached file wikitext-2-raw/cached_lm_510_wiki.train.raw\n",
"10/29/2019 16:10:47 - INFO - __main__ - ***** Running training *****\n",
"10/29/2019 16:10:47 - INFO - __main__ - Num examples = 4664\n",
"10/29/2019 16:10:47 - INFO - __main__ - Num Epochs = 1\n",
"10/29/2019 16:10:47 - INFO - __main__ - Instantaneous batch size per GPU = 4\n",
"10/29/2019 16:10:47 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4\n",
"10/29/2019 16:10:47 - INFO - __main__ - Gradient Accumulation steps = 1\n",
"10/29/2019 16:10:47 - INFO - __main__ - Total optimization steps = 1166\n",
"Epoch: 0% 0/1 [00:00<?, ?it/s]\n",
"Iteration: 0% 0/1166 [00:00<?, ?it/s]\u001b[A\n",
"Iteration: 0% 1/1166 [00:01<22:42, 1.17s/it]\u001b[A\n",
"Iteration: 4% 49/1166 [00:52<20:08, 1.08s/it]\u001b[A10/29/2019 16:11:41 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-50/config.json\n",
"10/29/2019 16:11:42 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-50/pytorch_model.bin\n",
"10/29/2019 16:11:42 - INFO - __main__ - Saving model checkpoint to output/checkpoint-50\n",
"\n",
"Iteration: 4% 50/1166 [00:54<25:12, 1.36s/it]\u001b[A\n",
"Iteration: 8% 99/1166 [01:47<19:17, 1.08s/it]\u001b[A10/29/2019 16:12:36 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-100/config.json\n",
"10/29/2019 16:12:37 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-100/pytorch_model.bin\n",
"10/29/2019 16:12:37 - INFO - __main__ - Saving model checkpoint to output/checkpoint-100\n",
"\n",
"Iteration: 9% 100/1166 [01:49<24:15, 1.37s/it]\u001b[A\n",
"Iteration: 13% 149/1166 [02:42<18:18, 1.08s/it]\u001b[A10/29/2019 16:13:31 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-150/config.json\n",
"10/29/2019 16:13:32 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-150/pytorch_model.bin\n",
"10/29/2019 16:13:32 - INFO - __main__ - Saving model checkpoint to output/checkpoint-150\n",
"\n",
"Iteration: 13% 150/1166 [02:44<23:35, 1.39s/it]\u001b[A\n",
"Iteration: 13% 151/1166 [02:45<21:53, 1.29s/it]\u001b[A\n",
"Iteration: 13% 152/1166 [02:47<20:50, 1.23s/it]\u001b[A\n",
"Iteration: 13% 153/1166 [02:48<20:01, 1.19s/it]\u001b[A\n",
"Iteration: 13% 154/1166 [02:49<19:32, 1.16s/it]\u001b[A\n",
"Iteration: 13% 155/1166 [02:50<19:07, 1.14s/it]\u001b[A\n",
"Iteration: 13% 156/1166 [02:51<18:47, 1.12s/it]\u001b[A\n",
"Iteration: 13% 157/1166 [02:52<18:34, 1.10s/it]\u001b[A\n",
"Iteration: 14% 158/1166 [02:53<18:31, 1.10s/it]\u001b[A\n",
"Iteration: 14% 159/1166 [02:54<18:26, 1.10s/it]\u001b[A\n",
"Iteration: 14% 160/1166 [02:55<18:19, 1.09s/it]\u001b[A\n",
"Iteration: 14% 161/1166 [02:56<18:14, 1.09s/it]\u001b[A\n",
"Iteration: 14% 162/1166 [02:57<18:10, 1.09s/it]\u001b[A\n",
"Iteration: 14% 163/1166 [02:58<18:09, 1.09s/it]\u001b[A\n",
"Iteration: 14% 164/1166 [03:00<18:09, 1.09s/it]\u001b[A\n",
"Iteration: 14% 165/1166 [03:01<18:06, 1.09s/it]\u001b[A\n",
"Iteration: 14% 166/1166 [03:02<18:04, 1.08s/it]\u001b[A\n",
"Iteration: 14% 167/1166 [03:03<18:06, 1.09s/it]\u001b[A\n",
"Iteration: 14% 168/1166 [03:04<18:05, 1.09s/it]\u001b[A\n",
"Iteration: 14% 169/1166 [03:05<18:03, 1.09s/it]\u001b[A\n",
"Iteration: 15% 170/1166 [03:06<17:59, 1.08s/it]\u001b[A\n",
"Iteration: 15% 171/1166 [03:07<17:55, 1.08s/it]\u001b[A\n",
"Iteration: 15% 172/1166 [03:08<17:53, 1.08s/it]\u001b[A\n",
"Iteration: 15% 173/1166 [03:09<17:55, 1.08s/it]\u001b[A\n",
"Iteration: 15% 174/1166 [03:10<17:53, 1.08s/it]\u001b[A\n",
"Iteration: 15% 175/1166 [03:11<17:52, 1.08s/it]\u001b[A\n",
"Iteration: 15% 176/1166 [03:13<17:53, 1.08s/it]\u001b[A\n",
"Iteration: 15% 177/1166 [03:14<17:50, 1.08s/it]\u001b[A\n",
"Iteration: 15% 178/1166 [03:15<17:47, 1.08s/it]\u001b[A\n",
"Iteration: 15% 179/1166 [03:16<17:47, 1.08s/it]\u001b[A\n",
"Iteration: 15% 180/1166 [03:17<17:47, 1.08s/it]\u001b[A\n",
"Iteration: 16% 181/1166 [03:18<17:45, 1.08s/it]\u001b[A\n",
"Iteration: 16% 182/1166 [03:19<17:46, 1.08s/it]\u001b[A\n",
"Iteration: 16% 183/1166 [03:20<17:42, 1.08s/it]\u001b[A\n",
"Iteration: 16% 184/1166 [03:21<17:41, 1.08s/it]\u001b[A\n",
"Iteration: 16% 185/1166 [03:22<17:43, 1.08s/it]\u001b[A\n",
"Iteration: 16% 186/1166 [03:23<17:42, 1.08s/it]\u001b[A\n",
"Iteration: 16% 187/1166 [03:24<17:40, 1.08s/it]\u001b[A\n",
"Iteration: 16% 188/1166 [03:26<17:38, 1.08s/it]\u001b[A\n",
"Iteration: 16% 189/1166 [03:27<17:39, 1.08s/it]\u001b[A\n",
"Iteration: 16% 190/1166 [03:28<17:35, 1.08s/it]\u001b[A\n",
"Iteration: 16% 191/1166 [03:29<17:34, 1.08s/it]\u001b[A\n",
"Iteration: 16% 192/1166 [03:30<17:34, 1.08s/it]\u001b[A\n",
"Iteration: 17% 193/1166 [03:31<17:31, 1.08s/it]\u001b[A\n",
"Iteration: 17% 194/1166 [03:32<17:32, 1.08s/it]\u001b[A\n",
"Iteration: 17% 195/1166 [03:33<17:31, 1.08s/it]\u001b[A\n",
"Iteration: 17% 196/1166 [03:34<17:31, 1.08s/it]\u001b[A\n",
"Iteration: 17% 197/1166 [03:35<17:29, 1.08s/it]\u001b[A\n",
"Iteration: 17% 198/1166 [03:36<17:30, 1.08s/it]\u001b[A\n",
"Iteration: 17% 199/1166 [03:37<17:26, 1.08s/it]\u001b[A10/29/2019 16:14:26 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-200/config.json\n",
"10/29/2019 16:14:27 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-200/pytorch_model.bin\n",
"10/29/2019 16:14:27 - INFO - __main__ - Saving model checkpoint to output/checkpoint-200\n",
"\n",
"Iteration: 17% 200/1166 [03:40<22:09, 1.38s/it]\u001b[A\n",
"Iteration: 17% 201/1166 [03:41<20:36, 1.28s/it]\u001b[A\n",
"Iteration: 17% 202/1166 [03:42<19:37, 1.22s/it]\u001b[A\n",
"Iteration: 17% 203/1166 [03:43<18:56, 1.18s/it]\u001b[A\n",
"Iteration: 17% 204/1166 [03:44<18:26, 1.15s/it]\u001b[A\n",
"Iteration: 18% 205/1166 [03:45<18:06, 1.13s/it]\u001b[A\n",
"Iteration: 18% 206/1166 [03:46<17:51, 1.12s/it]\u001b[A\n",
"Iteration: 18% 207/1166 [03:47<17:40, 1.11s/it]\u001b[A\n",
"Iteration: 18% 208/1166 [03:48<17:30, 1.10s/it]\u001b[A\n",
"Iteration: 18% 209/1166 [03:49<17:25, 1.09s/it]\u001b[A\n",
"Iteration: 18% 210/1166 [03:50<17:20, 1.09s/it]\u001b[A\n",
"Iteration: 18% 211/1166 [03:51<17:20, 1.09s/it]\u001b[A\n",
"Iteration: 18% 212/1166 [03:53<17:18, 1.09s/it]\u001b[A\n",
"Iteration: 18% 213/1166 [03:54<17:15, 1.09s/it]\u001b[A\n",
"Iteration: 18% 214/1166 [03:55<17:14, 1.09s/it]\u001b[A\n",
"Iteration: 18% 215/1166 [03:56<17:11, 1.08s/it]\u001b[A\n",
"Iteration: 19% 216/1166 [03:57<17:10, 1.09s/it]\u001b[A\n",
"Iteration: 19% 217/1166 [03:58<17:08, 1.08s/it]\u001b[A\n",
"Iteration: 19% 218/1166 [03:59<17:05, 1.08s/it]\u001b[A\n",
"Iteration: 19% 219/1166 [04:00<17:04, 1.08s/it]\u001b[A\n",
"Iteration: 19% 220/1166 [04:01<17:04, 1.08s/it]\u001b[A\n",
"Iteration: 19% 221/1166 [04:02<17:04, 1.08s/it]\u001b[A\n",
"Iteration: 19% 222/1166 [04:03<17:02, 1.08s/it]\u001b[A\n",
"Iteration: 19% 223/1166 [04:04<17:02, 1.08s/it]\u001b[A\n",
"Iteration: 19% 224/1166 [04:05<16:58, 1.08s/it]\u001b[A\n",
"Iteration: 19% 225/1166 [04:07<16:56, 1.08s/it]\u001b[A\n",
"Iteration: 19% 226/1166 [04:08<16:54, 1.08s/it]\u001b[A\n",
"Iteration: 19% 227/1166 [04:09<16:56, 1.08s/it]\u001b[A\n",
"Iteration: 20% 228/1166 [04:10<16:55, 1.08s/it]\u001b[A\n",
"Iteration: 20% 229/1166 [04:11<16:55, 1.08s/it]\u001b[A\n",
"Iteration: 20% 230/1166 [04:12<16:55, 1.08s/it]\u001b[A\n",
"Iteration: 20% 231/1166 [04:13<16:53, 1.08s/it]\u001b[A\n",
"Iteration: 20% 232/1166 [04:14<16:51, 1.08s/it]\u001b[A\n",
"Iteration: 20% 233/1166 [04:15<16:49, 1.08s/it]\u001b[A\n",
"Iteration: 20% 234/1166 [04:16<16:47, 1.08s/it]\u001b[A\n",
"Iteration: 20% 235/1166 [04:17<16:45, 1.08s/it]\u001b[A\n",
"Iteration: 20% 236/1166 [04:18<16:47, 1.08s/it]\u001b[A\n",
"Iteration: 20% 237/1166 [04:20<16:43, 1.08s/it]\u001b[A\n",
"Iteration: 20% 238/1166 [04:21<16:45, 1.08s/it]\u001b[A\n",
"Iteration: 20% 239/1166 [04:22<16:44, 1.08s/it]\u001b[A\n",
"Iteration: 21% 240/1166 [04:23<16:40, 1.08s/it]\u001b[A\n",
"Iteration: 21% 241/1166 [04:24<16:40, 1.08s/it]\u001b[A\n",
"Iteration: 21% 242/1166 [04:25<16:43, 1.09s/it]\u001b[A\n",
"Iteration: 21% 243/1166 [04:26<16:40, 1.08s/it]\u001b[A\n",
"Iteration: 21% 244/1166 [04:27<16:38, 1.08s/it]\u001b[A\n",
"Iteration: 21% 245/1166 [04:28<16:39, 1.09s/it]\u001b[A\n",
"Iteration: 21% 246/1166 [04:29<16:37, 1.08s/it]\u001b[A\n",
"Iteration: 21% 247/1166 [04:30<16:37, 1.09s/it]\u001b[A\n",
"Iteration: 21% 248/1166 [04:32<16:36, 1.09s/it]\u001b[A\n",
"Iteration: 21% 249/1166 [04:33<16:31, 1.08s/it]\u001b[A10/29/2019 16:15:21 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-250/config.json\n",
"10/29/2019 16:15:22 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-250/pytorch_model.bin\n",
"10/29/2019 16:15:22 - INFO - __main__ - Saving model checkpoint to output/checkpoint-250\n",
"\n",
"Iteration: 21% 250/1166 [04:34<20:08, 1.32s/it]\u001b[A\n",
"Iteration: 22% 251/1166 [04:36<18:56, 1.24s/it]\u001b[A\n",
"Iteration: 22% 252/1166 [04:37<18:09, 1.19s/it]\u001b[A\n",
"Iteration: 22% 253/1166 [04:38<17:37, 1.16s/it]\u001b[A\n",
"Iteration: 22% 254/1166 [04:39<17:17, 1.14s/it]\u001b[A\n",
"Iteration: 22% 255/1166 [04:40<17:01, 1.12s/it]\u001b[A\n",
"Iteration: 22% 256/1166 [04:41<16:49, 1.11s/it]\u001b[A\n",
"Iteration: 22% 257/1166 [04:42<16:41, 1.10s/it]\u001b[A\n",
"Iteration: 22% 258/1166 [04:43<16:32, 1.09s/it]\u001b[A\n",
"Iteration: 22% 259/1166 [04:44<16:29, 1.09s/it]\u001b[A\n",
"Iteration: 22% 260/1166 [04:45<16:26, 1.09s/it]\u001b[A\n",
"Iteration: 22% 261/1166 [04:46<16:24, 1.09s/it]\u001b[A\n",
"Iteration: 22% 262/1166 [04:47<16:22, 1.09s/it]\u001b[A\n",
"Iteration: 23% 263/1166 [04:49<16:22, 1.09s/it]\u001b[A\n",
"Iteration: 23% 264/1166 [04:50<16:20, 1.09s/it]\u001b[A\n",
"Iteration: 23% 265/1166 [04:51<16:18, 1.09s/it]\u001b[A\n",
"Iteration: 23% 266/1166 [04:52<16:16, 1.08s/it]\u001b[A\n",
"Iteration: 23% 267/1166 [04:53<16:13, 1.08s/it]\u001b[A\n",
"Iteration: 23% 268/1166 [04:54<16:10, 1.08s/it]\u001b[A\n",
"Iteration: 23% 269/1166 [04:55<16:10, 1.08s/it]\u001b[A\n",
"Iteration: 23% 270/1166 [04:56<16:08, 1.08s/it]\u001b[A\n",
"Iteration: 23% 271/1166 [04:57<16:08, 1.08s/it]\u001b[A\n",
"Iteration: 23% 272/1166 [04:58<16:07, 1.08s/it]\u001b[A\n",
"Iteration: 23% 273/1166 [04:59<16:05, 1.08s/it]\u001b[A\n",
"Iteration: 23% 274/1166 [05:00<16:01, 1.08s/it]\u001b[A\n",
"Iteration: 24% 275/1166 [05:01<16:03, 1.08s/it]\u001b[A\n",
"Iteration: 24% 276/1166 [05:03<16:02, 1.08s/it]\u001b[A\n",
"Iteration: 24% 277/1166 [05:04<16:01, 1.08s/it]\u001b[A\n",
"Iteration: 24% 278/1166 [05:05<15:59, 1.08s/it]\u001b[A\n",
"Iteration: 24% 279/1166 [05:06<15:58, 1.08s/it]\u001b[A\n",
"Iteration: 24% 280/1166 [05:07<15:56, 1.08s/it]\u001b[A\n",
"Iteration: 24% 281/1166 [05:08<15:55, 1.08s/it]\u001b[A\n",
"Iteration: 24% 282/1166 [05:09<15:55, 1.08s/it]\u001b[A\n",
"Iteration: 24% 283/1166 [05:10<15:53, 1.08s/it]\u001b[A\n",
"Iteration: 24% 284/1166 [05:11<15:51, 1.08s/it]\u001b[A\n",
"Iteration: 24% 285/1166 [05:12<15:53, 1.08s/it]\u001b[A\n",
"Iteration: 25% 286/1166 [05:13<15:52, 1.08s/it]\u001b[A\n",
"Iteration: 25% 287/1166 [05:14<15:51, 1.08s/it]\u001b[A\n",
"Iteration: 25% 288/1166 [05:16<15:49, 1.08s/it]\u001b[A\n",
"Iteration: 25% 289/1166 [05:17<15:46, 1.08s/it]\u001b[A\n",
"Iteration: 25% 290/1166 [05:18<15:43, 1.08s/it]\u001b[A\n",
"Iteration: 25% 291/1166 [05:19<15:44, 1.08s/it]\u001b[A\n",
"Iteration: 25% 292/1166 [05:20<15:45, 1.08s/it]\u001b[A\n",
"Iteration: 25% 293/1166 [05:21<15:44, 1.08s/it]\u001b[A\n",
"Iteration: 25% 294/1166 [05:22<15:44, 1.08s/it]\u001b[A\n",
"Iteration: 25% 295/1166 [05:23<15:44, 1.08s/it]\u001b[A\n",
"Iteration: 25% 296/1166 [05:24<15:41, 1.08s/it]\u001b[A\n",
"Iteration: 25% 297/1166 [05:25<15:40, 1.08s/it]\u001b[A\n",
"Iteration: 26% 298/1166 [05:26<15:40, 1.08s/it]\u001b[A\n",
"Iteration: 26% 299/1166 [05:27<15:38, 1.08s/it]\u001b[A10/29/2019 16:16:16 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-300/config.json\n",
"10/29/2019 16:16:17 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-300/pytorch_model.bin\n",
"10/29/2019 16:16:17 - INFO - __main__ - Saving model checkpoint to output/checkpoint-300\n",
"\n",
"Iteration: 26% 300/1166 [05:29<19:32, 1.35s/it]\u001b[A\n",
"Iteration: 26% 301/1166 [05:30<18:19, 1.27s/it]\u001b[A\n",
"Iteration: 26% 302/1166 [05:32<17:30, 1.22s/it]\u001b[A\n",
"Iteration: 26% 303/1166 [05:33<16:56, 1.18s/it]\u001b[A\n",
"Iteration: 26% 304/1166 [05:34<16:29, 1.15s/it]\u001b[A\n",
"Iteration: 26% 305/1166 [05:35<16:12, 1.13s/it]\u001b[A\n",
"Iteration: 26% 306/1166 [05:36<15:59, 1.12s/it]\u001b[A\n",
"Iteration: 26% 307/1166 [05:37<15:50, 1.11s/it]\u001b[A\n",
"Iteration: 26% 308/1166 [05:38<15:45, 1.10s/it]\u001b[A\n",
"Iteration: 27% 309/1166 [05:39<15:39, 1.10s/it]\u001b[A\n",
"Iteration: 27% 310/1166 [05:40<15:35, 1.09s/it]\u001b[A\n",
"Iteration: 27% 311/1166 [05:41<15:35, 1.09s/it]\u001b[A\n",
"Iteration: 27% 312/1166 [05:42<15:31, 1.09s/it]\u001b[A\n",
"Iteration: 27% 313/1166 [05:44<15:31, 1.09s/it]\u001b[A\n",
"Iteration: 27% 314/1166 [05:45<15:26, 1.09s/it]\u001b[A\n",
"Iteration: 27% 315/1166 [05:46<15:20, 1.08s/it]\u001b[A\n",
"Iteration: 27% 316/1166 [05:47<15:20, 1.08s/it]\u001b[A\n",
"Iteration: 27% 317/1166 [05:48<15:22, 1.09s/it]\u001b[A\n",
"Iteration: 27% 318/1166 [05:49<15:22, 1.09s/it]\u001b[A\n",
"Iteration: 27% 319/1166 [05:50<15:20, 1.09s/it]\u001b[A\n",
"Iteration: 27% 320/1166 [05:51<15:17, 1.08s/it]\u001b[A\n",
"Iteration: 28% 321/1166 [05:52<15:14, 1.08s/it]\u001b[A\n",
"Iteration: 28% 322/1166 [05:53<15:12, 1.08s/it]\u001b[A\n",
"Iteration: 28% 323/1166 [05:54<15:16, 1.09s/it]\u001b[A\n",
"Iteration: 28% 324/1166 [05:55<15:13, 1.09s/it]\u001b[A\n",
"Iteration: 28% 325/1166 [05:57<15:10, 1.08s/it]\u001b[A\n",
"Iteration: 28% 326/1166 [05:58<15:10, 1.08s/it]\u001b[A\n",
"Iteration: 28% 327/1166 [05:59<15:08, 1.08s/it]\u001b[A\n",
"Iteration: 28% 328/1166 [06:00<15:07, 1.08s/it]\u001b[A\n",
"Iteration: 28% 329/1166 [06:01<15:04, 1.08s/it]\u001b[A\n",
"Iteration: 28% 330/1166 [06:02<15:01, 1.08s/it]\u001b[A\n",
"Iteration: 28% 331/1166 [06:03<14:59, 1.08s/it]\u001b[A\n",
"Iteration: 28% 332/1166 [06:04<15:01, 1.08s/it]\u001b[A\n",
"Iteration: 29% 333/1166 [06:05<15:00, 1.08s/it]\u001b[A\n",
"Iteration: 29% 334/1166 [06:06<14:58, 1.08s/it]\u001b[A\n",
"Iteration: 29% 335/1166 [06:07<15:00, 1.08s/it]\u001b[A\n",
"Iteration: 29% 336/1166 [06:08<15:00, 1.09s/it]\u001b[A\n",
"Iteration: 29% 337/1166 [06:10<14:56, 1.08s/it]\u001b[A\n",
"Iteration: 29% 338/1166 [06:11<14:54, 1.08s/it]\u001b[A\n",
"Iteration: 29% 339/1166 [06:12<14:54, 1.08s/it]\u001b[A\n",
"Iteration: 29% 340/1166 [06:13<14:52, 1.08s/it]\u001b[A\n",
"Iteration: 29% 341/1166 [06:14<14:52, 1.08s/it]\u001b[A\n",
"Iteration: 29% 342/1166 [06:15<14:51, 1.08s/it]\u001b[A\n",
"Iteration: 29% 343/1166 [06:16<14:50, 1.08s/it]\u001b[A\n",
"Iteration: 30% 344/1166 [06:17<14:51, 1.08s/it]\u001b[A\n",
"Iteration: 30% 345/1166 [06:18<14:48, 1.08s/it]\u001b[A\n",
"Iteration: 30% 346/1166 [06:19<14:46, 1.08s/it]\u001b[A\n",
"Iteration: 30% 347/1166 [06:20<14:46, 1.08s/it]\u001b[A\n",
"Iteration: 30% 348/1166 [06:21<14:45, 1.08s/it]\u001b[A\n",
"Iteration: 30% 349/1166 [06:22<14:45, 1.08s/it]\u001b[A10/29/2019 16:17:11 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-350/config.json\n",
"10/29/2019 16:17:12 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-350/pytorch_model.bin\n",
"10/29/2019 16:17:12 - INFO - __main__ - Saving model checkpoint to output/checkpoint-350\n",
"\n",
"Iteration: 30% 350/1166 [06:25<19:03, 1.40s/it]\u001b[A\n",
"Iteration: 30% 351/1166 [06:26<17:53, 1.32s/it]\u001b[A\n",
"Iteration: 30% 352/1166 [06:27<16:55, 1.25s/it]\u001b[A\n",
"Iteration: 30% 353/1166 [06:28<16:14, 1.20s/it]\u001b[A\n",
"Iteration: 30% 354/1166 [06:29<15:45, 1.17s/it]\u001b[A\n",
"Iteration: 30% 355/1166 [06:30<15:25, 1.14s/it]\u001b[A\n",
"Iteration: 31% 356/1166 [06:31<15:09, 1.12s/it]\u001b[A\n",
"Iteration: 31% 357/1166 [06:32<14:58, 1.11s/it]\u001b[A\n",
"Iteration: 31% 358/1166 [06:33<14:51, 1.10s/it]\u001b[A\n",
"Iteration: 31% 359/1166 [06:34<14:46, 1.10s/it]\u001b[A\n",
"Iteration: 31% 360/1166 [06:36<14:38, 1.09s/it]\u001b[A\n",
"Iteration: 31% 361/1166 [06:37<14:36, 1.09s/it]\u001b[A\n",
"Iteration: 31% 362/1166 [06:38<14:36, 1.09s/it]\u001b[A\n",
"Iteration: 31% 363/1166 [06:39<14:32, 1.09s/it]\u001b[A\n",
"Iteration: 31% 364/1166 [06:40<14:29, 1.08s/it]\u001b[A\n",
"Iteration: 31% 365/1166 [06:41<14:29, 1.09s/it]\u001b[A\n",
"Iteration: 31% 366/1166 [06:42<14:26, 1.08s/it]\u001b[A\n",
"Iteration: 31% 367/1166 [06:43<14:24, 1.08s/it]\u001b[A\n",
"Iteration: 32% 368/1166 [06:44<14:22, 1.08s/it]\u001b[A\n",
"Iteration: 32% 369/1166 [06:45<14:22, 1.08s/it]\u001b[A\n",
"Iteration: 32% 370/1166 [06:46<14:21, 1.08s/it]\u001b[A\n",
"Iteration: 32% 371/1166 [06:47<14:24, 1.09s/it]\u001b[A\n",
"Iteration: 32% 372/1166 [06:49<14:21, 1.09s/it]\u001b[A\n",
"Iteration: 32% 373/1166 [06:50<14:19, 1.08s/it]\u001b[A\n",
"Iteration: 32% 374/1166 [06:51<14:17, 1.08s/it]\u001b[A\n",
"Iteration: 32% 375/1166 [06:52<14:14, 1.08s/it]\u001b[A\n",
"Iteration: 32% 376/1166 [06:53<14:12, 1.08s/it]\u001b[A\n",
"Iteration: 32% 377/1166 [06:54<14:13, 1.08s/it]\u001b[A\n",
"Iteration: 32% 378/1166 [06:55<14:12, 1.08s/it]\u001b[A\n",
"Iteration: 33% 379/1166 [06:56<14:10, 1.08s/it]\u001b[A\n",
"Iteration: 33% 380/1166 [06:57<14:09, 1.08s/it]\u001b[A\n",
"Iteration: 33% 381/1166 [06:58<14:10, 1.08s/it]\u001b[A\n",
"Iteration: 33% 382/1166 [06:59<14:08, 1.08s/it]\u001b[A\n",
"Iteration: 33% 383/1166 [07:00<14:06, 1.08s/it]\u001b[A\n",
"Iteration: 33% 384/1166 [07:01<14:07, 1.08s/it]\u001b[A\n",
"Iteration: 33% 385/1166 [07:03<14:04, 1.08s/it]\u001b[A\n",
"Iteration: 33% 386/1166 [07:04<14:05, 1.08s/it]\u001b[A\n",
"Iteration: 33% 387/1166 [07:05<14:04, 1.08s/it]\u001b[A\n",
"Iteration: 33% 388/1166 [07:06<14:04, 1.09s/it]\u001b[A\n",
"Iteration: 33% 389/1166 [07:07<14:04, 1.09s/it]\u001b[A\n",
"Iteration: 33% 390/1166 [07:08<14:01, 1.08s/it]\u001b[A\n",
"Iteration: 34% 391/1166 [07:09<13:59, 1.08s/it]\u001b[A\n",
"Iteration: 34% 392/1166 [07:10<13:56, 1.08s/it]\u001b[A\n",
"Iteration: 34% 393/1166 [07:11<13:55, 1.08s/it]\u001b[A\n",
"Iteration: 34% 394/1166 [07:12<13:54, 1.08s/it]\u001b[A\n",
"Iteration: 34% 395/1166 [07:13<13:53, 1.08s/it]\u001b[A\n",
"Iteration: 34% 396/1166 [07:14<13:52, 1.08s/it]\u001b[A\n",
"Iteration: 34% 397/1166 [07:16<13:49, 1.08s/it]\u001b[A\n",
"Iteration: 34% 398/1166 [07:17<13:49, 1.08s/it]\u001b[A\n",
"Iteration: 34% 399/1166 [07:18<13:50, 1.08s/it]\u001b[A10/29/2019 16:18:07 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-400/config.json\n",
"10/29/2019 16:18:07 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-400/pytorch_model.bin\n",
"10/29/2019 16:18:07 - INFO - __main__ - Saving model checkpoint to output/checkpoint-400\n",
"\n",
"Iteration: 34% 400/1166 [07:20<17:09, 1.34s/it]\u001b[A\n",
"Iteration: 34% 401/1166 [07:21<16:05, 1.26s/it]\u001b[A\n",
"Iteration: 34% 402/1166 [07:22<15:22, 1.21s/it]\u001b[A\n",
"Iteration: 35% 403/1166 [07:23<14:53, 1.17s/it]\u001b[A\n",
"Iteration: 35% 404/1166 [07:24<14:33, 1.15s/it]\u001b[A\n",
"Iteration: 35% 405/1166 [07:25<14:16, 1.13s/it]\u001b[A\n",
"Iteration: 35% 406/1166 [07:26<14:05, 1.11s/it]\u001b[A\n",
"Iteration: 35% 407/1166 [07:27<13:56, 1.10s/it]\u001b[A\n",
"Iteration: 35% 408/1166 [07:28<13:49, 1.09s/it]\u001b[A\n",
"Iteration: 35% 409/1166 [07:29<13:44, 1.09s/it]\u001b[A\n",
"Iteration: 35% 410/1166 [07:31<13:46, 1.09s/it]\u001b[A\n",
"Iteration: 35% 411/1166 [07:32<13:42, 1.09s/it]\u001b[A\n",
"Iteration: 35% 412/1166 [07:33<13:38, 1.09s/it]\u001b[A\n",
"Iteration: 35% 413/1166 [07:34<13:36, 1.08s/it]\u001b[A\n",
"Iteration: 36% 414/1166 [07:35<13:34, 1.08s/it]\u001b[A\n",
"Iteration: 36% 415/1166 [07:36<13:31, 1.08s/it]\u001b[A\n",
"Iteration: 36% 416/1166 [07:37<13:29, 1.08s/it]\u001b[A\n",
"Iteration: 36% 417/1166 [07:38<13:30, 1.08s/it]\u001b[A\n",
"Iteration: 36% 418/1166 [07:39<13:29, 1.08s/it]\u001b[A\n",
"Iteration: 36% 419/1166 [07:40<13:31, 1.09s/it]\u001b[A\n",
"Iteration: 36% 420/1166 [07:41<13:29, 1.08s/it]\u001b[A\n",
"Iteration: 36% 421/1166 [07:42<13:27, 1.08s/it]\u001b[A\n",
"Iteration: 36% 422/1166 [07:43<13:27, 1.08s/it]\u001b[A\n",
"Iteration: 36% 423/1166 [07:45<13:23, 1.08s/it]\u001b[A\n",
"Iteration: 36% 424/1166 [07:46<13:21, 1.08s/it]\u001b[A\n",
"Iteration: 36% 425/1166 [07:47<13:19, 1.08s/it]\u001b[A\n",
"Iteration: 37% 426/1166 [07:48<13:20, 1.08s/it]\u001b[A\n",
"Iteration: 37% 427/1166 [07:49<13:19, 1.08s/it]\u001b[A\n",
"Iteration: 37% 428/1166 [07:50<13:18, 1.08s/it]\u001b[A\n",
"Iteration: 37% 429/1166 [07:51<13:15, 1.08s/it]\u001b[A\n",
"Iteration: 37% 430/1166 [07:52<13:14, 1.08s/it]\u001b[A\n",
"Iteration: 37% 431/1166 [07:53<13:15, 1.08s/it]\u001b[A\n",
"Iteration: 37% 432/1166 [07:54<13:16, 1.08s/it]\u001b[A\n",
"Iteration: 37% 433/1166 [07:55<13:14, 1.08s/it]\u001b[A\n",
"Iteration: 37% 434/1166 [07:56<13:12, 1.08s/it]\u001b[A\n",
"Iteration: 37% 435/1166 [07:58<13:13, 1.09s/it]\u001b[A\n",
"Iteration: 37% 436/1166 [07:59<13:10, 1.08s/it]\u001b[A\n",
"Iteration: 37% 437/1166 [08:00<13:10, 1.08s/it]\u001b[A\n",
"Iteration: 38% 438/1166 [08:01<13:07, 1.08s/it]\u001b[A\n",
"Iteration: 38% 439/1166 [08:02<13:06, 1.08s/it]\u001b[A\n",
"Iteration: 38% 440/1166 [08:03<13:04, 1.08s/it]\u001b[A\n",
"Iteration: 38% 441/1166 [08:04<13:07, 1.09s/it]\u001b[A\n",
"Iteration: 38% 442/1166 [08:05<13:04, 1.08s/it]\u001b[A\n",
"Iteration: 38% 443/1166 [08:06<13:01, 1.08s/it]\u001b[A\n",
"Iteration: 38% 444/1166 [08:07<13:02, 1.08s/it]\u001b[A\n",
"Iteration: 38% 445/1166 [08:08<13:01, 1.08s/it]\u001b[A\n",
"Iteration: 38% 446/1166 [08:09<13:01, 1.08s/it]\u001b[A\n",
"Iteration: 38% 447/1166 [08:11<13:00, 1.09s/it]\u001b[A\n",
"Iteration: 38% 448/1166 [08:12<12:58, 1.08s/it]\u001b[A\n",
"Iteration: 39% 449/1166 [08:13<12:57, 1.08s/it]\u001b[A10/29/2019 16:19:02 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-450/config.json\n",
"10/29/2019 16:19:03 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-450/pytorch_model.bin\n",
"10/29/2019 16:19:03 - INFO - __main__ - Saving model checkpoint to output/checkpoint-450\n",
"\n",
"Iteration: 39% 450/1166 [08:15<16:39, 1.40s/it]\u001b[A\n",
"Iteration: 39% 451/1166 [08:16<15:28, 1.30s/it]\u001b[A\n",
"Iteration: 39% 452/1166 [08:17<14:40, 1.23s/it]\u001b[A\n",
"Iteration: 39% 453/1166 [08:18<14:06, 1.19s/it]\u001b[A\n",
"Iteration: 39% 454/1166 [08:19<13:40, 1.15s/it]\u001b[A\n",
"Iteration: 39% 455/1166 [08:20<13:25, 1.13s/it]\u001b[A\n",
"Iteration: 39% 456/1166 [08:21<13:15, 1.12s/it]\u001b[A\n",
"Iteration: 39% 457/1166 [08:22<13:05, 1.11s/it]\u001b[A\n",
"Iteration: 39% 458/1166 [08:23<12:58, 1.10s/it]\u001b[A\n",
"Iteration: 39% 459/1166 [08:25<12:55, 1.10s/it]\u001b[A\n",
"Iteration: 39% 460/1166 [08:26<12:49, 1.09s/it]\u001b[A\n",
"Iteration: 40% 461/1166 [08:27<12:46, 1.09s/it]\u001b[A\n",
"Iteration: 40% 462/1166 [08:28<12:45, 1.09s/it]\u001b[A\n",
"Iteration: 40% 463/1166 [08:29<12:45, 1.09s/it]\u001b[A\n",
"Iteration: 40% 464/1166 [08:30<12:43, 1.09s/it]\u001b[A\n",
"Iteration: 40% 465/1166 [08:31<12:43, 1.09s/it]\u001b[A\n",
"Iteration: 40% 466/1166 [08:32<12:41, 1.09s/it]\u001b[A\n",
"Iteration: 40% 467/1166 [08:33<12:40, 1.09s/it]\u001b[A\n",
"Iteration: 40% 468/1166 [08:34<12:38, 1.09s/it]\u001b[A\n",
"Iteration: 40% 469/1166 [08:35<12:36, 1.09s/it]\u001b[A\n",
"Iteration: 40% 470/1166 [08:37<12:33, 1.08s/it]\u001b[A\n",
"Iteration: 40% 471/1166 [08:38<12:33, 1.08s/it]\u001b[A\n",
"Iteration: 40% 472/1166 [08:39<12:33, 1.09s/it]\u001b[A\n",
"Iteration: 41% 473/1166 [08:40<12:31, 1.08s/it]\u001b[A\n",
"Iteration: 41% 474/1166 [08:41<12:29, 1.08s/it]\u001b[A\n",
"Iteration: 41% 475/1166 [08:42<12:29, 1.08s/it]\u001b[A\n",
"Iteration: 41% 476/1166 [08:43<12:28, 1.08s/it]\u001b[A\n",
"Iteration: 41% 477/1166 [08:44<12:25, 1.08s/it]\u001b[A\n",
"Iteration: 41% 478/1166 [08:45<12:24, 1.08s/it]\u001b[A\n",
"Iteration: 41% 479/1166 [08:46<12:23, 1.08s/it]\u001b[A\n",
"Iteration: 41% 480/1166 [08:47<12:23, 1.08s/it]\u001b[A\n",
"Iteration: 41% 481/1166 [08:48<12:24, 1.09s/it]\u001b[A\n",
"Iteration: 41% 482/1166 [08:50<12:22, 1.09s/it]\u001b[A\n",
"Iteration: 41% 483/1166 [08:51<12:21, 1.09s/it]\u001b[A\n",
"Iteration: 42% 484/1166 [08:52<12:18, 1.08s/it]\u001b[A\n",
"Iteration: 42% 485/1166 [08:53<12:16, 1.08s/it]\u001b[A\n",
"Iteration: 42% 486/1166 [08:54<12:15, 1.08s/it]\u001b[A\n",
"Iteration: 42% 487/1166 [08:55<12:15, 1.08s/it]\u001b[A\n",
"Iteration: 42% 488/1166 [08:56<12:14, 1.08s/it]\u001b[A\n",
"Iteration: 42% 489/1166 [08:57<12:14, 1.09s/it]\u001b[A\n",
"Iteration: 42% 490/1166 [08:58<12:13, 1.08s/it]\u001b[A\n",
"Iteration: 42% 491/1166 [08:59<12:10, 1.08s/it]\u001b[A\n",
"Iteration: 42% 492/1166 [09:00<12:10, 1.08s/it]\u001b[A\n",
"Iteration: 42% 493/1166 [09:01<12:09, 1.08s/it]\u001b[A\n",
"Iteration: 42% 494/1166 [09:03<12:05, 1.08s/it]\u001b[A\n",
"Iteration: 42% 495/1166 [09:04<12:05, 1.08s/it]\u001b[A\n",
"Iteration: 43% 496/1166 [09:05<12:05, 1.08s/it]\u001b[A\n",
"Iteration: 43% 497/1166 [09:06<12:03, 1.08s/it]\u001b[A\n",
"Iteration: 43% 498/1166 [09:07<12:03, 1.08s/it]\u001b[A\n",
"Iteration: 43% 499/1166 [09:08<12:01, 1.08s/it]\u001b[A10/29/2019 16:19:57 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-500/config.json\n",
"10/29/2019 16:19:58 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-500/pytorch_model.bin\n",
"10/29/2019 16:19:58 - INFO - __main__ - Saving model checkpoint to output/checkpoint-500\n",
"\n",
"Iteration: 43% 500/1166 [09:10<15:12, 1.37s/it]\u001b[A\n",
"Iteration: 43% 501/1166 [09:11<14:13, 1.28s/it]\u001b[A\n",
"Iteration: 43% 502/1166 [09:12<13:30, 1.22s/it]\u001b[A\n",
"Iteration: 43% 503/1166 [09:13<13:03, 1.18s/it]\u001b[A\n",
"Iteration: 43% 504/1166 [09:14<12:42, 1.15s/it]\u001b[A\n",
"Iteration: 43% 505/1166 [09:15<12:28, 1.13s/it]\u001b[A\n",
"Iteration: 43% 506/1166 [09:16<12:17, 1.12s/it]\u001b[A\n",
"Iteration: 43% 507/1166 [09:18<12:07, 1.10s/it]\u001b[A\n",
"Iteration: 44% 508/1166 [09:19<12:01, 1.10s/it]\u001b[A\n",
"Iteration: 44% 509/1166 [09:20<11:58, 1.09s/it]\u001b[A\n",
"Iteration: 44% 510/1166 [09:21<11:56, 1.09s/it]\u001b[A\n",
"Iteration: 44% 511/1166 [09:22<11:55, 1.09s/it]\u001b[A\n",
"Iteration: 44% 512/1166 [09:23<11:52, 1.09s/it]\u001b[A\n",
"Iteration: 44% 513/1166 [09:24<11:49, 1.09s/it]\u001b[A\n",
"Iteration: 44% 514/1166 [09:25<11:48, 1.09s/it]\u001b[A\n",
"Iteration: 44% 515/1166 [09:26<11:44, 1.08s/it]\u001b[A\n",
"Iteration: 44% 516/1166 [09:27<11:42, 1.08s/it]\u001b[A\n",
"Iteration: 44% 517/1166 [09:28<11:42, 1.08s/it]\u001b[A\n",
"Iteration: 44% 518/1166 [09:29<11:43, 1.09s/it]\u001b[A\n",
"Iteration: 45% 519/1166 [09:31<11:41, 1.08s/it]\u001b[A\n",
"Iteration: 45% 520/1166 [09:32<11:40, 1.08s/it]\u001b[A\n",
"Iteration: 45% 521/1166 [09:33<11:38, 1.08s/it]\u001b[A\n",
"Iteration: 45% 522/1166 [09:34<11:36, 1.08s/it]\u001b[A\n",
"Iteration: 45% 523/1166 [09:35<11:34, 1.08s/it]\u001b[A\n",
"Iteration: 45% 524/1166 [09:36<11:35, 1.08s/it]\u001b[A\n",
"Iteration: 45% 525/1166 [09:37<11:34, 1.08s/it]\u001b[A\n",
"Iteration: 45% 526/1166 [09:38<11:32, 1.08s/it]\u001b[A\n",
"Iteration: 45% 527/1166 [09:39<11:31, 1.08s/it]\u001b[A\n",
"Iteration: 45% 528/1166 [09:40<11:31, 1.08s/it]\u001b[A\n",
"Iteration: 45% 529/1166 [09:41<11:30, 1.08s/it]\u001b[A\n",
"Iteration: 45% 530/1166 [09:42<11:29, 1.08s/it]\u001b[A\n",
"Iteration: 46% 531/1166 [09:44<11:27, 1.08s/it]\u001b[A\n",
"Iteration: 46% 532/1166 [09:45<11:26, 1.08s/it]\u001b[A\n",
"Iteration: 46% 533/1166 [09:46<11:25, 1.08s/it]\u001b[A\n",
"Iteration: 46% 534/1166 [09:47<11:24, 1.08s/it]\u001b[A\n",
"Iteration: 46% 535/1166 [09:48<11:23, 1.08s/it]\u001b[A\n",
"Iteration: 46% 536/1166 [09:49<11:22, 1.08s/it]\u001b[A\n",
"Iteration: 46% 537/1166 [09:50<11:20, 1.08s/it]\u001b[A\n",
"Iteration: 46% 538/1166 [09:51<11:18, 1.08s/it]\u001b[A\n",
"Iteration: 46% 539/1166 [09:52<11:17, 1.08s/it]\u001b[A\n",
"Iteration: 46% 540/1166 [09:53<11:17, 1.08s/it]\u001b[A\n",
"Iteration: 46% 541/1166 [09:54<11:15, 1.08s/it]\u001b[A\n",
"Iteration: 46% 542/1166 [09:55<11:15, 1.08s/it]\u001b[A\n",
"Iteration: 47% 543/1166 [09:57<11:13, 1.08s/it]\u001b[A\n",
"Iteration: 47% 544/1166 [09:58<11:12, 1.08s/it]\u001b[A\n",
"Iteration: 47% 545/1166 [09:59<11:12, 1.08s/it]\u001b[A\n",
"Iteration: 47% 546/1166 [10:00<11:10, 1.08s/it]\u001b[A\n",
"Iteration: 47% 547/1166 [10:01<11:07, 1.08s/it]\u001b[A\n",
"Iteration: 47% 548/1166 [10:02<11:05, 1.08s/it]\u001b[A\n",
"Iteration: 47% 549/1166 [10:03<11:07, 1.08s/it]\u001b[A10/29/2019 16:20:52 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-550/config.json\n",
"10/29/2019 16:20:53 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-550/pytorch_model.bin\n",
"10/29/2019 16:20:53 - INFO - __main__ - Saving model checkpoint to output/checkpoint-550\n",
"\n",
"Iteration: 47% 550/1166 [10:06<15:47, 1.54s/it]\u001b[A\n",
"Iteration: 47% 551/1166 [10:07<14:18, 1.40s/it]\u001b[A\n",
"Iteration: 47% 552/1166 [10:08<13:20, 1.30s/it]\u001b[A\n",
"Iteration: 47% 553/1166 [10:09<12:37, 1.24s/it]\u001b[A\n",
"Iteration: 48% 554/1166 [10:10<12:08, 1.19s/it]\u001b[A\n",
"Iteration: 48% 555/1166 [10:11<11:48, 1.16s/it]\u001b[A\n",
"Iteration: 48% 556/1166 [10:12<11:33, 1.14s/it]\u001b[A\n",
"Iteration: 48% 557/1166 [10:13<11:23, 1.12s/it]\u001b[A\n",
"Iteration: 48% 558/1166 [10:14<11:16, 1.11s/it]\u001b[A\n",
"Iteration: 48% 559/1166 [10:15<11:09, 1.10s/it]\u001b[A\n",
"Iteration: 48% 560/1166 [10:16<11:04, 1.10s/it]\u001b[A\n",
"Iteration: 48% 561/1166 [10:17<10:59, 1.09s/it]\u001b[A\n",
"Iteration: 48% 562/1166 [10:19<10:55, 1.09s/it]\u001b[A\n",
"Iteration: 48% 563/1166 [10:20<10:53, 1.08s/it]\u001b[A\n",
"Iteration: 48% 564/1166 [10:21<10:52, 1.08s/it]\u001b[A\n",
"Iteration: 48% 565/1166 [10:22<10:50, 1.08s/it]\u001b[A\n",
"Iteration: 49% 566/1166 [10:23<10:49, 1.08s/it]\u001b[A\n",
"Iteration: 49% 567/1166 [10:24<10:48, 1.08s/it]\u001b[A\n",
"Iteration: 49% 568/1166 [10:25<10:47, 1.08s/it]\u001b[A\n",
"Iteration: 49% 569/1166 [10:26<10:44, 1.08s/it]\u001b[A\n",
"Iteration: 49% 570/1166 [10:27<10:44, 1.08s/it]\u001b[A\n",
"Iteration: 49% 571/1166 [10:28<10:44, 1.08s/it]\u001b[A\n",
"Iteration: 49% 572/1166 [10:29<10:43, 1.08s/it]\u001b[A\n",
"Iteration: 49% 573/1166 [10:30<10:43, 1.09s/it]\u001b[A\n",
"Iteration: 49% 574/1166 [10:32<10:41, 1.08s/it]\u001b[A\n",
"Iteration: 49% 575/1166 [10:33<10:40, 1.08s/it]\u001b[A\n",
"Iteration: 49% 576/1166 [10:34<10:39, 1.08s/it]\u001b[A\n",
"Iteration: 49% 577/1166 [10:35<10:36, 1.08s/it]\u001b[A\n",
"Iteration: 50% 578/1166 [10:36<10:34, 1.08s/it]\u001b[A\n",
"Iteration: 50% 579/1166 [10:37<10:33, 1.08s/it]\u001b[A\n",
"Iteration: 50% 580/1166 [10:38<10:33, 1.08s/it]\u001b[A\n",
"Iteration: 50% 581/1166 [10:39<10:33, 1.08s/it]\u001b[A\n",
"Iteration: 50% 582/1166 [10:40<10:32, 1.08s/it]\u001b[A\n",
"Iteration: 50% 583/1166 [10:41<10:31, 1.08s/it]\u001b[A\n",
"Iteration: 50% 584/1166 [10:42<10:30, 1.08s/it]\u001b[A\n",
"Iteration: 50% 585/1166 [10:43<10:29, 1.08s/it]\u001b[A\n",
"Iteration: 50% 586/1166 [10:45<10:29, 1.08s/it]\u001b[A\n",
"Iteration: 50% 587/1166 [10:46<10:28, 1.09s/it]\u001b[A\n",
"Iteration: 50% 588/1166 [10:47<10:26, 1.08s/it]\u001b[A\n",
"Iteration: 51% 589/1166 [10:48<10:26, 1.09s/it]\u001b[A\n",
"Iteration: 51% 590/1166 [10:49<10:24, 1.08s/it]\u001b[A\n",
"Iteration: 51% 591/1166 [10:50<10:22, 1.08s/it]\u001b[A\n",
"Iteration: 51% 592/1166 [10:51<10:21, 1.08s/it]\u001b[A\n",
"Iteration: 51% 593/1166 [10:52<10:19, 1.08s/it]\u001b[A\n",
"Iteration: 51% 594/1166 [10:53<10:17, 1.08s/it]\u001b[A\n",
"Iteration: 51% 595/1166 [10:54<10:17, 1.08s/it]\u001b[A\n",
"Iteration: 51% 596/1166 [10:55<10:17, 1.08s/it]\u001b[A\n",
"Iteration: 51% 597/1166 [10:56<10:16, 1.08s/it]\u001b[A\n",
"Iteration: 51% 598/1166 [10:58<10:15, 1.08s/it]\u001b[A\n",
"Iteration: 51% 599/1166 [10:59<10:14, 1.08s/it]\u001b[A10/29/2019 16:21:47 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-600/config.json\n",
"10/29/2019 16:21:48 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-600/pytorch_model.bin\n",
"10/29/2019 16:21:48 - INFO - __main__ - Saving model checkpoint to output/checkpoint-600\n",
"\n",
"Iteration: 51% 600/1166 [11:01<12:47, 1.36s/it]\u001b[A\n",
"Iteration: 52% 601/1166 [11:02<11:54, 1.26s/it]\u001b[A\n",
"Iteration: 52% 602/1166 [11:03<11:21, 1.21s/it]\u001b[A\n",
"Iteration: 52% 603/1166 [11:04<11:00, 1.17s/it]\u001b[A\n",
"Iteration: 52% 604/1166 [11:05<10:45, 1.15s/it]\u001b[A\n",
"Iteration: 52% 605/1166 [11:06<10:32, 1.13s/it]\u001b[A\n",
"Iteration: 52% 606/1166 [11:07<10:25, 1.12s/it]\u001b[A\n",
"Iteration: 52% 607/1166 [11:08<10:17, 1.11s/it]\u001b[A\n",
"Iteration: 52% 608/1166 [11:09<10:13, 1.10s/it]\u001b[A\n",
"Iteration: 52% 609/1166 [11:10<10:08, 1.09s/it]\u001b[A\n",
"Iteration: 52% 610/1166 [11:11<10:05, 1.09s/it]\u001b[A\n",
"Iteration: 52% 611/1166 [11:13<10:03, 1.09s/it]\u001b[A\n",
"Iteration: 52% 612/1166 [11:14<10:01, 1.09s/it]\u001b[A\n",
"Iteration: 53% 613/1166 [11:15<09:59, 1.08s/it]\u001b[A\n",
"Iteration: 53% 614/1166 [11:16<09:57, 1.08s/it]\u001b[A\n",
"Iteration: 53% 615/1166 [11:17<09:57, 1.08s/it]\u001b[A\n",
"Iteration: 53% 616/1166 [11:18<09:54, 1.08s/it]\u001b[A\n",
"Iteration: 53% 617/1166 [11:19<09:53, 1.08s/it]\u001b[A\n",
"Iteration: 53% 618/1166 [11:20<09:52, 1.08s/it]\u001b[A\n",
"Iteration: 53% 619/1166 [11:21<09:53, 1.08s/it]\u001b[A\n",
"Iteration: 53% 620/1166 [11:22<09:51, 1.08s/it]\u001b[A\n",
"Iteration: 53% 621/1166 [11:23<09:51, 1.09s/it]\u001b[A\n",
"Iteration: 53% 622/1166 [11:24<09:51, 1.09s/it]\u001b[A\n",
"Iteration: 53% 623/1166 [11:26<09:49, 1.09s/it]\u001b[A\n",
"Iteration: 54% 624/1166 [11:27<09:47, 1.08s/it]\u001b[A\n",
"Iteration: 54% 625/1166 [11:28<09:44, 1.08s/it]\u001b[A\n",
"Iteration: 54% 626/1166 [11:29<09:43, 1.08s/it]\u001b[A\n",
"Iteration: 54% 627/1166 [11:30<09:42, 1.08s/it]\u001b[A\n",
"Iteration: 54% 628/1166 [11:31<09:43, 1.08s/it]\u001b[A\n",
"Iteration: 54% 629/1166 [11:32<09:42, 1.08s/it]\u001b[A\n",
"Iteration: 54% 630/1166 [11:33<09:41, 1.08s/it]\u001b[A\n",
"Iteration: 54% 631/1166 [11:34<09:38, 1.08s/it]\u001b[A\n",
"Iteration: 54% 632/1166 [11:35<09:36, 1.08s/it]\u001b[A\n",
"Iteration: 54% 633/1166 [11:36<09:35, 1.08s/it]\u001b[A\n",
"Iteration: 54% 634/1166 [11:37<09:36, 1.08s/it]\u001b[A\n",
"Iteration: 54% 635/1166 [11:39<09:36, 1.09s/it]\u001b[A\n",
"Iteration: 55% 636/1166 [11:40<09:35, 1.09s/it]\u001b[A\n",
"Iteration: 55% 637/1166 [11:41<09:33, 1.08s/it]\u001b[A\n",
"Iteration: 55% 638/1166 [11:42<09:31, 1.08s/it]\u001b[A\n",
"Iteration: 55% 639/1166 [11:43<09:31, 1.08s/it]\u001b[A\n",
"Iteration: 55% 640/1166 [11:44<09:29, 1.08s/it]\u001b[A\n",
"Iteration: 55% 641/1166 [11:45<09:27, 1.08s/it]\u001b[A\n",
"Iteration: 55% 642/1166 [11:46<09:26, 1.08s/it]\u001b[A\n",
"Iteration: 55% 643/1166 [11:47<09:26, 1.08s/it]\u001b[A\n",
"Iteration: 55% 644/1166 [11:48<09:24, 1.08s/it]\u001b[A\n",
"Iteration: 55% 645/1166 [11:49<09:23, 1.08s/it]\u001b[A\n",
"Iteration: 55% 646/1166 [11:50<09:22, 1.08s/it]\u001b[A\n",
"Iteration: 55% 647/1166 [11:51<09:21, 1.08s/it]\u001b[A\n",
"Iteration: 56% 648/1166 [11:53<09:21, 1.08s/it]\u001b[A\n",
"Iteration: 56% 649/1166 [11:54<09:20, 1.08s/it]\u001b[A10/29/2019 16:22:42 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-650/config.json\n",
"10/29/2019 16:22:43 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-650/pytorch_model.bin\n",
"10/29/2019 16:22:43 - INFO - __main__ - Saving model checkpoint to output/checkpoint-650\n",
"\n",
"Iteration: 56% 650/1166 [11:56<11:41, 1.36s/it]\u001b[A\n",
"Iteration: 56% 651/1166 [11:57<10:53, 1.27s/it]\u001b[A\n",
"Iteration: 56% 652/1166 [11:58<10:21, 1.21s/it]\u001b[A\n",
"Iteration: 56% 653/1166 [11:59<10:02, 1.17s/it]\u001b[A\n",
"Iteration: 56% 654/1166 [12:00<09:46, 1.15s/it]\u001b[A\n",
"Iteration: 56% 655/1166 [12:01<09:34, 1.12s/it]\u001b[A\n",
"Iteration: 56% 656/1166 [12:02<09:28, 1.11s/it]\u001b[A\n",
"Iteration: 56% 657/1166 [12:03<09:21, 1.10s/it]\u001b[A\n",
"Iteration: 56% 658/1166 [12:04<09:16, 1.10s/it]\u001b[A\n",
"Iteration: 57% 659/1166 [12:05<09:13, 1.09s/it]\u001b[A\n",
"Iteration: 57% 660/1166 [12:06<09:11, 1.09s/it]\u001b[A\n",
"Iteration: 57% 661/1166 [12:08<09:09, 1.09s/it]\u001b[A\n",
"Iteration: 57% 662/1166 [12:09<09:06, 1.08s/it]\u001b[A\n",
"Iteration: 57% 663/1166 [12:10<09:05, 1.09s/it]\u001b[A\n",
"Iteration: 57% 664/1166 [12:11<09:02, 1.08s/it]\u001b[A\n",
"Iteration: 57% 665/1166 [12:12<09:01, 1.08s/it]\u001b[A\n",
"Iteration: 57% 666/1166 [12:13<09:01, 1.08s/it]\u001b[A\n",
"Iteration: 57% 667/1166 [12:14<09:00, 1.08s/it]\u001b[A\n",
"Iteration: 57% 668/1166 [12:15<08:58, 1.08s/it]\u001b[A\n",
"Iteration: 57% 669/1166 [12:16<08:57, 1.08s/it]\u001b[A\n",
"Iteration: 57% 670/1166 [12:17<08:57, 1.08s/it]\u001b[A\n",
"Iteration: 58% 671/1166 [12:18<08:56, 1.08s/it]\u001b[A\n",
"Iteration: 58% 672/1166 [12:19<08:55, 1.08s/it]\u001b[A\n",
"Iteration: 58% 673/1166 [12:21<08:53, 1.08s/it]\u001b[A\n",
"Iteration: 58% 674/1166 [12:22<08:51, 1.08s/it]\u001b[A\n",
"Iteration: 58% 675/1166 [12:23<08:50, 1.08s/it]\u001b[A\n",
"Iteration: 58% 676/1166 [12:24<08:50, 1.08s/it]\u001b[A\n",
"Iteration: 58% 677/1166 [12:25<08:50, 1.08s/it]\u001b[A\n",
"Iteration: 58% 678/1166 [12:26<08:49, 1.08s/it]\u001b[A\n",
"Iteration: 58% 679/1166 [12:27<08:47, 1.08s/it]\u001b[A\n",
"Iteration: 58% 680/1166 [12:28<08:45, 1.08s/it]\u001b[A\n",
"Iteration: 58% 681/1166 [12:29<08:45, 1.08s/it]\u001b[A\n",
"Iteration: 58% 682/1166 [12:30<08:45, 1.08s/it]\u001b[A\n",
"Iteration: 59% 683/1166 [12:31<08:43, 1.08s/it]\u001b[A\n",
"Iteration: 59% 684/1166 [12:32<08:43, 1.09s/it]\u001b[A\n",
"Iteration: 59% 685/1166 [12:34<08:42, 1.09s/it]\u001b[A\n",
"Iteration: 59% 686/1166 [12:35<08:40, 1.08s/it]\u001b[A\n",
"Iteration: 59% 687/1166 [12:36<08:38, 1.08s/it]\u001b[A\n",
"Iteration: 59% 688/1166 [12:37<08:36, 1.08s/it]\u001b[A\n",
"Iteration: 59% 689/1166 [12:38<08:35, 1.08s/it]\u001b[A\n",
"Iteration: 59% 690/1166 [12:39<08:34, 1.08s/it]\u001b[A\n",
"Iteration: 59% 691/1166 [12:40<08:33, 1.08s/it]\u001b[A\n",
"Iteration: 59% 692/1166 [12:41<08:32, 1.08s/it]\u001b[A\n",
"Iteration: 59% 693/1166 [12:42<08:31, 1.08s/it]\u001b[A\n",
"Iteration: 60% 694/1166 [12:43<08:30, 1.08s/it]\u001b[A\n",
"Iteration: 60% 695/1166 [12:44<08:28, 1.08s/it]\u001b[A\n",
"Iteration: 60% 696/1166 [12:45<08:27, 1.08s/it]\u001b[A\n",
"Iteration: 60% 697/1166 [12:46<08:27, 1.08s/it]\u001b[A\n",
"Iteration: 60% 698/1166 [12:48<08:25, 1.08s/it]\u001b[A\n",
"Iteration: 60% 699/1166 [12:49<08:23, 1.08s/it]\u001b[A10/29/2019 16:23:37 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-700/config.json\n",
"10/29/2019 16:23:38 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-700/pytorch_model.bin\n",
"10/29/2019 16:23:38 - INFO - __main__ - Saving model checkpoint to output/checkpoint-700\n",
"\n",
"Iteration: 60% 700/1166 [12:51<10:32, 1.36s/it]\u001b[A\n",
"Iteration: 60% 701/1166 [12:52<09:52, 1.27s/it]\u001b[A\n",
"Iteration: 60% 702/1166 [12:53<09:24, 1.22s/it]\u001b[A\n",
"Iteration: 60% 703/1166 [12:54<09:03, 1.17s/it]\u001b[A\n",
"Iteration: 60% 704/1166 [12:55<08:49, 1.15s/it]\u001b[A\n",
"Iteration: 60% 705/1166 [12:56<08:38, 1.13s/it]\u001b[A\n",
"Iteration: 61% 706/1166 [12:57<08:30, 1.11s/it]\u001b[A\n",
"Iteration: 61% 707/1166 [12:58<08:25, 1.10s/it]\u001b[A\n",
"Iteration: 61% 708/1166 [12:59<08:22, 1.10s/it]\u001b[A\n",
"Iteration: 61% 709/1166 [13:00<08:18, 1.09s/it]\u001b[A\n",
"Iteration: 61% 710/1166 [13:01<08:16, 1.09s/it]\u001b[A\n",
"Iteration: 61% 711/1166 [13:03<08:15, 1.09s/it]\u001b[A\n",
"Iteration: 61% 712/1166 [13:04<08:12, 1.08s/it]\u001b[A\n",
"Iteration: 61% 713/1166 [13:05<08:11, 1.09s/it]\u001b[A\n",
"Iteration: 61% 714/1166 [13:06<08:11, 1.09s/it]\u001b[A\n",
"Iteration: 61% 715/1166 [13:07<08:08, 1.08s/it]\u001b[A\n",
"Iteration: 61% 716/1166 [13:08<08:07, 1.08s/it]\u001b[A\n",
"Iteration: 61% 717/1166 [13:09<08:07, 1.09s/it]\u001b[A\n",
"Iteration: 62% 718/1166 [13:10<08:05, 1.08s/it]\u001b[A\n",
"Iteration: 62% 719/1166 [13:11<08:04, 1.08s/it]\u001b[A\n",
"Iteration: 62% 720/1166 [13:12<08:02, 1.08s/it]\u001b[A\n",
"Iteration: 62% 721/1166 [13:13<08:00, 1.08s/it]\u001b[A\n",
"Iteration: 62% 722/1166 [13:14<07:58, 1.08s/it]\u001b[A\n",
"Iteration: 62% 723/1166 [13:16<07:58, 1.08s/it]\u001b[A\n",
"Iteration: 62% 724/1166 [13:17<07:58, 1.08s/it]\u001b[A\n",
"Iteration: 62% 725/1166 [13:18<07:57, 1.08s/it]\u001b[A\n",
"Iteration: 62% 726/1166 [13:19<07:56, 1.08s/it]\u001b[A\n",
"Iteration: 62% 727/1166 [13:20<07:55, 1.08s/it]\u001b[A\n",
"Iteration: 62% 728/1166 [13:21<07:54, 1.08s/it]\u001b[A\n",
"Iteration: 63% 729/1166 [13:22<07:52, 1.08s/it]\u001b[A\n",
"Iteration: 63% 730/1166 [13:23<07:52, 1.08s/it]\u001b[A\n",
"Iteration: 63% 731/1166 [13:24<07:51, 1.08s/it]\u001b[A\n",
"Iteration: 63% 732/1166 [13:25<07:51, 1.09s/it]\u001b[A\n",
"Iteration: 63% 733/1166 [13:26<07:50, 1.09s/it]\u001b[A\n",
"Iteration: 63% 734/1166 [13:27<07:48, 1.08s/it]\u001b[A\n",
"Iteration: 63% 735/1166 [13:29<07:47, 1.09s/it]\u001b[A\n",
"Iteration: 63% 736/1166 [13:30<07:45, 1.08s/it]\u001b[A\n",
"Iteration: 63% 737/1166 [13:31<07:43, 1.08s/it]\u001b[A\n",
"Iteration: 63% 738/1166 [13:32<07:42, 1.08s/it]\u001b[A\n",
"Iteration: 63% 739/1166 [13:33<07:43, 1.08s/it]\u001b[A\n",
"Iteration: 63% 740/1166 [13:34<07:40, 1.08s/it]\u001b[A\n",
"Iteration: 64% 741/1166 [13:35<07:40, 1.08s/it]\u001b[A\n",
"Iteration: 64% 742/1166 [13:36<07:39, 1.08s/it]\u001b[A\n",
"Iteration: 64% 743/1166 [13:37<07:37, 1.08s/it]\u001b[A\n",
"Iteration: 64% 744/1166 [13:38<07:36, 1.08s/it]\u001b[A\n",
"Iteration: 64% 745/1166 [13:39<07:36, 1.08s/it]\u001b[A\n",
"Iteration: 64% 746/1166 [13:40<07:35, 1.08s/it]\u001b[A\n",
"Iteration: 64% 747/1166 [13:42<07:33, 1.08s/it]\u001b[A\n",
"Iteration: 64% 748/1166 [13:43<07:33, 1.08s/it]\u001b[A\n",
"Iteration: 64% 749/1166 [13:44<07:31, 1.08s/it]\u001b[A10/29/2019 16:24:32 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-750/config.json\n",
"10/29/2019 16:24:33 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-750/pytorch_model.bin\n",
"10/29/2019 16:24:33 - INFO - __main__ - Saving model checkpoint to output/checkpoint-750\n",
"\n",
"Iteration: 64% 750/1166 [13:46<09:29, 1.37s/it]\u001b[A\n",
"Iteration: 64% 751/1166 [13:47<08:50, 1.28s/it]\u001b[A\n",
"Iteration: 64% 752/1166 [13:48<08:25, 1.22s/it]\u001b[A\n",
"Iteration: 65% 753/1166 [13:49<08:06, 1.18s/it]\u001b[A\n",
"Iteration: 65% 754/1166 [13:50<07:53, 1.15s/it]\u001b[A\n",
"Iteration: 65% 755/1166 [13:51<07:43, 1.13s/it]\u001b[A\n",
"Iteration: 65% 756/1166 [13:52<07:37, 1.12s/it]\u001b[A\n",
"Iteration: 65% 757/1166 [13:53<07:32, 1.11s/it]\u001b[A\n",
"Iteration: 65% 758/1166 [13:54<07:27, 1.10s/it]\u001b[A\n",
"Iteration: 65% 759/1166 [13:55<07:23, 1.09s/it]\u001b[A\n",
"Iteration: 65% 760/1166 [13:57<07:22, 1.09s/it]\u001b[A\n",
"Iteration: 65% 761/1166 [13:58<07:21, 1.09s/it]\u001b[A\n",
"Iteration: 65% 762/1166 [13:59<07:20, 1.09s/it]\u001b[A\n",
"Iteration: 65% 763/1166 [14:00<07:18, 1.09s/it]\u001b[A\n",
"Iteration: 66% 764/1166 [14:01<07:17, 1.09s/it]\u001b[A\n",
"Iteration: 66% 765/1166 [14:02<07:15, 1.08s/it]\u001b[A\n",
"Iteration: 66% 766/1166 [14:03<07:14, 1.09s/it]\u001b[A\n",
"Iteration: 66% 767/1166 [14:04<07:12, 1.08s/it]\u001b[A\n",
"Iteration: 66% 768/1166 [14:05<07:10, 1.08s/it]\u001b[A\n",
"Iteration: 66% 769/1166 [14:06<07:09, 1.08s/it]\u001b[A\n",
"Iteration: 66% 770/1166 [14:07<07:08, 1.08s/it]\u001b[A\n",
"Iteration: 66% 771/1166 [14:08<07:08, 1.09s/it]\u001b[A\n",
"Iteration: 66% 772/1166 [14:10<07:06, 1.08s/it]\u001b[A\n",
"Iteration: 66% 773/1166 [14:11<07:06, 1.09s/it]\u001b[A\n",
"Iteration: 66% 774/1166 [14:12<07:04, 1.08s/it]\u001b[A\n",
"Iteration: 66% 775/1166 [14:13<07:02, 1.08s/it]\u001b[A\n",
"Iteration: 67% 776/1166 [14:14<07:00, 1.08s/it]\u001b[A\n",
"Iteration: 67% 777/1166 [14:15<07:00, 1.08s/it]\u001b[A\n",
"Iteration: 67% 778/1166 [14:16<07:00, 1.08s/it]\u001b[A\n",
"Iteration: 67% 779/1166 [14:17<06:59, 1.09s/it]\u001b[A\n",
"Iteration: 67% 780/1166 [14:18<06:58, 1.08s/it]\u001b[A\n",
"Iteration: 67% 781/1166 [14:19<06:55, 1.08s/it]\u001b[A\n",
"Iteration: 67% 782/1166 [14:20<06:55, 1.08s/it]\u001b[A\n",
"Iteration: 67% 783/1166 [14:21<06:54, 1.08s/it]\u001b[A\n",
"Iteration: 67% 784/1166 [14:23<06:52, 1.08s/it]\u001b[A\n",
"Iteration: 67% 785/1166 [14:24<06:51, 1.08s/it]\u001b[A\n",
"Iteration: 67% 786/1166 [14:25<06:51, 1.08s/it]\u001b[A\n",
"Iteration: 67% 787/1166 [14:26<06:50, 1.08s/it]\u001b[A\n",
"Iteration: 68% 788/1166 [14:27<06:49, 1.08s/it]\u001b[A\n",
"Iteration: 68% 789/1166 [14:28<06:48, 1.08s/it]\u001b[A\n",
"Iteration: 68% 790/1166 [14:29<06:46, 1.08s/it]\u001b[A\n",
"Iteration: 68% 791/1166 [14:30<06:45, 1.08s/it]\u001b[A\n",
"Iteration: 68% 792/1166 [14:31<06:45, 1.08s/it]\u001b[A\n",
"Iteration: 68% 793/1166 [14:32<06:44, 1.08s/it]\u001b[A\n",
"Iteration: 68% 794/1166 [14:33<06:43, 1.08s/it]\u001b[A\n",
"Iteration: 68% 795/1166 [14:34<06:42, 1.09s/it]\u001b[A\n",
"Iteration: 68% 796/1166 [14:36<06:40, 1.08s/it]\u001b[A\n",
"Iteration: 68% 797/1166 [14:37<06:40, 1.08s/it]\u001b[A\n",
"Iteration: 68% 798/1166 [14:38<06:39, 1.08s/it]\u001b[A\n",
"Iteration: 69% 799/1166 [14:39<06:36, 1.08s/it]\u001b[A10/29/2019 16:25:28 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-800/config.json\n",
"10/29/2019 16:25:29 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-800/pytorch_model.bin\n",
"10/29/2019 16:25:29 - INFO - __main__ - Saving model checkpoint to output/checkpoint-800\n",
"\n",
"Iteration: 69% 800/1166 [14:41<08:26, 1.38s/it]\u001b[A\n",
"Iteration: 69% 801/1166 [14:42<07:51, 1.29s/it]\u001b[A\n",
"Iteration: 69% 802/1166 [14:43<07:26, 1.23s/it]\u001b[A\n",
"Iteration: 69% 803/1166 [14:44<07:09, 1.18s/it]\u001b[A\n",
"Iteration: 69% 804/1166 [14:45<06:56, 1.15s/it]\u001b[A\n",
"Iteration: 69% 805/1166 [14:46<06:47, 1.13s/it]\u001b[A\n",
"Iteration: 69% 806/1166 [14:47<06:41, 1.11s/it]\u001b[A\n",
"Iteration: 69% 807/1166 [14:48<06:36, 1.10s/it]\u001b[A\n",
"Iteration: 69% 808/1166 [14:49<06:33, 1.10s/it]\u001b[A\n",
"Iteration: 69% 809/1166 [14:51<06:30, 1.09s/it]\u001b[A\n",
"Iteration: 69% 810/1166 [14:52<06:29, 1.09s/it]\u001b[A\n",
"Iteration: 70% 811/1166 [14:53<06:27, 1.09s/it]\u001b[A\n",
"Iteration: 70% 812/1166 [14:54<06:24, 1.09s/it]\u001b[A\n",
"Iteration: 70% 813/1166 [14:55<06:23, 1.09s/it]\u001b[A\n",
"Iteration: 70% 814/1166 [14:56<06:21, 1.08s/it]\u001b[A\n",
"Iteration: 70% 815/1166 [14:57<06:19, 1.08s/it]\u001b[A\n",
"Iteration: 70% 816/1166 [14:58<06:18, 1.08s/it]\u001b[A\n",
"Iteration: 70% 817/1166 [14:59<06:18, 1.09s/it]\u001b[A\n",
"Iteration: 70% 818/1166 [15:00<06:17, 1.09s/it]\u001b[A\n",
"Iteration: 70% 819/1166 [15:01<06:17, 1.09s/it]\u001b[A\n",
"Iteration: 70% 820/1166 [15:03<06:15, 1.09s/it]\u001b[A\n",
"Iteration: 70% 821/1166 [15:04<06:13, 1.08s/it]\u001b[A\n",
"Iteration: 70% 822/1166 [15:05<06:11, 1.08s/it]\u001b[A\n",
"Iteration: 71% 823/1166 [15:06<06:10, 1.08s/it]\u001b[A\n",
"Iteration: 71% 824/1166 [15:07<06:10, 1.08s/it]\u001b[A\n",
"Iteration: 71% 825/1166 [15:08<06:10, 1.09s/it]\u001b[A\n",
"Iteration: 71% 826/1166 [15:09<06:08, 1.08s/it]\u001b[A\n",
"Iteration: 71% 827/1166 [15:10<06:07, 1.08s/it]\u001b[A\n",
"Iteration: 71% 828/1166 [15:11<06:05, 1.08s/it]\u001b[A\n",
"Iteration: 71% 829/1166 [15:12<06:04, 1.08s/it]\u001b[A\n",
"Iteration: 71% 830/1166 [15:13<06:02, 1.08s/it]\u001b[A\n",
"Iteration: 71% 831/1166 [15:14<06:01, 1.08s/it]\u001b[A\n",
"Iteration: 71% 832/1166 [15:15<06:00, 1.08s/it]\u001b[A\n",
"Iteration: 71% 833/1166 [15:17<06:00, 1.08s/it]\u001b[A\n",
"Iteration: 72% 834/1166 [15:18<06:00, 1.08s/it]\u001b[A\n",
"Iteration: 72% 835/1166 [15:19<05:58, 1.08s/it]\u001b[A\n",
"Iteration: 72% 836/1166 [15:20<05:56, 1.08s/it]\u001b[A\n",
"Iteration: 72% 837/1166 [15:21<05:56, 1.08s/it]\u001b[A\n",
"Iteration: 72% 838/1166 [15:22<05:55, 1.08s/it]\u001b[A\n",
"Iteration: 72% 839/1166 [15:23<05:54, 1.08s/it]\u001b[A\n",
"Iteration: 72% 840/1166 [15:24<05:52, 1.08s/it]\u001b[A\n",
"Iteration: 72% 841/1166 [15:25<05:52, 1.08s/it]\u001b[A\n",
"Iteration: 72% 842/1166 [15:26<05:51, 1.09s/it]\u001b[A\n",
"Iteration: 72% 843/1166 [15:27<05:50, 1.09s/it]\u001b[A\n",
"Iteration: 72% 844/1166 [15:29<05:49, 1.09s/it]\u001b[A\n",
"Iteration: 72% 845/1166 [15:30<05:48, 1.08s/it]\u001b[A\n",
"Iteration: 73% 846/1166 [15:31<05:46, 1.08s/it]\u001b[A\n",
"Iteration: 73% 847/1166 [15:32<05:46, 1.09s/it]\u001b[A\n",
"Iteration: 73% 848/1166 [15:33<05:44, 1.08s/it]\u001b[A\n",
"Iteration: 73% 849/1166 [15:34<05:42, 1.08s/it]\u001b[A10/29/2019 16:26:23 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-850/config.json\n",
"10/29/2019 16:26:24 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-850/pytorch_model.bin\n",
"10/29/2019 16:26:24 - INFO - __main__ - Saving model checkpoint to output/checkpoint-850\n",
"\n",
"Iteration: 73% 850/1166 [15:36<07:21, 1.40s/it]\u001b[A\n",
"Iteration: 73% 851/1166 [15:37<06:48, 1.30s/it]\u001b[A\n",
"Iteration: 73% 852/1166 [15:38<06:27, 1.23s/it]\u001b[A\n",
"Iteration: 73% 853/1166 [15:39<06:12, 1.19s/it]\u001b[A\n",
"Iteration: 73% 854/1166 [15:40<06:02, 1.16s/it]\u001b[A\n",
"Iteration: 73% 855/1166 [15:41<05:53, 1.14s/it]\u001b[A\n",
"Iteration: 73% 856/1166 [15:43<05:47, 1.12s/it]\u001b[A\n",
"Iteration: 73% 857/1166 [15:44<05:42, 1.11s/it]\u001b[A\n",
"Iteration: 74% 858/1166 [15:45<05:39, 1.10s/it]\u001b[A\n",
"Iteration: 74% 859/1166 [15:46<05:36, 1.10s/it]\u001b[A\n",
"Iteration: 74% 860/1166 [15:47<05:33, 1.09s/it]\u001b[A\n",
"Iteration: 74% 861/1166 [15:48<05:31, 1.09s/it]\u001b[A\n",
"Iteration: 74% 862/1166 [15:49<05:30, 1.09s/it]\u001b[A\n",
"Iteration: 74% 863/1166 [15:50<05:28, 1.09s/it]\u001b[A\n",
"Iteration: 74% 864/1166 [15:51<05:26, 1.08s/it]\u001b[A\n",
"Iteration: 74% 865/1166 [15:52<05:26, 1.08s/it]\u001b[A\n",
"Iteration: 74% 866/1166 [15:53<05:24, 1.08s/it]\u001b[A\n",
"Iteration: 74% 867/1166 [15:54<05:23, 1.08s/it]\u001b[A\n",
"Iteration: 74% 868/1166 [15:56<05:23, 1.08s/it]\u001b[A\n",
"Iteration: 75% 869/1166 [15:57<05:21, 1.08s/it]\u001b[A\n",
"Iteration: 75% 870/1166 [15:58<05:20, 1.08s/it]\u001b[A\n",
"Iteration: 75% 871/1166 [15:59<05:19, 1.08s/it]\u001b[A\n",
"Iteration: 75% 872/1166 [16:00<05:18, 1.08s/it]\u001b[A\n",
"Iteration: 75% 873/1166 [16:01<05:17, 1.08s/it]\u001b[A\n",
"Iteration: 75% 874/1166 [16:02<05:16, 1.08s/it]\u001b[A\n",
"Iteration: 75% 875/1166 [16:03<05:15, 1.08s/it]\u001b[A\n",
"Iteration: 75% 876/1166 [16:04<05:14, 1.08s/it]\u001b[A\n",
"Iteration: 75% 877/1166 [16:05<05:14, 1.09s/it]\u001b[A\n",
"Iteration: 75% 878/1166 [16:06<05:12, 1.09s/it]\u001b[A\n",
"Iteration: 75% 879/1166 [16:07<05:10, 1.08s/it]\u001b[A\n",
"Iteration: 75% 880/1166 [16:09<05:10, 1.09s/it]\u001b[A\n",
"Iteration: 76% 881/1166 [16:10<05:08, 1.08s/it]\u001b[A\n",
"Iteration: 76% 882/1166 [16:11<05:06, 1.08s/it]\u001b[A\n",
"Iteration: 76% 883/1166 [16:12<05:05, 1.08s/it]\u001b[A\n",
"Iteration: 76% 884/1166 [16:13<05:05, 1.08s/it]\u001b[A\n",
"Iteration: 76% 885/1166 [16:14<05:03, 1.08s/it]\u001b[A\n",
"Iteration: 76% 886/1166 [16:15<05:03, 1.08s/it]\u001b[A\n",
"Iteration: 76% 887/1166 [16:16<05:02, 1.08s/it]\u001b[A\n",
"Iteration: 76% 888/1166 [16:17<05:00, 1.08s/it]\u001b[A\n",
"Iteration: 76% 889/1166 [16:18<04:59, 1.08s/it]\u001b[A\n",
"Iteration: 76% 890/1166 [16:19<04:58, 1.08s/it]\u001b[A\n",
"Iteration: 76% 891/1166 [16:20<04:57, 1.08s/it]\u001b[A\n",
"Iteration: 77% 892/1166 [16:22<04:56, 1.08s/it]\u001b[A\n",
"Iteration: 77% 893/1166 [16:23<04:55, 1.08s/it]\u001b[A\n",
"Iteration: 77% 894/1166 [16:24<04:54, 1.08s/it]\u001b[A\n",
"Iteration: 77% 895/1166 [16:25<04:54, 1.09s/it]\u001b[A\n",
"Iteration: 77% 896/1166 [16:26<04:53, 1.09s/it]\u001b[A\n",
"Iteration: 77% 897/1166 [16:27<04:51, 1.08s/it]\u001b[A\n",
"Iteration: 77% 898/1166 [16:28<04:50, 1.09s/it]\u001b[A\n",
"Iteration: 77% 899/1166 [16:29<04:49, 1.09s/it]\u001b[A10/29/2019 16:27:18 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-900/config.json\n",
"10/29/2019 16:27:19 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-900/pytorch_model.bin\n",
"10/29/2019 16:27:19 - INFO - __main__ - Saving model checkpoint to output/checkpoint-900\n",
"\n",
"Iteration: 77% 900/1166 [16:31<06:09, 1.39s/it]\u001b[A\n",
"Iteration: 77% 901/1166 [16:32<05:41, 1.29s/it]\u001b[A\n",
"Iteration: 77% 902/1166 [16:33<05:24, 1.23s/it]\u001b[A\n",
"Iteration: 77% 903/1166 [16:34<05:11, 1.18s/it]\u001b[A\n",
"Iteration: 78% 904/1166 [16:36<05:02, 1.15s/it]\u001b[A\n",
"Iteration: 78% 905/1166 [16:37<04:55, 1.13s/it]\u001b[A\n",
"Iteration: 78% 906/1166 [16:38<04:49, 1.11s/it]\u001b[A\n",
"Iteration: 78% 907/1166 [16:39<04:46, 1.10s/it]\u001b[A\n",
"Iteration: 78% 908/1166 [16:40<04:43, 1.10s/it]\u001b[A\n",
"Iteration: 78% 909/1166 [16:41<04:41, 1.09s/it]\u001b[A\n",
"Iteration: 78% 910/1166 [16:42<04:39, 1.09s/it]\u001b[A\n",
"Iteration: 78% 911/1166 [16:43<04:37, 1.09s/it]\u001b[A\n",
"Iteration: 78% 912/1166 [16:44<04:35, 1.08s/it]\u001b[A\n",
"Iteration: 78% 913/1166 [16:45<04:33, 1.08s/it]\u001b[A\n",
"Iteration: 78% 914/1166 [16:46<04:32, 1.08s/it]\u001b[A\n",
"Iteration: 78% 915/1166 [16:47<04:31, 1.08s/it]\u001b[A\n",
"Iteration: 79% 916/1166 [16:48<04:30, 1.08s/it]\u001b[A\n",
"Iteration: 79% 917/1166 [16:50<04:30, 1.09s/it]\u001b[A\n",
"Iteration: 79% 918/1166 [16:51<04:29, 1.09s/it]\u001b[A\n",
"Iteration: 79% 919/1166 [16:52<04:27, 1.08s/it]\u001b[A\n",
"Iteration: 79% 920/1166 [16:53<04:26, 1.08s/it]\u001b[A\n",
"Iteration: 79% 921/1166 [16:54<04:25, 1.08s/it]\u001b[A\n",
"Iteration: 79% 922/1166 [16:55<04:23, 1.08s/it]\u001b[A\n",
"Iteration: 79% 923/1166 [16:56<04:22, 1.08s/it]\u001b[A\n",
"Iteration: 79% 924/1166 [16:57<04:21, 1.08s/it]\u001b[A\n",
"Iteration: 79% 925/1166 [16:58<04:20, 1.08s/it]\u001b[A\n",
"Iteration: 79% 926/1166 [16:59<04:19, 1.08s/it]\u001b[A\n",
"Iteration: 80% 927/1166 [17:00<04:18, 1.08s/it]\u001b[A\n",
"Iteration: 80% 928/1166 [17:01<04:16, 1.08s/it]\u001b[A\n",
"Iteration: 80% 929/1166 [17:03<04:16, 1.08s/it]\u001b[A\n",
"Iteration: 80% 930/1166 [17:04<04:16, 1.09s/it]\u001b[A\n",
"Iteration: 80% 931/1166 [17:05<04:14, 1.08s/it]\u001b[A\n",
"Iteration: 80% 932/1166 [17:06<04:13, 1.08s/it]\u001b[A\n",
"Iteration: 80% 933/1166 [17:07<04:12, 1.08s/it]\u001b[A\n",
"Iteration: 80% 934/1166 [17:08<04:11, 1.08s/it]\u001b[A\n",
"Iteration: 80% 935/1166 [17:09<04:10, 1.08s/it]\u001b[A\n",
"Iteration: 80% 936/1166 [17:10<04:09, 1.08s/it]\u001b[A\n",
"Iteration: 80% 937/1166 [17:11<04:07, 1.08s/it]\u001b[A\n",
"Iteration: 80% 938/1166 [17:12<04:06, 1.08s/it]\u001b[A\n",
"Iteration: 81% 939/1166 [17:13<04:06, 1.08s/it]\u001b[A\n",
"Iteration: 81% 940/1166 [17:14<04:04, 1.08s/it]\u001b[A\n",
"Iteration: 81% 941/1166 [17:16<04:03, 1.08s/it]\u001b[A\n",
"Iteration: 81% 942/1166 [17:17<04:02, 1.08s/it]\u001b[A\n",
"Iteration: 81% 943/1166 [17:18<04:01, 1.08s/it]\u001b[A\n",
"Iteration: 81% 944/1166 [17:19<04:00, 1.08s/it]\u001b[A\n",
"Iteration: 81% 945/1166 [17:20<03:58, 1.08s/it]\u001b[A\n",
"Iteration: 81% 946/1166 [17:21<03:58, 1.08s/it]\u001b[A\n",
"Iteration: 81% 947/1166 [17:22<03:57, 1.08s/it]\u001b[A\n",
"Iteration: 81% 948/1166 [17:23<03:56, 1.09s/it]\u001b[A\n",
"Iteration: 81% 949/1166 [17:24<03:55, 1.09s/it]\u001b[A10/29/2019 16:28:13 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-950/config.json\n",
"10/29/2019 16:28:14 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-950/pytorch_model.bin\n",
"10/29/2019 16:28:14 - INFO - __main__ - Saving model checkpoint to output/checkpoint-950\n",
"\n",
"Iteration: 81% 950/1166 [17:26<05:11, 1.44s/it]\u001b[A\n",
"Iteration: 82% 951/1166 [17:28<04:46, 1.33s/it]\u001b[A\n",
"Iteration: 82% 952/1166 [17:29<04:29, 1.26s/it]\u001b[A\n",
"Iteration: 82% 953/1166 [17:30<04:16, 1.20s/it]\u001b[A\n",
"Iteration: 82% 954/1166 [17:31<04:07, 1.17s/it]\u001b[A\n",
"Iteration: 82% 955/1166 [17:32<04:00, 1.14s/it]\u001b[A\n",
"Iteration: 82% 956/1166 [17:33<03:55, 1.12s/it]\u001b[A\n",
"Iteration: 82% 957/1166 [17:34<03:51, 1.11s/it]\u001b[A\n",
"Iteration: 82% 958/1166 [17:35<03:49, 1.10s/it]\u001b[A\n",
"Iteration: 82% 959/1166 [17:36<03:47, 1.10s/it]\u001b[A\n",
"Iteration: 82% 960/1166 [17:37<03:45, 1.09s/it]\u001b[A\n",
"Iteration: 82% 961/1166 [17:38<03:43, 1.09s/it]\u001b[A\n",
"Iteration: 83% 962/1166 [17:39<03:42, 1.09s/it]\u001b[A\n",
"Iteration: 83% 963/1166 [17:41<03:40, 1.09s/it]\u001b[A\n",
"Iteration: 83% 964/1166 [17:42<03:39, 1.09s/it]\u001b[A\n",
"Iteration: 83% 965/1166 [17:43<03:39, 1.09s/it]\u001b[A\n",
"Iteration: 83% 966/1166 [17:44<03:37, 1.09s/it]\u001b[A\n",
"Iteration: 83% 967/1166 [17:45<03:35, 1.09s/it]\u001b[A\n",
"Iteration: 83% 968/1166 [17:46<03:34, 1.09s/it]\u001b[A\n",
"Iteration: 83% 969/1166 [17:47<03:33, 1.08s/it]\u001b[A\n",
"Iteration: 83% 970/1166 [17:48<03:32, 1.08s/it]\u001b[A\n",
"Iteration: 83% 971/1166 [17:49<03:31, 1.09s/it]\u001b[A\n",
"Iteration: 83% 972/1166 [17:50<03:29, 1.08s/it]\u001b[A\n",
"Iteration: 83% 973/1166 [17:51<03:28, 1.08s/it]\u001b[A\n",
"Iteration: 84% 974/1166 [17:52<03:28, 1.09s/it]\u001b[A\n",
"Iteration: 84% 975/1166 [17:54<03:27, 1.09s/it]\u001b[A\n",
"Iteration: 84% 976/1166 [17:55<03:26, 1.09s/it]\u001b[A\n",
"Iteration: 84% 977/1166 [17:56<03:25, 1.08s/it]\u001b[A\n",
"Iteration: 84% 978/1166 [17:57<03:23, 1.08s/it]\u001b[A\n",
"Iteration: 84% 979/1166 [17:58<03:22, 1.09s/it]\u001b[A\n",
"Iteration: 84% 980/1166 [17:59<03:22, 1.09s/it]\u001b[A\n",
"Iteration: 84% 981/1166 [18:00<03:20, 1.09s/it]\u001b[A\n",
"Iteration: 84% 982/1166 [18:01<03:20, 1.09s/it]\u001b[A\n",
"Iteration: 84% 983/1166 [18:02<03:19, 1.09s/it]\u001b[A\n",
"Iteration: 84% 984/1166 [18:03<03:17, 1.09s/it]\u001b[A\n",
"Iteration: 84% 985/1166 [18:04<03:16, 1.09s/it]\u001b[A\n",
"Iteration: 85% 986/1166 [18:06<03:15, 1.08s/it]\u001b[A\n",
"Iteration: 85% 987/1166 [18:07<03:13, 1.08s/it]\u001b[A\n",
"Iteration: 85% 988/1166 [18:08<03:12, 1.08s/it]\u001b[A\n",
"Iteration: 85% 989/1166 [18:09<03:11, 1.08s/it]\u001b[A\n",
"Iteration: 85% 990/1166 [18:10<03:10, 1.08s/it]\u001b[A\n",
"Iteration: 85% 991/1166 [18:11<03:09, 1.08s/it]\u001b[A\n",
"Iteration: 85% 992/1166 [18:12<03:08, 1.08s/it]\u001b[A\n",
"Iteration: 85% 993/1166 [18:13<03:07, 1.08s/it]\u001b[A\n",
"Iteration: 85% 994/1166 [18:14<03:06, 1.08s/it]\u001b[A\n",
"Iteration: 85% 995/1166 [18:15<03:05, 1.09s/it]\u001b[A\n",
"Iteration: 85% 996/1166 [18:16<03:04, 1.08s/it]\u001b[A\n",
"Iteration: 86% 997/1166 [18:17<03:02, 1.08s/it]\u001b[A\n",
"Iteration: 86% 998/1166 [18:19<03:02, 1.09s/it]\u001b[A\n",
"Iteration: 86% 999/1166 [18:20<03:01, 1.08s/it]\u001b[A10/29/2019 16:29:08 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-1000/config.json\n",
"10/29/2019 16:29:09 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-1000/pytorch_model.bin\n",
"10/29/2019 16:29:09 - INFO - __main__ - Saving model checkpoint to output/checkpoint-1000\n",
"\n",
"Iteration: 86% 1000/1166 [18:22<03:46, 1.36s/it]\u001b[A\n",
"Iteration: 86% 1001/1166 [18:23<03:29, 1.27s/it]\u001b[A\n",
"Iteration: 86% 1002/1166 [18:24<03:19, 1.22s/it]\u001b[A\n",
"Iteration: 86% 1003/1166 [18:25<03:11, 1.17s/it]\u001b[A\n",
"Iteration: 86% 1004/1166 [18:26<03:05, 1.15s/it]\u001b[A\n",
"Iteration: 86% 1005/1166 [18:27<03:01, 1.13s/it]\u001b[A\n",
"Iteration: 86% 1006/1166 [18:28<02:58, 1.12s/it]\u001b[A\n",
"Iteration: 86% 1007/1166 [18:29<02:55, 1.11s/it]\u001b[A\n",
"Iteration: 86% 1008/1166 [18:30<02:53, 1.10s/it]\u001b[A\n",
"Iteration: 87% 1009/1166 [18:31<02:51, 1.09s/it]\u001b[A\n",
"Iteration: 87% 1010/1166 [18:32<02:50, 1.09s/it]\u001b[A\n",
"Iteration: 87% 1011/1166 [18:34<02:48, 1.09s/it]\u001b[A\n",
"Iteration: 87% 1012/1166 [18:35<02:47, 1.09s/it]\u001b[A\n",
"Iteration: 87% 1013/1166 [18:36<02:45, 1.08s/it]\u001b[A\n",
"Iteration: 87% 1014/1166 [18:37<02:44, 1.08s/it]\u001b[A\n",
"Iteration: 87% 1015/1166 [18:38<02:43, 1.08s/it]\u001b[A\n",
"Iteration: 87% 1016/1166 [18:39<02:42, 1.08s/it]\u001b[A\n",
"Iteration: 87% 1017/1166 [18:40<02:41, 1.08s/it]\u001b[A\n",
"Iteration: 87% 1018/1166 [18:41<02:40, 1.08s/it]\u001b[A\n",
"Iteration: 87% 1019/1166 [18:42<02:38, 1.08s/it]\u001b[A\n",
"Iteration: 87% 1020/1166 [18:43<02:37, 1.08s/it]\u001b[A\n",
"Iteration: 88% 1021/1166 [18:44<02:37, 1.09s/it]\u001b[A\n",
"Iteration: 88% 1022/1166 [18:45<02:35, 1.08s/it]\u001b[A\n",
"Iteration: 88% 1023/1166 [18:47<02:35, 1.09s/it]\u001b[A\n",
"Iteration: 88% 1024/1166 [18:48<02:34, 1.09s/it]\u001b[A\n",
"Iteration: 88% 1025/1166 [18:49<02:32, 1.08s/it]\u001b[A\n",
"Iteration: 88% 1026/1166 [18:50<02:31, 1.09s/it]\u001b[A\n",
"Iteration: 88% 1027/1166 [18:51<02:30, 1.08s/it]\u001b[A\n",
"Iteration: 88% 1028/1166 [18:52<02:29, 1.08s/it]\u001b[A\n",
"Iteration: 88% 1029/1166 [18:53<02:28, 1.08s/it]\u001b[A\n",
"Iteration: 88% 1030/1166 [18:54<02:27, 1.08s/it]\u001b[A\n",
"Iteration: 88% 1031/1166 [18:55<02:26, 1.09s/it]\u001b[A\n",
"Iteration: 89% 1032/1166 [18:56<02:25, 1.08s/it]\u001b[A\n",
"Iteration: 89% 1033/1166 [18:57<02:23, 1.08s/it]\u001b[A\n",
"Iteration: 89% 1034/1166 [18:58<02:22, 1.08s/it]\u001b[A\n",
"Iteration: 89% 1035/1166 [19:00<02:21, 1.08s/it]\u001b[A\n",
"Iteration: 89% 1036/1166 [19:01<02:20, 1.08s/it]\u001b[A\n",
"Iteration: 89% 1037/1166 [19:02<02:19, 1.08s/it]\u001b[A\n",
"Iteration: 89% 1038/1166 [19:03<02:18, 1.08s/it]\u001b[A\n",
"Iteration: 89% 1039/1166 [19:04<02:17, 1.08s/it]\u001b[A\n",
"Iteration: 89% 1040/1166 [19:05<02:16, 1.08s/it]\u001b[A\n",
"Iteration: 89% 1041/1166 [19:06<02:15, 1.08s/it]\u001b[A\n",
"Iteration: 89% 1042/1166 [19:07<02:14, 1.09s/it]\u001b[A\n",
"Iteration: 89% 1043/1166 [19:08<02:13, 1.09s/it]\u001b[A\n",
"Iteration: 90% 1044/1166 [19:09<02:12, 1.08s/it]\u001b[A\n",
"Iteration: 90% 1045/1166 [19:10<02:11, 1.09s/it]\u001b[A\n",
"Iteration: 90% 1046/1166 [19:11<02:09, 1.08s/it]\u001b[A\n",
"Iteration: 90% 1047/1166 [19:13<02:09, 1.09s/it]\u001b[A\n",
"Iteration: 90% 1048/1166 [19:14<02:07, 1.08s/it]\u001b[A\n",
"Iteration: 90% 1049/1166 [19:15<02:06, 1.08s/it]\u001b[A10/29/2019 16:30:03 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-1050/config.json\n",
"10/29/2019 16:30:04 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-1050/pytorch_model.bin\n",
"10/29/2019 16:30:04 - INFO - __main__ - Saving model checkpoint to output/checkpoint-1050\n",
"\n",
"Iteration: 90% 1050/1166 [19:17<02:36, 1.35s/it]\u001b[A\n",
"Iteration: 90% 1051/1166 [19:18<02:25, 1.26s/it]\u001b[A\n",
"Iteration: 90% 1052/1166 [19:19<02:17, 1.21s/it]\u001b[A\n",
"Iteration: 90% 1053/1166 [19:20<02:12, 1.17s/it]\u001b[A\n",
"Iteration: 90% 1054/1166 [19:21<02:08, 1.15s/it]\u001b[A\n",
"Iteration: 90% 1055/1166 [19:22<02:05, 1.13s/it]\u001b[A\n",
"Iteration: 91% 1056/1166 [19:23<02:02, 1.11s/it]\u001b[A\n",
"Iteration: 91% 1057/1166 [19:24<02:00, 1.10s/it]\u001b[A\n",
"Iteration: 91% 1058/1166 [19:25<01:58, 1.10s/it]\u001b[A\n",
"Iteration: 91% 1059/1166 [19:26<01:56, 1.09s/it]\u001b[A\n",
"Iteration: 91% 1060/1166 [19:27<01:55, 1.08s/it]\u001b[A\n",
"Iteration: 91% 1061/1166 [19:29<01:53, 1.09s/it]\u001b[A\n",
"Iteration: 91% 1062/1166 [19:30<01:52, 1.09s/it]\u001b[A\n",
"Iteration: 91% 1063/1166 [19:31<01:51, 1.08s/it]\u001b[A\n",
"Iteration: 91% 1064/1166 [19:32<01:50, 1.08s/it]\u001b[A\n",
"Iteration: 91% 1065/1166 [19:33<01:49, 1.08s/it]\u001b[A\n",
"Iteration: 91% 1066/1166 [19:34<01:48, 1.08s/it]\u001b[A\n",
"Iteration: 92% 1067/1166 [19:35<01:46, 1.08s/it]\u001b[A\n",
"Iteration: 92% 1068/1166 [19:36<01:45, 1.08s/it]\u001b[A\n",
"Iteration: 92% 1069/1166 [19:37<01:44, 1.08s/it]\u001b[A\n",
"Iteration: 92% 1070/1166 [19:38<01:43, 1.08s/it]\u001b[A\n",
"Iteration: 92% 1071/1166 [19:39<01:42, 1.08s/it]\u001b[A\n",
"Iteration: 92% 1072/1166 [19:40<01:41, 1.08s/it]\u001b[A\n",
"Iteration: 92% 1073/1166 [19:41<01:40, 1.08s/it]\u001b[A\n",
"Iteration: 92% 1074/1166 [19:43<01:39, 1.08s/it]\u001b[A\n",
"Iteration: 92% 1075/1166 [19:44<01:38, 1.09s/it]\u001b[A\n",
"Iteration: 92% 1076/1166 [19:45<01:37, 1.08s/it]\u001b[A\n",
"Iteration: 92% 1077/1166 [19:46<01:36, 1.08s/it]\u001b[A\n",
"Iteration: 92% 1078/1166 [19:47<01:35, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1079/1166 [19:48<01:34, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1080/1166 [19:49<01:33, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1081/1166 [19:50<01:31, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1082/1166 [19:51<01:30, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1083/1166 [19:52<01:29, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1084/1166 [19:53<01:28, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1085/1166 [19:54<01:27, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1086/1166 [19:56<01:26, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1087/1166 [19:57<01:25, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1088/1166 [19:58<01:24, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1089/1166 [19:59<01:23, 1.08s/it]\u001b[A\n",
"Iteration: 93% 1090/1166 [20:00<01:22, 1.08s/it]\u001b[A\n",
"Iteration: 94% 1091/1166 [20:01<01:21, 1.08s/it]\u001b[A\n",
"Iteration: 94% 1092/1166 [20:02<01:20, 1.08s/it]\u001b[A\n",
"Iteration: 94% 1093/1166 [20:03<01:19, 1.09s/it]\u001b[A\n",
"Iteration: 94% 1094/1166 [20:04<01:18, 1.08s/it]\u001b[A\n",
"Iteration: 94% 1095/1166 [20:05<01:16, 1.08s/it]\u001b[A\n",
"Iteration: 94% 1096/1166 [20:06<01:15, 1.08s/it]\u001b[A\n",
"Iteration: 94% 1097/1166 [20:07<01:14, 1.08s/it]\u001b[A\n",
"Iteration: 94% 1098/1166 [20:09<01:13, 1.08s/it]\u001b[A\n",
"Iteration: 94% 1099/1166 [20:10<01:12, 1.08s/it]\u001b[A10/29/2019 16:30:58 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-1100/config.json\n",
"10/29/2019 16:30:59 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-1100/pytorch_model.bin\n",
"10/29/2019 16:30:59 - INFO - __main__ - Saving model checkpoint to output/checkpoint-1100\n",
"\n",
"Iteration: 94% 1100/1166 [20:12<01:28, 1.34s/it]\u001b[A\n",
"Iteration: 94% 1101/1166 [20:13<01:21, 1.26s/it]\u001b[A\n",
"Iteration: 95% 1102/1166 [20:14<01:17, 1.21s/it]\u001b[A\n",
"Iteration: 95% 1103/1166 [20:15<01:13, 1.17s/it]\u001b[A\n",
"Iteration: 95% 1104/1166 [20:16<01:10, 1.14s/it]\u001b[A\n",
"Iteration: 95% 1105/1166 [20:17<01:08, 1.12s/it]\u001b[A\n",
"Iteration: 95% 1106/1166 [20:18<01:06, 1.11s/it]\u001b[A\n",
"Iteration: 95% 1107/1166 [20:19<01:05, 1.11s/it]\u001b[A\n",
"Iteration: 95% 1108/1166 [20:20<01:03, 1.10s/it]\u001b[A\n",
"Iteration: 95% 1109/1166 [20:21<01:02, 1.09s/it]\u001b[A\n",
"Iteration: 95% 1110/1166 [20:22<01:00, 1.09s/it]\u001b[A\n",
"Iteration: 95% 1111/1166 [20:23<00:59, 1.09s/it]\u001b[A\n",
"Iteration: 95% 1112/1166 [20:25<00:58, 1.08s/it]\u001b[A\n",
"Iteration: 95% 1113/1166 [20:26<00:57, 1.08s/it]\u001b[A\n",
"Iteration: 96% 1114/1166 [20:27<00:56, 1.08s/it]\u001b[A\n",
"Iteration: 96% 1115/1166 [20:28<00:55, 1.08s/it]\u001b[A\n",
"Iteration: 96% 1116/1166 [20:29<00:53, 1.08s/it]\u001b[A\n",
"Iteration: 96% 1117/1166 [20:30<00:53, 1.08s/it]\u001b[A\n",
"Iteration: 96% 1118/1166 [20:31<00:52, 1.08s/it]\u001b[A\n",
"Iteration: 96% 1119/1166 [20:32<00:50, 1.08s/it]\u001b[A\n",
"Iteration: 96% 1120/1166 [20:33<00:49, 1.09s/it]\u001b[A\n",
"Iteration: 96% 1121/1166 [20:34<00:48, 1.09s/it]\u001b[A\n",
"Iteration: 96% 1122/1166 [20:35<00:47, 1.09s/it]\u001b[A\n",
"Iteration: 96% 1123/1166 [20:36<00:46, 1.09s/it]\u001b[A\n",
"Iteration: 96% 1124/1166 [20:38<00:45, 1.08s/it]\u001b[A\n",
"Iteration: 96% 1125/1166 [20:39<00:44, 1.08s/it]\u001b[A\n",
"Iteration: 97% 1126/1166 [20:40<00:43, 1.08s/it]\u001b[A\n",
"Iteration: 97% 1127/1166 [20:41<00:42, 1.08s/it]\u001b[A\n",
"Iteration: 97% 1128/1166 [20:42<00:41, 1.08s/it]\u001b[A\n",
"Iteration: 97% 1129/1166 [20:43<00:40, 1.08s/it]\u001b[A\n",
"Iteration: 97% 1130/1166 [20:44<00:38, 1.08s/it]\u001b[A\n",
"Iteration: 97% 1131/1166 [20:45<00:37, 1.08s/it]\u001b[A\n",
"Iteration: 97% 1132/1166 [20:46<00:36, 1.08s/it]\u001b[A\n",
"Iteration: 97% 1133/1166 [20:47<00:35, 1.08s/it]\u001b[A\n",
"Iteration: 97% 1134/1166 [20:48<00:34, 1.08s/it]\u001b[A\n",
"Iteration: 97% 1135/1166 [20:49<00:33, 1.09s/it]\u001b[A\n",
"Iteration: 97% 1136/1166 [20:51<00:32, 1.09s/it]\u001b[A\n",
"Iteration: 98% 1137/1166 [20:52<00:31, 1.09s/it]\u001b[A\n",
"Iteration: 98% 1138/1166 [20:53<00:30, 1.08s/it]\u001b[A\n",
"Iteration: 98% 1139/1166 [20:54<00:29, 1.08s/it]\u001b[A\n",
"Iteration: 98% 1140/1166 [20:55<00:28, 1.08s/it]\u001b[A\n",
"Iteration: 98% 1141/1166 [20:56<00:27, 1.08s/it]\u001b[A\n",
"Iteration: 98% 1142/1166 [20:57<00:26, 1.08s/it]\u001b[A\n",
"Iteration: 98% 1143/1166 [20:58<00:24, 1.08s/it]\u001b[A\n",
"Iteration: 98% 1144/1166 [20:59<00:23, 1.08s/it]\u001b[A\n",
"Iteration: 98% 1145/1166 [21:00<00:22, 1.08s/it]\u001b[A\n",
"Iteration: 98% 1146/1166 [21:01<00:21, 1.08s/it]\u001b[A\n",
"Iteration: 98% 1147/1166 [21:02<00:20, 1.08s/it]\u001b[A\n",
"Iteration: 98% 1148/1166 [21:04<00:19, 1.09s/it]\u001b[A\n",
"Iteration: 99% 1149/1166 [21:05<00:18, 1.09s/it]\u001b[A10/29/2019 16:31:53 - INFO - transformers.configuration_utils - Configuration saved in output/checkpoint-1150/config.json\n",
"10/29/2019 16:31:54 - INFO - transformers.modeling_utils - Model weights saved in output/checkpoint-1150/pytorch_model.bin\n",
"10/29/2019 16:31:54 - INFO - __main__ - Saving model checkpoint to output/checkpoint-1150\n",
"\n",
"Iteration: 99% 1150/1166 [21:07<00:21, 1.37s/it]\u001b[A\n",
"Iteration: 99% 1151/1166 [21:08<00:19, 1.28s/it]\u001b[A\n",
"Iteration: 99% 1152/1166 [21:09<00:17, 1.22s/it]\u001b[A\n",
"Iteration: 99% 1153/1166 [21:10<00:15, 1.18s/it]\u001b[A\n",
"Iteration: 99% 1154/1166 [21:11<00:13, 1.15s/it]\u001b[A\n",
"Iteration: 99% 1155/1166 [21:12<00:12, 1.13s/it]\u001b[A\n",
"Iteration: 99% 1156/1166 [21:13<00:11, 1.11s/it]\u001b[A\n",
"Iteration: 99% 1157/1166 [21:14<00:09, 1.10s/it]\u001b[A\n",
"Iteration: 99% 1158/1166 [21:15<00:08, 1.10s/it]\u001b[A\n",
"Iteration: 99% 1159/1166 [21:16<00:07, 1.09s/it]\u001b[A\n",
"Iteration: 99% 1160/1166 [21:17<00:06, 1.09s/it]\u001b[A\n",
"Iteration: 100% 1161/1166 [21:19<00:05, 1.09s/it]\u001b[A\n",
"Iteration: 100% 1162/1166 [21:20<00:04, 1.09s/it]\u001b[A\n",
"Iteration: 100% 1163/1166 [21:21<00:03, 1.09s/it]\u001b[A\n",
"Iteration: 100% 1164/1166 [21:22<00:02, 1.09s/it]\u001b[A\n",
"Iteration: 100% 1165/1166 [21:23<00:01, 1.08s/it]\u001b[A\n",
"Iteration: 100% 1166/1166 [21:24<00:00, 1.09s/it]\u001b[A\n",
"Epoch: 100% 1/1 [21:24<00:00, 1284.50s/it]\n",
"10/29/2019 16:32:12 - INFO - __main__ - global_step = 1166, average loss = 2.82379139267151\n",
"10/29/2019 16:32:12 - INFO - __main__ - Saving model checkpoint to output\n",
"10/29/2019 16:32:12 - INFO - transformers.configuration_utils - Configuration saved in output/config.json\n",
"10/29/2019 16:32:13 - INFO - transformers.modeling_utils - Model weights saved in output/pytorch_model.bin\n",
"10/29/2019 16:32:13 - INFO - transformers.configuration_utils - loading configuration file output/config.json\n",
"10/29/2019 16:32:13 - INFO - transformers.configuration_utils - Model config {\n",
" \"attention_probs_dropout_prob\": 0.1,\n",
" \"finetuning_task\": null,\n",
" \"hidden_act\": \"gelu\",\n",
" \"hidden_dropout_prob\": 0.1,\n",
" \"hidden_size\": 768,\n",
" \"initializer_range\": 0.02,\n",
" \"intermediate_size\": 3072,\n",
" \"layer_norm_eps\": 1e-12,\n",
" \"max_position_embeddings\": 512,\n",
" \"num_attention_heads\": 12,\n",
" \"num_hidden_layers\": 12,\n",
" \"num_labels\": 2,\n",
" \"output_attentions\": false,\n",
" \"output_hidden_states\": false,\n",
" \"output_past\": true,\n",
" \"pruned_heads\": {},\n",
" \"torchscript\": false,\n",
" \"type_vocab_size\": 2,\n",
" \"use_bfloat16\": false,\n",
" \"vocab_size\": 28996\n",
"}\n",
"\n",
"10/29/2019 16:32:13 - INFO - transformers.modeling_utils - loading weights file output/pytorch_model.bin\n",
"10/29/2019 16:32:16 - INFO - transformers.tokenization_utils - Model name 'output' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). Assuming 'output' is a path or url to a directory containing tokenizer files.\n",
"10/29/2019 16:32:16 - INFO - transformers.tokenization_utils - loading file output/vocab.txt\n",
"10/29/2019 16:32:16 - INFO - transformers.tokenization_utils - loading file output/added_tokens.json\n",
"10/29/2019 16:32:16 - INFO - transformers.tokenization_utils - loading file output/special_tokens_map.json\n",
"10/29/2019 16:32:16 - INFO - transformers.tokenization_utils - loading file output/tokenizer_config.json\n",
"10/29/2019 16:32:17 - INFO - __main__ - Evaluate the following checkpoints: ['output']\n",
"10/29/2019 16:32:17 - INFO - transformers.configuration_utils - loading configuration file output/config.json\n",
"10/29/2019 16:32:17 - INFO - transformers.configuration_utils - Model config {\n",
" \"attention_probs_dropout_prob\": 0.1,\n",
" \"finetuning_task\": null,\n",
" \"hidden_act\": \"gelu\",\n",
" \"hidden_dropout_prob\": 0.1,\n",
" \"hidden_size\": 768,\n",
" \"initializer_range\": 0.02,\n",
" \"intermediate_size\": 3072,\n",
" \"layer_norm_eps\": 1e-12,\n",
" \"max_position_embeddings\": 512,\n",
" \"num_attention_heads\": 12,\n",
" \"num_hidden_layers\": 12,\n",
" \"num_labels\": 2,\n",
" \"output_attentions\": false,\n",
" \"output_hidden_states\": false,\n",
" \"output_past\": true,\n",
" \"pruned_heads\": {},\n",
" \"torchscript\": false,\n",
" \"type_vocab_size\": 2,\n",
" \"use_bfloat16\": false,\n",
" \"vocab_size\": 28996\n",
"}\n",
"\n",
"10/29/2019 16:32:17 - INFO - transformers.modeling_utils - loading weights file output/pytorch_model.bin\n",
"10/29/2019 16:32:20 - INFO - __main__ - Creating features from dataset file at wikitext-2-raw\n",
"10/29/2019 16:32:24 - WARNING - transformers.tokenization_utils - Token indices sequence length is longer than the specified maximum sequence length for this model (281595 > 512). Running this sequence through the model will result in indexing errors\n",
"10/29/2019 16:32:24 - INFO - __main__ - Saving features into cached file wikitext-2-raw/cached_lm_510_wiki.test.raw\n",
"10/29/2019 16:32:24 - INFO - __main__ - ***** Running evaluation *****\n",
"10/29/2019 16:32:24 - INFO - __main__ - Num examples = 552\n",
"10/29/2019 16:32:24 - INFO - __main__ - Batch size = 4\n",
"Evaluating: 100% 138/138 [00:48<00:00, 2.83it/s]\n",
"10/29/2019 16:33:13 - INFO - __main__ - ***** Eval results *****\n",
"10/29/2019 16:33:13 - INFO - __main__ - perplexity = tensor(11.5040)\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
""
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hMKSB6vdpcs7",
"colab_type": "text"
},
"source": [
"# Conclusion\n",
"This notebook walked through the basic steps for finetuning a BERT language model on WikiText-2 using the HuggingFace Transformers library. After a single epoch of finetuning (1,166 steps), the model reached an evaluation perplexity of about 11.5 on the WikiText-2 test set.\n"
]
}
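,
{
"cell_type": "markdown",
"metadata": {
"id": "reload-finetuned-model",
"colab_type": "text"
},
"source": [
"As a next step, the finetuned model and tokenizer saved to the `output` directory can be reloaded for inference. A minimal sketch, assuming the default `output` directory used by the training run above:\n",
"\n",
"```python\n",
"from transformers import BertForMaskedLM, BertTokenizer\n",
"\n",
"# Reload the finetuned weights (pytorch_model.bin / config.json)\n",
"# and the tokenizer files saved alongside them\n",
"model = BertForMaskedLM.from_pretrained(\"output\")\n",
"tokenizer = BertTokenizer.from_pretrained(\"output\")\n",
"model.eval()\n",
"```\n"
]
}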
]
}