@sachinprasadhs
Created June 2, 2023 23:39
text_extraction_with_bert
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/sachinprasadhs/fcbb6443a69682a66ef30643b88751af/text_extraction_with_bert.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EuEqGLM4Y0NO"
},
"source": [
"# BERT (from HuggingFace Transformers) for Text Extraction\n",
"\n",
"**Author:** [Apoorv Nandan](https://twitter.com/NandanApoorv)<br>\n",
"**Date created:** 2020/05/23<br>\n",
"**Last modified:** 2020/05/23<br>\n",
"**Description:** Fine-tune a pretrained BERT model from HuggingFace Transformers on SQuAD."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_a-2M9vaY0NP"
},
"source": [
"## Introduction\n",
"\n",
"This demonstration uses SQuAD (the Stanford Question Answering Dataset).\n",
"In SQuAD, each input consists of a question and a paragraph that provides context.\n",
"The goal is to find the span of text in the paragraph that answers the question.\n",
"We evaluate our performance on this data with the \"Exact Match\" metric,\n",
"which measures the percentage of predictions that exactly match any one of the\n",
"ground-truth answers.\n",
"\n",
"We fine-tune a BERT model to perform this task as follows:\n",
"\n",
"1. Feed the context and the question as inputs to BERT.\n",
"2. Introduce two trainable vectors, S and T, with the same dimension as\n",
"   BERT's hidden states.\n",
"3. Compute the probability of each token being the start and end of\n",
"   the answer span. The probability of a token being the start of\n",
"   the answer is given by a dot product between S and the representation\n",
"   of the token in the last layer of BERT, followed by a softmax over all tokens.\n",
"   The probability of a token being the end of the answer is computed\n",
"   similarly with the vector T.\n",
"4. Fine-tune BERT and learn S and T along the way.\n",
"\n",
"**References:**\n",
"\n",
"- [BERT](https://arxiv.org/pdf/1810.04805.pdf)\n",
"- [SQuAD](https://arxiv.org/abs/1606.05250)\n"
]
},
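{
"cell_type": "markdown",
"metadata": {},
"source": [
"Step 3 can be written out explicitly. The following is a sketch in notation introduced\n",
"here for illustration (it does not appear in the code): $h_i$ denotes the last-layer BERT\n",
"representation of token $i$, and $s^*$ and $e^*$ denote the ground-truth start and end\n",
"token indices. It assumes, as in the model built later in this notebook, that the softmax\n",
"runs over all `max_len` input positions.\n",
"\n",
"$$\n",
"P_{\\text{start}}(i) = \\frac{\\exp(S \\cdot h_i)}{\\sum_{j} \\exp(S \\cdot h_j)},\n",
"\\qquad\n",
"P_{\\text{end}}(i) = \\frac{\\exp(T \\cdot h_i)}{\\sum_{j} \\exp(T \\cdot h_j)}\n",
"$$\n",
"\n",
"Fine-tuning in step 4 then minimizes, for each example, the sum of the two cross-entropy\n",
"terms:\n",
"\n",
"$$\n",
"\\mathcal{L} = -\\log P_{\\text{start}}(s^*) - \\log P_{\\text{end}}(e^*)\n",
"$$\n"
]
},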
{
"cell_type": "markdown",
"metadata": {
"id": "E68W1bk2Y0NQ"
},
"source": [
"## Setup\n"
]
},
{
"cell_type": "code",
"source": [
"!pip install tokenizers"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Zch6yfY0ZHQg",
"outputId": "4336c394-7ac4-430e-ffbd-f89574c544b8"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting tokenizers\n",
"  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)\n",
"     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 52.2 MB/s eta 0:00:00\n",
"Installing collected packages: tokenizers\n",
"Successfully installed tokenizers-0.13.3\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"!pip install transformers"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "v7YmqmDtZTLR",
"outputId": "8d62fabe-b24e-4348-b173-7c46d3f3ec81"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting transformers\n",
"  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)\n",
"     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.1/7.1 MB 72.7 MB/s eta 0:00:00\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers) (3.12.0)\n",
"Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)\n",
"  Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)\n",
"     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 236.8/236.8 kB 24.4 MB/s eta 0:00:00\n",
"Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (1.22.4)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers) (23.1)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (6.0)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (2022.10.31)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers) (2.27.1)\n",
"Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.13.3)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers) (4.65.0)\n",
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers) (2023.4.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers) (4.5.0)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2022.12.7)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2.0.12)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (3.4)\n",
"Installing collected packages: huggingface-hub, transformers\n",
"Successfully installed huggingface-hub-0.15.1 transformers-4.29.2\n"
]
}
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iIeyICk6Y0NQ"
},
"outputs": [],
"source": [
"import os\n",
"import re\n",
"import json\n",
"import string\n",
"import numpy as np\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"from tensorflow.keras import layers\n",
"from tokenizers import BertWordPieceTokenizer\n",
"from transformers import BertTokenizer, TFBertModel, BertConfig\n",
"\n",
"max_len = 384\n",
"configuration = BertConfig() # default parameters and configuration for BERT\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8GwVusvPY0NR"
},
"source": [
"## Set up the BERT tokenizer\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 113,
"referenced_widgets": [
"e4a808732f6c4ac4ac487341b686b37f",
"9243987366b744c98f91d4a63af78805",
"bf0d306776ee4a5eb379036af7dfe379",
"8462611c3d104099817ffb35dba4061c",
"1f452ce7c5f6436d9241fc33ba6945ef",
"1756226c70eb4db0b43248e4bd8c08d2",
"00793ffc8afe4f86af96425d4f1d3f4d",
"1517433ed9cc4e0887fbe173211115f7",
"afeecb13d6114259a019f55515cd95c3",
"5e0f2df21127479599a68bbbb40d646d",
"9f17e2353e1844cea1fc2e5d65b31d54",
"425f65a6f9d74bd4bb188bd7d4ba05de",
"83ff7651e90a4bb6bd25ecde82955c8b",
"8ac7a110e60c4b5e801b410a828189b0",
"7adcf3232cea4f078aff5f501c8f6f83",
"e52306ee04a74975bc2691b1590d5189",
"58247ce5d13b4915b7a17166847a8a99",
"d773f7585dc0417d9591546740dad354",
"dfabba81045640bdb0315f1a44021b8b",
"9c543426bc394cbe94d96b71a754fe4a",
"e9c005c73e594a708419606d961f4673",
"558a2f5425364a26b69367b2e73cca97",
"f941258f2b0a47f6b18ee10770e1ff92",
"c098b6845b114f038bdf9c5935cbfa86",
"9cd6f9fab4fe4a529f24c1f38364cab7",
"4a0db2e5ba0546988989716f92b8da20",
"6c648d7fca464c0a994f7c40ca1c26f8",
"a4ace0e1bf5d4e89a9b461abb21cfbb7",
"32922d38f2ec4e95a77021419690b7dd",
"d07a8cebdfd344b9a59503090ac106c9",
"998732048ae64fd28b1eba6705923077",
"d6755a7286f64a13929877067872be31",
"cede7502cc884d2ab09880a0ba528ba9"
]
},
"id": "ygSFoShXY0NR",
"outputId": "2b9de9a5-7d3e-4f7e-e880-58df47309641"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"Downloading (…)solve/main/vocab.txt: 0%| | 0.00/232k [00:00<?, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "e4a808732f6c4ac4ac487341b686b37f"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"Downloading (…)okenizer_config.json: 0%| | 0.00/28.0 [00:00<?, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "425f65a6f9d74bd4bb188bd7d4ba05de"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"Downloading (…)lve/main/config.json: 0%| | 0.00/570 [00:00<?, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "f941258f2b0a47f6b18ee10770e1ff92"
}
},
"metadata": {}
}
],
"source": [
"# Save the slow pretrained tokenizer\n",
"slow_tokenizer = BertTokenizer.from_pretrained(\"bert-base-uncased\")\n",
"save_path = \"bert_base_uncased/\"\n",
"if not os.path.exists(save_path):\n",
"    os.makedirs(save_path)\n",
"slow_tokenizer.save_pretrained(save_path)\n",
"\n",
"# Load the fast tokenizer from saved file\n",
"tokenizer = BertWordPieceTokenizer(\"bert_base_uncased/vocab.txt\", lowercase=True)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "C7qNaCMbY0NR"
},
"source": [
"## Load the data\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "f8jfSRtkY0NR",
"outputId": "aacf80df-3ec8-40c3-db26-c7aeaf805a14"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Downloading data from https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json\n",
"30288272/30288272 [==============================] - 0s 0us/step\n",
"Downloading data from https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json\n",
"4854279/4854279 [==============================] - 0s 0us/step\n"
]
}
],
"source": [
"train_data_url = \"https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json\"\n",
"train_path = keras.utils.get_file(\"train.json\", train_data_url)\n",
"eval_data_url = \"https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json\"\n",
"eval_path = keras.utils.get_file(\"eval.json\", eval_data_url)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ebJmmKQEY0NS"
},
"source": [
"## Preprocess the data\n",
"\n",
"1. Go through the JSON file and store every record as a `SquadExample` object.\n",
"2. Go through each `SquadExample` and create `x_train, y_train, x_eval, y_eval`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "BCJYjtX_Y0NS",
"outputId": "67d68ca7-27ff-4e34-f49f-96ed5d4dcaa6"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"87599 training points created.\n",
"10570 evaluation points created.\n"
]
}
],
"source": [
"\n",
"class SquadExample:\n",
"    def __init__(self, question, context, start_char_idx, answer_text, all_answers):\n",
"        self.question = question\n",
"        self.context = context\n",
"        self.start_char_idx = start_char_idx\n",
"        self.answer_text = answer_text\n",
"        self.all_answers = all_answers\n",
"        self.skip = False\n",
"\n",
"    def preprocess(self):\n",
"        context = self.context\n",
"        question = self.question\n",
"        answer_text = self.answer_text\n",
"        start_char_idx = self.start_char_idx\n",
"\n",
"        # Clean context, answer and question\n",
"        context = \" \".join(str(context).split())\n",
"        question = \" \".join(str(question).split())\n",
"        answer = \" \".join(str(answer_text).split())\n",
"\n",
"        # Find end character index of answer in context\n",
"        end_char_idx = start_char_idx + len(answer)\n",
"        if end_char_idx >= len(context):\n",
"            self.skip = True\n",
"            return\n",
"\n",
"        # Mark the character indexes in context that are in answer\n",
"        is_char_in_ans = [0] * len(context)\n",
"        for idx in range(start_char_idx, end_char_idx):\n",
"            is_char_in_ans[idx] = 1\n",
"\n",
"        # Tokenize context\n",
"        tokenized_context = tokenizer.encode(context)\n",
"\n",
"        # Find tokens that were created from answer characters\n",
"        ans_token_idx = []\n",
"        for idx, (start, end) in enumerate(tokenized_context.offsets):\n",
"            if sum(is_char_in_ans[start:end]) > 0:\n",
"                ans_token_idx.append(idx)\n",
"\n",
"        if len(ans_token_idx) == 0:\n",
"            self.skip = True\n",
"            return\n",
"\n",
"        # Find start and end token index for tokens from answer\n",
"        start_token_idx = ans_token_idx[0]\n",
"        end_token_idx = ans_token_idx[-1]\n",
"\n",
"        # Tokenize question\n",
"        tokenized_question = tokenizer.encode(question)\n",
"\n",
"        # Create inputs\n",
"        input_ids = tokenized_context.ids + tokenized_question.ids[1:]\n",
"        token_type_ids = [0] * len(tokenized_context.ids) + [1] * len(\n",
"            tokenized_question.ids[1:]\n",
"        )\n",
"        attention_mask = [1] * len(input_ids)\n",
"\n",
"        # Pad and create attention masks.\n",
"        # Skip if truncation is needed\n",
"        padding_length = max_len - len(input_ids)\n",
"        if padding_length > 0:  # pad\n",
"            input_ids = input_ids + ([0] * padding_length)\n",
"            attention_mask = attention_mask + ([0] * padding_length)\n",
"            token_type_ids = token_type_ids + ([0] * padding_length)\n",
"        elif padding_length < 0:  # skip\n",
"            self.skip = True\n",
"            return\n",
"\n",
"        self.input_ids = input_ids\n",
"        self.token_type_ids = token_type_ids\n",
"        self.attention_mask = attention_mask\n",
"        self.start_token_idx = start_token_idx\n",
"        self.end_token_idx = end_token_idx\n",
"        self.context_token_to_char = tokenized_context.offsets\n",
"\n",
"\n",
"with open(train_path) as f:\n",
"    raw_train_data = json.load(f)\n",
"\n",
"with open(eval_path) as f:\n",
"    raw_eval_data = json.load(f)\n",
"\n",
"\n",
"def create_squad_examples(raw_data):\n",
"    squad_examples = []\n",
"    for item in raw_data[\"data\"]:\n",
"        for para in item[\"paragraphs\"]:\n",
"            context = para[\"context\"]\n",
"            for qa in para[\"qas\"]:\n",
"                question = qa[\"question\"]\n",
"                answer_text = qa[\"answers\"][0][\"text\"]\n",
"                all_answers = [answer[\"text\"] for answer in qa[\"answers\"]]\n",
"                start_char_idx = qa[\"answers\"][0][\"answer_start\"]\n",
"                squad_eg = SquadExample(\n",
"                    question, context, start_char_idx, answer_text, all_answers\n",
"                )\n",
"                squad_eg.preprocess()\n",
"                squad_examples.append(squad_eg)\n",
"    return squad_examples\n",
"\n",
"\n",
"def create_inputs_targets(squad_examples):\n",
"    dataset_dict = {\n",
"        \"input_ids\": [],\n",
"        \"token_type_ids\": [],\n",
"        \"attention_mask\": [],\n",
"        \"start_token_idx\": [],\n",
"        \"end_token_idx\": [],\n",
"    }\n",
"    for item in squad_examples:\n",
"        if not item.skip:\n",
"            for key in dataset_dict:\n",
"                dataset_dict[key].append(getattr(item, key))\n",
"    for key in dataset_dict:\n",
"        dataset_dict[key] = np.array(dataset_dict[key])\n",
"\n",
"    x = [\n",
"        dataset_dict[\"input_ids\"],\n",
"        dataset_dict[\"token_type_ids\"],\n",
"        dataset_dict[\"attention_mask\"],\n",
"    ]\n",
"    y = [dataset_dict[\"start_token_idx\"], dataset_dict[\"end_token_idx\"]]\n",
"    return x, y\n",
"\n",
"\n",
"train_squad_examples = create_squad_examples(raw_train_data)\n",
"x_train, y_train = create_inputs_targets(train_squad_examples)\n",
"print(f\"{len(train_squad_examples)} training points created.\")\n",
"\n",
"eval_squad_examples = create_squad_examples(raw_eval_data)\n",
"x_eval, y_eval = create_inputs_targets(eval_squad_examples)\n",
"print(f\"{len(eval_squad_examples)} evaluation points created.\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ydtBkX7_Y0NS"
},
"source": [
"## Create the question-answering model using BERT and the Functional API\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "onxVYkizY0NS"
},
"outputs": [],
"source": [
"\n",
"def create_model():\n",
"    ## BERT encoder\n",
"    encoder = TFBertModel.from_pretrained(\"bert-base-uncased\")\n",
"\n",
"    ## QA Model\n",
"    input_ids = layers.Input(shape=(max_len,), dtype=tf.int32)\n",
"    token_type_ids = layers.Input(shape=(max_len,), dtype=tf.int32)\n",
"    attention_mask = layers.Input(shape=(max_len,), dtype=tf.int32)\n",
"    embedding = encoder(\n",
"        input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask\n",
"    )[0]\n",
"\n",
"    start_logits = layers.Dense(1, name=\"start_logit\", use_bias=False)(embedding)\n",
"    start_logits = layers.Flatten()(start_logits)\n",
"\n",
"    end_logits = layers.Dense(1, name=\"end_logit\", use_bias=False)(embedding)\n",
"    end_logits = layers.Flatten()(end_logits)\n",
"\n",
"    start_probs = layers.Activation(keras.activations.softmax)(start_logits)\n",
"    end_probs = layers.Activation(keras.activations.softmax)(end_logits)\n",
"\n",
"    model = keras.Model(\n",
"        inputs=[input_ids, token_type_ids, attention_mask],\n",
"        outputs=[start_probs, end_probs],\n",
"    )\n",
"    loss = keras.losses.SparseCategoricalCrossentropy(from_logits=False)\n",
"    # `lr` is deprecated in Keras optimizers; use `learning_rate` instead\n",
"    optimizer = keras.optimizers.Adam(learning_rate=5e-5)\n",
"    model.compile(optimizer=optimizer, loss=[loss, loss])\n",
"    return model\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "27LU1a0gY0NS"
},
"source": [
"This code should preferably be run on a Google Colab TPU runtime.\n",
"With Colab TPUs, each epoch takes about 5-6 minutes.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 867,
"referenced_widgets": [
"ac93cc27897046419a2d3c03edcc13f6",
"dec42a9a3dfb4bc39986b7211636c800",
"55aab1e665bf4305bb179b93b3f6d41f",
"304460330c7144a8bf95c5f6914eefe0",
"aba541ecb4084eea86d2b324a5c612a0",
"4d323d2f502c4c8390a727a7372d094d",
"75a76ad0d7cd4e66bf0ca928bbdef27d",
"44eb535189a141d288af6382c64c5ac4",
"2faa75ede49f48f8a6c55424145f310e",
"1944ec456ab649ae93ec7503b88760f4",
"d990e391d75f4b17a8772c170c38dab7"
]
},
"id": "-HbgPYjgY0NS",
"outputId": "44558d0c-ea7b-44dd-b577-cc3d85eecc7d"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"Downloading tf_model.h5: 0%| | 0.00/536M [00:00<?, ?B/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "ac93cc27897046419a2d3c03edcc13f6"
}
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']\n",
"- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.\n",
"If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.\n",
"WARNING:absl:`lr` is deprecated in Keras optimizer, please use `learning_rate` or use the legacy optimizer, e.g.,tf.keras.optimizers.legacy.Adam.\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Model: \"model\"\n",
"__________________________________________________________________________________________________\n",
" Layer (type)                   Output Shape         Param #     Connected to\n",
"==================================================================================================\n",
" input_1 (InputLayer)           [(None, 384)]        0           []\n",
"\n",
" input_3 (InputLayer)           [(None, 384)]        0           []\n",
"\n",
" input_2 (InputLayer)           [(None, 384)]        0           []\n",
"\n",
" tf_bert_model (TFBertModel)    TFBaseModelOutputWi  109482240   ['input_1[0][0]',\n",
"                                thPoolingAndCrossAt               'input_3[0][0]',\n",
"                                tentions(last_hidde               'input_2[0][0]']\n",
"                                n_state=(None, 384,\n",
"                                 768),\n",
"                                 pooler_output=(Non\n",
"                                e, 768),\n",
"                                 past_key_values=No\n",
"                                ne, hidden_states=N\n",
"                                one, attentions=Non\n",
"                                e, cross_attentions\n",
"                                =None)\n",
"\n",
" start_logit (Dense)            (None, 384, 1)       768         ['tf_bert_model[0][0]']\n",
"\n",
" end_logit (Dense)              (None, 384, 1)       768         ['tf_bert_model[0][0]']\n",
"\n",
" flatten (Flatten)              (None, 384)          0           ['start_logit[0][0]']\n",
"\n",
" flatten_1 (Flatten)            (None, 384)          0           ['end_logit[0][0]']\n",
"\n",
" activation (Activation)        (None, 384)          0           ['flatten[0][0]']\n",
"\n",
" activation_1 (Activation)      (None, 384)          0           ['flatten_1[0][0]']\n",
" \n",
"==================================================================================================\n",
"Total params: 109,483,776\n",
"Trainable params: 109,483,776\n",
"Non-trainable params: 0\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"use_tpu = False\n",
"if use_tpu:\n",
"    # Create distribution strategy\n",
"    tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()\n",
"    strategy = tf.distribute.TPUStrategy(tpu)\n",
"\n",
"    # Create model\n",
"    with strategy.scope():\n",
"        model = create_model()\n",
"else:\n",
"    model = create_model()\n",
"\n",
"model.summary()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9i5mMUeUY0NT"
},
"source": [
"## Create evaluation Callback\n",
"\n",
"This callback will compute the exact match score using the validation data\n",
"after every epoch.\n"
]
},
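{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of what the callback computes, in notation introduced here for illustration:\n",
"let $N$ be the number of evaluation examples that were not skipped during preprocessing,\n",
"$\\hat{a}_i$ the text span predicted for example $i$, $A_i$ its set of ground-truth answers,\n",
"and $\\operatorname{norm}$ the text normalization applied to both sides before comparison.\n",
"\n",
"$$\n",
"\\text{EM} = \\frac{1}{N} \\sum_{i=1}^{N} \\mathbf{1}\\left[\\operatorname{norm}(\\hat{a}_i) \\in \\lbrace \\operatorname{norm}(a) : a \\in A_i \\rbrace\\right]\n",
"$$\n",
"\n",
"An example whose predicted start index falls outside the context tokens counts as a miss.\n"
]
},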
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pmxIA8PlY0NT"
},
"outputs": [],
"source": [
"\n",
"def normalize_text(text):\n",
"    text = text.lower()\n",
"\n",
"    # Remove punctuation\n",
"    exclude = set(string.punctuation)\n",
"    text = \"\".join(ch for ch in text if ch not in exclude)\n",
"\n",
"    # Remove articles\n",
"    regex = re.compile(r\"\\b(a|an|the)\\b\", re.UNICODE)\n",
"    text = re.sub(regex, \" \", text)\n",
"\n",
"    # Remove extra white space\n",
"    text = \" \".join(text.split())\n",
"    return text\n",
"\n",
"\n",
"class ExactMatch(keras.callbacks.Callback):\n",
"    \"\"\"\n",
"    Each `SquadExample` object contains the character level offsets for each token\n",
"    in its input paragraph. We use them to get back the span of text corresponding\n",
"    to the tokens between our predicted start and end tokens.\n",
"    All the ground-truth answers are also present in each `SquadExample` object.\n",
"    We calculate the percentage of data points where the span of text obtained\n",
"    from model predictions matches one of the ground-truth answers.\n",
"    \"\"\"\n",
"\n",
"    def __init__(self, x_eval, y_eval):\n",
"        super().__init__()  # initialize the base Callback state\n",
"        self.x_eval = x_eval\n",
"        self.y_eval = y_eval\n",
"\n",
"    def on_epoch_end(self, epoch, logs=None):\n",
"        pred_start, pred_end = self.model.predict(self.x_eval)\n",
"        count = 0\n",
"        eval_examples_no_skip = [eg for eg in eval_squad_examples if not eg.skip]\n",
"        for idx, (start, end) in enumerate(zip(pred_start, pred_end)):\n",
"            squad_eg = eval_examples_no_skip[idx]\n",
"            offsets = squad_eg.context_token_to_char\n",
"            start = np.argmax(start)\n",
"            end = np.argmax(end)\n",
"            if start >= len(offsets):\n",
"                continue\n",
"            pred_char_start = offsets[start][0]\n",
"            if end < len(offsets):\n",
"                pred_char_end = offsets[end][1]\n",
"                pred_ans = squad_eg.context[pred_char_start:pred_char_end]\n",
"            else:\n",
"                pred_ans = squad_eg.context[pred_char_start:]\n",
"\n",
"            normalized_pred_ans = normalize_text(pred_ans)\n",
"            normalized_true_ans = [normalize_text(ans) for ans in squad_eg.all_answers]\n",
"            if normalized_pred_ans in normalized_true_ans:\n",
"                count += 1\n",
"        acc = count / len(self.y_eval[0])\n",
"        print(f\"\\nepoch={epoch+1}, exact match score={acc:.2f}\")\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Li2Mv9uhY0NT"
},
"source": [
"## Train and Evaluate\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "HuA4NyRVY0NT",
"outputId": "9e58c04b-d516-486f-ab18-b64e4a38d46f"
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_model/bert/pooler/dense/kernel:0', 'tf_bert_model/bert/pooler/dense/bias:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?\n"
]
}
],
"source": [
"exact_match_callback = ExactMatch(x_eval, y_eval)\n",
"model.fit(\n",
"    x_train,\n",
"    y_train,\n",
"    epochs=1,  # 1 epoch for a quick demonstration; 3 epochs are recommended\n",
"    verbose=2,\n",
"    batch_size=64,\n",
"    callbacks=[exact_match_callback],\n",
")\n"
]
}
],
"metadata": {
"colab": {
"name": "text_extraction_with_bert",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"e4a808732f6c4ac4ac487341b686b37f": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_9243987366b744c98f91d4a63af78805",
"IPY_MODEL_bf0d306776ee4a5eb379036af7dfe379",
"IPY_MODEL_8462611c3d104099817ffb35dba4061c"
],
"layout": "IPY_MODEL_1f452ce7c5f6436d9241fc33ba6945ef"
}
},
"9243987366b744c98f91d4a63af78805": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_1756226c70eb4db0b43248e4bd8c08d2",
"placeholder": "​",
"style": "IPY_MODEL_00793ffc8afe4f86af96425d4f1d3f4d",
"value": "Downloading (…)solve/main/vocab.txt: 100%"
}
},
"bf0d306776ee4a5eb379036af7dfe379": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_1517433ed9cc4e0887fbe173211115f7",
"max": 231508,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_afeecb13d6114259a019f55515cd95c3",
"value": 231508
}
},
"8462611c3d104099817ffb35dba4061c": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_5e0f2df21127479599a68bbbb40d646d",
"placeholder": "​",
"style": "IPY_MODEL_9f17e2353e1844cea1fc2e5d65b31d54",
"value": " 232k/232k [00:00&lt;00:00, 1.78MB/s]"
}
},
"1f452ce7c5f6436d9241fc33ba6945ef": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"1756226c70eb4db0b43248e4bd8c08d2": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"00793ffc8afe4f86af96425d4f1d3f4d": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"1517433ed9cc4e0887fbe173211115f7": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"afeecb13d6114259a019f55515cd95c3": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"5e0f2df21127479599a68bbbb40d646d": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"9f17e2353e1844cea1fc2e5d65b31d54": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"425f65a6f9d74bd4bb188bd7d4ba05de": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_83ff7651e90a4bb6bd25ecde82955c8b",
"IPY_MODEL_8ac7a110e60c4b5e801b410a828189b0",
"IPY_MODEL_7adcf3232cea4f078aff5f501c8f6f83"
],
"layout": "IPY_MODEL_e52306ee04a74975bc2691b1590d5189"
}
},
"83ff7651e90a4bb6bd25ecde82955c8b": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_58247ce5d13b4915b7a17166847a8a99",
"placeholder": "​",
"style": "IPY_MODEL_d773f7585dc0417d9591546740dad354",
"value": "Downloading (…)okenizer_config.json: 100%"
}
},
"8ac7a110e60c4b5e801b410a828189b0": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_dfabba81045640bdb0315f1a44021b8b",
"max": 28,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_9c543426bc394cbe94d96b71a754fe4a",
"value": 28
}
},
"7adcf3232cea4f078aff5f501c8f6f83": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_e9c005c73e594a708419606d961f4673",
"placeholder": "​",
"style": "IPY_MODEL_558a2f5425364a26b69367b2e73cca97",
"value": " 28.0/28.0 [00:00&lt;00:00, 1.53kB/s]"
}
},
"e52306ee04a74975bc2691b1590d5189": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"58247ce5d13b4915b7a17166847a8a99": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"d773f7585dc0417d9591546740dad354": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"dfabba81045640bdb0315f1a44021b8b": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"9c543426bc394cbe94d96b71a754fe4a": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"e9c005c73e594a708419606d961f4673": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"558a2f5425364a26b69367b2e73cca97": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"f941258f2b0a47f6b18ee10770e1ff92": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_c098b6845b114f038bdf9c5935cbfa86",
"IPY_MODEL_9cd6f9fab4fe4a529f24c1f38364cab7",
"IPY_MODEL_4a0db2e5ba0546988989716f92b8da20"
],
"layout": "IPY_MODEL_6c648d7fca464c0a994f7c40ca1c26f8"
}
},
"c098b6845b114f038bdf9c5935cbfa86": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_a4ace0e1bf5d4e89a9b461abb21cfbb7",
"placeholder": "​",
"style": "IPY_MODEL_32922d38f2ec4e95a77021419690b7dd",
"value": "Downloading (…)lve/main/config.json: 100%"
}
},
"9cd6f9fab4fe4a529f24c1f38364cab7": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_d07a8cebdfd344b9a59503090ac106c9",
"max": 570,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_998732048ae64fd28b1eba6705923077",
"value": 570
}
},
"4a0db2e5ba0546988989716f92b8da20": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_d6755a7286f64a13929877067872be31",
"placeholder": "​",
"style": "IPY_MODEL_cede7502cc884d2ab09880a0ba528ba9",
"value": " 570/570 [00:00&lt;00:00, 42.6kB/s]"
}
},
"6c648d7fca464c0a994f7c40ca1c26f8": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"a4ace0e1bf5d4e89a9b461abb21cfbb7": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"32922d38f2ec4e95a77021419690b7dd": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"d07a8cebdfd344b9a59503090ac106c9": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"998732048ae64fd28b1eba6705923077": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"d6755a7286f64a13929877067872be31": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"cede7502cc884d2ab09880a0ba528ba9": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"ac93cc27897046419a2d3c03edcc13f6": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_dec42a9a3dfb4bc39986b7211636c800",
"IPY_MODEL_55aab1e665bf4305bb179b93b3f6d41f",
"IPY_MODEL_304460330c7144a8bf95c5f6914eefe0"
],
"layout": "IPY_MODEL_aba541ecb4084eea86d2b324a5c612a0"
}
},
"dec42a9a3dfb4bc39986b7211636c800": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_4d323d2f502c4c8390a727a7372d094d",
"placeholder": "​",
"style": "IPY_MODEL_75a76ad0d7cd4e66bf0ca928bbdef27d",
"value": "Downloading tf_model.h5: 100%"
}
},
"55aab1e665bf4305bb179b93b3f6d41f": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_44eb535189a141d288af6382c64c5ac4",
"max": 536063208,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_2faa75ede49f48f8a6c55424145f310e",
"value": 536063208
}
},
"304460330c7144a8bf95c5f6914eefe0": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_1944ec456ab649ae93ec7503b88760f4",
"placeholder": "​",
"style": "IPY_MODEL_d990e391d75f4b17a8772c170c38dab7",
"value": " 536M/536M [00:08&lt;00:00, 54.9MB/s]"
}
},
"aba541ecb4084eea86d2b324a5c612a0": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"4d323d2f502c4c8390a727a7372d094d": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"75a76ad0d7cd4e66bf0ca928bbdef27d": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"44eb535189a141d288af6382c64c5ac4": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"2faa75ede49f48f8a6c55424145f310e": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"1944ec456ab649ae93ec7503b88760f4": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"d990e391d75f4b17a8772c170c38dab7": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
}
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}