@patternproject
Created March 14, 2020 20:09
Beta.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Beta.ipynb",
"provenance": [],
"toc_visible": true,
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/patternproject/03ad6163ca015913b485ae83605240dc/alpha.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QazQ6ZyQR03W",
"colab_type": "text"
},
"source": [
"Manning LP \n",
"\"Classifying Customer Feedback with Imbalanced Text Data\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TnLbS4_ubvC0",
"colab_type": "text"
},
"source": [
"\n",
"### 1. Import Libraries"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GzZHmO34jaXy",
"colab_type": "text"
},
"source": [
"Text classification with TensorFlow Hub: Movie reviews\n",
"SRC: https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/keras/text_classification_with_hub.ipynb#scrollTo=ItXfxkxvosLH"
]
},
{
"cell_type": "code",
"metadata": {
"id": "IEhr9gjObxBN",
"colab_type": "code",
"colab": {}
},
"source": [
"from __future__ import absolute_import, division, print_function\n",
"import os\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import re"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "fEzDs82C_dHu",
"colab_type": "code",
"outputId": "3aaf1098-afea-4b03-b3c1-535d6caae4db",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 552
}
},
"source": [
"pip install tensorflow-gpu==2.0.0-rc0"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Collecting tensorflow-gpu==2.0.0-rc0\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/6a/12/8c64cc62149cc21c70c55018502831bbf4d42bd62bed196df7de6830d21b/tensorflow_gpu-2.0.0rc0-cp36-cp36m-manylinux2010_x86_64.whl (380.5MB)\n",
"\u001b[K |████████████████████████████████| 380.5MB 42kB/s \n",
"\u001b[?25hRequirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (1.12.0)\n",
"Collecting tf-estimator-nightly<1.14.0.dev2019080602,>=1.14.0.dev2019080601\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/21/28/f2a27a62943d5f041e4a6fd404b2d21cb7c59b2242a4e73b03d9ba166552/tf_estimator_nightly-1.14.0.dev2019080601-py2.py3-none-any.whl (501kB)\n",
"\u001b[K |████████████████████████████████| 501kB 44.7MB/s \n",
"\u001b[?25hRequirement already satisfied: keras-preprocessing>=1.0.5 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (1.1.0)\n",
"Requirement already satisfied: gast>=0.2.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (0.2.2)\n",
"Requirement already satisfied: google-pasta>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (0.1.8)\n",
"Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (3.1.0)\n",
"Requirement already satisfied: wrapt>=1.11.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (1.11.2)\n",
"Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (1.27.1)\n",
"Requirement already satisfied: wheel>=0.26 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (0.34.2)\n",
"Requirement already satisfied: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (0.8.1)\n",
"Collecting tb-nightly<1.15.0a20190807,>=1.15.0a20190806\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/bc/88/24b5fb7280e74c7cf65bde47c171547fd02afb3840cff41bcbe9270650f5/tb_nightly-1.15.0a20190806-py3-none-any.whl (4.3MB)\n",
"\u001b[K |████████████████████████████████| 4.3MB 35.9MB/s \n",
"\u001b[?25hRequirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (1.1.0)\n",
"Requirement already satisfied: keras-applications>=1.0.8 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (1.0.8)\n",
"Requirement already satisfied: absl-py>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (0.9.0)\n",
"Requirement already satisfied: numpy<2.0,>=1.16.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (1.17.5)\n",
"Requirement already satisfied: protobuf>=3.6.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu==2.0.0-rc0) (3.10.0)\n",
"Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.6/dist-packages (from tb-nightly<1.15.0a20190807,>=1.15.0a20190806->tensorflow-gpu==2.0.0-rc0) (45.2.0)\n",
"Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tb-nightly<1.15.0a20190807,>=1.15.0a20190806->tensorflow-gpu==2.0.0-rc0) (1.0.0)\n",
"Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tb-nightly<1.15.0a20190807,>=1.15.0a20190806->tensorflow-gpu==2.0.0-rc0) (3.2.1)\n",
"Requirement already satisfied: h5py in /usr/local/lib/python3.6/dist-packages (from keras-applications>=1.0.8->tensorflow-gpu==2.0.0-rc0) (2.8.0)\n",
"Installing collected packages: tf-estimator-nightly, tb-nightly, tensorflow-gpu\n",
"Successfully installed tb-nightly-1.15.0a20190806 tensorflow-gpu-2.0.0rc0 tf-estimator-nightly-1.14.0.dev2019080601\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1ZdS5rXK_hi6",
"colab_type": "text"
},
"source": [
"Check the installed TensorFlow version."
]
},
{
"cell_type": "code",
"metadata": {
"id": "DQzffr3F_j4P",
"colab_type": "code",
"outputId": "7e34532b-34a7-40e4-b221-86c18592ad40",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"import tensorflow as tf\n",
"print(tf.__version__)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"2.0.0-rc0\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Wh5bzmvFeQiy",
"colab_type": "text"
},
"source": [
"### 2. Load Data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b7wst2p-_zA1",
"colab_type": "text"
},
"source": [
"Load the IMDB review data as NumPy arrays. The dataset is already split into training and test sets, each of which is divided into data (`x`) and labels (`y`)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "QEyfIT25_5tm",
"colab_type": "code",
"outputId": "ddb1a0ca-c6b2-4495-96f3-87d3ea54e036",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 52
}
},
"source": [
"(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(\n",
" path='imdb.npz',\n",
" num_words=None,\n",
" skip_top=0,\n",
" maxlen=None,\n",
" seed=113,\n",
" start_char=1,\n",
" oov_char=2,\n",
" index_from=3\n",
")"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz\n",
"17465344/17464789 [==============================] - 0s 0us/step\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vBKlOT_-pQcD",
"colab_type": "text"
},
"source": [
"### 3. Explore Data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "i3b08buIATM5",
"colab_type": "text"
},
"source": [
"Examine the data type with Python's built-in `type` function."
]
},
{
"cell_type": "code",
"metadata": {
"id": "RjOG2xVfsjIO",
"colab_type": "code",
"outputId": "1e49f12e-98ab-4360-9bda-f4b7c1047b9f",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"type(x_train)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"numpy.ndarray"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "A0sVgExDAdTe",
"colab_type": "text"
},
"source": [
"Examine the array's dimensions with NumPy's `shape` attribute."
]
},
{
"cell_type": "code",
"metadata": {
"id": "7zPPILlYAekD",
"colab_type": "code",
"outputId": "27be9a31-2f37-4ff9-9fe9-00fc99d3888d",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"x_train.shape"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(25000,)"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ctjvEtgeAiwY",
"colab_type": "text"
},
"source": [
"Let us take a look at the content."
]
},
{
"cell_type": "code",
"metadata": {
"id": "aubigeBqAmwG",
"colab_type": "code",
"outputId": "2524e653-239b-409f-e848-5a1687707913",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 178
}
},
"source": [
"x_train"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),\n",
" list([1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 8163, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 6853, 5, 163, 11, 3215, 10156, 4, 1153, 9, 194, 775, 7, 8255, 11596, 349, 2637, 148, 605, 15358, 8003, 15, 123, 125, 68, 23141, 6853, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 36893, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 8255, 5, 25249, 656, 245, 2350, 5, 4, 9837, 131, 152, 491, 18, 46151, 32, 7464, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]),\n",
" list([1, 14, 47, 8, 30, 31, 7, 4, 249, 108, 7, 4, 5974, 54, 61, 369, 13, 71, 149, 14, 22, 112, 4, 2401, 311, 12, 16, 3711, 33, 75, 43, 1829, 296, 4, 86, 320, 35, 534, 19, 263, 4821, 1301, 4, 1873, 33, 89, 78, 12, 66, 16, 4, 360, 7, 4, 58, 316, 334, 11, 4, 1716, 43, 645, 662, 8, 257, 85, 1200, 42, 1228, 2578, 83, 68, 3912, 15, 36, 165, 1539, 278, 36, 69, 44076, 780, 8, 106, 14, 6905, 1338, 18, 6, 22, 12, 215, 28, 610, 40, 6, 87, 326, 23, 2300, 21, 23, 22, 12, 272, 40, 57, 31, 11, 4, 22, 47, 6, 2307, 51, 9, 170, 23, 595, 116, 595, 1352, 13, 191, 79, 638, 89, 51428, 14, 9, 8, 106, 607, 624, 35, 534, 6, 227, 7, 129, 113]),\n",
" ...,\n",
" list([1, 11, 6, 230, 245, 6401, 9, 6, 1225, 446, 86527, 45, 2174, 84, 8322, 4007, 21, 4, 912, 84, 14532, 325, 725, 134, 15271, 1715, 84, 5, 36, 28, 57, 1099, 21, 8, 140, 8, 703, 5, 11656, 84, 56, 18, 1644, 14, 9, 31, 7, 4, 9406, 1209, 2295, 26094, 1008, 18, 6, 20, 207, 110, 563, 12, 8, 2901, 17793, 8, 97, 6, 20, 53, 4767, 74, 4, 460, 364, 1273, 29, 270, 11, 960, 108, 45, 40, 29, 2961, 395, 11, 6, 4065, 500, 7, 14492, 89, 364, 70, 29, 140, 4, 64, 4780, 11, 4, 2678, 26, 178, 4, 529, 443, 17793, 5, 27, 710, 117, 74936, 8123, 165, 47, 84, 37, 131, 818, 14, 595, 10, 10, 61, 1242, 1209, 10, 10, 288, 2260, 1702, 34, 2901, 17793, 4, 65, 496, 4, 231, 7, 790, 5, 6, 320, 234, 2766, 234, 1119, 1574, 7, 496, 4, 139, 929, 2901, 17793, 7750, 5, 4241, 18, 4, 8497, 13164, 250, 11, 1818, 7561, 4, 4217, 5408, 747, 1115, 372, 1890, 1006, 541, 9303, 7, 4, 59, 11027, 4, 3586, 22459]),\n",
" list([1, 1446, 7079, 69, 72, 3305, 13, 610, 930, 8, 12, 582, 23, 5, 16, 484, 685, 54, 349, 11, 4120, 2959, 45, 58, 1466, 13, 197, 12, 16, 43, 23, 21469, 5, 62, 30, 145, 402, 11, 4131, 51, 575, 32, 61, 369, 71, 66, 770, 12, 1054, 75, 100, 2198, 8, 4, 105, 37, 69, 147, 712, 75, 3543, 44, 257, 390, 5, 69, 263, 514, 105, 50, 286, 1814, 23, 4, 123, 13, 161, 40, 5, 421, 4, 116, 16, 897, 13, 40691, 40, 319, 5872, 112, 6700, 11, 4803, 121, 25, 70, 3468, 4, 719, 3798, 13, 18, 31, 62, 40, 8, 7200, 4, 29455, 7, 14, 123, 5, 942, 25, 8, 721, 12, 145, 5, 202, 12, 160, 580, 202, 12, 6, 52, 58, 11418, 92, 401, 728, 12, 39, 14, 251, 8, 15, 251, 5, 21213, 12, 38, 84, 80, 124, 12, 9, 23]),\n",
" list([1, 17, 6, 194, 337, 7, 4, 204, 22, 45, 254, 8, 106, 14, 123, 4, 12815, 270, 14437, 5, 16923, 12255, 732, 2098, 101, 405, 39, 14, 1034, 4, 1310, 9, 115, 50, 305, 12, 47, 4, 168, 5, 235, 7, 38, 111, 699, 102, 7, 4, 4039, 9245, 9, 24, 6, 78, 1099, 17, 2345, 16553, 21, 27, 9685, 6139, 5, 29043, 1603, 92, 1183, 4, 1310, 7, 4, 204, 42, 97, 90, 35, 221, 109, 29, 127, 27, 118, 8, 97, 12, 157, 21, 6789, 85010, 9, 6, 66, 78, 1099, 4, 631, 1191, 5, 2642, 272, 191, 1070, 6, 7585, 8, 2197, 70907, 10755, 544, 5, 383, 1271, 848, 1468, 12183, 497, 16876, 8, 1597, 8778, 19280, 21, 60, 27, 239, 9, 43, 8368, 209, 405, 10, 10, 12, 764, 40, 4, 248, 20, 12, 16, 5, 174, 1791, 72, 7, 51, 6, 1739, 22, 4, 204, 131, 9])],\n",
" dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xfMEeyleiDTc",
"colab_type": "text"
},
"source": [
"### 4. Munge Data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "p4WtRR-GAvrB",
"colab_type": "text"
},
"source": [
"It appears that each element in the NumPy array is a list of integers. This suggests that each integer encodes a word, so a dictionary is required to map the integers back to the actual words."
]
},
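{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"A quick way to check this reading is to inspect a single element. The cell below is a minimal sketch added for illustration, not part of the original walkthrough; `first_review` is a hypothetical name, and the comments describe what each line prints rather than recorded output."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"# Illustrative sketch: look at one element of the training data.\n",
"first_review = x_train[0]\n",
"\n",
"print(type(first_review))  # container type of a single review\n",
"print(len(first_review))   # number of integer codes in the first review\n",
"print(first_review[:10])   # first ten integer codes\n",
"print(y_train[0])          # label attached to the first review"
],
"execution_count": null,
"outputs": []
},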
{
"cell_type": "code",
"metadata": {
"id": "28eZ3r1ZAuqQ",
"colab_type": "code",
"outputId": "9e7f3aa1-6f2d-4e71-8b2e-ebdfc1bf7ab8",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"x_test.shape"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(25000,)"
]
},
"metadata": {
"tags": []
},
"execution_count": 8
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lYWdySV2A9b3",
"colab_type": "text"
},
"source": [
"Let us load the word index provided by the dataset."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ja-ioFlEA-Ck",
"colab_type": "code",
"outputId": "1b4a7fc8-c816-424f-dc17-e88804a8564c",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 52
}
},
"source": [
"word_index = tf.keras.datasets.imdb.get_word_index(path='imdb_word_index.json')"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json\n",
"1646592/1641221 [==============================] - 0s 0us/step\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VGIlNfb_BCE4",
"colab_type": "text"
},
"source": [
"The word index is a dictionary, a data structure that stores key-value pairs; here each key is a word and each value is its integer index. Later we will use it as the basis for mapping integers back to words."
]
},
{
"cell_type": "code",
"metadata": {
"id": "p8gXp9iQBAe7",
"colab_type": "code",
"outputId": "486fe1ab-cb6f-4d69-af46-9a8545e335ee",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"type(word_index)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"dict"
]
},
"metadata": {
"tags": []
},
"execution_count": 10
}
]
},
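{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"As a preview of that mapping, the cell below is a minimal sketch that decodes the first training review. It assumes the `index_from=3` offset passed to `load_data` above, so each stored integer is the word's index plus 3, and the integers 0 to 2 are reserved for padding, start-of-sequence, and out-of-vocabulary markers. The names `reverse_word_index` and `decode_review` are hypothetical and not part of the original notebook."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"# Illustrative sketch: invert the word index so integers map back to words.\n",
"reverse_word_index = {index: word for word, index in word_index.items()}\n",
"\n",
"def decode_review(encoded_review, index_from=3):\n",
"    # Undo the index_from shift; reserved or unknown codes become '?'.\n",
"    return ' '.join(\n",
"        reverse_word_index.get(i - index_from, '?') for i in encoded_review\n",
"    )\n",
"\n",
"print(decode_review(x_train[0]))"
],
"execution_count": null,
"outputs": []
},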
{
"cell_type": "code",
"metadata": {
"id": "xD5F6zZsBD_k",
"colab_type": "code",
"outputId": "067481a0-da2b-4dfe-8bcc-7ff607bd8497",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"word_index"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'fawn': 34701,\n",
" 'tsukino': 52006,\n",
" 'nunnery': 52007,\n",
" 'sonja': 16816,\n",
" 'vani': 63951,\n",
" 'woods': 1408,\n",
" 'spiders': 16115,\n",
" 'hanging': 2345,\n",
" 'woody': 2289,\n",
" 'trawling': 52008,\n",
" \"hold's\": 52009,\n",
" 'comically': 11307,\n",
" 'localized': 40830,\n",
" 'disobeying': 30568,\n",
" \"'royale\": 52010,\n",
" \"harpo's\": 40831,\n",
" 'canet': 52011,\n",
" 'aileen': 19313,\n",
" 'acurately': 52012,\n",
" \"diplomat's\": 52013,\n",
" 'rickman': 25242,\n",
" 'arranged': 6746,\n",
" 'rumbustious': 52014,\n",
" 'familiarness': 52015,\n",
" \"spider'\": 52016,\n",
" 'hahahah': 68804,\n",
" \"wood'\": 52017,\n",
" 'transvestism': 40833,\n",
" \"hangin'\": 34702,\n",
" 'bringing': 2338,\n",
" 'seamier': 40834,\n",
" 'wooded': 34703,\n",
" 'bravora': 52018,\n",
" 'grueling': 16817,\n",
" 'wooden': 1636,\n",
" 'wednesday': 16818,\n",
" \"'prix\": 52019,\n",
" 'altagracia': 34704,\n",
" 'circuitry': 52020,\n",
" 'crotch': 11585,\n",
" 'busybody': 57766,\n",
" \"tart'n'tangy\": 52021,\n",
" 'burgade': 14129,\n",
" 'thrace': 52023,\n",
" \"tom's\": 11038,\n",
" 'snuggles': 52025,\n",
" 'francesco': 29114,\n",
" 'complainers': 52027,\n",
" 'templarios': 52125,\n",
" '272': 40835,\n",
" '273': 52028,\n",
" 'zaniacs': 52130,\n",
" '275': 34706,\n",
" 'consenting': 27631,\n",
" 'snuggled': 40836,\n",
" 'inanimate': 15492,\n",
" 'uality': 52030,\n",
" 'bronte': 11926,\n",
" 'errors': 4010,\n",
" 'dialogs': 3230,\n",
" \"yomada's\": 52031,\n",
" \"madman's\": 34707,\n",
" 'dialoge': 30585,\n",
" 'usenet': 52033,\n",
" 'videodrome': 40837,\n",
" \"kid'\": 26338,\n",
" 'pawed': 52034,\n",
" \"'girlfriend'\": 30569,\n",
" \"'pleasure\": 52035,\n",
" \"'reloaded'\": 52036,\n",
" \"kazakos'\": 40839,\n",
" 'rocque': 52037,\n",
" 'mailings': 52038,\n",
" 'brainwashed': 11927,\n",
" 'mcanally': 16819,\n",
" \"tom''\": 52039,\n",
" 'kurupt': 25243,\n",
" 'affiliated': 21905,\n",
" 'babaganoosh': 52040,\n",
" \"noe's\": 40840,\n",
" 'quart': 40841,\n",
" 'kids': 359,\n",
" 'uplifting': 5034,\n",
" 'controversy': 7093,\n",
" 'kida': 21906,\n",
" 'kidd': 23379,\n",
" \"error'\": 52041,\n",
" 'neurologist': 52042,\n",
" 'spotty': 18510,\n",
" 'cobblers': 30570,\n",
" 'projection': 9878,\n",
" 'fastforwarding': 40842,\n",
" 'sters': 52043,\n",
" \"eggar's\": 52044,\n",
" 'etherything': 52045,\n",
" 'gateshead': 40843,\n",
" 'airball': 34708,\n",
" 'unsinkable': 25244,\n",
" 'stern': 7180,\n",
" \"cervi's\": 52046,\n",
" 'dnd': 40844,\n",
" 'dna': 11586,\n",
" 'insecurity': 20598,\n",
" \"'reboot'\": 52047,\n",
" 'trelkovsky': 11037,\n",
" 'jaekel': 52048,\n",
" 'sidebars': 52049,\n",
" \"sforza's\": 52050,\n",
" 'distortions': 17633,\n",
" 'mutinies': 52051,\n",
" 'sermons': 30602,\n",
" '7ft': 40846,\n",
" 'boobage': 52052,\n",
" \"o'bannon's\": 52053,\n",
" 'populations': 23380,\n",
" 'chulak': 52054,\n",
" 'mesmerize': 27633,\n",
" 'quinnell': 52055,\n",
" 'yahoo': 10307,\n",
" 'meteorologist': 52057,\n",
" 'beswick': 42577,\n",
" 'boorman': 15493,\n",
" 'voicework': 40847,\n",
" \"ster'\": 52058,\n",
" 'blustering': 22922,\n",
" 'hj': 52059,\n",
" 'intake': 27634,\n",
" 'morally': 5621,\n",
" 'jumbling': 40849,\n",
" 'bowersock': 52060,\n",
" \"'porky's'\": 52061,\n",
" 'gershon': 16821,\n",
" 'ludicrosity': 40850,\n",
" 'coprophilia': 52062,\n",
" 'expressively': 40851,\n",
" \"india's\": 19500,\n",
" \"post's\": 34710,\n",
" 'wana': 52063,\n",
" 'wang': 5283,\n",
" 'wand': 30571,\n",
" 'wane': 25245,\n",
" 'edgeways': 52321,\n",
" 'titanium': 34711,\n",
" 'pinta': 40852,\n",
" 'want': 178,\n",
" 'pinto': 30572,\n",
" 'whoopdedoodles': 52065,\n",
" 'tchaikovsky': 21908,\n",
" 'travel': 2103,\n",
" \"'victory'\": 52066,\n",
" 'copious': 11928,\n",
" 'gouge': 22433,\n",
" \"chapters'\": 52067,\n",
" 'barbra': 6702,\n",
" 'uselessness': 30573,\n",
" \"wan'\": 52068,\n",
" 'assimilated': 27635,\n",
" 'petiot': 16116,\n",
" 'most\\x85and': 52069,\n",
" 'dinosaurs': 3930,\n",
" 'wrong': 352,\n",
" 'seda': 52070,\n",
" 'stollen': 52071,\n",
" 'sentencing': 34712,\n",
" 'ouroboros': 40853,\n",
" 'assimilates': 40854,\n",
" 'colorfully': 40855,\n",
" 'glenne': 27636,\n",
" 'dongen': 52072,\n",
" 'subplots': 4760,\n",
" 'kiloton': 52073,\n",
" 'chandon': 23381,\n",
" \"effect'\": 34713,\n",
" 'snugly': 27637,\n",
" 'kuei': 40856,\n",
" 'welcomed': 9092,\n",
" 'dishonor': 30071,\n",
" 'concurrence': 52075,\n",
" 'stoicism': 23382,\n",
" \"guys'\": 14896,\n",
" \"beroemd'\": 52077,\n",
" 'butcher': 6703,\n",
" \"melfi's\": 40857,\n",
" 'aargh': 30623,\n",
" 'playhouse': 20599,\n",
" 'wickedly': 11308,\n",
" 'fit': 1180,\n",
" 'labratory': 52078,\n",
" 'lifeline': 40859,\n",
" 'screaming': 1927,\n",
" 'fix': 4287,\n",
" 'cineliterate': 52079,\n",
" 'fic': 52080,\n",
" 'fia': 52081,\n",
" 'fig': 34714,\n",
" 'fmvs': 52082,\n",
" 'fie': 52083,\n",
" 'reentered': 52084,\n",
" 'fin': 30574,\n",
" 'doctresses': 52085,\n",
" 'fil': 52086,\n",
" 'zucker': 12606,\n",
" 'ached': 31931,\n",
" 'counsil': 52088,\n",
" 'paterfamilias': 52089,\n",
" 'songwriter': 13885,\n",
" 'shivam': 34715,\n",
" 'hurting': 9654,\n",
" 'effects': 299,\n",
" 'slauther': 52090,\n",
" \"'flame'\": 52091,\n",
" 'sommerset': 52092,\n",
" 'interwhined': 52093,\n",
" 'whacking': 27638,\n",
" 'bartok': 52094,\n",
" 'barton': 8775,\n",
" 'frewer': 21909,\n",
" \"fi'\": 52095,\n",
" 'ingrid': 6192,\n",
" 'stribor': 30575,\n",
" 'approporiately': 52096,\n",
" 'wobblyhand': 52097,\n",
" 'tantalisingly': 52098,\n",
" 'ankylosaurus': 52099,\n",
" 'parasites': 17634,\n",
" 'childen': 52100,\n",
" \"jenkins'\": 52101,\n",
" 'metafiction': 52102,\n",
" 'golem': 17635,\n",
" 'indiscretion': 40860,\n",
" \"reeves'\": 23383,\n",
" \"inamorata's\": 57781,\n",
" 'brittannica': 52104,\n",
" 'adapt': 7916,\n",
" \"russo's\": 30576,\n",
" 'guitarists': 48246,\n",
" 'abbott': 10553,\n",
" 'abbots': 40861,\n",
" 'lanisha': 17649,\n",
" 'magickal': 40863,\n",
" 'mattter': 52105,\n",
" \"'willy\": 52106,\n",
" 'pumpkins': 34716,\n",
" 'stuntpeople': 52107,\n",
" 'estimate': 30577,\n",
" 'ugghhh': 40864,\n",
" 'gameplay': 11309,\n",
" \"wern't\": 52108,\n",
" \"n'sync\": 40865,\n",
" 'sickeningly': 16117,\n",
" 'chiara': 40866,\n",
" 'disturbed': 4011,\n",
" 'portmanteau': 40867,\n",
" 'ineffectively': 52109,\n",
" \"duchonvey's\": 82143,\n",
" \"nasty'\": 37519,\n",
" 'purpose': 1285,\n",
" 'lazers': 52112,\n",
" 'lightened': 28105,\n",
" 'kaliganj': 52113,\n",
" 'popularism': 52114,\n",
" \"damme's\": 18511,\n",
" 'stylistics': 30578,\n",
" 'mindgaming': 52115,\n",
" 'spoilerish': 46449,\n",
" \"'corny'\": 52117,\n",
" 'boerner': 34718,\n",
" 'olds': 6792,\n",
" 'bakelite': 52118,\n",
" 'renovated': 27639,\n",
" 'forrester': 27640,\n",
" \"lumiere's\": 52119,\n",
" 'gaskets': 52024,\n",
" 'needed': 884,\n",
" 'smight': 34719,\n",
" 'master': 1297,\n",
" \"edie's\": 25905,\n",
" 'seeber': 40868,\n",
" 'hiya': 52120,\n",
" 'fuzziness': 52121,\n",
" 'genesis': 14897,\n",
" 'rewards': 12607,\n",
" 'enthrall': 30579,\n",
" \"'about\": 40869,\n",
" \"recollection's\": 52122,\n",
" 'mutilated': 11039,\n",
" 'fatherlands': 52123,\n",
" \"fischer's\": 52124,\n",
" 'positively': 5399,\n",
" '270': 34705,\n",
" 'ahmed': 34720,\n",
" 'zatoichi': 9836,\n",
" 'bannister': 13886,\n",
" 'anniversaries': 52127,\n",
" \"helm's\": 30580,\n",
" \"'work'\": 52128,\n",
" 'exclaimed': 34721,\n",
" \"'unfunny'\": 52129,\n",
" '274': 52029,\n",
" 'feeling': 544,\n",
" \"wanda's\": 52131,\n",
" 'dolan': 33266,\n",
" '278': 52133,\n",
" 'peacoat': 52134,\n",
" 'brawny': 40870,\n",
" 'mishra': 40871,\n",
" 'worlders': 40872,\n",
" 'protags': 52135,\n",
" 'skullcap': 52136,\n",
" 'dastagir': 57596,\n",
" 'affairs': 5622,\n",
" 'wholesome': 7799,\n",
" 'hymen': 52137,\n",
" 'paramedics': 25246,\n",
" 'unpersons': 52138,\n",
" 'heavyarms': 52139,\n",
" 'affaire': 52140,\n",
" 'coulisses': 52141,\n",
" 'hymer': 40873,\n",
" 'kremlin': 52142,\n",
" 'shipments': 30581,\n",
" 'pixilated': 52143,\n",
" \"'00s\": 30582,\n",
" 'diminishing': 18512,\n",
" 'cinematic': 1357,\n",
" 'resonates': 14898,\n",
" 'simplify': 40874,\n",
" \"nature'\": 40875,\n",
" 'temptresses': 40876,\n",
" 'reverence': 16822,\n",
" 'resonated': 19502,\n",
" 'dailey': 34722,\n",
" '2\\x85': 52144,\n",
" 'treize': 27641,\n",
" 'majo': 52145,\n",
" 'kiya': 21910,\n",
" 'woolnough': 52146,\n",
" 'thanatos': 39797,\n",
" 'sandoval': 35731,\n",
" 'dorama': 40879,\n",
" \"o'shaughnessy\": 52147,\n",
" 'tech': 4988,\n",
" 'fugitives': 32018,\n",
" 'teck': 30583,\n",
" \"'e'\": 76125,\n",
" 'doesn’t': 40881,\n",
" 'purged': 52149,\n",
" 'saying': 657,\n",
" \"martians'\": 41095,\n",
" 'norliss': 23418,\n",
" 'dickey': 27642,\n",
" 'dicker': 52152,\n",
" \"'sependipity\": 52153,\n",
" 'padded': 8422,\n",
" 'ordell': 57792,\n",
" \"sturges'\": 40882,\n",
" 'independentcritics': 52154,\n",
" 'tempted': 5745,\n",
" \"atkinson's\": 34724,\n",
" 'hounded': 25247,\n",
" 'apace': 52155,\n",
" 'clicked': 15494,\n",
" \"'humor'\": 30584,\n",
" \"martino's\": 17177,\n",
" \"'supporting\": 52156,\n",
" 'warmongering': 52032,\n",
" \"zemeckis's\": 34725,\n",
" 'lube': 21911,\n",
" 'shocky': 52157,\n",
" 'plate': 7476,\n",
" 'plata': 40883,\n",
" 'sturgess': 40884,\n",
" \"nerds'\": 40885,\n",
" 'plato': 20600,\n",
" 'plath': 34726,\n",
" 'platt': 40886,\n",
" 'mcnab': 52159,\n",
" 'clumsiness': 27643,\n",
" 'altogether': 3899,\n",
" 'massacring': 42584,\n",
" 'bicenntinial': 52160,\n",
" 'skaal': 40887,\n",
" 'droning': 14360,\n",
" 'lds': 8776,\n",
" 'jaguar': 21912,\n",
" \"cale's\": 34727,\n",
" 'nicely': 1777,\n",
" 'mummy': 4588,\n",
" \"lot's\": 18513,\n",
" 'patch': 10086,\n",
" 'kerkhof': 50202,\n",
" \"leader's\": 52161,\n",
" \"'movie\": 27644,\n",
" 'uncomfirmed': 52162,\n",
" 'heirloom': 40888,\n",
" 'wrangle': 47360,\n",
" 'emotion\\x85': 52163,\n",
" \"'stargate'\": 52164,\n",
" 'pinoy': 40889,\n",
" 'conchatta': 40890,\n",
" 'broeke': 41128,\n",
" 'advisedly': 40891,\n",
" \"barker's\": 17636,\n",
" 'descours': 52166,\n",
" 'lots': 772,\n",
" 'lotr': 9259,\n",
" 'irs': 9879,\n",
" 'lott': 52167,\n",
" 'xvi': 40892,\n",
" 'irk': 34728,\n",
" 'irl': 52168,\n",
" 'ira': 6887,\n",
" 'belzer': 21913,\n",
" 'irc': 52169,\n",
" 'ire': 27645,\n",
" 'requisites': 40893,\n",
" 'discipline': 7693,\n",
" 'lyoko': 52961,\n",
" 'extend': 11310,\n",
" 'nature': 873,\n",
" \"'dickie'\": 52170,\n",
" 'optimist': 40894,\n",
" 'lapping': 30586,\n",
" 'superficial': 3900,\n",
" 'vestment': 52171,\n",
" 'extent': 2823,\n",
" 'tendons': 52172,\n",
" \"heller's\": 52173,\n",
" 'quagmires': 52174,\n",
" 'miyako': 52175,\n",
" 'moocow': 20601,\n",
" \"coles'\": 52176,\n",
" 'lookit': 40895,\n",
" 'ravenously': 52177,\n",
" 'levitating': 40896,\n",
" 'perfunctorily': 52178,\n",
" 'lookin': 30587,\n",
" \"lot'\": 40898,\n",
" 'lookie': 52179,\n",
" 'fearlessly': 34870,\n",
" 'libyan': 52181,\n",
" 'fondles': 40899,\n",
" 'gopher': 35714,\n",
" 'wearying': 40901,\n",
" \"nz's\": 52182,\n",
" 'minuses': 27646,\n",
" 'puposelessly': 52183,\n",
" 'shandling': 52184,\n",
" 'decapitates': 31268,\n",
" 'humming': 11929,\n",
" \"'nother\": 40902,\n",
" 'smackdown': 21914,\n",
" 'underdone': 30588,\n",
" 'frf': 40903,\n",
" 'triviality': 52185,\n",
" 'fro': 25248,\n",
" 'bothers': 8777,\n",
" \"'kensington\": 52186,\n",
" 'much': 73,\n",
" 'muco': 34730,\n",
" 'wiseguy': 22615,\n",
" \"richie's\": 27648,\n",
" 'tonino': 40904,\n",
" 'unleavened': 52187,\n",
" 'fry': 11587,\n",
" \"'tv'\": 40905,\n",
" 'toning': 40906,\n",
" 'obese': 14361,\n",
" 'sensationalized': 30589,\n",
" 'spiv': 40907,\n",
" 'spit': 6259,\n",
" 'arkin': 7364,\n",
" 'charleton': 21915,\n",
" 'jeon': 16823,\n",
" 'boardroom': 21916,\n",
" 'doubts': 4989,\n",
" 'spin': 3084,\n",
" 'hepo': 53083,\n",
" 'wildcat': 27649,\n",
" 'venoms': 10584,\n",
" 'misconstrues': 52191,\n",
" 'mesmerising': 18514,\n",
" 'misconstrued': 40908,\n",
" 'rescinds': 52192,\n",
" 'prostrate': 52193,\n",
" 'majid': 40909,\n",
" 'climbed': 16479,\n",
" 'canoeing': 34731,\n",
" 'majin': 52195,\n",
" 'animie': 57804,\n",
" 'sylke': 40910,\n",
" 'conditioned': 14899,\n",
" 'waddell': 40911,\n",
" '3\\x85': 52196,\n",
" 'hyperdrive': 41188,\n",
" 'conditioner': 34732,\n",
" 'bricklayer': 53153,\n",
" 'hong': 2576,\n",
" 'memoriam': 52198,\n",
" 'inventively': 30592,\n",
" \"levant's\": 25249,\n",
" 'portobello': 20638,\n",
" 'remand': 52200,\n",
" 'mummified': 19504,\n",
" 'honk': 27650,\n",
" 'spews': 19505,\n",
" 'visitations': 40912,\n",
" 'mummifies': 52201,\n",
" 'cavanaugh': 25250,\n",
" 'zeon': 23385,\n",
" \"jungle's\": 40913,\n",
" 'viertel': 34733,\n",
" 'frenchmen': 27651,\n",
" 'torpedoes': 52202,\n",
" 'schlessinger': 52203,\n",
" 'torpedoed': 34734,\n",
" 'blister': 69876,\n",
" 'cinefest': 52204,\n",
" 'furlough': 34735,\n",
" 'mainsequence': 52205,\n",
" 'mentors': 40914,\n",
" 'academic': 9094,\n",
" 'stillness': 20602,\n",
" 'academia': 40915,\n",
" 'lonelier': 52206,\n",
" 'nibby': 52207,\n",
" \"losers'\": 52208,\n",
" 'cineastes': 40916,\n",
" 'corporate': 4449,\n",
" 'massaging': 40917,\n",
" 'bellow': 30593,\n",
" 'absurdities': 19506,\n",
" 'expetations': 53241,\n",
" 'nyfiken': 40918,\n",
" 'mehras': 75638,\n",
" 'lasse': 52209,\n",
" 'visability': 52210,\n",
" 'militarily': 33946,\n",
" \"elder'\": 52211,\n",
" 'gainsbourg': 19023,\n",
" 'hah': 20603,\n",
" 'hai': 13420,\n",
" 'haj': 34736,\n",
" 'hak': 25251,\n",
" 'hal': 4311,\n",
" 'ham': 4892,\n",
" 'duffer': 53259,\n",
" 'haa': 52213,\n",
" 'had': 66,\n",
" 'advancement': 11930,\n",
" 'hag': 16825,\n",
" \"hand'\": 25252,\n",
" 'hay': 13421,\n",
" 'mcnamara': 20604,\n",
" \"mozart's\": 52214,\n",
" 'duffel': 30731,\n",
" 'haq': 30594,\n",
" 'har': 13887,\n",
" 'has': 44,\n",
" 'hat': 2401,\n",
" 'hav': 40919,\n",
" 'haw': 30595,\n",
" 'figtings': 52215,\n",
" 'elders': 15495,\n",
" 'underpanted': 52216,\n",
" 'pninson': 52217,\n",
" 'unequivocally': 27652,\n",
" \"barbara's\": 23673,\n",
" \"bello'\": 52219,\n",
" 'indicative': 12997,\n",
" 'yawnfest': 40920,\n",
" 'hexploitation': 52220,\n",
" \"loder's\": 52221,\n",
" 'sleuthing': 27653,\n",
" \"justin's\": 32622,\n",
" \"'ball\": 52222,\n",
" \"'summer\": 52223,\n",
" \"'demons'\": 34935,\n",
" \"mormon's\": 52225,\n",
" \"laughton's\": 34737,\n",
" 'debell': 52226,\n",
" 'shipyard': 39724,\n",
" 'unabashedly': 30597,\n",
" 'disks': 40401,\n",
" 'crowd': 2290,\n",
" 'crowe': 10087,\n",
" \"vancouver's\": 56434,\n",
" 'mosques': 34738,\n",
" 'crown': 6627,\n",
" 'culpas': 52227,\n",
" 'crows': 27654,\n",
" 'surrell': 53344,\n",
" 'flowless': 52229,\n",
" 'sheirk': 52230,\n",
" \"'three\": 40923,\n",
" \"peterson'\": 52231,\n",
" 'ooverall': 52232,\n",
" 'perchance': 40924,\n",
" 'bottom': 1321,\n",
" 'chabert': 53363,\n",
" 'sneha': 52233,\n",
" 'inhuman': 13888,\n",
" 'ichii': 52234,\n",
" 'ursla': 52235,\n",
" 'completly': 30598,\n",
" 'moviedom': 40925,\n",
" 'raddick': 52236,\n",
" 'brundage': 51995,\n",
" 'brigades': 40926,\n",
" 'starring': 1181,\n",
" \"'goal'\": 52237,\n",
" 'caskets': 52238,\n",
" 'willcock': 52239,\n",
" \"threesome's\": 52240,\n",
" \"mosque'\": 52241,\n",
" \"cover's\": 52242,\n",
" 'spaceships': 17637,\n",
" 'anomalous': 40927,\n",
" 'ptsd': 27655,\n",
" 'shirdan': 52243,\n",
" 'obscenity': 21962,\n",
" 'lemmings': 30599,\n",
" 'duccio': 30600,\n",
" \"levene's\": 52244,\n",
" \"'gorby'\": 52245,\n",
" \"teenager's\": 25255,\n",
" 'marshall': 5340,\n",
" 'honeymoon': 9095,\n",
" 'shoots': 3231,\n",
" 'despised': 12258,\n",
" 'okabasho': 52246,\n",
" 'fabric': 8289,\n",
" 'cannavale': 18515,\n",
" 'raped': 3537,\n",
" \"tutt's\": 52247,\n",
" 'grasping': 17638,\n",
" 'despises': 18516,\n",
" \"thief's\": 40928,\n",
" 'rapes': 8926,\n",
" 'raper': 52248,\n",
" \"eyre'\": 27656,\n",
" 'walchek': 52249,\n",
" \"elmo's\": 23386,\n",
" 'perfumes': 40929,\n",
" 'spurting': 21918,\n",
" \"exposition'\\x85\": 52250,\n",
" 'denoting': 52251,\n",
" 'thesaurus': 34740,\n",
" \"shoot'\": 40930,\n",
" 'bonejack': 49759,\n",
" 'simpsonian': 52253,\n",
" 'hebetude': 30601,\n",
" \"hallow's\": 34741,\n",
" 'desperation\\x85': 52254,\n",
" 'incinerator': 34742,\n",
" 'congratulations': 10308,\n",
" 'humbled': 52255,\n",
" \"else's\": 5924,\n",
" 'trelkovski': 40845,\n",
" \"rape'\": 52256,\n",
" \"'chapters'\": 59386,\n",
" '1600s': 52257,\n",
" 'martian': 7253,\n",
" 'nicest': 25256,\n",
" 'eyred': 52259,\n",
" 'passenger': 9457,\n",
" 'disgrace': 6041,\n",
" 'moderne': 52260,\n",
" 'barrymore': 5120,\n",
" 'yankovich': 52261,\n",
" 'moderns': 40931,\n",
" 'studliest': 52262,\n",
" 'bedsheet': 52263,\n",
" 'decapitation': 14900,\n",
" 'slurring': 52264,\n",
" \"'nunsploitation'\": 52265,\n",
" \"'character'\": 34743,\n",
" 'cambodia': 9880,\n",
" 'rebelious': 52266,\n",
" 'pasadena': 27657,\n",
" 'crowne': 40932,\n",
" \"'bedchamber\": 52267,\n",
" 'conjectural': 52268,\n",
" 'appologize': 52269,\n",
" 'halfassing': 52270,\n",
" 'paycheque': 57816,\n",
" 'palms': 20606,\n",
" \"'islands\": 52271,\n",
" 'hawked': 40933,\n",
" 'palme': 21919,\n",
" 'conservatively': 40934,\n",
" 'larp': 64007,\n",
" 'palma': 5558,\n",
" 'smelling': 21920,\n",
" 'aragorn': 12998,\n",
" 'hawker': 52272,\n",
" 'hawkes': 52273,\n",
" 'explosions': 3975,\n",
" 'loren': 8059,\n",
" \"pyle's\": 52274,\n",
" 'shootout': 6704,\n",
" \"mike's\": 18517,\n",
" \"driscoll's\": 52275,\n",
" 'cogsworth': 40935,\n",
" \"britian's\": 52276,\n",
" 'childs': 34744,\n",
" \"portrait's\": 52277,\n",
" 'chain': 3626,\n",
" 'whoever': 2497,\n",
" 'puttered': 52278,\n",
" 'childe': 52279,\n",
" 'maywether': 52280,\n",
" 'chair': 3036,\n",
" \"rance's\": 52281,\n",
" 'machu': 34745,\n",
" 'ballet': 4517,\n",
" 'grapples': 34746,\n",
" 'summerize': 76152,\n",
" 'freelance': 30603,\n",
" \"andrea's\": 52283,\n",
" '\\x91very': 52284,\n",
" 'coolidge': 45879,\n",
" 'mache': 18518,\n",
" 'balled': 52285,\n",
" 'grappled': 40937,\n",
" 'macha': 18519,\n",
" 'underlining': 21921,\n",
" 'macho': 5623,\n",
" 'oversight': 19507,\n",
" 'machi': 25257,\n",
" 'verbally': 11311,\n",
" 'tenacious': 21922,\n",
" 'windshields': 40938,\n",
" 'paychecks': 18557,\n",
" 'jerk': 3396,\n",
" \"good'\": 11931,\n",
" 'prancer': 34748,\n",
" 'prances': 21923,\n",
" 'olympus': 52286,\n",
" 'lark': 21924,\n",
" 'embark': 10785,\n",
" 'gloomy': 7365,\n",
" 'jehaan': 52287,\n",
" 'turaqui': 52288,\n",
" \"child'\": 20607,\n",
" 'locked': 2894,\n",
" 'pranced': 52289,\n",
" 'exact': 2588,\n",
" 'unattuned': 52290,\n",
" 'minute': 783,\n",
" 'skewed': 16118,\n",
" 'hodgins': 40940,\n",
" 'skewer': 34749,\n",
" 'think\\x85': 52291,\n",
" 'rosenstein': 38765,\n",
" 'helmit': 52292,\n",
" 'wrestlemanias': 34750,\n",
" 'hindered': 16826,\n",
" \"martha's\": 30604,\n",
" 'cheree': 52293,\n",
" \"pluckin'\": 52294,\n",
" 'ogles': 40941,\n",
" 'heavyweight': 11932,\n",
" 'aada': 82190,\n",
" 'chopping': 11312,\n",
" 'strongboy': 61534,\n",
" 'hegemonic': 41342,\n",
" 'adorns': 40942,\n",
" 'xxth': 41346,\n",
" 'nobuhiro': 34751,\n",
" 'capitães': 52298,\n",
" 'kavogianni': 52299,\n",
" 'antwerp': 13422,\n",
" 'celebrated': 6538,\n",
" 'roarke': 52300,\n",
" 'baggins': 40943,\n",
" 'cheeseburgers': 31270,\n",
" 'matras': 52301,\n",
" \"nineties'\": 52302,\n",
" \"'craig'\": 52303,\n",
" 'celebrates': 12999,\n",
" 'unintentionally': 3383,\n",
" 'drafted': 14362,\n",
" 'climby': 52304,\n",
" '303': 52305,\n",
" 'oldies': 18520,\n",
" 'climbs': 9096,\n",
" 'honour': 9655,\n",
" 'plucking': 34752,\n",
" '305': 30074,\n",
" 'address': 5514,\n",
" 'menjou': 40944,\n",
" \"'freak'\": 42592,\n",
" 'dwindling': 19508,\n",
" 'benson': 9458,\n",
" 'white’s': 52307,\n",
" 'shamelessness': 40945,\n",
" 'impacted': 21925,\n",
" 'upatz': 52308,\n",
" 'cusack': 3840,\n",
" \"flavia's\": 37567,\n",
" 'effette': 52309,\n",
" 'influx': 34753,\n",
" 'boooooooo': 52310,\n",
" 'dimitrova': 52311,\n",
" 'houseman': 13423,\n",
" 'bigas': 25259,\n",
" 'boylen': 52312,\n",
" 'phillipenes': 52313,\n",
" 'fakery': 40946,\n",
" \"grandpa's\": 27658,\n",
" 'darnell': 27659,\n",
" 'undergone': 19509,\n",
" 'handbags': 52315,\n",
" 'perished': 21926,\n",
" 'pooped': 37778,\n",
" 'vigour': 27660,\n",
" 'opposed': 3627,\n",
" 'etude': 52316,\n",
" \"caine's\": 11799,\n",
" 'doozers': 52317,\n",
" 'photojournals': 34754,\n",
" 'perishes': 52318,\n",
" 'constrains': 34755,\n",
" 'migenes': 40948,\n",
" 'consoled': 30605,\n",
" 'alastair': 16827,\n",
" 'wvs': 52319,\n",
" 'ooooooh': 52320,\n",
" 'approving': 34756,\n",
" 'consoles': 40949,\n",
" 'disparagement': 52064,\n",
" 'futureistic': 52322,\n",
" 'rebounding': 52323,\n",
" \"'date\": 52324,\n",
" 'gregoire': 52325,\n",
" 'rutherford': 21927,\n",
" 'americanised': 34757,\n",
" 'novikov': 82196,\n",
" 'following': 1042,\n",
" 'munroe': 34758,\n",
" \"morita'\": 52326,\n",
" 'christenssen': 52327,\n",
" 'oatmeal': 23106,\n",
" 'fossey': 25260,\n",
" 'livered': 40950,\n",
" 'listens': 13000,\n",
" \"'marci\": 76164,\n",
" \"otis's\": 52330,\n",
" 'thanking': 23387,\n",
" 'maude': 16019,\n",
" 'extensions': 34759,\n",
" 'ameteurish': 52332,\n",
" \"commender's\": 52333,\n",
" 'agricultural': 27661,\n",
" 'convincingly': 4518,\n",
" 'fueled': 17639,\n",
" 'mahattan': 54014,\n",
" \"paris's\": 40952,\n",
" 'vulkan': 52336,\n",
" 'stapes': 52337,\n",
" 'odysessy': 52338,\n",
" 'harmon': 12259,\n",
" 'surfing': 4252,\n",
" 'halloran': 23494,\n",
" 'unbelieveably': 49580,\n",
" \"'offed'\": 52339,\n",
" 'quadrant': 30607,\n",
" 'inhabiting': 19510,\n",
" 'nebbish': 34760,\n",
" 'forebears': 40953,\n",
" 'skirmish': 34761,\n",
" 'ocassionally': 52340,\n",
" \"'resist\": 52341,\n",
" 'impactful': 21928,\n",
" 'spicier': 52342,\n",
" 'touristy': 40954,\n",
" \"'football'\": 52343,\n",
" 'webpage': 40955,\n",
" 'exurbia': 52345,\n",
" 'jucier': 52346,\n",
" 'professors': 14901,\n",
" 'structuring': 34762,\n",
" 'jig': 30608,\n",
" 'overlord': 40956,\n",
" 'disconnect': 25261,\n",
" 'sniffle': 82201,\n",
" 'slimeball': 40957,\n",
" 'jia': 40958,\n",
" 'milked': 16828,\n",
" 'banjoes': 40959,\n",
" 'jim': 1237,\n",
" 'workforces': 52348,\n",
" 'jip': 52349,\n",
" 'rotweiller': 52350,\n",
" 'mundaneness': 34763,\n",
" \"'ninja'\": 52351,\n",
" \"dead'\": 11040,\n",
" \"cipriani's\": 40960,\n",
" 'modestly': 20608,\n",
" \"professor'\": 52352,\n",
" 'shacked': 40961,\n",
" 'bashful': 34764,\n",
" 'sorter': 23388,\n",
" 'overpowering': 16120,\n",
" 'workmanlike': 18521,\n",
" 'henpecked': 27662,\n",
" 'sorted': 18522,\n",
" \"jōb's\": 52354,\n",
" \"'always\": 52355,\n",
" \"'baptists\": 34765,\n",
" 'dreamcatchers': 52356,\n",
" \"'silence'\": 52357,\n",
" 'hickory': 21929,\n",
" 'fun\\x97yet': 52358,\n",
" 'breakumentary': 52359,\n",
" 'didn': 15496,\n",
" 'didi': 52360,\n",
" 'pealing': 52361,\n",
" 'dispite': 40962,\n",
" \"italy's\": 25262,\n",
" 'instability': 21930,\n",
" 'quarter': 6539,\n",
" 'quartet': 12608,\n",
" 'padmé': 52362,\n",
" \"'bleedmedry\": 52363,\n",
" 'pahalniuk': 52364,\n",
" 'honduras': 52365,\n",
" 'bursting': 10786,\n",
" \"pablo's\": 41465,\n",
" 'irremediably': 52367,\n",
" 'presages': 40963,\n",
" 'bowlegged': 57832,\n",
" 'dalip': 65183,\n",
" 'entering': 6260,\n",
" 'newsradio': 76172,\n",
" 'presaged': 54150,\n",
" \"giallo's\": 27663,\n",
" 'bouyant': 40964,\n",
" 'amerterish': 52368,\n",
" 'rajni': 18523,\n",
" 'leeves': 30610,\n",
" 'macauley': 34767,\n",
" 'seriously': 612,\n",
" 'sugercoma': 52369,\n",
" 'grimstead': 52370,\n",
" \"'fairy'\": 52371,\n",
" 'zenda': 30611,\n",
" \"'twins'\": 52372,\n",
" 'realisation': 17640,\n",
" 'highsmith': 27664,\n",
" 'raunchy': 7817,\n",
" 'incentives': 40965,\n",
" 'flatson': 52374,\n",
" 'snooker': 35097,\n",
" 'crazies': 16829,\n",
" 'crazier': 14902,\n",
" 'grandma': 7094,\n",
" 'napunsaktha': 52375,\n",
" 'workmanship': 30612,\n",
" 'reisner': 52376,\n",
" \"sanford's\": 61306,\n",
" '\\x91doña': 52377,\n",
" 'modest': 6108,\n",
" \"everything's\": 19153,\n",
" 'hamer': 40966,\n",
" \"couldn't'\": 52379,\n",
" 'quibble': 13001,\n",
" 'socking': 52380,\n",
" 'tingler': 21931,\n",
" 'gutman': 52381,\n",
" 'lachlan': 40967,\n",
" 'tableaus': 52382,\n",
" 'headbanger': 52383,\n",
" 'spoken': 2847,\n",
" 'cerebrally': 34768,\n",
" \"'road\": 23490,\n",
" 'tableaux': 21932,\n",
" \"proust's\": 40968,\n",
" 'periodical': 40969,\n",
" \"shoveller's\": 52385,\n",
" 'tamara': 25263,\n",
" 'affords': 17641,\n",
" 'concert': 3249,\n",
" \"yara's\": 87955,\n",
" 'someome': 52386,\n",
" 'lingering': 8424,\n",
" \"abraham's\": 41511,\n",
" 'beesley': 34769,\n",
" 'cherbourg': 34770,\n",
" 'kagan': 28624,\n",
" 'snatch': 9097,\n",
" \"miyazaki's\": 9260,\n",
" 'absorbs': 25264,\n",
" \"koltai's\": 40970,\n",
" 'tingled': 64027,\n",
" 'crossroads': 19511,\n",
" 'rehab': 16121,\n",
" 'falworth': 52389,\n",
" 'sequals': 52390,\n",
" ...}"
]
},
"metadata": {
"tags": []
},
"execution_count": 11
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1Y_JHcUkBJKd",
"colab_type": "text"
},
"source": [
"We can also find out how many unique words this dictionary contains."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ofKbFQR9BGEl",
"colab_type": "code",
"outputId": "2e0d9e12-a1f6-47e4-c200-75ea6f85cf0c",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"len(word_index)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"88584"
]
},
"metadata": {
"tags": []
},
"execution_count": 12
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EeTNEtJwBMLt",
"colab_type": "text"
},
"source": [
"Now we know there are 88,584 unique words (tokens) in the IMDB dataset. Each word is associated with a unique integer, forming a key-value pair. The vocabulary therefore consists of 88,584 key-value pairs, stored in a Python dictionary. As examples, below are the words (tokens) whose index is less than 20:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "4DSf_0tBBKtH",
"colab_type": "code",
"outputId": "8fc28c9c-7b80-4f35-8028-45b1ab6776f2",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 354
}
},
"source": [
"{k:v for (k,v) in word_index.items() if v < 20}"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'a': 3,\n",
" 'and': 2,\n",
" 'as': 14,\n",
" 'br': 7,\n",
" 'but': 18,\n",
" 'film': 19,\n",
" 'for': 15,\n",
" 'i': 10,\n",
" 'in': 8,\n",
" 'is': 6,\n",
" 'it': 9,\n",
" 'movie': 17,\n",
" 'of': 4,\n",
" 'that': 12,\n",
" 'the': 1,\n",
" 'this': 11,\n",
" 'to': 5,\n",
" 'was': 13,\n",
" 'with': 16}"
]
},
"metadata": {
"tags": []
},
"execution_count": 13
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yUIom0-NBSif",
"colab_type": "text"
},
"source": [
"Now let's add a few special tokens for later use. This is common practice in NLP: it makes text sequences consistent by marking where each one starts, providing a token for words outside the dictionary, and providing a padding token so that all sequences can be brought to the same length. We shift the original word indices up by three positions and add the new tokens to the dictionary. We also reverse the key-value relationship to create a second dictionary for reverse lookup, and define a function `decode_review` that converts a sequence of integers back into words."
]
},
{
"cell_type": "code",
"metadata": {
"id": "liqZzBcOBNz9",
"colab_type": "code",
"colab": {}
},
"source": [
"# The first indices are reserved for special tokens\n",
"word_index = {k: (v + 3) for k, v in word_index.items()}\n",
"word_index[\"<PAD>\"] = 0\n",
"word_index[\"<START>\"] = 1\n",
"word_index[\"<UNK>\"] = 2  # unknown\n",
"word_index[\"<UNUSED>\"] = 3\n",
"\n",
"# Reverse lookup: integer index -> word\n",
"reverse_word_index = {value: key for (key, value) in word_index.items()}\n",
"\n",
"def decode_review(text):\n",
"    # Map each integer to its word; '?' marks indices missing from the dictionary\n",
"    return ' '.join([reverse_word_index.get(i, '?') for i in text])"
],
"execution_count": 0,
"outputs": []
},
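{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"A minimal sketch of the index shift on a tiny made-up vocabulary. The names `toy_word_index`, `toy_reverse_index` and `toy_decode` are hypothetical and exist only for this illustration; the steps mirror the cell above under that assumption."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"# Toy vocabulary (hypothetical); the real word_index has 88,584 entries\n",
"toy_word_index = {'the': 1, 'and': 2, 'a': 3}\n",
"\n",
"# Shift every index up by three to free 0-3 for the special tokens\n",
"toy_word_index = {k: v + 3 for k, v in toy_word_index.items()}\n",
"toy_word_index['<PAD>'] = 0\n",
"toy_word_index['<START>'] = 1\n",
"toy_word_index['<UNK>'] = 2\n",
"toy_word_index['<UNUSED>'] = 3\n",
"\n",
"# Reverse lookup: integer index -> word\n",
"toy_reverse_index = {v: k for k, v in toy_word_index.items()}\n",
"\n",
"def toy_decode(ids):\n",
"    # Indices missing from the dictionary are shown as '?'\n",
"    return ' '.join(toy_reverse_index.get(i, '?') for i in ids)\n",
"\n",
"print(toy_decode([1, 4, 6, 5, 99]))  # <START> the a and ?"
],
"execution_count": 0,
"outputs": []
},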
{
"cell_type": "code",
"metadata": {
"id": "HyrZPrwgBWgF",
"colab_type": "code",
"colab": {}
},
"source": [
"example1 = decode_review(x_train[0])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Jkka0q2RBXsU",
"colab_type": "code",
"colab": {}
},
"source": [
"new_x_train=x_train.reshape(len(x_train), 1)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "S-EDAP6NBYnV",
"colab_type": "code",
"outputId": "a6c8d7fc-4caa-457d-ae1e-6c1d00d80fc3",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"mylen = np.vectorize(len)\n",
"print(mylen(x_train))"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"[218 189 141 ... 184 150 153]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vGTzBi6LBfXD",
"colab_type": "text"
},
"source": [
"Now find the indices of the positive and negative reviews."
]
},
{
"cell_type": "code",
"metadata": {
"id": "EQMIKqblBZci",
"colab_type": "code",
"colab": {}
},
"source": [
"positive_index = np.where(y_train == 1) \n",
"negative_index = np.where(y_train == 0)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "8iibMRrWBhu0",
"colab_type": "code",
"colab": {}
},
"source": [
"positive_reviews = x_train[positive_index]\n",
"negative_reviews = x_train[negative_index]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "JKc4btE2Biuq",
"colab_type": "code",
"outputId": "dc962135-f1d9-488d-faff-ae19d8a49d01",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"np.shape(positive_reviews)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(12500,)"
]
},
"metadata": {
"tags": []
},
"execution_count": 20
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "R057R0rAoRxX",
"colab_type": "text"
},
"source": [
"## 5.Understanding Data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6JVEvikRCkWF",
"colab_type": "text"
},
"source": [
"Select 50% of the positive reviews and drop the rest (this sets the stage for later, where we oversample them)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "w_g1xsr3EP8E",
"colab_type": "text"
},
"source": [
"Taking stock of what exists\n",
"\n",
"* `dir()` returns the list of names in the current scope:\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "goPZ3OWBBjnT",
"colab_type": "code",
"outputId": "01fd164b-b06f-48e9-fdf3-72c674981caa",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"dir()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['In',\n",
" 'Out',\n",
" '_',\n",
" '_10',\n",
" '_11',\n",
" '_12',\n",
" '_13',\n",
" '_20',\n",
" '_5',\n",
" '_6',\n",
" '_7',\n",
" '_8',\n",
" '__',\n",
" '___',\n",
" '__builtin__',\n",
" '__builtins__',\n",
" '__doc__',\n",
" '__loader__',\n",
" '__name__',\n",
" '__package__',\n",
" '__spec__',\n",
" '_dh',\n",
" '_exit_code',\n",
" '_i',\n",
" '_i1',\n",
" '_i10',\n",
" '_i11',\n",
" '_i12',\n",
" '_i13',\n",
" '_i14',\n",
" '_i15',\n",
" '_i16',\n",
" '_i17',\n",
" '_i18',\n",
" '_i19',\n",
" '_i2',\n",
" '_i20',\n",
" '_i21',\n",
" '_i3',\n",
" '_i4',\n",
" '_i5',\n",
" '_i6',\n",
" '_i7',\n",
" '_i8',\n",
" '_i9',\n",
" '_ih',\n",
" '_ii',\n",
" '_iii',\n",
" '_oh',\n",
" '_sh',\n",
" 'absolute_import',\n",
" 'decode_review',\n",
" 'division',\n",
" 'example1',\n",
" 'exit',\n",
" 'get_ipython',\n",
" 'mylen',\n",
" 'negative_index',\n",
" 'negative_reviews',\n",
" 'new_x_train',\n",
" 'np',\n",
" 'os',\n",
" 'plt',\n",
" 'positive_index',\n",
" 'positive_reviews',\n",
" 'print_function',\n",
" 'quit',\n",
" 're',\n",
" 'reverse_word_index',\n",
" 'tf',\n",
" 'word_index',\n",
" 'x_test',\n",
" 'x_train',\n",
" 'y_test',\n",
" 'y_train']"
]
},
"metadata": {
"tags": []
},
"execution_count": 21
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-gBnEf9bHFXR",
"colab_type": "text"
},
"source": [
"The relevant names are\n",
"`negative_index, negative_reviews` and the corresponding\n",
"`positive_index, positive_reviews`, in addition to the standard\n",
"`x_test, x_train, y_test` and `y_train`.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "uM_UkYsrFkfQ",
"colab_type": "code",
"outputId": "6631fc50-9237-4956-ac1e-892ca9ea19d0",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"positive_index"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(array([ 0, 3, 6, ..., 24994, 24995, 24998]),)"
]
},
"metadata": {
"tags": []
},
"execution_count": 22
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "fmDyaeykG6u6",
"colab_type": "code",
"outputId": "0ee9077b-e0a5-4406-9ca0-b42c4710251c",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"x_test[0]"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[1,\n",
" 591,\n",
" 202,\n",
" 14,\n",
" 31,\n",
" 6,\n",
" 717,\n",
" 10,\n",
" 10,\n",
" 18142,\n",
" 10698,\n",
" 5,\n",
" 4,\n",
" 360,\n",
" 7,\n",
" 4,\n",
" 177,\n",
" 5760,\n",
" 394,\n",
" 354,\n",
" 4,\n",
" 123,\n",
" 9,\n",
" 1035,\n",
" 1035,\n",
" 1035,\n",
" 10,\n",
" 10,\n",
" 13,\n",
" 92,\n",
" 124,\n",
" 89,\n",
" 488,\n",
" 7944,\n",
" 100,\n",
" 28,\n",
" 1668,\n",
" 14,\n",
" 31,\n",
" 23,\n",
" 27,\n",
" 7479,\n",
" 29,\n",
" 220,\n",
" 468,\n",
" 8,\n",
" 124,\n",
" 14,\n",
" 286,\n",
" 170,\n",
" 8,\n",
" 157,\n",
" 46,\n",
" 5,\n",
" 27,\n",
" 239,\n",
" 16,\n",
" 179,\n",
" 15387,\n",
" 38,\n",
" 32,\n",
" 25,\n",
" 7944,\n",
" 451,\n",
" 202,\n",
" 14,\n",
" 6,\n",
" 717]"
]
},
"metadata": {
"tags": []
},
"execution_count": 23
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "Z_Vg4CS1H0Bc",
"colab_type": "code",
"outputId": "18a0567b-256d-47fd-d69e-1adf612dcce3",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"x_train[0]"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[1,\n",
" 14,\n",
" 22,\n",
" 16,\n",
" 43,\n",
" 530,\n",
" 973,\n",
" 1622,\n",
" 1385,\n",
" 65,\n",
" 458,\n",
" 4468,\n",
" 66,\n",
" 3941,\n",
" 4,\n",
" 173,\n",
" 36,\n",
" 256,\n",
" 5,\n",
" 25,\n",
" 100,\n",
" 43,\n",
" 838,\n",
" 112,\n",
" 50,\n",
" 670,\n",
" 22665,\n",
" 9,\n",
" 35,\n",
" 480,\n",
" 284,\n",
" 5,\n",
" 150,\n",
" 4,\n",
" 172,\n",
" 112,\n",
" 167,\n",
" 21631,\n",
" 336,\n",
" 385,\n",
" 39,\n",
" 4,\n",
" 172,\n",
" 4536,\n",
" 1111,\n",
" 17,\n",
" 546,\n",
" 38,\n",
" 13,\n",
" 447,\n",
" 4,\n",
" 192,\n",
" 50,\n",
" 16,\n",
" 6,\n",
" 147,\n",
" 2025,\n",
" 19,\n",
" 14,\n",
" 22,\n",
" 4,\n",
" 1920,\n",
" 4613,\n",
" 469,\n",
" 4,\n",
" 22,\n",
" 71,\n",
" 87,\n",
" 12,\n",
" 16,\n",
" 43,\n",
" 530,\n",
" 38,\n",
" 76,\n",
" 15,\n",
" 13,\n",
" 1247,\n",
" 4,\n",
" 22,\n",
" 17,\n",
" 515,\n",
" 17,\n",
" 12,\n",
" 16,\n",
" 626,\n",
" 18,\n",
" 19193,\n",
" 5,\n",
" 62,\n",
" 386,\n",
" 12,\n",
" 8,\n",
" 316,\n",
" 8,\n",
" 106,\n",
" 5,\n",
" 4,\n",
" 2223,\n",
" 5244,\n",
" 16,\n",
" 480,\n",
" 66,\n",
" 3785,\n",
" 33,\n",
" 4,\n",
" 130,\n",
" 12,\n",
" 16,\n",
" 38,\n",
" 619,\n",
" 5,\n",
" 25,\n",
" 124,\n",
" 51,\n",
" 36,\n",
" 135,\n",
" 48,\n",
" 25,\n",
" 1415,\n",
" 33,\n",
" 6,\n",
" 22,\n",
" 12,\n",
" 215,\n",
" 28,\n",
" 77,\n",
" 52,\n",
" 5,\n",
" 14,\n",
" 407,\n",
" 16,\n",
" 82,\n",
" 10311,\n",
" 8,\n",
" 4,\n",
" 107,\n",
" 117,\n",
" 5952,\n",
" 15,\n",
" 256,\n",
" 4,\n",
" 31050,\n",
" 7,\n",
" 3766,\n",
" 5,\n",
" 723,\n",
" 36,\n",
" 71,\n",
" 43,\n",
" 530,\n",
" 476,\n",
" 26,\n",
" 400,\n",
" 317,\n",
" 46,\n",
" 7,\n",
" 4,\n",
" 12118,\n",
" 1029,\n",
" 13,\n",
" 104,\n",
" 88,\n",
" 4,\n",
" 381,\n",
" 15,\n",
" 297,\n",
" 98,\n",
" 32,\n",
" 2071,\n",
" 56,\n",
" 26,\n",
" 141,\n",
" 6,\n",
" 194,\n",
" 7486,\n",
" 18,\n",
" 4,\n",
" 226,\n",
" 22,\n",
" 21,\n",
" 134,\n",
" 476,\n",
" 26,\n",
" 480,\n",
" 5,\n",
" 144,\n",
" 30,\n",
" 5535,\n",
" 18,\n",
" 51,\n",
" 36,\n",
" 28,\n",
" 224,\n",
" 92,\n",
" 25,\n",
" 104,\n",
" 4,\n",
" 226,\n",
" 65,\n",
" 16,\n",
" 38,\n",
" 1334,\n",
" 88,\n",
" 12,\n",
" 16,\n",
" 283,\n",
" 5,\n",
" 16,\n",
" 4472,\n",
" 113,\n",
" 103,\n",
" 32,\n",
" 15,\n",
" 16,\n",
" 5345,\n",
" 19,\n",
" 178,\n",
" 32]"
]
},
"metadata": {
"tags": []
},
"execution_count": 24
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yvKgL7P8IEgP",
"colab_type": "text"
},
"source": [
"Neither `x_train` nor `x_test` carries a label directly. To select a subset by class, use the matching label array (`y_train` for `x_train`), i.e. \n",
"\n",
"`positive_index = np.where(y_train == 1)` \n",
"\n",
"to get the indices of that subset."
]
},
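{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"A small sketch of this selection on made-up data, where `toy_y` and `toy_x` are hypothetical stand-ins for `y_train` and `x_train`. With a single condition, `np.where` returns a tuple holding one index array per dimension, which is why `positive_index` prints as a one-element tuple."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np\n",
"\n",
"toy_y = np.array([1, 0, 0, 1, 1])                 # hypothetical labels\n",
"toy_x = np.array(['r0', 'r1', 'r2', 'r3', 'r4'])  # hypothetical reviews\n",
"\n",
"# np.where returns a tuple; element 0 is the array of matching positions\n",
"toy_positive_index = np.where(toy_y == 1)\n",
"print(toy_positive_index[0])      # [0 3 4]\n",
"\n",
"# Indexing with the tuple selects the matching rows\n",
"print(toy_x[toy_positive_index])  # ['r0' 'r3' 'r4']"
],
"execution_count": 0,
"outputs": []
},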
{
"cell_type": "code",
"metadata": {
"id": "RGWHRtdGH5X9",
"colab_type": "code",
"outputId": "eb938f70-ff68-417a-be3c-ab0679c4d021",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"positive_index"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(array([ 0, 3, 6, ..., 24994, 24995, 24998]),)"
]
},
"metadata": {
"tags": []
},
"execution_count": 25
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "-d63V8cLMmB9",
"colab_type": "code",
"outputId": "1301e748-40be-47b5-d980-74ee1c5f1fb6",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"positive_reviews.shape"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(12500,)"
]
},
"metadata": {
"tags": []
},
"execution_count": 43
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "1PkGWrcrTft1",
"colab_type": "code",
"outputId": "4cc03686-c367-41b6-cd63-fa533135b140",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 90
}
},
"source": [
"positive_reviews[0:2]"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),\n",
" list([1, 4, 18609, 16085, 33, 2804, 4, 2040, 432, 111, 153, 103, 4, 1494, 13, 70, 131, 67, 11, 61, 15305, 744, 35, 3715, 761, 61, 5766, 452, 9214, 4, 985, 7, 64317, 59, 166, 4, 105, 216, 1239, 41, 1797, 9, 15, 7, 35, 744, 2413, 31, 8, 4, 687, 23, 4, 33929, 7339, 6, 3693, 42, 38, 39, 121, 59, 456, 10, 10, 7, 265, 12, 575, 111, 153, 159, 59, 16, 1447, 21, 25, 586, 482, 39, 4, 96, 59, 716, 12, 4, 172, 65, 9, 579, 11, 6004, 4, 1615, 5, 23005, 7, 5168, 17, 13, 7064, 12, 19, 6, 464, 31, 314, 11, 87564, 6, 719, 605, 11, 8, 202, 27, 310, 4, 3772, 3501, 8, 2722, 58, 10, 10, 537, 2116, 180, 40, 14, 413, 173, 7, 263, 112, 37, 152, 377, 4, 537, 263, 846, 579, 178, 54, 75, 71, 476, 36, 413, 263, 2504, 182, 5, 17, 75, 2306, 922, 36, 279, 131, 2895, 17, 2867, 42, 17, 35, 921, 18435, 192, 5, 1219, 3890, 19, 20523, 217, 4122, 1710, 537, 20341, 1236, 5, 736, 10, 10, 61, 403, 9, 47289, 40, 61, 4494, 5, 27, 4494, 159, 90, 263, 2311, 4319, 309, 8, 178, 5, 82, 4319, 4, 65, 15, 9225, 145, 143, 5122, 12, 7039, 537, 746, 537, 537, 15, 7979, 4, 18665, 594, 7, 5168, 94, 9096, 3987, 15242, 11, 28280, 4, 538, 7, 1795, 246, 56615, 9, 10161, 11, 635, 14, 9, 51, 408, 12, 94, 318, 1382, 12, 47, 6, 2683, 936, 5, 6307, 10197, 19, 49, 7, 4, 1885, 13699, 1118, 25, 80, 126, 842, 10, 10, 47289, 18223, 4726, 27, 4494, 11, 1550, 3633, 159, 27, 341, 29, 2733, 19, 4185, 173, 7, 90, 16376, 8, 30, 11, 4, 1784, 86, 1117, 8, 3261, 46, 11, 25837, 21, 29, 9, 2841, 23, 4, 1010, 26747, 793, 6, 13699, 1386, 1830, 10, 10, 246, 50, 9, 6, 2750, 1944, 746, 90, 29, 16376, 8, 124, 4, 882, 4, 882, 496, 27, 33029, 2213, 537, 121, 127, 1219, 130, 5, 29, 494, 8, 124, 4, 882, 496, 4, 341, 7, 27, 846, 10, 10, 29, 9, 1906, 8, 97, 6, 236, 11120, 1311, 8, 4, 23643, 7, 31, 7, 29851, 91, 22793, 3987, 70, 4, 882, 30, 579, 42, 9, 12, 32, 11, 537, 10, 10, 11, 14, 65, 44, 537, 75, 11876, 1775, 3353, 12716, 1846, 4, 11286, 7, 154, 5, 4, 518, 53, 13243, 11286, 7, 3211, 882, 11, 399, 38, 75, 257, 3807, 19, 18223, 17, 29, 456, 4, 65, 7, 
27, 205, 113, 10, 10, 33058, 4, 22793, 10359, 9, 242, 4, 91, 1202, 11377, 5, 2070, 307, 22, 7, 5168, 126, 93, 40, 18223, 13, 188, 1076, 3222, 19, 4, 13465, 7, 2348, 537, 23, 53, 537, 21, 82, 40, 18223, 13, 33195, 14, 280, 13, 219, 4, 52788, 431, 758, 859, 4, 953, 1052, 12283, 7, 5991, 5, 94, 40, 25, 238, 60, 35410, 4, 15812, 804, 27767, 7, 4, 9941, 132, 8, 67, 6, 22, 15, 9, 283, 8, 5168, 14, 31, 9, 242, 955, 48, 25, 279, 22148, 23, 12, 1685, 195, 25, 238, 60, 796, 13713, 4, 671, 7, 2804, 5, 4, 559, 154, 888, 7, 726, 50, 26, 49, 7008, 15, 566, 30, 579, 21, 64, 2574])],\n",
" dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 44
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "t0YViwvNiMDq",
"colab_type": "code",
"colab": {}
},
"source": [
"# no of elements\n",
"i_elements = 3"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "BaxD--O7oi3z",
"colab_type": "text"
},
"source": [
"## 6.Subset of Positive Reviews\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Ydk9EyBTk8eN",
"colab_type": "code",
"outputId": "b625a22a-43ce-4172-f5ba-5b7cfb118170",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"np.shape(positive_index)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(1, 12500)"
]
},
"metadata": {
"tags": []
},
"execution_count": 58
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "stbZAgeDlL2J",
"colab_type": "code",
"outputId": "5ad3a8b6-0bb8-4599-b9a7-36f46b42088a",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"np.shape(positive_reviews)[0]"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"12500"
]
},
"metadata": {
"tags": []
},
"execution_count": 59
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8cymNV2Xowkd",
"colab_type": "text"
},
"source": [
"We need to draw 12,500 indices, matching the size of the original positive set."
]
},
{
"cell_type": "code",
"metadata": {
"id": "y_jx1Hl9lga-",
"colab_type": "code",
"colab": {}
},
"source": [
"i_pos_elements = np.shape(positive_reviews)[0]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "d548jGSmlqCi",
"colab_type": "code",
"colab": {}
},
"source": [
"subset_positive_index = np.random.choice(positive_index[0],i_pos_elements,replace=False)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "th3Bpc_Nlufc",
"colab_type": "code",
"outputId": "64138772-2e0b-4916-9a15-3fd9d527771e",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"subset_positive_index"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([20230, 24258, 14093, ..., 2218, 14690, 5601])"
]
},
"metadata": {
"tags": []
},
"execution_count": 62
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "MTdlXmq8lzV1",
"colab_type": "code",
"colab": {}
},
"source": [
"subset_positive_reviews = x_train[subset_positive_index]"
],
"execution_count": 0,
"outputs": []
},
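{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"An earlier note speaks of keeping 50% of the positive reviews, whereas `i_pos_elements` above equals the full count of 12,500. The sketch below shows how only half could be drawn, assuming that is the intent. `toy_y`, `rng` and `n_keep` are hypothetical names, the labels are made up, and the seed is arbitrary."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only\n",
"toy_y = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])  # hypothetical labels\n",
"\n",
"toy_positive_index = np.where(toy_y == 1)[0]\n",
"n_keep = len(toy_positive_index) // 2  # keep 50% of the positives\n",
"\n",
"# Sample without replacement so that no review is picked twice\n",
"toy_subset_index = rng.choice(toy_positive_index, n_keep, replace=False)\n",
"print(len(toy_positive_index), len(toy_subset_index))  # 6 3"
],
"execution_count": 0,
"outputs": []
},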
{
"cell_type": "code",
"metadata": {
"id": "YjTlS2-ll6sv",
"colab_type": "code",
"outputId": "00f4fd77-7b46-48dd-c727-528446911b44",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 178
}
},
"source": [
"subset_positive_reviews"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([list([1, 9451, 213, 5496, 14, 16, 262, 221, 8, 72, 23, 6, 965, 651, 237, 12, 304, 273, 467, 6392, 2331, 19, 6392, 8197, 11, 61, 205, 113, 475, 4, 4837, 1181, 2425, 7, 4, 522, 3141, 402, 1700, 23, 5, 125, 7208, 75, 794, 107, 5684, 185, 84, 11, 3138, 537, 37, 26, 1364, 9320, 295, 31, 9, 6, 185, 132, 11, 3653, 4305, 1458, 7, 32, 4, 2203, 3905, 4198, 47311, 11, 4, 39714, 5, 61846, 7, 27, 7208, 5, 4, 85, 6, 185, 255, 1953, 8, 7635, 8, 67, 41, 336, 11168, 5, 193, 23, 35, 6697, 292, 4, 22, 9, 368, 7, 263, 1122, 2030, 24875, 6400, 1114, 67123, 7, 938, 5, 4, 1471, 54269, 7, 15, 20844, 532, 9003, 75, 124, 12, 32, 3899, 653, 285, 18, 4, 128, 1940, 43, 37, 9, 4, 4198, 51, 127, 15, 66, 384, 81, 25, 2727, 40, 31, 5, 4390, 187, 16740, 81, 25, 2223, 11, 467, 4, 9959, 21, 202, 4263, 1495, 7, 129, 4791, 81, 25, 401, 129, 2464, 8, 624, 5, 140, 44, 4, 970, 7, 653, 19, 57, 359, 18, 604, 10135, 9451, 213, 9, 407, 6, 810, 418, 368, 7, 11639, 7253, 643, 2427, 22, 388, 103, 32, 14, 9, 6, 22, 34, 4, 132, 37, 520, 178, 4, 480, 2479, 56, 7, 8136, 5, 4, 20844, 12222, 21, 12, 9, 53, 12, 2423, 10041, 18, 2667, 2634, 5, 170, 34, 6400, 37, 127, 306, 8, 30, 8887, 7, 12, 4, 1940, 26, 99, 532, 574, 8, 67, 51, 9, 887, 18, 6109, 1023, 88, 7, 14, 12, 461, 6, 976, 2532, 22, 15, 7453, 8, 259, 101, 58, 101, 273]),\n",
" list([1, 49, 6259, 108, 1671, 6, 749, 15, 14901, 11, 4, 634, 5, 504, 103, 829, 141, 9, 4, 420, 18, 16202, 1311, 5522, 49217, 852, 3690, 6, 1293, 2882, 3528, 22, 39, 5001, 448, 23, 4, 667, 8143, 39332, 852, 52860, 34, 82206, 26927, 67991, 37, 82, 3056, 4, 274, 17, 6, 881, 325, 5, 94, 3615, 26, 24, 162, 875, 551, 18, 108, 21, 54, 15, 325, 756, 299, 11, 4, 978, 17, 6, 1302, 1953, 1147, 8, 2061, 105, 262, 476, 37, 215, 393, 1158, 113, 4061, 34, 4, 1628, 7, 1473, 4, 959, 9, 6, 275, 5, 53, 4328, 5151, 7, 4, 582, 7, 559, 22, 512, 10, 10, 3690, 6544, 20894, 9, 6, 1691, 291, 154, 1875, 298, 14736, 39, 162, 782, 37, 19, 41, 10476, 1815, 452, 27731, 32045, 50149, 1747, 8, 41, 41761, 344, 11, 9075, 33, 4, 5673, 7, 4, 1875, 3286, 325, 6, 344, 15, 47, 77, 317, 5508, 34, 41, 336, 1561, 51354, 84248, 852, 17471, 237, 27, 5078, 341, 16202, 336, 670, 1022, 3723, 9, 6, 4025, 1811, 37, 47, 7009, 19, 4, 15188, 429, 4748, 5, 9, 1713, 19, 27, 223, 27731, 47, 6, 501, 59, 317, 41, 1461, 17614, 14419, 36838, 8, 2275, 670, 5, 17614, 11, 471, 1021, 41761, 1043, 799, 18700, 60939, 11265, 3690, 9, 35, 1724, 250, 37, 1287, 12319, 8, 32, 21, 41, 4494, 1561, 51354, 366, 59, 892, 409, 41, 559, 21, 24, 7, 41, 36193, 31270, 7118, 13402, 33934, 5, 27, 107, 369, 33, 86, 16741, 3690, 21, 17, 687, 2061, 3690, 5, 31270, 26, 21941, 34, 51, 764, 40, 4, 86, 38908, 7, 119, 54, 27731, 1442, 7, 41, 4696, 3690, 215, 412, 19, 17614, 5, 18700, 5, 68, 577, 25424, 17471, 49701, 246, 505, 8, 41, 4494, 18, 1425, 5, 8, 41, 3992, 118, 464, 5, 1750, 50380, 4, 210, 12250, 17933, 2907, 84062, 8, 391, 4, 24765, 200, 4132, 5, 4, 4271, 325, 15, 941, 41, 2762, 336, 39, 41, 499, 143, 6, 201, 7, 7108, 3690, 5, 31270, 850, 4, 25030, 7, 1575, 1473, 4432, 53, 14998, 11, 6, 1420, 810, 7, 4, 325, 74, 91, 7, 178, 585, 11, 6, 2646, 4, 277, 151, 619, 9, 5037, 17, 16202, 1311, 8, 8690, 9, 601, 10, 10, 4, 22, 9, 324, 11, 58651, 5, 12285, 5, 1367, 49, 8760, 307, 2775, 2004, 19, 3800, 2728, 1524, 34, 4446, 38701, 1209, 69964, 5, 6322, 34, 
4, 621, 603, 34, 71252, 74863, 167, 34363, 27790, 5627, 4, 478, 347, 20114, 6968, 39, 30992, 5, 11, 23893, 8, 6470, 4, 65, 23, 4, 476, 574, 29, 166, 35, 60, 3417, 2590, 44, 4, 14522, 5, 5614, 7, 325, 4, 177, 9, 3160, 4, 381, 695, 26, 185, 6544, 20894, 5, 7118, 13402, 33934, 21, 36, 26, 5067, 34, 4, 478, 2496, 156, 11, 4, 1158, 555, 14, 9, 6, 2009, 1380, 157, 19, 6, 5863, 749, 5, 144, 169, 6, 76, 3223, 311, 74, 12, 47, 8, 14, 1304, 9254, 10277]),\n",
" list([1, 207, 77, 1064, 18, 6, 3783, 20, 40, 14, 18, 6, 196, 58, 736, 349, 304, 94, 273, 793, 4, 356, 700, 3255, 8231, 23, 248, 40, 1354, 5, 1705, 327, 5, 2473, 40, 9597, 5, 1165, 5145, 4, 172, 1103, 456, 11, 32, 7, 98, 4, 700, 3028, 11, 3850, 1163, 4, 3158, 8, 5529, 18, 4, 3818, 11, 113, 33, 4, 2960, 7, 267, 647, 4, 24, 38, 252, 15649, 96, 15, 134, 105, 1921, 8, 15633, 120, 68, 16630, 45, 4, 39822, 5371, 7, 12594, 29147, 11, 4, 815, 5, 4, 1981, 5, 45, 389, 8, 67, 12, 8905, 38, 312, 5, 1927, 133, 10, 10, 736, 349, 2013, 23, 6, 1073, 33, 6, 15876, 344, 13, 657, 32877, 10953, 69, 69, 4, 281, 8, 202, 53, 1708, 5, 113, 8, 4, 154, 84, 11, 4, 136, 17, 12, 9, 12, 272, 40, 142, 742, 5071, 238, 28, 13592, 75, 359, 8, 79, 4, 547, 15, 134, 154, 84, 26, 17, 2455, 17, 316, 334, 42, 12, 764, 14061, 34, 4, 58, 4, 767, 523, 2487, 4, 21252, 25, 320, 535, 827, 9313, 37, 299, 6, 2872, 11567, 19, 6, 10595, 18, 1914, 11, 68, 4277, 8, 759, 56, 5, 8520, 43, 51, 14, 1073, 738, 6, 117, 2318, 48, 564, 3710, 17915, 69, 101, 281, 29, 62, 28, 69, 9313, 6096, 12, 56, 19, 4, 30478, 5, 55799, 23, 4, 77563, 8, 81, 4, 172, 12, 62, 28, 93, 18, 6, 8454, 454, 5, 6, 128, 2871, 7, 4, 15067, 3993, 4, 22533, 1024, 4732, 4, 1336, 14180, 4391, 10173, 5, 443, 5176, 1022, 4106, 37, 306, 8, 216, 46, 7, 1282, 8, 607, 4, 251, 10, 10, 50, 26, 111, 85, 712, 8, 736, 349, 74, 13, 459, 8, 140, 83, 793, 98, 15, 4, 1019, 16524, 5279, 738, 8, 28, 17, 18544, 6, 281, 7, 4, 1755, 17, 4, 360, 7, 4, 84, 11, 14, 20, 5, 29, 152, 8453, 3409, 9, 4, 355, 284, 18, 4, 173, 29, 738, 8, 30, 96, 120, 4, 350, 8, 97, 4, 1944, 200, 52, 5, 445, 6, 48727, 31, 5, 3409, 47, 115, 14081, 6, 676, 18, 4, 82572, 25, 440, 18, 4, 11795, 19441, 7, 6, 308, 10947, 11, 4, 2474, 7, 30012, 22748, 42, 4, 2245, 17252, 59517, 7, 6, 723, 2831, 11, 10315, 7, 4, 416, 12258, 305, 51, 75, 79, 9, 12283, 142, 320, 8252, 5, 24, 1314, 1706, 10, 10, 50, 26, 99, 111, 1008, 620, 143, 736, 349, 18, 259, 8, 4228, 98, 6009, 295, 5, 15, 203, 30, 94, 7969, 439, 21, 845, 
243, 7, 6, 947, 12, 9, 9, 4, 243, 7, 947, 13, 119, 1022, 4106, 47, 210, 468, 8, 30, 15981, 11, 4, 555, 29, 304, 14, 31, 9, 57, 1401, 21, 29, 271, 33, 12, 19, 141, 5380, 15, 25, 216, 245, 547, 15, 2708, 2051, 142, 44, 212, 1790, 56, 11, 6, 4945, 521, 34, 1516, 4106, 5, 2160, 84622, 27, 2356, 9, 2472, 8, 32, 4, 276, 725, 2087, 2728, 24569, 25839, 2110, 15, 2292, 1602, 93, 1063, 5, 726, 45, 64879, 27, 96, 83, 4, 9757, 7, 4, 4815, 33, 27, 514, 29356, 4, 18475, 2894, 57447, 42, 7533, 4, 4721, 7, 6, 1486, 3783, 3728, 23297, 17, 4, 13857, 42, 743, 6, 162, 1679, 7, 68, 5840, 604, 45594, 14895, 11, 51, 26, 869, 41, 1529, 388, 23, 268, 6, 254, 58, 29, 166, 12, 210, 253, 8, 106, 13, 426, 618, 135, 15, 44, 90, 11, 225, 142, 44, 1083, 10, 10, 45594, 14895, 1534, 19, 14, 239, 15, 59, 144, 28, 77, 4, 323, 7, 31, 283, 155, 24, 10934, 29401, 13, 92, 104, 13, 28, 126, 110, 2805, 11495, 200, 6, 577, 5, 336, 864, 38, 240, 351, 5, 27, 5258, 9, 11, 6, 10341, 1871, 38, 4720, 72, 11, 4, 102, 5, 4, 163, 173, 44, 14, 217, 9, 15, 12, 764, 40, 6, 7004, 12465, 7, 3736, 29044, 13915, 2326, 19, 558, 11, 2244, 4, 4568, 5, 11, 14, 310, 4, 250, 152, 783, 5, 11695, 92, 1746, 11, 129, 419, 10, 10, 1024, 2023, 4732, 127, 142, 55, 878, 29, 166, 20901, 13014, 25, 391, 208, 245, 803, 3624, 24395, 18150, 7014, 8, 1812, 25, 70, 82, 391, 41, 24914, 4, 25110, 584, 62, 30, 195, 8, 1278, 72, 120, 4, 1289, 21, 54, 13475, 4276, 320, 5287, 492, 272, 56, 33, 27, 336, 5, 560, 13, 264, 11, 25, 4429, 8, 63, 7014, 25579, 5, 18354, 11292, 92, 5378, 129, 336, 25, 235, 40, 2090, 5, 31544, 3710, 17915, 17, 6, 86, 967, 2210, 10, 10, 19, 4391, 10173, 625, 64, 561, 7, 854, 11, 22, 56, 8, 14, 213, 16, 27, 3883, 496, 11, 4, 31668, 5, 4841, 24131, 47, 12, 77, 53, 74, 107, 2740, 237, 75, 86, 562, 854, 7, 41, 11, 24565, 5, 1083, 11285, 1083, 11285, 17, 492, 5, 452, 37, 1497, 6, 10132, 18, 23455, 5060, 6832, 17, 1705, 480, 5, 6962, 3066, 17, 27, 18077, 23487, 3717, 17, 2504, 430, 723, 16171, 17, 4, 19832, 5, 4571, 15169, 37, 48, 
59, 161, 28, 4, 171, 411, 11, 14, 20, 15, 59, 47, 62, 306, 8, 30, 5501, 4, 270]),\n",
" ...,\n",
" list([1, 45, 6, 117, 13194, 8, 28, 6, 109, 773, 8273, 185, 11, 6, 20, 256, 34, 8273, 185, 21, 14, 22, 9, 121, 8273, 188, 27, 403, 5, 82, 6, 327, 611, 8859, 103, 395, 392, 531, 467, 160, 403, 10, 10, 146, 170, 8, 140, 429, 4, 2159, 7, 4, 85, 795, 5, 1110, 15, 13, 66, 510, 14, 22, 1424, 88, 7, 4, 5755, 239, 7, 2197, 3334, 17, 11739, 59, 16, 163, 1612, 4594, 2913, 5, 10638, 17, 4, 5641, 7, 289, 2847, 625, 452, 1131, 23, 4, 38251, 5, 625, 336, 16, 303, 556, 315, 84185, 325, 13, 10, 10, 17, 4, 132, 7, 4, 313, 11739, 47, 3405, 29124, 18, 153, 429, 12659, 41, 2813, 80, 63, 62, 1617, 1741, 6, 2597, 12254, 552, 773, 1394, 16839, 8, 79, 4, 223, 344, 21, 30714, 1895, 6, 1003, 17, 8, 138, 59, 4169, 16839, 38, 76, 11192, 9165, 9, 4, 655, 32337, 799, 37, 9, 1021, 8, 35, 11759, 21, 19865, 41, 5494, 8827, 1432, 8273, 185, 10, 10, 48, 335, 6, 3334, 337, 14, 9, 6, 57, 717]),\n",
" list([1, 146, 210, 770, 44, 89, 111, 211, 490, 67, 142, 44, 182, 325, 241, 23, 4, 1124, 2082, 699, 25, 62, 104, 36, 92, 40, 8, 911, 154, 6367, 21, 50, 218, 6, 1269, 15, 271, 34, 209, 6, 664, 42, 6, 20, 44, 4, 189, 5, 9393, 7, 14, 325, 382, 45, 6, 96, 7, 1951, 19, 68, 501, 13, 92, 124, 21, 25, 252, 191, 1821, 98, 7, 7354, 51, 575, 5, 12, 47, 8, 30, 301, 91, 7, 148, 3672, 26, 66, 290, 6, 106, 88, 36, 115, 353, 8, 13314, 120, 4, 882, 5, 4, 172, 70, 30, 301, 44, 68, 102, 104, 18, 1825, 44, 4882, 19956, 42, 4, 8510, 17, 25, 238, 150, 12, 63, 26, 82, 55, 821, 10, 10, 31, 7, 148, 102, 9, 8561, 12, 716, 6, 283, 65, 5, 2033, 19, 4, 875, 7, 4, 1849, 10237, 315, 4, 325, 60, 151, 4, 20, 517, 19, 6, 223, 11, 4, 2997, 33, 4, 984, 251, 103, 32524, 336, 1131, 41, 452, 32, 6, 2117, 679, 83, 35, 15331, 9152, 60, 151, 59, 1481, 77, 55, 1736, 159, 59, 152, 124, 121, 4, 680, 2003, 7, 41, 452, 266, 39, 21, 17, 59, 517, 7847, 11, 41, 3992, 3206, 1548, 7081, 5627, 89, 117, 59, 47, 126, 573, 44, 41, 3992, 501, 10, 10, 4, 192, 15, 14, 20, 2033, 19, 4, 875, 7, 4, 1849, 10237, 315, 4, 2780, 7936, 9, 460, 179, 1767, 18, 17, 230, 17, 13, 124, 50, 1481, 77, 160, 20, 15, 2033, 19, 14, 875, 18, 148, 37, 161, 124, 14, 246, 112, 1021, 8, 6, 38, 446, 1050, 83302, 132, 42, 255, 981, 18, 111, 5536, 15, 36, 1173, 1241, 1412, 8, 31, 7, 4, 8283, 9686, 21, 15, 36, 69, 8, 157, 11, 6, 3283, 21, 12, 127, 24, 64, 376, 142, 44, 4, 712, 7, 4, 1849, 10237, 12, 82, 408, 6, 52, 326, 7, 89, 134, 84, 71, 400, 110, 34, 68, 205, 846, 5, 4784, 89, 878, 12, 518, 16, 18, 98, 315, 4, 2780, 7936, 5, 89, 134, 84, 91, 7, 4, 58, 372, 122, 285, 746, 68, 671, 8, 879, 68, 349, 280, 36, 71, 2004, 5, 2897, 245, 11, 18, 1825, 4, 8561, 10, 10, 4, 116, 9, 66, 52, 5, 4, 65, 9, 55, 73, 398, 261, 4, 96, 12, 16, 1353, 11, 4, 454, 161, 66, 81, 12, 18, 72, 5, 198, 618, 4, 64, 173, 15, 490, 79, 8, 67, 11, 4, 1472, 382, 45, 43, 72, 21, 13, 62, 28, 317, 46, 6, 194, 173, 7, 51, 571, 11, 4, 984, 251, 33, 222, 7, 4, 173, 15, 9, 12857, 11, 4, 
2997, 88, 4, 173, 121, 7081, 271, 8, 4456, 5, 2326, 8, 294, 37, 694, 53, 44, 41, 3992, 501, 407, 495, 10, 10, 48, 25, 26, 928, 11, 285, 15, 47, 142, 8, 81, 19, 4, 333, 182, 325, 5, 48, 25, 713, 2707, 267, 18, 6, 176, 7, 206, 665, 74, 14, 9, 407, 6, 20, 25, 144, 67, 14, 218, 6, 20, 11, 63, 490, 67, 101, 3353, 42, 16947, 21, 12, 434, 9, 35, 221, 20, 88, 12, 408, 25, 35, 326, 44, 35, 1251, 7, 4, 325, 64, 117, 9, 573, 7, 13, 202, 12, 35, 709, 158]),\n",
" list([1, 4, 32427, 9, 35, 441, 1010, 57, 824, 6, 356, 63, 9, 779, 60, 639, 1998, 3061, 9, 1358, 988, 13704, 4223, 18, 4, 217, 295, 19, 308, 2451, 5, 592, 1342, 36, 71, 4, 118, 156, 15, 256, 1010, 1737, 11, 68, 2248, 1560, 1607, 9, 87, 17, 11668, 1016, 1337, 15, 13, 124, 256, 12, 128, 74, 41, 60, 48, 238, 24, 30, 4822, 1863, 4, 22, 1030, 8, 1839, 4, 91, 674, 44, 13704, 5, 44, 4, 58, 12, 304, 273, 518, 25, 28, 8, 3552, 479, 8, 97, 129, 213, 5, 15, 9, 51, 7610, 127, 133, 4, 1862, 7, 13704, 19, 5027, 988, 4, 3488, 7, 12046, 8, 4, 3699, 34, 6, 87, 33091, 8, 7968, 18, 4, 10940, 29, 62, 28, 88, 7, 4, 130, 7, 4, 3286, 325, 14388, 5, 117, 194, 8741, 4, 7971, 649, 200, 5027, 13600, 322, 6, 1736, 255, 19, 13704, 6, 132, 37, 69, 556, 958, 82, 4, 1732, 119, 1586, 200, 13704, 5, 11668, 32, 14, 166, 1065, 72855, 6, 701, 4028, 5, 221, 22, 2047, 5800, 47, 6, 55, 346, 1267, 15, 460, 287, 51, 6, 87, 284, 29, 16, 170, 8, 413, 6, 176, 7, 459, 16, 623, 8, 123, 4, 204, 1865, 7, 15, 58])],\n",
" dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 64
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "C0sH_u3rnN5I",
"colab_type": "code",
"outputId": "e1bfc080-151a-4ba1-c03e-b7d807f18b02",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"np.shape(subset_positive_reviews)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(12500,)"
]
},
"metadata": {
"tags": []
},
"execution_count": 65
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "a7ale079naVz",
"colab_type": "code",
"outputId": "67af55f7-d298-425b-c19b-7a7f80a0d86e",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"np.shape(positive_reviews)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(12500,)"
]
},
"metadata": {
"tags": []
},
"execution_count": 66
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "86dyVvB_nSVU",
"colab_type": "text"
},
"source": [
"Means we have got enough resampling done to equal the starting state of positive reviews"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Pk07zNVumiiY",
"colab_type": "code",
"outputId": "e29ee867-1aaf-4c83-c4b3-ef1c40594397",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 107
}
},
"source": [
"negative_reviews[0:3]"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([list([1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 8163, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 6853, 5, 163, 11, 3215, 10156, 4, 1153, 9, 194, 775, 7, 8255, 11596, 349, 2637, 148, 605, 15358, 8003, 15, 123, 125, 68, 23141, 6853, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 36893, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 8255, 5, 25249, 656, 245, 2350, 5, 4, 9837, 131, 152, 491, 18, 46151, 32, 7464, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]),\n",
" list([1, 14, 47, 8, 30, 31, 7, 4, 249, 108, 7, 4, 5974, 54, 61, 369, 13, 71, 149, 14, 22, 112, 4, 2401, 311, 12, 16, 3711, 33, 75, 43, 1829, 296, 4, 86, 320, 35, 534, 19, 263, 4821, 1301, 4, 1873, 33, 89, 78, 12, 66, 16, 4, 360, 7, 4, 58, 316, 334, 11, 4, 1716, 43, 645, 662, 8, 257, 85, 1200, 42, 1228, 2578, 83, 68, 3912, 15, 36, 165, 1539, 278, 36, 69, 44076, 780, 8, 106, 14, 6905, 1338, 18, 6, 22, 12, 215, 28, 610, 40, 6, 87, 326, 23, 2300, 21, 23, 22, 12, 272, 40, 57, 31, 11, 4, 22, 47, 6, 2307, 51, 9, 170, 23, 595, 116, 595, 1352, 13, 191, 79, 638, 89, 51428, 14, 9, 8, 106, 607, 624, 35, 534, 6, 227, 7, 129, 113]),\n",
" list([1, 249, 1323, 7, 61, 113, 10, 10, 13, 1637, 14, 20, 56, 33, 2401, 18, 457, 88, 13, 2626, 1400, 45, 3171, 13, 70, 79, 49, 706, 919, 13, 16, 355, 340, 355, 1696, 96, 143, 4, 22, 32, 289, 7, 61, 369, 71, 2359, 5, 13, 16, 131, 2073, 249, 114, 249, 229, 249, 20, 13, 28, 126, 110, 13, 473, 8, 569, 61, 419, 56, 429, 6, 1513, 18, 35, 534, 95, 474, 570, 5, 25, 124, 138, 88, 12, 421, 1543, 52, 725, 6397, 61, 419, 11, 13, 1571, 15, 1543, 20, 11, 4, 22016, 5, 296, 12, 3524, 5, 15, 421, 128, 74, 233, 334, 207, 126, 224, 12, 562, 298, 2167, 1272, 7, 2601, 5, 516, 988, 43, 8, 79, 120, 15, 595, 13, 784, 25, 3171, 18, 165, 170, 143, 19, 14, 5, 7224, 6, 226, 251, 7, 61, 113])],\n",
" dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 67
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "I0_KMCT9m_Ih",
"colab_type": "text"
},
"source": [
"## 7.Concatenating Subset + Neg\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eAgpxOVThN2w",
"colab_type": "text"
},
"source": [
"Let's play with toy data set first"
]
},
{
"cell_type": "code",
"metadata": {
"id": "YpRoiWPNsJaX",
"colab_type": "code",
"colab": {}
},
"source": [
"toy_pos_reviews = positive_reviews[0:5]\n",
"toy_pos_index = positive_index[0:5]\n"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "znhvZd5rpDzc",
"colab_type": "code",
"colab": {}
},
"source": [
"toy_neg_reviews = negative_reviews[0:5]\n",
"toy_neg_index = negative_index[0:5]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "fs9QzTs3sQ7-",
"colab_type": "code",
"outputId": "d76fe0d1-8e21-493c-94d0-64a8855cb1cf",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143
}
},
"source": [
"toy_pos_reviews"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),\n",
" list([1, 4, 18609, 16085, 33, 2804, 4, 2040, 432, 111, 153, 103, 4, 1494, 13, 70, 131, 67, 11, 61, 15305, 744, 35, 3715, 761, 61, 5766, 452, 9214, 4, 985, 7, 64317, 59, 166, 4, 105, 216, 1239, 41, 1797, 9, 15, 7, 35, 744, 2413, 31, 8, 4, 687, 23, 4, 33929, 7339, 6, 3693, 42, 38, 39, 121, 59, 456, 10, 10, 7, 265, 12, 575, 111, 153, 159, 59, 16, 1447, 21, 25, 586, 482, 39, 4, 96, 59, 716, 12, 4, 172, 65, 9, 579, 11, 6004, 4, 1615, 5, 23005, 7, 5168, 17, 13, 7064, 12, 19, 6, 464, 31, 314, 11, 87564, 6, 719, 605, 11, 8, 202, 27, 310, 4, 3772, 3501, 8, 2722, 58, 10, 10, 537, 2116, 180, 40, 14, 413, 173, 7, 263, 112, 37, 152, 377, 4, 537, 263, 846, 579, 178, 54, 75, 71, 476, 36, 413, 263, 2504, 182, 5, 17, 75, 2306, 922, 36, 279, 131, 2895, 17, 2867, 42, 17, 35, 921, 18435, 192, 5, 1219, 3890, 19, 20523, 217, 4122, 1710, 537, 20341, 1236, 5, 736, 10, 10, 61, 403, 9, 47289, 40, 61, 4494, 5, 27, 4494, 159, 90, 263, 2311, 4319, 309, 8, 178, 5, 82, 4319, 4, 65, 15, 9225, 145, 143, 5122, 12, 7039, 537, 746, 537, 537, 15, 7979, 4, 18665, 594, 7, 5168, 94, 9096, 3987, 15242, 11, 28280, 4, 538, 7, 1795, 246, 56615, 9, 10161, 11, 635, 14, 9, 51, 408, 12, 94, 318, 1382, 12, 47, 6, 2683, 936, 5, 6307, 10197, 19, 49, 7, 4, 1885, 13699, 1118, 25, 80, 126, 842, 10, 10, 47289, 18223, 4726, 27, 4494, 11, 1550, 3633, 159, 27, 341, 29, 2733, 19, 4185, 173, 7, 90, 16376, 8, 30, 11, 4, 1784, 86, 1117, 8, 3261, 46, 11, 25837, 21, 29, 9, 2841, 23, 4, 1010, 26747, 793, 6, 13699, 1386, 1830, 10, 10, 246, 50, 9, 6, 2750, 1944, 746, 90, 29, 16376, 8, 124, 4, 882, 4, 882, 496, 27, 33029, 2213, 537, 121, 127, 1219, 130, 5, 29, 494, 8, 124, 4, 882, 496, 4, 341, 7, 27, 846, 10, 10, 29, 9, 1906, 8, 97, 6, 236, 11120, 1311, 8, 4, 23643, 7, 31, 7, 29851, 91, 22793, 3987, 70, 4, 882, 30, 579, 42, 9, 12, 32, 11, 537, 10, 10, 11, 14, 65, 44, 537, 75, 11876, 1775, 3353, 12716, 1846, 4, 11286, 7, 154, 5, 4, 518, 53, 13243, 11286, 7, 3211, 882, 11, 399, 38, 75, 257, 3807, 19, 18223, 17, 29, 456, 4, 65, 7, 
27, 205, 113, 10, 10, 33058, 4, 22793, 10359, 9, 242, 4, 91, 1202, 11377, 5, 2070, 307, 22, 7, 5168, 126, 93, 40, 18223, 13, 188, 1076, 3222, 19, 4, 13465, 7, 2348, 537, 23, 53, 537, 21, 82, 40, 18223, 13, 33195, 14, 280, 13, 219, 4, 52788, 431, 758, 859, 4, 953, 1052, 12283, 7, 5991, 5, 94, 40, 25, 238, 60, 35410, 4, 15812, 804, 27767, 7, 4, 9941, 132, 8, 67, 6, 22, 15, 9, 283, 8, 5168, 14, 31, 9, 242, 955, 48, 25, 279, 22148, 23, 12, 1685, 195, 25, 238, 60, 796, 13713, 4, 671, 7, 2804, 5, 4, 559, 154, 888, 7, 726, 50, 26, 49, 7008, 15, 566, 30, 579, 21, 64, 2574]),\n",
" list([1, 6740, 365, 1234, 5, 1156, 354, 11, 14, 5327, 6638, 7, 1016, 10626, 5940, 356, 44, 4, 1349, 500, 746, 5, 200, 4, 4132, 11, 16393, 9363, 1117, 1831, 7485, 5, 4831, 26, 6, 71690, 4183, 17, 369, 37, 215, 1345, 143, 32677, 5, 1838, 8, 1974, 15, 36, 119, 257, 85, 52, 486, 9, 6, 26441, 8564, 63, 271, 6, 196, 96, 949, 4121, 4, 74170, 7, 4, 2212, 2436, 819, 63, 47, 77, 7175, 180, 6, 227, 11, 94, 2494, 33740, 13, 423, 4, 168, 7, 4, 22, 5, 89, 665, 71, 270, 56, 5, 13, 197, 12, 161, 5390, 99, 76, 23, 77842, 7, 419, 665, 40, 91, 85, 108, 7, 4, 2084, 5, 4773, 81, 55, 52, 1901]),\n",
" list([1, 43, 188, 46, 5, 566, 264, 51, 6, 530, 664, 14, 9, 1713, 81, 25, 1135, 46, 7, 6, 20, 750, 11, 141, 4299, 5, 15455, 4441, 102, 28, 413, 38, 120, 5533, 15, 4, 3974, 7, 5369, 142, 371, 318, 5, 955, 1713, 571, 25242, 24762, 122, 14, 8, 72, 54, 12, 86, 385, 46, 5, 14, 20, 9, 399, 8, 72, 150, 13, 161, 124, 6, 155, 44, 14, 159, 170, 83, 12, 5, 51, 6, 866, 48, 25, 842, 4, 1120, 25, 238, 79, 4, 547, 15, 14, 9, 31, 7, 148, 16126, 102, 44, 35, 480, 3823, 2380, 19, 120, 4, 350, 228, 5, 269, 8, 28, 178, 1314, 2347, 7, 51, 6, 87, 65, 12, 9, 979, 21, 95, 24, 3186, 178, 11, 40732, 14, 9, 24, 15, 20, 4, 84, 376, 4, 65, 14, 127, 141, 6, 52, 292, 7, 4751, 175, 561, 7, 68, 3866, 137, 75, 2541, 68, 182, 5, 235, 175, 333, 19, 98, 50, 9, 38, 76, 724, 4, 6750, 15, 166, 285, 36, 140, 143, 38, 76, 53, 3094, 1301, 4, 6991, 16, 82, 6, 87, 3578, 44, 2527, 7612, 5, 800, 4, 3033, 11, 35, 1728, 96, 21, 14, 22, 9, 76, 53, 7, 6, 406, 65, 13, 43, 219, 12, 639, 21, 13, 80, 140, 5, 135, 15, 14, 9, 31, 7, 4, 118, 3672, 13, 28, 126, 110]),\n",
" list([1, 785, 189, 438, 47, 110, 142, 7, 6, 7475, 120, 4, 236, 378, 7, 153, 19, 87, 108, 141, 17, 1004, 5, 30432, 883, 10789, 23, 8, 4, 136, 13772, 11631, 4, 7475, 43, 1076, 21, 1407, 419, 5, 5202, 120, 91, 682, 189, 2818, 5, 9, 1348, 31, 7, 4, 118, 785, 189, 108, 126, 93, 13772, 16, 540, 324, 23, 6, 364, 352, 21, 14, 9, 93, 56, 18, 11, 230, 53, 771, 74, 31, 34, 4, 2834, 7, 4, 22, 5, 14, 11, 471, 9, 17547, 34, 4, 321, 487, 5, 116, 15, 6584, 4, 22, 9, 6, 2286, 4, 114, 2679, 23, 107, 293, 1008, 1172, 5, 328, 1236, 4, 1375, 109, 9, 6, 132, 773, 14799, 1412, 8, 1172, 18, 7865, 29, 9, 276, 11, 6, 2768, 19, 289, 409, 4, 5341, 2140, 20250, 648, 1430, 10136, 8914, 5, 27, 3000, 1432, 7130, 103, 6, 346, 137, 11, 4, 2768, 295, 36, 7740, 725, 6, 3208, 273, 11, 4, 1513, 15, 1367, 35, 154, 14040, 103, 19100, 173, 7, 12, 36, 515, 3547, 94, 2547, 1722, 5, 3547, 36, 203, 30, 502, 8, 361, 12, 8, 989, 143, 4, 1172, 3404, 10, 10, 328, 1236, 9, 6, 55, 221, 2989, 5, 146, 165, 179, 770, 15, 50, 713, 53, 108, 448, 23, 12, 17, 225, 38, 76, 4397, 18, 183, 8, 81, 19, 12, 45, 1257, 8, 135, 15, 13772, 166, 4, 118, 7, 45, 12831, 17, 466, 45, 24410, 4, 22, 115, 165, 764, 6075, 5, 1030, 8, 2973, 73, 469, 167, 2127, 18281, 1568, 6, 87, 841, 18, 4, 22, 4, 192, 15, 91, 7, 12, 304, 273, 1004, 4, 1375, 1172, 2768, 12356, 15, 4, 22, 764, 55, 5773, 5, 14, 4233, 7444, 4, 1375, 326, 7, 4, 4760, 1786, 8, 361, 1236, 8, 989, 46, 7, 4, 2768, 45, 55, 776, 8, 79, 496, 98, 45, 400, 301, 15, 4, 1859, 9, 4, 155, 15, 66, 21885, 84, 5, 14, 22, 1534, 15, 17, 4, 167, 12356, 15, 75, 70, 115, 66, 30, 252, 7, 618, 51, 9, 2161, 4, 3130, 5, 14, 1525, 8, 6584, 15, 13772, 165, 127, 1921, 8, 30, 179, 2532, 4, 22, 9, 906, 18, 6, 176, 7, 1007, 1005, 4, 1375, 114, 4, 105, 26, 32, 55, 221, 11, 68, 205, 96, 5, 4, 192, 15, 4, 274, 410, 220, 304, 23, 94, 205, 109, 9, 55, 73, 224, 259, 3786, 15, 4, 22, 528, 1645, 34, 4, 130, 528, 30, 685, 345, 17, 4, 277, 199, 166, 281, 5, 1030, 8, 30, 179, 4442, 444, 13772, 9, 6, 371, 87, 189, 22, 
5, 31, 7, 4, 118, 7, 4, 2068, 545, 1178, 829])],\n",
" dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 71
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "5ETQlRXNa7dU",
"colab_type": "code",
"outputId": "7d6a0a03-18fd-46dc-eb7f-17e3f04aeba0",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"np.shape(toy_pos_index)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(1, 12500)"
]
},
"metadata": {
"tags": []
},
"execution_count": 72
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "7UjuUeLBbKVg",
"colab_type": "code",
"outputId": "54690280-6e28-423f-bfb0-dbf73679dc8f",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"toy_pos_index"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(array([ 0, 3, 6, ..., 24994, 24995, 24998]),)"
]
},
"metadata": {
"tags": []
},
"execution_count": 73
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "s_4Jsqkohghg",
"colab_type": "code",
"outputId": "2267b925-61d2-4689-cbee-21f36f3a2304",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"np.shape(toy_pos_index[0])"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(12500,)"
]
},
"metadata": {
"tags": []
},
"execution_count": 74
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "eZDqFUocsSbx",
"colab_type": "code",
"outputId": "fb0b016f-14ab-47eb-b58e-61b625482829",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143
}
},
"source": [
"toy_neg_reviews"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([list([1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 8163, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 6853, 5, 163, 11, 3215, 10156, 4, 1153, 9, 194, 775, 7, 8255, 11596, 349, 2637, 148, 605, 15358, 8003, 15, 123, 125, 68, 23141, 6853, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 36893, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 8255, 5, 25249, 656, 245, 2350, 5, 4, 9837, 131, 152, 491, 18, 46151, 32, 7464, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]),\n",
" list([1, 14, 47, 8, 30, 31, 7, 4, 249, 108, 7, 4, 5974, 54, 61, 369, 13, 71, 149, 14, 22, 112, 4, 2401, 311, 12, 16, 3711, 33, 75, 43, 1829, 296, 4, 86, 320, 35, 534, 19, 263, 4821, 1301, 4, 1873, 33, 89, 78, 12, 66, 16, 4, 360, 7, 4, 58, 316, 334, 11, 4, 1716, 43, 645, 662, 8, 257, 85, 1200, 42, 1228, 2578, 83, 68, 3912, 15, 36, 165, 1539, 278, 36, 69, 44076, 780, 8, 106, 14, 6905, 1338, 18, 6, 22, 12, 215, 28, 610, 40, 6, 87, 326, 23, 2300, 21, 23, 22, 12, 272, 40, 57, 31, 11, 4, 22, 47, 6, 2307, 51, 9, 170, 23, 595, 116, 595, 1352, 13, 191, 79, 638, 89, 51428, 14, 9, 8, 106, 607, 624, 35, 534, 6, 227, 7, 129, 113]),\n",
" list([1, 249, 1323, 7, 61, 113, 10, 10, 13, 1637, 14, 20, 56, 33, 2401, 18, 457, 88, 13, 2626, 1400, 45, 3171, 13, 70, 79, 49, 706, 919, 13, 16, 355, 340, 355, 1696, 96, 143, 4, 22, 32, 289, 7, 61, 369, 71, 2359, 5, 13, 16, 131, 2073, 249, 114, 249, 229, 249, 20, 13, 28, 126, 110, 13, 473, 8, 569, 61, 419, 56, 429, 6, 1513, 18, 35, 534, 95, 474, 570, 5, 25, 124, 138, 88, 12, 421, 1543, 52, 725, 6397, 61, 419, 11, 13, 1571, 15, 1543, 20, 11, 4, 22016, 5, 296, 12, 3524, 5, 15, 421, 128, 74, 233, 334, 207, 126, 224, 12, 562, 298, 2167, 1272, 7, 2601, 5, 516, 988, 43, 8, 79, 120, 15, 595, 13, 784, 25, 3171, 18, 165, 170, 143, 19, 14, 5, 7224, 6, 226, 251, 7, 61, 113]),\n",
" list([1, 778, 128, 74, 12, 630, 163, 15, 4, 1766, 7982, 1051, 43222, 32, 85, 156, 45, 40, 148, 139, 121, 664, 665, 10, 10, 1361, 173, 4, 749, 86588, 16, 3804, 8, 4, 226, 65, 12, 43, 127, 24, 15344, 10, 10]),\n",
" list([1, 4, 14906, 716, 4, 65, 7, 4, 689, 4367, 6308, 2343, 4804, 28674, 84206, 5270, 32099, 2315, 71688, 12572, 24785, 43394, 4, 10993, 628, 7685, 37, 9, 150, 4, 9820, 4069, 11, 2909, 4, 16287, 847, 313, 6, 176, 63860, 9, 6202, 138, 9, 4434, 19, 4, 96, 183, 26, 4, 192, 15, 27, 5842, 799, 7101, 39455, 588, 84, 11, 4, 3231, 152, 339, 5206, 42, 4869, 30497, 6293, 345, 4804, 37377, 142, 43, 218, 208, 54, 29, 853, 659, 46, 4, 882, 183, 80, 115, 30, 4, 172, 174, 10, 10, 1001, 398, 1001, 1055, 526, 34, 3717, 68395, 5262, 63370, 17, 4, 6706, 1094, 871, 64, 85, 22, 2030, 1109, 38, 230, 9, 4, 4324, 20636, 251, 5056, 1034, 195, 301, 14, 16, 31, 7, 4, 46035, 8, 783, 48545, 33, 4, 2945, 103, 465, 16454, 42, 845, 45, 446, 11, 1895, 19, 184, 76, 32, 4, 5310, 207, 110, 13, 197, 4, 14906, 16, 601, 964, 2152, 595, 13, 258, 4, 1730, 66, 338, 55, 5312, 4, 550, 728, 65, 1196, 8, 1839, 61, 1546, 42, 8361, 61, 602, 120, 45, 7304, 6, 320, 786, 99, 196, 11100, 786, 5936, 4, 225, 4, 373, 1009, 33, 4, 130, 63, 69, 72, 1104, 46, 1292, 225, 14, 66, 194, 11871, 1703, 56, 8, 803, 1004, 6, 18763, 155, 11, 4, 14906, 3231, 45, 853, 2029, 8, 30, 6, 117, 430, 19, 6, 8941, 9, 15, 66, 424, 8, 2337, 178, 9, 15, 66, 424, 8, 1465, 178, 9, 15, 66, 142, 15, 9, 424, 8, 28, 178, 662, 44, 12, 17, 4, 130, 898, 1686, 9, 6, 5623, 267, 185, 430, 4, 118, 21486, 277, 15, 4, 1188, 100, 216, 56, 19, 4, 357, 114, 10399, 367, 45, 115, 93, 788, 121, 4, 14906, 79, 32, 68, 278, 39, 8, 818, 162, 4165, 237, 600, 7, 98, 306, 8, 157, 549, 628, 11, 6, 12370, 13, 824, 15, 4104, 76, 42, 138, 36, 774, 77, 1059, 159, 150, 4, 229, 497, 8, 1493, 11, 175, 251, 453, 19, 8651, 189, 12, 43, 127, 6, 394, 292, 7, 8253, 4, 107, 8, 4, 2826, 15, 1082, 1251, 9, 906, 42, 1134, 6, 66, 78, 22, 15, 13, 244, 2519, 8, 135, 233, 52, 44, 10, 10, 466, 112, 398, 526, 34, 4, 1572, 4413, 6706, 1094, 225, 57, 599, 133, 225, 6, 227, 7, 541, 4323, 6, 171, 139, 7, 539, 11890, 56, 11, 6, 3231, 21, 164, 25, 426, 81, 33, 344, 624, 19, 6, 4617, 7, 10373, 
12958, 6, 5802, 4, 22, 9, 1082, 629, 237, 45, 188, 6, 55, 655, 707, 6371, 956, 225, 1456, 841, 42, 1310, 225, 6, 2493, 1467, 7722, 2828, 21, 4, 14906, 9, 364, 23, 4, 2228, 2407, 225, 24, 76, 133, 18, 4, 189, 2293, 10, 10, 814, 11, 53728, 11, 2642, 14, 47, 15, 682, 364, 352, 168, 44, 12, 45, 24, 913, 93, 21, 247, 2441, 4, 116, 34, 35, 1859, 8, 72, 177, 9, 164, 8, 901, 344, 44, 13, 191, 135, 13, 126, 421, 233, 18, 259, 10, 10, 4, 14906, 6847, 4, 14065, 3074, 7, 112, 199, 753, 357, 39, 63, 12, 115, 15222, 763, 8, 15, 35, 3282, 1523, 65, 57, 599, 6, 1916, 277, 1730, 37, 25, 92, 202, 6, 8848, 44, 25, 28, 6, 22, 15, 122, 24, 4171, 72, 33, 32])],\n",
" dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 75
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "-bl7UySbsf0q",
"colab_type": "code",
"colab": {}
},
"source": [
"toy_data = np.concatenate((toy_pos_reviews,toy_neg_reviews))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "kMVAZHl9Z1Oo",
"colab_type": "code",
"outputId": "7c3bcb36-836a-43ec-c2b9-a48294a0489d",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 232
}
},
"source": [
"toy_data"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),\n",
" list([1, 4, 18609, 16085, 33, 2804, 4, 2040, 432, 111, 153, 103, 4, 1494, 13, 70, 131, 67, 11, 61, 15305, 744, 35, 3715, 761, 61, 5766, 452, 9214, 4, 985, 7, 64317, 59, 166, 4, 105, 216, 1239, 41, 1797, 9, 15, 7, 35, 744, 2413, 31, 8, 4, 687, 23, 4, 33929, 7339, 6, 3693, 42, 38, 39, 121, 59, 456, 10, 10, 7, 265, 12, 575, 111, 153, 159, 59, 16, 1447, 21, 25, 586, 482, 39, 4, 96, 59, 716, 12, 4, 172, 65, 9, 579, 11, 6004, 4, 1615, 5, 23005, 7, 5168, 17, 13, 7064, 12, 19, 6, 464, 31, 314, 11, 87564, 6, 719, 605, 11, 8, 202, 27, 310, 4, 3772, 3501, 8, 2722, 58, 10, 10, 537, 2116, 180, 40, 14, 413, 173, 7, 263, 112, 37, 152, 377, 4, 537, 263, 846, 579, 178, 54, 75, 71, 476, 36, 413, 263, 2504, 182, 5, 17, 75, 2306, 922, 36, 279, 131, 2895, 17, 2867, 42, 17, 35, 921, 18435, 192, 5, 1219, 3890, 19, 20523, 217, 4122, 1710, 537, 20341, 1236, 5, 736, 10, 10, 61, 403, 9, 47289, 40, 61, 4494, 5, 27, 4494, 159, 90, 263, 2311, 4319, 309, 8, 178, 5, 82, 4319, 4, 65, 15, 9225, 145, 143, 5122, 12, 7039, 537, 746, 537, 537, 15, 7979, 4, 18665, 594, 7, 5168, 94, 9096, 3987, 15242, 11, 28280, 4, 538, 7, 1795, 246, 56615, 9, 10161, 11, 635, 14, 9, 51, 408, 12, 94, 318, 1382, 12, 47, 6, 2683, 936, 5, 6307, 10197, 19, 49, 7, 4, 1885, 13699, 1118, 25, 80, 126, 842, 10, 10, 47289, 18223, 4726, 27, 4494, 11, 1550, 3633, 159, 27, 341, 29, 2733, 19, 4185, 173, 7, 90, 16376, 8, 30, 11, 4, 1784, 86, 1117, 8, 3261, 46, 11, 25837, 21, 29, 9, 2841, 23, 4, 1010, 26747, 793, 6, 13699, 1386, 1830, 10, 10, 246, 50, 9, 6, 2750, 1944, 746, 90, 29, 16376, 8, 124, 4, 882, 4, 882, 496, 27, 33029, 2213, 537, 121, 127, 1219, 130, 5, 29, 494, 8, 124, 4, 882, 496, 4, 341, 7, 27, 846, 10, 10, 29, 9, 1906, 8, 97, 6, 236, 11120, 1311, 8, 4, 23643, 7, 31, 7, 29851, 91, 22793, 3987, 70, 4, 882, 30, 579, 42, 9, 12, 32, 11, 537, 10, 10, 11, 14, 65, 44, 537, 75, 11876, 1775, 3353, 12716, 1846, 4, 11286, 7, 154, 5, 4, 518, 53, 13243, 11286, 7, 3211, 882, 11, 399, 38, 75, 257, 3807, 19, 18223, 17, 29, 456, 4, 65, 7, 
27, 205, 113, 10, 10, 33058, 4, 22793, 10359, 9, 242, 4, 91, 1202, 11377, 5, 2070, 307, 22, 7, 5168, 126, 93, 40, 18223, 13, 188, 1076, 3222, 19, 4, 13465, 7, 2348, 537, 23, 53, 537, 21, 82, 40, 18223, 13, 33195, 14, 280, 13, 219, 4, 52788, 431, 758, 859, 4, 953, 1052, 12283, 7, 5991, 5, 94, 40, 25, 238, 60, 35410, 4, 15812, 804, 27767, 7, 4, 9941, 132, 8, 67, 6, 22, 15, 9, 283, 8, 5168, 14, 31, 9, 242, 955, 48, 25, 279, 22148, 23, 12, 1685, 195, 25, 238, 60, 796, 13713, 4, 671, 7, 2804, 5, 4, 559, 154, 888, 7, 726, 50, 26, 49, 7008, 15, 566, 30, 579, 21, 64, 2574]),\n",
" list([1, 6740, 365, 1234, 5, 1156, 354, 11, 14, 5327, 6638, 7, 1016, 10626, 5940, 356, 44, 4, 1349, 500, 746, 5, 200, 4, 4132, 11, 16393, 9363, 1117, 1831, 7485, 5, 4831, 26, 6, 71690, 4183, 17, 369, 37, 215, 1345, 143, 32677, 5, 1838, 8, 1974, 15, 36, 119, 257, 85, 52, 486, 9, 6, 26441, 8564, 63, 271, 6, 196, 96, 949, 4121, 4, 74170, 7, 4, 2212, 2436, 819, 63, 47, 77, 7175, 180, 6, 227, 11, 94, 2494, 33740, 13, 423, 4, 168, 7, 4, 22, 5, 89, 665, 71, 270, 56, 5, 13, 197, 12, 161, 5390, 99, 76, 23, 77842, 7, 419, 665, 40, 91, 85, 108, 7, 4, 2084, 5, 4773, 81, 55, 52, 1901]),\n",
" list([1, 43, 188, 46, 5, 566, 264, 51, 6, 530, 664, 14, 9, 1713, 81, 25, 1135, 46, 7, 6, 20, 750, 11, 141, 4299, 5, 15455, 4441, 102, 28, 413, 38, 120, 5533, 15, 4, 3974, 7, 5369, 142, 371, 318, 5, 955, 1713, 571, 25242, 24762, 122, 14, 8, 72, 54, 12, 86, 385, 46, 5, 14, 20, 9, 399, 8, 72, 150, 13, 161, 124, 6, 155, 44, 14, 159, 170, 83, 12, 5, 51, 6, 866, 48, 25, 842, 4, 1120, 25, 238, 79, 4, 547, 15, 14, 9, 31, 7, 148, 16126, 102, 44, 35, 480, 3823, 2380, 19, 120, 4, 350, 228, 5, 269, 8, 28, 178, 1314, 2347, 7, 51, 6, 87, 65, 12, 9, 979, 21, 95, 24, 3186, 178, 11, 40732, 14, 9, 24, 15, 20, 4, 84, 376, 4, 65, 14, 127, 141, 6, 52, 292, 7, 4751, 175, 561, 7, 68, 3866, 137, 75, 2541, 68, 182, 5, 235, 175, 333, 19, 98, 50, 9, 38, 76, 724, 4, 6750, 15, 166, 285, 36, 140, 143, 38, 76, 53, 3094, 1301, 4, 6991, 16, 82, 6, 87, 3578, 44, 2527, 7612, 5, 800, 4, 3033, 11, 35, 1728, 96, 21, 14, 22, 9, 76, 53, 7, 6, 406, 65, 13, 43, 219, 12, 639, 21, 13, 80, 140, 5, 135, 15, 14, 9, 31, 7, 4, 118, 3672, 13, 28, 126, 110]),\n",
" list([1, 785, 189, 438, 47, 110, 142, 7, 6, 7475, 120, 4, 236, 378, 7, 153, 19, 87, 108, 141, 17, 1004, 5, 30432, 883, 10789, 23, 8, 4, 136, 13772, 11631, 4, 7475, 43, 1076, 21, 1407, 419, 5, 5202, 120, 91, 682, 189, 2818, 5, 9, 1348, 31, 7, 4, 118, 785, 189, 108, 126, 93, 13772, 16, 540, 324, 23, 6, 364, 352, 21, 14, 9, 93, 56, 18, 11, 230, 53, 771, 74, 31, 34, 4, 2834, 7, 4, 22, 5, 14, 11, 471, 9, 17547, 34, 4, 321, 487, 5, 116, 15, 6584, 4, 22, 9, 6, 2286, 4, 114, 2679, 23, 107, 293, 1008, 1172, 5, 328, 1236, 4, 1375, 109, 9, 6, 132, 773, 14799, 1412, 8, 1172, 18, 7865, 29, 9, 276, 11, 6, 2768, 19, 289, 409, 4, 5341, 2140, 20250, 648, 1430, 10136, 8914, 5, 27, 3000, 1432, 7130, 103, 6, 346, 137, 11, 4, 2768, 295, 36, 7740, 725, 6, 3208, 273, 11, 4, 1513, 15, 1367, 35, 154, 14040, 103, 19100, 173, 7, 12, 36, 515, 3547, 94, 2547, 1722, 5, 3547, 36, 203, 30, 502, 8, 361, 12, 8, 989, 143, 4, 1172, 3404, 10, 10, 328, 1236, 9, 6, 55, 221, 2989, 5, 146, 165, 179, 770, 15, 50, 713, 53, 108, 448, 23, 12, 17, 225, 38, 76, 4397, 18, 183, 8, 81, 19, 12, 45, 1257, 8, 135, 15, 13772, 166, 4, 118, 7, 45, 12831, 17, 466, 45, 24410, 4, 22, 115, 165, 764, 6075, 5, 1030, 8, 2973, 73, 469, 167, 2127, 18281, 1568, 6, 87, 841, 18, 4, 22, 4, 192, 15, 91, 7, 12, 304, 273, 1004, 4, 1375, 1172, 2768, 12356, 15, 4, 22, 764, 55, 5773, 5, 14, 4233, 7444, 4, 1375, 326, 7, 4, 4760, 1786, 8, 361, 1236, 8, 989, 46, 7, 4, 2768, 45, 55, 776, 8, 79, 496, 98, 45, 400, 301, 15, 4, 1859, 9, 4, 155, 15, 66, 21885, 84, 5, 14, 22, 1534, 15, 17, 4, 167, 12356, 15, 75, 70, 115, 66, 30, 252, 7, 618, 51, 9, 2161, 4, 3130, 5, 14, 1525, 8, 6584, 15, 13772, 165, 127, 1921, 8, 30, 179, 2532, 4, 22, 9, 906, 18, 6, 176, 7, 1007, 1005, 4, 1375, 114, 4, 105, 26, 32, 55, 221, 11, 68, 205, 96, 5, 4, 192, 15, 4, 274, 410, 220, 304, 23, 94, 205, 109, 9, 55, 73, 224, 259, 3786, 15, 4, 22, 528, 1645, 34, 4, 130, 528, 30, 685, 345, 17, 4, 277, 199, 166, 281, 5, 1030, 8, 30, 179, 4442, 444, 13772, 9, 6, 371, 87, 189, 22, 
5, 31, 7, 4, 118, 7, 4, 2068, 545, 1178, 829]),\n",
" list([1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 8163, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 6853, 5, 163, 11, 3215, 10156, 4, 1153, 9, 194, 775, 7, 8255, 11596, 349, 2637, 148, 605, 15358, 8003, 15, 123, 125, 68, 23141, 6853, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 36893, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 8255, 5, 25249, 656, 245, 2350, 5, 4, 9837, 131, 152, 491, 18, 46151, 32, 7464, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]),\n",
" list([1, 14, 47, 8, 30, 31, 7, 4, 249, 108, 7, 4, 5974, 54, 61, 369, 13, 71, 149, 14, 22, 112, 4, 2401, 311, 12, 16, 3711, 33, 75, 43, 1829, 296, 4, 86, 320, 35, 534, 19, 263, 4821, 1301, 4, 1873, 33, 89, 78, 12, 66, 16, 4, 360, 7, 4, 58, 316, 334, 11, 4, 1716, 43, 645, 662, 8, 257, 85, 1200, 42, 1228, 2578, 83, 68, 3912, 15, 36, 165, 1539, 278, 36, 69, 44076, 780, 8, 106, 14, 6905, 1338, 18, 6, 22, 12, 215, 28, 610, 40, 6, 87, 326, 23, 2300, 21, 23, 22, 12, 272, 40, 57, 31, 11, 4, 22, 47, 6, 2307, 51, 9, 170, 23, 595, 116, 595, 1352, 13, 191, 79, 638, 89, 51428, 14, 9, 8, 106, 607, 624, 35, 534, 6, 227, 7, 129, 113]),\n",
" list([1, 249, 1323, 7, 61, 113, 10, 10, 13, 1637, 14, 20, 56, 33, 2401, 18, 457, 88, 13, 2626, 1400, 45, 3171, 13, 70, 79, 49, 706, 919, 13, 16, 355, 340, 355, 1696, 96, 143, 4, 22, 32, 289, 7, 61, 369, 71, 2359, 5, 13, 16, 131, 2073, 249, 114, 249, 229, 249, 20, 13, 28, 126, 110, 13, 473, 8, 569, 61, 419, 56, 429, 6, 1513, 18, 35, 534, 95, 474, 570, 5, 25, 124, 138, 88, 12, 421, 1543, 52, 725, 6397, 61, 419, 11, 13, 1571, 15, 1543, 20, 11, 4, 22016, 5, 296, 12, 3524, 5, 15, 421, 128, 74, 233, 334, 207, 126, 224, 12, 562, 298, 2167, 1272, 7, 2601, 5, 516, 988, 43, 8, 79, 120, 15, 595, 13, 784, 25, 3171, 18, 165, 170, 143, 19, 14, 5, 7224, 6, 226, 251, 7, 61, 113]),\n",
" list([1, 778, 128, 74, 12, 630, 163, 15, 4, 1766, 7982, 1051, 43222, 32, 85, 156, 45, 40, 148, 139, 121, 664, 665, 10, 10, 1361, 173, 4, 749, 86588, 16, 3804, 8, 4, 226, 65, 12, 43, 127, 24, 15344, 10, 10]),\n",
" list([1, 4, 14906, 716, 4, 65, 7, 4, 689, 4367, 6308, 2343, 4804, 28674, 84206, 5270, 32099, 2315, 71688, 12572, 24785, 43394, 4, 10993, 628, 7685, 37, 9, 150, 4, 9820, 4069, 11, 2909, 4, 16287, 847, 313, 6, 176, 63860, 9, 6202, 138, 9, 4434, 19, 4, 96, 183, 26, 4, 192, 15, 27, 5842, 799, 7101, 39455, 588, 84, 11, 4, 3231, 152, 339, 5206, 42, 4869, 30497, 6293, 345, 4804, 37377, 142, 43, 218, 208, 54, 29, 853, 659, 46, 4, 882, 183, 80, 115, 30, 4, 172, 174, 10, 10, 1001, 398, 1001, 1055, 526, 34, 3717, 68395, 5262, 63370, 17, 4, 6706, 1094, 871, 64, 85, 22, 2030, 1109, 38, 230, 9, 4, 4324, 20636, 251, 5056, 1034, 195, 301, 14, 16, 31, 7, 4, 46035, 8, 783, 48545, 33, 4, 2945, 103, 465, 16454, 42, 845, 45, 446, 11, 1895, 19, 184, 76, 32, 4, 5310, 207, 110, 13, 197, 4, 14906, 16, 601, 964, 2152, 595, 13, 258, 4, 1730, 66, 338, 55, 5312, 4, 550, 728, 65, 1196, 8, 1839, 61, 1546, 42, 8361, 61, 602, 120, 45, 7304, 6, 320, 786, 99, 196, 11100, 786, 5936, 4, 225, 4, 373, 1009, 33, 4, 130, 63, 69, 72, 1104, 46, 1292, 225, 14, 66, 194, 11871, 1703, 56, 8, 803, 1004, 6, 18763, 155, 11, 4, 14906, 3231, 45, 853, 2029, 8, 30, 6, 117, 430, 19, 6, 8941, 9, 15, 66, 424, 8, 2337, 178, 9, 15, 66, 424, 8, 1465, 178, 9, 15, 66, 142, 15, 9, 424, 8, 28, 178, 662, 44, 12, 17, 4, 130, 898, 1686, 9, 6, 5623, 267, 185, 430, 4, 118, 21486, 277, 15, 4, 1188, 100, 216, 56, 19, 4, 357, 114, 10399, 367, 45, 115, 93, 788, 121, 4, 14906, 79, 32, 68, 278, 39, 8, 818, 162, 4165, 237, 600, 7, 98, 306, 8, 157, 549, 628, 11, 6, 12370, 13, 824, 15, 4104, 76, 42, 138, 36, 774, 77, 1059, 159, 150, 4, 229, 497, 8, 1493, 11, 175, 251, 453, 19, 8651, 189, 12, 43, 127, 6, 394, 292, 7, 8253, 4, 107, 8, 4, 2826, 15, 1082, 1251, 9, 906, 42, 1134, 6, 66, 78, 22, 15, 13, 244, 2519, 8, 135, 233, 52, 44, 10, 10, 466, 112, 398, 526, 34, 4, 1572, 4413, 6706, 1094, 225, 57, 599, 133, 225, 6, 227, 7, 541, 4323, 6, 171, 139, 7, 539, 11890, 56, 11, 6, 3231, 21, 164, 25, 426, 81, 33, 344, 624, 19, 6, 4617, 7, 10373, 
12958, 6, 5802, 4, 22, 9, 1082, 629, 237, 45, 188, 6, 55, 655, 707, 6371, 956, 225, 1456, 841, 42, 1310, 225, 6, 2493, 1467, 7722, 2828, 21, 4, 14906, 9, 364, 23, 4, 2228, 2407, 225, 24, 76, 133, 18, 4, 189, 2293, 10, 10, 814, 11, 53728, 11, 2642, 14, 47, 15, 682, 364, 352, 168, 44, 12, 45, 24, 913, 93, 21, 247, 2441, 4, 116, 34, 35, 1859, 8, 72, 177, 9, 164, 8, 901, 344, 44, 13, 191, 135, 13, 126, 421, 233, 18, 259, 10, 10, 4, 14906, 6847, 4, 14065, 3074, 7, 112, 199, 753, 357, 39, 63, 12, 115, 15222, 763, 8, 15, 35, 3282, 1523, 65, 57, 599, 6, 1916, 277, 1730, 37, 25, 92, 202, 6, 8848, 44, 25, 28, 6, 22, 15, 122, 24, 4171, 72, 33, 32])],\n",
" dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 79
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ErVxZa38aIBA",
"colab_type": "text"
},
"source": [
"Repeat this for labels as well."
]
},
{
"cell_type": "code",
"metadata": {
"id": "a2keaVDlZ4qS",
"colab_type": "code",
"colab": {}
},
"source": [
"positive_labels = y_train[positive_index]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "4NEDYgo6bxR-",
"colab_type": "code",
"outputId": "3d0271ec-9d90-4899-c9f2-894284a26cba",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"positive_labels"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([1, 1, 1, ..., 1, 1, 1])"
]
},
"metadata": {
"tags": []
},
"execution_count": 81
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "2sshSZNPby9x",
"colab_type": "code",
"colab": {}
},
"source": [
"toy_pos_labels = positive_labels[0:5]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "rXq8HOYUcByP",
"colab_type": "code",
"outputId": "757714a7-36e4-4907-859c-6b5ceb436c13",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"toy_pos_labels"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([1, 1, 1, 1, 1])"
]
},
"metadata": {
"tags": []
},
"execution_count": 83
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "6p3eqH_ecDZn",
"colab_type": "code",
"colab": {}
},
"source": [
"negative_labels = y_train[negative_index]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "S_F71vRvcNjR",
"colab_type": "code",
"outputId": "3747e949-4799-4afc-b1db-85a5cdf776ff",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"negative_labels"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([0, 0, 0, ..., 0, 0, 0])"
]
},
"metadata": {
"tags": []
},
"execution_count": 85
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "kmxWnrkecPJP",
"colab_type": "code",
"colab": {}
},
"source": [
"toy_neg_labels = negative_labels[0:5]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "m7LLtX2_cUIv",
"colab_type": "code",
"outputId": "65bcae04-a52b-436e-bbc4-ba13977cda98",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"toy_neg_labels"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([0, 0, 0, 0, 0])"
]
},
"metadata": {
"tags": []
},
"execution_count": 87
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kdFSBvMnc71a",
"colab_type": "text"
},
"source": [
"concatenate 2 numpy arrays: column-wise\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YUpU_tRAnLpm",
"colab_type": "text"
},
"source": [
"### For Pos Reviews"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UO-a8xO4m8_r",
"colab_type": "text"
},
"source": [
"#### Trying H-STACK"
]
},
{
"cell_type": "code",
"metadata": {
"id": "IwYmFSU4eWCV",
"colab_type": "code",
"colab": {}
},
"source": [
"toy_pos_data_labels = np.hstack((toy_pos_reviews,toy_pos_labels))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "iginUdtKdV5s",
"colab_type": "code",
"outputId": "37cf4a27-0278-4d11-f9c0-3b37ebec6e81",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143
}
},
"source": [
"toy_pos_data_labels"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),\n",
" list([1, 4, 18609, 16085, 33, 2804, 4, 2040, 432, 111, 153, 103, 4, 1494, 13, 70, 131, 67, 11, 61, 15305, 744, 35, 3715, 761, 61, 5766, 452, 9214, 4, 985, 7, 64317, 59, 166, 4, 105, 216, 1239, 41, 1797, 9, 15, 7, 35, 744, 2413, 31, 8, 4, 687, 23, 4, 33929, 7339, 6, 3693, 42, 38, 39, 121, 59, 456, 10, 10, 7, 265, 12, 575, 111, 153, 159, 59, 16, 1447, 21, 25, 586, 482, 39, 4, 96, 59, 716, 12, 4, 172, 65, 9, 579, 11, 6004, 4, 1615, 5, 23005, 7, 5168, 17, 13, 7064, 12, 19, 6, 464, 31, 314, 11, 87564, 6, 719, 605, 11, 8, 202, 27, 310, 4, 3772, 3501, 8, 2722, 58, 10, 10, 537, 2116, 180, 40, 14, 413, 173, 7, 263, 112, 37, 152, 377, 4, 537, 263, 846, 579, 178, 54, 75, 71, 476, 36, 413, 263, 2504, 182, 5, 17, 75, 2306, 922, 36, 279, 131, 2895, 17, 2867, 42, 17, 35, 921, 18435, 192, 5, 1219, 3890, 19, 20523, 217, 4122, 1710, 537, 20341, 1236, 5, 736, 10, 10, 61, 403, 9, 47289, 40, 61, 4494, 5, 27, 4494, 159, 90, 263, 2311, 4319, 309, 8, 178, 5, 82, 4319, 4, 65, 15, 9225, 145, 143, 5122, 12, 7039, 537, 746, 537, 537, 15, 7979, 4, 18665, 594, 7, 5168, 94, 9096, 3987, 15242, 11, 28280, 4, 538, 7, 1795, 246, 56615, 9, 10161, 11, 635, 14, 9, 51, 408, 12, 94, 318, 1382, 12, 47, 6, 2683, 936, 5, 6307, 10197, 19, 49, 7, 4, 1885, 13699, 1118, 25, 80, 126, 842, 10, 10, 47289, 18223, 4726, 27, 4494, 11, 1550, 3633, 159, 27, 341, 29, 2733, 19, 4185, 173, 7, 90, 16376, 8, 30, 11, 4, 1784, 86, 1117, 8, 3261, 46, 11, 25837, 21, 29, 9, 2841, 23, 4, 1010, 26747, 793, 6, 13699, 1386, 1830, 10, 10, 246, 50, 9, 6, 2750, 1944, 746, 90, 29, 16376, 8, 124, 4, 882, 4, 882, 496, 27, 33029, 2213, 537, 121, 127, 1219, 130, 5, 29, 494, 8, 124, 4, 882, 496, 4, 341, 7, 27, 846, 10, 10, 29, 9, 1906, 8, 97, 6, 236, 11120, 1311, 8, 4, 23643, 7, 31, 7, 29851, 91, 22793, 3987, 70, 4, 882, 30, 579, 42, 9, 12, 32, 11, 537, 10, 10, 11, 14, 65, 44, 537, 75, 11876, 1775, 3353, 12716, 1846, 4, 11286, 7, 154, 5, 4, 518, 53, 13243, 11286, 7, 3211, 882, 11, 399, 38, 75, 257, 3807, 19, 18223, 17, 29, 456, 4, 65, 7, 
27, 205, 113, 10, 10, 33058, 4, 22793, 10359, 9, 242, 4, 91, 1202, 11377, 5, 2070, 307, 22, 7, 5168, 126, 93, 40, 18223, 13, 188, 1076, 3222, 19, 4, 13465, 7, 2348, 537, 23, 53, 537, 21, 82, 40, 18223, 13, 33195, 14, 280, 13, 219, 4, 52788, 431, 758, 859, 4, 953, 1052, 12283, 7, 5991, 5, 94, 40, 25, 238, 60, 35410, 4, 15812, 804, 27767, 7, 4, 9941, 132, 8, 67, 6, 22, 15, 9, 283, 8, 5168, 14, 31, 9, 242, 955, 48, 25, 279, 22148, 23, 12, 1685, 195, 25, 238, 60, 796, 13713, 4, 671, 7, 2804, 5, 4, 559, 154, 888, 7, 726, 50, 26, 49, 7008, 15, 566, 30, 579, 21, 64, 2574]),\n",
" list([1, 6740, 365, 1234, 5, 1156, 354, 11, 14, 5327, 6638, 7, 1016, 10626, 5940, 356, 44, 4, 1349, 500, 746, 5, 200, 4, 4132, 11, 16393, 9363, 1117, 1831, 7485, 5, 4831, 26, 6, 71690, 4183, 17, 369, 37, 215, 1345, 143, 32677, 5, 1838, 8, 1974, 15, 36, 119, 257, 85, 52, 486, 9, 6, 26441, 8564, 63, 271, 6, 196, 96, 949, 4121, 4, 74170, 7, 4, 2212, 2436, 819, 63, 47, 77, 7175, 180, 6, 227, 11, 94, 2494, 33740, 13, 423, 4, 168, 7, 4, 22, 5, 89, 665, 71, 270, 56, 5, 13, 197, 12, 161, 5390, 99, 76, 23, 77842, 7, 419, 665, 40, 91, 85, 108, 7, 4, 2084, 5, 4773, 81, 55, 52, 1901]),\n",
" list([1, 43, 188, 46, 5, 566, 264, 51, 6, 530, 664, 14, 9, 1713, 81, 25, 1135, 46, 7, 6, 20, 750, 11, 141, 4299, 5, 15455, 4441, 102, 28, 413, 38, 120, 5533, 15, 4, 3974, 7, 5369, 142, 371, 318, 5, 955, 1713, 571, 25242, 24762, 122, 14, 8, 72, 54, 12, 86, 385, 46, 5, 14, 20, 9, 399, 8, 72, 150, 13, 161, 124, 6, 155, 44, 14, 159, 170, 83, 12, 5, 51, 6, 866, 48, 25, 842, 4, 1120, 25, 238, 79, 4, 547, 15, 14, 9, 31, 7, 148, 16126, 102, 44, 35, 480, 3823, 2380, 19, 120, 4, 350, 228, 5, 269, 8, 28, 178, 1314, 2347, 7, 51, 6, 87, 65, 12, 9, 979, 21, 95, 24, 3186, 178, 11, 40732, 14, 9, 24, 15, 20, 4, 84, 376, 4, 65, 14, 127, 141, 6, 52, 292, 7, 4751, 175, 561, 7, 68, 3866, 137, 75, 2541, 68, 182, 5, 235, 175, 333, 19, 98, 50, 9, 38, 76, 724, 4, 6750, 15, 166, 285, 36, 140, 143, 38, 76, 53, 3094, 1301, 4, 6991, 16, 82, 6, 87, 3578, 44, 2527, 7612, 5, 800, 4, 3033, 11, 35, 1728, 96, 21, 14, 22, 9, 76, 53, 7, 6, 406, 65, 13, 43, 219, 12, 639, 21, 13, 80, 140, 5, 135, 15, 14, 9, 31, 7, 4, 118, 3672, 13, 28, 126, 110]),\n",
" list([1, 785, 189, 438, 47, 110, 142, 7, 6, 7475, 120, 4, 236, 378, 7, 153, 19, 87, 108, 141, 17, 1004, 5, 30432, 883, 10789, 23, 8, 4, 136, 13772, 11631, 4, 7475, 43, 1076, 21, 1407, 419, 5, 5202, 120, 91, 682, 189, 2818, 5, 9, 1348, 31, 7, 4, 118, 785, 189, 108, 126, 93, 13772, 16, 540, 324, 23, 6, 364, 352, 21, 14, 9, 93, 56, 18, 11, 230, 53, 771, 74, 31, 34, 4, 2834, 7, 4, 22, 5, 14, 11, 471, 9, 17547, 34, 4, 321, 487, 5, 116, 15, 6584, 4, 22, 9, 6, 2286, 4, 114, 2679, 23, 107, 293, 1008, 1172, 5, 328, 1236, 4, 1375, 109, 9, 6, 132, 773, 14799, 1412, 8, 1172, 18, 7865, 29, 9, 276, 11, 6, 2768, 19, 289, 409, 4, 5341, 2140, 20250, 648, 1430, 10136, 8914, 5, 27, 3000, 1432, 7130, 103, 6, 346, 137, 11, 4, 2768, 295, 36, 7740, 725, 6, 3208, 273, 11, 4, 1513, 15, 1367, 35, 154, 14040, 103, 19100, 173, 7, 12, 36, 515, 3547, 94, 2547, 1722, 5, 3547, 36, 203, 30, 502, 8, 361, 12, 8, 989, 143, 4, 1172, 3404, 10, 10, 328, 1236, 9, 6, 55, 221, 2989, 5, 146, 165, 179, 770, 15, 50, 713, 53, 108, 448, 23, 12, 17, 225, 38, 76, 4397, 18, 183, 8, 81, 19, 12, 45, 1257, 8, 135, 15, 13772, 166, 4, 118, 7, 45, 12831, 17, 466, 45, 24410, 4, 22, 115, 165, 764, 6075, 5, 1030, 8, 2973, 73, 469, 167, 2127, 18281, 1568, 6, 87, 841, 18, 4, 22, 4, 192, 15, 91, 7, 12, 304, 273, 1004, 4, 1375, 1172, 2768, 12356, 15, 4, 22, 764, 55, 5773, 5, 14, 4233, 7444, 4, 1375, 326, 7, 4, 4760, 1786, 8, 361, 1236, 8, 989, 46, 7, 4, 2768, 45, 55, 776, 8, 79, 496, 98, 45, 400, 301, 15, 4, 1859, 9, 4, 155, 15, 66, 21885, 84, 5, 14, 22, 1534, 15, 17, 4, 167, 12356, 15, 75, 70, 115, 66, 30, 252, 7, 618, 51, 9, 2161, 4, 3130, 5, 14, 1525, 8, 6584, 15, 13772, 165, 127, 1921, 8, 30, 179, 2532, 4, 22, 9, 906, 18, 6, 176, 7, 1007, 1005, 4, 1375, 114, 4, 105, 26, 32, 55, 221, 11, 68, 205, 96, 5, 4, 192, 15, 4, 274, 410, 220, 304, 23, 94, 205, 109, 9, 55, 73, 224, 259, 3786, 15, 4, 22, 528, 1645, 34, 4, 130, 528, 30, 685, 345, 17, 4, 277, 199, 166, 281, 5, 1030, 8, 30, 179, 4442, 444, 13772, 9, 6, 371, 87, 189, 22, 
5, 31, 7, 4, 118, 7, 4, 2068, 545, 1178, 829]),\n",
" 1, 1, 1, 1, 1], dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 113
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "A8iTiEB5h1XP",
"colab_type": "code",
"outputId": "be5939da-310f-4b5f-c6b9-559b55fbd4e1",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"np.shape(toy_pos_data_labels)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(10,)"
]
},
"metadata": {
"tags": []
},
"execution_count": 114
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "JABt-TwMiIq-",
"colab_type": "code",
"outputId": "e1ac6f11-20af-4619-dd52-b6615907486d",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"toy_pos_data_labels[0]"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[1,\n",
" 14,\n",
" 22,\n",
" 16,\n",
" 43,\n",
" 530,\n",
" 973,\n",
" 1622,\n",
" 1385,\n",
" 65,\n",
" 458,\n",
" 4468,\n",
" 66,\n",
" 3941,\n",
" 4,\n",
" 173,\n",
" 36,\n",
" 256,\n",
" 5,\n",
" 25,\n",
" 100,\n",
" 43,\n",
" 838,\n",
" 112,\n",
" 50,\n",
" 670,\n",
" 22665,\n",
" 9,\n",
" 35,\n",
" 480,\n",
" 284,\n",
" 5,\n",
" 150,\n",
" 4,\n",
" 172,\n",
" 112,\n",
" 167,\n",
" 21631,\n",
" 336,\n",
" 385,\n",
" 39,\n",
" 4,\n",
" 172,\n",
" 4536,\n",
" 1111,\n",
" 17,\n",
" 546,\n",
" 38,\n",
" 13,\n",
" 447,\n",
" 4,\n",
" 192,\n",
" 50,\n",
" 16,\n",
" 6,\n",
" 147,\n",
" 2025,\n",
" 19,\n",
" 14,\n",
" 22,\n",
" 4,\n",
" 1920,\n",
" 4613,\n",
" 469,\n",
" 4,\n",
" 22,\n",
" 71,\n",
" 87,\n",
" 12,\n",
" 16,\n",
" 43,\n",
" 530,\n",
" 38,\n",
" 76,\n",
" 15,\n",
" 13,\n",
" 1247,\n",
" 4,\n",
" 22,\n",
" 17,\n",
" 515,\n",
" 17,\n",
" 12,\n",
" 16,\n",
" 626,\n",
" 18,\n",
" 19193,\n",
" 5,\n",
" 62,\n",
" 386,\n",
" 12,\n",
" 8,\n",
" 316,\n",
" 8,\n",
" 106,\n",
" 5,\n",
" 4,\n",
" 2223,\n",
" 5244,\n",
" 16,\n",
" 480,\n",
" 66,\n",
" 3785,\n",
" 33,\n",
" 4,\n",
" 130,\n",
" 12,\n",
" 16,\n",
" 38,\n",
" 619,\n",
" 5,\n",
" 25,\n",
" 124,\n",
" 51,\n",
" 36,\n",
" 135,\n",
" 48,\n",
" 25,\n",
" 1415,\n",
" 33,\n",
" 6,\n",
" 22,\n",
" 12,\n",
" 215,\n",
" 28,\n",
" 77,\n",
" 52,\n",
" 5,\n",
" 14,\n",
" 407,\n",
" 16,\n",
" 82,\n",
" 10311,\n",
" 8,\n",
" 4,\n",
" 107,\n",
" 117,\n",
" 5952,\n",
" 15,\n",
" 256,\n",
" 4,\n",
" 31050,\n",
" 7,\n",
" 3766,\n",
" 5,\n",
" 723,\n",
" 36,\n",
" 71,\n",
" 43,\n",
" 530,\n",
" 476,\n",
" 26,\n",
" 400,\n",
" 317,\n",
" 46,\n",
" 7,\n",
" 4,\n",
" 12118,\n",
" 1029,\n",
" 13,\n",
" 104,\n",
" 88,\n",
" 4,\n",
" 381,\n",
" 15,\n",
" 297,\n",
" 98,\n",
" 32,\n",
" 2071,\n",
" 56,\n",
" 26,\n",
" 141,\n",
" 6,\n",
" 194,\n",
" 7486,\n",
" 18,\n",
" 4,\n",
" 226,\n",
" 22,\n",
" 21,\n",
" 134,\n",
" 476,\n",
" 26,\n",
" 480,\n",
" 5,\n",
" 144,\n",
" 30,\n",
" 5535,\n",
" 18,\n",
" 51,\n",
" 36,\n",
" 28,\n",
" 224,\n",
" 92,\n",
" 25,\n",
" 104,\n",
" 4,\n",
" 226,\n",
" 65,\n",
" 16,\n",
" 38,\n",
" 1334,\n",
" 88,\n",
" 12,\n",
" 16,\n",
" 283,\n",
" 5,\n",
" 16,\n",
" 4472,\n",
" 113,\n",
" 103,\n",
" 32,\n",
" 15,\n",
" 16,\n",
" 5345,\n",
" 19,\n",
" 178,\n",
" 32]"
]
},
"metadata": {
"tags": []
},
"execution_count": 115
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "OERAAL9yh7TL",
"colab_type": "code",
"outputId": "b56608a5-33c9-4f0f-8105-577039ad9813",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"np.shape(toy_pos_data_labels[0])"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(218,)"
]
},
"metadata": {
"tags": []
},
"execution_count": 116
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "kQ6zH-N7jF5a",
"colab_type": "code",
"outputId": "9bab79ae-9999-458e-b73a-b080b7dc26ba",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"toy_pos_data_labels[6]"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"1"
]
},
"metadata": {
"tags": []
},
"execution_count": 117
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "UajdQ3t4iVLk",
"colab_type": "code",
"colab": {}
},
"source": [
"#np.dtype(toy_pos_data_labels)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "8s6GOT9Gjt28",
"colab_type": "text"
},
"source": [
"#### Trying V-STACK"
]
},
{
"cell_type": "code",
"metadata": {
"id": "6avaGFR0jRJF",
"colab_type": "code",
"colab": {}
},
"source": [
"toy_pos_data_labels_v = np.vstack((toy_pos_reviews,toy_pos_labels))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "qzOVGcAGjUZo",
"colab_type": "code",
"outputId": "a12b0d9f-698b-4828-f9eb-a12faaa9e26b",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"np.shape(toy_pos_data_labels_v)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(2, 5)"
]
},
"metadata": {
"tags": []
},
"execution_count": 120
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "6tFOZVZPjZ5W",
"colab_type": "code",
"outputId": "35a7a06b-3bf7-4a25-a698-e812e4607c88",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143
}
},
"source": [
"toy_pos_data_labels_v[0]"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),\n",
" list([1, 4, 18609, 16085, 33, 2804, 4, 2040, 432, 111, 153, 103, 4, 1494, 13, 70, 131, 67, 11, 61, 15305, 744, 35, 3715, 761, 61, 5766, 452, 9214, 4, 985, 7, 64317, 59, 166, 4, 105, 216, 1239, 41, 1797, 9, 15, 7, 35, 744, 2413, 31, 8, 4, 687, 23, 4, 33929, 7339, 6, 3693, 42, 38, 39, 121, 59, 456, 10, 10, 7, 265, 12, 575, 111, 153, 159, 59, 16, 1447, 21, 25, 586, 482, 39, 4, 96, 59, 716, 12, 4, 172, 65, 9, 579, 11, 6004, 4, 1615, 5, 23005, 7, 5168, 17, 13, 7064, 12, 19, 6, 464, 31, 314, 11, 87564, 6, 719, 605, 11, 8, 202, 27, 310, 4, 3772, 3501, 8, 2722, 58, 10, 10, 537, 2116, 180, 40, 14, 413, 173, 7, 263, 112, 37, 152, 377, 4, 537, 263, 846, 579, 178, 54, 75, 71, 476, 36, 413, 263, 2504, 182, 5, 17, 75, 2306, 922, 36, 279, 131, 2895, 17, 2867, 42, 17, 35, 921, 18435, 192, 5, 1219, 3890, 19, 20523, 217, 4122, 1710, 537, 20341, 1236, 5, 736, 10, 10, 61, 403, 9, 47289, 40, 61, 4494, 5, 27, 4494, 159, 90, 263, 2311, 4319, 309, 8, 178, 5, 82, 4319, 4, 65, 15, 9225, 145, 143, 5122, 12, 7039, 537, 746, 537, 537, 15, 7979, 4, 18665, 594, 7, 5168, 94, 9096, 3987, 15242, 11, 28280, 4, 538, 7, 1795, 246, 56615, 9, 10161, 11, 635, 14, 9, 51, 408, 12, 94, 318, 1382, 12, 47, 6, 2683, 936, 5, 6307, 10197, 19, 49, 7, 4, 1885, 13699, 1118, 25, 80, 126, 842, 10, 10, 47289, 18223, 4726, 27, 4494, 11, 1550, 3633, 159, 27, 341, 29, 2733, 19, 4185, 173, 7, 90, 16376, 8, 30, 11, 4, 1784, 86, 1117, 8, 3261, 46, 11, 25837, 21, 29, 9, 2841, 23, 4, 1010, 26747, 793, 6, 13699, 1386, 1830, 10, 10, 246, 50, 9, 6, 2750, 1944, 746, 90, 29, 16376, 8, 124, 4, 882, 4, 882, 496, 27, 33029, 2213, 537, 121, 127, 1219, 130, 5, 29, 494, 8, 124, 4, 882, 496, 4, 341, 7, 27, 846, 10, 10, 29, 9, 1906, 8, 97, 6, 236, 11120, 1311, 8, 4, 23643, 7, 31, 7, 29851, 91, 22793, 3987, 70, 4, 882, 30, 579, 42, 9, 12, 32, 11, 537, 10, 10, 11, 14, 65, 44, 537, 75, 11876, 1775, 3353, 12716, 1846, 4, 11286, 7, 154, 5, 4, 518, 53, 13243, 11286, 7, 3211, 882, 11, 399, 38, 75, 257, 3807, 19, 18223, 17, 29, 456, 4, 65, 7, 
27, 205, 113, 10, 10, 33058, 4, 22793, 10359, 9, 242, 4, 91, 1202, 11377, 5, 2070, 307, 22, 7, 5168, 126, 93, 40, 18223, 13, 188, 1076, 3222, 19, 4, 13465, 7, 2348, 537, 23, 53, 537, 21, 82, 40, 18223, 13, 33195, 14, 280, 13, 219, 4, 52788, 431, 758, 859, 4, 953, 1052, 12283, 7, 5991, 5, 94, 40, 25, 238, 60, 35410, 4, 15812, 804, 27767, 7, 4, 9941, 132, 8, 67, 6, 22, 15, 9, 283, 8, 5168, 14, 31, 9, 242, 955, 48, 25, 279, 22148, 23, 12, 1685, 195, 25, 238, 60, 796, 13713, 4, 671, 7, 2804, 5, 4, 559, 154, 888, 7, 726, 50, 26, 49, 7008, 15, 566, 30, 579, 21, 64, 2574]),\n",
" list([1, 6740, 365, 1234, 5, 1156, 354, 11, 14, 5327, 6638, 7, 1016, 10626, 5940, 356, 44, 4, 1349, 500, 746, 5, 200, 4, 4132, 11, 16393, 9363, 1117, 1831, 7485, 5, 4831, 26, 6, 71690, 4183, 17, 369, 37, 215, 1345, 143, 32677, 5, 1838, 8, 1974, 15, 36, 119, 257, 85, 52, 486, 9, 6, 26441, 8564, 63, 271, 6, 196, 96, 949, 4121, 4, 74170, 7, 4, 2212, 2436, 819, 63, 47, 77, 7175, 180, 6, 227, 11, 94, 2494, 33740, 13, 423, 4, 168, 7, 4, 22, 5, 89, 665, 71, 270, 56, 5, 13, 197, 12, 161, 5390, 99, 76, 23, 77842, 7, 419, 665, 40, 91, 85, 108, 7, 4, 2084, 5, 4773, 81, 55, 52, 1901]),\n",
" list([1, 43, 188, 46, 5, 566, 264, 51, 6, 530, 664, 14, 9, 1713, 81, 25, 1135, 46, 7, 6, 20, 750, 11, 141, 4299, 5, 15455, 4441, 102, 28, 413, 38, 120, 5533, 15, 4, 3974, 7, 5369, 142, 371, 318, 5, 955, 1713, 571, 25242, 24762, 122, 14, 8, 72, 54, 12, 86, 385, 46, 5, 14, 20, 9, 399, 8, 72, 150, 13, 161, 124, 6, 155, 44, 14, 159, 170, 83, 12, 5, 51, 6, 866, 48, 25, 842, 4, 1120, 25, 238, 79, 4, 547, 15, 14, 9, 31, 7, 148, 16126, 102, 44, 35, 480, 3823, 2380, 19, 120, 4, 350, 228, 5, 269, 8, 28, 178, 1314, 2347, 7, 51, 6, 87, 65, 12, 9, 979, 21, 95, 24, 3186, 178, 11, 40732, 14, 9, 24, 15, 20, 4, 84, 376, 4, 65, 14, 127, 141, 6, 52, 292, 7, 4751, 175, 561, 7, 68, 3866, 137, 75, 2541, 68, 182, 5, 235, 175, 333, 19, 98, 50, 9, 38, 76, 724, 4, 6750, 15, 166, 285, 36, 140, 143, 38, 76, 53, 3094, 1301, 4, 6991, 16, 82, 6, 87, 3578, 44, 2527, 7612, 5, 800, 4, 3033, 11, 35, 1728, 96, 21, 14, 22, 9, 76, 53, 7, 6, 406, 65, 13, 43, 219, 12, 639, 21, 13, 80, 140, 5, 135, 15, 14, 9, 31, 7, 4, 118, 3672, 13, 28, 126, 110]),\n",
" list([1, 785, 189, 438, 47, 110, 142, 7, 6, 7475, 120, 4, 236, 378, 7, 153, 19, 87, 108, 141, 17, 1004, 5, 30432, 883, 10789, 23, 8, 4, 136, 13772, 11631, 4, 7475, 43, 1076, 21, 1407, 419, 5, 5202, 120, 91, 682, 189, 2818, 5, 9, 1348, 31, 7, 4, 118, 785, 189, 108, 126, 93, 13772, 16, 540, 324, 23, 6, 364, 352, 21, 14, 9, 93, 56, 18, 11, 230, 53, 771, 74, 31, 34, 4, 2834, 7, 4, 22, 5, 14, 11, 471, 9, 17547, 34, 4, 321, 487, 5, 116, 15, 6584, 4, 22, 9, 6, 2286, 4, 114, 2679, 23, 107, 293, 1008, 1172, 5, 328, 1236, 4, 1375, 109, 9, 6, 132, 773, 14799, 1412, 8, 1172, 18, 7865, 29, 9, 276, 11, 6, 2768, 19, 289, 409, 4, 5341, 2140, 20250, 648, 1430, 10136, 8914, 5, 27, 3000, 1432, 7130, 103, 6, 346, 137, 11, 4, 2768, 295, 36, 7740, 725, 6, 3208, 273, 11, 4, 1513, 15, 1367, 35, 154, 14040, 103, 19100, 173, 7, 12, 36, 515, 3547, 94, 2547, 1722, 5, 3547, 36, 203, 30, 502, 8, 361, 12, 8, 989, 143, 4, 1172, 3404, 10, 10, 328, 1236, 9, 6, 55, 221, 2989, 5, 146, 165, 179, 770, 15, 50, 713, 53, 108, 448, 23, 12, 17, 225, 38, 76, 4397, 18, 183, 8, 81, 19, 12, 45, 1257, 8, 135, 15, 13772, 166, 4, 118, 7, 45, 12831, 17, 466, 45, 24410, 4, 22, 115, 165, 764, 6075, 5, 1030, 8, 2973, 73, 469, 167, 2127, 18281, 1568, 6, 87, 841, 18, 4, 22, 4, 192, 15, 91, 7, 12, 304, 273, 1004, 4, 1375, 1172, 2768, 12356, 15, 4, 22, 764, 55, 5773, 5, 14, 4233, 7444, 4, 1375, 326, 7, 4, 4760, 1786, 8, 361, 1236, 8, 989, 46, 7, 4, 2768, 45, 55, 776, 8, 79, 496, 98, 45, 400, 301, 15, 4, 1859, 9, 4, 155, 15, 66, 21885, 84, 5, 14, 22, 1534, 15, 17, 4, 167, 12356, 15, 75, 70, 115, 66, 30, 252, 7, 618, 51, 9, 2161, 4, 3130, 5, 14, 1525, 8, 6584, 15, 13772, 165, 127, 1921, 8, 30, 179, 2532, 4, 22, 9, 906, 18, 6, 176, 7, 1007, 1005, 4, 1375, 114, 4, 105, 26, 32, 55, 221, 11, 68, 205, 96, 5, 4, 192, 15, 4, 274, 410, 220, 304, 23, 94, 205, 109, 9, 55, 73, 224, 259, 3786, 15, 4, 22, 528, 1645, 34, 4, 130, 528, 30, 685, 345, 17, 4, 277, 199, 166, 281, 5, 1030, 8, 30, 179, 4442, 444, 13772, 9, 6, 371, 87, 189, 22, 
5, 31, 7, 4, 118, 7, 4, 2068, 545, 1178, 829])],\n",
" dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 121
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "Vo2REpHJjgDE",
"colab_type": "code",
"outputId": "5dde31ee-3a1b-4725-f517-cc28953d974b",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"toy_pos_data_labels_v[1]"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([1, 1, 1, 1, 1], dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 122
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "1N9BzlrJjo1U",
"colab_type": "code",
"outputId": "e4a8dd46-b14b-4834-f219-f1252597b5b4",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143
}
},
"source": [
"toy_pos_data_labels_v"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),\n",
" list([1, 4, 18609, 16085, 33, 2804, 4, 2040, 432, 111, 153, 103, 4, 1494, 13, 70, 131, 67, 11, 61, 15305, 744, 35, 3715, 761, 61, 5766, 452, 9214, 4, 985, 7, 64317, 59, 166, 4, 105, 216, 1239, 41, 1797, 9, 15, 7, 35, 744, 2413, 31, 8, 4, 687, 23, 4, 33929, 7339, 6, 3693, 42, 38, 39, 121, 59, 456, 10, 10, 7, 265, 12, 575, 111, 153, 159, 59, 16, 1447, 21, 25, 586, 482, 39, 4, 96, 59, 716, 12, 4, 172, 65, 9, 579, 11, 6004, 4, 1615, 5, 23005, 7, 5168, 17, 13, 7064, 12, 19, 6, 464, 31, 314, 11, 87564, 6, 719, 605, 11, 8, 202, 27, 310, 4, 3772, 3501, 8, 2722, 58, 10, 10, 537, 2116, 180, 40, 14, 413, 173, 7, 263, 112, 37, 152, 377, 4, 537, 263, 846, 579, 178, 54, 75, 71, 476, 36, 413, 263, 2504, 182, 5, 17, 75, 2306, 922, 36, 279, 131, 2895, 17, 2867, 42, 17, 35, 921, 18435, 192, 5, 1219, 3890, 19, 20523, 217, 4122, 1710, 537, 20341, 1236, 5, 736, 10, 10, 61, 403, 9, 47289, 40, 61, 4494, 5, 27, 4494, 159, 90, 263, 2311, 4319, 309, 8, 178, 5, 82, 4319, 4, 65, 15, 9225, 145, 143, 5122, 12, 7039, 537, 746, 537, 537, 15, 7979, 4, 18665, 594, 7, 5168, 94, 9096, 3987, 15242, 11, 28280, 4, 538, 7, 1795, 246, 56615, 9, 10161, 11, 635, 14, 9, 51, 408, 12, 94, 318, 1382, 12, 47, 6, 2683, 936, 5, 6307, 10197, 19, 49, 7, 4, 1885, 13699, 1118, 25, 80, 126, 842, 10, 10, 47289, 18223, 4726, 27, 4494, 11, 1550, 3633, 159, 27, 341, 29, 2733, 19, 4185, 173, 7, 90, 16376, 8, 30, 11, 4, 1784, 86, 1117, 8, 3261, 46, 11, 25837, 21, 29, 9, 2841, 23, 4, 1010, 26747, 793, 6, 13699, 1386, 1830, 10, 10, 246, 50, 9, 6, 2750, 1944, 746, 90, 29, 16376, 8, 124, 4, 882, 4, 882, 496, 27, 33029, 2213, 537, 121, 127, 1219, 130, 5, 29, 494, 8, 124, 4, 882, 496, 4, 341, 7, 27, 846, 10, 10, 29, 9, 1906, 8, 97, 6, 236, 11120, 1311, 8, 4, 23643, 7, 31, 7, 29851, 91, 22793, 3987, 70, 4, 882, 30, 579, 42, 9, 12, 32, 11, 537, 10, 10, 11, 14, 65, 44, 537, 75, 11876, 1775, 3353, 12716, 1846, 4, 11286, 7, 154, 5, 4, 518, 53, 13243, 11286, 7, 3211, 882, 11, 399, 38, 75, 257, 3807, 19, 18223, 17, 29, 456, 4, 65, 7, 
27, 205, 113, 10, 10, 33058, 4, 22793, 10359, 9, 242, 4, 91, 1202, 11377, 5, 2070, 307, 22, 7, 5168, 126, 93, 40, 18223, 13, 188, 1076, 3222, 19, 4, 13465, 7, 2348, 537, 23, 53, 537, 21, 82, 40, 18223, 13, 33195, 14, 280, 13, 219, 4, 52788, 431, 758, 859, 4, 953, 1052, 12283, 7, 5991, 5, 94, 40, 25, 238, 60, 35410, 4, 15812, 804, 27767, 7, 4, 9941, 132, 8, 67, 6, 22, 15, 9, 283, 8, 5168, 14, 31, 9, 242, 955, 48, 25, 279, 22148, 23, 12, 1685, 195, 25, 238, 60, 796, 13713, 4, 671, 7, 2804, 5, 4, 559, 154, 888, 7, 726, 50, 26, 49, 7008, 15, 566, 30, 579, 21, 64, 2574]),\n",
" list([1, 6740, 365, 1234, 5, 1156, 354, 11, 14, 5327, 6638, 7, 1016, 10626, 5940, 356, 44, 4, 1349, 500, 746, 5, 200, 4, 4132, 11, 16393, 9363, 1117, 1831, 7485, 5, 4831, 26, 6, 71690, 4183, 17, 369, 37, 215, 1345, 143, 32677, 5, 1838, 8, 1974, 15, 36, 119, 257, 85, 52, 486, 9, 6, 26441, 8564, 63, 271, 6, 196, 96, 949, 4121, 4, 74170, 7, 4, 2212, 2436, 819, 63, 47, 77, 7175, 180, 6, 227, 11, 94, 2494, 33740, 13, 423, 4, 168, 7, 4, 22, 5, 89, 665, 71, 270, 56, 5, 13, 197, 12, 161, 5390, 99, 76, 23, 77842, 7, 419, 665, 40, 91, 85, 108, 7, 4, 2084, 5, 4773, 81, 55, 52, 1901]),\n",
" list([1, 43, 188, 46, 5, 566, 264, 51, 6, 530, 664, 14, 9, 1713, 81, 25, 1135, 46, 7, 6, 20, 750, 11, 141, 4299, 5, 15455, 4441, 102, 28, 413, 38, 120, 5533, 15, 4, 3974, 7, 5369, 142, 371, 318, 5, 955, 1713, 571, 25242, 24762, 122, 14, 8, 72, 54, 12, 86, 385, 46, 5, 14, 20, 9, 399, 8, 72, 150, 13, 161, 124, 6, 155, 44, 14, 159, 170, 83, 12, 5, 51, 6, 866, 48, 25, 842, 4, 1120, 25, 238, 79, 4, 547, 15, 14, 9, 31, 7, 148, 16126, 102, 44, 35, 480, 3823, 2380, 19, 120, 4, 350, 228, 5, 269, 8, 28, 178, 1314, 2347, 7, 51, 6, 87, 65, 12, 9, 979, 21, 95, 24, 3186, 178, 11, 40732, 14, 9, 24, 15, 20, 4, 84, 376, 4, 65, 14, 127, 141, 6, 52, 292, 7, 4751, 175, 561, 7, 68, 3866, 137, 75, 2541, 68, 182, 5, 235, 175, 333, 19, 98, 50, 9, 38, 76, 724, 4, 6750, 15, 166, 285, 36, 140, 143, 38, 76, 53, 3094, 1301, 4, 6991, 16, 82, 6, 87, 3578, 44, 2527, 7612, 5, 800, 4, 3033, 11, 35, 1728, 96, 21, 14, 22, 9, 76, 53, 7, 6, 406, 65, 13, 43, 219, 12, 639, 21, 13, 80, 140, 5, 135, 15, 14, 9, 31, 7, 4, 118, 3672, 13, 28, 126, 110]),\n",
" list([1, 785, 189, 438, 47, 110, 142, 7, 6, 7475, 120, 4, 236, 378, 7, 153, 19, 87, 108, 141, 17, 1004, 5, 30432, 883, 10789, 23, 8, 4, 136, 13772, 11631, 4, 7475, 43, 1076, 21, 1407, 419, 5, 5202, 120, 91, 682, 189, 2818, 5, 9, 1348, 31, 7, 4, 118, 785, 189, 108, 126, 93, 13772, 16, 540, 324, 23, 6, 364, 352, 21, 14, 9, 93, 56, 18, 11, 230, 53, 771, 74, 31, 34, 4, 2834, 7, 4, 22, 5, 14, 11, 471, 9, 17547, 34, 4, 321, 487, 5, 116, 15, 6584, 4, 22, 9, 6, 2286, 4, 114, 2679, 23, 107, 293, 1008, 1172, 5, 328, 1236, 4, 1375, 109, 9, 6, 132, 773, 14799, 1412, 8, 1172, 18, 7865, 29, 9, 276, 11, 6, 2768, 19, 289, 409, 4, 5341, 2140, 20250, 648, 1430, 10136, 8914, 5, 27, 3000, 1432, 7130, 103, 6, 346, 137, 11, 4, 2768, 295, 36, 7740, 725, 6, 3208, 273, 11, 4, 1513, 15, 1367, 35, 154, 14040, 103, 19100, 173, 7, 12, 36, 515, 3547, 94, 2547, 1722, 5, 3547, 36, 203, 30, 502, 8, 361, 12, 8, 989, 143, 4, 1172, 3404, 10, 10, 328, 1236, 9, 6, 55, 221, 2989, 5, 146, 165, 179, 770, 15, 50, 713, 53, 108, 448, 23, 12, 17, 225, 38, 76, 4397, 18, 183, 8, 81, 19, 12, 45, 1257, 8, 135, 15, 13772, 166, 4, 118, 7, 45, 12831, 17, 466, 45, 24410, 4, 22, 115, 165, 764, 6075, 5, 1030, 8, 2973, 73, 469, 167, 2127, 18281, 1568, 6, 87, 841, 18, 4, 22, 4, 192, 15, 91, 7, 12, 304, 273, 1004, 4, 1375, 1172, 2768, 12356, 15, 4, 22, 764, 55, 5773, 5, 14, 4233, 7444, 4, 1375, 326, 7, 4, 4760, 1786, 8, 361, 1236, 8, 989, 46, 7, 4, 2768, 45, 55, 776, 8, 79, 496, 98, 45, 400, 301, 15, 4, 1859, 9, 4, 155, 15, 66, 21885, 84, 5, 14, 22, 1534, 15, 17, 4, 167, 12356, 15, 75, 70, 115, 66, 30, 252, 7, 618, 51, 9, 2161, 4, 3130, 5, 14, 1525, 8, 6584, 15, 13772, 165, 127, 1921, 8, 30, 179, 2532, 4, 22, 9, 906, 18, 6, 176, 7, 1007, 1005, 4, 1375, 114, 4, 105, 26, 32, 55, 221, 11, 68, 205, 96, 5, 4, 192, 15, 4, 274, 410, 220, 304, 23, 94, 205, 109, 9, 55, 73, 224, 259, 3786, 15, 4, 22, 528, 1645, 34, 4, 130, 528, 30, 685, 345, 17, 4, 277, 199, 166, 281, 5, 1030, 8, 30, 179, 4442, 444, 13772, 9, 6, 371, 87, 189, 22, 
5, 31, 7, 4, 118, 7, 4, 2068, 545, 1178, 829])],\n",
" [1, 1, 1, 1, 1]], dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 123
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "W2YRQY6ZjyPV",
"colab_type": "text"
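},
"source": [
"Not right! The stacked array has shape (2, 5): all five reviews sit in row 0 and all five labels in row 1, instead of one (review, label) pair per row."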
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jQ_BAGnKj1R5",
"colab_type": "text"
},
"source": [
"### For Neg Reviews"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aK4VkaBWj3TC",
"colab_type": "text"
},
"source": [
"#### Trying H-Stack"
]
},
{
"cell_type": "code",
"metadata": {
"id": "jDIVrmbIfF0q",
"colab_type": "code",
"colab": {}
},
"source": [
"# Join reviews and labels end to end: yields a flat array of 5 reviews followed by 5 labels\n",
"toy_neg_data_labels = np.hstack((toy_neg_reviews, toy_neg_labels))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "zM7rmQBkfLOz",
"colab_type": "code",
"outputId": "feb3b931-8e14-410f-decf-c013d17eeb5d",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143
}
},
"source": [
"toy_neg_data_labels"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([list([1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 8163, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 6853, 5, 163, 11, 3215, 10156, 4, 1153, 9, 194, 775, 7, 8255, 11596, 349, 2637, 148, 605, 15358, 8003, 15, 123, 125, 68, 23141, 6853, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 36893, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 8255, 5, 25249, 656, 245, 2350, 5, 4, 9837, 131, 152, 491, 18, 46151, 32, 7464, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]),\n",
" list([1, 14, 47, 8, 30, 31, 7, 4, 249, 108, 7, 4, 5974, 54, 61, 369, 13, 71, 149, 14, 22, 112, 4, 2401, 311, 12, 16, 3711, 33, 75, 43, 1829, 296, 4, 86, 320, 35, 534, 19, 263, 4821, 1301, 4, 1873, 33, 89, 78, 12, 66, 16, 4, 360, 7, 4, 58, 316, 334, 11, 4, 1716, 43, 645, 662, 8, 257, 85, 1200, 42, 1228, 2578, 83, 68, 3912, 15, 36, 165, 1539, 278, 36, 69, 44076, 780, 8, 106, 14, 6905, 1338, 18, 6, 22, 12, 215, 28, 610, 40, 6, 87, 326, 23, 2300, 21, 23, 22, 12, 272, 40, 57, 31, 11, 4, 22, 47, 6, 2307, 51, 9, 170, 23, 595, 116, 595, 1352, 13, 191, 79, 638, 89, 51428, 14, 9, 8, 106, 607, 624, 35, 534, 6, 227, 7, 129, 113]),\n",
" list([1, 249, 1323, 7, 61, 113, 10, 10, 13, 1637, 14, 20, 56, 33, 2401, 18, 457, 88, 13, 2626, 1400, 45, 3171, 13, 70, 79, 49, 706, 919, 13, 16, 355, 340, 355, 1696, 96, 143, 4, 22, 32, 289, 7, 61, 369, 71, 2359, 5, 13, 16, 131, 2073, 249, 114, 249, 229, 249, 20, 13, 28, 126, 110, 13, 473, 8, 569, 61, 419, 56, 429, 6, 1513, 18, 35, 534, 95, 474, 570, 5, 25, 124, 138, 88, 12, 421, 1543, 52, 725, 6397, 61, 419, 11, 13, 1571, 15, 1543, 20, 11, 4, 22016, 5, 296, 12, 3524, 5, 15, 421, 128, 74, 233, 334, 207, 126, 224, 12, 562, 298, 2167, 1272, 7, 2601, 5, 516, 988, 43, 8, 79, 120, 15, 595, 13, 784, 25, 3171, 18, 165, 170, 143, 19, 14, 5, 7224, 6, 226, 251, 7, 61, 113]),\n",
" list([1, 778, 128, 74, 12, 630, 163, 15, 4, 1766, 7982, 1051, 43222, 32, 85, 156, 45, 40, 148, 139, 121, 664, 665, 10, 10, 1361, 173, 4, 749, 86588, 16, 3804, 8, 4, 226, 65, 12, 43, 127, 24, 15344, 10, 10]),\n",
" list([1, 4, 14906, 716, 4, 65, 7, 4, 689, 4367, 6308, 2343, 4804, 28674, 84206, 5270, 32099, 2315, 71688, 12572, 24785, 43394, 4, 10993, 628, 7685, 37, 9, 150, 4, 9820, 4069, 11, 2909, 4, 16287, 847, 313, 6, 176, 63860, 9, 6202, 138, 9, 4434, 19, 4, 96, 183, 26, 4, 192, 15, 27, 5842, 799, 7101, 39455, 588, 84, 11, 4, 3231, 152, 339, 5206, 42, 4869, 30497, 6293, 345, 4804, 37377, 142, 43, 218, 208, 54, 29, 853, 659, 46, 4, 882, 183, 80, 115, 30, 4, 172, 174, 10, 10, 1001, 398, 1001, 1055, 526, 34, 3717, 68395, 5262, 63370, 17, 4, 6706, 1094, 871, 64, 85, 22, 2030, 1109, 38, 230, 9, 4, 4324, 20636, 251, 5056, 1034, 195, 301, 14, 16, 31, 7, 4, 46035, 8, 783, 48545, 33, 4, 2945, 103, 465, 16454, 42, 845, 45, 446, 11, 1895, 19, 184, 76, 32, 4, 5310, 207, 110, 13, 197, 4, 14906, 16, 601, 964, 2152, 595, 13, 258, 4, 1730, 66, 338, 55, 5312, 4, 550, 728, 65, 1196, 8, 1839, 61, 1546, 42, 8361, 61, 602, 120, 45, 7304, 6, 320, 786, 99, 196, 11100, 786, 5936, 4, 225, 4, 373, 1009, 33, 4, 130, 63, 69, 72, 1104, 46, 1292, 225, 14, 66, 194, 11871, 1703, 56, 8, 803, 1004, 6, 18763, 155, 11, 4, 14906, 3231, 45, 853, 2029, 8, 30, 6, 117, 430, 19, 6, 8941, 9, 15, 66, 424, 8, 2337, 178, 9, 15, 66, 424, 8, 1465, 178, 9, 15, 66, 142, 15, 9, 424, 8, 28, 178, 662, 44, 12, 17, 4, 130, 898, 1686, 9, 6, 5623, 267, 185, 430, 4, 118, 21486, 277, 15, 4, 1188, 100, 216, 56, 19, 4, 357, 114, 10399, 367, 45, 115, 93, 788, 121, 4, 14906, 79, 32, 68, 278, 39, 8, 818, 162, 4165, 237, 600, 7, 98, 306, 8, 157, 549, 628, 11, 6, 12370, 13, 824, 15, 4104, 76, 42, 138, 36, 774, 77, 1059, 159, 150, 4, 229, 497, 8, 1493, 11, 175, 251, 453, 19, 8651, 189, 12, 43, 127, 6, 394, 292, 7, 8253, 4, 107, 8, 4, 2826, 15, 1082, 1251, 9, 906, 42, 1134, 6, 66, 78, 22, 15, 13, 244, 2519, 8, 135, 233, 52, 44, 10, 10, 466, 112, 398, 526, 34, 4, 1572, 4413, 6706, 1094, 225, 57, 599, 133, 225, 6, 227, 7, 541, 4323, 6, 171, 139, 7, 539, 11890, 56, 11, 6, 3231, 21, 164, 25, 426, 81, 33, 344, 624, 19, 6, 4617, 7, 10373, 
12958, 6, 5802, 4, 22, 9, 1082, 629, 237, 45, 188, 6, 55, 655, 707, 6371, 956, 225, 1456, 841, 42, 1310, 225, 6, 2493, 1467, 7722, 2828, 21, 4, 14906, 9, 364, 23, 4, 2228, 2407, 225, 24, 76, 133, 18, 4, 189, 2293, 10, 10, 814, 11, 53728, 11, 2642, 14, 47, 15, 682, 364, 352, 168, 44, 12, 45, 24, 913, 93, 21, 247, 2441, 4, 116, 34, 35, 1859, 8, 72, 177, 9, 164, 8, 901, 344, 44, 13, 191, 135, 13, 126, 421, 233, 18, 259, 10, 10, 4, 14906, 6847, 4, 14065, 3074, 7, 112, 199, 753, 357, 39, 63, 12, 115, 15222, 763, 8, 15, 35, 3282, 1523, 65, 57, 599, 6, 1916, 277, 1730, 37, 25, 92, 202, 6, 8848, 44, 25, 28, 6, 22, 15, 122, 24, 4171, 72, 33, 32]),\n",
" 0, 0, 0, 0, 0], dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 125
}
]
},
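{
"cell_type": "markdown",
"metadata": {
"colab_type": "text"
},
"source": [
"`np.hstack` above joins the two 1-D arrays end to end, so the result is a flat array of five reviews followed by five labels. The next cell is a minimal sketch of one way to get one (review, label) pair per row instead, assuming the reviews are held in a 1-D object array: `np.column_stack`. The names `demo_token_lists`, `demo_reviews`, `demo_labels`, and `demo_pairs`, and the shortened token lists, are hypothetical stand-ins rather than the notebook's own data."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical stand-ins for toy_neg_reviews and toy_neg_labels (token lists shortened)\n",
"demo_token_lists = [[1, 194, 1153], [1, 14, 47, 8], [1, 249]]\n",
"\n",
"# Fill an object array element by element so each review stays a Python list\n",
"demo_reviews = np.empty(len(demo_token_lists), dtype=object)\n",
"for i, tokens in enumerate(demo_token_lists):\n",
"    demo_reviews[i] = tokens\n",
"demo_labels = np.zeros(len(demo_token_lists), dtype=int)\n",
"\n",
"# column_stack turns each 1-D input into a column, so row i holds (review i, label i)\n",
"demo_pairs = np.column_stack((demo_reviews, demo_labels))\n",
"print(demo_pairs.shape)  # (3, 2)\n",
"print(demo_pairs[0])"
],
"execution_count": 0,
"outputs": []
},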
{
"cell_type": "markdown",
"metadata": {
"id": "YF1vkD5uneza",
"colab_type": "text"
},
"source": [
"### For All Reviews"
]
},
{
"cell_type": "code",
"metadata": {
"id": "EXrkuNHdfNWJ",
"colab_type": "code",
"colab": {}
},
"source": [
"# Stack the positive and negative arrays as two rows: yields shape (2, 10)\n",
"toy_data_labels = np.vstack((toy_pos_data_labels, toy_neg_data_labels))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "S4BqAFDKfWDh",
"colab_type": "code",
"outputId": "5b32710a-6040-48f8-d6ba-e5d939dfcaa8",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 250
}
},
"source": [
"toy_data_labels"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),\n",
" list([1, 4, 18609, 16085, 33, 2804, 4, 2040, 432, 111, 153, 103, 4, 1494, 13, 70, 131, 67, 11, 61, 15305, 744, 35, 3715, 761, 61, 5766, 452, 9214, 4, 985, 7, 64317, 59, 166, 4, 105, 216, 1239, 41, 1797, 9, 15, 7, 35, 744, 2413, 31, 8, 4, 687, 23, 4, 33929, 7339, 6, 3693, 42, 38, 39, 121, 59, 456, 10, 10, 7, 265, 12, 575, 111, 153, 159, 59, 16, 1447, 21, 25, 586, 482, 39, 4, 96, 59, 716, 12, 4, 172, 65, 9, 579, 11, 6004, 4, 1615, 5, 23005, 7, 5168, 17, 13, 7064, 12, 19, 6, 464, 31, 314, 11, 87564, 6, 719, 605, 11, 8, 202, 27, 310, 4, 3772, 3501, 8, 2722, 58, 10, 10, 537, 2116, 180, 40, 14, 413, 173, 7, 263, 112, 37, 152, 377, 4, 537, 263, 846, 579, 178, 54, 75, 71, 476, 36, 413, 263, 2504, 182, 5, 17, 75, 2306, 922, 36, 279, 131, 2895, 17, 2867, 42, 17, 35, 921, 18435, 192, 5, 1219, 3890, 19, 20523, 217, 4122, 1710, 537, 20341, 1236, 5, 736, 10, 10, 61, 403, 9, 47289, 40, 61, 4494, 5, 27, 4494, 159, 90, 263, 2311, 4319, 309, 8, 178, 5, 82, 4319, 4, 65, 15, 9225, 145, 143, 5122, 12, 7039, 537, 746, 537, 537, 15, 7979, 4, 18665, 594, 7, 5168, 94, 9096, 3987, 15242, 11, 28280, 4, 538, 7, 1795, 246, 56615, 9, 10161, 11, 635, 14, 9, 51, 408, 12, 94, 318, 1382, 12, 47, 6, 2683, 936, 5, 6307, 10197, 19, 49, 7, 4, 1885, 13699, 1118, 25, 80, 126, 842, 10, 10, 47289, 18223, 4726, 27, 4494, 11, 1550, 3633, 159, 27, 341, 29, 2733, 19, 4185, 173, 7, 90, 16376, 8, 30, 11, 4, 1784, 86, 1117, 8, 3261, 46, 11, 25837, 21, 29, 9, 2841, 23, 4, 1010, 26747, 793, 6, 13699, 1386, 1830, 10, 10, 246, 50, 9, 6, 2750, 1944, 746, 90, 29, 16376, 8, 124, 4, 882, 4, 882, 496, 27, 33029, 2213, 537, 121, 127, 1219, 130, 5, 29, 494, 8, 124, 4, 882, 496, 4, 341, 7, 27, 846, 10, 10, 29, 9, 1906, 8, 97, 6, 236, 11120, 1311, 8, 4, 23643, 7, 31, 7, 29851, 91, 22793, 3987, 70, 4, 882, 30, 579, 42, 9, 12, 32, 11, 537, 10, 10, 11, 14, 65, 44, 537, 75, 11876, 1775, 3353, 12716, 1846, 4, 11286, 7, 154, 5, 4, 518, 53, 13243, 11286, 7, 3211, 882, 11, 399, 38, 75, 257, 3807, 19, 18223, 17, 29, 456, 4, 65, 7, 
27, 205, 113, 10, 10, 33058, 4, 22793, 10359, 9, 242, 4, 91, 1202, 11377, 5, 2070, 307, 22, 7, 5168, 126, 93, 40, 18223, 13, 188, 1076, 3222, 19, 4, 13465, 7, 2348, 537, 23, 53, 537, 21, 82, 40, 18223, 13, 33195, 14, 280, 13, 219, 4, 52788, 431, 758, 859, 4, 953, 1052, 12283, 7, 5991, 5, 94, 40, 25, 238, 60, 35410, 4, 15812, 804, 27767, 7, 4, 9941, 132, 8, 67, 6, 22, 15, 9, 283, 8, 5168, 14, 31, 9, 242, 955, 48, 25, 279, 22148, 23, 12, 1685, 195, 25, 238, 60, 796, 13713, 4, 671, 7, 2804, 5, 4, 559, 154, 888, 7, 726, 50, 26, 49, 7008, 15, 566, 30, 579, 21, 64, 2574]),\n",
" list([1, 6740, 365, 1234, 5, 1156, 354, 11, 14, 5327, 6638, 7, 1016, 10626, 5940, 356, 44, 4, 1349, 500, 746, 5, 200, 4, 4132, 11, 16393, 9363, 1117, 1831, 7485, 5, 4831, 26, 6, 71690, 4183, 17, 369, 37, 215, 1345, 143, 32677, 5, 1838, 8, 1974, 15, 36, 119, 257, 85, 52, 486, 9, 6, 26441, 8564, 63, 271, 6, 196, 96, 949, 4121, 4, 74170, 7, 4, 2212, 2436, 819, 63, 47, 77, 7175, 180, 6, 227, 11, 94, 2494, 33740, 13, 423, 4, 168, 7, 4, 22, 5, 89, 665, 71, 270, 56, 5, 13, 197, 12, 161, 5390, 99, 76, 23, 77842, 7, 419, 665, 40, 91, 85, 108, 7, 4, 2084, 5, 4773, 81, 55, 52, 1901]),\n",
" list([1, 43, 188, 46, 5, 566, 264, 51, 6, 530, 664, 14, 9, 1713, 81, 25, 1135, 46, 7, 6, 20, 750, 11, 141, 4299, 5, 15455, 4441, 102, 28, 413, 38, 120, 5533, 15, 4, 3974, 7, 5369, 142, 371, 318, 5, 955, 1713, 571, 25242, 24762, 122, 14, 8, 72, 54, 12, 86, 385, 46, 5, 14, 20, 9, 399, 8, 72, 150, 13, 161, 124, 6, 155, 44, 14, 159, 170, 83, 12, 5, 51, 6, 866, 48, 25, 842, 4, 1120, 25, 238, 79, 4, 547, 15, 14, 9, 31, 7, 148, 16126, 102, 44, 35, 480, 3823, 2380, 19, 120, 4, 350, 228, 5, 269, 8, 28, 178, 1314, 2347, 7, 51, 6, 87, 65, 12, 9, 979, 21, 95, 24, 3186, 178, 11, 40732, 14, 9, 24, 15, 20, 4, 84, 376, 4, 65, 14, 127, 141, 6, 52, 292, 7, 4751, 175, 561, 7, 68, 3866, 137, 75, 2541, 68, 182, 5, 235, 175, 333, 19, 98, 50, 9, 38, 76, 724, 4, 6750, 15, 166, 285, 36, 140, 143, 38, 76, 53, 3094, 1301, 4, 6991, 16, 82, 6, 87, 3578, 44, 2527, 7612, 5, 800, 4, 3033, 11, 35, 1728, 96, 21, 14, 22, 9, 76, 53, 7, 6, 406, 65, 13, 43, 219, 12, 639, 21, 13, 80, 140, 5, 135, 15, 14, 9, 31, 7, 4, 118, 3672, 13, 28, 126, 110]),\n",
" list([1, 785, 189, 438, 47, 110, 142, 7, 6, 7475, 120, 4, 236, 378, 7, 153, 19, 87, 108, 141, 17, 1004, 5, 30432, 883, 10789, 23, 8, 4, 136, 13772, 11631, 4, 7475, 43, 1076, 21, 1407, 419, 5, 5202, 120, 91, 682, 189, 2818, 5, 9, 1348, 31, 7, 4, 118, 785, 189, 108, 126, 93, 13772, 16, 540, 324, 23, 6, 364, 352, 21, 14, 9, 93, 56, 18, 11, 230, 53, 771, 74, 31, 34, 4, 2834, 7, 4, 22, 5, 14, 11, 471, 9, 17547, 34, 4, 321, 487, 5, 116, 15, 6584, 4, 22, 9, 6, 2286, 4, 114, 2679, 23, 107, 293, 1008, 1172, 5, 328, 1236, 4, 1375, 109, 9, 6, 132, 773, 14799, 1412, 8, 1172, 18, 7865, 29, 9, 276, 11, 6, 2768, 19, 289, 409, 4, 5341, 2140, 20250, 648, 1430, 10136, 8914, 5, 27, 3000, 1432, 7130, 103, 6, 346, 137, 11, 4, 2768, 295, 36, 7740, 725, 6, 3208, 273, 11, 4, 1513, 15, 1367, 35, 154, 14040, 103, 19100, 173, 7, 12, 36, 515, 3547, 94, 2547, 1722, 5, 3547, 36, 203, 30, 502, 8, 361, 12, 8, 989, 143, 4, 1172, 3404, 10, 10, 328, 1236, 9, 6, 55, 221, 2989, 5, 146, 165, 179, 770, 15, 50, 713, 53, 108, 448, 23, 12, 17, 225, 38, 76, 4397, 18, 183, 8, 81, 19, 12, 45, 1257, 8, 135, 15, 13772, 166, 4, 118, 7, 45, 12831, 17, 466, 45, 24410, 4, 22, 115, 165, 764, 6075, 5, 1030, 8, 2973, 73, 469, 167, 2127, 18281, 1568, 6, 87, 841, 18, 4, 22, 4, 192, 15, 91, 7, 12, 304, 273, 1004, 4, 1375, 1172, 2768, 12356, 15, 4, 22, 764, 55, 5773, 5, 14, 4233, 7444, 4, 1375, 326, 7, 4, 4760, 1786, 8, 361, 1236, 8, 989, 46, 7, 4, 2768, 45, 55, 776, 8, 79, 496, 98, 45, 400, 301, 15, 4, 1859, 9, 4, 155, 15, 66, 21885, 84, 5, 14, 22, 1534, 15, 17, 4, 167, 12356, 15, 75, 70, 115, 66, 30, 252, 7, 618, 51, 9, 2161, 4, 3130, 5, 14, 1525, 8, 6584, 15, 13772, 165, 127, 1921, 8, 30, 179, 2532, 4, 22, 9, 906, 18, 6, 176, 7, 1007, 1005, 4, 1375, 114, 4, 105, 26, 32, 55, 221, 11, 68, 205, 96, 5, 4, 192, 15, 4, 274, 410, 220, 304, 23, 94, 205, 109, 9, 55, 73, 224, 259, 3786, 15, 4, 22, 528, 1645, 34, 4, 130, 528, 30, 685, 345, 17, 4, 277, 199, 166, 281, 5, 1030, 8, 30, 179, 4442, 444, 13772, 9, 6, 371, 87, 189, 22, 
5, 31, 7, 4, 118, 7, 4, 2068, 545, 1178, 829]),\n",
" 1, 1, 1, 1, 1],\n",
" [list([1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 8163, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 6853, 5, 163, 11, 3215, 10156, 4, 1153, 9, 194, 775, 7, 8255, 11596, 349, 2637, 148, 605, 15358, 8003, 15, 123, 125, 68, 23141, 6853, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 36893, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 8255, 5, 25249, 656, 245, 2350, 5, 4, 9837, 131, 152, 491, 18, 46151, 32, 7464, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]),\n",
" list([1, 14, 47, 8, 30, 31, 7, 4, 249, 108, 7, 4, 5974, 54, 61, 369, 13, 71, 149, 14, 22, 112, 4, 2401, 311, 12, 16, 3711, 33, 75, 43, 1829, 296, 4, 86, 320, 35, 534, 19, 263, 4821, 1301, 4, 1873, 33, 89, 78, 12, 66, 16, 4, 360, 7, 4, 58, 316, 334, 11, 4, 1716, 43, 645, 662, 8, 257, 85, 1200, 42, 1228, 2578, 83, 68, 3912, 15, 36, 165, 1539, 278, 36, 69, 44076, 780, 8, 106, 14, 6905, 1338, 18, 6, 22, 12, 215, 28, 610, 40, 6, 87, 326, 23, 2300, 21, 23, 22, 12, 272, 40, 57, 31, 11, 4, 22, 47, 6, 2307, 51, 9, 170, 23, 595, 116, 595, 1352, 13, 191, 79, 638, 89, 51428, 14, 9, 8, 106, 607, 624, 35, 534, 6, 227, 7, 129, 113]),\n",
" list([1, 249, 1323, 7, 61, 113, 10, 10, 13, 1637, 14, 20, 56, 33, 2401, 18, 457, 88, 13, 2626, 1400, 45, 3171, 13, 70, 79, 49, 706, 919, 13, 16, 355, 340, 355, 1696, 96, 143, 4, 22, 32, 289, 7, 61, 369, 71, 2359, 5, 13, 16, 131, 2073, 249, 114, 249, 229, 249, 20, 13, 28, 126, 110, 13, 473, 8, 569, 61, 419, 56, 429, 6, 1513, 18, 35, 534, 95, 474, 570, 5, 25, 124, 138, 88, 12, 421, 1543, 52, 725, 6397, 61, 419, 11, 13, 1571, 15, 1543, 20, 11, 4, 22016, 5, 296, 12, 3524, 5, 15, 421, 128, 74, 233, 334, 207, 126, 224, 12, 562, 298, 2167, 1272, 7, 2601, 5, 516, 988, 43, 8, 79, 120, 15, 595, 13, 784, 25, 3171, 18, 165, 170, 143, 19, 14, 5, 7224, 6, 226, 251, 7, 61, 113]),\n",
" list([1, 778, 128, 74, 12, 630, 163, 15, 4, 1766, 7982, 1051, 43222, 32, 85, 156, 45, 40, 148, 139, 121, 664, 665, 10, 10, 1361, 173, 4, 749, 86588, 16, 3804, 8, 4, 226, 65, 12, 43, 127, 24, 15344, 10, 10]),\n",
" list([1, 4, 14906, 716, 4, 65, 7, 4, 689, 4367, 6308, 2343, 4804, 28674, 84206, 5270, 32099, 2315, 71688, 12572, 24785, 43394, 4, 10993, 628, 7685, 37, 9, 150, 4, 9820, 4069, 11, 2909, 4, 16287, 847, 313, 6, 176, 63860, 9, 6202, 138, 9, 4434, 19, 4, 96, 183, 26, 4, 192, 15, 27, 5842, 799, 7101, 39455, 588, 84, 11, 4, 3231, 152, 339, 5206, 42, 4869, 30497, 6293, 345, 4804, 37377, 142, 43, 218, 208, 54, 29, 853, 659, 46, 4, 882, 183, 80, 115, 30, 4, 172, 174, 10, 10, 1001, 398, 1001, 1055, 526, 34, 3717, 68395, 5262, 63370, 17, 4, 6706, 1094, 871, 64, 85, 22, 2030, 1109, 38, 230, 9, 4, 4324, 20636, 251, 5056, 1034, 195, 301, 14, 16, 31, 7, 4, 46035, 8, 783, 48545, 33, 4, 2945, 103, 465, 16454, 42, 845, 45, 446, 11, 1895, 19, 184, 76, 32, 4, 5310, 207, 110, 13, 197, 4, 14906, 16, 601, 964, 2152, 595, 13, 258, 4, 1730, 66, 338, 55, 5312, 4, 550, 728, 65, 1196, 8, 1839, 61, 1546, 42, 8361, 61, 602, 120, 45, 7304, 6, 320, 786, 99, 196, 11100, 786, 5936, 4, 225, 4, 373, 1009, 33, 4, 130, 63, 69, 72, 1104, 46, 1292, 225, 14, 66, 194, 11871, 1703, 56, 8, 803, 1004, 6, 18763, 155, 11, 4, 14906, 3231, 45, 853, 2029, 8, 30, 6, 117, 430, 19, 6, 8941, 9, 15, 66, 424, 8, 2337, 178, 9, 15, 66, 424, 8, 1465, 178, 9, 15, 66, 142, 15, 9, 424, 8, 28, 178, 662, 44, 12, 17, 4, 130, 898, 1686, 9, 6, 5623, 267, 185, 430, 4, 118, 21486, 277, 15, 4, 1188, 100, 216, 56, 19, 4, 357, 114, 10399, 367, 45, 115, 93, 788, 121, 4, 14906, 79, 32, 68, 278, 39, 8, 818, 162, 4165, 237, 600, 7, 98, 306, 8, 157, 549, 628, 11, 6, 12370, 13, 824, 15, 4104, 76, 42, 138, 36, 774, 77, 1059, 159, 150, 4, 229, 497, 8, 1493, 11, 175, 251, 453, 19, 8651, 189, 12, 43, 127, 6, 394, 292, 7, 8253, 4, 107, 8, 4, 2826, 15, 1082, 1251, 9, 906, 42, 1134, 6, 66, 78, 22, 15, 13, 244, 2519, 8, 135, 233, 52, 44, 10, 10, 466, 112, 398, 526, 34, 4, 1572, 4413, 6706, 1094, 225, 57, 599, 133, 225, 6, 227, 7, 541, 4323, 6, 171, 139, 7, 539, 11890, 56, 11, 6, 3231, 21, 164, 25, 426, 81, 33, 344, 624, 19, 6, 4617, 7, 10373, 
12958, 6, 5802, 4, 22, 9, 1082, 629, 237, 45, 188, 6, 55, 655, 707, 6371, 956, 225, 1456, 841, 42, 1310, 225, 6, 2493, 1467, 7722, 2828, 21, 4, 14906, 9, 364, 23, 4, 2228, 2407, 225, 24, 76, 133, 18, 4, 189, 2293, 10, 10, 814, 11, 53728, 11, 2642, 14, 47, 15, 682, 364, 352, 168, 44, 12, 45, 24, 913, 93, 21, 247, 2441, 4, 116, 34, 35, 1859, 8, 72, 177, 9, 164, 8, 901, 344, 44, 13, 191, 135, 13, 126, 421, 233, 18, 259, 10, 10, 4, 14906, 6847, 4, 14065, 3074, 7, 112, 199, 753, 357, 39, 63, 12, 115, 15222, 763, 8, 15, 35, 3282, 1523, 65, 57, 599, 6, 1916, 277, 1730, 37, 25, 92, 202, 6, 8848, 44, 25, 28, 6, 22, 15, 122, 24, 4171, 72, 33, 32]),\n",
" 0, 0, 0, 0, 0]], dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 127
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "51LtAk39faw8",
"colab_type": "code",
"outputId": "9c990bfd-6060-4f50-9723-22669185c534",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"np.shape(toy_data_labels)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(2, 10)"
]
},
"metadata": {
"tags": []
},
"execution_count": 128
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "mjyuB_mtirOi",
"colab_type": "code",
"outputId": "27a0360b-e9e4-4c56-a68b-45e488423431",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143
}
},
"source": [
"toy_data_labels[0]"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),\n",
" list([1, 4, 18609, 16085, 33, 2804, 4, 2040, 432, 111, 153, 103, 4, 1494, 13, 70, 131, 67, 11, 61, 15305, 744, 35, 3715, 761, 61, 5766, 452, 9214, 4, 985, 7, 64317, 59, 166, 4, 105, 216, 1239, 41, 1797, 9, 15, 7, 35, 744, 2413, 31, 8, 4, 687, 23, 4, 33929, 7339, 6, 3693, 42, 38, 39, 121, 59, 456, 10, 10, 7, 265, 12, 575, 111, 153, 159, 59, 16, 1447, 21, 25, 586, 482, 39, 4, 96, 59, 716, 12, 4, 172, 65, 9, 579, 11, 6004, 4, 1615, 5, 23005, 7, 5168, 17, 13, 7064, 12, 19, 6, 464, 31, 314, 11, 87564, 6, 719, 605, 11, 8, 202, 27, 310, 4, 3772, 3501, 8, 2722, 58, 10, 10, 537, 2116, 180, 40, 14, 413, 173, 7, 263, 112, 37, 152, 377, 4, 537, 263, 846, 579, 178, 54, 75, 71, 476, 36, 413, 263, 2504, 182, 5, 17, 75, 2306, 922, 36, 279, 131, 2895, 17, 2867, 42, 17, 35, 921, 18435, 192, 5, 1219, 3890, 19, 20523, 217, 4122, 1710, 537, 20341, 1236, 5, 736, 10, 10, 61, 403, 9, 47289, 40, 61, 4494, 5, 27, 4494, 159, 90, 263, 2311, 4319, 309, 8, 178, 5, 82, 4319, 4, 65, 15, 9225, 145, 143, 5122, 12, 7039, 537, 746, 537, 537, 15, 7979, 4, 18665, 594, 7, 5168, 94, 9096, 3987, 15242, 11, 28280, 4, 538, 7, 1795, 246, 56615, 9, 10161, 11, 635, 14, 9, 51, 408, 12, 94, 318, 1382, 12, 47, 6, 2683, 936, 5, 6307, 10197, 19, 49, 7, 4, 1885, 13699, 1118, 25, 80, 126, 842, 10, 10, 47289, 18223, 4726, 27, 4494, 11, 1550, 3633, 159, 27, 341, 29, 2733, 19, 4185, 173, 7, 90, 16376, 8, 30, 11, 4, 1784, 86, 1117, 8, 3261, 46, 11, 25837, 21, 29, 9, 2841, 23, 4, 1010, 26747, 793, 6, 13699, 1386, 1830, 10, 10, 246, 50, 9, 6, 2750, 1944, 746, 90, 29, 16376, 8, 124, 4, 882, 4, 882, 496, 27, 33029, 2213, 537, 121, 127, 1219, 130, 5, 29, 494, 8, 124, 4, 882, 496, 4, 341, 7, 27, 846, 10, 10, 29, 9, 1906, 8, 97, 6, 236, 11120, 1311, 8, 4, 23643, 7, 31, 7, 29851, 91, 22793, 3987, 70, 4, 882, 30, 579, 42, 9, 12, 32, 11, 537, 10, 10, 11, 14, 65, 44, 537, 75, 11876, 1775, 3353, 12716, 1846, 4, 11286, 7, 154, 5, 4, 518, 53, 13243, 11286, 7, 3211, 882, 11, 399, 38, 75, 257, 3807, 19, 18223, 17, 29, 456, 4, 65, 7, 
27, 205, 113, 10, 10, 33058, 4, 22793, 10359, 9, 242, 4, 91, 1202, 11377, 5, 2070, 307, 22, 7, 5168, 126, 93, 40, 18223, 13, 188, 1076, 3222, 19, 4, 13465, 7, 2348, 537, 23, 53, 537, 21, 82, 40, 18223, 13, 33195, 14, 280, 13, 219, 4, 52788, 431, 758, 859, 4, 953, 1052, 12283, 7, 5991, 5, 94, 40, 25, 238, 60, 35410, 4, 15812, 804, 27767, 7, 4, 9941, 132, 8, 67, 6, 22, 15, 9, 283, 8, 5168, 14, 31, 9, 242, 955, 48, 25, 279, 22148, 23, 12, 1685, 195, 25, 238, 60, 796, 13713, 4, 671, 7, 2804, 5, 4, 559, 154, 888, 7, 726, 50, 26, 49, 7008, 15, 566, 30, 579, 21, 64, 2574]),\n",
" list([1, 6740, 365, 1234, 5, 1156, 354, 11, 14, 5327, 6638, 7, 1016, 10626, 5940, 356, 44, 4, 1349, 500, 746, 5, 200, 4, 4132, 11, 16393, 9363, 1117, 1831, 7485, 5, 4831, 26, 6, 71690, 4183, 17, 369, 37, 215, 1345, 143, 32677, 5, 1838, 8, 1974, 15, 36, 119, 257, 85, 52, 486, 9, 6, 26441, 8564, 63, 271, 6, 196, 96, 949, 4121, 4, 74170, 7, 4, 2212, 2436, 819, 63, 47, 77, 7175, 180, 6, 227, 11, 94, 2494, 33740, 13, 423, 4, 168, 7, 4, 22, 5, 89, 665, 71, 270, 56, 5, 13, 197, 12, 161, 5390, 99, 76, 23, 77842, 7, 419, 665, 40, 91, 85, 108, 7, 4, 2084, 5, 4773, 81, 55, 52, 1901]),\n",
" list([1, 43, 188, 46, 5, 566, 264, 51, 6, 530, 664, 14, 9, 1713, 81, 25, 1135, 46, 7, 6, 20, 750, 11, 141, 4299, 5, 15455, 4441, 102, 28, 413, 38, 120, 5533, 15, 4, 3974, 7, 5369, 142, 371, 318, 5, 955, 1713, 571, 25242, 24762, 122, 14, 8, 72, 54, 12, 86, 385, 46, 5, 14, 20, 9, 399, 8, 72, 150, 13, 161, 124, 6, 155, 44, 14, 159, 170, 83, 12, 5, 51, 6, 866, 48, 25, 842, 4, 1120, 25, 238, 79, 4, 547, 15, 14, 9, 31, 7, 148, 16126, 102, 44, 35, 480, 3823, 2380, 19, 120, 4, 350, 228, 5, 269, 8, 28, 178, 1314, 2347, 7, 51, 6, 87, 65, 12, 9, 979, 21, 95, 24, 3186, 178, 11, 40732, 14, 9, 24, 15, 20, 4, 84, 376, 4, 65, 14, 127, 141, 6, 52, 292, 7, 4751, 175, 561, 7, 68, 3866, 137, 75, 2541, 68, 182, 5, 235, 175, 333, 19, 98, 50, 9, 38, 76, 724, 4, 6750, 15, 166, 285, 36, 140, 143, 38, 76, 53, 3094, 1301, 4, 6991, 16, 82, 6, 87, 3578, 44, 2527, 7612, 5, 800, 4, 3033, 11, 35, 1728, 96, 21, 14, 22, 9, 76, 53, 7, 6, 406, 65, 13, 43, 219, 12, 639, 21, 13, 80, 140, 5, 135, 15, 14, 9, 31, 7, 4, 118, 3672, 13, 28, 126, 110]),\n",
" list([1, 785, 189, 438, 47, 110, 142, 7, 6, 7475, 120, 4, 236, 378, 7, 153, 19, 87, 108, 141, 17, 1004, 5, 30432, 883, 10789, 23, 8, 4, 136, 13772, 11631, 4, 7475, 43, 1076, 21, 1407, 419, 5, 5202, 120, 91, 682, 189, 2818, 5, 9, 1348, 31, 7, 4, 118, 785, 189, 108, 126, 93, 13772, 16, 540, 324, 23, 6, 364, 352, 21, 14, 9, 93, 56, 18, 11, 230, 53, 771, 74, 31, 34, 4, 2834, 7, 4, 22, 5, 14, 11, 471, 9, 17547, 34, 4, 321, 487, 5, 116, 15, 6584, 4, 22, 9, 6, 2286, 4, 114, 2679, 23, 107, 293, 1008, 1172, 5, 328, 1236, 4, 1375, 109, 9, 6, 132, 773, 14799, 1412, 8, 1172, 18, 7865, 29, 9, 276, 11, 6, 2768, 19, 289, 409, 4, 5341, 2140, 20250, 648, 1430, 10136, 8914, 5, 27, 3000, 1432, 7130, 103, 6, 346, 137, 11, 4, 2768, 295, 36, 7740, 725, 6, 3208, 273, 11, 4, 1513, 15, 1367, 35, 154, 14040, 103, 19100, 173, 7, 12, 36, 515, 3547, 94, 2547, 1722, 5, 3547, 36, 203, 30, 502, 8, 361, 12, 8, 989, 143, 4, 1172, 3404, 10, 10, 328, 1236, 9, 6, 55, 221, 2989, 5, 146, 165, 179, 770, 15, 50, 713, 53, 108, 448, 23, 12, 17, 225, 38, 76, 4397, 18, 183, 8, 81, 19, 12, 45, 1257, 8, 135, 15, 13772, 166, 4, 118, 7, 45, 12831, 17, 466, 45, 24410, 4, 22, 115, 165, 764, 6075, 5, 1030, 8, 2973, 73, 469, 167, 2127, 18281, 1568, 6, 87, 841, 18, 4, 22, 4, 192, 15, 91, 7, 12, 304, 273, 1004, 4, 1375, 1172, 2768, 12356, 15, 4, 22, 764, 55, 5773, 5, 14, 4233, 7444, 4, 1375, 326, 7, 4, 4760, 1786, 8, 361, 1236, 8, 989, 46, 7, 4, 2768, 45, 55, 776, 8, 79, 496, 98, 45, 400, 301, 15, 4, 1859, 9, 4, 155, 15, 66, 21885, 84, 5, 14, 22, 1534, 15, 17, 4, 167, 12356, 15, 75, 70, 115, 66, 30, 252, 7, 618, 51, 9, 2161, 4, 3130, 5, 14, 1525, 8, 6584, 15, 13772, 165, 127, 1921, 8, 30, 179, 2532, 4, 22, 9, 906, 18, 6, 176, 7, 1007, 1005, 4, 1375, 114, 4, 105, 26, 32, 55, 221, 11, 68, 205, 96, 5, 4, 192, 15, 4, 274, 410, 220, 304, 23, 94, 205, 109, 9, 55, 73, 224, 259, 3786, 15, 4, 22, 528, 1645, 34, 4, 130, 528, 30, 685, 345, 17, 4, 277, 199, 166, 281, 5, 1030, 8, 30, 179, 4442, 444, 13772, 9, 6, 371, 87, 189, 22, 
5, 31, 7, 4, 118, 7, 4, 2068, 545, 1178, 829]),\n",
" 1, 1, 1, 1, 1], dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 129
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "6wInMoDrfxS_",
"colab_type": "code",
"outputId": "ef27f09d-c4e0-460f-b1c5-09b2aa863a03",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"toy_data_labels.T.shape"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(10, 2)"
]
},
"metadata": {
"tags": []
},
"execution_count": 130
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "__EhSa7Ef5iU",
"colab_type": "code",
"outputId": "349d7403-ca2c-405a-9fa4-313244bd621c",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 303
}
},
"source": [
"toy_data_labels.T"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[list([1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]),\n",
" list([1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 8163, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 6853, 5, 163, 11, 3215, 10156, 4, 1153, 9, 194, 775, 7, 8255, 11596, 349, 2637, 148, 605, 15358, 8003, 15, 123, 125, 68, 23141, 6853, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 36893, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 8255, 5, 25249, 656, 245, 2350, 5, 4, 9837, 131, 152, 491, 18, 46151, 32, 7464, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95])],\n",
" [list([1, 4, 18609, 16085, 33, 2804, 4, 2040, 432, 111, 153, 103, 4, 1494, 13, 70, 131, 67, 11, 61, 15305, 744, 35, 3715, 761, 61, 5766, 452, 9214, 4, 985, 7, 64317, 59, 166, 4, 105, 216, 1239, 41, 1797, 9, 15, 7, 35, 744, 2413, 31, 8, 4, 687, 23, 4, 33929, 7339, 6, 3693, 42, 38, 39, 121, 59, 456, 10, 10, 7, 265, 12, 575, 111, 153, 159, 59, 16, 1447, 21, 25, 586, 482, 39, 4, 96, 59, 716, 12, 4, 172, 65, 9, 579, 11, 6004, 4, 1615, 5, 23005, 7, 5168, 17, 13, 7064, 12, 19, 6, 464, 31, 314, 11, 87564, 6, 719, 605, 11, 8, 202, 27, 310, 4, 3772, 3501, 8, 2722, 58, 10, 10, 537, 2116, 180, 40, 14, 413, 173, 7, 263, 112, 37, 152, 377, 4, 537, 263, 846, 579, 178, 54, 75, 71, 476, 36, 413, 263, 2504, 182, 5, 17, 75, 2306, 922, 36, 279, 131, 2895, 17, 2867, 42, 17, 35, 921, 18435, 192, 5, 1219, 3890, 19, 20523, 217, 4122, 1710, 537, 20341, 1236, 5, 736, 10, 10, 61, 403, 9, 47289, 40, 61, 4494, 5, 27, 4494, 159, 90, 263, 2311, 4319, 309, 8, 178, 5, 82, 4319, 4, 65, 15, 9225, 145, 143, 5122, 12, 7039, 537, 746, 537, 537, 15, 7979, 4, 18665, 594, 7, 5168, 94, 9096, 3987, 15242, 11, 28280, 4, 538, 7, 1795, 246, 56615, 9, 10161, 11, 635, 14, 9, 51, 408, 12, 94, 318, 1382, 12, 47, 6, 2683, 936, 5, 6307, 10197, 19, 49, 7, 4, 1885, 13699, 1118, 25, 80, 126, 842, 10, 10, 47289, 18223, 4726, 27, 4494, 11, 1550, 3633, 159, 27, 341, 29, 2733, 19, 4185, 173, 7, 90, 16376, 8, 30, 11, 4, 1784, 86, 1117, 8, 3261, 46, 11, 25837, 21, 29, 9, 2841, 23, 4, 1010, 26747, 793, 6, 13699, 1386, 1830, 10, 10, 246, 50, 9, 6, 2750, 1944, 746, 90, 29, 16376, 8, 124, 4, 882, 4, 882, 496, 27, 33029, 2213, 537, 121, 127, 1219, 130, 5, 29, 494, 8, 124, 4, 882, 496, 4, 341, 7, 27, 846, 10, 10, 29, 9, 1906, 8, 97, 6, 236, 11120, 1311, 8, 4, 23643, 7, 31, 7, 29851, 91, 22793, 3987, 70, 4, 882, 30, 579, 42, 9, 12, 32, 11, 537, 10, 10, 11, 14, 65, 44, 537, 75, 11876, 1775, 3353, 12716, 1846, 4, 11286, 7, 154, 5, 4, 518, 53, 13243, 11286, 7, 3211, 882, 11, 399, 38, 75, 257, 3807, 19, 18223, 17, 29, 456, 4, 65, 7, 
27, 205, 113, 10, 10, 33058, 4, 22793, 10359, 9, 242, 4, 91, 1202, 11377, 5, 2070, 307, 22, 7, 5168, 126, 93, 40, 18223, 13, 188, 1076, 3222, 19, 4, 13465, 7, 2348, 537, 23, 53, 537, 21, 82, 40, 18223, 13, 33195, 14, 280, 13, 219, 4, 52788, 431, 758, 859, 4, 953, 1052, 12283, 7, 5991, 5, 94, 40, 25, 238, 60, 35410, 4, 15812, 804, 27767, 7, 4, 9941, 132, 8, 67, 6, 22, 15, 9, 283, 8, 5168, 14, 31, 9, 242, 955, 48, 25, 279, 22148, 23, 12, 1685, 195, 25, 238, 60, 796, 13713, 4, 671, 7, 2804, 5, 4, 559, 154, 888, 7, 726, 50, 26, 49, 7008, 15, 566, 30, 579, 21, 64, 2574]),\n",
" list([1, 14, 47, 8, 30, 31, 7, 4, 249, 108, 7, 4, 5974, 54, 61, 369, 13, 71, 149, 14, 22, 112, 4, 2401, 311, 12, 16, 3711, 33, 75, 43, 1829, 296, 4, 86, 320, 35, 534, 19, 263, 4821, 1301, 4, 1873, 33, 89, 78, 12, 66, 16, 4, 360, 7, 4, 58, 316, 334, 11, 4, 1716, 43, 645, 662, 8, 257, 85, 1200, 42, 1228, 2578, 83, 68, 3912, 15, 36, 165, 1539, 278, 36, 69, 44076, 780, 8, 106, 14, 6905, 1338, 18, 6, 22, 12, 215, 28, 610, 40, 6, 87, 326, 23, 2300, 21, 23, 22, 12, 272, 40, 57, 31, 11, 4, 22, 47, 6, 2307, 51, 9, 170, 23, 595, 116, 595, 1352, 13, 191, 79, 638, 89, 51428, 14, 9, 8, 106, 607, 624, 35, 534, 6, 227, 7, 129, 113])],\n",
" [list([1, 6740, 365, 1234, 5, 1156, 354, 11, 14, 5327, 6638, 7, 1016, 10626, 5940, 356, 44, 4, 1349, 500, 746, 5, 200, 4, 4132, 11, 16393, 9363, 1117, 1831, 7485, 5, 4831, 26, 6, 71690, 4183, 17, 369, 37, 215, 1345, 143, 32677, 5, 1838, 8, 1974, 15, 36, 119, 257, 85, 52, 486, 9, 6, 26441, 8564, 63, 271, 6, 196, 96, 949, 4121, 4, 74170, 7, 4, 2212, 2436, 819, 63, 47, 77, 7175, 180, 6, 227, 11, 94, 2494, 33740, 13, 423, 4, 168, 7, 4, 22, 5, 89, 665, 71, 270, 56, 5, 13, 197, 12, 161, 5390, 99, 76, 23, 77842, 7, 419, 665, 40, 91, 85, 108, 7, 4, 2084, 5, 4773, 81, 55, 52, 1901]),\n",
" list([1, 249, 1323, 7, 61, 113, 10, 10, 13, 1637, 14, 20, 56, 33, 2401, 18, 457, 88, 13, 2626, 1400, 45, 3171, 13, 70, 79, 49, 706, 919, 13, 16, 355, 340, 355, 1696, 96, 143, 4, 22, 32, 289, 7, 61, 369, 71, 2359, 5, 13, 16, 131, 2073, 249, 114, 249, 229, 249, 20, 13, 28, 126, 110, 13, 473, 8, 569, 61, 419, 56, 429, 6, 1513, 18, 35, 534, 95, 474, 570, 5, 25, 124, 138, 88, 12, 421, 1543, 52, 725, 6397, 61, 419, 11, 13, 1571, 15, 1543, 20, 11, 4, 22016, 5, 296, 12, 3524, 5, 15, 421, 128, 74, 233, 334, 207, 126, 224, 12, 562, 298, 2167, 1272, 7, 2601, 5, 516, 988, 43, 8, 79, 120, 15, 595, 13, 784, 25, 3171, 18, 165, 170, 143, 19, 14, 5, 7224, 6, 226, 251, 7, 61, 113])],\n",
" [list([1, 43, 188, 46, 5, 566, 264, 51, 6, 530, 664, 14, 9, 1713, 81, 25, 1135, 46, 7, 6, 20, 750, 11, 141, 4299, 5, 15455, 4441, 102, 28, 413, 38, 120, 5533, 15, 4, 3974, 7, 5369, 142, 371, 318, 5, 955, 1713, 571, 25242, 24762, 122, 14, 8, 72, 54, 12, 86, 385, 46, 5, 14, 20, 9, 399, 8, 72, 150, 13, 161, 124, 6, 155, 44, 14, 159, 170, 83, 12, 5, 51, 6, 866, 48, 25, 842, 4, 1120, 25, 238, 79, 4, 547, 15, 14, 9, 31, 7, 148, 16126, 102, 44, 35, 480, 3823, 2380, 19, 120, 4, 350, 228, 5, 269, 8, 28, 178, 1314, 2347, 7, 51, 6, 87, 65, 12, 9, 979, 21, 95, 24, 3186, 178, 11, 40732, 14, 9, 24, 15, 20, 4, 84, 376, 4, 65, 14, 127, 141, 6, 52, 292, 7, 4751, 175, 561, 7, 68, 3866, 137, 75, 2541, 68, 182, 5, 235, 175, 333, 19, 98, 50, 9, 38, 76, 724, 4, 6750, 15, 166, 285, 36, 140, 143, 38, 76, 53, 3094, 1301, 4, 6991, 16, 82, 6, 87, 3578, 44, 2527, 7612, 5, 800, 4, 3033, 11, 35, 1728, 96, 21, 14, 22, 9, 76, 53, 7, 6, 406, 65, 13, 43, 219, 12, 639, 21, 13, 80, 140, 5, 135, 15, 14, 9, 31, 7, 4, 118, 3672, 13, 28, 126, 110]),\n",
" list([1, 778, 128, 74, 12, 630, 163, 15, 4, 1766, 7982, 1051, 43222, 32, 85, 156, 45, 40, 148, 139, 121, 664, 665, 10, 10, 1361, 173, 4, 749, 86588, 16, 3804, 8, 4, 226, 65, 12, 43, 127, 24, 15344, 10, 10])],\n",
" [list([1, 785, 189, 438, 47, 110, 142, 7, 6, 7475, 120, 4, 236, 378, 7, 153, 19, 87, 108, 141, 17, 1004, 5, 30432, 883, 10789, 23, 8, 4, 136, 13772, 11631, 4, 7475, 43, 1076, 21, 1407, 419, 5, 5202, 120, 91, 682, 189, 2818, 5, 9, 1348, 31, 7, 4, 118, 785, 189, 108, 126, 93, 13772, 16, 540, 324, 23, 6, 364, 352, 21, 14, 9, 93, 56, 18, 11, 230, 53, 771, 74, 31, 34, 4, 2834, 7, 4, 22, 5, 14, 11, 471, 9, 17547, 34, 4, 321, 487, 5, 116, 15, 6584, 4, 22, 9, 6, 2286, 4, 114, 2679, 23, 107, 293, 1008, 1172, 5, 328, 1236, 4, 1375, 109, 9, 6, 132, 773, 14799, 1412, 8, 1172, 18, 7865, 29, 9, 276, 11, 6, 2768, 19, 289, 409, 4, 5341, 2140, 20250, 648, 1430, 10136, 8914, 5, 27, 3000, 1432, 7130, 103, 6, 346, 137, 11, 4, 2768, 295, 36, 7740, 725, 6, 3208, 273, 11, 4, 1513, 15, 1367, 35, 154, 14040, 103, 19100, 173, 7, 12, 36, 515, 3547, 94, 2547, 1722, 5, 3547, 36, 203, 30, 502, 8, 361, 12, 8, 989, 143, 4, 1172, 3404, 10, 10, 328, 1236, 9, 6, 55, 221, 2989, 5, 146, 165, 179, 770, 15, 50, 713, 53, 108, 448, 23, 12, 17, 225, 38, 76, 4397, 18, 183, 8, 81, 19, 12, 45, 1257, 8, 135, 15, 13772, 166, 4, 118, 7, 45, 12831, 17, 466, 45, 24410, 4, 22, 115, 165, 764, 6075, 5, 1030, 8, 2973, 73, 469, 167, 2127, 18281, 1568, 6, 87, 841, 18, 4, 22, 4, 192, 15, 91, 7, 12, 304, 273, 1004, 4, 1375, 1172, 2768, 12356, 15, 4, 22, 764, 55, 5773, 5, 14, 4233, 7444, 4, 1375, 326, 7, 4, 4760, 1786, 8, 361, 1236, 8, 989, 46, 7, 4, 2768, 45, 55, 776, 8, 79, 496, 98, 45, 400, 301, 15, 4, 1859, 9, 4, 155, 15, 66, 21885, 84, 5, 14, 22, 1534, 15, 17, 4, 167, 12356, 15, 75, 70, 115, 66, 30, 252, 7, 618, 51, 9, 2161, 4, 3130, 5, 14, 1525, 8, 6584, 15, 13772, 165, 127, 1921, 8, 30, 179, 2532, 4, 22, 9, 906, 18, 6, 176, 7, 1007, 1005, 4, 1375, 114, 4, 105, 26, 32, 55, 221, 11, 68, 205, 96, 5, 4, 192, 15, 4, 274, 410, 220, 304, 23, 94, 205, 109, 9, 55, 73, 224, 259, 3786, 15, 4, 22, 528, 1645, 34, 4, 130, 528, 30, 685, 345, 17, 4, 277, 199, 166, 281, 5, 1030, 8, 30, 179, 4442, 444, 13772, 9, 6, 371, 87, 189, 
22, 5, 31, 7, 4, 118, 7, 4, 2068, 545, 1178, 829]),\n",
" list([1, 4, 14906, 716, 4, 65, 7, 4, 689, 4367, 6308, 2343, 4804, 28674, 84206, 5270, 32099, 2315, 71688, 12572, 24785, 43394, 4, 10993, 628, 7685, 37, 9, 150, 4, 9820, 4069, 11, 2909, 4, 16287, 847, 313, 6, 176, 63860, 9, 6202, 138, 9, 4434, 19, 4, 96, 183, 26, 4, 192, 15, 27, 5842, 799, 7101, 39455, 588, 84, 11, 4, 3231, 152, 339, 5206, 42, 4869, 30497, 6293, 345, 4804, 37377, 142, 43, 218, 208, 54, 29, 853, 659, 46, 4, 882, 183, 80, 115, 30, 4, 172, 174, 10, 10, 1001, 398, 1001, 1055, 526, 34, 3717, 68395, 5262, 63370, 17, 4, 6706, 1094, 871, 64, 85, 22, 2030, 1109, 38, 230, 9, 4, 4324, 20636, 251, 5056, 1034, 195, 301, 14, 16, 31, 7, 4, 46035, 8, 783, 48545, 33, 4, 2945, 103, 465, 16454, 42, 845, 45, 446, 11, 1895, 19, 184, 76, 32, 4, 5310, 207, 110, 13, 197, 4, 14906, 16, 601, 964, 2152, 595, 13, 258, 4, 1730, 66, 338, 55, 5312, 4, 550, 728, 65, 1196, 8, 1839, 61, 1546, 42, 8361, 61, 602, 120, 45, 7304, 6, 320, 786, 99, 196, 11100, 786, 5936, 4, 225, 4, 373, 1009, 33, 4, 130, 63, 69, 72, 1104, 46, 1292, 225, 14, 66, 194, 11871, 1703, 56, 8, 803, 1004, 6, 18763, 155, 11, 4, 14906, 3231, 45, 853, 2029, 8, 30, 6, 117, 430, 19, 6, 8941, 9, 15, 66, 424, 8, 2337, 178, 9, 15, 66, 424, 8, 1465, 178, 9, 15, 66, 142, 15, 9, 424, 8, 28, 178, 662, 44, 12, 17, 4, 130, 898, 1686, 9, 6, 5623, 267, 185, 430, 4, 118, 21486, 277, 15, 4, 1188, 100, 216, 56, 19, 4, 357, 114, 10399, 367, 45, 115, 93, 788, 121, 4, 14906, 79, 32, 68, 278, 39, 8, 818, 162, 4165, 237, 600, 7, 98, 306, 8, 157, 549, 628, 11, 6, 12370, 13, 824, 15, 4104, 76, 42, 138, 36, 774, 77, 1059, 159, 150, 4, 229, 497, 8, 1493, 11, 175, 251, 453, 19, 8651, 189, 12, 43, 127, 6, 394, 292, 7, 8253, 4, 107, 8, 4, 2826, 15, 1082, 1251, 9, 906, 42, 1134, 6, 66, 78, 22, 15, 13, 244, 2519, 8, 135, 233, 52, 44, 10, 10, 466, 112, 398, 526, 34, 4, 1572, 4413, 6706, 1094, 225, 57, 599, 133, 225, 6, 227, 7, 541, 4323, 6, 171, 139, 7, 539, 11890, 56, 11, 6, 3231, 21, 164, 25, 426, 81, 33, 344, 624, 19, 6, 4617, 7, 10373, 
12958, 6, 5802, 4, 22, 9, 1082, 629, 237, 45, 188, 6, 55, 655, 707, 6371, 956, 225, 1456, 841, 42, 1310, 225, 6, 2493, 1467, 7722, 2828, 21, 4, 14906, 9, 364, 23, 4, 2228, 2407, 225, 24, 76, 133, 18, 4, 189, 2293, 10, 10, 814, 11, 53728, 11, 2642, 14, 47, 15, 682, 364, 352, 168, 44, 12, 45, 24, 913, 93, 21, 247, 2441, 4, 116, 34, 35, 1859, 8, 72, 177, 9, 164, 8, 901, 344, 44, 13, 191, 135, 13, 126, 421, 233, 18, 259, 10, 10, 4, 14906, 6847, 4, 14065, 3074, 7, 112, 199, 753, 357, 39, 63, 12, 115, 15222, 763, 8, 15, 35, 3282, 1523, 65, 57, 599, 6, 1916, 277, 1730, 37, 25, 92, 202, 6, 8848, 44, 25, 28, 6, 22, 15, 122, 24, 4171, 72, 33, 32])],\n",
" [1, 0],\n",
" [1, 0],\n",
" [1, 0],\n",
" [1, 0],\n",
" [1, 0]], dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 131
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "zisY1tlYf-0e",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": 0,
"outputs": []
}
]
}