@ravigurnatham
Last active December 30, 2020 03:16
EDA.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "EDA.ipynb",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/ravigurnatham/6d38f42122d86661fb6605c62423483f/eda.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cW_MVIlps5WQ"
},
"source": [
"<h1>3. Exploratory Data Analysis </h1>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "sNzZdmBJs5WS",
"outputId": "0e1df4ed-4a74-4b0e-e84e-1b3862bbf55d"
},
"source": [
"# Data handling and static plotting\n",
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"from subprocess import check_output\n",
"%matplotlib inline\n",
"# Interactive plots rendered offline inside the notebook\n",
"import plotly.offline as py\n",
"py.init_notebook_mode(connected=True)\n",
"import plotly.graph_objs as go\n",
"import plotly.tools as tls\n",
"import os\n",
"import gc\n",
"\n",
"# Text preprocessing\n",
"import re\n",
"from nltk.corpus import stopwords\n",
"import distance  # third-party 'Distance' package (string distance measures)\n",
"from nltk.stem import PorterStemmer\n",
"from bs4 import BeautifulSoup"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<script>requirejs.config({paths: { 'plotly': ['https://cdn.plot.ly/plotly-latest.min']},});if(!window.Plotly) {{require(['plotly'],function(plotly) {window.Plotly=plotly;});}}</script>"
],
"text/vnd.plotly.v1+html": "<script>requirejs.config({paths: { 'plotly': ['https://cdn.plot.ly/plotly-latest.min']},});if(!window.Plotly) {{require(['plotly'],function(plotly) {window.Plotly=plotly;});}}</script>"
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "__T8jddGs5Wc"
},
"source": [
"<h2> 3.1 Reading data and basic stats </h2>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ifM_s9rvs5Wd",
"outputId": "2e17a7bc-9a5b-4c43-d35b-081cc9f92528"
},
"source": [
"df = pd.read_csv(\"train.csv\")\n",
"\n",
"print(\"Number of data points:\",df.shape[0])"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Number of data points: 404290\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "34zXGW8gs5Wj",
"outputId": "ab7d570a-9eeb-477a-b7cb-663ff6fd04fa"
},
"source": [
"df.head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>qid1</th>\n",
" <th>qid2</th>\n",
" <th>question1</th>\n",
" <th>question2</th>\n",
" <th>is_duplicate</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>What is the step by step guide to invest in sh...</td>\n",
" <td>What is the step by step guide to invest in sh...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>What is the story of Kohinoor (Koh-i-Noor) Dia...</td>\n",
" <td>What would happen if the Indian government sto...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" <td>How can I increase the speed of my internet co...</td>\n",
" <td>How can Internet speed be increased by hacking...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>7</td>\n",
" <td>8</td>\n",
" <td>Why am I mentally very lonely? How can I solve...</td>\n",
" <td>Find the remainder when [math]23^{24}[/math] i...</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>9</td>\n",
" <td>10</td>\n",
" <td>Which one dissolve in water quikly sugar, salt...</td>\n",
" <td>Which fish would survive in salt water?</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id qid1 qid2 question1 \\\n",
"0 0 1 2 What is the step by step guide to invest in sh... \n",
"1 1 3 4 What is the story of Kohinoor (Koh-i-Noor) Dia... \n",
"2 2 5 6 How can I increase the speed of my internet co... \n",
"3 3 7 8 Why am I mentally very lonely? How can I solve... \n",
"4 4 9 10 Which one dissolve in water quikly sugar, salt... \n",
"\n",
" question2 is_duplicate \n",
"0 What is the step by step guide to invest in sh... 0 \n",
"1 What would happen if the Indian government sto... 0 \n",
"2 How can Internet speed be increased by hacking... 0 \n",
"3 Find the remainder when [math]23^{24}[/math] i... 0 \n",
"4 Which fish would survive in salt water? 0 "
]
},
"metadata": {
"tags": []
},
"execution_count": 8
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "mx4DFwMns5Wp",
"outputId": "1141e0bb-2750-489e-8b8c-2ba680f7416c"
},
"source": [
"df.info()"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 404290 entries, 0 to 404289\n",
"Data columns (total 6 columns):\n",
"id 404290 non-null int64\n",
"qid1 404290 non-null int64\n",
"qid2 404290 non-null int64\n",
"question1 404290 non-null object\n",
"question2 404288 non-null object\n",
"is_duplicate 404290 non-null int64\n",
"dtypes: int64(4), object(2)\n",
"memory usage: 18.5+ MB\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HHHTGTzws5Ww"
},
"source": [
"The dataset contains only a few fields:\n",
"\n",
"- id: A simple row ID\n",
"- qid{1, 2}: The unique ID of each question in the pair\n",
"- question{1, 2}: The actual text of each question\n",
"- is_duplicate: The label we are trying to predict, i.e. whether the two questions are duplicates of each other."
]
},
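{
"cell_type": "markdown",
"metadata": {},
"source": [
"`df.info()` shows 404288 non-null values for `question2` against 404290 rows, so two questions are missing. Below is a minimal sketch of one common way to inspect and fill such gaps. It runs on a small hypothetical frame (`toy`) rather than on `train.csv`, and it assumes an empty string is an acceptable stand-in for missing text; that is an illustrative choice, not necessarily how this notebook treats those two rows."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical frame with the same text columns as train.csv\n",
"toy = pd.DataFrame({\n",
"    'question1': ['How do I learn Python?', 'What is AI?', None],\n",
"    'question2': ['How can I learn Python?', np.nan, 'Is AI dangerous?'],\n",
"})\n",
"\n",
"# Missing values per column (df.info() reports non-null counts instead)\n",
"print(toy.isnull().sum())\n",
"\n",
"# Rows containing at least one missing question\n",
"print(toy[toy.isnull().any(axis=1)])\n",
"\n",
"# Assumption: treat missing text as an empty string so per-row string processing never sees a float NaN\n",
"toy = toy.fillna('')\n",
"print(toy.isnull().sum().sum())"
],
"execution_count": null,
"outputs": []
},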
{
"cell_type": "markdown",
"metadata": {
"id": "ZulqVzTDs5Wx"
},
"source": [
"<h3> 3.2.1 Distribution of data points among output classes</h3>\n",
"- Number of duplicate (similar) and non-duplicate (non-similar) question pairs"
]
},
{
"cell_type": "code",
"metadata": {
"id": "YHp64yNjs5Wx",
"outputId": "361ddf04-d545-45f9-dbe2-8bebd695e8da"
},
"source": [
"df.groupby(\"is_duplicate\")['id'].count().plot.bar()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x22b00727d30>"
]
},
"metadata": {
"tags": []
},
"execution_count": 10
},
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAEHCAYAAABSjBpvAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEi9JREFUeJzt3X+s3XV9x/Hny1acDhWUSliLlmiXiSyiNkB0P5gsUHBZ\ncYMMtkiHzeoMJJroJposOJUEs6gZm7JgqBTjRIY6Gq3WDnHOyI9epAIVtTeIUkugWkQcUQe+98f5\nXD1cTu/99N7Kqd7nI/nmfM/78+P7OUnbV74/zmmqCkmSejxp3AuQJP3qMDQkSd0MDUlSN0NDktTN\n0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHVbPO4F7G+HHXZYLV++fNzLkKRfKbfccsv3qmrJbP1+7UJj\n+fLlTExMjHsZkvQrJcm3e/p5eUqS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUrdf\nuy/3/apYfsGnx72EXyt3X/yqcS9BWhBmPdNIcmSS65PcmWR7kje0+tuTfDfJtradNjTmrUkmk3wj\nySlD9VWtNpnkgqH6UUluSrIjyceSHNTqT2nvJ1v78v354SVJ+6bn8tQjwJuq6oXACcB5SY5ube+r\nqmPbtgmgtZ0FvAhYBXwgyaIki4D3A6cCRwNnD83z7jbXCuABYG2rrwUeqKoXAO9r/SRJYzJraFTV\nvVX1lbb/EHAnsHSGIauBq6rqJ1X1LWASOK5tk1V1V1X9FLgKWJ0kwCuBa9r4DcDpQ3NtaPvXACe1\n/pKkMdinG+Ht8tBLgJta6fwktyVZn+TQVlsK3DM0bGer7a3+bOAHVfXItPpj5mrtD7b+kqQx6A6N\nJAcDHwfeWFU/BC4Fng8cC9wLvGeq64jhNYf6THNNX9u6JBNJJnbv3j3j55AkzV1XaCR5MoPA+EhV\nfQKgqu6rqker6mfABxlcfoLBmcKRQ8OXAbtmqH8POCTJ4mn1x8zV2p8J7Jm+vqq6rKpWVtXKJUtm\n/Tl4SdIc9Tw9FeBy4M6qeu9Q/Yihbq8G7mj7G4Gz2pNPRwErgJuBrcCK9qTUQQxulm+sqgKuB85o\n49cA1w7NtabtnwF8vvWXJI1Bz/c0XgG8Brg9ybZWexuDp5+OZXC56G7gdQBVtT3J1cDXGDx5dV5V\nPQqQ5HxgM7AIWF9V29t8bwGuSvIu4FYGIUV7/XCSSQZnGGfN47NKkuZp1tCoqi8x+t7CphnGXARc\nNKK+adS4qrqLX1zeGq7/GDhztjVKkp4Y/oyIJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiS\nuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiS\nuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6GhiSpm6EhSepmaEiS\nuhkakqRus4ZGkiOTXJ/kziTbk7yh1Z+VZEuSHe310FZPkkuSTCa5LclLh+Za0/rvSLJmqP6yJLe3\nMZckyUzHkCSNR8+ZxiPAm6rqhcAJwHlJjgYuAK6rqhXAde09wKnAiratAy6FQQAAFwLHA8cBFw6F\nwKWt79S4Va2+t2NIksZg1tCoqnur6itt/yHgTmApsBrY0LptAE5v+6uBK2vgRuCQJEcApwBbqmpP\nVT0AbAFWtbZnVNUNVVXAldPmGnUMSdIY7NM9jSTLgZcANwGHV9W9MAgW4Dmt21LgnqFhO1ttpvrO\nEXVmOMb0da1LMpFkYvfu3fvykSRJ+6A7NJIcDHwceGNV/XCmriNqNYd6t6q6rKpWVtXKJUuW7MtQ\nSdI+6AqNJE9mEBgfqapPtPJ97dIS7fX+Vt8JHDk0fBmwa5b6shH1mY4hSRqDnqenAlwO3FlV7x1q\n2ghMPQG1Brh2qH5Oe4rqBODBdmlpM3BykkPb
DfCTgc2t7aEkJ7RjnTNtrlHHkCSNweKOPq8AXgPc\nnmRbq70NuBi4Osla4DvAma1tE3AaMAk8DJwLUFV7krwT2Nr6vaOq9rT91wNXAE8FPtM2ZjiGJGkM\nZg2NqvoSo+87AJw0on8B5+1lrvXA+hH1CeCYEfXvjzqGJGk8/Ea4JKmboSFJ6mZoSJK6GRqSpG6G\nhiSpm6EhSepmaEiSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6G\nhiSpm6EhSepmaEiSuhkakqRuhoYkqZuhIUnqZmhIkroZGpKkboaGJKmboSFJ6mZoSJK6GRqSpG6G\nhiSpm6EhSepmaEiSus0aGknWJ7k/yR1Dtbcn+W6SbW07bajtrUkmk3wjySlD9VWtNpnkgqH6UUlu\nSrIjyceSHNTqT2nvJ1v78v31oSVJc9NzpnEFsGpE/X1VdWzbNgEkORo4C3hRG/OBJIuSLALeD5wK\nHA2c3foCvLvNtQJ4AFjb6muBB6rqBcD7Wj9J0hjNGhpV9UVgT+d8q4GrquonVfUtYBI4rm2TVXVX\nVf0UuApYnSTAK4Fr2vgNwOlDc21o+9cAJ7X+kqQxmc89jfOT3NYuXx3aakuBe4b67Gy1vdWfDfyg\nqh6ZVn/MXK39wdZfkjQmi+c47lLgnUC11/cArwVGnQkUo8OpZujPLG2PkWQdsA7guc997kzrljSL\n5Rd8etxL+LVy98WvGvcS9qs5nWlU1X1V9WhV/Qz4IIPLTzA4UzhyqOsyYNcM9e8BhyRZPK3+mLla\n+zPZy2WyqrqsqlZW1colS5bM5SNJkjrMKTSSHDH09tXA1JNVG4Gz2pNPRwErgJuBrcCK9qTUQQxu\nlm+sqgKuB85o49cA1w7NtabtnwF8vvWXJI3JrJenknwUOBE4LMlO4ELgxCTHMrhcdDfwOoCq2p7k\nauBrwCPAeVX1aJvnfGAzsAhYX1Xb2yHeAlyV5F3ArcDlrX458OEkkwzOMM6a96eVJM3LrKFRVWeP\nKF8+ojbV/yLgohH1TcCmEfW7+MXlreH6j4EzZ1ufJOmJ4zfCJUndDA1JUjdDQ5LUzdCQJHUzNCRJ\n3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ\n3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ\n3QwNSVI3Q0OS1M3QkCR1mzU0kqxPcn+SO4Zqz0qyJcmO9npoqyfJJUkmk9yW5KVDY9a0/juSrBmq\nvyzJ7W3MJUky0zEkSePTc6ZxBbBqWu0C4LqqWgFc194DnAqsaNs64FIYBABwIXA8cBxw4VAIXNr6\nTo1bNcsxJEljMmtoVNUXgT3TyquBDW1/A3D6UP3KGrgROCTJEcApwJaq2lNVDwBbgFWt7RlVdUNV\nFXDltLlGHUOSNCZzvadxeFXdC9Ben9PqS4F7hvrtbLWZ6jtH1Gc6xuMkWZdkIsnE7t275/iRJEmz\n2d83wjOiVnOo75OquqyqVlbVyiVLluzrcElSp7mGxn3t0hLt9f5W3wkcOdRvGbBrlvqyEfWZjiFJ\nGpO5hsZGYOoJqDXAtUP1c9pTVCcAD7ZLS5uBk5Mc2m6Anwxsbm0PJTmhPTV1zrS5Rh1DkjQmi2fr\nkOSjwInAYUl2MngK6mLg6iRrge8AZ7bum4DTgEngYeBcgKrak+SdwNbW7x1VNXVz/fUMntB6KvCZ\ntjHDMSRJYzJraFTV2XtpOmlE3wLO28s864H1I+oTwDEj6t8fdQxJ0vj4jXBJUjdDQ5LUzdCQJHUz\nNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUz\nNCRJ3QwN
SVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAkdTM0JEndDA1JUjdDQ5LUzdCQJHUz\nNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd3mFRpJ7k5ye5JtSSZa7VlJtiTZ0V4PbfUkuSTJZJLbkrx0\naJ41rf+OJGuG6i9r80+2sZnPeiVJ87M/zjT+qKqOraqV7f0FwHVVtQK4rr0HOBVY0bZ1wKUwCBng\nQuB44DjgwqmgaX3WDY1btR/WK0mao1/G5anVwIa2vwE4fah+ZQ3cCByS5AjgFGBLVe2pqgeALcCq\n1vaMqrqhqgq4cmguSdIYzDc0CvhckluSrGu1w6vqXoD2+pxWXwrcMzR2Z6vNVN85ov44SdYlmUgy\nsXv37nl+JEnS3iye5/hXVNWuJM8BtiT5+gx9R92PqDnUH1+sugy4DGDlypUj+0iS5m9eZxpVtau9\n3g98ksE9ifvapSXa6/2t+07gyKHhy4Bds9SXjahLksZkzqGR5DeTPH1qHzgZuAPYCEw9AbUGuLbt\nbwTOaU9RnQA82C5fbQZOTnJouwF+MrC5tT2U5IT21NQ5Q3NJksZgPpenDgc+2Z6CXQz8e1V9NslW\n4Ooka4HvAGe2/puA04BJ4GHgXICq2pPkncDW1u8dVbWn7b8euAJ4KvCZtkmSxmTOoVFVdwEvHlH/\nPnDSiHoB5+1lrvXA+hH1CeCYua5RkrR/+Y1wSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAk\ndTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAk\ndTM0JEndDA1JUjdDQ5LUzdCQJHUzNCRJ3QwNSVI3Q0OS1M3QkCR1MzQkSd0MDUlSN0NDktTN0JAk\ndTvgQyPJqiTfSDKZ5IJxr0eSFrIDOjSSLALeD5wKHA2cneTo8a5KkhauAzo0gOOAyaq6q6p+ClwF\nrB7zmiRpwTrQQ2MpcM/Q+52tJkkag8XjXsAsMqJWj+uUrAPWtbc/SvKNX+qqFpbDgO+NexGzybvH\nvQKNgX8296/n9XQ60ENjJ3Dk0PtlwK7pnarqMuCyJ2pRC0mSiapaOe51SNP5Z3M8DvTLU1uBFUmO\nSnIQcBawccxrkqQF64A+06iqR5KcD2wGFgHrq2r7mJclSQvWAR0aAFW1Cdg07nUsYF7204HKP5tj\nkKrH3VeWJGmkA/2ehiTpAGJoSJK6HfD3NPTESfI7DL5xv5TB92F2ARur6s6xLkzSAcMzDQGQ5C0M\nfqYlwM0MHncO8FF/KFLSFG+EC4Ak3wReVFX/N61+ELC9qlaMZ2XSzJKcW1UfGvc6FgrPNDTlZ8Bv\njagf0dqkA9U/jnsBC4n3NDTljcB1SXbwix+JfC7wAuD8sa1KApLctrcm4PAnci0LnZen9HNJnsTg\n5+iXMvjLuBPYWlWPjnVhWvCS3AecAjwwvQn4clWNOkvWL4FnGvq5qvoZcOO41yGN8Cng4KraNr0h\nyRee+OUsXJ5pSJK6eSNcktTN0JAkdTM0tGAl+fI8x/91kn+dx/i7kxw2n7UkOT3J0XNdg7SvDA0t\nWFX18nGvYco81nI6YGjoCWNoaMFK8qP2ekSSLybZluSOJL8/w5hzk3wzyX8DrxiqX5HkjBFzn9jm\n/mSSryX5t/Zo88i1tP2/T3J7kq8mubjV/ibJ1lb7eJKnJXk58KfAP7W1P79tn01yS5L/ab8nJu03\nPnIrwV8Cm6vqoiSLgKeN6pTkCAbfPn4Z8CBwPXBrx/zHMTgb+DbwWeDPgGv2coxTGZw9HF9VDyd5\nVmv6RFV9sPV5F7C2qv4lyUbgU1V1TWu7DvjbqtqR5HjgA8ArO9YodTE0pMGPM65P8mTgP0d9F6A5\nHvhCVe0GSPIx4Lc75r+5qu5qYz4K/B57CQ3gj4EPVdXDAFW1p9WPaWFxCH
Awg/8C+TGSHAy8HPiP\nJFPlp3SsT+rm5SkteFX1ReAPgO8CH05yzkzd91J/hPb3KYN/sQ+aYcxMX47KXtqvAM6vqt9lcLbz\nGyP6PAn4QVUdO7S9cIZjSfvM0NCCl+R5wP3t8s/lwEv30vUm4MQkz25nJWcOtd3N4LIVDP5PkicP\ntR2X5Kh2L+MvgC/NsJzPAa9N8rS2tqnLU08H7m3H/auh/g+1Nqrqh8C3kpzZxibJi2c4lrTPDA0J\nTgS2JbkV+HPgn0d1qqp7gbcDNwD/BXxlqPmDwB8muZnBZaz/HWq7AbgYuAP4FvDJvS2kqj4LbAQm\nkmwD3tya/oFBaG0Bvj405Crg75LcmuT5DAJlbZKvAtsZBJi03/gzItIvUZITgTdX1Z+Mey3S/uCZ\nhiSpm2ca0ghJbuLxTx69pqpuH8d6pAOFoSFJ6ublKUlSN0NDktTN0JAkdTM0JEndDA1JUrf/Bxeh\nKdRMn+OVAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x22b7737e198>"
]
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "-usI2K2bs5W4",
"outputId": "ff0a6a8b-65ad-487a-d5ec-df3c223ba620"
},
"source": [
"print('~> Total number of question pairs for training:\\n {}'.format(len(df)))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"~> Total number of question pairs for training:\n",
" 404290\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "YiPia6Pjs5W_",
"outputId": "3cde4cec-4314-4c14-e807-b35e969bf9e8"
},
"source": [
"print('~> Question pairs are not Similar (is_duplicate = 0):\\n {}%'.format(100 - round(df['is_duplicate'].mean()*100, 2)))\n",
"print('\\n~> Question pairs are Similar (is_duplicate = 1):\\n {}%'.format(round(df['is_duplicate'].mean()*100, 2)))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"~> Question pairs are not Similar (is_duplicate = 0):\n",
" 63.08%\n",
"\n",
"~> Question pairs are Similar (is_duplicate = 1):\n",
" 36.92%\n"
],
"name": "stdout"
}
]
},
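{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `is_duplicate` takes only the values 0 and 1, its mean is the fraction of duplicate pairs, which is what the percentages above rely on. A sketch of an equivalent, more explicit computation with `value_counts(normalize=True)`, using a small hypothetical label series; on the real data you would pass `df['is_duplicate']` instead."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical 0/1 labels standing in for df['is_duplicate']\n",
"labels = pd.Series([0, 1, 0, 0, 1, 0, 1, 0], name='is_duplicate')\n",
"\n",
"# Share of each class in percent, rounded to two decimals\n",
"class_pct = labels.value_counts(normalize=True).mul(100).round(2)\n",
"print(class_pct)\n",
"\n",
"# For a 0/1 label the mean equals the share of 1s\n",
"print(round(labels.mean() * 100, 2))"
],
"execution_count": null,
"outputs": []
},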
{
"cell_type": "markdown",
"metadata": {
"id": "wGX03QVRs5XF"
},
"source": [
"<h3> 3.2.2 Number of unique questions </h3>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "VOKa6aU2s5XG",
"outputId": "8f644b1d-27c0-4d63-84e2-bb2a42419be2"
},
"source": [
"qids = pd.Series(df['qid1'].tolist() + df['qid2'].tolist())\n",
"unique_qs = len(np.unique(qids))\n",
"qs_morethan_onetime = np.sum(qids.value_counts() > 1)\n",
"print('Total number of Unique Questions are: {}\\n'.format(unique_qs))\n",
"\n",
"print('Number of unique questions that appear more than one time: {} ({}%)\\n'.format(qs_morethan_onetime, qs_morethan_onetime/unique_qs*100))\n",
"\n",
"print('Max number of times a single question is repeated: {}\\n'.format(max(qids.value_counts())))\n",
"\n",
"# Occurrence count of every question id, as a NumPy array\n",
"q_vals = qids.value_counts().values"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Total num of Unique Questions are: 537933\n",
"\n",
"Number of unique questions that appear more than one time: 111780 (20.77953945937505%)\n",
"\n",
"Max number of times a single question is repeated: 157\n",
"\n"
],
"name": "stdout"
}
]
},
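{
"cell_type": "markdown",
"metadata": {},
"source": [
"The statistics above call `qids.value_counts()` several times. A sketch that computes the counts once and derives all three numbers from them, on a small hypothetical set of question ids. The `toy_` and `n_` names are illustrative and chosen so they do not overwrite `qids` or `unique_qs`, which later cells use; `pd.concat` replaces the list concatenation purely as a stylistic choice."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical ids for four question pairs; question 1 appears in three pairs\n",
"toy = pd.DataFrame({'qid1': [1, 3, 5, 1], 'qid2': [2, 4, 1, 6]})\n",
"\n",
"# Stack both id columns so every occurrence of a question is counted\n",
"toy_qids = pd.concat([toy['qid1'], toy['qid2']], ignore_index=True)\n",
"toy_counts = toy_qids.value_counts()\n",
"\n",
"n_unique = toy_counts.size  # number of distinct questions\n",
"n_repeated = int((toy_counts > 1).sum())  # questions seen in more than one pair\n",
"print(n_unique, n_repeated, toy_counts.max())\n",
"print('{:.2f}% of unique questions appear more than once'.format(n_repeated / n_unique * 100))"
],
"execution_count": null,
"outputs": []
},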
{
"cell_type": "code",
"metadata": {
"id": "plcvbd4Cs5XM",
"outputId": "8e137cc1-e0c4-44f4-9cc2-703302206d4f"
},
"source": [
"x = [\"Unique Questions\", \"Repeated Questions\"]\n",
"y = [unique_qs, qs_morethan_onetime]\n",
"\n",
"plt.figure(figsize=(10, 6))\n",
"plt.title(\"Plot representing unique and repeated questions\")\n",
"# Recent seaborn versions require x and y as keyword arguments\n",
"sns.barplot(x=x, y=y)\n",
"plt.show()"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmwAAAF2CAYAAADTMMRFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcZFV99/HPVxaXoCwyGGHQQRkTQQ0KQRKjQfFBcAk8\nBhKMETQkRB+MMREjLgmikmhi1BAVQwKyuCAqKkEMjgtuAWVQBFEJEwQZITI4KBAVBX7PH/e01rRV\n3T3DQB+mP+/Xq19dde6555xbVX37W/fcW5WqQpIkSf26x3wPQJIkSTMzsEmSJHXOwCZJktQ5A5sk\nSVLnDGySJEmdM7BJkiR1zsCmBSPJuUn+eL7HcXeT5OYkD5nvcYxK8vgkl833OO6oJJVkx/kex50t\nyauTvGu+xzGTJK9I8m/zPQ5pEgObNihJrkzyoxYyvpvknUk2W8s2lrR/pBvfWePs1bhQW1WbVdUV\n8zWmcarqc1X1K/M9joWi/V09eb7Hsb4k2TPJytGyqvrbqvINnbplYNOG6BlVtRnwGODXgVfd2R2u\n73C3EMOiJsvA/bW0gLkD0Aarqr4DfAx4xPRlSe6R5FVJrkpyXZJTkmzeFn+2/f5+O1L3G2PWf3WS\nDyR5V5Ibgee2No9M8t9Jvpfk9CRbtfpTR+0OS3JNkmuTvOQOtHevVvd7Sb6f5IIkD2jLNk9yQuvj\nO0lel2Sjtuy5ST6f5I1JbkjyrST7tmXHAI8H3tq2+62t/GfTdklOSvK2JB9NclOSLyZ56Mh27J3k\nsiQ/SPL2JJ+ZNA3d2nrdyP01jnq0ozpHJLm4tfe+JPeaUPfRSb7cxvS+JKdNtT21zdP6Ht2me7bH\n49vtqOw7ktx7wpgfmuRT7XG/Psm7k2wxlzG35S9tz8s1Sf5oXB8jdc9NckySLwA/BB4yh+f2C0n+\nufX9zSR7jbQ307oTtyvJqcCDgH9vr4u/auV7JPnP9vr7apI9R/raoT33NyVZBmw9y7au8bhMe37W\nOOo7/flM8qtJliVZ3V57vzey7KlJvt7G8Z323PwSw35h27Y9NyfZNtOmbZP8TpJL2/adm+Thc3me\nk2yd5Ky23uokn4thW+uBLyJtsJJsDzwV+MqYxc9tP08EHgJsBry1LXtC+71Fmw48b0IX+wEfALYA\n3g28CNgf+G1gW+AG4G3T1nkisBTYGzgya04zrU17hwCbA9sD9weeD/yoLTsZuBXYEXh062s0ND0W\nuIzhn+jfAyckSVW9Evgc8MK23S+csN3PAo4GtgRWAMfA8I+qjf/lbUyXAb85oY25+j1gH2AH4FEM\nz9kakmwKfBg4FdgKeD/wu2vRxxuAhwG7MDxm2wF/M6FugL9jeD4ezvD4v3ouY06yD3AE8H8YXgNz\nmWJ8DnAYcF/gKub23F7B8NweBZyRFvJnWXfidlXVc4Bv045cV9XfJ9kO+CjwOobH/Ajgg0kWtfbe\nA1zYxvFahtfrWOv4uEyt+0vAstbfNgyvzbcn2blVOQH406q6L8Mbt09V1f8C+wLXtO3ZrKqumdbu\nw4D3Ai8GFgFnMwTWTUeqTXptvgRY2dZ7APAKwO+A1B1mYNOG6MNJvg98HvgM8Ldj6jwbeFNVXVFV\nNzOEjIOydlOR51XVh6vq9qr6EfCnwCuramVV3cLwD++AaW0eXVX/W1WXAO9k+AezLu39lCEU7VhV\nt1XVhVV1Y4ajbPsCL279XAe8GThopJ+rqupfq+o2hn/iD2T4xzJXZ1TVl6rqVoZguUsrfypwaVWd\n0ZYdC/zPWrQ7zrFVdU1VrQb+faSvUXsAmwBvqaqfVtUHgAvm0niSAH8C/EVVra6qmxheLweNq19V\nK6pqWVXdUlWrgDcxBOq5jPn3gHdW1ddaaHj1HIZ4UlVd2h7PrZj9ub2Onz8O72MIzU+b7XUxx+0a\n9YfA2VV1dnu9LgOWA09N8iCGUxH+urX32fY4
TLIuj8uUpwNXVtU7q+rWqvoy8EHggLb8p8BOSe5X\nVTe05XPx+8BH22PyU+CNwL1Z8w3IpOf5pwx/Uw9uz8Pnyi/t1nrgeTLaEO1fVZ+Ypc62DEcsplzF\n8PewNsHl6mn3Hwx8KMntI2W3TWtzdJ2rgEeuY3unMhwFOa1NXb0LeGVbZxPg2iGLAMMbs9G2fxai\nquqHrd7aXJgxGsJ+OLLutqP9VFVl2ond62B6X9uOqbMt8J1p/xSvGlNvnEXAfYALRx6vABuNq5xk\nG4Yg+niGo173YDjyOZcxb8tw1Gltxjj6vM3luR33OGw727pz3K5RDwYOTPKMkbJNgE+3/m5o4Wt0\nHNtPaGtdHpfRcTy2vUGbsjHD3wcMR1pfBbw+ycXAkTMcMZ8+pp+No6puT3I1w9HXKZOe539gCJ0f\nb4/18VX1+jlvkTSBR9i0UF3DsLOf8iCG6aLvMvfpi+n1rgb2raotRn7u1c6lmzL6T+tBbRxr3V57\n5350Ve3E8K7/6cDBbZ1bgK1H1rlfVe3M3NyRIwHXAoun7rSjV4snV+d/GcLSlF++A/1ul5EkwvDY\nju0nyWg/1zNMJe888nht3i5aGefvGB6jR1XV/RiONGVC3XHjnP78z2b0+ZjLczvucbhmDuvOtl3j\nXpunTntt/lILJtcCW7bpyrls62yPy0yvk6uBz0wbx2ZV9QKAqrqgqvZjmC79MHD6hO2Zbo39Q3tM\ntwe+M3GNqYarbqqql1TVQ4BnAH85ei6htK4MbFqo3gv8RTs5ejOGabD3tamnVcDtDOe2rY13AMck\neTBAkkVJ9ptW56+T3KedY/M84H3r0l6SJyZ5ZIaTxm9kmIa5raquBT4O/GOS+2W4cOGhSWaa3hr1\nXdZ+u6d8FHhkkv3btO3hzBzCLmKYQtuqhagXr2O/5zGE7Rcl2TjJM4HdR5Z/Fdg5yS7txPBXTy2o\nqtuBfwXe3I4ykWS7JE+Z0Nd9gZsZLkjZDnjpWozzdIaLSXZKch+Gc8zmbI7P7TYMj8MmSQ5kOB/t\n7DmsO9t2TX9dvAt4RpKnJNkow0UweyZZXFVXMUyPHp1k0yS/xRBc1vVxuQh4Zvu72RE4dGTZWcDD\nkjynbfMmSX49ycNb389Osnmb1ryR4Qj11PbcPz+/0GjcmJ6WZK8kmzCcl3YL8J8zbAcASZ6eZMcW\n8qb6vG2W1aRZGdi0UJ3IMG3yWeBbwI+BP4NhmpDhRPovtCu99phjm/8EnMkwFXITcD7DSeCjPsNw\nov4ngTdW1cfXsb1fZjjB/0bgG63dqSvcDgY2Bb7OMK31AYZzaua6DQdkuIL02DmuA0BVXQ8cyHAh\nw/eAnRj+cd8yYZVTGcLUlQxhYqbwOlO/PwGeyXDS9w0M5x+dMbL8v4DXAJ8ALmc4t3HUyxiek/Mz\nXKH7CWDSZ7wdzfBxMT9gCKhnTKg3bpwfA94CfKr196m5rjtituf2iwwn7l/P8Bo+oKq+N4d1Z9uu\nvwNe1f4ejqiqqxkuknkFwxucqxlC3tT/lD9geK2uZghgp0zaoDk8Lm8GfsIQsk5mOG9yat2bGC6e\nOIjhqNj/MFxEcs9W5TnAle15fT7DkUOq6psMb9quaNu0xlR7VV3W6v4zw2P5DIaLLn4yaTtGLGV4\nDd3M8Gbi7VV17hzWk2YUz4WU7nxJljAEw03aUbwNXoaPMlgJPLuqPn0X930SsLKq7vTP4OtFkucC\nf1xVvzXfY7mjkhSwtKpWzPdYpF54hE3SetOmyLZIck+Goy9hODIoSboDDGyS1qffAP6bn08j7V/D\nR5RIku4Ap0QlSZI65xE2SZKkzhnYJEmSOrfBfdPB1ltvXUuWLJnvYUiSJM3qwgsvvL6qFs1Wb4ML\nbEuWLGH58uXzPQxJkqRZJZnT17E5JSpJktQ5A5skSVLnDGySJEmdM7BJkiR1zsAmSZLUOQObJElS\n5wxskiRJ
nTOwSZIkdc7AJkmS1DkDmyRJUucMbJIkSZ0zsEmSJHXOwCZJktS5jed7AHd3u770lPke\ngrQgXfgPB8/3ECTpLuMRNkmSpM4Z2CRJkjpnYJMkSeqcgU2SJKlzBjZJkqTOGdgkSZI6Z2CTJEnq\nnIFNkiSpcwY2SZKkzhnYJEmSOmdgkyRJ6pyBTZIkqXMGNkmSpM4Z2CRJkjpnYJMkSeqcgU2SJKlz\nBjZJkqTOGdgkSZI6N6fAluTKJJckuSjJ8la2VZJlSS5vv7ds5UlybJIVSS5O8piRdg5p9S9PcshI\n+a6t/RVt3czUhyRJ0kKyNkfYnlhVu1TVbu3+kcAnq2op8Ml2H2BfYGn7OQw4DobwBRwFPBbYHThq\nJIAd1+pOrbfPLH1IkiQtGHdkSnQ/4OR2+2Rg/5HyU2pwPrBFkgcCTwGWVdXqqroBWAbs05bdr6rO\nq6oCTpnW1rg+JEmSFoy5BrYCPp7kwiSHtbIHVNW1AO33Nq18O+DqkXVXtrKZyleOKZ+pjzUkOSzJ\n8iTLV61aNcdNkiRJunvYeI71HldV1yTZBliW5Jsz1M2YslqH8jmrquOB4wF22223tVpXkiSpd3M6\nwlZV17Tf1wEfYjgH7bttOpP2+7pWfSWw/cjqi4FrZilfPKacGfqQJElaMGYNbEl+Kcl9p24DewNf\nA84Epq70PAT4SLt9JnBwu1p0D+AHbTrzHGDvJFu2iw32Bs5py25Kske7OvTgaW2N60OSJGnBmMuU\n6AOAD7VP2tgYeE9V/UeSC4DTkxwKfBs4sNU/G3gqsAL4IfA8gKpaneS1wAWt3muqanW7/QLgJODe\nwMfaD8DrJ/QhSZK0YMwa2KrqCuDXxpR/D9hrTHkBh09o60TgxDHly4FHzLUPSZKkhcRvOpAkSeqc\ngU2SJKlzBjZJkqTOGdgkSZI6Z2CTJEnqnIFNkiSpcwY2SZKkzhnYJEmSOmdgkyRJ6pyBTZIkqXMG\nNkmSpM4Z2CRJkjpnYJMkSeqcgU2SJKlzBjZJkqTOGdgkSZI6Z2CTJEnqnIFNkiSpcwY2SZKkzhnY\nJEmSOmdgkyRJ6pyBTZIkqXMGNkmSpM4Z2CRJkjpnYJMkSeqcgU2SJKlzBjZJkqTOGdgkSZI6Z2CT\nJEnqnIFNkiSpcwY2SZKkzhnYJEmSOmdgkyRJ6pyBTZIkqXMGNkmSpM4Z2CRJkjpnYJMkSeqcgU2S\nJKlzBjZJkqTOGdgkSZI6Z2CTJEnqnIFNkiSpcwY2SZKkzhnYJEmSOmdgkyRJ6pyBTZIkqXMGNkmS\npM4Z2CRJkjo358CWZKMkX0lyVru/Q5IvJrk8yfuSbNrK79nur2jLl4y08fJWflmSp4yU79PKViQ5\ncqR8bB+SJEkLydocYftz4Bsj998AvLmqlgI3AIe28kOBG6pqR+DNrR5JdgIOAnYG9gHe3kLgRsDb\ngH2BnYBntboz9SFJkrRgzCmwJVkMPA34t3Y/wJOAD7QqJwP7t9v7tfu05Xu1+vsBp1XVLVX1LWAF\nsHv7WVFVV1TVT4DTgP1m6UOSJGnBmOsRtrcAfwXc3u7fH/h+Vd3a7q8Etmu3twOuBmjLf9Dq/6x8\n2jqTymfqQ5IkacGYNbAleTpwXVVdOFo8pmrNsmx9lY8b42FJlidZvmrVqnFVJEmS7rbmcoTtccDv\nJLmSYbrySQxH3LZIsnGrsxi4pt1eCWwP0JZvDqweLZ+2zqTy62foYw1VdXxV7VZVuy1atGgOmyRJ\nknT3MWtgq6qXV9XiqlrCcNHAp6rq2cCngQNatUOAj7TbZ7b7tOWfqqpq5Qe1q0h3AJYCXwIuAJa2\nK0I3bX2c2daZ1IckSdKCcUc+h+1lwF8mWcFwvtkJrfwE4P6t/C+BIwGq6lLgdODrwH8Ah1fVbe0c\ntRcC5zBchXp6qztTH5IkSQvGxrNX+bmqOhc4t92+guEKz+l1fgwcOGH9Y4
BjxpSfDZw9pnxsH5Ik\nSQuJ33QgSZLUOQObJElS5wxskiRJnTOwSZIkdc7AJkmS1DkDmyRJUucMbJIkSZ0zsEmSJHXOwCZJ\nktQ5A5skSVLnDGySJEmdM7BJkiR1zsAmSZLUOQObJElS5wxskiRJnTOwSZIkdc7AJkmS1DkDmyRJ\nUucMbJIkSZ0zsEmSJHXOwCZJktQ5A5skSVLnDGySJEmdM7BJkiR1zsAmSZLUOQObJElS5wxskiRJ\nnTOwSZIkdc7AJkmS1DkDmyRJUucMbJIkSZ0zsEmSJHXOwCZJktQ5A5skSVLnDGySJEmdM7BJkiR1\nzsAmSZLUOQObJElS5wxskiRJnTOwSZIkdc7AJkmS1DkDmyRJUucMbJIkSZ0zsEmSJHXOwCZJktQ5\nA5skSVLnDGySJEmdM7BJkiR1zsAmSZLUuVkDW5J7JflSkq8muTTJ0a18hyRfTHJ5kvcl2bSV37Pd\nX9GWLxlp6+Wt/LIkTxkp36eVrUhy5Ej52D4kSZIWkrkcYbsFeFJV/RqwC7BPkj2ANwBvrqqlwA3A\noa3+ocANVbUj8OZWjyQ7AQcBOwP7AG9PslGSjYC3AfsCOwHPanWZoQ9JkqQFY9bAVoOb291N2k8B\nTwI+0MpPBvZvt/dr92nL90qSVn5aVd1SVd8CVgC7t58VVXVFVf0EOA3Yr60zqQ9JkqQFY07nsLUj\nYRcB1wHLgP8Gvl9Vt7YqK4Ht2u3tgKsB2vIfAPcfLZ+2zqTy+8/QhyRJ0oIxp8BWVbdV1S7AYoYj\nYg8fV639zoRl66v8FyQ5LMnyJMtXrVo1rookSdLd1lpdJVpV3wfOBfYAtkiycVu0GLim3V4JbA/Q\nlm8OrB4tn7bOpPLrZ+hj+riOr6rdqmq3RYsWrc0mSZIkdW8uV4kuSrJFu31v4MnAN4BPAwe0aocA\nH2m3z2z3acs/VVXVyg9qV5HuACwFvgRcACxtV4RuynBhwpltnUl9SJIkLRgbz16FBwInt6s57wGc\nXlVnJfk6cFqS1wFfAU5o9U8ATk2yguHI2kEAVXVpktOBrwO3AodX1W0ASV4InANsBJxYVZe2tl42\noQ9JkqQFY9bAVlUXA48eU34Fw/ls08t/DBw4oa1jgGPGlJ8NnD3XPiRJkhYSv+lAkiSpcwY2SZKk\nzhnYJEmSOmdgkyRJ6pyBTZIkqXMGNkmSpM4Z2CRJkjpnYJMkSeqcgU2SJKlzBjZJkqTOGdgkSZI6\nZ2CTJEnqnIFNkiSpcwY2SZKkzhnYJEmSOmdgkyRJ6pyBTZIkqXMGNkmSpM4Z2CRJkjpnYJMkSeqc\ngU2SJKlzBjZJkqTOGdgkSZI6Z2CTJEnqnIFNkiSpcwY2SZKkzhnYJEmSOmdgkyRJ6pyBTZIkqXMG\nNkmSpM4Z2CRJkjpnYJMkSeqcgU2SJKlzBjZJkqTOGdgkSZI6Z2CTJEnqnIFNkiSpcwY2SZKkzhnY\nJEmSOmdgkyRJ6pyBTZIkqXMGNkmSpM4Z2CRJkjpnYJMkSeqcgU2SJKlzBjZJkqTOGdgkSZI6Z2CT\nJEnqnIFNkiSpc7MGtiTbJ/l0km8kuTTJn7fyrZIsS3J5+71lK0+SY5OsSHJxkseMtHVIq395kkNG\nyndNcklb59gkmakPSZKkhWQuR9huBV5SVQ8H9gAOT7ITcCTwyapaCnyy3QfYF1jafg4DjoMhfAFH\nAY8FdgeOGglgx7W6U+vt08on9SFJkrRgzBrYquraqvpyu30T8A1gO2A/4ORW7WRg/3Z7P+CUGpwP\nbJHkgcBTgGVVtbqqbgCWAfu0ZferqvOqqoBTprU1rg9JkqQFY63OYUuyBHg08EXgAVV1LQyhDtim\nVdsOuHpktZWtbKbylWPKmaGP6eM6LMnyJMtXrVq1NpskSZLUvTkHtiSbAR8EXlxVN85UdUxZrUP5\nnFXV8VW1W1XttmjRorVZVZIkqXtzCm
xJNmEIa++uqjNa8XfbdCbt93WtfCWw/cjqi4FrZilfPKZ8\npj4kSZIWjLlcJRrgBOAbVfWmkUVnAlNXeh4CfGSk/OB2tegewA/adOY5wN5JtmwXG+wNnNOW3ZRk\nj9bXwdPaGteHJEnSgrHxHOo8DngOcEmSi1rZK4DXA6cnORT4NnBgW3Y28FRgBfBD4HkAVbU6yWuB\nC1q911TV6nb7BcBJwL2Bj7UfZuhDkiRpwZg1sFXV5xl/nhnAXmPqF3D4hLZOBE4cU74ceMSY8u+N\n60OSJGkh8ZsOJEmSOmdgkyRJ6pyBTZIkqXMGNkmSpM4Z2CRJkjpnYJMkSeqcgU2SJKlzBjZJkqTO\nGdgkSZI6Z2CTJEnqnIFNkiSpcwY2SZKkzhnYJEmSOmdgkyRJ6pyBTZIkqXMGNkmSpM4Z2CRJkjpn\nYJMkSeqcgU2SJKlzBjZJkqTOGdgkSZI6Z2CTJEnqnIFNkiSpcwY2SZKkzhnYJEmSOmdgkyRJ6pyB\nTZIkqXMGNkmSpM4Z2CRJkjpnYJMkSeqcgU2SJKlzBjZJkqTOGdgkSZI6Z2CTJEnqnIFNkiSpcwY2\nSZKkzhnYJEmSOmdgkyRJ6pyBTZIkqXMGNkmSpM4Z2CRJkjpnYJMkSeqcgU2SJKlzBjZJkqTOGdgk\nSZI6t/F8D0CS9Iu+/ZpHzvcQpAXpQX9zyXwPYSyPsEmSJHXOwCZJktQ5A5skSVLnZg1sSU5Mcl2S\nr42UbZVkWZLL2+8tW3mSHJtkRZKLkzxmZJ1DWv3LkxwyUr5rkkvaOscmyUx9SJIkLTRzOcJ2ErDP\ntLIjgU9W1VLgk+0+wL7A0vZzGHAcDOELOAp4LLA7cNRIADuu1Z1ab59Z+pAkSVpQZg1sVfVZYPW0\n4v2Ak9vtk4H9R8pPqcH5wBZJHgg8BVhWVaur6gZgGbBPW3a/qjqvqgo4ZVpb4/qQJElaUNb1HLYH\nVNW1AO33Nq18O+DqkXorW9lM5SvHlM/Uxy9IcliS5UmWr1q1ah03SZIkqU/r+6KDjCmrdShfK1V1\nfFXtVlW7LVq0aG1XlyRJ6tq6BrbvtulM2u/rWvlKYPuReouBa2YpXzymfKY+JEmSFpR1DWxnAlNX\neh4CfGSk/OB2tegewA/adOY5wN5JtmwXG+wNnNOW3ZRkj3Z16MHT2hrXhyRJ0oIy61dTJXkvsCew\ndZKVDFd7vh44PcmhwLeBA1v1s4GnAiuAHwLPA6iq1UleC1zQ6r2mqqYuZHgBw5Wo9wY+1n6YoQ9J\nkqQFZdbAVlXPmrBorzF1Czh8QjsnAieOKV8OPGJM+ffG9SFJkrTQ+E0HkiRJnTOwSZIkdc7AJkmS\n1DkDmyRJUucMbJIkSZ0zsEmSJHXOwCZJktQ5A5skSVLnDGySJEmdM7BJkiR1zsAmSZLUOQObJElS\n5wxskiRJnTOwSZIkdc7AJkmS1DkDmyRJUucMbJIkSZ0zsEmSJHXOwCZJktQ5A5skSVLnDGySJEmd\nM7BJkiR1zsAmSZLUOQObJElS5wxskiRJnTOwSZIkdc7AJkmS1DkDmyRJUucMbJIkSZ0zsEmSJHXO\nwCZJktQ5A5skSVLnDGySJEmdM7BJkiR1zsAmSZLUOQObJElS5wxskiRJnTOwSZIkdc7AJkmS1DkD\nmyRJUucMbJIkSZ0zsEmSJHXOwCZJktQ5A5skSVLnDGySJEmdM7BJkiR1zsAmSZLUOQObJElS57oP\nbEn2SXJZkhVJjpzv8UiSJN3Vug5sSTYC3gbsC+wEPCvJTvM7KkmSpLtW14EN2B1YUVVXVNVPgNOA\n/eZ5TJIkSXep3gPbdsDVI/dXtjJJkqQFY+P5HsAsMqasfqFSchhwWLt7c5LL7tRRaUOxNXD9fA9C\n6yZvPGS+hyBN4r7l7uyocdHjTvXguVTqPbCtBLYfub8YuGZ6pao6Hjj+rhqUNgxJllfVbvM9Dkkb\nFv
ctujP0PiV6AbA0yQ5JNgUOAs6c5zFJkiTdpbo+wlZVtyZ5IXAOsBFwYlVdOs/DkiRJukt1HdgA\nqups4Oz5Hoc2SE6jS7ozuG/RepeqXziHX5IkSR3p/Rw2SZKkBc/AJknqRpLbklyU5GtJ/j3JFndh\n3/uvy7fpJLl5QvniJB9JcnmSK5K8Nck97/hI1+hjjTEneU2SJ6/PPtQHA5vmVZLdkhw73+NYH5Ls\nmeQ3R+4/P8nB8zkm6W7oR1W1S1U9AlgNHH4X9r0/w9cg3mFJApwBfLiqlgJLgXsDf78+2h+xxpir\n6m+q6hPruQ91wMCmeVVVy6vqRfM9jvVkT+Bnga2q3lFVp8zfcKS7vfMY+XabJC9NckGSi5Mc3cqW\nJPlmkpNb+QeS3Kct2zXJZ5JcmOScJA9s5X/S2vlqkg8muU97s/U7wD+0I3wPbT//0db/XJJfbevv\nkOS81sZrJ4z9ScCPq+qdAFV1G/AXwMFJNkvy3CRvHdm2s5Ls2W7v3dr/cpL3J9mslb8+ydfbdr5x\nwphPSnJAq79Xkq8kuSTJiVNH95JcmeTo1v4lI9v1262di9p6910fT6LWDwOb1qu28/zayP0jkrw6\nyblJ3pDkS0n+K8nj2/I9k5zVbt8/ycfbjuJfklyVZOtJbbbbY3eoE8a2xk52ahpjdAzt/luTPLfd\nnrTDf9HIjvO0JEuA5wN/0XZ2j2/bfUSrv0uS81v9DyXZspVPelx2bmUXtXWW3vFnR7r7SLIRsBft\nszeT7M1wlGp3YBdg1yRPaNV/BTi+qh4F3Aj8vySbAP8MHFBVuwInAse0+mdU1a9X1a8B3wAOrar/\nbH29tB3h+2+Gqz3/rK1/BPD2tv4/AcdV1a8D/zNhE3YGLhwtqKobgSuBHWfY7q2BVwFPrqrHAMuB\nv0yyFfB/gZ3bdr5uwpin2rkXcBLw+1X1SIZPhXjBSFfXt/aPa9tG+314Ve0CPB740aRx6q5nYNNd\naeOq2h14MXDUmOVHAZ+vqkcz7IQeNIc2J+1Qx5nLTvZnZtnhHwk8uu04n19VVwLvAN7cdpyfm9bc\nKcDLWv1LWHP7xz0uzwf+qe04d2P41g9pIbh3kouA7wFbActa+d7t5yvAl4FfZQhwAFdX1Rfa7XcB\nv8UQ4h4BLGvtvYrh23IAHtHe4F0CPJshXK2hHdX6TeD9bf1/AR7YFj8OeG+7feqE7QhjvkqR8V+5\nOGoPhinOL7R+D2H46qIbgR8D/5bkmcAPZ2nnV4BvVdV/tfsnA08YWX5G+30hsKTd/gLwpiQvArao\nqltn6UP1lEHsAAAEAElEQVR3oe4/h00blHE7iFFPAJ4JUFUfTXLDTI1N26FOFc90Qu/jgN9tt08F\n3jDLeEd3+DB8ePO1bdnFwLuTfBj48Czj3Jxh5/eZVnQy8P6RKuMel/OAVyZZzHA04PJZxiptKH5U\nVbu0v5uzGM5hO5Yh6PxdVf3LaOV2dHt6MKpW/9Kq+o0xfZwE7F9VX21H0/ccU+cewPfbm6ZxZvtM\nrEv5+f5maqz3Ax4AXMawbxk9aHKvqWrAsqp61vQGk+zOcNTxIOCFDNOuk8wWDG9pv2+jZYGqen2S\njwJPBc5P8uSq+uYs7egu4hE2rW+3Mn4nBGN2EGOM2wlOavNnO9SRn4fPMr61aX9qhz/V9iOrau+2\n7GnA24BdgQuT3JE3P+N2nO9hODflR8A5SWbaMUsbnKr6AfAi4Ih2tPsc4I9GzufaLsk2rfqDkkwF\ns2cBn2cIRYumypNskmTqSNp9gWtbu88e6famtmxq+vJbSQ5s6yfJr7V6X2AITUxbf9QngfukXXjU\npnj/EXhrVf2IYWp0lyT3SLI9w1QvwPnA45Ls2Na7T5KHte3evH2Y/IsZpoXXGPM03wSWTLUDPAf4\nzJh6P5PkoVV1SVW9gWEqduIpJrrrGdi0vn0X2Kadj3ZP4Olrse5n
aTu/JPsCW87U5iw71HEm7WSv\nAnZKcs/2rn6vVj52h5/kHsD2VfVp4K+ALYDNmLDjbP94bpg6P4257TgfAlxRVccyTA8/aqb60oao\nqr4CfBU4qKo+DrwHOK9NZX6An/+9fQM4JMnFDNOox1XVT4ADgDck+SpwET+/KOivgS8yTLeOHkE6\nDXhphvNoH8qwnzi0rX8psF+r9+fA4UkuADafMPZiOOfsgCSXM0zx3l5VU6dVfAH4FsMpEm9kmOal\nqlYBzwXe27bnfIbgdF/grFb2GYYLGMaNear/HwPPY5iBuAS4neG0jZm8OMPHqXyV4c3ix2apr7uQ\n33Sg9a6d//Aihp3RdxjeSe4JHFFVy9tJtcurakmGq6KOqKqnJ7k/w3khWzPskJ4J7FpV149rs6pe\nnWQHhpNmHwhsApxWVa+ZMK4dGHb4GwMfBF5VVVPv1v+eYWd8OfAT4MyqOinJLgzTMZu39d7CMJ3y\n6VYW4F1tKuFhDP9Ebgf+jCH43VxVb2ztvAO4D3AF8LyquiHJuRMel5cDfwj8lOF8uz+oqtXr8nxI\nG7I2JXpW+xiQbmW4ovO9wDOr6sLZ6kvTGdjUrSRXArtV1fV3Uvs3TwU2SXdPd5fAJt1RXnQgSbrb\naldoG9a0wfMImzY4SV4JHDit+P0j545IknS3YmCTJEnqnFeJSpIkdc7AJkmS1DkDmyRJUucMbJIk\nSZ0zsEmSJHXu/wOOW4dlTW6sEgAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x22b00c2ceb8>"
]
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "G-CwGaMms5XS"
},
"source": [
"<h3>3.2.3 Checking for Duplicates </h3>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "YCiDBHm5s5XT",
"outputId": "d8011926-4086-4c9a-9fcf-59663a584ec4"
},
"source": [
"# Check whether any (qid1, qid2) pair appears more than once\n",
"\n",
"pair_duplicates = df[['qid1','qid2','is_duplicate']].groupby(['qid1','qid2']).count().reset_index()\n",
"\n",
"print(\"Number of duplicate question pairs:\", df.shape[0] - pair_duplicates.shape[0])"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Number of duplicate question pairs: 0\n"
],
"name": "stdout"
}
]
},
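{
"cell_type": "markdown",
"metadata": {},
"source": [
"The grouping above counts only exact repeats of `(qid1, qid2)`, so a pair stored once as `(a, b)` and once as `(b, a)` would not be flagged; whether such reversed pairs occur in `train.csv` is not checked here. A minimal sketch of an order-insensitive check on a small hypothetical frame, which sorts the two ids of each pair before looking for duplicates:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical pairs: row 2 reverses row 0, row 3 repeats row 1 exactly\n",
"toy = pd.DataFrame({'qid1': [1, 3, 2, 3], 'qid2': [2, 4, 1, 4]})\n",
"\n",
"# Exact repeats, equivalent to comparing row counts before and after the groupby\n",
"exact_dups = int(toy.duplicated(subset=['qid1', 'qid2']).sum())\n",
"\n",
"# Order-insensitive repeats: put the smaller id first so (2, 1) becomes (1, 2)\n",
"toy_sorted = pd.DataFrame({\n",
"    'low': np.minimum(toy['qid1'], toy['qid2']),\n",
"    'high': np.maximum(toy['qid1'], toy['qid2']),\n",
"})\n",
"unordered_dups = int(toy_sorted.duplicated().sum())\n",
"print(exact_dups, unordered_dups)"
],
"execution_count": null,
"outputs": []
},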
{
"cell_type": "markdown",
"metadata": {
"id": "iaHTnnt8s5XX"
},
"source": [
"<h3> 3.2.4 Number of occurrences of each question </h3>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "dPZwk-C8s5Xa",
"outputId": "0d6d5978-2306-4ed3-cf27-f2a0b974e47d"
},
"source": [
"plt.figure(figsize=(20, 10))\n",
"\n",
"plt.hist(qids.value_counts(), bins=160)\n",
"\n",
"# 'nonpositive' replaces the 'nonposy' keyword removed in newer Matplotlib versions\n",
"plt.yscale('log', nonpositive='clip')\n",
"\n",
"plt.title('Log-Histogram of question appearance counts')\n",
"\n",
"plt.xlabel('Number of occurrences of question')\n",
"\n",
"plt.ylabel('Number of questions')\n",
"\n",
"print('Maximum number of times a single question is repeated: {}\\n'.format(max(qids.value_counts())))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Maximum number of times a single question is repeated: 157\n",
"\n"
],
"name": "stdout"
},
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABJUAAAJcCAYAAABAA5WYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3XuUpWddJ/rvjzQXESgEonJvtIAx3piZMoyuGUUOYrAp\nQMZRAuKImBYUL0c9YyuoCHpsz1EGOeJgKzdxDCeDDqboMDCjXJTRMQFFCRiNoTkJQQi34qZczO/8\nsXfLpqmq3m+nd721U5/PWrVS+3n3fvd3X2qt9Hc9z/NWdwcAAAAAhrjF2AEAAAAAWD5KJQAAAAAG\nUyoBAAAAMJhSCQAAAIDBlEoAAAAADKZUAgAAAGAwpRIA3ExU1Ueq6ovGzjGmqnpyVb17+l7ceew8\nW6mqx1XVq8fOAQBwUymVAOAmqKoTVfWQXXierqrVU8aeXlW/dfJ2d9+uu685zXkeVFXXLSrnmKrq\nlkmeleSh0/fifXsg08HpZ3fg5Fh3/+fufuiYuRhHVb22qr577BwAcLYolQCAs6aqzhnx6b8gyW2S\nXDliBk5j5O8IAHAWKZUAYEGq6qKqurqq3l9Vl1bV3WaOPbSqrqqqzar61ap63U2dwTA7m6mqvqmq\n3lpVH66qd1bVj1bV5yZ5ZZK7TZeHfaSq7lZVt66qZ1fV9dOfZ1fVrWfO+x+q6l3TY999yvO8qKr+\nU1VdVlUfTfL1VXWoqv6sqj5UVddW1dNnznVy5s4Tpsc+UFVPqqqvqqq/qKoPVtWv7PAat8xaVfdL\nctX0bh+sqj/Y5vGPr6p3VNX7quqpszPNpq/lZ2fu+xmzuqbv1e9U1Q1V9faq+oGZY+dX1RXT1/zu\nqnrW9NDrZzJ9pKq+uqq+s6r+aOaxX1NVl0+/C5dX1dfMHHttVT2zqt4w/SxfXVV32ea1fV5VvWKa\n7wPT3+9xyrl+vqr+dPpcv1dVdzrlczk8fV/fVVU/MvPYW1TVkar62+l7d8nJx06P/5eq+rvpeV9f\nVV86c+xMvyP/vqr+v6p6b1U9deb4OVX1E9MsH66qN1bVPafH/llV/fea/M1dVVXfutV7Nb3vnarq\nhdPX+4GqevnMsS3/dmuLmWc1M/vo5GdbVb84Pefbq+ph02M/l+TfJPmV6XfhV2riP1bVe6bv3V9U\n1ZdtlxkA9hqlEgAsQFU9OMnPJ/nWJHdN8o4kL50eu0uSlyX58SR3zqQM+Zqtz3TGnp/ke7r79km+\nLMkfdPdHkzwsyfXT5WG36+7rkzw1yb9K8oAkX5nk/CRPm2a9IMkPJ3lIktUkX7fFcz02yc8luX2S\nP0ry0STfkeSOSQ4leXJVPeqUxzwwyX2TfFuSZ08zPCTJlyb51qra6nmyXdbu/uvpY5Pkjt394FMf\nWFXnJflPSR6f5G6ZvPf3OPV+W6mqWyTZSPLmJHdP8r8l+aGq+sbpXX45yS939x2SfHGSS6bjXzuT\n6Xbd/cennPdOSY4nec40z7OSHK/P3A/qsUmekOTzk9wqyY9uE/MWSV6Y5N5J7pXk75OcWtB9R5Lv\nmr7+T02fd9bXZ/K5PDTJkfr00s4fSPKoTD7/uyX5QJLnzjzuldPHfX6SNyX5z6ec90y+I/86yf0z\nea9/qqq+ZDr+w0kuTPJNSe4wfT0fq0lp+t+T/PY0x4VJfnW24DrFS5LcNpPvzecn+Y/Jzn+7c3pg\nJn/Td0nyfyV5flVVdz81yR8mecr0u/CUTN7nr01yv+l78W1JRl+2CQDzUioBwGI8LskLuvtN3f3x\nTAqkr66qg5n8Y/jK7v7d7j75D/u/m+Ocb6rJTJ4PVtUHkxzZ4b6fTHJeVd2huz/Q3W86TdZndPd7\nuvuGJD+TSfGSTP5h/cLuvrK7PzY9dqrf6+43dPeN3f0P3f3a7v7L6e2/SHJxPruMeub0vq/OpGC4\nePr878zkH97//Ayyns63JHlFd79++pn8ZJIb
53zsVyU5t7uf0d2fmO5d9etJHjM9/skkq1V1l+7+\nSHf/yZznPZTkb7r7Jd39qe6+OMlfJVmfuc8Lu/uvu/vvMymrHrDVibr7fd39O939se7+cCYlzqnv\n+0u6+y3TgvEnMynwZpej/Ux3f7S7/zKTgurC6fj3JHlqd183fe+enuRbTs7Y6e4XdPeHZ459ZVWt\nzJz3TL4jP9Pdf9/db86kzPvK6fh3Z1IkXtUTb57un/XwJCe6+4XT9/JNSX4nk8/9M1TVXTMpWJ80\n/fv4ZHe/bnp4p7/debyju3+9u/8xyYszKaa+YJv7fjKTou2fJanuflt3v2vO5wGA0SmVAGAx7pbJ\nDIckSXd/JJMZCHefHrt25lgnmV1mdWV9ennav5k557/o7jue/ElydIfn/7eZlFfvqMnSuq+eN+v0\n97vNHLt25tjs71uOVdUDq+o102VYm0melMmsjVnvnvn977e4fbszyHo6p77vH838s0LuncmywdlS\n7yfy6bLgiZnMNvmrmixhe/iATO84ZewdmXxPTpotHD+Wbd6bqrptVf1aTZb3fSiTpXd3PKU0mv2s\n3pHklvnMz+bU4yff23sn+a8zr/1tSf4xyRdMl6MdnS5H+1CSE9PHbHfeeb8j273ueyb52y3egnsn\neeApn9HjknzhFve9Z5L3d/cHtji209/uPP4p97SITbb5zLr7DzKZTfbcJO+uqmNVdYc5nwcARqdU\nAoDFuD6Tf+QmSaZLc+6c5J1J3pWZZVdVVbO3u/tLZ5an/eGZPHl3X97dj8xkWc/L8+nlWH26rJks\nnbp++vtnZM3kH+Of9XSn3P7tJJcmuWd3ryR5XpIa9AK2t1PW03lXZvJX1W0z+UxO+mgmy6FOmi0j\nrk3y9tlSr7tv393flCTd/TfdfWEm7/cvJHnZ9DPf6v3e6fWcfE3vnPM1zfqRTJaLPXC6DO/k0rvZ\n937287tXJjNl3rvD8ZPv7bVJHnbK67/NdGbZY5M8MpPliytJDm7xvGfzO3JtJksMtxp/3SkZb9fd\nT97mvneqqjtucWynv92PToe3+56czmd9H7r7Od39LzNZhne/JP/HgPMBwKiUSgBw092yqm4z83Mg\nk380P6GqHlCTTa//zyT/q7tPZLKHzpdX1aOm9/2+DPuH6Y6q6lZV9biqWunuTyb5UCazSpLJjKA7\nn7I06eIkT6uqc6f7Pf1Ukt+aHrtk+jq+ZFrC/NQcEW6fySyQf6iq8zMpHc6WnbKezsuSPLyq/nVV\n3SrJM/KZ/y/050m+qSYbOH9hkh+aOfanST5UVT9WVZ8znZ3zZVX1VUlSVd9eVed2941JPjh9zD8m\nuSGTJXZftE2my5Lcr6oeW1UHqurbkpyX5BVzvqZZt89kltcHp3s1/fQW9/n2qjpv+lk+I8nLpsu0\nTvrJ6YynL81kH6f/dzr+vCQ/V1X3nr7ec6vqkTPP+/FMZvPcNpPv+jxZz/Q78htJnllV951udP0V\n0z2oXpHJe/n4qrrl9OerZvZi+ifTJWavzGTPpc+b3vdkCbft3+50yeU7M3kfz6mq78rWBdd23p2Z\n78I03wOr6paZFFb/kE//rQLAnqdUAoCb7rJM/jF/8ufp3f37mexZ8zuZzJD54kz33+nu9yb5d5ls\n4vu+TEqEKzL5h/nZ8vgkJ6bLkZ6U5Nunz/1XmRQz10yXCN0tyc9On/8vkvxlJhst/+z0/q/MZM+n\n1yS5OsnJjaZ3yvq9SZ5RVR/OpPS5ZIf7DrVt1tPp7iszKfB+O5PP5AOZWXaYycbNb85k+dar8+lC\nJdPiZT2T/Yzensnsnt/IZGZOklyQ5Mqq+kgmm3Y/Zrp30Mcy2dvoDdP3+1+dkunkXkA/ksl34T8k\nefj0OzLUs5N8zjTbnyT5b1vc5yVJXpTJEq3bZLIB96zXZfI5/36SX5zueZXpa7o0yaunn+ufZLIh\ndZL8ZibL
xd6Z5K3TY6dzU74jz5re/9WZFKbPT/I5032kHprJ39n109f4C0luvc15Hp/JTK2/SvKe\nTEvEnf52py7KZDbR+zKZXfQ/B2T/5Uz2ovpAVT0nk43Gfz2T7+I7puf8xQHnA4BR1WQbBwBgLDW5\nsth1SR7X3a8ZO89OprM+3pLk1j3ZZHypVdWJJN/d3f9j7CyLVlWvTfJb3f0bWxw7mElZdsubw+cK\nAOwOM5UAYARV9Y1Vdcfp8pqfyGQ/mXmvGLarquqbp0vqPi+TmR8bigcAAJRKADCOr87kClbvzWRZ\n1aOml4zfi74nk72B/jaT/V622vgYAIB9xvI3AAAAAAYzUwkAAACAwQ6MHeCmuMtd7tIHDx4cOwYA\nAADAzcYb3/jG93b3uae731KWSlW1nmR9dXU1V1xxxdhxAAAAAG42quod89xvKZe/dfdGdx9eWVkZ\nOwoAAADAvrSUpRIAAAAA41IqAQAAADCYUgkAAACAwZayVKqq9ao6trm5OXYUAAAAgH1pKUslG3UD\nAAAAjGspSyUAAAAAxqVUAgAAAGAwpRIAAAAAgymVAAAAABhsKUslV38DAAAAGNdSlkqu/gYAAAAw\nrqUslQAAAAAYl1IJAAAAgMGUSgAAAAAMplQCAAAAYDClEgAAAACDLWWpVFXrVXVsc3Nz7CgAAAAA\n+9JSlkrdvdHdh1dWVsaOAgAAALAvLWWpBAAAAMC4lEoAAAAADKZUAgAAAGAwpRIAAAAAgx0YOwAT\nB48c33L8xNFDu5wEAAAA4PTMVAIAAABgsKUslapqvaqObW5ujh0FAAAAYF9aylKpuze6+/DKysrY\nUQAAAAD2paUslQAAAAAYl1IJAAAAgMGUSgAAAAAMplQCAAAAYDClEgAAAACDKZUAAAAAGEypBAAA\nAMBgSiUAAAAABlMqAQAAADDYUpZKVbVeVcc2NzfHjgIAAACwLy1lqdTdG919eGVlZewoAAAAAPvS\nUpZKAAAAAIxLqQQAAADAYEolAAAAAAZTKgEAAAAwmFIJAAAAgMGUSgAAAAAMplQCAAAAYDClEgAA\nAACDKZUAAAAAGEypBAAAAMBgSiUAAAAABlMqAQAAADCYUgkAAACAwZRKAAAAAAy2lKVSVa1X1bHN\nzc2xowAAAADsS0tZKnX3RncfXllZGTsKAAAAwL60lKUSAAAAAONSKgEAAAAwmFIJAAAAgMGUSgAA\nAAAMplQCAAAAYDClEgAAAACDKZUAAAAAGEypBAAAAMBgSiUAAAAABlMqAQAAADCYUgkAAACAwZRK\nAAAAAAymVAIAAABgMKUSAAAAAIMplQAAAAAYTKkEAAAAwGBKJQAAAAAGUyoBAAAAMJhSCQAAAIDB\nlEoAAAAADKZUAgAAAGCwPVMqVdWDquoPq+p5VfWgsfMAAAAAsL2FlkpV9YKqek9VveWU8Quq6qqq\nurqqjkyHO8lHktwmyXWLzAUAAADATbPomUovSnLB7EBVnZPkuUkeluS8JBdW1XlJ/rC7H5bkx5L8\nzIJzAQAAAHATLLRU6u7XJ3n/KcPnJ7m6u6/p7k8keWmSR3b3jdPjH0hy6+3OWVWHq+qKqrrihhtu\nWEhuAAAAAHY2xp5Kd09y7czt65LcvaoeXVW/luQlSX5luwd397HuXuvutXPPPXfBUQEAAADYyoER\nnrO2GOvu/t0kv7vbYQAAAAAYboyZStcluefM7XskuX6EHAAAAACcoTFKpcuT3Leq7lNVt0rymCSX\nDjlBVa1X1bHNzc2FBAQAAABgZwstlarq4iR/nOT+VXVdVT2xuz+V5ClJXpXkbUku6e4rh5y3uze6\n+/DKysrZDw0AAADAaS10T6XuvnCb8cuSXLbI5wYAAABgccZY/gYAAADAklMqAQAAADDYUpZKNuoG\nAAAAGNdSlko26gYAAAAY11KWSgAAAACMS6kEAAAAwGBKJQAAAAAGW8pSyU
bdAAAAAONaylLJRt0A\nAAAA41rKUgkAAACAcR0YOwA7O3jk+JbjJ44e2uUkAAAAAJ9mphIAAAAAgymVAAAAABhsKUslV38D\nAAAAGNdSlkqu/gYAAAAwrqUslQAAAAAYl1IJAAAAgMGUSgAAAAAMplQCAAAAYDClEgAAAACDLWWp\nVFXrVXVsc3Nz7CgAAAAA+9JSlkrdvdHdh1dWVsaOAgAAALAvLWWpBAAAAMC4lEoAAAAADKZUAgAA\nAGAwpRIAAAAAgymVAAAAABhMqQQAAADAYEtZKlXVelUd29zcHDsKAAAAwL60lKVSd2909+GVlZWx\nowAAAADsS0tZKgEAAAAwLqUSAAAAAIMplQAAAAAYTKkEAAAAwGBKJQAAAAAGUyoBAAAAMJhSCQAA\nAIDBlEoAAAAADKZUAgAAAGCwpSyVqmq9qo5tbm6OHQUAAABgX1rKUqm7N7r78MrKythRAAAAAPal\npSyVAAAAABiXUgkAAACAwZRKAAAAAAymVAIAAABgMKUSAAAAAIMplQAAAAAYTKkEAAAAwGBKJQAA\nAAAGUyoBAAAAMJhSCQAAAIDBlEoAAAAADHZg7ACcmYNHjm977MTRQ7uYBAAAANiPzFQCAAAAYDCl\nEgAAAACDLWWpVFXrVXVsc3Nz7CgAAAAA+9JSlkrdvdHdh1dWVsaOAgAAALAvLWWpBAAAAMC4lEoA\nAAAADKZUAgAAAGAwpRIAAAAAgymVAAAAABhMqQQAAADAYEolAAAAAAZTKgEAAAAwmFIJAAAAgMGU\nSgAAAAAMplQCAAAAYDClEgAAAACDKZUAAAAAGEypBAAAAMBgSiUAAAAABlMqAQAAADCYUgkAAACA\nwZRKAAAAAAymVAIAAABgMKUSAAAAAIMplQAAAAAYbE+VSlX1uVX1xqp6+NhZAAAAANjeQkulqnpB\nVb2nqt5yyvgFVXVVVV1dVUdmDv1YkksWmQkAAACAm27RM5VelOSC2YGqOifJc5M8LMl5SS6sqvOq\n6iFJ3prk3QvOBAAAAMBNdGCRJ+/u11fVwVOGz09ydXdfkyRV9dIkj0xyuySfm0nR9PdVdVl333jq\nOavqcJLDSXKve91rceEBAAAA2NZCS6Vt3D3JtTO3r0vywO5+SpJU1Xcmee9WhVKSdPexJMeSZG1t\nrRcbFQAAAICtjFEq1RZj/1QOdfeLdi8KAAAAAGdijFLpuiT3nLl9jyTXj5DjZuvgkeNbjp84emiX\nkwAAAAA3V4veqHsrlye5b1Xdp6puleQxSS4dcoKqWq+qY5ubmwsJCAAAAMDOFloqVdXFSf44yf2r\n6rqqemJ3fyrJU5K8KsnbklzS3VcOOW93b3T34ZWVlbMfGgAAAIDTWvTV3y7cZvyyJJct8rkBAAAA\nWJwxlr8BAAAAsOSWslSypxIAAADAuJayVLKnEgAAAMC4lrJUAgAAAGBcSiUAAAAABlMqAQAAADCY\nUgkAAACAwZayVHL1NwAAAIBxLWWp5OpvAAAAAONaylIJAAAAgHEplQAAAAAYTKkEAAAAwGBLWSrZ\nqBsAAABgXEtZKtmoGwAAAGBcS1kqAQAAADAupRIAAAAAgymVAAAAABhMqQQAAADAYEolAAAAAAZb\nylKpqtar6tjm5ubYUQAAAAD2pQNjBzgT3b2RZGNtbe2isbMsk4NHjm85fuLooV1OAgAAACy7pZyp\nBAAAAMC4lEoAAAAADKZUAgAAAGAwpRIAAAAAgymVAAAAABhMqQQAAADAYEolAAAAAAZbylKpqtar\n6tjm5ubYUQAAAAD2paUslbp7o7sPr6ysjB0FAAAAYF9aylIJAAAAgHEplQAAAAAYTKkEAAAAwGBK\nJQAAAAAGUyoBAAAAMJhSCQAAAIDBlEoAAAAADKZUAgAAAGAwpRIAAAAAgy1lqVRV61V1bHNzc+wo\nAAAAAPvSUpZK3b3R3YdXVlbGjgIAAA
CwLx0YOwDjO3jk+LbHThw9tItJAAAAgGWxlDOVAAAAABiX\nUgkAAACAwZRKAAAAAAymVAIAAABgMKUSAAAAAIOdtlSqqh+sqjvUxPOr6k1V9dDdCAcAAADA3jTP\nTKXv6u4PJXloknOTPCHJ0YWmAgAAAGBPm6dUqul/vynJC7v7zTNjAAAAAOxD85RKb6yqV2dSKr2q\nqm6f5MbFxgIAAABgLzswx32emOQBSa7p7o9V1Z0zWQIHAAAAwD512lKpu2+sqncnOa+q5imhAAAA\nALiZO21JVFW/kOTbkrw1yT9OhzvJ6xeYCwAAAIA9bJ6ZR49Kcv/u/viiw8yrqtaTrK+uro4dBQAA\nAGBfmmej7muS3HLRQYbo7o3uPryysjJ2FAAAAIB9aZ6ZSh9L8udV9ftJ/mm2Unf/wMJSAQAAALCn\nzVMqXTr9YR86eOT4luMnjh7a5SQAAADAXjLP1d9eXFW3SnK/6dBV3f3JxcYCAAAAYC+b5+pvD0ry\n4iQnklSSe1bVv+9uV38DAAAA2KfmWf72S0ke2t1XJUlV3S/JxUn+5SKDAQAAALB3zXP1t1ueLJSS\npLv/OnvsanAAAAAA7K55ZipdUVXPT/KS6e3HJXnj4iIBAAAAsNfNUyo9Ocn3JfmBTPZUen2SX11k\nKAAAAAD2tnmu/vbxJM+a/gAAAADA9qVSVV3S3d9aVX+ZpE893t1fsdBkAAAAAOxZO81U+sHpfx++\nG0EAAAAAWB7bXv2tu981/fV7u/sdsz9Jvnd34gEAAACwF21bKs34hi3GHna2gwAAAACwPHbaU+nJ\nmcxI+uKq+ouZQ7dP8oZFBwMAAABg79ppT6XfTvLKJD+f5MjM+Ie7+/0LTQUAAADAnrZtqdTdm0k2\nq+ppSf6uuz9eVQ9K8hVV9Zvd/cHdCsnec/DI8S3HTxw9tMtJAAAAgDHMs6fS7yT5x6paTfL8JPfJ\nZBYTAAAAAPvUPKXSjd39qSSPTvLs7v7fk9x1sbEAAAAA2MvmKZU+WVUXJvmOJK+Yjt1ycZEAAAAA\n2OvmKZWekOSrk/xcd7+9qu6T5LcWGwsAAACAvWynq78lSbr7rVX1Y0nuNb399iRHFx0MAAAAgL3r\ntDOVqmo9yZ8n+W/T2w+oqksXHQwAAACAvWue5W9PT3J+kg8mSXf/eSZXgDurqupLqup5VfWyqnry\n2T4/AAAAAGfPPKXSp7p785SxnufkVfWCqnpPVb3llPELquqqqrq6qo4kSXe/rbuflORbk6zNc34A\nAAAAxjFPqfSWqnpsknOq6r5V9f8k+Z9znv9FSS6YHaiqc5I8N8nDkpyX5MKqOm967BFJ/ijJ7895\nfgAAAABGME+p9P1JvjTJx5NcnORDSX5onpN39+uTvP+U4fOTXN3d13T3J5K8NMkjp/e/tLu/Jsnj\ntjtnVR2uqiuq6oobbrhhnhgAAAAAnGXzXP3tY0meOv05G+6e5NqZ29cleWBVPSjJo5PcOsllO+Q5\nluRYkqytrc21DA8AAACAs+u0pVJVvSZb7KHU3Q8+w+esLca6u1+b5LVneE4AAAAAdtFpS6UkPzrz\n+22S/Nskn7oJz3ldknvO3L5HkutvwvnYQw4eOb7tsRNHD+1iEgAAAGCR5ln+9sZTht5QVa+7Cc95\neZL7VtV9krwzyWOSPHbICapqPcn66urqTYgBAAAAwJk67UbdVXWnmZ+7VNU3JvnCeU5eVRcn+eMk\n96+q66rqid39qSRPSfKqJG9Lckl3XzkkdHdvdPfhlZWVIQ8DAAAA4CyZZ/nbGzPZU6kyWfb29iRP\nnOfk3X3hNuOXZYfNuAEAAADY2+ZZ/naf3QgCAAAAwPKY5+pvj97peHf/7tmLMx97KgEAAACMa57l\nb09M8jVJ/mB6++uTvDbJZibL4na9VOrujSQba2trF+32cwMAAAAwX6nUSc7r7nclSVXdNclzu/sJ\nC0
0GAAAAwJ512qu/JTl4slCaeneS+y0oDwAAAABLYJ6ZSq+tqlcluTiTWUuPSfKahaYCAAAAYE+b\n5+pvT6mqb07ytdOhY939Xxcba2c26gYAAAAYV3X32BnO2NraWl9xxRVjxzgrDh45PnaE0Zw4emjs\nCAAAAMBUVb2xu9dOd7959lQCAAAAgM+gVAIAAABgsG1Lpar6/el/f2H34gAAAACwDHbaqPuuVfV1\nSR5RVS9NUrMHu/tNC00GAAAAwJ61U6n0U0mOJLlHkmedcqyTPHhRoU7H1d8AAAAAxrVtqdTdL0vy\nsqr6ye5+5i5mOq3u3kiysba2dtHYWQAAAAD2o51mKiVJuvuZVfWIJF87HXptd79isbHYTw4eOb7t\nsRNHD+1iEgAAAGBep736W1X9fJIfTPLW6c8PTscAAAAA2KdOO1MpyaEkD+juG5Okql6c5M+S/Pgi\ngwEAAACwd512ptLUHWd+X1lEEAAAAACWxzwzlX4+yZ9V1WuSVCZ7K5mlBAAAALCPzbNR98VV9dok\nX5VJqfRj3f13iw62k6paT7K+uro6ZgwAAACAfWuu5W/d/a7uvrS7f2/sQmmaZ6O7D6+sWIkHAAAA\nMIZ5lr/BaA4eOb7l+Imjh3Y5CQAAADBr3o26AQAAAOCf7FgqVdUtquotuxUGAAAAgOWwY6nU3Tcm\neXNV3WuX8gAAAACwBObZU+muSa6sqj9N8tGTg939iIWlAgAAAGBPm6dU+pmFpwAAAABgqZy2VOru\n11XVvZPct7v/R1XdNsk5i48GAAAAwF512qu/VdVFSV6W5NemQ3dP8vJFhjqdqlqvqmObm5tjxgAA\nAADYt+ZZ/vZ9Sc5P8r+SpLv/pqo+f6GpTqO7N5JsrK2tXTRmDsZz8MjxLcdPHD20y0kAAABgfzrt\nTKUkH+/uT5y8UVUHkvTiIgEAAACw181TKr2uqn4iyedU1Tck+S9JNhYbCwAAAIC9bJ5S6UiSG5L8\nZZLvSXJZkqctMhQAAAAAe9s8V3+7sapenMmeSp3kqu62/A0AAABgHzttqVRVh5I8L8nfJqkk96mq\n7+nuVy46HAAAAAB70zxXf/ulJF/f3VcnSVV9cZLjSZRKAAAAAPvUPKXSe04WSlPXJHnPgvLATXLw\nyPFtj504emgXkwAAAMDN27alUlU9evrrlVV1WZJLMtlT6d8luXwXsgEAAACwR+00U2l95vd3J/m6\n6e83JPm8hSUCAAAAYM/btlTq7ifsZpAhqmo9yfrq6urYUQAAAAD2pXmu/nafJN+f5ODs/bv7EYuL\ntbPu3kgx5pSbAAAas0lEQVSysba2dtFYGQAAAAD2s3k26n55kucn2Uhy42LjAAAAALAM5imV/qG7\nn7PwJAAAAAAsjXlKpV+uqp9O8uokHz852N1vWlgqWICDR45vOX7i6KFdTgIAAADLb55S6cuTPD7J\ng/Pp5W89vQ0AAADAPjRPqfTNSb6ouz+x6DAAAAAALIdbzHGfNye546KDAAAAALA85pmp9AVJ/qqq\nLs9n7qn0iIWlAgAAAGBPm6dU+umFpwAAAABgqZy2VOru1+1GEAAAAACWx2lLpar6cCZXe0uSWyW5\nZZKPdvcdFhkMAAAAgL1rnplKt5+9XVWPSnL+whIBAAAAsOfNc/W3z9DdL0/y4AVkAQAAAGBJzLP8\n7dEzN2+RZC2fXg4HAAAAwD40z9Xf1md+/1SSE0keuZA0c6qq9STrq6urY8bgZuLgkeNbjp84emiX\nkwAAAMDymGdPpSfsRpAhunsjycba2tpFY2cBAAAA2I+2LZWq6qd2eFx39zMXkAf2jO1mMCVmMQEA\nAMBOM5U+usXY5yZ5YpI7J1EqsW9ZMgcAAMB+t22p1N2/dPL3qrp9kh9M8oQkL03yS9s9DgAAAICb\nvx33VKqqOyX54SSPS/LiJP+iuz+wG8EAAAAA2Lt22lPp/07y6CTH
knx5d39k11IBAAAAsKfdYodj\nP5LkbkmeluT6qvrQ9OfDVfWh3YkHAAAAwF60055KOxVOAAAAAOxjiiMAAAAABttxo25gmINHjg9+\nzImjhxaQBAAAABbLTCUAAAAABlMqAQAAADCYUgkAAACAwZRKAAAAAAymVAIAAABgMKUSAAAAAIMp\nlQAAAAAYTKkEAAAAwGBKJQAAAAAGUyoBAAAAMJhSCQAAAIDBDowdAPa7g0eObzl+4uihXU4CAAAA\n89tTM5Wq6lFV9etV9XtV9dCx8wAAAACwtYWXSlX1gqp6T1W95ZTxC6rqqqq6uqqOJEl3v7y7L0ry\nnUm+bdHZAAAAADgzu7H87UVJfiXJb54cqKpzkjw3yTckuS7J5VV1aXe/dXqXp02Pw7613bK4nVgy\nBwAAwG5Z+Eyl7n59kvefMnx+kqu7+5ru/kSSlyZ5ZE38QpJXdvebFp0NAAAAgDMz1p5Kd09y7czt\n66Zj35/kIUm+paqetNUDq+pwVV1RVVfccMMNi08KAAAAwGcZ6+pvtcVYd/dzkjxnpwd297Ekx5Jk\nbW2tF5ANAAAAgNMYa6bSdUnuOXP7HkmuHykLAAAAAAONVSpdnuS+VXWfqrpVksckuXSkLAAAAAAM\ntPDlb1V1cZIHJblLVV2X5Ke7+/lV9ZQkr0pyTpIXdPeVA865nmR9dXV1EZFhaW13xThXhQMAAOBs\nW3ip1N0XbjN+WZLLzvCcG0k21tbWLrop2QAAAAA4M2MtfwMAAABgiSmVAAAAABhsKUulqlqvqmOb\nm5tjRwEAAADYl5ayVOruje4+vLKyMnYUAAAAgH1p4Rt1A+NzVTgAAADOtqWcqQQAAADAuJRKAAAA\nAAy2lKWSjboBAAAAxrWUpZKNugEAAADGZaNuYEs29wYAAGAnSzlTCQAAAIBxKZUAAAAAGEypBAAA\nAMBgS1kqufobAAAAwLiWcqPu7t5IsrG2tnbR2FlgmW23GTcAAACczlLOVAIAAABgXEolAAAAAAZT\nKgEAAAAwmFIJAAAAgMGUSgAAAAAMtpSlUlWtV9Wxzc3NsaMAAAAA7EtLWSp190Z3H15ZWRk7CgAA\nAMC+dGDsAMByOXjk+JbjJ44e2uUkAAAAjGkpZyoBAAAAMC6lEgAAAACDKZUAAAAAGEypBAAAAMBg\nSiUAAAAABlvKUqmq1qvq2Obm5thRAAAAAPalA2MHOBPdvZFkY21t7aKxswC76+CR41uOnzh6aJeT\nAAAA7G9LOVMJAAAAgHEplQAAAAAYbCmXvwE3b9stcQMAAGDvMFMJAAAAgMHMVAJGY0YSAADA8jJT\nCQAAAIDBlEoAAAAADGb5G3BW7LSU7cTRQ7uYBAAAgN1gphIAAAAAgy3lTKWqWk+yvrq6OnYUYA42\n5AYAALj5WcqZSt290d2HV1ZWxo4CAAAAsC8tZakEAAAAwLiUSgAAAAAMplQCAAAAYDClEgAAAACD\nKZUAAAAAGEypBAAAAMBgSiUAAAAABlMqAQAAADCYUgkAAACAwZRKAAAAAAymVAIAAABgMKUSAAAA\nAIMplQAAAAAYTKkEAAAAwGAHxg5wJqpqPcn66urq2FGAPeLgkeNbjp84emiXk9x0N6fXAgAA3Hwt\n5Uyl7t7o7sMrKytjRwEAAADYl5ayVAIAAABgXEolAAAAAAZbyj2VAOa13f5EiT2KAAAAbgozlQAA\nAAAYTKkEAAAAwGBKJQAAAAAGUyoBAAAAMJhSCQAAAIDBlEoAAAAADHZg7AAAzOfgkeNbjp84emiX\nkwAAAJipBAAAAMAZUCoBAAAAMJhSCQAAAIDBlEoAAAAADKZUAgAAAGAwpRIAAAAAgymVAAAAABjs\nwNgBAPaag0eOD37MiaOHFpAEAABg7zJTCQAAAIDBlEoAAAAADKZUAgAAAGCwPbOnUlV9UZKnJlnp\n7m8ZOw9w83cmeycBAAAwsdCZ
SlX1gqp6T1W95ZTxC6rqqqq6uqqOJEl3X9PdT1xkHgAAAADOjkUv\nf3tRkgtmB6rqnCTPTfKwJOclubCqzltwDgAAAADOooUuf+vu11fVwVOGz09ydXdfkyRV9dIkj0zy\n1nnOWVWHkxxOknvd615nLSvAstppGd+Jo4d2MQkAALCfjLFR992TXDtz+7okd6+qO1fV85L886r6\n8e0e3N3Hunutu9fOPffcRWcFAAAAYAtjbNRdW4x1d78vyZN2OwwAAAAAw40xU+m6JPecuX2PJNeP\nkAMAAACAMzTGTKXLk9y3qu6T5J1JHpPksUNOUFXrSdZXV1cXEA+As8FeTwAAcPO20JlKVXVxkj9O\ncv+quq6qntjdn0rylCSvSvK2JJd095VDztvdG919eGVl5eyHBgAAAOC0Fn31twu3Gb8syWWLfG4A\nAAAAFmeMPZUAAAAAWHJKJQAAAAAGG2Oj7pvMRt3AsttpE+u9+jw21wYAAGYt5UwlG3UDAAAAjGsp\nSyUAAAAAxqVUAgAAAGAwpRIAAAAAg9moG+As2G5DbJtbD+N9BACA5bGUM5Vs1A0AAAAwrqUslQAA\nAAAYl1IJAAAAgMGUSgAAAAAMplQCAAAAYLClLJWqar2qjm1ubo4dBQAAAGBfWspSydXfAAAAAMa1\nlKUSAAAAAONSKgEAAAAwmFIJAAAAgMGUSgAAAAAMplQCAAAAYLADYwc4E1W1nmR9dXV17CgA+8bB\nI8e3HD9x9NBZOxcAALA8lnKmUndvdPfhlZWVsaMAAAAA7EtLWSoBAAAAMC6lEgAAAACDKZUAAAAA\nGEypBAAAAMBgSiUAAAAABlMqAQAAADDYgbEDnImqWk+yvrq6OnYUAPaog0eOD37MiaOHFpAEAABu\nnpZyplJ3b3T34ZWVlbGjAAAAAOxLS1kqAQAAADAupRIAAAAAgymVAAAAABhMqQQAAADAYEolAAAA\nAAZTKgEAAAAwmFIJAAAAgMGUSgAAAAAMplQCAAAAYLADYwc4E1W1nmR9dXV17CgA+97BI8dHfY4T\nRw8t/PkBAIDPtpQzlbp7o7sPr6ysjB0FAAAAYF9aylIJAAAAgHEplQAAAAAYTKkEAAAAwGBKJQAA\nAAAGUyoBAAAAMJhSCQAAAIDBlEoAAAAADKZUAgAAAGAwpRIAAAAAgymVAAAAABhMqQQAAADAYEol\nAAAAAAZTKgEAAAAwmFIJAAAAgMGWslSqqvWq/7+9ew+2qyzvOP79mQgWtEEN3ggYRLygI0gRr0VF\nW0EoMFZGGEZRmaJWEZ2iQp3B1ukfUbRax0tNEaMdCqYUNF7BYoSqNaCBBCKiFBCiKFA0Xqhg5Okf\n641uD3vnZJGcs3PO+X5mMmevd6/Ls1ae/Z59nv2ud2fphg0bxh2KJEmSJEnSnDQji0pV9dmqOnHB\nggXjDkWSJEmSJGlOmpFFJUmSJEmSJI2XRSVJkiRJkiT1ZlFJkiRJkiRJvVlUkiRJkiRJUm8WlSRJ\nkiRJktSbRSVJkiRJkiT1ZlFJkiRJkiRJvVlUkiRJkiRJUm8WlSRJkiRJktSbRSVJkiRJkiT1ZlFJ\nkiRJkiRJvVlUkiRJkiRJUm8WlSRJkiRJktSbRSVJkiRJkiT1ZlFJkiRJkiRJvVlUkiRJkiRJUm8W\nlSRJkiRJktSbRSVJkiRJkiT1ZlFJkiRJkiRJvVlUkiRJkiRJUm8WlSRJkiRJktTb/HEHsEmSnYEP\nA3cDX62qs8cckiRJkiRJkkaY0pFKSc5KcmuSqye0H5Lk2iTXJTm1Nb8EOK+q/go4YirjkiRJkiRJ\n0taZ6tvflgGHDDYkmQd8CDgU2Ac4Nsk+wCLg5rbab6c4LkmSJEmSJG2FKb39raouTbJ4QvOBwHVV\ndT1AknOBI4H1dIWlK9lMsSvJicCJAHvssce2D1qSNKMsPvXz2+W+blxy2DY7/qh9bS7evse/L/
vq\nG+9MtC3P0eslbZm5nEfbsl+XNHXmcj810Tgm6t6N349Igq6YtBtwPvCXST4CfHbUxlW1tKoOqKoD\ndt1116mNVJIkSZIkSUONY6LuDGmrqvoV8KrpDkaSJEmSJEn9jWOk0npg94HlRcCPxhCHJEmSJEmS\n7qNxFJUuB/ZOsmeSHYBjgBV9dpDkL5Is3bBhw5QEKEmSJEmSpM2b0qJSknOA/wYen2R9khOqaiPw\nBuBC4BpgeVWt67PfqvpsVZ24YMGCbR+0JEmSJEmSJjXV3/527Ij2LwBfmMpjS5IkSZIkaeqM4/Y3\nSZIkSZIkzXAWlSRJkiRJktTbjCwqOVG3JEmSJEnSeM3IopITdUuSJEmSJI3XjCwqSZIkSZIkabws\nKkmSJEmSJKk3i0qSJEmSJEnqbUYWlZyoW5IkSZIkabxmZFHJibolSZIkSZLGa0YWlSRJkiRJkjRe\nFpUkSZIkSZLUm0UlSZIkSZIk9WZRSZIkSZIkSb3NyKKS3/4mSZIkSZI0XjOyqOS3v0mSJEmSJI3X\njCwqSZIkSZIkabwsKkmSJEmSJKk3i0qSJEmSJEnqLVU17hjusyS3AT8Ydxw9LARuH3cQGjvzQGAe\nqGMeaBNzQWAeqGMeCMwDdcaZB4+uql0nW2lGF5VmmiTfqqoDxh2Hxss8EJgH6pgH2sRcEJgH6pgH\nAvNAnZmQB97+JkmSJEmSpN4sKkmSJEmSJKk3i0rTa+m4A9B2wTwQmAfqmAfaxFwQmAfqmAcC80Cd\n7T4PnFNJkiRJkiRJvTlSSZIkSZIkSb1ZVJIkSZIkSVJvFpWmSZJDklyb5Lokp447Hk2PJLsnWZnk\nmiTrkpzc2h+S5MtJvt9+PnjcsWrqJZmX5Iokn2vLeyZZ1fLgU0l2GHeMmlpJdklyXpLvtn7hmfYH\nc0+SN7ffCVcnOSfJA+wPZr8kZyW5NcnVA21DX//pfKC9b1ybZP/xRa5taUQenNF+L6xNckGSXQae\nO63lwbVJXjSeqLWtDcuDgedOSVJJFrZl+4NZalQeJDmpvebXJXn3QPt22R9YVJoGSeYBHwIOBfYB\njk2yz3ij0jTZCPxNVT0ReAbw+vZ/fypwcVXtDVzcljX7nQxcM7D8LuB9LQ9+Cpwwlqg0nf4J+FJV\nPQHYly4f7A/mkCS7AW8EDqiqJwPzgGOwP5gLlgGHTGgb9fo/FNi7/TsR+Mg0xaipt4x758GXgSdX\n1VOA7wGnAbT3jMcAT2rbfLj9XaGZbxn3zgOS7A78GXDTQLP9wey1jAl5kOT5wJHAU6rqScB7Wvt2\n2x9YVJoeBwLXVdX1VXU3cC5domiWq6pbqmp1e/wLuj8gd6P7//9EW+0TwFHjiVDTJcki4DDgzLYc\n4GDgvLaKeTDLJflj4CDgYwBVdXdV/Qz7g7loPvBHSeYDOwG3YH8w61XVpcAdE5pHvf6PBD5ZnW8C\nuyR55PREqqk0LA+q6qKq2tgWvwksao+PBM6tqruq6gbgOrq/KzTDjegPAN4HvBUY/DYt+4NZakQe\nvA5YUlV3tXVube3bbX9gUWl67AbcPLC8vrVpDkmyGHgqsAp4eFXdAl3hCXjY+CLTNHk/3ZuEe9ry\nQ4GfDbyJtF+Y/R4D3AZ8vN0GeWaSnbE/mFOq6od0nzreRFdM2gB8G/uDuWrU69/3jnPXq4Evtsfm\nwRyS5Ajgh1W1ZsJT5sHc8jjgT9st8ZckeVpr327zwKLS9MiQthrSplkqyQOB/wDeVFU/H3c8ml5J\nDgdurapvDzYPWdV+YXabD+wPfKSqngr8Cm91m3PanDlHAnsCjwJ2pru1YSL7g7nN3xFzUJK3002d\ncPampiGrmQezUJKdgLcDpw97ekibeTB7zQceTDd1yluA5e0Oh+02DywqTY/1wO4Dy4uAH40pFk2z\nJPenKyidXVXnt+afbBq22n7eOmp7zQrPBo5IciPd7a8H04
1c2qXd/gL2C3PBemB9Va1qy+fRFZns\nD+aWFwI3VNVtVfUb4HzgWdgfzFWjXv++d5xjkhwPHA4cV1Wb/lA0D+aOveg+bFjT3i8uAlYneQTm\nwVyzHji/3e54Gd1dDgvZjvPAotL0uBzYu32zyw50E2ytGHNMmgatqvwx4Jqq+seBp1YAx7fHxwOf\nme7YNH2q6rSqWlRVi+le/1+pquOAlcBL22rmwSxXVT8Gbk7y+Nb0AuA72B/MNTcBz0iyU/sdsSkP\n7A/mplGv/xXAK9q3Pj0D2LDpNjnNPkkOAd4GHFFVdw48tQI4JsmOSfakm6j5snHEqKlVVVdV1cOq\nanF7v7ge2L+9d7A/mFs+TfcBNEkeB+wA3M523B/Mn3wVba2q2pjkDcCFdN/yclZVrRtzWJoezwZe\nDlyV5MrW9rfAErqhjCfQ/YFx9Jji03i9DTg3yT8AV9AmcNasdhJwdvuA4XrgVXQf8NgfzBFVtSrJ\necBquttcrgCWAp/H/mBWS3IO8DxgYZL1wDsY/X7gC8CL6SZivZOur9AsMCIPTgN2BL7c1Zr5ZlW9\ntqrWJVlOV3jeCLy+qn47nsi1LQ3Lg6oa1e/bH8xSI/qDs4CzklwN3A0c30Yvbrf9QX4/ulKSJEmS\nJEnaMt7+JkmSJEmSpN4sKkmSJEmSJKk3i0qSJEmSJEnqzaKSJEmSJEmSerOoJEmSJEmSpN4sKkmS\npHtJUkneO7B8SpK/20b7XpbkpdtiX5Mc5+gk1yRZOdXHmqmSnJFkXZIzpvm4+yV58cDyEUlOnc4Y\nJEnS1rOoJEmShrkLeEmSheMOZFCSeT1WPwH466p6/lTFM5me8Y7Da4D9q+ot03zc/YDfFZWqakVV\nLZnmGCRJ0layqCRJkobZCCwF3jzxiYkjjZL8sv18XpJLkixP8r0kS5Icl+SyJFcl2WtgNy9M8l9t\nvcPb9vPayJnLk6xN8pqB/a5M8m/AVUPiObbt/+ok72ptpwPPAf554iicdM5o61+V5GUDz721ta1J\nsqS1PTbJf7a21Un2ajF9bmC7DyZ5ZXt8Y5LTk3wNOLqt/6Uk327n/ISB6/iBJN9Icv2EazosjlH7\nObqdy5oklw65PkPPN8kKYGdg1eA1aM89NMlFSa5I8tEkP0iyMMniJFcPrPe7EWxbGl+SHYB3Ai9L\ncmWSlyV5ZZIPtvUfneTilgMXJ9ljsuslSZLGY/64A5AkSdutDwFrk7y7xzb7Ak8E7gCuB86sqgOT\nnAycBLyprbcYeC6wF7AyyWOBVwAbquppSXYEvp7korb+gcCTq+qGwYMleRTwLuBPgJ8CFyU5qqre\nmeRg4JSq+taEGF9CN1JmX2AhcHkrxuwHHAU8varuTPKQtv7ZwJKquiDJA+g+lNt9kuvw66p6Tovx\nYuC1VfX9JE8HPgwc3NZ7JF3x6wnACuC8JIeOiGPpiP2cDryoqn6YZJchsQw936o6Iskvq2q/Idu8\nA/hau46HASdOcr5bHF9V3d2KfgdU1RvaNXrlwH4+CHyyqj6R5NXAB9r1GHq9tiAuSZI0RSwqSZKk\noarq50k+CbwR+L8t3OzyqroFIMn/AJuKQlcBg7ehLa+qe4DvJ7merkjw58BTBkagLAD2Bu4GLptY\nUGqeBny1qm5rxzwbOAj49GZifA5wTlX9FvhJkkvafp4LfLyq7mznf0eSBwG7VdUFre3X7TiTXYdP\ntfUeCDwL+PeBbXYcWO/T7Tp8J8nDW9sLh8Sxuf18HViWZDlwfo/zXbGZ+A+iK0ZRVZ9P8tPNnexW\nxjfRMzcdG/hXYLCoOex6SZKkMbGoJEmSNuf9wGrg4wNtG2m30KerIOww8NxdA4/vGVi+hz9831ET\njlNAgJOq6sLBJ5I8D/jViPgmre702CZD4hq17u+uQfOACc9vivd+wM9GjAaCP7xeGfg5MY6R+6mq\n17aRQYcBVybZr6r+dw
vOYTITY4DR590rvq2IY9j1kiRJY+KcSpIkaaSqugNYTjfp9SY30t1uBnAk\ncP/7sOujk9wv3TxLjwGuBS4EXpfk/gBJHpdk50n2swp4bpvvZx5wLHDJJNtcSjefz7wku9KNyrmM\nblTVq5Ps1I7/kKr6ObA+yVGtbcf2/A+AfdryAuAFww7Utr8hydFt+yTZd5L4RsUxdD9J9qqqVVV1\nOnA79741b9T5TnaNjmv7PxR4cGv/CfCwNufSjsDhk53niPh+ATxoxLG/ARzTHh8HfG2SWCVJ0phY\nVJIkSZN5L91cPJv8C10h5zLg6YweRbQ519IVf75INw/Pr4Ezge8Aq9tk0B9lklHV7Va704CVwBpg\ndVV9ZpJjXwCsbet/BXhrVf24qr5Ed0vYt5JcCZzS1n858MYka+kKHo+oqpvpim1r6eZcumIzxzsO\nOCHJGmAdXSFuc+c0Ko5R+zkjbaJyumLQmi05383FAPw9cFCS1XS3Jd7UYvsN3STbq4DPAd/dgvMc\nFt9KuqLclZkwSTjd7Zavatf75cDJk8QqSZLGJFXDRjZLkiRJnSQ30k2sffu4Y5EkSdsPRypJkiRJ\nkiSpN0cqSZIkSZIkqTdHKkmSJEmSJKk3i0qSJEmSJEnqzaKSJEmSJEmSerOoJEmSJEmSpN4sKkmS\nJEmSJKm3/wewDUouuwNUgwAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x22b0188b3c8>"
]
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "h_WdYxlYs5Xj"
},
"source": [
"<h3> 3.2.5 Checking for NULL values </h3>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "r0x1gR2fs5Xk",
"outputId": "721aef48-e628-40c6-d567-25466f4283e1"
},
"source": [
"# Check whether any rows contain null values\n",
"nan_rows = df[df.isnull().any(axis=1)]\n",
"print(nan_rows)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
" id qid1 qid2 question1 question2 \\\n",
"105780 105780 174363 174364 How can I develop android app? NaN \n",
"201841 201841 303951 174364 How can I create an Android app? NaN \n",
"\n",
" is_duplicate \n",
"105780 0 \n",
"201841 0 \n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CCYmufv6s5Xo"
},
"source": [
"- There are two rows with null values, both in the `question2` column."
]
},
{
"cell_type": "code",
"metadata": {
"id": "yLBRyACgs5Xp",
"outputId": "076046a9-1510-41ef-cf98-15b38661dca4"
},
"source": [
"# Fill the null values with an empty string ''\n",
"df = df.fillna('')\n",
"# Re-check: no rows should contain null values now\n",
"nan_rows = df[df.isnull().any(axis=1)]\n",
"print(nan_rows)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Empty DataFrame\n",
"Columns: [id, qid1, qid2, question1, question2, is_duplicate]\n",
"Index: []\n"
],
"name": "stdout"
}
]
}
]
}