rajvijen/8e5a285ec439922d8df331783f5cecd4
This is the notebook for SMS Spam Classification Using SVMs.
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# SMS Spam Classification Using SVMs:-\n\nThe dataset is available here: [SMS Spam Collection Dataset](https://gist.githubusercontent.com/rajvijen/51255cf4875372b904bdb812a3b85b28/raw/816dcd4cdc7553faea396186067e814487046c74/sms_spam_classification_data.csv). For details about the dataset, see this Kaggle dataset [link](https://www.kaggle.com/uciml/sms-spam-collection-dataset)."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Required Libraries:-\nFirst, import all the required libraries."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom collections import Counter\nfrom sklearn import feature_extraction, model_selection, metrics, svm\n\nfrom IPython.display import Image\nimport warnings\nwarnings.filterwarnings(\"ignore\")\n%matplotlib inline",
"execution_count": 1,
"outputs": []
},
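{
"metadata": {},
"cell_type": "markdown",
"source": "The `sklearn` modules imported above map onto the usual text-classification workflow: `feature_extraction` turns messages into word-count features, `model_selection` splits the data, `svm` fits the classifier and `metrics` scores it. The next cell is a minimal sketch of how these pieces can fit together. It assumes a bag-of-words `CountVectorizer` and a linear-kernel `svm.SVC` with an untuned `C`; the `toy_*` corpus is made up for illustration (it is not the SMS dataset), and the model built later in this notebook may differ."
},
{
"metadata": {},
"cell_type": "code",
"source": "from sklearn import feature_extraction, model_selection, metrics, svm\n\n# Hypothetical toy corpus (NOT the SMS dataset): 1 = spam, 0 = ham\ntoy_messages = [\n    'WINNER! Claim your free prize now',\n    'Free entry to win cash, text WIN now',\n    'URGENT! You have won a free holiday',\n    'Congratulations, claim your cash prize today',\n    'Are we still meeting for lunch today?',\n    'Ok, I will call you later tonight',\n    'Can you pick up some milk on the way home?',\n    'See you at the gym tomorrow morning',\n]\ntoy_labels = [1, 1, 1, 1, 0, 0, 0, 0]\n\n# Bag-of-words features: one column per vocabulary word, English stop words removed\ntoy_vectorizer = feature_extraction.text.CountVectorizer(stop_words='english')\nX_toy = toy_vectorizer.fit_transform(toy_messages)\n\n# Stratified split so that both classes appear in the train and test sets\nX_tr, X_te, y_tr, y_te = model_selection.train_test_split(\n    X_toy, toy_labels, test_size=0.25, random_state=42, stratify=toy_labels)\n\n# Linear-kernel SVM; C=1.0 is an assumed, untuned value\ntoy_svc = svm.SVC(kernel='linear', C=1.0)\ntoy_svc.fit(X_tr, y_tr)\nprint('toy accuracy:', metrics.accuracy_score(y_te, toy_svc.predict(X_te)))",
"execution_count": null,
"outputs": []
},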
{
"metadata": {},
"cell_type": "markdown",
"source": "## EDA:-\nLook at the first ten rows of the dataset in tabular form."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "data = pd.read_csv('https://gist.githubusercontent.com/rajvijen/51255cf4875372b904bdb812a3b85b28/raw/816dcd4cdc7553faea396186067e814487046c74/sms_spam_classification_data.csv', encoding='latin-1')\ndata.head(10)",
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 2,
"data": {
"text/plain": " v1 v2 Unnamed: 2 \\\n0 ham Go until jurong point, crazy.. Available only ... NaN \n1 ham Ok lar... Joking wif u oni... NaN \n2 spam Free entry in 2 a wkly comp to win FA Cup fina... NaN \n3 ham U dun say so early hor... U c already then say... NaN \n4 ham Nah I don't think he goes to usf, he lives aro... NaN \n5 spam FreeMsg Hey there darling it's been 3 week's n... NaN \n6 ham Even my brother is not like to speak with me. ... NaN \n7 ham As per your request 'Melle Melle (Oru Minnamin... NaN \n8 spam WINNER!! As a valued network customer you have... NaN \n9 spam Had your mobile 11 months or more? U R entitle... NaN \n\n Unnamed: 3 Unnamed: 4 \n0 NaN NaN \n1 NaN NaN \n2 NaN NaN \n3 NaN NaN \n4 NaN NaN \n5 NaN NaN \n6 NaN NaN \n7 NaN NaN \n8 NaN NaN \n9 NaN NaN ",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>v1</th>\n <th>v2</th>\n <th>Unnamed: 2</th>\n <th>Unnamed: 3</th>\n <th>Unnamed: 4</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>ham</td>\n <td>Go until jurong point, crazy.. Available only ...</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>1</th>\n <td>ham</td>\n <td>Ok lar... Joking wif u oni...</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>2</th>\n <td>spam</td>\n <td>Free entry in 2 a wkly comp to win FA Cup fina...</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>3</th>\n <td>ham</td>\n <td>U dun say so early hor... U c already then say...</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>4</th>\n <td>ham</td>\n <td>Nah I don't think he goes to usf, he lives aro...</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>5</th>\n <td>spam</td>\n <td>FreeMsg Hey there darling it's been 3 week's n...</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>6</th>\n <td>ham</td>\n <td>Even my brother is not like to speak with me. ...</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>7</th>\n <td>ham</td>\n <td>As per your request 'Melle Melle (Oru Minnamin...</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>8</th>\n <td>spam</td>\n <td>WINNER!! As a valued network customer you have...</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n <tr>\n <th>9</th>\n <td>spam</td>\n <td>Had your mobile 11 months or more? U R entitle...</td>\n <td>NaN</td>\n <td>NaN</td>\n <td>NaN</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
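{
"metadata": {},
"cell_type": "markdown",
"source": "In the first ten rows, the three `Unnamed` columns contain only `NaN`, while the label is in `v1` and the message text in `v2`. Before dropping the extra columns it is worth checking them over the whole frame, since any text found there may belong to a message. The next cell is a minimal sketch of that check on a made-up frame with the same layout; the stray value and the new column names `label` and `message` are assumptions for illustration, and the cell does not modify `data`."
},
{
"metadata": {},
"cell_type": "code",
"source": "import numpy as np\nimport pandas as pd\n\n# Made-up frame with the same column layout as the CSV (NOT the real data)\ntoy_raw = pd.DataFrame({\n    'v1': ['ham', 'spam', 'ham'],\n    'v2': ['Ok lar...', 'WINNER!! Claim now', 'See you soon'],\n    'Unnamed: 2': [np.nan, np.nan, 'stray text'],\n    'Unnamed: 3': [np.nan] * 3,\n    'Unnamed: 4': [np.nan] * 3,\n})\n\n# Non-missing values per column: anything > 0 in an Unnamed column needs a look\nprint(toy_raw.notna().sum())\n\n# Keep only the label and message columns under descriptive names\ntoy_clean = toy_raw[['v1', 'v2']].rename(columns={'v1': 'label', 'v2': 'message'})\nprint(toy_clean)",
"execution_count": null,
"outputs": []
},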
{
"metadata": {},
"cell_type": "markdown",
"source": "Next, let's get some insights from the data.\n### Data Visualization:-"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "count_class = data[\"v1\"].value_counts(sort = True)\ncount_class.plot(kind = 'bar', color = ['blue', 'orange'])\nplt.title('Spam vs Non-spam distribution of data')\nplt.show()",
"execution_count": 3,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEaCAYAAAAYOoCaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAGH5JREFUeJzt3XuUZWV95vHvA83FGzdpEGiwMXYygpeoFWCNzoR44aYG4miCV1AcMmtMNI6Ot9ExKqPomqVGo2ZIdIEoIPES0GCUJaJLDWq1EBXB0CpKA0JDcxG8Af7mj/2Wni6quqqa7jpNvd/PWmedvd/9nr3fvc+u/ezb2ZWqQpLUn23G3QBJ0ngYAJLUKQNAkjplAEhSpwwASeqUASBJnTIApC0kyYVJXtS6n5Pkc5tx3JcmObR1/3WSD2/Gcb82yT9srvEtYLp/kuSqJLclefQ86v9m+WrTGACLJMnjk3w1yS1J1if5SpI/GHe7NockhyapJO+dVv7lJMePqVlblar6SFUdNle9JKcmOWke4zuwqi68p+1q393aaeN+S1WNY8P6f4G/qKr7V9XFm3PESa5M8qTNOc6lwABYBEl2Aj4NvAfYDdgHeCPwy3G2azO7HXh+kpVjbseSlmTZuNuwBT0YuHTcjeiJAbA4fhegqs6sqruq6udV9bmq+hZAkuPbEcF72hHC5UmeOPXhJC9IclmSnyb5QZI/Hxl2aJK1SV6Z5Pok1yY5JslRSf69HW28dqZGJTkkyU+SbDtS9idJptp1UJLJJLcmuS7JOzYyjzcDpwJvmGVa2yR5XZIftXZ+KMnObdjKdgRxXJIfJ7khyf+abUJJdkzy4SQ3Jrk5yTeS7NmGXZjkrUm+3pblOUl2G/nsP7Z5viXJl5IcODLs1CTvS/KZdhriK0kelORdSW5q38uspyaSPLnVuSXJ3wIZGXZ8ki+37iR5Z1sOtyT5VpKHJzkReA7wyjb9T7X6VyZ5Vftebk+ybIY92h2TfLStI99M8qiRaVeSh06bz5OS3A/4DLB3m95tSfbOtFNKSf44wymnm9vyfdjIsCuTvKLNwy2tDTsuZB1IskOS24BtgX9L8v1NWL6/k+SCtk7ckOQjSXZpw04H9gM+1ebxlXOtC92oKl9b+AXsBNwInAYcCew6bfjxwJ3Ay4DtgD8DbgF2a8OfAvwOwwr/h8DPgMe0YYe2z/7v9tn/CqwDzgAeABwI/AJ4yCxt+z7w5JH+fwRe3br/FXhe674/cMgs4zgUWAs8CLgV+L1W/mXg+Nb9QmAN8JA2rk8Ap7dhK4EC/h64D/AohqOjh80yvT8HPgXcl2Gj8VhgpzbsQuBq4OHA/YCPAx8e+ewL23LZAXgXcMnIsFOBG9r4dgQuAH4IPL9N5yTgC7O0afc2789o38PL2vfyopHv+Mut+3BgNbBL+04fBuw10oaTpo37SuASYF/gPiNlT2rdfw3cMTLtV7R2b9eGF/DQafN50uh3N216fz21zBh2Xm4HntzG/cr2PW4/0o6vA3szHN1eBvy3WZbRrOvATO1c4PJ9aGvjDsBy4EvAu6YtwyfN0J4Z14VeXmNvQC+v9kd+KsOG8k7gXGDPNux44BogI/W/Ttv4zjCufwJe2roPBX4ObNv6H9D+kA4eqb8aOGaWcZ0EfHDks7cDD279X2I4VbX7HPP2m40I8Hbgo617NAA+D/z3kc/8HsNGaxm/DYAV0+b/2Fmm90Lgq8AjZxh2IXDySP8BwK+mls+0uru06e7c+k8F/n5k+F8Cl430PwK4eZY2PR+4aKQ/7bueKQCeAPw7cAiwzbTxnMrMAfDCGcpGA2B02tsA1wL/qfXfkwB4PXD2tHFfDRw60o7njgx/O/B3syyjWdeBmdq5kOU7Q/1jgItnWl6z1N9gXejl5SmgRVJVl1XV8VW1gmHvdG+GvY4pV1dbE5sftTokOTLJRRlO59wMHMWwRzTlxq
q6q3X/vL1fNzL85wx7XDM5A3h6kh2ApwPfrKoftWEnMOwBXt5Oszx1HrP6NuDw0VMQzd5tnkbnbxmw50jZT0a6fzbV5pHTE7cl2Q84HfgscFaSa5K8Pcl2I5+9atp0tgN2T7JtkpOTfD/JrQwbBdhwWU5fbvNdjnuPTrd9l1fNVLGqLgD+FngvcF2SUzJcJ9qYGcc10/Cq+jXDxnHvOT4zHxt8b23cVzFcx5oy4/c217iYeR3YWDtmXb5J9khyVpKr23f7YTb8Xjcwz3VhyTMAxqCqLmfYC3v4SPE+STLSvx9wTdswf5zhDok9q2oX4DxGzn/ew7Z8l+EP8Ujg2QyBMDXsiqp6FrAHw4b9Y+288cbGdyNDsL152qBrGC7yTdmP4UjoOuZQw10hU68fV9UdVfXGqjoA+I/AUxn2EKfsO206dzCc2nk2cDTwJGBnhiMP2DzL8trR6bbvct/ZKlfVu6vqsQyn6H4X+J9Tg2b7yBzTH532NsAKhmUOw0b5viN1H7SA8W7wvY3M19VzfG7OcbGAdYC5l+9bGeblkVW1E/BcNvxep8/nllwX7jUMgEWQ5D8keXmSFa1/X+BZwEUj1fYAXpJkuyTPZDhldB6wPcM5ynXAnUmOBOa8nXCBzgBeAvxnhmsAU+1+bpLlba/v5lZ81wyfn+4dDBvmh42UnQm8LMn+Se4PvIXhVNGdC21skj9K8ogMF69vZdjAj7bruUkOSHJf4E3Ax9oR0gMYri3cyLBBfMtCp70R/wwcmOTpGe7UeQkbbmhH2/8HSQ5uRy23M1yjmWr/dQznyBfqsSPT/iuG+Zxavy4Bnt32eo9guI405TrggWkX5GdwNvCUJE9s7X15G/dXN6GN92QdmGv5PgC4Dbg5yT78NlCnTF+uW3JduNcwABbHT4GDga8luZ3hD/M7DH9MU74GrGLYU/0/wDOq6saq+inDyn42cBPDnsu5m7l9ZzKcC76gqm4YKT8CuLTdofE3DOfkfzHXyKrqVoZzwbuNFH+Q4dTNlxguUP6C4Rz7pngQ8DGGjf9lwBcZDvmnnM5whPUThou5L2nlH2I42rka+C4bBvA90pbbM4GTGTYqq4CvzFJ9J4YL3je19tzIcIQH8AHggHbHzT8toAnnMNw8cBPwPODpVXVHG/ZS4GkMIf4chmtIU+2+nOH7/0Gb5ganjarqewx70+9hWDefBjytqn61gLZN2eR1YB7L943AYxhunvhnhgvMo94KvK7N4yvYguvCvUk2PO2sccjwY6kXVdXjx92We7skFzJcwFz0X7JK9zYeAUhSpwwASeqUp4AkqVMeAUhSp+YVAO15H99OckmSyVa2W5Lzk1zR3ndt5Uny7iRrMjwf5DEj4zmu1b8iyXFbZpYkSfMxr1NASa4EJkZvEUzydmB9VZ2c5NUMz7d5VZKjGG7tOorh1se/qaqDMzyQaxKYYPhRxmrgsVV102zT3X333WvlypWbPHOS1KPVq1ffUFXL56p3Tx4tezTDveMwPOTsQuBVrfxD7afaFyXZJclere75VbUeIMn5DPeZnznbBFauXMnk5OQ9aKIk9SfJj+auNf9rAAV8LsnqDI+sheGxBNcCtPc9Wvk+bPjckrWtbLZySdIYzPcI4HFVdU2SPYDzk1y+kbozPUujNlK+4YeHgDkRYL/99ptn8yRJCzWvI4Cquqa9Xw98EjiI4SmGewG09+tb9bVs+JCmqYdSzVY+fVqnVNVEVU0sXz7nKSxJ0iaaMwCS3C/JA6a6GR5E9h2G59FM3clzHMOzSGjlz293Ax0C3NJOEX0WOCzJru2OocNamSRpDOZzCmhP4JPtScXLgDOq6l+SfAM4O8kJwI8ZHtQEwxMsj2L4zz8/A14AUFXrk7wZ+Ear96apC8KSpMW3Vf8SeGJiorwLSJIWJsnqqpqYq56/BJakThkAktSpe/JDMDXp6p/IbXlb8VlJaUnxCECSOmUASFKnDABJ6pQBIEmdMgAkqVMGgCR1yg
CQpE4ZAJLUKQNAkjplAEhSpwwASeqUASBJnTIAJKlTBoAkdcoAkKROGQCS1CkDQJI6ZQBIUqcMAEnqlAEgSZ0yACSpUwaAJHXKAJCkThkAktQpA0CSOmUASFKnDABJ6pQBIEmdMgAkqVMGgCR1ygCQpE7NOwCSbJvk4iSfbv37J/lakiuSfDTJ9q18h9a/pg1fOTKO17Ty7yU5fHPPjCRp/hZyBPBS4LKR/rcB76yqVcBNwAmt/ATgpqp6KPDOVo8kBwDHAgcCRwDvS7LtPWu+JGlTzSsAkqwAngL8Q+sP8ATgY63KacAxrfvo1k8b/sRW/2jgrKr6ZVX9EFgDHLQ5ZkKStHDzPQJ4F/BK4Net/4HAzVV1Z+tfC+zTuvcBrgJow29p9X9TPsNnJEmLbM4ASPJU4PqqWj1aPEPVmmPYxj4zOr0Tk0wmmVy3bt1czZMkbaL5HAE8DvjjJFcCZzGc+nkXsEuSZa3OCuCa1r0W2BegDd8ZWD9aPsNnfqOqTqmqiaqaWL58+YJnSJI0P3MGQFW9pqpWVNVKhou4F1TVc4AvAM9o1Y4Dzmnd57Z+2vALqqpa+bHtLqH9gVXA1zfbnEiSFmTZ3FVm9SrgrCQnARcDH2jlHwBOT7KGYc//WICqujTJ2cB3gTuBF1fVXfdg+pKkeyDDzvnWaWJioiYnJ8fdjDllpqsb2mRb8Sop3SskWV1VE3PV85fAktQpA0CSOmUASFKnDABJ6pQBIEmdMgAkqVMGgCR1ygCQpE4ZAJLUKQNAkjplAEhSpwwASeqUASBJnTIAJKlTBoAkdcoAkKROGQCS1CkDQJI6ZQBIUqcMAEnqlAEgSZ0yACSpUwaAJHXKAJCkThkAktQpA0CSOmUASFKnDABJ6pQBIEmdMgAkqVMGgCR1ygCQpE4ZAJLUKQNAkjo1ZwAk2THJ15P8W5JLk7yxle+f5GtJrkjy0STbt/IdWv+aNnzlyLhe08q/l+TwLTVTkqS5zecI4JfAE6rqUcDvA0ckOQR4G/DOqloF3ASc0OqfANxUVQ8F3tnqkeQA4FjgQOAI4H1Jtt2cMyNJmr85A6AGt7Xe7dqrgCcAH2vlpwHHtO6jWz9t+BOTpJWfVVW/rKofAmuAgzbLXEiSFmxe1wCSbJvkEuB64Hzg+8DNVXVnq7IW2Kd17wNcBdCG3wI8cLR8hs9IkhbZvAKgqu6qqt8HVjDstT9spmrtPbMMm618A0lOTDKZZHLdunXzaZ4kaRMs6C6gqroZuBA4BNglybI2aAVwTeteC+wL0IbvDKwfLZ/hM6PTOKWqJqpqYvny5QtpniRpAeZzF9DyJLu07vsATwIuA74APKNVOw44p3Wf2/ppwy+oqmrlx7a7hPYHVgFf31wzIklamGVzV2Ev4LR2x842wNlV9ekk3wXOSnIScDHwgVb/A8DpSdYw7PkfC1BVlyY5G/gucCfw4qq6a/POjiRpvjLsnG+dJiYmanJyctzNmFNmurqhTbYVr5LSvUKS1VU1MVc9fwksSZ0yACSpUwaAJHXKAJCkThkAktQpA0CSOmUASFKnDABJ6pQBIEmdMgAkqVMGgCR1ygCQpE4ZAJLUKQNAkjplAEhSpwwASeqUASBJnTIAJKlTBoAkdcoAkKROGQCS1CkDQJI6ZQBIUqcMAEnqlAEgSZ0yACSpUwaAJHXKAJCkThkAktQpA0CSOmUASFKnDABJ6pQBIEmdMgAkqVNzBkCSfZN8IcllSS5N8tJWvluS85Nc0d53beVJ8u4ka5J8K8ljRsZ1XKt/RZLjttxsSZLmMp8jgDuBl1fVw4BDgBcnOQB4NfD5qloFfL71AxwJrGqvE4H3wxAYwBuAg4GDgDdMhYYkafHNGQBVdW1VfbN1/xS4DNgHOBo4rVU7DTimdR8NfKgGFwG7JNkLOBw4v6rWV9VNwPnAEZt1biRJ87agawBJVgKPBr4G7FlV18IQEsAerdo+wFUjH1vbymYrlySNwbwDIMn9gY8Df1
VVt26s6gxltZHy6dM5Mclkksl169bNt3mSpAWaVwAk2Y5h4/+RqvpEK76undqhvV/fytcC+458fAVwzUbKN1BVp1TVRFVNLF++fCHzIklagPncBRTgA8BlVfWOkUHnAlN38hwHnDNS/vx2N9AhwC3tFNFngcOS7Nou/h7WyiRJY7BsHnUeBzwP+HaSS1rZa4GTgbOTnAD8GHhmG3YecBSwBvgZ8AKAqlqf5M3AN1q9N1XV+s0yF5KkBUvV3U7DbzUmJiZqcnJy3M2YU2a6uqFNthWvktK9QpLVVTUxVz1/CSxJnTIAJKlTBoAkdcoAkKROGQCS1CkDQJI6ZQBIUqcMAEnqlAEgSZ0yACSpUwaAJHXKAJCkThkAktQpA0CSOmUASFKnDABJ6pQBIEmdMgAkqVMGgCR1ygCQpE4ZAJLUKQNAkjplAEhSpwwASeqUASBJnTIAJKlTBoAkdcoAkKROGQCS1CkDQJI6ZQBIUqcMAEnqlAEgSZ0yACSpU3MGQJIPJrk+yXdGynZLcn6SK9r7rq08Sd6dZE2SbyV5zMhnjmv1r0hy3JaZHUnSfM3nCOBU4IhpZa8GPl9Vq4DPt36AI4FV7XUi8H4YAgN4A3AwcBDwhqnQkCSNx5wBUFVfAtZPKz4aOK11nwYcM1L+oRpcBOySZC/gcOD8qlpfVTcB53P3UJEkLaJNvQawZ1VdC9De92jl+wBXjdRb28pmK5ckjcnmvgicGcpqI+V3H0FyYpLJJJPr1q3brI2TJP3WpgbAde3UDu39+la+Fth3pN4K4JqNlN9NVZ1SVRNVNbF8+fJNbJ4kaS6bGgDnAlN38hwHnDNS/vx2N9AhwC3tFNFngcOS7Nou/h7WyiRJY7JsrgpJzgQOBXZPspbhbp6TgbOTnAD8GHhmq34ecBSwBvgZ8AKAqlqf5M3AN1q9N1XV9AvLkqRFlKoZT8VvFSYmJmpycnLczZhTZrrCoU22Fa+S0r1CktVVNTFXPX8JLEmdMgAkqVMGgCR1ygCQpE4ZAJLUqTlvA5V0L3eGt6ltNs9eWreoeQQgSZ0yACSpUwaAJHXKAJCkThkAktQpA0CSOmUASFKnDABJ6pQBIEmdMgAkqVMGgCR1ygCQpE4ZAJLUKQNAkjplAEhSpwwASeqUASBJnTIAJKlTBoAkdcoAkKROGQCS1CkDQJI6ZQBIUqcMAEnqlAEgSZ0yACSpUwaAJHXKAJCkTi16ACQ5Isn3kqxJ8urFnr4kabCoAZBkW+C9wJHAAcCzkhywmG2QJA0W+wjgIGBNVf2gqn4FnAUcvchtkCSx+AGwD3DVSP/aViZJWmTLFnl6maGsNqiQnAic2HpvS/K9Ld6qfuwO3DDuRswlM60lWuruFesmz7nXrJwPnk+lxQ6AtcC+I/0rgGtGK1TVKcApi9moXiSZrKqJcbdDms51czwW+xTQN4BVSfZPsj1wLHDuIrdBksQiHwFU1Z1J/gL4LLAt8MGqunQx2yBJGiz2KSCq6jzgvMWergBPrWnr5bo5BqmquWtJkpYcHwUhSZ0yACSpUwaAJHVq0S8Ca/EleSSwkpHvu6o+MbYGSfzm2WBP4e7r5jvG1abeGABLXJIPAo8ELgV+3YoLMAA0bp8CfgF8m9+um1pEBsDSd0hV+cRVbY1WVNUjx92InnkNYOn7Vx+5ra3UZ5IcNu5G9MwjgKXvNIYQ+AnwS4YH8pV7XtoKXAR8Msk2wB38dt3cabzN6oc/BFvikqwB/gfTzrNW1Y/G1igJSPID4Bjg2+WGaCw8Alj6flxVPnBPW6MrgO+48R8fA2DpuzzJGQx3XPxyqtDbQLUVuBa4MMln2HDd9DbQRWIALH33YfjjGr3Y5m2g2hr8sL22by8tMq8BSFKnPAJY4pLsCJwAHAjsOFVeVS8cW6MkIMly4JXcfd18wtga1Rl/B7D0nQ48CDgc+CLDv+H86VhbJA0+AlwO7A+8EbiS4b8GapF4CmiJS3JxVT06ybeq6pFJtg
M+616Wxi3J6qp67NS62cq+WFV/OO629cJTQEvfHe395iQPB37C8PAtadym1s1rkzwFuIbhCFWLxABY+k5JsivwOuBc4P7A68fbJAmAk5LsDLwceA+wE/Cy8TapL54CWuKS7AD8F4a9/u1acVXVm8bWKElbBS8CL33nAEcDdwK3tdftY22RBCR5SJJPJbkhyfVJzknykHG3qyceASxxSb5TVQ8fdzuk6ZJcBLwXOLMVHQv8ZVUdPL5W9cUjgKXvq0keMe5GSDNIVZ1eVXe214cZfqWuReIRwBKV5NsMf0zLgFXAD/Bx0NqKJDkZuBk4i2Fd/TNgB4ajAqpq/fha1wcDYIlK8uCNDfdx0Bq3JD8c6Z3aEGWqv6q8HrCFGQCSxiLJnwL/UlW3Jnk98BjgzVX1zTE3rRteA5A0Lq9rG//HA08GTgXeP94m9cUAkDQud7X3pwB/V1Xn4GOhF5UBIGlcrk7y/4A/Bc5rP1p0m7SIvAYgaSyS3Bc4guF/Al+RZC/gEVX1uTE3rRsGgCR1ysMtSeqUASBJnTIAJKlTBoAkdcoAkKRO/X/xRmMPgo4WIgAAAABJRU5ErkJggg==\n"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "**`Pie-plot`**"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "count_class.plot(kind = 'pie', autopct = '%1.0f%%')\nplt.title('Percentage distribution of data')\nplt.ylabel('')\nplt.show()",
"execution_count": 4,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAPEAAAD7CAYAAAC7UHJvAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAHu9JREFUeJzt3XmcU+W9x/HPbxZ2ZK0IIhwrbiBqURAVBalWbRSt+y5e16rXVm01rh0L2lCL9rbuV627Vm3Vi9G6L7ggLmCpuCJRFFEUCQzDbJnn/vGckTBkZjIzmTznJL/36zWvmclyzjcn+eacnJxFjDEopcKrxHUApVTHaImVCjktsVIhpyVWKuS0xEqFnJZYqZDTEgeYiFSIyD3+38NEpFJESnM07JtE5DL/70ki8kUuhusPbw8R+TBXw2vDeLcWkXkislpEzsni9j9M3zDLa4lFJCEia/0X49ci8jcR6ZXPDK3xM+7tOkdTxpjPjTG9jDGplm4nIlNF5JUshneGMWZaLrKJiBGREWnDnm2M2ToXw26jC4AXjTG9jTF/yeWAReQOEZmey2Hmios58YHGmF7AGGAscGlbByAiZTlPVURyNTcPoOHAe65D5J0xJm8/QALYO+3/q4HH/b/7ALcBXwFfAtOBUv+6qcCrwLXACmC6f/mpwPvAamAhMMa/fAjwD2A5sBg4J22cFcCDwF3+/d4DdvavuxtoANYClcAF/uUPAcuAJPAyMCpteAOAWcAq4E0/9ytp128DPOPn/hA4ooXpsznwkp/rGeA64B7/Og8wQFnaNPnUv+1i4FhgW6AaSPn5V/q3vQO4EXgCWAPs7V/WOB0nAV8AFwPf+s/TsWm5XgROSft/auNj9KeH8YdbCRzZOLy022/rD2OlP72npF13B3A9EPcfyxvAFi1Moyn+MFb6w9zWv/x5/3FX+zm2asv0bel5Bk4D6oBaf9iz/MujwCLWvf5+kc8+/ZDbVYmBzfwnY5r//6PAzUBPYGNgLnB62oumHvhvoAzoDhyOLftYQIAR2HfiEuBt4HKgC/Bj7It937QSVwM/B0qBPwBzmnuj8S/7L6A30BX4MzA/7boH/J8ewEhgSdoLvKf//0l+7jHYkoxqZvq8Dlzjj2dP/8WxQYn94a4CtvavG5z2gptK2ptIWlGSwO7+9OnGhiWuTxv3RGwpG4f/Is2U2P/fACPS/p+EX2KgHPgE+wbRBZjsP66t07KtAMb5j+1e4IFmps9Wfq59/OFe4A+7S6acbZm+WTzPP0yvtMsOx84wSrBvXmuAwcVQ4krsu+hnwA3YQg4CaoDuabc9Gngh7UXzeZNhPQX8KsM4dslw24uAv6WV+Nm060YCa1sqcZNh9fVftH2wbwJ1jS9I//of5sT+Ezu7yf1vBn6XYbjDsEXqmXbZfTRf4pXAoenTLFPB0l6Ad2W4rGmJ08f9IHBZpnI0HQctl3gP7NytJO36+4GKtBy3pl33c+CDZqb9ZcCDaf+XYN/IJ2XK2Zbp29Lz3HR6tfDamA8clM9OGWNw8dnyYGPMs+kXiMho7DvrVyLSeHEJdi7WKP1vsHPyRRmGPxwYIiIr0y4rBWan/b8s7e8qoJuIlBlj6psOzP/8eCX2XfdH2MVtgIHYN6CyFnIOB3ZpkqUMu9je1BDge2PMmrTLPsM+zvUYY9aIyJHAb4DbRORV4HxjzAcZhpspVyaZxj2klftkYwiwxBjTkHbZZ8Cmaf83fT6aW9k5xL8vAMaYBhFZ0mRYLeVodvq28jwnMw1QRE4AzsO+weLnHphFlpwKygqiJdg58cBMRfI13d1qCbBFM8NabIzZsp1Zmo7nGOAg7OfIBHYO/D12EX459t19KPCRf/v00i0BXjLG7JPFeL8C+olIz7QX2rAMeWxIY54CnhKR7ti5//9i53rN7ZbW2u5qmcb9H//vNdiPC402aWVY6ZYCm4lISVqRh7FuerXFUm
B04z9i3/E3w86NW9Pa9G3peYYm009EhmOn+U+B140xKRGZn3b7vAnE98TGmK+Ap4GZIrKRiJSIyBYiMrGFu90K/EZEdhJrhD9h5wKrRORCEekuIqUisp2IjM0yztfYz9GNemPfYL7DvpCvSsudAv4JVIhIDxHZBjgh7b6PA1uJyPEiUu7/jBWRbTNMg8+At4ArRKSLiEwADswUUEQGicgUEenpZ6vErtRpzD9URLpk+XjTNY57D+AA7IoesIuJh/iPcQRwcpP7NZ1m6d7Avglc4D/+Sf7jeqAd+R4EIiLyUxEpB87HPv7XWrtjFtO32efZ1/Qx9sQWezmAiJwEbNeOx9RhgSix7wTsio+F2HfAh7ErbDIyxjyEXfy5D7uC4lGgv1+sA4EdsWttv8UWvk+WOf4AXCoiK0XkN9i12J9h3+0XAnOa3P5sf9jLsIvJ92NfDBhjVgM/A47CzkWWATOwK04yOQb7mX4F8Dt/3JmUYF/AS/3bTgTO9K97HrvCcJmIfJvlY8bP9r0/zHuBM9IWz6/Frpn9GrjTvz5dBXCnP82OSL/CGFOLXaO8P/a5uAE4oZVF/4yMMR8CxwF/9Yd1IPYry9osB9HS9G3teb4NGOk/xkeNMQuBmdiVZV9jlxBebetjygXxP5CrHBGRGcAmxpgTXWdRxSFIc+JQEpFtRGR7f5F+HHZR8xHXuVTxCMqKrTDrjV2EHgJ8g13EesxpIlVUdHFaqZDTxWmlQk5LrFTIaYmVCjktsVIhpyVWKuS0xEqFnJZYqZDTEisVclpipUJOS6xUyGmJlQo5LbFSIaclVirktMRKhZyWWKmQ0xIrFXJaYqVCTkusVMhpiZUKOS2xUiGnJVYq5LTESoWcllipkNODx4eYF433AgZgT6nZ0//plfa7HHuq0CrsSc3S/64EliZikbr8J1e5pAePDzgvGh8ObANs6f9sgT0l5zCyP0lccxqwJ1BLNPlZDCxMxCLLmrmfChAtcYB40XhvYBww3v/ZBXvCa1eWYk8H+hb2LIFzErHIaod5VAZaYoe8aLwHsC/2tJ+7AiMJ9nqKFPbE4y8As4CXE7FIcyeFV3miJc4zLxofhD2vbuNZ6bu5TdQhK4Engf8DnkzEIknHeYqSljgPvGh8CPbk2AdjF5GDPLdtrzrgZeAh4P5ELLLKcZ6ioSXuJF40LsDPgDOAAyiubwLWAA8ANydikTddhyl0WuIc86LxjYH/Ak4Ffuw4ThDMA24B7tWVYp1DS5wjXjS+PRAFDgW6OI4TRGuA24BYIhb5ynWYQqIl7iAvGt8JuAyYAojjOGFQjZ0za5lzREvcTl40vh0wHbuWWbVdNXAztsy6UUkHaInbyN+CajpwDIW5ljnf1gI3AtMTscj3rsOEkZY4S140Xg6cB1wO9HAcpxAtBy4GbkvEIvqibAMtcRa8aHx34CZgO9dZisBc4IxELDLPdZCw0BK3wIvG+wMzgJPRlVb5lAL+DFyeiEWqXIcJOi1xM7xo/Gjgf3C7A0KxSwAnJmKRl10HCTItcRP+TgnXA1MdR1FWCvg9dsVXg+swQaQlTuN/bfQgsK3rLGoDLwDH6nfLG9KvSHxeNH4qdqWKFjiY9gLe9aLx/VwHCZqinxP7O+LfDBztOovKigH+BFys+zJbRV1iLxrfDLs/7CjXWVSbPQkckYhFKl0Hca1oS+zvsPAEsKnrLKrd3gEixb7ZZlF+Jvai8Z8Cs9ECh90YYI4XjRf1eoyiK7EXjR+LXRTbyHUWlRPDgde8aHyi6yCuFFWJvWj8QuBu7PGYVeHoCzzlReNHuA7iQtGU2IvGfw/E0M0nC1VX4D4vGj/MdZB8K4oVW140fhl2qx9V+OqAQxKxyOOug+RLwZfYX4SOuc6h8qoGODARizzjOkg+FHSJvWj8dOwuhKr4VAH7JWKR2a6DdLaCLbG/kuN+iuhzv9rAamDvRCwy13WQzlSQJfai8T2BZ9CjTir4DtglEYssch2ksxRcif
1NKd9G9wNW6ywEdi3Us1IU1KKmF413Ax5BC6zWNxJ4wIvGC+r13qjQHtQtwE6uQ6hA2h97lNKCUzCL0140/mvgWtc5VKAZ7HfIj7oOkksFUWIvGt8LeJriOmmZap9VwNhELPKR6yC5EvoSe9H4AOA9YJDrLCo03sSu6Eq5DpILhfCZ+Hq0wKptxmJPflcQQj0n9jd2f8h1DhVKtcC4RCzyrusgHRXaEnvR+I+wi9H6dZJqr3exn4/rXAfpiDAvTt+AFlh1zA7Yc2uFWijnxF40fiTwgOscqiDUA7slYpE3XQdpr9CV2IvGNwI+QefCKnfewn4+DlcZfGFcnL4YLbDKrZ2BY12HaK82zYlFxAMeN8Y4OcWnF40PAz4EurkYvypoS4CtE7HIWtdB2ipsc+Kr0AKrzrEZcK7rEO3Rnjnxk8ArwG7Al8BBwHHAadj9dz8BjjfGVInIHcBaYBvsoUVPAk4EdgXeMMZMzXbcXjS+E3ZLGz3Qneosq4EtE7HI166DtEV75sRbAtcbY0YBK4FDgX8aY8YaY3YA3seelLtRP2Ay9l1uFnYnhVHAaBHZsQ3j/RNaYNW5egNXuA7RVu0p8WJjzHz/77cBD9hORGaLyALsCoL0cxvNMnZ2vwD42hizwBjTgN1Qw8tmhF40fgAwqR1ZlWqrU7xofCvXIdqiPSWuSfs7hd1z6A7gbGPMaOw7WbcMt29oct8Gst/r6NJ25FSqPUqB81yHaItcrdjqDXwlIuXkeFW9F41PAnbJ5TCVasUJ/ma9oZCrEl8GvIE9ON0HORpmowtzPDylWtMdOMt1iGwFeostLxofif3srFS+LQeGh+F746B/T3yO6wCqaP0IOMF1iGwEdk7sReP9gC+AHq6zqKL1EbBN0LepDvKceCpaYOXWVsA+rkO0JsglPs51AKWwWxgGWiAXp/0v2z90nUMp7GbDgxKxyGrXQZoT1Dnx0a4DKOXrDhzuOkRLtMRKte4o1wFaErjFaS8aH4PdJlupoKgHBidikW9dB8kkiHPiY1wHUKqJMuzeeoEUxNOeHJavEa1Z+BLJ1x8EEUp79WfgAedT2qMPyx+bQd2KLwBoqF5DSbeeDDnpr1R/sZAVT9+AlJYzcMpvKe83hIbqSpY/NoONj/g9IrqnZAGbAtzsOkQmgVqc9qLxEcDH+RiXaUjxxfUnMOTkGyjt0YfvX7gdKe9K3wnr77+x4vlbKenak767H803j1xJv4lTqU9+w9rFb9N/8imseP5WeozYhW7DRucjtnKnEugfxGNUB21xenLexmQMGIOpq8EYQ0NtFaW9BjS5iaHqg1foue2eAEhJGaa+FlNfg5SUUff9V6RWf6cFLg69gHGuQ2QStMXpvJVYSsvo/7MzWXr7WZSUd6Os3xD67/PL9W5T88V7lPbsS3n/TQHoM/5wvvvXdUh5FwZGzuf7F26j7x66TUoRmQy86jpEU0GbE++VrxGZVD2V859g8NS/sOlZd9FlY4/knPVP67Rm4Us/zIUBugz6MYNPmMkmR/+B+uQySnv1B2D5YzP4dtafSK35Pl/xlRs/dR0gk8CU2IvGRwEb52t8td98CkB5v8GICD222YOaL9//4XrTkKLqo9fpsc2eG9zXGEPytb/TZ/ejWfnqffSdcAw9R+3Fqrdn5Su+cmNXLxrv7jpEU4EpMfn8PAyU9hpA3bdLSFUlAahePI/yAZv9cH11Yj7lA4ZSttHADe675j/P0X2LnSnt1gtTVwNSAiL2b1XIugATXIdoKkifiSfmc2RlvQfQZ/ejWXbvhUhpGWUb/YgBkXWHHV7z/svrLUo3aqirpvI/zzHoiGkAbDT2YJY/chVSWsbAKRfkLb9yZgL2CDaBEZivmLxo/GNghOscSrXikUQscojrEOkCsTjtf874sescSmXBySmMWhKIEmPPEBGULEq1ZAsvGg/UqYSCUpyRrgMolaUSAvZ6DUqJR7V+E6UCI1CL1FpipdpOS5yBlliFybauA6
RzXmIvGhdgmOscSrXBYNcB0jkvMTAAKHcdQqk2yNvmwdkIQok3cR1AqTYK1MnWglDiQa4DKNVG3bxofCPXIRoFocQDWr+JUoETmEXqIJS4n+sASrVDYBapg1Di/q4DKNUOOidO09N1AKXaITAn+wtCiZUKo8Dsix+EEgdjh2al2qbUdYBGQXg3aXAdoFCNlMSiQ0pnf+k6RyH61vSpgYjrGEAwSqxz4k7ykRk67ODSV5MDZdUY11kK0N+CckIIXZwuYPWUlU+suXbLKtNVz/Wce/WuAzTSEhe4NXTvPblmZt86U/qF6ywFRkucJuU6QKFbRv9BB9ReWddgZIXrLAUkMOdkCkKJ9YWVBx+aYZufUBf90hjWus5SICpdB2gUhBJ/7TpAsXilYfToi+pP+bcxuvSTA4FZ668lLjIPpCbvclPqwMCdFCyEtMRptMR5NqP+6D2fSu38ouscIbaWimRgzp4XhBJ/4zpAMTq97rxJ7zUMf8V1jpAKzFwYAlDiRCyyCqh2naMYTamdPn6Z6fem6xwhFKiv65yX2KeL1A6kKC2bXDNzZKXpttB1lpDROXEGn7oOUKyq6NZzYs21G9ea0s9dZwkRLXEGC1wHKGbf0WfgfrUzTIOR5a6zhISWOAMtsWOfmiHDj6q9dLkxrHGdJQQCteSoJVY/mGu2HXlu3ZnvGxOc7YIDaq7rAOmCUuL30B0hAuHRhgk7X1t/2BzXOQIsQUUyUF+LBqLEiVikEljsOoey/pI6ZMKjqd1e6sxxXPJcNZtdu5peV61a7/Kb3qpl9I2V7HhTJRNuX8PC5XYL0Vc/r2f7GysZ+7+VfLLCHkdiZbVh33vWYExe3/8DNReGgJTYp4vUAfLrurMnvtMw4uXOGv6BW5cx95QNj5F4zOhyFvyyF/PP6MUFu3fhvKfsJgQzX6/lH0d056rJ3bjxzVoApr1Uw8UTuiIinRUzkzfyObJsBKnE77gOoNZ3aG3FhCUNAzvlRTt+aBmDe2/48tuo67pCrqmFxn6Wl8LaeqiqM5SXwqIVDXy5uoGJXt4PThO4Egfh8DyNOnXxTbWdoaRkn9qrt3+j61kL+kjV6HyN9/q5tVwzp4baFDx/gj0y7EUTunLarGq6l8Pdv+jOb56uZtpeXfMVqVE9AZzZBGlOPAfd/DJwqunafWLNtUNrTFnevlY5a1wXFp3Tmxl7d2P6bLvovOMmpcw5pScvnNiTT79vYEjvEgxw5MNVHPfPtXxdmZfjLS6gIhm4/bEDU+JELFIDvO46h9rQSnr326f26i4pI3ndPPao7cp49IP1D6BhjGH6yzVctmdXrniphismdeW47cv5yxu1+YgUuEVpCFCJfc+6DqAy+9wMGnpo7RUrjWFV67duv4+/W3e8gvhH9WzZf/2X6J3v1hHZsox+3YWqOigR+1OVn4PlBPIjX9BK/C/XAVTz5psRW/+y7lefGEOHZ3sXPFPN0GtWU1UHQ69ZTcWL9pPUdXPrGHWD/Yrpmjm13Hlw9x/uU1VnuPPdOs4c2wWA88Z34dAH13LRc9X8cmynn6e+Goh39kjaQ/L8HVuLvGhcgGUE6GRVakOnlT7+6kVl9+0mQl6/23Hs/6hIHuQ6RCaBmhMnYhEDPOE6h2rZLakDdn8gtVenfYccUA+7DtCcQJXYd5/rAKp1F9WfOvG11MhAfkbsBLXALNchmhPEEj8HfOU6hGrdMXWX7PlpwybF8I3Cc1QkV7oO0ZzAlTgRizQAD7jOobIhsl/tjDErTO/5rpN0ssAuSkMAS+y7x3UAlZ1ayrtOrLlm82pT/rHrLJ2kHnjMdYiWBLLEiVjkHeB91zlUdlbTs8/kmpm96k3JUtdZOsGLVCS/cx2iJYEsse9e1wFU9pYycPBBtdOqGgyB/ezYTre5DtCaIJf4LgJ05jnVuvfM5iNOrvvtZ8ZQ4zpLjnwF/MN1iNYEtsSJWGQJ8JDrHKptXmj4yQ6X10+dZwx52SOhk91ERTIwZz9sTm
BL7LvadQDVdnenfjb+9tT+s13n6KBa4GbXIbIR6BInYpF52O+NVchMqz9+4gupHcK8Mcj9VCRDcVKDQJfYp3PjkDqp7oI9P2wYGsYzMBpgRms3EpGeIhIXkXdF5D8icqSIJERkhojM9X9G+Lc9UETeEJF5IvKsiAzyL68QkTtF5Gn/voeIyB9FZIGI/EtEWt2zI/AlTsQiTwH/dp1DtYdIpPaqcctNn7ddJ2mjx6hIZvMV537AUmPMDsaY7Vi3F94qY8w44Drgz/5lrwDjjTE/wW7MdEHacLYAIsBB2G0kXjDGjAbW+pe3KPAl9v3JdQDVPvWUlU+quWarKtP1A9dZ2iCW5e0WAHv7c949jDFJ//L7037v6v89FHhKRBYAvwVGpQ3nSWNMnT+8Uta9GSwAvNZChKXE9wMfug6h2mcN3XvvVTOzf50pXeI6SxaeoiKZ1RE8jDEfATthy/YHEbm88ar0m/m//wpc589hTwe6pd2mxh9eA1Bn1u0f3EAWx8ELRYkTsUg99t1LhdTX9N84UntVfYORFa6ztKAeODfbG4vIEKDKGHMPdmlxjH/VkWm/G3cQ6cO6czid2PGo64SixACJWGQWuqY61D4ym21+fN1FS42hynWWZlyX5WfhRqOBuSIyH7gEmO5f3lVE3gB+xbo3hQrgIRGZDXybo7xAwI7s0RovGt8emEeI3nzUhg4vfXHuH8tu2UmEUtdZ0iwHtqQimWz1li0QkQSwszEmp0VtSajKkIhF/g3c7jqH6piHUpPG3ZCa8prrHE1c0tECuxKqEvsuBVa7DqE65ur6o/Z4IjXuRdc5fPPI0Y4Oxhgvn3NhCGGJE7HI16z77KFC7My6X09a0OAFYfPMc6hIhnZb79CV2HcNELYNCFQGB9dO2/Ur0/9NhxEeoCL5isPxd1goS+x/5XQSdPz4x8qtFKVlk2tmjlptur/nYPTfAec7GG9OhbLEAIlYZAG6WF0Q1tK1x6SaawbVmrJEnkc9lYpk6I9GEtoS+67CnohNhdx39Bm4b22sJGVkeZ5G+T9UJB/P07g6VahLnIhFUsBxQKXrLKrjFpshw46qvexbYzr9+XyH9XdACLVQlxggEYssAv7bdQ6VG2+abbY9p+7sD4zptEMzVQJHUZEsmPUpoS8xQCIWuQO40XUOlRuzGnbbeWb94Z31MeksKpIFdXjdgiix71cE9NSTqu2uS/1iwj9TE3L9fN5NRfKuHA/TuYIpcSIWqQMOAz5znUXlxnl1Z058q2GrXJ247X3gzBwNK1BCtQNENrxofAfgVaCn6yyq44SGhpe6nDt3WMny8R0YzJfAblQkP89VriApmDlxo0Qs8i4wlfV3zFYhZSgp2af26h2Tpkd7D9H0PbBvoRYYCrDEAIlY5GHsjhKqANTQpdueNX8eVmPKF7XxrmuBKVQkXWwNljcFWWKARCxyFXZjEFUAkvTqu3ft1d1SRpZleZcUcGTYt4vORsGWGCARi1wCXOs6h8qNJWbjTQ+pvWKVMazK4uanUZEM7InBc6mgSwyQiEXOA25wnUPlxrtmxFan1527yJgWd365hIpk0Rw8ouBL7DsbPSJIwXi6YexPptcf95YxGVdezqAiWVQfo4qixIlYxACnAne7zqJy47bUz3e7LzW56XfIF1KRjDoJ5FDBfU/cEi8aF+yBwQtm4/did2/5lS/tXvreBOAMKpK3us7jQlGVuJEXjZ+BPcVGkI62qNrFVN1XfuURu017Pe46iStFWWIALxr/OfB3oJfrLKrdvgWmJGKR11u9ZQEr2hIDeNH4GOBxYLDrLKrNPgb293dFLWpFsWKrOYlY5B1gPDDfdRbVJv8AxmqBraKeEzfyovFu2FNQnu46i2pRLXB+Iha5znWQINESp/Gi8WOAm4DerrOoDXwKHJGIRfRQxU0U9eJ0U4lY5D7gJ+jB94LmYWCMFjgznRNn4EXjZcBl2DPd6ddQ7iSBCxKxyC2ugwSZlrgF/gEGbmTd2d
5V/vwd+HUiFsl2r6WipSVuhb+V18nYLb0GOI5TDBYDZyZikX+5DhIWWuIsedH4AGyRTwbEcZxCVA/MBK5IxCJrXYcJEy1xG3nR+HjsJps7uc5SQB4FLk3EIgV9BI7OoiVuJy8aPwi4HBjjOkuIPQFcrmudO0ZL3EFeNH4g8Dt0ztwWzwKXJWIR/SovB7TEOeJF4wdgy7yz6ywB9hwwLRGL6EH+c0hLnGNeNL4n9gAEhwHdHMcJgjXAXcB1iVhkoeswhUhL3Em8aLwfcDy20Ns5juPCPOBW4N5ELJJ0HaaQaYnzwF+jfSpwKNDHcZzOtBi7pvkefw8xlQda4jzyovFyYDLwC2AKhbEf83xscR/1z76h8kxL7Ii/JdiOwH7A/sAuQBenobKzEruDyFPY4ibcxlFa4oDwovGu2FKPA8b6v7fC7dZhDcBC4HX/Zw7wgX/0UBUQWuIA86LxPthCbw94wPC0n745HNUK7P66i/zfnwKfAO8kYpFszragHNISh5Rf8OHAZtiD/fUAumf4bbAnFmv8WYVdJF4JLAcWJ2KRlfnOr3JHS6xUyOmRPZQKOS2xUiGnJVYq5LTESoWcllipkNMSKxVyWmKlQk5LrFTIaYmVCjktsVIhpyVWKuS0xEqFnJZYqZDTEisVclpipUJOS6xUyGmJlQo5LbFSIaclVirktMRKhZyWWKmQ0xIrFXJaYqVCTkusVMhpiZUKOS2xUiGnJVYq5LTESoWcllipkNMSKxVyWmKlQu7/AbnBmmHSc0bRAAAAAElFTkSuQmCC\n"
},
"metadata": {}
}
]
},
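{
"metadata": {},
"cell_type": "markdown",
"source": "The pie chart shows the class split as rounded percentages. The exact fractions are available from `value_counts(normalize=True)`, and they matter later: on an imbalanced dataset, a model that always predicts the majority class already reaches an accuracy equal to that class's share. The next cell is a minimal sketch on made-up labels (not the SMS data)."
},
{
"metadata": {},
"cell_type": "code",
"source": "import pandas as pd\n\n# Made-up labels for illustration (NOT the SMS data)\ntoy_classes = pd.Series(['ham'] * 8 + ['spam'] * 2)\n\n# Fraction of messages in each class (the fractions sum to 1)\nproportions = toy_classes.value_counts(normalize=True)\nprint(proportions)\n\n# Accuracy of a trivial always-ham baseline equals the ham fraction\nprint('majority-class baseline accuracy:', proportions['ham'])",
"execution_count": null,
"outputs": []
},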
{
"metadata": {},
"cell_type": "markdown",
"source": "We need to know a bit more about the text data.\n### Text Analytics:-\nLet's look at the frequencies of words in spam and non-spam (ham) messages and plot them."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# 20 most common whitespace-separated tokens in non-spam (ham) messages\ncount1 = Counter(\" \".join(data[data['v1']=='ham'][\"v2\"]).split()).most_common(20)\ndf1 = pd.DataFrame(count1, columns = [\"non-spam words\", \"count\"])\n\n# 20 most common whitespace-separated tokens in spam messages\ncount2 = Counter(\" \".join(data[data['v1']=='spam'][\"v2\"]).split()).most_common(20)\ndf2 = pd.DataFrame(count2, columns = [\"spam words\", \"count_\"])",
"execution_count": 5,
"outputs": []
},
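{
"metadata": {},
"cell_type": "markdown",
"source": "Splitting on whitespace with `.split()` counts `Free`, `FREE` and `free!` as three different words, so the counts above are spread across case and punctuation variants. The next cell is a minimal sketch of a normalised count; it assumes that lower-casing and keeping only runs of ASCII letters (which also drops digits and splits words at apostrophes) is acceptable for this analysis, and the messages are made up for illustration."
},
{
"metadata": {},
"cell_type": "code",
"source": "import re\nfrom collections import Counter\n\n# Made-up messages for illustration (NOT the SMS data)\ntoy_texts = ['Free entry! Win FREE cash', 'free prize, claim now', 'WIN a free phone']\n\ndef normalised_tokens(text):\n    # Lower-case, then keep runs of letters a-z only\n    return re.findall(r'[a-z]+', text.lower())\n\ntoy_counts = Counter(tok for msg in toy_texts for tok in normalised_tokens(msg))\n# 'free' now collects all four of its variants: Free, FREE, free, free\nprint(toy_counts.most_common(3))",
"execution_count": null,
"outputs": []
},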
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Bar plot of the 20 most frequent non-spam words\ndf1.plot.bar(legend = False)\ny_pos = np.arange(len(df1[\"non-spam words\"]))\nplt.xticks(y_pos, df1[\"non-spam words\"])\nplt.title('Most frequent words in non-spam messages')\nplt.xlabel('words')\nplt.ylabel('number')\nplt.show()",
"execution_count": 6,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAElCAYAAADz3wVRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3Xe8HVW99/HPl4TeIaElhFCCioiIoagoERQpIlyFK01C0VyU+iCXIvjABXwAG7YrSIlUgVxEyBUQIxJAIyVACF1iKAktwVACSEn4PX+stclk55Q95+x2cr7v12u/zsyamTW/2Wfv/dtrrZnZigjMzMxqtUSrAzAzs77FicPMzEpx4jAzs1KcOMzMrBQnDjMzK8WJw8zMSnHi6Ack/ZukGZJel/SxVsfTF0kaJWlmD7Y7T9J3GxGTWas4cTSQpKckvSNpUFX5FEkhaXiTQvkhcHhErBAR9zdpn73W0w/rdhIRh0bE6a2Ow6yenDga70lgn8qMpI8Ay/a0MkkDe7DZesDDdaxvsebnxKxrThyNdxlwQGF+NHBpcQVJK0u6VNJsSU9LOlnSEnnZgZL+KukcSXOAU3P5wZIelfSypJslrVe9Y0lLS3odGAA8IOkfufwpScdLmgq8IWmgpHUk/TbH8KSkIwv1LCvp4ryvRyT9Z7ElkFtPGxXmL5Z0RmH+i7mV9YqkSZI2Kyx7StKxkqZKelXS1ZKWkbQ8cBOwTu5ie13SOlXHt36us/JcXShpVmH55ZKOztPrSBovaY6kaZK+UVjvVEnX5PVfAw6sPmZgy6p9Hy/pWUlzJT0uaYfq57/6uai0oCR9W9IsSc9LOqij7fL6EyWdnv//cyX9sdh6lfQlSQ/n52CipA9197x2sa8Oj6fw3Fydl90n6aOF7U6Q9I+87BFJ/1ZYVnztviJpuqRP5vIZ+TkY3c3xn5FfM69L+l9Jq0u6QtJrku5RodUu6YOSJuT/8eOS/r2wbJcc39x8nMfm8kGSfp/jmyPpjsLrqatjGyDpR5JeUnq/HK70PhiYl68s6aL8P342H8eAvGwjSbfl/8tLkq7u7DloWxHhR4MewFPA54DHgQ+RPsBnkFoAAQzP610KXA+sCAwH/g4ckpcdCMwDjgAGklorewDTcp0DgZOBSV3EEcBGVXFNAdbN9S0B3Av8X2ApYANgOvCFvP5ZwB3Aanmbh4CZXdR/MXBGnt4CmAVsnY9/dN7/0oVY7gbWyfU/Chyal40q7qeTY3sG+HiefjzH/aHCso/l6duAXwLLAJsDs4Ed8rJTgXfz87pEfk46PWbgA/n/uE6eHw5s2El8xediVP5fngYsCewCvAms2sm2E4F/ABvnmCYCZ+VlGwNvAJ/PdR2XXxNLdfe8drCfTo+n8NzsmfdzLKkVvWRevlfexxLAV3NMa1e9dg/K//sz8v/kv4GlgR2BucAKXRz/NGBDYGXgEdJ743Ok1/2lwK/zusvnYzgoL9sCeAn4cF7+PPDpPL0qsEWePhM4Lx/bksCnAdVwbIfmeIbm+v5Eeh8MzMuvA36V41oj/y/+Iy+7Ejgp17sMsG2rP6tKf7a1OoDF+cGCxHFyfoHuBEzIL+zIb9ABwNvAJoXt/gOYmKcPBJ6pqvcmcmLJ80uQPoDW6ySOjhLHwYX5rTvYx4mFN+V0YKfCsjHUnjjOBU6vqvtxYLtCLPsXln0fOC9Pj6L7xHEZcAywVq73+/lNvT7wSn5u1gXmAysWtjsTuDhPnwrcXlVvp8cMbERKhp8jf4B2EV/xuRgF/Kvy4ZLLZgHbdLLtRODkwvy3gD/k6e8C46peA88Co7p7XjvYT6fHk5+bO6v28/6HcAd1TQF2L7x2nygs+0h+raxZKPsnsHkXx39SYf5HwE2F+d2AKXn6q8AdVdv/CjglTz9Del+tVLXOaaQvbRt1FEMXx/ZnciLI85/LxzYQWJP0nl62sHwf4NY8fSlwPj
C0u32268NdVc1xGbAv6Y10adWyQaRv+U8Xyp4GhhTmZ1Rtsx7w09y8fgWYA6hqm+4U61yP1CX0SqHO75DeAJC+dRXXL8banfWAb1fVvW6us+KFwvSbwAol6r+N9IH8GeB20ofNdvlxR0S8l/c1JyLmVh1DV89xp8ccEdOAo0kfqrMkXaWqbrQu/DMi5hXmuzvezp6bdapiei/HWzymDreVdJMWdP/tV8PxvP885P3MzPtH0gFa0A35CrAp6TVd8WJh+l+5juqyro6/et3Otl0P2LrqdbYf6QsFwFdILbynczfRJ3L5D0itmj/mrrQTKpV3c2zVr4/q99OSwPOFbX9FanlAah0KuDt3NR7cxfG3JSeOJoiIp0nN+12Aa6sWv0TqCiiOUQwjfXt8v4qqbWaQvu2sUngsGxGTyoRVVd+TVfWtGBG75OXPkz7si/EVvQksV5hfqzA9A/heVd3LRcSVJWPszG2k7oVRefovwKdIieO2vM5zwGqSVqw6hq6e4y6POSJ+ExHbsqDb8ewaYq2n5yi8ZiSJFO+znW6RRcTOkc6wWyEirshlXR3P+89D7v8fCjynNK52AXA4sHpErELq0lNvD64HZgC3Vb3OVoiIbwJExD0RsTvpw/s6YFwunxsR346IDUgtmGMk7VDDsT1Peh4qiq+VGaQWx6BCLCtFxIfzPl+IiG9ExDqkVtAvVRgj7AucOJrnEGD7iHijWBgR80kv4u9JWjG/YI8BLu+irvOAEyV9GN4fiNurF7HdDbyWB0iXzQN/m0qqDAiPy/tbVdJQ0nhL0RRg37zdTqQP7YoLgEMlba1keUm7Vn2Id+ZFYHVJK3e2QkQ8QfrmuT+pu+m1vN1XyIkjImYAk4AzlQbeNyP9P67oYt+dHrOkD0jaXtLSwFt5//NrOJ56Ggfsmj/klgS+TfqwKvPlAajpeD4u6ct54PfovJ87Sf33QRovQmmgf9NeHFNv/B7YWNLXJC2ZH1tK+pCkpSTtJ2nliHgXeI18fEonbmyUE2+lfD7dH9s44ChJQyStAhxfWRARzwN/BH4kaSVJS0jaUNJ2ua698msK4OW8n2a/fnrFiaNJIuIfETG5k8VHkAbeppO+Mf8GGNtFXb8jfSO8SuksoIeAnXsR23zSt63NSS2jl4ALSQOSAP9F6hZ5kvSGuKyqiqPy9pXugesKdU8GvgH8gvQmmUbqsqslrsdIA4nTc5O/s+6g20hdQM8U5gUUr1nZhzSm9BzwO1Lf94Qudt/VMS9NGjx/idQdtAapa69pIuJxUrL8eY5jN2C3iHinB9V1dzzXk8YQXga+Bnw5It6NiEdI4w5/IyXrjwB/7dEB9VLuhtwR2Jv0P36B9B5ZOq/yNeCp/H45lPTcAYwgDWy/TjqOX0bExBqO7QLS62Iq6XV2I+lEgEoCOIDUBf0I6Xm7Blg7L9sSuEvpjMfxwFER8WRdnogmqZw9YFYzSaOAyyNiaHfrWt8m6VTSwPH+3a3bn0namXTywXrdrrwYcIvDzKyk3KW7i9I1UEOAU0gt2X7BicPMrDyRujNfJnVVPUq6DqpfaFhXlaSxwBeBWRGxaaH8CNKZCvOAGyLiuFx+ImnAcj5wZETcnMt3An5Kut7hwog4qyEBm5lZTRqZOD5DGnC6tJI4JH2WdMXkrhHxtqQ1ImKWpE1Ig6Bbkc6P/hPpylhIV4p+nnTu+D3APnngyszMWqBhN3OLiNu16N1fv0m6ZcLbeZ3KfYV2B67K5U9KmkZKIgDTImI6gKSr8rpOHGZmLdLsu4BuDHxa0vdI54sfGxH3kK52vbOw3kwWXAE7o6p86+52MmjQoBg+fHhdAjYz6y/uvffelyJicHfrNTtxDCTdEGwb0rnM4yRtQMdXmgYdD9532LcmaQzpfkIMGzaMyZM7u2TCzMw6Iqmm2wk1+6yqmcC1kdwNvEe698tMFr5kfyjpIp7OyhcREedHxMiIGDl4cLcJ08zMeqjZieM6YHsASRuTrqx8iX
T15N5Kvx+xPulqzrtJg+EjlH53YSnSVaHjmxyzmZkVNKyrStKVpBvPDVL60Z9TSLfRGCvpIeAdYHSk07oeljSONOg9Dzgs3wYDSYcDN5NOxx0bER3+kp2ZmTXHYnnLkZEjR4bHOMzMypF0b0SM7G49XzluZmalOHGYmVkpThxmZlaKE4eZmZXS7AsAW2b4CTd0ufyps3ZtUiRmZn2bWxxmZlaKE4eZmZXixGFmZqU4cZiZWSlOHGZmVooTh5mZleLEYWZmpThxmJlZKU4cZmZWSr+5cry3urvyHHz1uZn1D25xmJlZKU4cZmZWiruqmsg3WjSzxUHDWhySxkqalX9fvHrZsZJC0qA8L0k/kzRN0lRJWxTWHS3pifwY3ah4zcysNo3sqroY2Km6UNK6wOeBZwrFOwMj8mMMcG5edzXgFGBrYCvgFEmrNjBmMzPrRsMSR0TcDszpYNE5wHFAFMp2By6N5E5gFUlrA18AJkTEnIh4GZhAB8nIzMyap6mD45K+BDwbEQ9ULRoCzCjMz8xlnZWbmVmLNG1wXNJywEnAjh0t7qAsuijvqP4xpG4uhg0b1sMozcysO81scWwIrA88IOkpYChwn6S1SC2JdQvrDgWe66J8ERFxfkSMjIiRgwcPbkD4ZmYGTUwcEfFgRKwREcMjYjgpKWwRES8A44ED8tlV2wCvRsTzwM3AjpJWzYPiO+YyMzNrkUaejnsl8DfgA5JmSjqki9VvBKYD04ALgG8BRMQc4HTgnvw4LZeZmVmLNGyMIyL26Wb58MJ0AId1st5YYGxdgzMzsx7zLUfMzKwUJw4zMyvFicPMzEpx4jAzs1KcOMzMrBQnDjMzK8WJw8zMSnHiMDOzUpw4zMysFCcOMzMrxYnDzMxKceIwM7NSnDjMzKwUJw4zMyvFicPMzEpx4jAzs1KcOMzMrBQnDjMzK6WRvzk+VtIsSQ8Vyn4g6TFJUyX9TtIqhWUnSpom6XFJXyiU75TLpkk6oVHxmplZbRrZ4rgY2KmqbAKwaURsBvwdOBFA0ibA3sCH8za/lDRA0gDgv4GdgU2AffK6ZmbWIg1LHBFxOzCnquyPETEvz94JDM3TuwNXRcTbEfEkMA3YKj+mRcT0iHgHuCqva2ZmLdLKMY6DgZvy9BBgRmHZzFzWWbmZmbVISxKHpJOAecAVlaIOVosuyjuqc4ykyZImz549uz6BmpnZIpqeOCSNBr4I7BcRlSQwE1i3sNpQ4LkuyhcREedHxMiIGDl48OD6B25mZkCTE4eknYDjgS9FxJuFReOBvSUtLWl9YARwN3APMELS+pKWIg2gj29mzGZmtrCBjapY0pXAKGCQpJnAKaSzqJYGJkgCuDMiDo2IhyWNAx4hdWEdFhHzcz2HAzcDA4CxEfFwo2I2M7PuNSxxRMQ+HRRf1MX63wO+10H5jcCNdQzNzMx6wVeOm5lZKU4cZmZWihOHmZmV4sRhZmalOHGYmVkpThxmZlaKE4eZmZXixGFmZqU4cZiZWSlOHGZmVooTh5mZleLEYWZmpThxmJlZKU4cZmZWihOHmZmV4sRhZmalOHGYmVkpThxmZlZKwxKHpLGSZkl6qFC2mqQJkp7If1fN5ZL0M0nTJE2VtEVhm9F5/SckjW5UvGZmVpuG/eY4cDHwC+DSQtkJwC0RcZakE/L88cDOwIj82Bo4F9ha0mrAKcBIIIB7JY2PiJcbGHdbG37CDV0uf+qsXZsUiZn1Vw1rcUTE7cCcquLdgUvy9CXAHoXySyO5E1hF0trAF4AJETEnJ4sJwE6NitnMzLrX7DGONSPieYD8d41cPgSYUVhvZi7rrNzMzFqkXQbH1UFZdFG+aAXSGEmTJU2ePXt2XYMzM7MFmp04XsxdUOS/s3L5TGDdwnpDgee6KF9ERJwfESMjYuTgwYPrHriZmSXNThzjgcqZUaOB6wvlB+Szq7YBXs1dWTcDO0paNZ+BtWMuMzOzFmnYWVWSrgRGAYMkzSSdHXUWME
7SIcAzwF559RuBXYBpwJvAQQARMUfS6cA9eb3TIqJ6wN3MzJqoYYkjIvbpZNEOHawbwGGd1DMWGFvH0Po9n9JrZr3RLoPjZmbWRzhxmJlZKU4cZmZWihOHmZmV0m3ikDRA0p+aEYyZmbW/bhNHRMwH3pS0chPiMTOzNlfr6bhvAQ9KmgC8USmMiCMbEpWZmbWtWhPHDflhZmb9XE2JIyIukbQsMCwiHm9wTGZm1sZqOqtK0m7AFOAPeX5zSeMbGZiZmbWnWk/HPRXYCngFICKmAOs3KCYzM2tjtSaOeRHxalVZh7+LYWZmi7daB8cfkrQvMEDSCOBIYFLjwjIzs3ZVa4vjCODDwNvAlcBrwNGNCsrMzNpXrWdVvQmcJOnsNBtzGxuWmZm1q1rPqtpS0oPAVNKFgA9I+nhjQzMzs3ZU6xjHRcC3IuIOAEnbAr8GNmtUYGZm1p5qHeOYW0kaABHxF8DdVWZm/VCXiUPSFpK2AO6W9CtJoyRtJ+mXwMSe7lTS/5H0sKSHJF0paRlJ60u6S9ITkq6WtFRed+k8Py0vH97T/ZqZWe9111X1o6r5UwrTPbqOQ9IQ0um8m0TEvySNA/YGdgHOiYirJJ0HHAKcm/++HBEbSdobOBv4ak/2bWZmvddl4oiIzzZwv8tKehdYDnge2B7YNy+/hHS1+rnA7nka4BrgF5IUEb4A0cysBWoaHJe0CnAAMLy4TU9uqx4Rz0r6IfAM8C/gj8C9wCsRMS+vNhMYkqeHADPytvMkvQqsDrxUdt9mZtZ7tZ5VdSNwJ/Ag8F5vdihpVVIrYn3Sva/+B9i5g1UrLQp1saxY7xhgDMCwYcN6E6KZmXWh1sSxTEQcU6d9fg54MiJmA0i6FvgksIqkgbnVMRR4Lq8/E1gXmClpILAyMKe60og4HzgfYOTIke7GMjNrkFpPx71M0jckrS1ptcqjh/t8BthG0nKSBOwAPALcCuyZ1xkNXJ+nx+d58vI/e3zDzKx1am1xvAP8ADiJBd1EAWxQdocRcZeka4D7gHnA/aSWwg3AVZLOyGUX5U0uIiWuaaSWxt5l92lmZvVTa+I4BtgoIuoyIB0Rp7Dwqb0A00m/+VG97lvAXvXYr5mZ9V6tXVUPA282MhAzM+sbam1xzAemSLqVdGt1oGen45qZWd9Wa+K4Lj/MzKyfq/X3OC5pdCDWdww/4YZu13nqrF2bEImZtUKtV44/SQcX3UVE6bOqzKD75FNL4qlHHWZWXq1dVSML08uQznLq6XUcZmbWh9V0VlVE/LPweDYifkK6KaGZmfUztXZVbVGYXYLUAlmxIRGZmVlbq7Wr6kcsGOOYBzyFL8ozM+uXak0cOwNfYeHbqu8NnNaAmMzMrI2VuY7jFdL9pd5qXDhmZtbuak0cQyNip4ZGYmZmfUKt96qaJOkjDY3EzMz6hFpbHNsCB+YLAd8m/SpfRMRmDYvMzMzaUpnBcTMzs5rvVfV0owMxM7O+odYWh9liyfe7MivPicOsF3ynYOuPaj2rqq4krSLpGkmPSXpU0ickrSZpgqQn8t9V87qS9DNJ0yRNrbr9iZmZNVlLEgfwU+APEfFB4KPAo8AJwC0RMQK4Jc9DGpgfkR9jgHObH66ZmVU0vatK0krAZ4ADASLiHeAdSbsDo/JqlwATgeOB3YFLIyKAO3NrZe2IeL7JoZs1hMdZrK9pRYtjA2A28GtJ90u6UNLywJqVZJD/rpHXHwLMKGw/M5eZmVkLtCJxDAS2AM6NiI8Bb7CgW6oj6qBskV8jlDRG0mRJk2fPnl2fSM3MbBGtSBwzgZkRcVeev4aUSF6UtDZA/jursP66he2HAs9VVxoR50fEyIgYOXjw4IYFb2bW3zU9cUTEC8AMSR/IRTsAjwDjgdG5bDRwfZ4eDxyQz67aBnjV4xtmZq3Tqus4jgCukLQUMB04iJTExkk6BHiGBT8UdSOwCzANeDOva2ZmLdKSxBERU0g/P1tthw7WDeCwhgdlZmY1ad
V1HGZm1kc5cZiZWSlOHGZmVooTh5mZleK745otBnzbEmsmtzjMzKwUJw4zMyvFicPMzEpx4jAzs1I8OG5mgAfYrXZucZiZWSlOHGZmVoq7qsysLrrr6gJ3dy0u3OIwM7NSnDjMzKwUJw4zMyvFicPMzEpx4jAzs1JaljgkDZB0v6Tf5/n1Jd0l6QlJV+ffI0fS0nl+Wl4+vFUxm5lZa1scRwGPFubPBs6JiBHAy8AhufwQ4OWI2Ag4J69nZmYt0pLEIWkosCtwYZ4XsD1wTV7lEmCPPL17nicv3yGvb2ZmLdCqFsdPgOOA9/L86sArETEvz88EhuTpIcAMgLz81by+mZm1QNOvHJf0RWBWRNwraVSluINVo4ZlxXrHAGMAhg0bVodIzazZfKPFvqEVLY5PAV+S9BRwFamL6ifAKpIqiWwo8FyengmsC5CXrwzMqa40Is6PiJERMXLw4MGNPQIzs36s6YkjIk6MiKERMRzYG/hzROwH3ArsmVcbDVyfp8fnefLyP0fEIi0OMzNrjna6juN44BhJ00hjGBfl8ouA1XP5McAJLYrPzMxo8d1xI2IiMDFPTwe26mCdt4C9mhqYmfVZvR0n8V1+u9dOLQ4zM+sDnDjMzKwU/5CTmVmd1aO7q51PTXaLw8zMSnGLw8xsMdWoVotbHGZmVooTh5mZleLEYWZmpThxmJlZKU4cZmZWihOHmZmV4sRhZmalOHGYmVkpThxmZlaKE4eZmZXixGFmZqU4cZiZWSlOHGZmVooTh5mZldL0xCFpXUm3SnpU0sOSjsrlq0maIOmJ/HfVXC5JP5M0TdJUSVs0O2YzM1ugFS2OecC3I+JDwDbAYZI2AU4AbomIEcAteR5gZ2BEfowBzm1+yGZmVtH0xBERz0fEfXl6LvAoMATYHbgkr3YJsEee3h24NJI7gVUkrd3ksM3MLGvpGIek4cDHgLuANSPieUjJBVgjrzYEmFHYbGYuq65rjKTJkibPnj27kWGbmfVrLUscklYAfgscHRGvdbVqB2WxSEHE+RExMiJGDh48uF5hmplZlZYkDklLkpLGFRFxbS5+sdIFlf/OyuUzgXULmw8FnmtWrGZmtrBWnFUl4CLg0Yj4cWHReGB0nh4NXF8oPyCfXbUN8GqlS8vMzJpvYAv2+Snga8CDkqbksu8AZwHjJB0CPAPslZfdCOwCTAPeBA5qbrhmZlbU9MQREX+h43ELgB06WD+AwxoalJmZ1cxXjpuZWSlOHGZmVooTh5mZleLEYWZmpThxmJlZKU4cZmZWihOHmZmV4sRhZmalOHGYmVkpThxmZlaKE4eZmZXixGFmZqU4cZiZWSlOHGZmVooTh5mZleLEYWZmpThxmJlZKU4cZmZWSp9JHJJ2kvS4pGmSTmh1PGZm/VWfSBySBgD/DewMbALsI2mT1kZlZtY/9YnEAWwFTIuI6RHxDnAVsHuLYzIz65cUEa2OoVuS9gR2ioiv5/mvAVtHxOGFdcYAY/LsB4DHu6l2EPBSL0PrbR3tEEO71NEOMdSjjnaIoV3qaIcY2qWOdoihljrWi4jB3VUysJdBNIs6KFso40XE+cD5NVcoTY6Ikb0Kqpd1tEMM7VJHO8RQjzraIYZ2qaMdYmiXOtohhnrVAX2nq2omsG5hfijwXItiMTPr1/pK4rgHGCFpfUlLAXsD41sck5lZv9QnuqoiYp6kw4GbgQHA2Ih4uJfV1tyt1cA62iGGdqmjHWKoRx3tEEO71NEOMbRLHe0QQ73q6BuD42Zm1j76SleVmZm1CScOMzMrxYnDzMxKceLopyStKmkrSZ+pPFodU6tI+r8dPZocw2p1qmdZSR8ouc1l+e9Rvdz3AEmX97KOpWspq6GeRY6l1uOTtISkfy+7z/6kXyUOSWtK+mJ+rNGD7Q/o6NGIWLuIYU1JF0m6Kc9vIumQknV8HbiddJbaf+W/p9Y71maR9ElJ+/bif/JG4TGfdE+04SX2/ylJy+fp/SX9WNJ6JW
O4S9L/SNpFUkcXvNYSx27AFOAPeX5zSbWctv7xHO/B+QvFasVHrfuPiPnA4HzKfE/9rcay7ozuoOzAWjaMiPeAw7tdsRuSls6vy+/09AuJpI0l3SLpoTy/maSTS9ZxWS1lZfSJ03HrIX+D+AEwkXQl+s8l/WdEXFOimi0L08sAOwD3AZfWsP+5VF3tXlkERESsVGMMFwO/Bk7K838HrgYuqnF7gKNIx3JnRHxW0gdJCaRbXRwHALUch6S/RMS2HdRV9rmovAE2JH1gzq+EQQ3/k0LMP6qq84eUu07oXOCjkj4KHEf6X1wKbFeijo2BzwEHk16bVwMXR8TfS9RxKum+bhMBImKKpOE1bHceKdlsANxbKBfpudygRAxPAX/NCeuNSmFE/LirjSStBQwBlpX0MRbcLWIlYLlady5pH2BfYP2qpLki8M9a6wEmSDqW9N4qHsecEnVcD7xKek7fLrFd0QXAfwK/yvufKuk3wBkl6vhwcUbSQODjPYwH6EeJg/RBu2VEzAKQNBj4E1Bz4oiII4rzklYGasrcEbFi7aF2aVBEjJN0Yq53nqT53W1U5a2IeEsSkpaOiMdq7d6oHIek04AXSMcvYD/Sm7OWOrYt1tVLI4FNor7nlS9HuQ/LeRERknYHfhoRF0nq6Btvp3L8E0gfWJ8FLge+JekB4ISIqOVb97yIeLVsgyUifgb8TNK5pCRS6ba8PSIeKFVZuqPDc6TejDL/3y+QWgRDgWKSmQt8p0Q9k4DnSfdkKn4hmAtMLVHPwfnvYYWyskl0aETsVGL9jiwXEXdX/U/n1bJh/oz4DikZv1ZY9C69vJ6jPyWOJSpJI/snve+qexMY0cs6ynpD0urkb+qStiF9qyljpqRVgOtIH1QvU/4WLl+IiK0L8+dKugv4fsl6eushYC3Sh0WPSHqQBS2fAcBg4LQSVczNb9L9gc8o/QzAkiVjWD1vfwApIR9BavVsDvwPsH4N1TwkaV9ggKQRwJGkD9JaPUZKWNeSvgxcJumCiPh5rRVExH8BSFoxzcbrNW53CXCJpK9ExG9LxFxdz9PA08AnelpHrqeW57s7kyR9JCIe7EUdL0nakAXv9z2p8bUeEWcCZ0o6k/S+3JjUUwJd9BrUot+dSlGFAAAIb0lEQVRcACjp+8BHgStz0VeBqRFxfIk6/peFP2A+BIyLiKb9sJSkLYCfA5uSPjQHA3tGRJlvU8X6tgNWBv6Qb1lf63aTSL+RchXpOdkHOCwiPtmTOHpK0q2kD9e7KXQHRMSXStRRHI+YB7wYETV9q8vbr0XqHrknIu6QNAwYFRE1d5dJ+jup9TY2Ip6tWnZ8RJxdQx3LkVrWO+aim4HTI6KmbhJJU4FPRMQbeX554G8RsVmJ49g0H0dlbOQl4IAyd3qQtCupe6XyIUdE1JTI69UN2tk4Wcn/6SOkL5bTSa/NSgxlns8NSK2DTwIvA08C++UEWWsd3yB9iRhK6tLdhvR/3b7WOhapsx8ljrOBu4BtSf/A24FtSiaOYp/1PODpiJhZ10Bri2Mg6dbxAh6PiHdbEMNw4KfAp0hv0L8CR0fEU02Oo8NxhIi4rZlx9JakLUndCutR6Ako+SEzkpQ4hhfqqPmDKre8toyIt/L8MqRk+JESMUwCToqIW/P8KOD/1fqFQtJ5pK7CzwIXAnsCd0dEqRNAektSsZX1/nhmROxZoo71gFWBT+ei24FXSn7oD4iI+TmJLxERc2vdtlDHgywY09y8MqYZEV8tW9f7dfajxHFfRGxRVTa1zBszb7MmCwbJ767q/moKSZ9k4Q+HUt+ErD7qPMj/OHAsqRX5XqW85IdMr+qQdAzpbKTf5aI9SAP0PykRwwMR8dHuyrrYfmpEbFb4uwJwbUTs2O3GDVQZzyzZkj0K+DoLuv72AEp1/Ul6hnTiwtXAn3sylifpnojYUtIU0u8YvS1pSkRsXrauisV+jEPSN4FvARvkpnjFiqRvyWXqqseZWb1Sj7OI6hTHYO
AbLJrADu5smzrvv24f2j1V50H+2RHxv62sIyJ+LGkiC1rlB0XE/SWrmS7puyw4aWR/UvdKrf6V/74paR3SWGQ9xht6qyfjmYeQejUqXX9nk04trjlxkHoWdiMN0l8k6ffAVRHxlxJ11GNMcyGLfYsjf1NYFTgTKI5FzC15ah35DJfPV5+ZVeu3qXqQ9Cj1P4uoJ3FMAu4gnWr4/lldvRnY7M8k7UAaJ7qFhcdqrm1mHT0l6bKI+FputQxnQfK5jdQt8nKN9XyX9MG6A2kMLYALI+K7DQm88zh6PZ5Zj66/qvpWJXUP7xcRA3pYR4/GNKst9i2OiHiVdNbRPnWorhFnZpXV67OI6mS5MuND1q2DgA+SzsaqdDMFqZujmXX0VOUiwtGk8YnKNSBAh7/g2aGIOD1P/jZ/u14mv4eb7YeF6Z6OZ/6adGFnseuvzPVWwPsf9l8lXZh6D9Djq9rrNfa32Lc46qkeZ2b1Yt+Vb0Ar0suziOoUzxnApIi4sZn7XVxJerCn30TrWUcv9n0k8E3SdQ7Fs8IqXYc1X//QLmN49RjPzGdBvn9CTtmuP0lPkrqlxwHjK91erebEUUJ+c8wgnSVReSH8ruut6rbv7fI+zyZdnfz+IuDsWPiaimbEMxdYnpS83qWJYwuLI0kXAOdExCOtrKO3JJ0bEd/sxfYdjuFFxJH1iK9EHNXjmZ8GmjqemeNYKSJe637N5nLiKCF/y96bdJuRscDNzR5rqNfZYXWKZTXSgGHxfPs+dRpsu8hjVxuSBpJ7es5/r+totTYaw2v5eGbe7zKkQfbq61qachJKZxb7MY56ioiT8+DdjqT+5F9IGgdcFBH/aOS+63l2WJ3i+TrpnlfFi4omkQY1rbze3pqiXnW0WruM4bXDeCaks9MeI92S5TTSrX0ebUEcC3GLoweUbmZ3EOmNeivpQ3NCRBzX5Ya922fdzg6rUzx1v6jI+q82HMNr2XhmVRz3R8THCte1LEnq6ejxVd/14BZHCXmMYzTpNgoXkvo835W0BPAEC4891FWdzw6rhx7fKNGsAz9kwRjeHoXySlmzzSRdc1EZzzy/WeOZVSp3hXgl387lBUrc8r9RnDjKGQR8ufpK3Ih4T9IXWxRTq9T9oiLrvypjY5KWrB4nk7RsC0Jag3R/p/fHM1sQA8D5+fqNk0k3vVwBaOo1LR1xV5X1Wr0uKrL+qziGBxTHC1cE/hoR+7cgJrFgPHMk6ZTYho9nVsWwNPAVUiujcsfliBpv+tgobnFYr/lMKquD3wA30SZjeJA+nSW9QOoemkcaY7xGUkPHM6vU48eg6s4tDjOzKh2MZ15XHM+MiA2bFMdDEbFpM/ZVhlscZmaLapfxzHr8GFTducVhZtZmtOBXKQfSyx+DagQnDjOzNqOFf5VyEdUtoWZz4jAzs1JacQm9mZn1YU4cZmZWihOHWZuQNCr/eJFZW3PiMGsRST36+U+zVnPiMOsBScfli8SQdI6kP+fpHSRdLmkfSQ9KekjS2YXtXpd0mqS7gE9I2knSY5L+Any5sN52kqbkx/2SVmz2MZp1xonDrGduJ905FdJ9jFbIt7zelnSn5LOB7Um3CN9SUuWOr8sDD+VfbJwMXADslutaq1D/scBhEbF5Xvavxh6OWe2cOMx65l7g47kl8DbpFtwjSR/yrwATI2J2RMwDrgA+k7ebD/w2T38QeDIinsi/eHd5of6/Aj/OrZpVcj1mbcGJw6wHIuJd4CnSnVMnAXcAnyX9dOszXWz6VkTML8x3eCFVRJwFfB1YFrgz/1CWWVtw4jDrudtJXUq3kxLHoaSf0b0T2E7SoDwAvg/Q0R2EHwPWl1S5Yd77P9IlacOIeDAiziZ1aTlxWNtw4jDruTuAtYG/RcSLwFvAHRHxPHAi6WeFHwDui4jrqzeOiLeAMcANeXC8eBuJo/PA+gOk8Y2bGnsoZrXzLUfMzKwUtzjMzKwUJw4zMyvFicPMzEpx4jAzs1KcOMzMrBQnDj
MzK8WJw8zMSnHiMDOzUv4/s7ZaAJA296QAAAAASUVORK5CYII=\n"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "And the same plot for the most frequent words in spam messages:"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df2.plot.bar(legend = False, color = 'orange')\ny_pos = np.arange(len(df2[\"spam words\"]))\nplt.xticks(y_pos, df2[\"spam words\"])\nplt.title('More frequent words in spam messages')\nplt.xlabel('words')\nplt.ylabel('number')\nplt.show()",
"execution_count": 7,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAElCAYAAAD+wXUWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3Xm4HFWd//H3Jwk7kUVCWAKEJQOIsoSwuBLIjAZEwVF+yiIBoxEHBQcXcJkRGTccR4SZnzggSgABGRwkIqjIvkMiISCLhD0GSFjCHiDxO3+c01Dp1L1dfW/37ebez+t5+umu7dTprb51ljqliMDMzKzesE5nwMzMupMDhJmZlXKAMDOzUg4QZmZWygHCzMxKOUCYmVkpB4ghTtKHJD0i6XlJO3Q6P29EkiZKmteH7X4i6V/akSezVnCA6BBJD0p6RdI6dfNnSwpJYwcoKz8APhsRq0fErQO0z37r60G5m0TEYRHxb53Oh1lPHCA66wFg/9qEpLcBq/Q1MUkj+rDZJsCfW5jeoObPxIYSB4jOOhM4uDA9BTijuIKkNSSdIWmhpIckfV3SsLzsEEnXSTpB0lPAsXn+JyTdJelpSb+XtEn9jiWtJOl5YDhwm6T78vwHJR0taQ7wgqQRkjaQ9KuchwckHVFIZxVJp+d93SnpS8Uz+1wa2qIwfbqkbxWm986lpkWSrpe0bWHZg5K+KGmOpGck/VLSypJWAy4BNshVY89L2qDu/W2a06x9Vj+VtKCw/CxJn8+vN5A0Q9JTkuZK+lRhvWMlnZ/XfxY4pP49AzvV7ftoSX+V9JykeyRNqv/86z+LWolI0hckLZD0qKRDy7bL6x8i6f68jwckHViYf52k/8yf2d3F/Us6NP82nsvbf7qwrJaHLxfysK+kvST9JX8+X+0lT6dL+rGkS/J3cp2k9ST9KH9Wd6tQjdngd7WzpJmSnpX0uKQf5vkr5+/iyfz93iJpdKP3lpd/Ob+n+ZI+WfxtKv0ffiDp4by/n0haJS9bR9JFeX9PSbqm9rsa9CLCjw48gAeBvwfuAbYmHagfIZ3RBzA2r3cGcCEwEhgL/AWYmpcdAiwBPgeMIJU+9gXm5jRHAF8Hru8lHwFsUZev2cBGOb1hwCzgX4EVgc2A+4H35fW/B1wDrJ23uQOY10v6pwPfyq/HAwuAXfL7n5L3v1IhLzcDG+T07wIOy8smFvfTw3t7GNgxv74n53vrwrId8uurgB8DKwPbAwuBSXnZscCr+XMdlj+THt8zsGX+HjfI02OBzXvIX/GzmJi/y+OAFYC9gBeBtUq2Ww14FtgyT68PbFP3m/jnnM5HgWeAtfPy9wObAwJ2y/sYX5eHf83bfip/FmeTfn/bAIuBzXp5P08AO+bP8nJSKfng/P1+C7gir9vod3UD8PH8enVg1/z608BvgFVzmjsCb6rw3iYDj+X3sCrp5Oy13ybwI2BG/k5H5n18Ny/7LvCT/JmsALwbUKePIQNynOp0Bobqg9cDxNfzD3AycCnpoB75wDIceBl4S2G7TwNX5teHAA/XpXsJOYDk6WH5j7JJD/koCxCfKEzvUrKPrwA/z6/vByYXlk2jeoA4Gfi3urTvAXYr5OWgwrLvAz/JryfSOECcCRwFrJfT/T5wGLApsCh/NhsBS4GRhe2+C5yeXx8LXF2Xbo/vGdiCFPT+HlihQf6Kn8VE4CVgRGH5AvKBsW671XL+PwysUrfsEGB+8QBGCrIf7yEPvwaOrMvD8Dw9Mn9/uxTWnwXs28v7ObUw/TngrsL024BFFX9XVwPfBNapW+cTwPXAthX+Y8X39jPyAb/wPUV+FvAChUAOvB14IL8+jnSStkWjfQ62x9AoJnW3M4EDSH/sM+qWrUM6u3qoMO8hYMPC9CN122wCnJiLw4uAp0h/gA2prpjmJqSqnEWFNL8KjM7LN6hbv5jXRjYBvlCX9kY5zZrHCq9fJJ1NVn
UV6aD3HtIB50rSmeVuwDUR8be8r6ci4rm699DbZ9zje46IucDnSYFlgaRz66u/evFkRCwpTJe+34h4gVQyOAx4VNJvJW1VWOWvkY9shfxtACBpT0k35qqSRaSSSrGjxJMRsTS/fik/P15Y/lJZngrq1+1p20a/q6nA3wF352qkvfP8M4HfA+fmqqLvS1qhwnur/86Kr0eRShWzCnn5XZ4P8O+kUvkfctXVMb28/0HFAaLDIuIhUjF8L+B/6xY/QareKLYhbAz8tZhE3TaPAJ+OiDULj1Ui4vpmslWX3gN16Y2MiL3y8kdJB/Vi/opeJP35atarS/vbdWmvGhHnNJnHnlxFqg6YmF9fC7yTFCCuyuvMB9aWNLLuPfT2Gff6niPi7Ih4F69XFx5fIa9NiYjfR8Q/kKqX7gZOLSzeUJLq8jdf0krAr0g910ZHxJrAxaQTiIHW6+8qIu6NiP2BdUmf3/mSVouIVyPimxHxFuAdwN7AwRXe26PAmML+i9/fE6TgtU0hL2tExOo5L89FxBciYjPgA8BRPbUrDTYOEN1hKrBHPjN8TT6TOw/4tqSRSo3NRwFn9ZLWT4CvSNoGXmvk3q8febsZeFap4XUVScMlvVVSrWH2vLy/tSSNIVUrFM0GDsjbTSYdnGtOBQ6TtIuS1SS9v+5g3ZPHgTdLWqOnFSLiXtIf/yBSNdGzebsPkwNERDxCqrL4bm4A3Zb0ffyil333+J4lbSlpj3zAWpz3v7SHdPpE0mhJH1RqrH8ZeL5uH+sCR0haIX/3W5MOlisCK5HaFZZI2hN4byvz1oRef1eSDpI0KpfyFuVtlkraXdLbJA0ntcO8Snrvjd7becChkraWtCqp7QOAvI9TgRMkrZv3v6Gk9+XXe0vaIgfdZ/P+WvqddisHiC4QEfdFxMweFn+OVD96P+kM+GxSfWpPaV1AOuM6V6nXzR3Anv3I21LSWdP2pJLOE8BPgdqB+ZukKowHgD+QqgCKjszbLwIOJNUL19KeSWoI/S/gaVIx/pCK+bobOAe4P1cL9FSNcxWp2uThwrSA4jUf+5PafOYDFwDfiIhLe9l9b+95JVIj9hOk6rF1SVUnrTQM+ELO71OkoPtPheU3AeNyHr4NfCQinszVaEeQDpZPk6o2Z7Q4b5VU+F1NBv6s1NPuROBjEbGYVAI9n3Sgvov0fZ7V6L1FxCXAScAVpN/ZDXnRy/n56Dz/xvy/+SOpwwGkz/KPpEB8A/DjiLiyRR9FV9OyVZVm/SNpIukPO6bRutZ6kg4BPpmruKwHkrYmnTytVNfuYwUuQZjZkKA0rMyKktYilbJ/4+DQOwcIMxsqPk1qo7iP1Ibwmc5mp/u5isnMzEq5BGFmZqUcIMzMrNQbemTKddZZJ8aOHdvpbJiZvaHMmjXriYgY1Wi9N3SAGDt2LDNn9nT5gJmZlZFUaUgcVzGZmVkpBwgzMyvlAGFmZqUcIMzMrJQDhJmZlXKAMDOzUm0NEJLWVLrh+91KNxN/u6S1JV0q6d78vFZeV5JOUrpp/BxJ49uZNzMz6127SxAnAr+LiK2A7Ujjtx8DXBYR44DL8jSkexaMy49ppPsVm5lZh7TtQjlJbyLdC/gQgIh4BXhF0j6kW0ACTCfdJ/hoYB/gjHwv3Rtz6WP9iHi08k7PrnDnxAM8OKGZWRXtLEFsRhpa9+eSbpX003yLxNG1g35+XjevvyHL3kh8HsveON7MzAZQOwPECGA8cHJE7EC6beYxvaxfdvq/3Om+pGmSZkqauXDhwtbk1MzMltPOADEPmBcRN+Xp80kB43FJ6wPk5wWF9TcqbD+GdM/dZUTEKRExISImjBrVcKwpMzPro7YFiIh4DHhEUu3G35OAO0k3Ep+S500BLsyvZwAH595MuwLPNNX+YGZmLdXu0Vw/B/xC0orA/cChpKB0nqSpwMPAfnndi4G9gLnAi3ldMzPrkLYGiIiYDUwoWTSpZN0ADm9nfszMrDpfSW1mZqUcIMzMrJ
QDhJmZlXKAMDOzUg4QZmZWygHCzMxKOUCYmVkpBwgzMyvlAGFmZqUcIMzMrJQDhJmZlXKAMDOzUg4QZmZWygHCzMxKOUCYmVkpBwgzMyvlAGFmZqUcIMzMrJQDhJmZlXKAMDOzUg4QZmZWygHCzMxKOUCYmVkpBwgzMyvlAGFmZqXaGiAkPSjpdkmzJc3M89aWdKmke/PzWnm+JJ0kaa6kOZLGtzNvZmbWu4EoQeweEdtHxIQ8fQxwWUSMAy7L0wB7AuPyYxpw8gDkzczMetCJKqZ9gOn59XRg38L8MyK5EVhT0vodyJ+ZmdH+ABHAHyTNkjQtzxsdEY8C5Od18/wNgUcK287L88zMrANGtDn9d0bEfEnrApdKuruXdVUyL5ZbKQWaaQAbb7xxa3JpZmbLaWsJIiLm5+cFwAXAzsDjtaqj/Lwgrz4P2Kiw+Rhgfkmap0TEhIiYMGrUqHZm38xsSGtbgJC0mqSRtdfAe4E7gBnAlLzaFODC/HoGcHDuzbQr8EytKsrMzAZeO6uYRgMXSKrt5+yI+J2kW4DzJE0FHgb2y+tfDOwFzAVeBA5tY97MzKyBtgWIiLgf2K5k/pPApJL5ARzervyYmVlzfCW1mZmVcoAwM7NSDhBmZlbKAcLMzEo5QJiZWSkHCDMzK+UAYWZmpRwgzMyslAOEmZmVcoAwM7NSDhBmZlbKAcLMzEo5QJiZWSkHCDMzK+UAYWZmpRwgzMyslAOEmZmVcoAwM7NSDhBmZlbKAcLMzEo5QJiZWSkHCDMzK+UAYWZmpRwgzMyslAOEmZmVanuAkDRc0q2SLsrTm0q6SdK9kn4pacU8f6U8PTcvH9vuvJmZWc8GogRxJHBXYfp44ISIGAc8DUzN86cCT0fEFsAJeT0zM+uQtgYISWOA9wM/zdMC9gDOz6tMB/bNr/fJ0+Tlk/L6ZmbWAe0uQfwI+DLwtzz9ZmBRRCzJ0/OADfPrDYFHAPLyZ/L6ZmbWAW0LEJL2BhZExKzi7JJVo8KyYrrTJM2UNHPhwoUtyKmZmZVpZwnincAHJT0InEuqWvoRsKakEXmdMcD8/HoesBFAXr4G8FR9ohFxSkRMiIgJo0aNamP2zcyGtrYFiIj4SkSMiYixwMeAyyPiQOAK4CN5tSnAhfn1jDxNXn55RCxXgjAzs4HRiesgjgaOkjSX1MZwWp5/GvDmPP8o4JgO5M3MzLIRjVfpv4i4Ergyv74f2LlkncXAfgORHzMza8xXUpuZWSkHCDMzK+UAYWZmpRwgzMyslAOEmZmVcoAwM7NSDQNEHq77jwORGTMz6x4NA0RELAVelLTGAOTHzMy6RNUL5RYDt0u6FHihNjMijmhLrszMrOOqBojf5oeZmQ0RlQJEREyXtAqwcUTc0+Y8ddbZDe5RdIDHDzSzoaFSLyZJHwBmA7/L09tLmtHOjJmZWWdV7eZ6LGmAvUUAETEb2LRNeTIzsy5QNUAsiYhn6ua5rsXMbBCr2kh9h6QDgOGSxgFHANe3L1tmZtZpVUsQnwO2AV4GzgGeBT7frkyZmVnnVe3F9CLwNUnHp8l4rr3ZMjOzTqvai2knSbcDc0gXzN0macf2Zs3MzDqpahvEacA/RcQ1AJLeBfwc2LZdGTMzs86q2gbxXC04AETEtYCrmczMBrFeSxCSxueXN0v6b1IDdQAfBa5sb9bMzKyTGlUx/Ufd9DcKr30dhJnZINZrgIiI3QcqI2Zm1l0qNVJLWhM4GBhb3MbDfZuZDV5VezFdDNwI3A78rX3ZMTOzblE1QKwcEUe1NSdmZtZVqnZzPVPSpyStL2nt2qO3DSStLOnmfFHdnyV9M8/fVNJNku6V9EtJK+b5K+XpuXn52H69MzMz65eqAeIV4N+BG4BZ+TGzwTYvA3tExHbA9sBkSbsCxwMnRMQ44Glgal5/KvB0RGwBnJDXMzOzDqkaII4CtoiIsRGxaX5s1tsGkTyfJ1fIjwD2AM7P86cD++bX++Rp8vJJkhrc3s
3MzNqlaoD4M/Bis4lLGi5pNrAAuBS4D1gUEUvyKvOADfPrDYFHAPLyZ4A3l6Q5TdJMSTMXLlzYbJbMzKyiqo3US4HZkq4gVR0Bjbu5RsRSYPvcTfYCYOuy1fJzWWlhuYvxIuIU4BSACRMm+GI9M7M2qRogfp0ffRIRiyRdCewKrClpRC4ljAHm59XmARsB8ySNANYAnurrPs3MrH+q3g9ieuO1liVpFPBqDg6rAH9Pani+AvgIcC4wBbgwbzIjT9+Ql18eES4hmJl1SNUrqR+gvLqnt4bq9YHpkoaT2jrOi4iLJN0JnCvpW8CtpKHEyc9nSppLKjl8rPrbMDOzVqtaxTSh8HplYD+g1+sgImIOsEPJ/PuBnUvmL87pmplZF6jUiykiniw8/hoRPyJ1VzUzs0GqahXT+MLkMFKJYmRbcmRmZl2hahXTf/B6G8QS4EFcHWRmNqhVDRB7Ah9m2eG+PwYc14Y8mZlZF2jmOohFwJ+Axe3LjpmZdYuqAWJMRExua07MzKyrVB2L6XpJb2trTszMrKtULUG8CzgkXzD3MmncpIiIbduWMzMz66hmGqnNzGwIqToW00PtzoiZmXWXqm0QZmY2xDhAmJlZKQcIMzMr5QBhZmalHCDMzKyUA4SZmZVygDAzs1IOEGZmVsoBwszMSjlAmJlZKQcIMzMr5QBhZmalHCDMzKyUA4SZmZVygDAzs1JtCxCSNpJ0haS7JP1Z0pF5/tqSLpV0b35eK8+XpJMkzZU0R9L4duXNzMwaa2cJYgnwhYjYGtgVOFzSW4BjgMsiYhxwWZ6GdNe6cfkxDTi5jXkzM7MG2hYgIuLRiPhTfv0ccBewIbAPMD2vNh3YN7/eBzgjkhuBNSWt3678mZlZ7wakDULSWGAH4CZgdEQ8CimIAOvm1TYEHilsNi/PMzOzDmh7gJC0OvAr4PMR8Wxvq5bMi5L0pkmaKWnmwoULW5VNMzOr09YAIWkFUnD4RUT8b579eK3qKD8vyPPnARsVNh8DzK9PMyJOiYgJETFh1KhR7cu8mdkQ185eTAJOA+6KiB8WFs0ApuTXU4ALC/MPzr2ZdgWeqVVFmZnZwBvRxrTfCXwcuF3S7Dzvq8D3gPMkTQUeBvbLyy4G9gLmAi8Ch7Yxb2Zm1kDbAkREXEt5uwLApJL1Azi8XfkZUGf39LazA5ZrWjEz6zrtLEFYXzUKMNA4yLQiDTMb0jzUhpmZlXIJwnrmqjKzIc0lCDMzK+UAYWZmpVzFZO3V32oqN7abdYxLEGZmVsolCBv8XAox6xOXIMzMrJQDhJmZlXKAMDOzUg4QZmZWygHCzMxKOUCYmVkpd3M1q8LjUtkQ5BKEmZmVcoAwM7NSrmIyGygel8reYFyCMDOzUi5BmA0lLoVYE1yCMDOzUg4QZmZWygHCzMxKuQ3CzJrjiwaHDAcIMxt4DjJvCG2rYpL0M0kLJN1RmLe2pEsl3Zuf18rzJekkSXMlzZE0vl35MjOzatrZBnE6MLlu3jHAZRExDrgsTwPsCYzLj2nAyW3Ml5mZVdC2KqaIuFrS2LrZ+wAT8+vpwJXA0Xn+GRERwI2S1pS0fkQ82q78mdkbmK/nGBAD3YtpdO2gn5/XzfM3BB4prDcvzzMzsw7plkbqstOB0vAvaRqpGoqNN964nXkys8HMDeUNDXSAeLxWdSRpfWBBnj8P2Kiw3hhgflkCEXEKcArAhAkT/A2aWecM8iAz0AFiBjAF+F5+vrAw/7OSzgV2AZ5x+4OZDQldPMpv2wKEpHNIDdLrSJoHfIMUGM6TNBV4GNgvr34xsBcwF3gROLRd+TIzs2ra2Ytp/x4WTSpZN4DD25UXMzNrnsdiMjOzUg4QZmZWygHCzMxKOUCYmVkpBwgzMyvlAGFmZqUcIMzMrJQDhJmZlXKAMDOzUg4QZmZWygHCzMxKOUCYmVkpBwgzMyvlAGFmZqUcIMzMrJQDhJmZlXKAMDOzUg4QZmZWygHCzMxKOUCYmVkpBw
gzMyvlAGFmZqUcIMzMrJQDhJmZlXKAMDOzUg4QZmZWqqsChKTJku6RNFfSMZ3Oj5nZUNY1AULScOD/A3sCbwH2l/SWzubKzGzo6poAAewMzI2I+yPiFeBcYJ8O58nMbMhSRHQ6DwBI+ggwOSI+mac/DuwSEZ+tW28aMC1Pbgnc00uy6wBP9DNrgyWNbshDt6TRDXnoljS6IQ/dkkY35GGg0tgkIkY1SmREPzPRSiqZt1z0iohTgFMqJSjNjIgJ/crUIEmjG/LQLWl0Qx66JY1uyEO3pNENeeimNKC7qpjmARsVpscA8zuUFzOzIa+bAsQtwDhJm0paEfgYMKPDeTIzG7K6poopIpZI+izwe2A48LOI+HM/k61UFTVE0uiGPHRLGt2Qh25Joxvy0C1pdEMeuimN7mmkNjOz7tJNVUxmZtZFHCDMzKyUA4SZmZVygOiBpLUk7SzpPbVHxe2GSfp/7c5fhXxI0kaN12x7PraSNEnS6nXzJ1fcfriks9qTO3ujyr+Lf+50PoqKv3FJW/QxjTOaXP/M/HxkX/bXMP3B2EgtaTSwU568OSIWNLn9J4EjSddizAZ2BW6IiD0qbn91RFQKKCXbHtXb8oj4YRNpzYqIHfuSj7z9aOA7wAYRsWceG+vtEXFaxe2PAA4H7gK2B46MiAvzsj9FxPiK6fwe+EAegqVPJB1cNj8iKv8hJb0TmB0RL0g6CBgPnBgRDzXY7nZKLvokXRwaEbFtE3lYCfgwMJZCL8SIOK6JNP4OOBkYHRFvlbQt8MGI+FbVNPpL0prAwSz/Po5oIo0rI2JiC/LyjpJ8NHWgzuncBjwAnA18NyI2b7B+fTd+AbsDl+c8fLDCPu8kjV83A5hI3QXHEfFUxeyX6ppurq2Sz97/HbiS9GH9p6QvRcT5TSRzJCnA3BgRu0vaCvhmE9tfKumLwC+BF2ozK35ZI5vYTyM3StopIm7p4/anAz8Hvpan/0J6T5UCBPApYMeIeF7SWOB8SWMj4kTKr5zvyYPAdfkPVfw8KwdLXj9hAFgZmAT8CWjmQHAysJ2k7YAvkz6HM4DdGmy3dxP7aORC4BlgFvByH9M4FfgS8N8AETFH0tlAwwAh6TnKgx05rTdVzMPFwI3A7cDfKm5T7zpJ/8Xy/7M/VU0gn4FvTjoRXFpLggq/C0mrAq9ExJK83+0kfQY4h3QdVyNjgDuBn+Z9CpgA/EfV/AM/AX4HbEb6TbyWvZzmZk2ktZxBFyBIB7OdaqUGSaOAPwLNBIjFEbFYEpJWioi7JW3ZxPafyM+HF+ZV+rIioplA1MjuwGGSHiT9gZo9Y10nIs6T9JWctyWSljbaqGB4RDyft31Q0kRSkNiE5gLE/PwYRh8DaER8rjgtaQ3gzCaTWRIRIWkfUsnhNElTKuy71xJGk8ZERKXquV6sGhE3S8t8BUuqbBgRIwEkHQc8RvoMBRxIc9/NyhHRa2m5gnfk59p/pnZQrFTSzyYAb4m+VaVcDuxL+hyQ9CHgM8D7gH8G/qfCvo8kHbO+FBGzJb0UEVdVzUBEnAScJOlkUrCo1VxcHRG3NfNmygzGADGsrkrpSZpva5mXi8C/JpUGnqaJYT8iYtMm9/caSSc1SLtyEZxU9FwLeHeevhpY1MT2L0h6M/mMUdKupLPXqh6TtH1EzAbIJYm9gZ8Bb6uaSC1oShqZJlPQ6acXgXFNbvNcDpYHAe/JQ9Sv0GijXs66awG76lk3wPWS3hYRtzexTb0nJG3O69/rR4BHm0zjfRGxS2H6ZEk3Ad+vuP2Zkj4FXEShJNRklciVJfOaPdDfAaxH8+8fYJWIqAWHaaQS86SIWCjpe402joi/ASdI+p/8/Dh9PybfDZwF/C/pd3WmpFMj4j/7mB70IzPd7JJcZ31Onv4oqThbWUR8KL88VtIVwBqkYlwl/azvntV4lcr2BT5J4UdDql6o+qM5ilS3ubmk64BRwEea2P/B1J2Z5uL4wZL+u2oikt5KyvvaefoJ4O
BmrrSX9BteP3gMB7YGzqu6ffZR4ABgakQ8JmljUnVmr2pn3S3yLuBQSfeTDqxNt2OQSranAFtJ+iup3vzAJvOxVNKBpGH5A9if16toqniF9Nl9jde/l2arRIonCiuTqvLuqrJh4fcwErhT0s0sG6ga1v8DT0r6BmkMuX8EtszBYX1gxWpvASJiHrCfpPcDz1bdrs5UYNeIeAFA0vHADVT/r5cadI3U+YO5ifRHEumsedeIOHoA81D8Ul6r746IZg6urcjHHFKjcu1Hsxqpsb2ZRtERpGHVBdwTEa+2JbO95+F64GsRcUWengh8JyLe0euGy6ZRbCdYAjyU/5gDTtK6pN8FABHxcBPbbkJJqbCZaixJwyNiaf49DIuI56puW0hjLHAi8E7SgfY64PMR8WDF7e8jDeff32Gti2muBMyIiPdVWLfXdqMq1Ty5dP0ZUrC7D/gqcBupavdrEXF2lXy3Qu4IsVNELM7TKwO3RETlknppuoMwQCzXO0bSnCbPsFqdpzWAMyueldS2GQUcTbq7XvFgUrl+tRU/mlb18OgPSbdFxHaN5lVIp0+92yRdGxHvKqkqaqqKSNIHSQ2QGwALgE2AuyJimybew5EsWyrcF2iqKkHSw6QS8S+By/tY/94vucPBxyLixRamuRbpe61cdSjp+PqTx7J5FdPagBQw50REb/epaTml3o9TgAvyrH2B0yPiR/1Kd7AEiNx74J9IRdT7CotGAtdFxEEdyRggaQXSj2brJrb5A+kP/EXgMNKXv7CZH25/fzQ99fBosh2k3yRdQOpxVGtUPgiYEBH7NpFGfe+2d5MaBpvpvNAvuRvkHsAfI2IHSbsD+0fEtAabFtNoRalwFeADpJ4240ntAOdGxLVNpDGKVOc+lmVPHj7R0zZ1218AbANcwbJVO810cy12Hx5OqgI9LiL+q4k0uu6Esq8kjadQcxIRt/Y7zUEUINYgFb2/CxxTWPRckw1frchLaX13RBzT81bLpTErInYs/lglXRURjbpU1qfT5x+NpLvoew+PfpN0ZkR8PAe6sbz+Pq4CvhkRTzeR1m3AP0Rd77ZmSyH9oXwTl5yXHSLib5Jujoidm0ijpVUJ+az7RODAiBjexHbXA9eQ2sxea3tIbRVvAAAGx0lEQVSIiF9V3L6091dETG8iD5sUJpcAj+c2rirbdu0JZTcZNI3UEfEMqYfN/p3OC/CDwuu+1nfX6vofzY1X80n9ppsSqU945X7hdfrTw6MVdswHgSmket1aN0ZorpsstKZ3W38tUrra9mrgF5IWULF7acHPgZvyGTikUmHV61Jek+vgP0rq6XYL0OzV/6v2p10vIqYr3ffl7/Ksptu3mml3KXE2cAldcELZzQZNCaLb9LW+u7D93qQztI1IPRHeBBwbEb9paUbL913s4bE90JceHq3IxxGkRsDNgL8WF6VsROUeL5K+D2zHsr3b5gxE5wWlYRdGk6rqXiIFpgNJbRC/jYimeq71typB0gM5L+eRGnVfaLBJWRrfAq6PiKZ6CBa2nwhMJ10EKdLvfEpEXN2X9Pqw/zdFxLOS1i5b7iCROEC0QSvquyVNJw1NsShPrw38oGodb3/ks0sBx5OuGH5tEXB8LNv/ve0knRwRn+lnGkcAj5C+i9qB9YLet2oNSRcBX42IOXXzJwDfiIgPDEQ+Cvt9U0T0tTtlLY3ngNVIJw6v0nyD/SzggFpjrtLwH+dEP4aGaYakiyJi7xwsa1cx1zR18jGYDZoqpi7Tiqu5t60FB0hnNJJ2aG02y9W6+Elaob67X27gHFD9DQ7ZusARpOq2n5HuXDhQxtYHB4CImJm7iw60VyQdTmokLvaQq3zyEREj80nLuGIaTVih2NMnIv6SO3MMiIioDX9yLanK75qIuHug9v9G4dFc26MV9d3DcgMi8FoJYkACuqTP5MbQLSXNKTweAJY70L0RRMTXSQez04BDgHslfUfpiuJ26+0AOuABl9QbbD3SkBBXkdq2mr
oWQmlAy6tI3WWPzc//2kQSMyWdJmlifpxKay8SrernwPqkMdvuk3S+2jQy6huRq5jaoBX13UpXY3+FVOoIUiPityOi2fGDmtZNPcJaTWmgvUOByaQulrsCl0bEl3vdsH/7PId0vcGpdfOnAu+NiI+2a9895OfW3M12TkRsm8/cfx99uMaGNKDl9soDWlZ9L0oXtR3Oshe0/jgi+joAYZ8pDZmyE3nsMuCliNhqoPPRjRwg2qBV9d1Kw2vvkdO4LCLubGlGh5D8nUwBniCNnvnriHhV0jDg3mgwNHM/9z2adC3KK7x+ljyBNBzDhyKP5zNQal1rJV1N6ur5GKkjRTON/rdExE6SZpOuiH5Z0uyI2L7CtsOB6d3QlVTSZaS2lBtInUKubbZDyWDmNoj2aEl9dw4IDgqtsQ7wj/VdI/O1CK0cjns5EfE48I58Ydxb8+zfRsTl7dxvL07J1ZdfJ421tTrwL02m0ecBLSMN8zFK0orRj3t8tMgcYEfS9/IMqSvyDRHxUmez1R1cgmgTSQLeS6rOmEDqUnhaRNzX64ZmbaZlbzpUaxiOaOKmQ3Xp7UYe0LLqAV9psMbxpADV13t8tEy+PuVQ0sgF60XESp3IR7dxCaJNIiIkPUYqvi8h1emfL6mt9d1mFbTipkOvqe/p1pva1fGkdrkT6Mc9PlpB0mdJVcE7Ag+RSvzXdCo/3cYliDboZH23WSOS7oiItzZesy37rt0i8zekW2QuY6A7QUj6EqmBfFbVYTqGEpcg2qNj9d1mFbTipkN9VbtF5qbAzML8ltwis1kR0fB+HkOZSxBmQ4ReH/10BOmakP7cdKi/een31fHWfg4QZkNE3einy+nn4Hc2CDlAmJlZKQ+1YWZmpRwgzMyslAOE2QDLg9Nd1Ol8mDXiAGHWZnnsIbM3HAcIs15I+nK+8BFJJ0i6PL+eJOksSftLul3SHZKOL2z3vKTjJN0EvF3SZEl3S7oW+MfCertJmp0ft0rq2FXFZvUcIMx6dzVpKAZIY2qtnofHfhdwL+mue3uQbs26k6R987qrAXfku+/NBE4FPpDTWq+Q/heBw/MoqO8m3ZLUrCs4QJj1bhawYz6zf5k0LPQE0sF8EXBlRCzMwzT8AnhP3m4p8Kv8eivggYi4N1K/8rMK6V8H/DCXUtb0cA/WTRwgzHoREa8CD5JG+ryeNJDb7sDmwMO9bLo4IpYWk+oh/e8BnyTdWe7GfOMds67gAGHW2NWkqqCrSQHiMGA2cCOwm6R1ckP0/qTbcNa7G9i0cHvT/WsLJG0eEbdHxPGkqigHCOsaDhBmjV1Dum/xDfnmP4tJN7l/lHRb2CuA24A/RcSF9RtHxGJgGvDb3EhdHNLi87mB+zZS+8Ml7X0rZtV5qA0zMyvlEoSZmZVygDAzs1IOEGZmVsoBwszMSjlAmJlZKQcIMzMr5QBhZmalHCDMzKzU/wEZmguQCheaxwAAAABJRU5ErkJggg==\n"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "From these plots, most of the frequent words in both classes are [`stop words`](https://en.wikipedia.org/wiki/Stop_words) such as 'to', 'a' and 'or'. Because such words occur in spam and non-spam messages alike, they carry little information for telling the classes apart, so removing them is likely to help the model.\n\n### Feature Engineering:-\nThe features we feed to a [predictive model](https://en.wikipedia.org/wiki/Predictive_modelling) strongly influence the results it can achieve: both their quality and their quantity affect how good the model is.\nSo, the first step is to remove stop words. `CountVectorizer(stop_words='english')` discards scikit-learn's built-in English stop-word list and turns each message into a vector of word counts."
},
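To see why dropping stop words changes which words look "frequent", here is a minimal pure-Python sketch using `collections.Counter`. The three messages and the `STOP_WORDS` set are hypothetical examples chosen for illustration; `CountVectorizer(stop_words='english')` uses scikit-learn's own, much larger built-in list and a different tokenizer.

```python
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"to", "a", "or", "the", "you", "is", "for"}


def top_words(messages, k, stop_words=frozenset()):
    """Return the k most common lowercase tokens, optionally skipping stop words."""
    counts = Counter(
        token
        for message in messages
        for token in message.lower().split()
        if token not in stop_words
    )
    return counts.most_common(k)


messages = [
    "Free entry to win a prize",
    "Call now to claim a free prize",
    "Reply to win or call",
]
print(top_words(messages, 3))              # [('to', 3), ('free', 2), ('win', 2)]
print(top_words(messages, 3, STOP_WORDS))  # [('free', 2), ('win', 2), ('prize', 2)]
```

With the stop words removed, the most common tokens are the ones that actually hint at spam ("free", "win", "prize").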
{
"metadata": {
"trusted": true
},
"cell_type": "code",
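"source": "# bag-of-words counts, ignoring scikit-learn's built-in English stop words\nf = feature_extraction.text.CountVectorizer(stop_words = 'english')\nX = f.fit_transform(data[\"v2\"])\n# (number of messages, vocabulary size)\nnp.shape(X)",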
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/plain": "(5572, 8409)"
},
"metadata": {}
}
]
},
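The shape `(5572, 8409)` means one row per message and one column per vocabulary word. The sketch below is a simplified pure-Python picture of what `CountVectorizer` computes. It assumes plain whitespace tokenization (scikit-learn's default token pattern instead keeps runs of two or more word characters) and returns a dense list of lists, whereas `fit_transform` returns a sparse matrix. The `bag_of_words` helper and the example messages are hypothetical.

```python
def bag_of_words(messages, stop_words=frozenset()):
    """Return (vocabulary, counts) where counts[i][j] is how often
    vocabulary[j] occurs in messages[i]."""
    tokenized = [
        [token for token in message.lower().split() if token not in stop_words]
        for message in messages
    ]
    # Sorted vocabulary, so the column order is deterministic
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    column = {token: j for j, token in enumerate(vocabulary)}

    counts = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for token in tokens:
            row[column[token]] += 1
        counts.append(row)
    return vocabulary, counts


messages = ["Win a free prize", "Call me later", "free free call"]
vocabulary, counts = bag_of_words(messages, stop_words={"a", "me"})
print(vocabulary)                      # ['call', 'free', 'later', 'prize', 'win']
print((len(counts), len(vocabulary)))  # (3, 5)
print(counts[2])                       # [1, 2, 0, 0, 0]
```

Most entries of a real SMS count matrix are zero, which is why scikit-learn stores it sparsely.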
{
"metadata": {},
"cell_type": "markdown",
"source": "Our goal is to classify each message as spam or non-spam.\n\n**Predictive Modelling**:-\n\nFirst, transform the categorical target (spam/non-spam) into a binary variable (1/0) using *[label encoding](https://medium.com/@contactsunny/label-encoder-vs-one-hot-encoder-in-machine-learning-3fc273365621)*."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "data[\"v1\"] = data[\"v1\"].map({'spam':1, 'ham':0})",
"execution_count": 9,
"outputs": []
},
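`Series.map` with a dict silently turns any label that is missing from the dict into `NaN`. If you would rather have the encoding fail loudly on unexpected labels, a minimal pure-Python sketch looks like this; `encode_labels` is a hypothetical helper, not part of pandas or scikit-learn.

```python
LABELS = {"ham": 0, "spam": 1}


def encode_labels(raw_labels):
    """Map 'ham'/'spam' to 0/1, raising ValueError on any other label."""
    encoded = []
    for label in raw_labels:
        key = label.strip().lower()  # tolerate stray spaces and capitals
        if key not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        encoded.append(LABELS[key])
    return encoded


print(encode_labels(["ham", "spam", " Spam "]))  # [0, 1, 1]
```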
{
"metadata": {},
"cell_type": "markdown",
"source": "Now, split the data into train and test set."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "X_train, X_test, y_train, y_test = model_selection.train_test_split(X, data['v1'], test_size=0.33, random_state=42)\nprint([np.shape(X_train), np.shape(X_test)])",
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"text": "[(3733, 8409), (1839, 8409)]\n",
"name": "stdout"
}
]
},
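The split above is purely random. Because spam is the minority class, you may prefer a *stratified* split that keeps the spam/non-spam ratio the same in both parts; `train_test_split` supports this through its `stratify` argument (for example `stratify=data['v1']`). The idea can be sketched in pure Python as follows; `stratified_split` is a hypothetical helper and the label list is made-up example data.

```python
import random


def stratified_split(labels, test_size, seed=42):
    """Return sorted (train_idx, test_idx) lists in which every class
    contributes about `test_size` of its members to the test part."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [1] * 13 + [0] * 87  # 13% "spam", 87% "ham"
train_idx, test_idx = stratified_split(labels, test_size=0.33)
print(len(train_idx), len(test_idx))     # 67 33
print(sum(labels[i] for i in test_idx))  # 4 spam messages in the test part
```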
{
"metadata": {},
"cell_type": "markdown",
"source": "Now we use scikit-learn's built-in `SVC` (support vector classifier) with its default Gaussian (`rbf`) kernel.\nWe train one model for each value of the regularization parameter `C` and evaluate its accuracy, recall and precision on the test set."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# values of the regularization parameter C to try\nlist_C = np.arange(500, 2000, 100)\n# one score per value of C\nscore_train = np.zeros(len(list_C))\nscore_test = np.zeros(len(list_C))\nrecall_test = np.zeros(len(list_C))\nprecision_test = np.zeros(len(list_C))\n\nfor count, C in enumerate(list_C):\n    # support vector classifier; the default kernel is 'rbf' (Gaussian)\n    clf = svm.SVC(C=C)\n\n    # learn from the training messages\n    clf.fit(X_train, y_train)\n    y_pred = clf.predict(X_test)  # predict once and reuse for both metrics\n    score_train[count] = clf.score(X_train, y_train)\n    score_test[count] = clf.score(X_test, y_test)\n    recall_test[count] = metrics.recall_score(y_test, y_pred)\n    precision_test[count] = metrics.precision_score(y_test, y_pred)",
"execution_count": 11,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "**Accuracy metrics**\n![confusion_matrix_01](../images/confusion_matrix_1.png)\nLet's look at how the accuracy metrics vary with the parameter C."
},
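For reference, the three metrics reported in the table can be computed directly from true and predicted labels, treating spam (1) as the positive class. The sketch below is a plain-Python illustration of the standard definitions, not scikit-learn's implementation; the example label lists are made up.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def scores(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # share of messages flagged as spam that really are spam
    precision = tp / (tp + fp) if tp + fp else 0.0
    # share of real spam messages that were flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(scores(y_true, y_pred))  # (0.9, 1.0, 0.75)
```

A precision of 1.0 with a recall below 1.0 is the same pattern the SVM shows: nothing legitimate is flagged, but some spam slips through.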
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "matrix = np.c_[list_C, score_train, score_test, recall_test, precision_test]\nmodels = pd.DataFrame(data = matrix, columns = \n ['C', 'Train Accuracy', 'Test Accuracy', 'Test Recall', 'Test Precision'])\nmodels.head(10)",
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 12,
"data": {
"text/plain": " C Train Accuracy Test Accuracy Test Recall Test Precision\n0 500.0 0.994910 0.982599 0.873016 1.0\n1 600.0 0.995714 0.982599 0.873016 1.0\n2 700.0 0.996785 0.982599 0.873016 1.0\n3 800.0 0.997053 0.982599 0.873016 1.0\n4 900.0 0.997589 0.983143 0.876984 1.0\n5 1000.0 0.998125 0.983143 0.876984 1.0\n6 1100.0 0.998928 0.983143 0.876984 1.0\n7 1200.0 0.999732 0.983143 0.876984 1.0\n8 1300.0 1.000000 0.983143 0.876984 1.0\n9 1400.0 1.000000 0.983143 0.876984 1.0",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>C</th>\n <th>Train Accuracy</th>\n <th>Test Accuracy</th>\n <th>Test Recall</th>\n <th>Test Precision</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>500.0</td>\n <td>0.994910</td>\n <td>0.982599</td>\n <td>0.873016</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>600.0</td>\n <td>0.995714</td>\n <td>0.982599</td>\n <td>0.873016</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>700.0</td>\n <td>0.996785</td>\n <td>0.982599</td>\n <td>0.873016</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>800.0</td>\n <td>0.997053</td>\n <td>0.982599</td>\n <td>0.873016</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>900.0</td>\n <td>0.997589</td>\n <td>0.983143</td>\n <td>0.876984</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>1000.0</td>\n <td>0.998125</td>\n <td>0.983143</td>\n <td>0.876984</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>6</th>\n <td>1100.0</td>\n <td>0.998928</td>\n <td>0.983143</td>\n <td>0.876984</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>7</th>\n <td>1200.0</td>\n <td>0.999732</td>\n <td>0.983143</td>\n <td>0.876984</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>8</th>\n <td>1300.0</td>\n <td>1.000000</td>\n <td>0.983143</td>\n <td>0.876984</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>9</th>\n <td>1400.0</td>\n <td>1.000000</td>\n <td>0.983143</td>\n <td>0.876984</td>\n <td>1.0</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Let's check the model with the highest test precision."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "best_index = models['Test Precision'].idxmax()\nmodels.iloc[best_index, :]",
"execution_count": 13,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 13,
"data": {
"text/plain": "C 500.000000\nTrain Accuracy 0.994910\nTest Accuracy 0.982599\nTest Recall 0.873016\nTest Precision 1.000000\nName: 0, dtype: float64"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "A test precision of 1.0 means this model produces no `false positives`: no non-spam message is flagged as spam, which is exactly what we want from a spam filter.\nLet's check whether more than one model reaches 100% precision."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "models[models['Test Precision']==1].head(5)",
"execution_count": 14,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 14,
"data": {
"text/plain": " C Train Accuracy Test Accuracy Test Recall Test Precision\n0 500.0 0.994910 0.982599 0.873016 1.0\n1 600.0 0.995714 0.982599 0.873016 1.0\n2 700.0 0.996785 0.982599 0.873016 1.0\n3 800.0 0.997053 0.982599 0.873016 1.0\n4 900.0 0.997589 0.983143 0.876984 1.0",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>C</th>\n <th>Train Accuracy</th>\n <th>Test Accuracy</th>\n <th>Test Recall</th>\n <th>Test Precision</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>500.0</td>\n <td>0.994910</td>\n <td>0.982599</td>\n <td>0.873016</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>600.0</td>\n <td>0.995714</td>\n <td>0.982599</td>\n <td>0.873016</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>700.0</td>\n <td>0.996785</td>\n <td>0.982599</td>\n <td>0.873016</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>800.0</td>\n <td>0.997053</td>\n <td>0.982599</td>\n <td>0.873016</td>\n <td>1.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>900.0</td>\n <td>0.997589</td>\n <td>0.983143</td>\n <td>0.876984</td>\n <td>1.0</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Among the models with the highest possible precision, we select the one with the highest test accuracy."
},
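In the table above, C = 900 through 1400 all tie at a test accuracy of 0.983143. `idxmax` returns the index of the *first* maximum, so the next cell picks the smallest C among the tied models. The same selection rule in plain Python, using the first ten rows of the results table as hard-coded example data, looks like this:

```python
# (C, test accuracy, test precision) for the first ten rows of the results table
rows = [
    (500, 0.982599, 1.0), (600, 0.982599, 1.0), (700, 0.982599, 1.0),
    (800, 0.982599, 1.0), (900, 0.983143, 1.0), (1000, 0.983143, 1.0),
    (1100, 0.983143, 1.0), (1200, 0.983143, 1.0), (1300, 0.983143, 1.0),
    (1400, 0.983143, 1.0),
]

# Keep only models with perfect precision, then take the highest test accuracy.
perfect = [row for row in rows if row[2] == 1.0]
# max() returns the first maximal element, mirroring idxmax on ties,
# so the smallest C among equally accurate models wins.
best_C, best_accuracy, _ = max(perfect, key=lambda row: row[1])
print(best_C, best_accuracy)  # 900 0.983143
```

Preferring the smallest C among ties also means preferring the most strongly regularized of the equally good models.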
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "best_index = models[models['Test Precision']==1]['Test Accuracy'].idxmax()\n\n# check with the best parameter(C) value \nclf = svm.SVC(C=list_C[best_index])\nclf.fit(X_train, y_train)\nmodels.iloc[best_index, :]",
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 15,
"data": {
"text/plain": "C 900.000000\nTrain Accuracy 0.997589\nTest Accuracy 0.983143\nTest Recall 0.876984\nTest Precision 1.000000\nName: 4, dtype: float64"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "confusion_matrix_test = metrics.confusion_matrix(y_test, clf.predict(X_test))\npd.DataFrame(data = confusion_matrix_test, columns = ['Predicted, non-spam(0)', 'Predicted, spam(1)'], index = ['Actual, non-spam(0)', 'Actual, spam(1)'])",
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 16,
"data": {
"text/plain": " Predicted, non-spam(0) Predicted, spam(1)\nActual, non-spam(0) 1587 0\nActual, spam(1) 31 221",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Predicted, non-spam(0)</th>\n <th>Predicted, spam(1)</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>Actual, non-spam(0)</th>\n <td>1587</td>\n <td>0</td>\n </tr>\n <tr>\n <th>Actual, spam(1)</th>\n <td>31</td>\n <td>221</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
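"source": "The model misclassifies 31 spam messages as non-spam, while no non-spam message is misclassified as spam.\n\n### Results:-\nThe SVM classifier reaches 98.31% accuracy on the test set.\n\nIt classifies every non-spam message correctly: there are no false positives, so test precision is 1.0.\n\nIt classifies 87.7% of the spam messages correctly (221 of 252; this is the model's recall).\n\nNote that C was selected using the test-set scores themselves, so these figures are likely somewhat optimistic; a separate validation set or cross-validation would give a less biased estimate."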
},
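The headline numbers follow directly from the confusion matrix above (1587 true negatives, 0 false positives, 31 false negatives, 221 true positives). A quick arithmetic check, where specificity is the share of non-spam messages classified correctly:

```python
# Counts taken from the test-set confusion matrix above
tn, fp, fn, tp = 1587, 0, 31, 221

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 1808 / 1839
precision = tp / (tp + fp)                  # 221 / 221
recall = tp / (tp + fn)                     # 221 / 252
specificity = tn / (tn + fp)                # 1587 / 1587

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} specificity={specificity:.4f}")
# accuracy=0.9831 precision=1.0000 recall=0.8770 specificity=1.0000
```

Because there are no false positives, precision and specificity are both exactly 1.0 here; all of the model's errors are missed spam.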
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"toc": {
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"base_numbering": 1,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"language_info": {
"name": "python",
"version": "3.6.8",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"gist": {
"id": "",
"data": {
"description": "This is the notebook for SMS Spam Classification Using SVMs.",
"public": true
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}