@ishridharhegde
Created June 20, 2019 10:00
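The notebook below encodes categorical columns by hand, using small helper functions and `replace` mappings. A mapping only converts the labels it lists exactly, so a label spelled slightly differently in the data (the job column contains 'admin.' with a trailing period) would stay unencoded without raising any error. The following is a minimal sketch in plain Python, not code from the notebook: `encode_categories` and its `missing_labels` parameter are hypothetical names, and the codes are a subset of the notebook's job mapping.

```python
# Hypothetical helper (not part of the notebook): encode categories with an
# explicit mapping and fail loudly on any label the mapping does not cover.
def encode_categories(values, mapping, missing_labels=("unknown",)):
    """Return a list of codes; labels in missing_labels become None."""
    unmapped = sorted({v for v in values
                       if v not in mapping and v not in missing_labels})
    if unmapped:
        raise KeyError(f"no code assigned for: {unmapped}")
    return [None if v in missing_labels else mapping[v] for v in values]


# Subset of the notebook's job codes, with 'admin.' spelled as in the data.
job_codes = {"management": 0, "technician": 1, "admin.": 5, "services": 6}

jobs = ["services", "admin.", "unknown", "management"]
encoded = encode_categories(jobs, job_codes)
print(encoded)  # [6, 5, None, 0]
```

Raising on unmapped labels is only one possible design; mapping them to a sentinel code would also work, as long as the mismatch does not pass silently.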
{
"cells": [
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [],
"source": [
"#This is the Internity Foundation's Machine Learning Internship day 5 task\n",
"#The task is to implement Data Preprocessing on any data set of our choice\n",
"#Shridhar Hegde 18.Jun.2019\n",
"\n",
"# Data Set Information:\n",
"\n",
"# The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed ('yes') or not ('no').\n",
"\n",
"# There are four datasets:\n",
"# 1) bank-additional-full.csv with all examples (41188) and 20 inputs, ordered by date (from May 2008 to November 2010), very close to the data analyzed in [Moro et al., 2014]\n",
"# 2) bank-additional.csv with 10% of the examples (4119), randomly selected from 1), and 20 inputs.\n",
"# 3) bank-full.csv with all examples and 17 inputs, ordered by date (older version of this dataset with fewer inputs).\n",
"# 4) bank.csv with 10% of the examples and 17 inputs, randomly selected from 3) (older version of this dataset with fewer inputs).\n",
"# The smallest datasets are provided to test more computationally demanding machine learning algorithms (e.g., SVM).\n",
"\n",
"# The classification goal is to predict whether the client will subscribe to a term deposit (yes/no; variable y).\n",
"\n",
"\n",
"# Attribute Information (this list describes the 20-input bank-additional files; the file loaded in this notebook appears to use the older 17-column layout, which includes 'balance' and 'day' columns):\n",
"\n",
"# Input variables:\n",
"# # bank client data:\n",
"# 1 - age (numeric)\n",
"# 2 - job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')\n",
"# 3 - marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)\n",
"# 4 - education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')\n",
"# 5 - default: has credit in default? (categorical: 'no','yes','unknown')\n",
"# 6 - housing: has housing loan? (categorical: 'no','yes','unknown')\n",
"# 7 - loan: has personal loan? (categorical: 'no','yes','unknown')\n",
"# # related to the last contact of the current campaign:\n",
"# 8 - contact: contact communication type (categorical: 'cellular','telephone')\n",
"# 9 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')\n",
"# 10 - day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')\n",
"# 11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.\n",
"# # other attributes:\n",
"# 12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)\n",
"# 13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)\n",
"# 14 - previous: number of contacts performed before this campaign and for this client (numeric)\n",
"# 15 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')\n",
"# # social and economic context attributes\n",
"# 16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)\n",
"# 17 - cons.price.idx: consumer price index - monthly indicator (numeric)\n",
"# 18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)\n",
"# 19 - euribor3m: euribor 3 month rate - daily indicator (numeric)\n",
"# 20 - nr.employed: number of employees - quarterly indicator (numeric)\n",
"\n",
"# Output variable (desired target):\n",
"# 21 - y - has the client subscribed to a term deposit? (binary: 'yes','no')\n"
]
},
{
"cell_type": "code",
"execution_count": 286,
"metadata": {},
"outputs": [],
"source": [
"#Import the libraries\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"#Read the data\n",
"df = pd.read_csv(\"bankmarketing.csv\", sep=\";\")"
]
},
{
"cell_type": "code",
"execution_count": 287,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>unemployed</td>\n",
" <td>married</td>\n",
" <td>primary</td>\n",
" <td>no</td>\n",
" <td>1787</td>\n",
" <td>no</td>\n",
" <td>no</td>\n",
" <td>cellular</td>\n",
" <td>19</td>\n",
" <td>oct</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>services</td>\n",
" <td>married</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>4789</td>\n",
" <td>yes</td>\n",
" <td>yes</td>\n",
" <td>cellular</td>\n",
" <td>11</td>\n",
" <td>may</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>339</td>\n",
" <td>4</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>management</td>\n",
" <td>single</td>\n",
" <td>tertiary</td>\n",
" <td>no</td>\n",
" <td>1350</td>\n",
" <td>yes</td>\n",
" <td>no</td>\n",
" <td>cellular</td>\n",
" <td>16</td>\n",
" <td>apr</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>330</td>\n",
" <td>1</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>management</td>\n",
" <td>married</td>\n",
" <td>tertiary</td>\n",
" <td>no</td>\n",
" <td>1476</td>\n",
" <td>yes</td>\n",
" <td>yes</td>\n",
" <td>unknown</td>\n",
" <td>3</td>\n",
" <td>jun</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>blue-collar</td>\n",
" <td>married</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>0</td>\n",
" <td>yes</td>\n",
" <td>no</td>\n",
" <td>unknown</td>\n",
" <td>5</td>\n",
" <td>may</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan \\\n",
"0 30 unemployed married primary no 1787 no no \n",
"1 33 services married secondary no 4789 yes yes \n",
"2 35 management single tertiary no 1350 yes no \n",
"3 30 management married tertiary no 1476 yes yes \n",
"4 59 blue-collar married secondary no 0 yes no \n",
"\n",
" contact day month duration campaign pdays previous poutcome y \n",
"0 cellular 19 oct 79 1 -1 0 unknown no \n",
"1 cellular 11 may 220 1 339 4 failure no \n",
"2 cellular 16 apr 185 1 330 1 failure no \n",
"3 unknown 3 jun 199 4 -1 0 unknown no \n",
"4 unknown 5 may 226 1 -1 0 unknown no "
]
},
"execution_count": 287,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Display head of the data\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 288,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4516</th>\n",
" <td>33</td>\n",
" <td>services</td>\n",
" <td>married</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>-333</td>\n",
" <td>yes</td>\n",
" <td>no</td>\n",
" <td>cellular</td>\n",
" <td>30</td>\n",
" <td>jul</td>\n",
" <td>329</td>\n",
" <td>5</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4517</th>\n",
" <td>57</td>\n",
" <td>self-employed</td>\n",
" <td>married</td>\n",
" <td>tertiary</td>\n",
" <td>yes</td>\n",
" <td>-3313</td>\n",
" <td>yes</td>\n",
" <td>yes</td>\n",
" <td>unknown</td>\n",
" <td>9</td>\n",
" <td>may</td>\n",
" <td>153</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4518</th>\n",
" <td>57</td>\n",
" <td>technician</td>\n",
" <td>married</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>295</td>\n",
" <td>no</td>\n",
" <td>no</td>\n",
" <td>cellular</td>\n",
" <td>19</td>\n",
" <td>aug</td>\n",
" <td>151</td>\n",
" <td>11</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4519</th>\n",
" <td>28</td>\n",
" <td>blue-collar</td>\n",
" <td>married</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>1137</td>\n",
" <td>no</td>\n",
" <td>no</td>\n",
" <td>cellular</td>\n",
" <td>6</td>\n",
" <td>feb</td>\n",
" <td>129</td>\n",
" <td>4</td>\n",
" <td>211</td>\n",
" <td>3</td>\n",
" <td>other</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4520</th>\n",
" <td>44</td>\n",
" <td>entrepreneur</td>\n",
" <td>single</td>\n",
" <td>tertiary</td>\n",
" <td>no</td>\n",
" <td>1136</td>\n",
" <td>yes</td>\n",
" <td>yes</td>\n",
" <td>cellular</td>\n",
" <td>3</td>\n",
" <td>apr</td>\n",
" <td>345</td>\n",
" <td>2</td>\n",
" <td>249</td>\n",
" <td>7</td>\n",
" <td>other</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan \\\n",
"4516 33 services married secondary no -333 yes no \n",
"4517 57 self-employed married tertiary yes -3313 yes yes \n",
"4518 57 technician married secondary no 295 no no \n",
"4519 28 blue-collar married secondary no 1137 no no \n",
"4520 44 entrepreneur single tertiary no 1136 yes yes \n",
"\n",
" contact day month duration campaign pdays previous poutcome y \n",
"4516 cellular 30 jul 329 5 -1 0 unknown no \n",
"4517 unknown 9 may 153 1 -1 0 unknown no \n",
"4518 cellular 19 aug 151 11 -1 0 unknown no \n",
"4519 cellular 6 feb 129 4 211 3 other no \n",
"4520 cellular 3 apr 345 2 249 7 other no "
]
},
"execution_count": 288,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Display the tail of data\n",
"df.tail()"
]
},
{
"cell_type": "code",
"execution_count": 289,
"metadata": {},
"outputs": [],
"source": [
"#REPLACE CATEGORICAL DATA WITH NUMERICAL DATA\n",
"#If the marital status is \"single\" replace it with 0 else replace with 1\n",
"def replace_marital(val):\n",
"    if val == \"single\":\n",
"        return 0\n",
"    else:\n",
"        return 1\n",
"\n",
"df['marital'] = df['marital'].apply(replace_marital)"
]
},
{
"cell_type": "code",
"execution_count": 290,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>unemployed</td>\n",
" <td>1</td>\n",
" <td>primary</td>\n",
" <td>no</td>\n",
" <td>1787</td>\n",
" <td>no</td>\n",
" <td>no</td>\n",
" <td>cellular</td>\n",
" <td>19</td>\n",
" <td>oct</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>services</td>\n",
" <td>1</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>4789</td>\n",
" <td>yes</td>\n",
" <td>yes</td>\n",
" <td>cellular</td>\n",
" <td>11</td>\n",
" <td>may</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>339</td>\n",
" <td>4</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>management</td>\n",
" <td>0</td>\n",
" <td>tertiary</td>\n",
" <td>no</td>\n",
" <td>1350</td>\n",
" <td>yes</td>\n",
" <td>no</td>\n",
" <td>cellular</td>\n",
" <td>16</td>\n",
" <td>apr</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>330</td>\n",
" <td>1</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>management</td>\n",
" <td>1</td>\n",
" <td>tertiary</td>\n",
" <td>no</td>\n",
" <td>1476</td>\n",
" <td>yes</td>\n",
" <td>yes</td>\n",
" <td>unknown</td>\n",
" <td>3</td>\n",
" <td>jun</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>blue-collar</td>\n",
" <td>1</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>0</td>\n",
" <td>yes</td>\n",
" <td>no</td>\n",
" <td>unknown</td>\n",
" <td>5</td>\n",
" <td>may</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan \\\n",
"0 30 unemployed 1 primary no 1787 no no \n",
"1 33 services 1 secondary no 4789 yes yes \n",
"2 35 management 0 tertiary no 1350 yes no \n",
"3 30 management 1 tertiary no 1476 yes yes \n",
"4 59 blue-collar 1 secondary no 0 yes no \n",
"\n",
" contact day month duration campaign pdays previous poutcome y \n",
"0 cellular 19 oct 79 1 -1 0 unknown no \n",
"1 cellular 11 may 220 1 339 4 failure no \n",
"2 cellular 16 apr 185 1 330 1 failure no \n",
"3 unknown 3 jun 199 4 -1 0 unknown no \n",
"4 unknown 5 may 226 1 -1 0 unknown no "
]
},
"execution_count": 290,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 291,
"metadata": {},
"outputs": [],
"source": [
"#REPLACE CATEGORICAL DATA WITH NUMERICAL DATA\n",
"#Replace \"no\" in the \"housing\" column with 0 and any other value (\"yes\") with 1\n",
"def replace_housing(val):\n",
"    if val == \"no\":\n",
"        return 0\n",
"    else:\n",
"        return 1\n",
"\n",
"df['housing'] = df['housing'].apply(replace_housing)"
]
},
{
"cell_type": "code",
"execution_count": 292,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>unemployed</td>\n",
" <td>1</td>\n",
" <td>primary</td>\n",
" <td>no</td>\n",
" <td>1787</td>\n",
" <td>0</td>\n",
" <td>no</td>\n",
" <td>cellular</td>\n",
" <td>19</td>\n",
" <td>oct</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>services</td>\n",
" <td>1</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>4789</td>\n",
" <td>1</td>\n",
" <td>yes</td>\n",
" <td>cellular</td>\n",
" <td>11</td>\n",
" <td>may</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>339</td>\n",
" <td>4</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>management</td>\n",
" <td>0</td>\n",
" <td>tertiary</td>\n",
" <td>no</td>\n",
" <td>1350</td>\n",
" <td>1</td>\n",
" <td>no</td>\n",
" <td>cellular</td>\n",
" <td>16</td>\n",
" <td>apr</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>330</td>\n",
" <td>1</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>management</td>\n",
" <td>1</td>\n",
" <td>tertiary</td>\n",
" <td>no</td>\n",
" <td>1476</td>\n",
" <td>1</td>\n",
" <td>yes</td>\n",
" <td>unknown</td>\n",
" <td>3</td>\n",
" <td>jun</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>blue-collar</td>\n",
" <td>1</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>no</td>\n",
" <td>unknown</td>\n",
" <td>5</td>\n",
" <td>may</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan \\\n",
"0 30 unemployed 1 primary no 1787 0 no \n",
"1 33 services 1 secondary no 4789 1 yes \n",
"2 35 management 0 tertiary no 1350 1 no \n",
"3 30 management 1 tertiary no 1476 1 yes \n",
"4 59 blue-collar 1 secondary no 0 1 no \n",
"\n",
" contact day month duration campaign pdays previous poutcome y \n",
"0 cellular 19 oct 79 1 -1 0 unknown no \n",
"1 cellular 11 may 220 1 339 4 failure no \n",
"2 cellular 16 apr 185 1 330 1 failure no \n",
"3 unknown 3 jun 199 4 -1 0 unknown no \n",
"4 unknown 5 may 226 1 -1 0 unknown no "
]
},
"execution_count": 292,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 293,
"metadata": {},
"outputs": [],
"source": [
"#REPLACE CATEGORICAL DATA WITH NUMERICAL DATA\n",
"#Replace \"no\" in \"loan\" column with 0 and \"yes\" with 1\n",
"df['loan'] = df['loan'].replace({\n",
" \"no\":0,\n",
" \"yes\":1\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 294,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>unemployed</td>\n",
" <td>1</td>\n",
" <td>primary</td>\n",
" <td>no</td>\n",
" <td>1787</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>cellular</td>\n",
" <td>19</td>\n",
" <td>oct</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>services</td>\n",
" <td>1</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>4789</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>cellular</td>\n",
" <td>11</td>\n",
" <td>may</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>339</td>\n",
" <td>4</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>management</td>\n",
" <td>0</td>\n",
" <td>tertiary</td>\n",
" <td>no</td>\n",
" <td>1350</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>cellular</td>\n",
" <td>16</td>\n",
" <td>apr</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>330</td>\n",
" <td>1</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>management</td>\n",
" <td>1</td>\n",
" <td>tertiary</td>\n",
" <td>no</td>\n",
" <td>1476</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>unknown</td>\n",
" <td>3</td>\n",
" <td>jun</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>blue-collar</td>\n",
" <td>1</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>5</td>\n",
" <td>may</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan \\\n",
"0 30 unemployed 1 primary no 1787 0 0 \n",
"1 33 services 1 secondary no 4789 1 1 \n",
"2 35 management 0 tertiary no 1350 1 0 \n",
"3 30 management 1 tertiary no 1476 1 1 \n",
"4 59 blue-collar 1 secondary no 0 1 0 \n",
"\n",
" contact day month duration campaign pdays previous poutcome y \n",
"0 cellular 19 oct 79 1 -1 0 unknown no \n",
"1 cellular 11 may 220 1 339 4 failure no \n",
"2 cellular 16 apr 185 1 330 1 failure no \n",
"3 unknown 3 jun 199 4 -1 0 unknown no \n",
"4 unknown 5 may 226 1 -1 0 unknown no "
]
},
"execution_count": 294,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 295,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['unemployed', 'services', 'management', 'blue-collar',\n",
" 'self-employed', 'technician', 'entrepreneur', 'admin.', 'student',\n",
" 'housemaid', 'retired', 'unknown'], dtype=object)"
]
},
"execution_count": 295,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Find the unique values in the \"job\" column\n",
"df['job'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 296,
"metadata": {},
"outputs": [],
"source": [
"#Replace each job with the numerical code assigned to that type of job (\"unknown\" becomes NaN)\n",
"#The key 'admin.' needs its trailing period to match the value in the data\n",
"df['job'] = df['job'].replace({\n",
"    'unknown':np.nan,\n",
"    'management':0,\n",
"    'technician':1,\n",
"    'entrepreneur':2,\n",
"    'blue-collar':3,\n",
"    'retired':4,\n",
"    'admin.':5,\n",
"    'services':6,\n",
"    'self-employed':7,\n",
"    'unemployed':8,\n",
"    'housemaid':9,\n",
"    'student':10\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 297,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>primary</td>\n",
" <td>no</td>\n",
" <td>1787</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>cellular</td>\n",
" <td>19</td>\n",
" <td>oct</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>4789</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>cellular</td>\n",
" <td>11</td>\n",
" <td>may</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>339</td>\n",
" <td>4</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>tertiary</td>\n",
" <td>no</td>\n",
" <td>1350</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>cellular</td>\n",
" <td>16</td>\n",
" <td>apr</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>330</td>\n",
" <td>1</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>tertiary</td>\n",
" <td>no</td>\n",
" <td>1476</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>unknown</td>\n",
" <td>3</td>\n",
" <td>jun</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>secondary</td>\n",
" <td>no</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>5</td>\n",
" <td>may</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan contact day \\\n",
"0 30 8 1 primary no 1787 0 0 cellular 19 \n",
"1 33 6 1 secondary no 4789 1 1 cellular 11 \n",
"2 35 0 0 tertiary no 1350 1 0 cellular 16 \n",
"3 30 0 1 tertiary no 1476 1 1 unknown 3 \n",
"4 59 3 1 secondary no 0 1 0 unknown 5 \n",
"\n",
" month duration campaign pdays previous poutcome y \n",
"0 oct 79 1 -1 0 unknown no \n",
"1 may 220 1 339 4 failure no \n",
"2 apr 185 1 330 1 failure no \n",
"3 jun 199 4 -1 0 unknown no \n",
"4 may 226 1 -1 0 unknown no "
]
},
"execution_count": 297,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 298,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['primary', 'secondary', 'tertiary', 'unknown'], dtype=object)"
]
},
"execution_count": 298,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Find the unique values in the \"education\" column\n",
"df['education'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 299,
"metadata": {},
"outputs": [],
"source": [
"#Replace \"education\" with numerical values (\"unknown\" becomes NaN)\n",
"df['education'] = df['education'].replace({\n",
"    'unknown':np.nan,\n",
"    'tertiary':0,\n",
"    'secondary':1,\n",
"    'primary':2\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 300,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>no</td>\n",
" <td>1787</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>cellular</td>\n",
" <td>19</td>\n",
" <td>oct</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>no</td>\n",
" <td>4789</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>cellular</td>\n",
" <td>11</td>\n",
" <td>may</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>339</td>\n",
" <td>4</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>no</td>\n",
" <td>1350</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>cellular</td>\n",
" <td>16</td>\n",
" <td>apr</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>330</td>\n",
" <td>1</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>no</td>\n",
" <td>1476</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>unknown</td>\n",
" <td>3</td>\n",
" <td>jun</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>no</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>5</td>\n",
" <td>may</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan contact day \\\n",
"0 30 8 1 2.0 no 1787 0 0 cellular 19 \n",
"1 33 6 1 1.0 no 4789 1 1 cellular 11 \n",
"2 35 0 0 0.0 no 1350 1 0 cellular 16 \n",
"3 30 0 1 0.0 no 1476 1 1 unknown 3 \n",
"4 59 3 1 1.0 no 0 1 0 unknown 5 \n",
"\n",
" month duration campaign pdays previous poutcome y \n",
"0 oct 79 1 -1 0 unknown no \n",
"1 may 220 1 339 4 failure no \n",
"2 apr 185 1 330 1 failure no \n",
"3 jun 199 4 -1 0 unknown no \n",
"4 may 226 1 -1 0 unknown no "
]
},
"execution_count": 300,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
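{
"cell_type": "code",
"execution_count": 301,
"metadata": {},
"outputs": [],
"source": [
"#REPLACE CATEGORICAL DATA WITH NUMERICAL DATA\n",
"#In the \"default\" column, replace \"no\" with 0 and \"yes\" with 1\n",
"#Assign the result back instead of calling replace(inplace=True) on a column selection\n",
"df['default'] = df['default'].map({\n",
"    'no':0,\n",
"    'yes':1\n",
"})"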
]
},
{
"cell_type": "code",
"execution_count": 302,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>1787</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>cellular</td>\n",
" <td>19</td>\n",
" <td>oct</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>4789</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>cellular</td>\n",
" <td>11</td>\n",
" <td>may</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>339</td>\n",
" <td>4</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>1350</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>cellular</td>\n",
" <td>16</td>\n",
" <td>apr</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>330</td>\n",
" <td>1</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>1476</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>unknown</td>\n",
" <td>3</td>\n",
" <td>jun</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>5</td>\n",
" <td>may</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan contact \\\n",
"0 30 8 1 2.0 0 1787 0 0 cellular \n",
"1 33 6 1 1.0 0 4789 1 1 cellular \n",
"2 35 0 0 0.0 0 1350 1 0 cellular \n",
"3 30 0 1 0.0 0 1476 1 1 unknown \n",
"4 59 3 1 1.0 0 0 1 0 unknown \n",
"\n",
" day month duration campaign pdays previous poutcome y \n",
"0 19 oct 79 1 -1 0 unknown no \n",
"1 11 may 220 1 339 4 failure no \n",
"2 16 apr 185 1 330 1 failure no \n",
"3 3 jun 199 4 -1 0 unknown no \n",
"4 5 may 226 1 -1 0 unknown no "
]
},
"execution_count": 302,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 303,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-3313"
]
},
"execution_count": 303,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#NORMALIZATION USING THE MIN-MAX TECHNIQUE\n",
"#Find the minimum balance\n",
"df['balance'].min()"
]
},
{
"cell_type": "code",
"execution_count": 304,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"71188"
]
},
"execution_count": 304,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Find the maximum balance\n",
"df['balance'].max()"
]
},
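{
"cell_type": "code",
"execution_count": 305,
"metadata": {},
"outputs": [],
"source": [
"#Apply min-max normalization: (v - min) / (max - min) rescales the column to the range [0, 1]\n",
"#Compute min and max once and use vectorized arithmetic instead of recomputing them for every row\n",
"bal_min, bal_max = df['balance'].min(), df['balance'].max()\n",
"df['balance'] = (df['balance'] - bal_min) / (bal_max - bal_min)"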
]
},
{
"cell_type": "code",
"execution_count": 306,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0.068455</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>cellular</td>\n",
" <td>19</td>\n",
" <td>oct</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.108750</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>cellular</td>\n",
" <td>11</td>\n",
" <td>may</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>339</td>\n",
" <td>4</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.062590</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>cellular</td>\n",
" <td>16</td>\n",
" <td>apr</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>330</td>\n",
" <td>1</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.064281</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>unknown</td>\n",
" <td>3</td>\n",
" <td>jun</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.044469</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>5</td>\n",
" <td>may</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan contact \\\n",
"0 30 8 1 2.0 0 0.068455 0 0 cellular \n",
"1 33 6 1 1.0 0 0.108750 1 1 cellular \n",
"2 35 0 0 0.0 0 0.062590 1 0 cellular \n",
"3 30 0 1 0.0 0 0.064281 1 1 unknown \n",
"4 59 3 1 1.0 0 0.044469 1 0 unknown \n",
"\n",
" day month duration campaign pdays previous poutcome y \n",
"0 19 oct 79 1 -1 0 unknown no \n",
"1 11 may 220 1 339 4 failure no \n",
"2 16 apr 185 1 330 1 failure no \n",
"3 3 jun 199 4 -1 0 unknown no \n",
"4 5 may 226 1 -1 0 unknown no "
]
},
"execution_count": 306,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 307,
"metadata": {},
"outputs": [],
"source": [
"#REPLACE CATEGORICAL DATA WITH NUMERICAL DATA\n",
"#In the \"contact\" column, replace \"unknown\" with NaN, \"telephone\" with 0 and \"cellular\" with 1\n",
"#Assign the result back instead of calling replace(inplace=True) on a column selection\n",
"df['contact'] = df['contact'].map({\n",
"    'unknown':np.nan,\n",
"    'telephone':0,\n",
"    'cellular':1\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 308,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0.068455</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>19</td>\n",
" <td>oct</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.108750</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>11</td>\n",
" <td>may</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>339</td>\n",
" <td>4</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.062590</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>16</td>\n",
" <td>apr</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>330</td>\n",
" <td>1</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.064281</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>jun</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.044469</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>5</td>\n",
" <td>may</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan contact \\\n",
"0 30 8 1 2.0 0 0.068455 0 0 1.0 \n",
"1 33 6 1 1.0 0 0.108750 1 1 1.0 \n",
"2 35 0 0 0.0 0 0.062590 1 0 1.0 \n",
"3 30 0 1 0.0 0 0.064281 1 1 NaN \n",
"4 59 3 1 1.0 0 0.044469 1 0 NaN \n",
"\n",
" day month duration campaign pdays previous poutcome y \n",
"0 19 oct 79 1 -1 0 unknown no \n",
"1 11 may 220 1 339 4 failure no \n",
"2 16 apr 185 1 330 1 failure no \n",
"3 3 jun 199 4 -1 0 unknown no \n",
"4 5 may 226 1 -1 0 unknown no "
]
},
"execution_count": 308,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 309,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['oct', 'may', 'apr', 'jun', 'feb', 'aug', 'jan', 'jul', 'nov',\n",
" 'sep', 'mar', 'dec'], dtype=object)"
]
},
"execution_count": 309,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#REPLACE CATEGORICAL DATA WITH NUMERICAL DATA\n",
"#Find the unique values in the \"month\" column\n",
"df['month'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 310,
"metadata": {},
"outputs": [],
"source": [
"#Map each month abbreviation to its calendar number (1-12)\n",
"df['month'] = df['month'].map({\n",
" 'jan':1,\n",
" 'feb':2,\n",
" 'mar':3,\n",
" 'apr':4,\n",
" 'may':5,\n",
" 'jun':6,\n",
" 'jul':7,\n",
" 'aug':8,\n",
" 'sep':9,\n",
" 'oct':10,\n",
" 'nov':11,\n",
" 'dec':12\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 311,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0.068455</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>19</td>\n",
" <td>10</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.108750</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>11</td>\n",
" <td>5</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>339</td>\n",
" <td>4</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.062590</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>16</td>\n",
" <td>4</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>330</td>\n",
" <td>1</td>\n",
" <td>failure</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.064281</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>6</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.044469</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>unknown</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan contact \\\n",
"0 30 8 1 2.0 0 0.068455 0 0 1.0 \n",
"1 33 6 1 1.0 0 0.108750 1 1 1.0 \n",
"2 35 0 0 0.0 0 0.062590 1 0 1.0 \n",
"3 30 0 1 0.0 0 0.064281 1 1 NaN \n",
"4 59 3 1 1.0 0 0.044469 1 0 NaN \n",
"\n",
" day month duration campaign pdays previous poutcome y \n",
"0 19 10 79 1 -1 0 unknown no \n",
"1 11 5 220 1 339 4 failure no \n",
"2 16 4 185 1 330 1 failure no \n",
"3 3 6 199 4 -1 0 unknown no \n",
"4 5 5 226 1 -1 0 unknown no "
]
},
"execution_count": 311,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 312,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['unknown', 'failure', 'other', 'success'], dtype=object)"
]
},
"execution_count": 312,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#REPLACE CATEGORICAL DATA WITH NUMERICAL DATA\n",
"#Finding the unique entries in \"poutcome\" column\n",
"df['poutcome'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 313,
"metadata": {},
"outputs": [],
"source": [
"#Replace entries in \"poutcome\" with numeric codes\n",
"df['poutcome'] = df['poutcome'].map({\n",
" 'unknown':np.nan,\n",
" 'failure':0,\n",
" 'other':1,\n",
" 'success':2\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 314,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0.068455</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>19</td>\n",
" <td>10</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.108750</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>11</td>\n",
" <td>5</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>339</td>\n",
" <td>4</td>\n",
" <td>0.0</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.062590</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>16</td>\n",
" <td>4</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>330</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.064281</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>6</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.044469</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>-1</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan contact \\\n",
"0 30 8 1 2.0 0 0.068455 0 0 1.0 \n",
"1 33 6 1 1.0 0 0.108750 1 1 1.0 \n",
"2 35 0 0 0.0 0 0.062590 1 0 1.0 \n",
"3 30 0 1 0.0 0 0.064281 1 1 NaN \n",
"4 59 3 1 1.0 0 0.044469 1 0 NaN \n",
"\n",
" day month duration campaign pdays previous poutcome y \n",
"0 19 10 79 1 -1 0 NaN no \n",
"1 11 5 220 1 339 4 0.0 no \n",
"2 16 4 185 1 330 1 0.0 no \n",
"3 3 6 199 4 -1 0 NaN no \n",
"4 5 5 226 1 -1 0 NaN no "
]
},
"execution_count": 314,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
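{
"cell_type": "code",
"execution_count": 315,
"metadata": {},
"outputs": [],
"source": [
"#NORMALIZATION USING THE MIN-MAX TECHNIQUE\n",
"#Normalize the \"pdays\" column; the minimum value -1 maps to 0.0 after scaling\n",
"#Compute min and max once and use vectorized arithmetic instead of recomputing them for every row\n",
"pdays_min, pdays_max = df['pdays'].min(), df['pdays'].max()\n",
"df['pdays'] = (df['pdays'] - pdays_min) / (pdays_max - pdays_min)"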
]
},
{
"cell_type": "code",
"execution_count": 316,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0.068455</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>19</td>\n",
" <td>10</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.108750</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>11</td>\n",
" <td>5</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>0.389908</td>\n",
" <td>4</td>\n",
" <td>0.0</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.062590</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>16</td>\n",
" <td>4</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>0.379587</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.064281</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>6</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>no</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.044469</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>no</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan contact \\\n",
"0 30 8 1 2.0 0 0.068455 0 0 1.0 \n",
"1 33 6 1 1.0 0 0.108750 1 1 1.0 \n",
"2 35 0 0 0.0 0 0.062590 1 0 1.0 \n",
"3 30 0 1 0.0 0 0.064281 1 1 NaN \n",
"4 59 3 1 1.0 0 0.044469 1 0 NaN \n",
"\n",
" day month duration campaign pdays previous poutcome y \n",
"0 19 10 79 1 0.000000 0 NaN no \n",
"1 11 5 220 1 0.389908 4 0.0 no \n",
"2 16 4 185 1 0.379587 1 0.0 no \n",
"3 3 6 199 4 0.000000 0 NaN no \n",
"4 5 5 226 1 0.000000 0 NaN no "
]
},
"execution_count": 316,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 317,
"metadata": {},
"outputs": [],
"source": [
"#REPLACE CATEGORICAL DATA WITH NUMERICAL DATA\n",
"#In the last column \"y\" (the target), replace \"no\" with 0 and \"yes\" with 1\n",
"#Assign the result back instead of calling replace(inplace=True) on a column selection\n",
"df['y'] = df['y'].map({\n",
"    'no':0,\n",
"    'yes':1\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 318,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0.068455</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>19</td>\n",
" <td>10</td>\n",
" <td>79</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.108750</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>11</td>\n",
" <td>5</td>\n",
" <td>220</td>\n",
" <td>1</td>\n",
" <td>0.389908</td>\n",
" <td>4</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.062590</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>16</td>\n",
" <td>4</td>\n",
" <td>185</td>\n",
" <td>1</td>\n",
" <td>0.379587</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.064281</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>6</td>\n",
" <td>199</td>\n",
" <td>4</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.044469</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>226</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan contact \\\n",
"0 30 8 1 2.0 0 0.068455 0 0 1.0 \n",
"1 33 6 1 1.0 0 0.108750 1 1 1.0 \n",
"2 35 0 0 0.0 0 0.062590 1 0 1.0 \n",
"3 30 0 1 0.0 0 0.064281 1 1 NaN \n",
"4 59 3 1 1.0 0 0.044469 1 0 NaN \n",
"\n",
" day month duration campaign pdays previous poutcome y \n",
"0 19 10 79 1 0.000000 0 NaN 0 \n",
"1 11 5 220 1 0.389908 4 0.0 0 \n",
"2 16 4 185 1 0.379587 1 0.0 0 \n",
"3 3 6 199 4 0.000000 0 NaN 0 \n",
"4 5 5 226 1 0.000000 0 NaN 0 "
]
},
"execution_count": 318,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 319,
"metadata": {},
"outputs": [],
"source": [
"#NORMALIZATION USING THE MIN-MAX TECHNIQUE\n",
"#Normalize the \"duration\" column to the range [0, 1]\n",
"#Compute min and max once and use vectorized arithmetic instead of recomputing them for every row\n",
"dur_min, dur_max = df['duration'].min(), df['duration'].max()\n",
"df['duration'] = (df['duration'] - dur_min) / (dur_max - dur_min)"
]
},
{
"cell_type": "code",
"execution_count": 320,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>job</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>30</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>0</td>\n",
" <td>0.068455</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>19</td>\n",
" <td>10</td>\n",
" <td>0.024826</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>6</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.108750</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>11</td>\n",
" <td>5</td>\n",
" <td>0.071500</td>\n",
" <td>1</td>\n",
" <td>0.389908</td>\n",
" <td>4</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.062590</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1.0</td>\n",
" <td>16</td>\n",
" <td>4</td>\n",
" <td>0.059914</td>\n",
" <td>1</td>\n",
" <td>0.379587</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>30</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.064281</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>6</td>\n",
" <td>0.064548</td>\n",
" <td>4</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>59</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0.044469</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>5</td>\n",
" <td>5</td>\n",
" <td>0.073486</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age job marital education default balance housing loan contact \\\n",
"0 30 8 1 2.0 0 0.068455 0 0 1.0 \n",
"1 33 6 1 1.0 0 0.108750 1 1 1.0 \n",
"2 35 0 0 0.0 0 0.062590 1 0 1.0 \n",
"3 30 0 1 0.0 0 0.064281 1 1 NaN \n",
"4 59 3 1 1.0 0 0.044469 1 0 NaN \n",
"\n",
" day month duration campaign pdays previous poutcome y \n",
"0 19 10 0.024826 1 0.000000 0 NaN 0 \n",
"1 11 5 0.071500 1 0.389908 4 0.0 0 \n",
"2 16 4 0.059914 1 0.379587 1 0.0 0 \n",
"3 3 6 0.064548 4 0.000000 0 NaN 0 \n",
"4 5 5 0.073486 1 0.000000 0 NaN 0 "
]
},
"execution_count": 320,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 321,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>age</th>\n",
" <th>marital</th>\n",
" <th>education</th>\n",
" <th>default</th>\n",
" <th>balance</th>\n",
" <th>housing</th>\n",
" <th>loan</th>\n",
" <th>contact</th>\n",
" <th>day</th>\n",
" <th>month</th>\n",
" <th>duration</th>\n",
" <th>campaign</th>\n",
" <th>pdays</th>\n",
" <th>previous</th>\n",
" <th>poutcome</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>4521.000000</td>\n",
" <td>4521.000000</td>\n",
" <td>4334.000000</td>\n",
" <td>4521.000000</td>\n",
" <td>4521.000000</td>\n",
" <td>4521.000000</td>\n",
" <td>4521.000000</td>\n",
" <td>3197.000000</td>\n",
" <td>4521.000000</td>\n",
" <td>4521.000000</td>\n",
" <td>4521.000000</td>\n",
" <td>4521.000000</td>\n",
" <td>4521.000000</td>\n",
" <td>4521.000000</td>\n",
" <td>816.000000</td>\n",
" <td>4521.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>41.170095</td>\n",
" <td>0.735457</td>\n",
" <td>0.844947</td>\n",
" <td>0.016810</td>\n",
" <td>0.063565</td>\n",
" <td>0.566025</td>\n",
" <td>0.152842</td>\n",
" <td>0.905849</td>\n",
" <td>15.915284</td>\n",
" <td>6.166777</td>\n",
" <td>0.086051</td>\n",
" <td>2.793630</td>\n",
" <td>0.046751</td>\n",
" <td>0.542579</td>\n",
" <td>0.557598</td>\n",
" <td>0.115240</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>10.576211</td>\n",
" <td>0.441138</td>\n",
" <td>0.666325</td>\n",
" <td>0.128575</td>\n",
" <td>0.040397</td>\n",
" <td>0.495676</td>\n",
" <td>0.359875</td>\n",
" <td>0.292084</td>\n",
" <td>8.247667</td>\n",
" <td>2.378380</td>\n",
" <td>0.086017</td>\n",
" <td>3.109807</td>\n",
" <td>0.114818</td>\n",
" <td>1.693562</td>\n",
" <td>0.750699</td>\n",
" <td>0.319347</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>19.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>33.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.045395</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>9.000000</td>\n",
" <td>5.000000</td>\n",
" <td>0.033102</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>39.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.050429</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>16.000000</td>\n",
" <td>6.000000</td>\n",
" <td>0.059914</td>\n",
" <td>2.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>49.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.064335</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>21.000000</td>\n",
" <td>8.000000</td>\n",
" <td>0.107580</td>\n",
" <td>3.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>87.000000</td>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>31.000000</td>\n",
" <td>12.000000</td>\n",
" <td>1.000000</td>\n",
" <td>50.000000</td>\n",
" <td>1.000000</td>\n",
" <td>25.000000</td>\n",
" <td>2.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" age marital education default balance \\\n",
"count 4521.000000 4521.000000 4334.000000 4521.000000 4521.000000 \n",
"mean 41.170095 0.735457 0.844947 0.016810 0.063565 \n",
"std 10.576211 0.441138 0.666325 0.128575 0.040397 \n",
"min 19.000000 0.000000 0.000000 0.000000 0.000000 \n",
"25% 33.000000 0.000000 0.000000 0.000000 0.045395 \n",
"50% 39.000000 1.000000 1.000000 0.000000 0.050429 \n",
"75% 49.000000 1.000000 1.000000 0.000000 0.064335 \n",
"max 87.000000 1.000000 2.000000 1.000000 1.000000 \n",
"\n",
" housing loan contact day month \\\n",
"count 4521.000000 4521.000000 3197.000000 4521.000000 4521.000000 \n",
"mean 0.566025 0.152842 0.905849 15.915284 6.166777 \n",
"std 0.495676 0.359875 0.292084 8.247667 2.378380 \n",
"min 0.000000 0.000000 0.000000 1.000000 1.000000 \n",
"25% 0.000000 0.000000 1.000000 9.000000 5.000000 \n",
"50% 1.000000 0.000000 1.000000 16.000000 6.000000 \n",
"75% 1.000000 0.000000 1.000000 21.000000 8.000000 \n",
"max 1.000000 1.000000 1.000000 31.000000 12.000000 \n",
"\n",
" duration campaign pdays previous poutcome \\\n",
"count 4521.000000 4521.000000 4521.000000 4521.000000 816.000000 \n",
"mean 0.086051 2.793630 0.046751 0.542579 0.557598 \n",
"std 0.086017 3.109807 0.114818 1.693562 0.750699 \n",
"min 0.000000 1.000000 0.000000 0.000000 0.000000 \n",
"25% 0.033102 1.000000 0.000000 0.000000 0.000000 \n",
"50% 0.059914 2.000000 0.000000 0.000000 0.000000 \n",
"75% 0.107580 3.000000 0.000000 0.000000 1.000000 \n",
"max 1.000000 50.000000 1.000000 25.000000 2.000000 \n",
"\n",
" y \n",
"count 4521.000000 \n",
"mean 0.115240 \n",
"std 0.319347 \n",
"min 0.000000 \n",
"25% 0.000000 \n",
"50% 0.000000 \n",
"75% 0.000000 \n",
"max 1.000000 "
]
},
"execution_count": 321,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Summary statistics for the preprocessed data (count excludes NaN, so education, contact and poutcome report fewer than 4521 rows)\n",
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 361,
"metadata": {},
"outputs": [],
"source": [
"# Save the preprocessed DataFrame as a CSV file; index=False omits the row index column\n",
"df.to_csv(\"preprocessedBankmaintenance.csv\", index=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}