How to apply cross validation and compare different models
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Project Objective: How to apply cross-validation and compare different models\n",
"\n",
" Summary of the overall approach.\n",
" 1. A chi-square test was performed to single out the importance of each categorical feature, and it was concluded \n",
" that all categorical features seem to have some influence on the outcome.\n",
" 2. The following features were removed from model building and training in the initial phase.\n",
"    - Cabin: because most of its values (approx. 77%) are missing\n",
"    - Name\n",
"    - Ticket\n",
" 3. Model accuracy was tested with cross-validation scores using the following models.\n",
"    1. Decision Tree Classifier\n",
"    2. Random Forest Classifier\n",
"    3. Logistic Regression\n",
"    4. Support Vector Classifier\n",
"    5. Gradient Boosting Classifier\n",
"    6. XGBoost Classifier\n",
"    7. XGBoost Random Forest Classifier\n",
" 4. Based on 10 cross-validation splits, the XGBoost Random Forest Classifier gave the best accuracy, \n",
" with a minimum score of 75%, a maximum score of 90% and an average score of 83%.\n",
" 5. K-fold cross-validation was performed, the best training and testing split out of the 10 was extracted, and the \n",
" model was retrained on it.\n",
" 6. Grid-search hyperparameter tuning was also tried to check whether model performance could be improved further.\n",
"\n",
"## About Data\n",
" The dataset is taken from Kaggle: Titanic Survivor_Prediction"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"dataSet=pd.read_csv(r'C:\\Saroj_Official\\AI\\Kaggle\\TitanicSurviver\\train.csv')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataSet.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 891 entries, 0 to 890\n",
"Data columns (total 12 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 PassengerId 891 non-null int64 \n",
" 1 Survived 891 non-null int64 \n",
" 2 Pclass 891 non-null int64 \n",
" 3 Name 891 non-null object \n",
" 4 Sex 891 non-null object \n",
" 5 Age 714 non-null float64\n",
" 6 SibSp 891 non-null int64 \n",
" 7 Parch 891 non-null int64 \n",
" 8 Ticket 891 non-null object \n",
" 9 Fare 891 non-null float64\n",
" 10 Cabin 204 non-null object \n",
" 11 Embarked 889 non-null object \n",
"dtypes: float64(2), int64(5), object(5)\n",
"memory usage: 83.7+ KB\n"
]
}
],
"source": [
"# Check the dataset for features and Label types\n",
"dataSet.info()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId 0.000000\n",
"Survived 0.000000\n",
"Pclass 0.000000\n",
"Name 0.000000\n",
"Sex 0.000000\n",
"Age 19.865320\n",
"SibSp 0.000000\n",
"Parch 0.000000\n",
"Ticket 0.000000\n",
"Fare 0.000000\n",
"Cabin 77.104377\n",
"Embarked 0.224467\n",
"dtype: float64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check for null values and the % of nulls in each feature\n",
"# Features with null values are 'Age', 'Cabin' and 'Embarked'\n",
"dataSet.isnull().sum()/len(dataSet)*100"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>S</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived Pclass Sex Age SibSp Parch Ticket Fare \\\n",
"0 0 3 male 22.0 1 0 A/5 21171 7.2500 \n",
"1 1 1 female 38.0 1 0 PC 17599 71.2833 \n",
"2 1 3 female 26.0 0 0 STON/O2. 3101282 7.9250 \n",
"3 1 1 female 35.0 1 0 113803 53.1000 \n",
"4 0 3 male 35.0 0 0 373450 8.0500 \n",
"\n",
" Embarked \n",
"0 S \n",
"1 C \n",
"2 S \n",
"3 S \n",
"4 S "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Fill null values in 'Age' with the rounded mean (assigned back instead of a chained inplace fillna)\n",
"dataSet['Age']=dataSet['Age'].fillna(np.round(dataSet['Age'].mean()))\n",
"# Fill null values in 'Embarked' with the mode\n",
"dataSet['Embarked']=dataSet['Embarked'].fillna(dataSet['Embarked'].mode()[0])\n",
"\n",
"# Since about 77% of the values for Cabin are missing, it is better to discard this column from model building in the initial trial\n",
"# Let's also drop the PassengerId and Name columns from model building\n",
"dataSet1=dataSet.drop(columns=['PassengerId','Name','Cabin'])\n",
"dataSet1.head()\n",
"# Name 891 non-null object\n",
"# Embarked 889 non-null object\n",
"# Cabin 204 non-null object\n",
"# Sex 891 non-null object\n",
"# Ticket 891 non-null object"
]
},
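{
"cell_type": "markdown",
"metadata": {},
"source": [
"Filling `Age` with the mean of the full dataset, as done above, lets every validation fold contribute to the imputed value once cross-validation is applied. One way to avoid that is to impute inside a `Pipeline`, so the mean is recomputed from the training part of each fold. The cell below is a minimal sketch on hypothetical data (`X_toy`, `y_toy`, not the Titanic data), with `SimpleImputer` and `LogisticRegression` as stand-ins rather than the method used in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import cross_val_score\n",
"from sklearn.pipeline import make_pipeline\n",
"\n",
"# Hypothetical data for illustration: one numeric column with missing values\n",
"rng = np.random.default_rng(0)\n",
"X_toy = rng.normal(loc=30, scale=10, size=(100, 1))\n",
"y_toy = (X_toy[:, 0] > 30).astype(int)\n",
"X_toy[rng.choice(100, size=20, replace=False), 0] = np.nan\n",
"\n",
"# The imputer is fitted on the training part of every fold only\n",
"pipe = make_pipeline(SimpleImputer(strategy='mean'), LogisticRegression(max_iter=1000))\n",
"toy_scores = cross_val_score(pipe, X_toy, y_toy, cv=5)\n",
"print(toy_scores)"
]
},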
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 891.000000\n",
"mean 29.758889\n",
"std 13.002570\n",
"min 0.420000\n",
"25% 22.000000\n",
"50% 30.000000\n",
"75% 35.000000\n",
"max 80.000000\n",
"Name: Age, dtype: float64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Let's bin Age into Age_group (optional step)\n",
"dataSet1.Age.describe()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def age_group(x):\n",
"    # Half-open bins so fractional ages such as 12.5 or 19.5 are not misfiled as 'Older'\n",
"    age_grp=''\n",
"    if (x >= 0 and x < 13):\n",
"        age_grp='Kid'\n",
"    elif (x >= 13 and x < 20):\n",
"        age_grp='Teenager'\n",
"    elif (x >= 20 and x <= 55):\n",
"        age_grp='Adult'\n",
"    else:\n",
"        age_grp='Older'\n",
"    return age_grp"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"dataSet1['Age_group']=dataSet1.Age.apply(age_group)"
]
},
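{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same kind of binning can also be sketched with `pd.cut`, which is vectorised and makes the bin edges explicit. The cell below is a minimal sketch on a small hypothetical `ages` Series (not the Titanic data); the edges are an assumption that roughly follows `age_group` above, with `right=False` so that each bin is closed on the left."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical ages for illustration only (not taken from train.csv)\n",
"ages = pd.Series([0.42, 12.5, 13.0, 19.5, 30.0, 55.0, 80.0])\n",
"\n",
"# Left-closed bins: [0, 13), [13, 20), [20, 56), [56, inf)\n",
"age_groups = pd.cut(\n",
"    ages,\n",
"    bins=[0, 13, 20, 56, np.inf],\n",
"    labels=['Kid', 'Teenager', 'Adult', 'Older'],\n",
"    right=False,\n",
")\n",
"print(age_groups.tolist())"
]
},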
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>Age_group</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>S</td>\n",
" <td>Adult</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C</td>\n",
" <td>Adult</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>S</td>\n",
" <td>Adult</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>S</td>\n",
" <td>Adult</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>S</td>\n",
" <td>Adult</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>886</th>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>male</td>\n",
" <td>27.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>211536</td>\n",
" <td>13.0000</td>\n",
" <td>S</td>\n",
" <td>Adult</td>\n",
" </tr>\n",
" <tr>\n",
" <th>887</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>female</td>\n",
" <td>19.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>112053</td>\n",
" <td>30.0000</td>\n",
" <td>S</td>\n",
" <td>Teenager</td>\n",
" </tr>\n",
" <tr>\n",
" <th>888</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>female</td>\n",
" <td>30.0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>W./C. 6607</td>\n",
" <td>23.4500</td>\n",
" <td>S</td>\n",
" <td>Adult</td>\n",
" </tr>\n",
" <tr>\n",
" <th>889</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>male</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>111369</td>\n",
" <td>30.0000</td>\n",
" <td>C</td>\n",
" <td>Adult</td>\n",
" </tr>\n",
" <tr>\n",
" <th>890</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>male</td>\n",
" <td>32.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>370376</td>\n",
" <td>7.7500</td>\n",
" <td>Q</td>\n",
" <td>Adult</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>891 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" Survived Pclass Sex Age SibSp Parch Ticket Fare \\\n",
"0 0 3 male 22.0 1 0 A/5 21171 7.2500 \n",
"1 1 1 female 38.0 1 0 PC 17599 71.2833 \n",
"2 1 3 female 26.0 0 0 STON/O2. 3101282 7.9250 \n",
"3 1 1 female 35.0 1 0 113803 53.1000 \n",
"4 0 3 male 35.0 0 0 373450 8.0500 \n",
".. ... ... ... ... ... ... ... ... \n",
"886 0 2 male 27.0 0 0 211536 13.0000 \n",
"887 1 1 female 19.0 0 0 112053 30.0000 \n",
"888 0 3 female 30.0 1 2 W./C. 6607 23.4500 \n",
"889 1 1 male 26.0 0 0 111369 30.0000 \n",
"890 0 3 male 32.0 0 0 370376 7.7500 \n",
"\n",
" Embarked Age_group \n",
"0 S Adult \n",
"1 C Adult \n",
"2 S Adult \n",
"3 S Adult \n",
"4 S Adult \n",
".. ... ... \n",
"886 S Adult \n",
"887 S Teenager \n",
"888 S Adult \n",
"889 C Adult \n",
"890 Q Adult \n",
"\n",
"[891 rows x 10 columns]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataSet1"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index(['Pclass', 'Sex', 'Ticket', 'Embarked', 'Age_group'], dtype='object')\n",
"[5.44658660e+01 9.27024470e+01 2.87165547e+03 1.02025247e+01\n",
" 2.02415975e+00]\n"
]
}
],
"source": [
"# Let's find out which of the categorical features seem to influence the outcome the most\n",
"# We will run a chi-square test to select the top-N categorical features. For this, let's separate the categorical features and the label\n",
"dataSet2=dataSet1.drop(columns=['Age','Fare','SibSp','Parch'])\n",
"dataSet2=dataSet2.astype(str)\n",
"dataSet2.head()\n",
"\n",
"# From the chi-square test below, the highest-scoring categorical feature is 'Ticket', followed by 'Sex' and 'Pclass'.\n",
"# However, Ticket has many distinct values and chi2 treats the ordinal codes as magnitudes, so its score is likely inflated;\n",
"# the result is not conclusive enough to leave out any categorical feature at this point.\n",
"# We can possibly leave out Ticket and include Fare instead\n",
"from sklearn.feature_selection import chi2\n",
"from sklearn.feature_selection import SelectKBest\n",
"from sklearn.preprocessing import OrdinalEncoder\n",
"features=dataSet2.drop(columns=['Survived'])\n",
"label=dataSet2.loc[:,['Survived']]\n",
"OE=OrdinalEncoder()\n",
"features_chi=OE.fit_transform(features) # This is needed to encode each catagorical variable to integers values.\n",
"modelKBest=SelectKBest(score_func=chi2,k='all')\n",
"finalFeatures=modelKBest.fit_transform(features_chi,label)\n",
"print(features.columns)\n",
"print(modelKBest.scores_)"
]
},
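{
"cell_type": "markdown",
"metadata": {},
"source": [
"`sklearn.feature_selection.chi2` treats each feature value as a non-negative magnitude, so scores computed on ordinal codes depend on the arbitrary code given to each category. As a cross-check, a classical chi-square test of independence can be run on one contingency table per feature. The cell below is a minimal sketch with `scipy.stats.chi2_contingency` on a small hypothetical frame (`toy`, not the Titanic data); it is an alternative to, not a reproduction of, the scores printed above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from scipy.stats import chi2_contingency\n",
"\n",
"# Hypothetical toy data for illustration only\n",
"toy = pd.DataFrame({\n",
"    'Sex': ['male', 'female', 'female', 'male', 'male', 'female', 'male', 'female'],\n",
"    'Embarked': ['S', 'C', 'S', 'S', 'Q', 'C', 'S', 'Q'],\n",
"    'Survived': [0, 1, 1, 0, 0, 1, 0, 1],\n",
"})\n",
"\n",
"results = {}\n",
"for col in ['Sex', 'Embarked']:\n",
"    # Rows: categories of the feature; columns: classes of the label\n",
"    table = pd.crosstab(toy[col], toy['Survived'])\n",
"    stat, p_value, dof, expected = chi2_contingency(table)\n",
"    results[col] = (stat, p_value, dof)\n",
"    print(col, round(stat, 3), round(p_value, 4), dof)"
]
},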
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# Let's choose the final features and label as below, dropping the PassengerId, Name, Cabin and Ticket columns for the initial model.\n",
"finalFeatures=dataSet.drop(columns=['PassengerId','Name','Cabin','Ticket','Survived'])\n",
"label=dataSet.loc[:,['Survived']]"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Pclass</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>Sex_female</th>\n",
" <th>Sex_male</th>\n",
" <th>Embarked_C</th>\n",
" <th>Embarked_Q</th>\n",
" <th>Embarked_S</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Pclass Age SibSp Parch Fare Sex_female Sex_male Embarked_C \\\n",
"0 3 22.0 1 0 7.2500 0 1 0 \n",
"1 1 38.0 1 0 71.2833 1 0 1 \n",
"2 3 26.0 0 0 7.9250 1 0 0 \n",
"3 1 35.0 1 0 53.1000 1 0 0 \n",
"4 3 35.0 0 0 8.0500 0 1 0 \n",
"\n",
" Embarked_Q Embarked_S \n",
"0 0 1 \n",
"1 0 0 \n",
"2 0 1 \n",
"3 0 1 \n",
"4 0 1 "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Let's apply one-hot encoding to the categorical features\n",
"finalFeatures=pd.get_dummies(finalFeatures)\n",
"finalFeatures.head()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"finalFeatures=finalFeatures.values\n",
"label=label.values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Verify the performance of different models using Stratified-KFold Cross Validation "
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import StratifiedKFold\n",
"from sklearn.metrics import f1_score,classification_report,confusion_matrix\n",
"#from sklearn import metrics\n",
"\n",
"def stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y):\n",
"    global df_model_selection\n",
"    \n",
"    skf = StratifiedKFold(n_splits=n_splits, random_state=12, shuffle=True)\n",
"    \n",
"    weighted_f1_score = []\n",
"    # Collect one weighted F1 score per fold\n",
"    for train_index, test_index in skf.split(X,y):\n",
"        X_train, X_test = X[train_index], X[test_index]\n",
"        y_train, y_test = y[train_index], y[test_index]\n",
"        \n",
"        # Refit on this fold's training part, then score on the held-out part\n",
"        model_obj.fit(X_train, y_train)\n",
"        test_ds_predicted = model_obj.predict(X_test)\n",
"        weighted_f1_score.append(round(f1_score(y_true=y_test, y_pred=test_ds_predicted, average='weighted'),2))\n",
"    \n",
"    # Sample standard deviation (ddof=1) and min-max range of the per-fold scores\n",
"    sd_weighted_f1_score = np.std(weighted_f1_score, ddof=1)\n",
"    range_of_f1_scores = \"{}-{}\".format(min(weighted_f1_score),max(weighted_f1_score))\n",
"    df_model_selection = pd.concat([df_model_selection,pd.DataFrame([[process,model_name,sorted(weighted_f1_score),range_of_f1_scores,sd_weighted_f1_score]], columns=COLUMN_NAMES)])\n",
" "
]
},
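{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, scikit-learn's `cross_val_score` runs the same kind of fit-and-score loop in a single call. The cell below is a minimal sketch, assuming synthetic data from `make_classification` in place of the Titanic features and a `DecisionTreeClassifier` as a stand-in model. The label is passed as a 1-d array, which avoids the `DataConversionWarning` that scikit-learn raises for column-vector labels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.model_selection import StratifiedKFold, cross_val_score\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"# Synthetic stand-in data (assumption: not the Titanic features)\n",
"X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=12)\n",
"\n",
"skf_demo = StratifiedKFold(n_splits=10, shuffle=True, random_state=12)\n",
"demo_scores = cross_val_score(\n",
"    DecisionTreeClassifier(random_state=12),\n",
"    X_demo,\n",
"    y_demo,  # already 1-d; use y.ravel() for a column vector\n",
"    cv=skf_demo,\n",
"    scoring='f1_weighted',\n",
")\n",
"# Min, max, mean and sample standard deviation of the per-fold scores\n",
"print(demo_scores.min(), demo_scores.max(), demo_scores.mean(), demo_scores.std(ddof=1))"
]
},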
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
"STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
"\n",
"Increase the number of iterations (max_iter) or scale the data as shown in:\n",
" https://scikit-learn.org/stable/modules/preprocessing.html\n",
"Please also refer to the documentation for alternative solver options:\n",
" https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
" extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Process</th>\n",
" <th>Model Name</th>\n",
" <th>F1 Scores</th>\n",
" <th>Range of F1 Scores</th>\n",
" <th>Std Deviation of F1 Scores</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>Naive Bayes</td>\n",
" <td>[0.74, 0.75, 0.78, 0.78, 0.78, 0.79, 0.79, 0.8...</td>\n",
" <td>0.74-0.82</td>\n",
" <td>0.025927</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>Logistic Regression</td>\n",
" <td>[0.76, 0.76, 0.78, 0.79, 0.79, 0.79, 0.81, 0.8...</td>\n",
" <td>0.76-0.83</td>\n",
" <td>0.023664</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>Decesion Tree Classifier</td>\n",
" <td>[0.74, 0.74, 0.75, 0.75, 0.79, 0.79, 0.8, 0.8,...</td>\n",
" <td>0.74-0.81</td>\n",
" <td>0.029364</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>Random Forest Classifier</td>\n",
" <td>[0.71, 0.77, 0.8, 0.81, 0.81, 0.82, 0.82, 0.83...</td>\n",
" <td>0.71-0.85</td>\n",
" <td>0.040332</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>XGBoost Classifier</td>\n",
" <td>[0.76, 0.78, 0.79, 0.79, 0.8, 0.81, 0.82, 0.83...</td>\n",
" <td>0.76-0.85</td>\n",
" <td>0.028304</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>Gradient Boosting Classifier</td>\n",
" <td>[0.78, 0.79, 0.8, 0.82, 0.83, 0.83, 0.84, 0.84...</td>\n",
" <td>0.78-0.88</td>\n",
" <td>0.029155</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>XGBoost Random Forest Classifier</td>\n",
" <td>[0.73, 0.81, 0.81, 0.82, 0.83, 0.83, 0.84, 0.8...</td>\n",
" <td>0.73-0.88</td>\n",
" <td>0.042111</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>Support Vector Machine Classifier</td>\n",
" <td>[0.56, 0.58, 0.59, 0.6, 0.6, 0.67, 0.67, 0.68,...</td>\n",
" <td>0.56-0.7</td>\n",
" <td>0.053800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>Stochastic Gradient Descent Classifier</td>\n",
" <td>[0.52, 0.58, 0.63, 0.67, 0.72, 0.76, 0.76, 0.7...</td>\n",
" <td>0.52-0.81</td>\n",
" <td>0.099225</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>Gausian Process Classifier</td>\n",
" <td>[0.6, 0.66, 0.67, 0.68, 0.71, 0.71, 0.76, 0.78...</td>\n",
" <td>0.6-0.82</td>\n",
" <td>0.071181</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>K Nearst Neighbour Classifier</td>\n",
" <td>[0.63, 0.65, 0.66, 0.69, 0.72, 0.74, 0.75, 0.7...</td>\n",
" <td>0.63-0.77</td>\n",
" <td>0.051865</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Stratified-KFold</td>\n",
" <td>Linear Discriminant Analysis</td>\n",
" <td>[0.77, 0.78, 0.78, 0.79, 0.79, 0.8, 0.81, 0.81...</td>\n",
" <td>0.77-0.82</td>\n",
" <td>0.016465</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Process Model Name \\\n",
"0 Stratified-KFold Naive Bayes \n",
"0 Stratified-KFold Logistic Regression \n",
"0 Stratified-KFold Decesion Tree Classifier \n",
"0 Stratified-KFold Random Forest Classifier \n",
"0 Stratified-KFold XGBoost Classifier \n",
"0 Stratified-KFold Gradient Boosting Classifier \n",
"0 Stratified-KFold XGBoost Random Forest Classifier \n",
"0 Stratified-KFold Support Vector Machine Classifier \n",
"0 Stratified-KFold Stochastic Gradient Descent Classifier \n",
"0 Stratified-KFold Gausian Process Classifier \n",
"0 Stratified-KFold K Nearst Neighbour Classifier \n",
"0 Stratified-KFold Linear Discriminant Analysis \n",
"\n",
" F1 Scores Range of F1 Scores \\\n",
"0 [0.74, 0.75, 0.78, 0.78, 0.78, 0.79, 0.79, 0.8... 0.74-0.82 \n",
"0 [0.76, 0.76, 0.78, 0.79, 0.79, 0.79, 0.81, 0.8... 0.76-0.83 \n",
"0 [0.74, 0.74, 0.75, 0.75, 0.79, 0.79, 0.8, 0.8,... 0.74-0.81 \n",
"0 [0.71, 0.77, 0.8, 0.81, 0.81, 0.82, 0.82, 0.83... 0.71-0.85 \n",
"0 [0.76, 0.78, 0.79, 0.79, 0.8, 0.81, 0.82, 0.83... 0.76-0.85 \n",
"0 [0.78, 0.79, 0.8, 0.82, 0.83, 0.83, 0.84, 0.84... 0.78-0.88 \n",
"0 [0.73, 0.81, 0.81, 0.82, 0.83, 0.83, 0.84, 0.8... 0.73-0.88 \n",
"0 [0.56, 0.58, 0.59, 0.6, 0.6, 0.67, 0.67, 0.68,... 0.56-0.7 \n",
"0 [0.52, 0.58, 0.63, 0.67, 0.72, 0.76, 0.76, 0.7... 0.52-0.81 \n",
"0 [0.6, 0.66, 0.67, 0.68, 0.71, 0.71, 0.76, 0.78... 0.6-0.82 \n",
"0 [0.63, 0.65, 0.66, 0.69, 0.72, 0.74, 0.75, 0.7... 0.63-0.77 \n",
"0 [0.77, 0.78, 0.78, 0.79, 0.79, 0.8, 0.81, 0.81... 0.77-0.82 \n",
"\n",
" Std Deviation of F1 Scores \n",
"0 0.025927 \n",
"0 0.023664 \n",
"0 0.029364 \n",
"0 0.040332 \n",
"0 0.028304 \n",
"0 0.029155 \n",
"0 0.042111 \n",
"0 0.053800 \n",
"0 0.099225 \n",
"0 0.071181 \n",
"0 0.051865 \n",
"0 0.016465 "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.naive_bayes import BernoulliNB\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"from xgboost import XGBRFClassifier,XGBClassifier\n",
"from sklearn.svm import SVC\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"from sklearn.linear_model import SGDClassifier\n",
"from sklearn.multiclass import OneVsRestClassifier\n",
"\n",
"from sklearn.gaussian_process import GaussianProcessClassifier\n",
"from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n",
"\n",
"COLUMN_NAMES = [\"Process\",\"Model Name\", \"F1 Scores\",\"Range of F1 Scores\",\"Std Deviation of F1 Scores\"]\n",
"df_model_selection = pd.DataFrame(columns=COLUMN_NAMES)\n",
"\n",
"process='Stratified-KFold'\n",
"n_splits = 10\n",
"X=finalFeatures\n",
"y=label\n",
"\n",
"# 1.Naive Bayes\n",
"model_NB=BernoulliNB()\n",
"model_obj=model_NB\n",
"model_name='Naive Bayes'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"# 2.Logistic Regression\n",
"model_LR=LogisticRegression()\n",
"model_obj=model_LR\n",
"model_name='Logistic Regression'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"# 3.Decesion Tree Classifier\n",
"model_DTC=DecisionTreeClassifier()\n",
"model_obj=model_DTC\n",
"model_name='Decesion Tree Classifier'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"# 4.Random Forest Classifier\n",
"model_RFC=RandomForestClassifier()\n",
"model_obj=model_RFC\n",
"model_name='Random Forest Classifier'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"# 5.XGBoost Classifier\n",
"model_XGBC=XGBClassifier()\n",
"model_obj=model_XGBC\n",
"model_name='XGBoost Classifier'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"# 6.Gradient Boosting Classifier\n",
"model_GBC=GradientBoostingClassifier()\n",
"model_obj=model_GBC\n",
"model_name='Gradient Boosting Classifier'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"# 7.XGBoost Random Forest Classifier\n",
"model_XGBRFC=XGBRFClassifier()\n",
"model_obj=model_XGBRFC\n",
"model_name='XGBoost Random Forest Classifier'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"# 8.Support Vector Machine Classifier\n",
"model_SVC=SVC()\n",
"model_obj=model_SVC\n",
"model_name='Support Vector Machine Classifier'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"\n",
"# 9.SGD Classifier\n",
"model_sgd = OneVsRestClassifier(SGDClassifier())\n",
"model_obj=model_sgd\n",
"model_name='Stochastic Gradient Descent Classifier'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"#10.Gausian Process Classifier\n",
"model_GPC = GaussianProcessClassifier()\n",
"model_obj=model_GPC\n",
"model_name='Gausian Process Classifier'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"#11.Gausian Process Classifier\n",
"model_KNNC=KNeighborsClassifier()\n",
"model_obj=model_KNNC\n",
"model_name='K Nearst Neighbour Classifier'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"#12 Linear Discriminant Analysis\n",
"model_LDA=LinearDiscriminantAnalysis()\n",
"model_obj=model_LDA\n",
"model_name='Linear Discriminant Analysis'\n",
"stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
"\n",
"#Exporting the results to csv\n",
"#df_model_selection.to_csv(\"Model_statistics.csv\",index = False)\n",
"df_model_selection"
]
},
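{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `DataConversionWarning` messages in the output above appear when `y` is passed as a column vector of shape `(n_samples, 1)`. A minimal sketch of the usual remedy, flattening the target with `ravel()`, is shown below. It assumes `label` is a NumPy array; `X_demo` and `y_col` are made-up stand-ins rather than the notebook's data.\n",
"\n",
"```python\n",
"import warnings\n",
"\n",
"import numpy as np\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.exceptions import DataConversionWarning\n",
"\n",
"# Made-up stand-in data (assumption): six samples, one feature\n",
"X_demo = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])\n",
"y_col = np.array([[0], [0], [0], [1], [1], [1]])  # shape (6, 1), a column vector\n",
"y_flat = y_col.ravel()                            # shape (6,), what scikit-learn expects\n",
"\n",
"# Turn DataConversionWarning into an error so a silent fit proves the fix works\n",
"with warnings.catch_warnings():\n",
"    warnings.simplefilter('error', DataConversionWarning)\n",
"    RandomForestClassifier(n_estimators=5, random_state=0).fit(X_demo, y_flat)\n",
"\n",
"print(y_col.shape, y_flat.shape)\n",
"```\n",
"\n",
"In this notebook the equivalent change would presumably be `y = label.ravel()` before calling the validation helper.\n"
]
},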
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conclusion so far\n",
"From the cross-validation results above, the Gradient Boosting Classifier performs better than the other classifiers: its F1 scores range from 0.78 to 0.88 across the 10 folds, the highest minimum of any model.\n"
]
},
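{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to support a comparison like this with a single number per model is to report the mean and standard deviation of the fold scores alongside the range. The sketch below is illustrative only: it uses a synthetic dataset from `make_classification` instead of the Titanic features, and `summarize_cv` is a hypothetical helper, not the notebook's `stratified_K_fold_validation`.\n",
"\n",
"```python\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import StratifiedKFold, cross_val_score\n",
"\n",
"\n",
"def summarize_cv(model, X, y, n_splits=10, seed=1):\n",
"    # Hypothetical helper: weighted-F1 statistics over stratified folds\n",
"    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)\n",
"    scores = cross_val_score(model, X, y, cv=cv, scoring='f1_weighted')\n",
"    return {'mean': scores.mean(), 'std': scores.std(),\n",
"            'min': scores.min(), 'max': scores.max()}\n",
"\n",
"\n",
"# Synthetic stand-in data (assumption); y is already 1-D\n",
"X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=1)\n",
"\n",
"summary = {\n",
"    'Logistic Regression': summarize_cv(LogisticRegression(max_iter=1000), X_demo, y_demo),\n",
"    'Gradient Boosting': summarize_cv(GradientBoostingClassifier(random_state=1), X_demo, y_demo),\n",
"}\n",
"\n",
"# Rank the models by mean F1, best first\n",
"for name, stats in sorted(summary.items(), key=lambda kv: kv[1]['mean'], reverse=True):\n",
"    print(name, {k: round(float(v), 3) for k, v in stats.items()})\n",
"```\n",
"\n",
"Ranking by the mean, with the standard deviation as a tie-breaker, is less sensitive to one unusually good or bad fold than comparing ranges alone.\n"
]
},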
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train f1-Score: 0.88, Test f1-score: 0.84, for Sample Split: 1\n",
"Train f1-Score: 0.9, Test f1-score: 0.83, for Sample Split: 2\n",
"Train f1-Score: 0.9, Test f1-score: 0.81, for Sample Split: 3\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train f1-Score: 0.9, Test f1-score: 0.89, for Sample Split: 4\n",
"Train f1-Score: 0.89, Test f1-score: 0.73, for Sample Split: 5\n",
"Train f1-Score: 0.89, Test f1-score: 0.8, for Sample Split: 6\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n",
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train f1-Score: 0.9, Test f1-score: 0.79, for Sample Split: 7\n",
"Train f1-Score: 0.9, Test f1-score: 0.88, for Sample Split: 8\n",
"Train f1-Score: 0.89, Test f1-score: 0.85, for Sample Split: 9\n",
"Train f1-Score: 0.9, Test f1-score: 0.78, for Sample Split: 10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n"
]
}
],
"source": [
"# Per-fold train and test F1 scores for the selected model, using StratifiedKFold cross-validation\n",
"from sklearn.metrics import f1_score\n",
"from sklearn.model_selection import StratifiedKFold\n",
"\n",
"# Initialize the algorithm\n",
"model=GradientBoostingClassifier()\n",
"\n",
"# Initialize the StratifiedKFold splitter\n",
"kfold = StratifiedKFold(n_splits=10,\n",
"                        random_state=1,\n",
"                        shuffle=True)\n",
"\n",
"# Fit and score the model on each split; i is the 1-based split number\n",
"for i, (train, test) in enumerate(kfold.split(finalFeatures, label), start=1):\n",
"    X_train,X_test = finalFeatures[train],finalFeatures[test]\n",
"    y_train,y_test = label[train],label[test]\n",
"\n",
"    model.fit(X_train,y_train)\n",
"    test_ds_predicted=model.predict(X_test)\n",
"    train_ds_predicted=model.predict(X_train)\n",
"\n",
"    test_f1_score=round(f1_score(y_true=y_test, y_pred=test_ds_predicted , average='weighted'),2)\n",
"    train_f1_score=round(f1_score(y_true=y_train, y_pred=train_ds_predicted , average='weighted'),2)\n",
"\n",
"    #print(\"Train Score: {}, Test score: {}, for Sample Split: {}\".format(model.score(X_train,y_train),model.score(X_test,y_test),i))\n",
"    print(\"Train f1-Score: {}, Test f1-score: {}, for Sample Split: {}\".format(train_f1_score,test_f1_score,i))"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# Let's extract the train and test samples for split 4, which had the highest test F1 score above\n",
"from sklearn.model_selection import StratifiedKFold\n",
"# Same n_splits, random_state and shuffle as in the loop above, so the same folds are reproduced\n",
"kfold = StratifiedKFold(n_splits=10,\n",
"                        random_state=1,\n",
"                        shuffle=True)\n",
"for i, (train, test) in enumerate(kfold.split(finalFeatures, label), start=1):\n",
"    if i == 4:\n",
"        X_train,X_test,y_train,y_test = finalFeatures[train],finalFeatures[test],label[train],label[test]\n",
"        break"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train f1-Score: 0.88, Test f1-score: 0.94\n",
"Train Accuracy Score is:0.9 and Test Accuracy Score:0.89\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n"
]
}
],
"source": [
"# Final model: refit Gradient Boosting on the extracted training sample\n",
"finalModel=GradientBoostingClassifier()\n",
"finalModel.fit(X_train,y_train)\n",
"\n",
"# Predict with finalModel; `model` from the earlier loop was last fitted on split 10\n",
"test_ds_predicted=finalModel.predict(X_test)\n",
"train_ds_predicted=finalModel.predict(X_train)\n",
"\n",
"test_f1_score=round(f1_score(y_true=y_test, y_pred=test_ds_predicted , average='weighted'),2)\n",
"train_f1_score=round(f1_score(y_true=y_train, y_pred=train_ds_predicted , average='weighted'),2)\n",
"print(\"Train f1-Score: {}, Test f1-score: {}\".format(train_f1_score,test_f1_score))\n",
"\n",
"\n",
"train_score=np.round(finalModel.score(X_train,y_train),2)\n",
"test_score=np.round(finalModel.score(X_test,y_test),2)\n",
"print('Train Accuracy Score is:{} and Test Accuracy Score:{}'.format(train_score,test_score))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Confusion Matrix:\n",
" [[527 22]\n",
" [ 69 273]]\n",
"\n",
" Classification Report:\n",
" precision recall f1-score support\n",
"\n",
" 0 0.88 0.96 0.92 549\n",
" 1 0.93 0.80 0.86 342\n",
"\n",
" accuracy 0.90 891\n",
" macro avg 0.90 0.88 0.89 891\n",
"weighted avg 0.90 0.90 0.90 891\n",
"\n"
]
}
],
"source": [
"# Confusion matrix and classification report\n",
"# Note: both are computed on the full dataset, which includes the rows finalModel was trained on\n",
"from sklearn.metrics import confusion_matrix,classification_report\n",
"predictions=finalModel.predict(finalFeatures)\n",
"cm=confusion_matrix(y_true=label, y_pred=predictions)\n",
"CR=classification_report(y_true=label, y_pred=predictions)\n",
"print('Confusion Matrix:\\n',cm)\n",
"print('\\n Classification Report:\\n',CR)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Let's see whether hyperparameter tuning with randomized search (RandomizedSearchCV) can improve the model performance"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
" return f(**kwargs)\n"
]
},
{
"data": {
"text/plain": [
"RandomizedSearchCV(cv=10, estimator=GradientBoostingClassifier(),\n",
" param_distributions={'learning_rate': [0.05, 0.1, 0.5, 0.9,\n",
" 1],\n",
" 'loss': ['deviance', 'exponential'],\n",
" 'max_depth': [5, 6, 7, 8, 9, 10, 15, 20,\n",
" 30],\n",
" 'n_estimators': [50, 100, 120, 150,\n",
" 200],\n",
" 'subsample': [0.5, 0.6, 0.7, 0.8, 0.9,\n",
" 1]})"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.model_selection import RandomizedSearchCV\n",
"# XGBoost parameters that do not apply to GradientBoostingClassifier:\n",
"# min_child_weight, colsample_bylevel, colsample_bytree, gamma\n",
"# Candidate values for each hyperparameter; commented-out entries are not searched\n",
"param_grid={\n",
"    #'booster':['gbtree','gblinear','dart'],\n",
"    'max_depth':[5,6,7,8,9,10,15,20,30],\n",
"    'n_estimators':[50,100,120,150,200],\n",
"    'learning_rate':[0.05,0.1,0.5,0.9,1],\n",
"    'loss':['deviance','exponential'],\n",
"    #'criterion':[0.2,0.4,0.6,0.8,1.0],\n",
"    #'colsample_bynode':[0.2,0.4,0.6,0.8,1.0],\n",
"    'subsample':[0.5,0.6,0.7,0.8,0.9,1]\n",
"    #'gamma':[0.1,0.3,0.5,0.8,1]\n",
"    }\n",
"model_GBC=GradientBoostingClassifier()\n",
"# Sample 10 random parameter combinations and score each with 10-fold cross validation\n",
"RS = RandomizedSearchCV(estimator=model_GBC,param_distributions=param_grid,cv=10,n_iter=10)\n",
"\n",
"RS.fit(finalFeatures,label)\n"
]
},
{
"cell_type": "code",
"execution_count": 155,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.8249812734082397"
]
},
"execution_count": 155,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"RS.best_score_"
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,\n",
" learning_rate=0.05, loss='deviance', max_depth=6,\n",
" max_features=None, max_leaf_nodes=None,\n",
" min_impurity_decrease=0.0, min_impurity_split=None,\n",
" min_samples_leaf=1, min_samples_split=2,\n",
" min_weight_fraction_leaf=0.0, n_estimators=50,\n",
" n_iter_no_change=None, presort='deprecated',\n",
" random_state=None, subsample=0.7, tol=0.0001,\n",
" validation_fraction=0.1, verbose=0,\n",
" warm_start=False)"
]
},
"execution_count": 156,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"RS.best_estimator_"
]
},
{
"cell_type": "code",
"execution_count": 157,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train f1-Score: 0.88, Test f1-score: 0.94\n",
"Train Accuracy Score is:0.92 and Test Accuracy Score:0.92\n"
]
}
],
"source": [
"# Final model, using the best hyperparameters found by the randomized search\n",
"# (all other parameters keep their defaults, as in RS.best_estimator_)\n",
"finalModel=GradientBoostingClassifier(learning_rate=0.05, loss='deviance', max_depth=6,\n",
"                                      n_estimators=50, subsample=0.7)\n",
"finalModel.fit(X_train,y_train)\n",
"\n",
"# Predict with the tuned finalModel, not the leftover `model` from the earlier loop\n",
"test_ds_predicted=finalModel.predict(X_test)\n",
"train_ds_predicted=finalModel.predict(X_train)\n",
"\n",
"test_f1_score=round(f1_score(y_true=y_test, y_pred=test_ds_predicted , average='weighted'),2)\n",
"train_f1_score=round(f1_score(y_true=y_train, y_pred=train_ds_predicted , average='weighted'),2)\n",
"print(\"Train f1-Score: {}, Test f1-score: {}\".format(train_f1_score,test_f1_score))\n",
"\n",
"\n",
"train_score=np.round(finalModel.score(X_train,y_train),2)\n",
"test_score=np.round(finalModel.score(X_test,y_test),2)\n",
"print('Train Accuracy Score is:{} and Test Accuracy Score:{}'.format(train_score,test_score))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preparing the test data for predictions"
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PassengerId 0.000000\n",
"Pclass 0.000000\n",
"Name 0.000000\n",
"Sex 0.000000\n",
"Age 9.652076\n",
"SibSp 0.000000\n",
"Parch 0.000000\n",
"Ticket 0.000000\n",
"Fare 0.112233\n",
"Cabin 36.700337\n",
"Embarked 0.000000\n",
"dtype: float64"
]
},
"execution_count": 158,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"testData=pd.read_csv('~/Public/skbc/Kaggle/test.csv')\n",
"# Percentage of missing values per column, relative to the test set's own row count\n",
"testData.isnull().sum()/len(testData)*100"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>892</td>\n",
" <td>3</td>\n",
" <td>Kelly, Mr. James</td>\n",
" <td>male</td>\n",
" <td>34.5</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>330911</td>\n",
" <td>7.8292</td>\n",
" <td>NaN</td>\n",
" <td>Q</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Pclass Name Sex Age SibSp Parch Ticket \\\n",
"0 892 3 Kelly, Mr. James male 34.5 0 0 330911 \n",
"\n",
" Fare Cabin Embarked \n",
"0 7.8292 NaN Q "
]
},
"execution_count": 159,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"testData.head(1)"
]
},
{
"cell_type": "code",
"execution_count": 143,
"metadata": {},
"outputs": [],
"source": [
"# Fill missing 'Age' values with the rounded mean age\n",
"testData['Age']=testData['Age'].fillna(np.round(testData['Age'].mean()))\n",
"\n",
"# Fill missing 'Fare' values with the rounded mean fare\n",
"testData['Fare']=testData['Fare'].fillna(np.round(testData['Fare'].mean()))\n",
"\n",
"# Keep the same features as the training dataset\n",
"finalTestdata=testData.drop(columns=['PassengerId','Name','Cabin','Ticket'])\n",
"\n",
"# One-hot encode the categorical features\n",
"# (assumes the resulting columns match the training set's columns and order)\n",
"finalTestdata=pd.get_dummies(finalTestdata).values\n",
"\n",
"# Predict survival for the test data\n",
"survivor=finalModel.predict(finalTestdata)\n",
"\n",
"# Export the predictions as a CSV file\n",
"# (index=False writes only the PassengerId and Survived columns)\n",
"prediction=testData[['PassengerId']].copy()\n",
"prediction['Survived']=survivor\n",
"prediction.to_csv('prediction.csv', index=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}