sarojbc75/CrossValidation and Grid Search_Automation_ClassificationProblem.ipynb

## CrossValidation and Grid Search_Automation_ClassificationProblem.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Project Objective: How to apply cross validation and compare diff models. \n",
    "   \n",
    "    Summary of the overall approach.\n",
    "    1. Chi-square test was performed to sigle out of the feature importance of the catagorical feature and it was concluded \n",
    "        that All catagorical features seems to have some kind of influence on the outcome.\n",
    "    2. Following Features have been decided to be removed from model building and training as part of initial phase.\n",
    "        -- Cabin : because most of the values (Aprox 70%) are missing\n",
    "        -- Names:\n",
    "        -- Ticket\n",
    "    3. Model Accuracy has been tested with Cross validations score using following models.\n",
    "          1.Decesion Trees Classifier\n",
    "          2.RandomForest Classifier\n",
    "          3.LogisticRegression\n",
    "          4.Support Vector Classifier\n",
    "          5.GradientBoosting Classifier\n",
    "          6.XGBoost Classifier\n",
    "          7.XGBoost Random Forest Classifier\n",
    "    4. Based on 10 Crosss validation splits, it is found that the XGBoost Random Forest Classifier gave the better accuracy \n",
    "       score of Min Score of:75%, Maximum Score of:90% and Average Score of: 83%\n",
    "    5. K-Fold Crosss validation was performed and extracted the best training and Testing sample out of 10 splits and \n",
    "       model was retrained.\n",
    "    6. GridSearch Hyperparametr tuning approach was also tried to check if the model performance can further be increased.\n",
    "## About Data\n",
    "    The dataset has been taken from Kaggle Titanic Survivor_Prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataSet=pd.read_csv(r'C:\\Saroj_Official\\AI\\Kaggle\\TitanicSurviver\\train.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataSet.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 891 entries, 0 to 890\n",
      "Data columns (total 12 columns):\n",
      " #   Column       Non-Null Count  Dtype  \n",
      "---  ------       --------------  -----  \n",
      " 0   PassengerId  891 non-null    int64  \n",
      " 1   Survived     891 non-null    int64  \n",
      " 2   Pclass       891 non-null    int64  \n",
      " 3   Name         891 non-null    object \n",
      " 4   Sex          891 non-null    object \n",
      " 5   Age          714 non-null    float64\n",
      " 6   SibSp        891 non-null    int64  \n",
      " 7   Parch        891 non-null    int64  \n",
      " 8   Ticket       891 non-null    object \n",
      " 9   Fare         891 non-null    float64\n",
      " 10  Cabin        204 non-null    object \n",
      " 11  Embarked     889 non-null    object \n",
      "dtypes: float64(2), int64(5), object(5)\n",
      "memory usage: 83.7+ KB\n"
     ]
    }
   ],
   "source": [
    "# Check the dataset for features and Label types\n",
    "dataSet.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId     0.000000\n",
       "Survived        0.000000\n",
       "Pclass          0.000000\n",
       "Name            0.000000\n",
       "Sex             0.000000\n",
       "Age            19.865320\n",
       "SibSp           0.000000\n",
       "Parch           0.000000\n",
       "Ticket          0.000000\n",
       "Fare            0.000000\n",
       "Cabin          77.104377\n",
       "Embarked        0.224467\n",
       "dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check for Null Features and % of Null values across each Featur\n",
    "# Features having Null values are 'Age','Cabin' and 'Embarked'\n",
    "dataSet.isnull().sum()/len(dataSet)*100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Survived  Pclass     Sex   Age  SibSp  Parch            Ticket     Fare  \\\n",
       "0         0       3    male  22.0      1      0         A/5 21171   7.2500   \n",
       "1         1       1  female  38.0      1      0          PC 17599  71.2833   \n",
       "2         1       3  female  26.0      0      0  STON/O2. 3101282   7.9250   \n",
       "3         1       1  female  35.0      1      0            113803  53.1000   \n",
       "4         0       3    male  35.0      0      0            373450   8.0500   \n",
       "\n",
       "  Embarked  \n",
       "0        S  \n",
       "1        C  \n",
       "2        S  \n",
       "3        S  \n",
       "4        S  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Fill in Null Values for 'Age' with mean value\n",
    "dataSet.Age.fillna(np.round(dataSet.Age.mean()),inplace=True)\n",
    "# Fill in Null Values for 'Embarked' with mode\n",
    "dataSet.Embarked.fillna(dataSet.Embarked.mode()[0],inplace=True)\n",
    "\n",
    "# Since there are 77% of missing values for the attribute Cabin, its better to discard this columns from model building as part of initial trial\n",
    "# Lets also drop PassengerId,Name column from model building\n",
    "dataSet1=dataSet.drop(columns=['PassengerId','Name','Cabin'])\n",
    "dataSet1.head()\n",
    "# Name           891 non-null object\n",
    "# Embarked       889 non-null object\n",
    "# Cabin          204 non-null object\n",
    "# Sex            891 non-null object\n",
    "# Ticket         891 non-null object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    891.000000\n",
       "mean      29.758889\n",
       "std       13.002570\n",
       "min        0.420000\n",
       "25%       22.000000\n",
       "50%       30.000000\n",
       "75%       35.000000\n",
       "max       80.000000\n",
       "Name: Age, dtype: float64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Lets Make Age into Age_group  (Optional Step)\n",
    "dataSet1.Age.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def age_group(x):\n",
    "    age_grp=''\n",
    "    if (x >=0 and x <= 12):\n",
    "        age_grp='Kid'\n",
    "    elif (x >=13 and x <= 19):\n",
    "        age_grp='Teenager'\n",
    "    elif (x >=20 and x <= 55):\n",
    "        age_grp='Adult'\n",
    "    else:\n",
    "        age_grp='Older'\n",
    "    return  age_grp   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataSet1['Age_group']=dataSet1.Age.apply(age_group)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Age_group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "      <td>Adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "      <td>Adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "      <td>Adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "      <td>Adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "      <td>Adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>886</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>male</td>\n",
       "      <td>27.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>211536</td>\n",
       "      <td>13.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>887</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>female</td>\n",
       "      <td>19.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>112053</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>S</td>\n",
       "      <td>Teenager</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>female</td>\n",
       "      <td>30.0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>W./C. 6607</td>\n",
       "      <td>23.4500</td>\n",
       "      <td>S</td>\n",
       "      <td>Adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>889</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>male</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>111369</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>C</td>\n",
       "      <td>Adult</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>890</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>male</td>\n",
       "      <td>32.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>370376</td>\n",
       "      <td>7.7500</td>\n",
       "      <td>Q</td>\n",
       "      <td>Adult</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>891 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Survived  Pclass     Sex   Age  SibSp  Parch            Ticket     Fare  \\\n",
       "0           0       3    male  22.0      1      0         A/5 21171   7.2500   \n",
       "1           1       1  female  38.0      1      0          PC 17599  71.2833   \n",
       "2           1       3  female  26.0      0      0  STON/O2. 3101282   7.9250   \n",
       "3           1       1  female  35.0      1      0            113803  53.1000   \n",
       "4           0       3    male  35.0      0      0            373450   8.0500   \n",
       "..        ...     ...     ...   ...    ...    ...               ...      ...   \n",
       "886         0       2    male  27.0      0      0            211536  13.0000   \n",
       "887         1       1  female  19.0      0      0            112053  30.0000   \n",
       "888         0       3  female  30.0      1      2        W./C. 6607  23.4500   \n",
       "889         1       1    male  26.0      0      0            111369  30.0000   \n",
       "890         0       3    male  32.0      0      0            370376   7.7500   \n",
       "\n",
       "    Embarked Age_group  \n",
       "0          S     Adult  \n",
       "1          C     Adult  \n",
       "2          S     Adult  \n",
       "3          S     Adult  \n",
       "4          S     Adult  \n",
       "..       ...       ...  \n",
       "886        S     Adult  \n",
       "887        S  Teenager  \n",
       "888        S     Adult  \n",
       "889        C     Adult  \n",
       "890        Q     Adult  \n",
       "\n",
       "[891 rows x 10 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataSet1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Index(['Pclass', 'Sex', 'Ticket', 'Embarked', 'Age_group'], dtype='object')\n",
      "[5.44658660e+01 9.27024470e+01 2.87165547e+03 1.02025247e+01\n",
      " 2.02415975e+00]\n"
     ]
    }
   ],
   "source": [
    "# Lets try to figure out: Among all categorical features which features seems to be influncing most for predicting the outcome\n",
    "# We will do Chisquare test for SELECTING TOP-N CATEGORICAL FEATURES. For this lets separate all categorical features and label\n",
    "dataSet2=dataSet1.drop(columns=['Age','Fare','SibSp','Parch'])\n",
    "dataSet2=dataSet2.astype(str)\n",
    "dataSet2.head()\n",
    "\n",
    "#From Below Chisquare Test it seems most influencing Categorical feature is 'Ticket' followed by 'Sex','Pclass'.\n",
    "#However the scores are not that far apart and hence its not conclusive to leave out any acategorical feature at this point.\n",
    "# We can possibly leave out Ticket Ticket and include Fare instead\n",
    "from sklearn.feature_selection import chi2\n",
    "from sklearn.feature_selection import SelectKBest\n",
    "from sklearn.preprocessing import OrdinalEncoder\n",
    "features=dataSet2.drop(columns=['Survived'])\n",
    "label=dataSet2.loc[:,['Survived']]\n",
    "OE=OrdinalEncoder()\n",
    "features_chi=OE.fit_transform(features)  # This is needed to encode each catagorical variable to integers values.\n",
    "modelKBest=SelectKBest(score_func=chi2,k='all')\n",
    "finalFeatures=modelKBest.fit_transform(features_chi,label)\n",
    "print(features.columns)\n",
    "print(modelKBest.scores_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Lets decide Final features and  label as below. Dropping out PassengerId,Name,Cabin, Ticket columns for the initial model. \n",
    "finalFeatures=dataSet.drop(columns=['PassengerId','Name','Cabin','Ticket','Survived'])\n",
    "label=dataSet.loc[:,['Survived']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Sex_female</th>\n",
       "      <th>Sex_male</th>\n",
       "      <th>Embarked_C</th>\n",
       "      <th>Embarked_Q</th>\n",
       "      <th>Embarked_S</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Pclass   Age  SibSp  Parch     Fare  Sex_female  Sex_male  Embarked_C  \\\n",
       "0       3  22.0      1      0   7.2500           0         1           0   \n",
       "1       1  38.0      1      0  71.2833           1         0           1   \n",
       "2       3  26.0      0      0   7.9250           1         0           0   \n",
       "3       1  35.0      1      0  53.1000           1         0           0   \n",
       "4       3  35.0      0      0   8.0500           0         1           0   \n",
       "\n",
       "   Embarked_Q  Embarked_S  \n",
       "0           0           1  \n",
       "1           0           0  \n",
       "2           0           1  \n",
       "3           0           1  \n",
       "4           0           1  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Lets apply OneHotEncoding for Categorical features\n",
    "finalFeatures=pd.get_dummies(finalFeatures)\n",
    "finalFeatures.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "finalFeatures=finalFeatures.values\n",
    "label=label.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Verify the performance of different models using Stratified-KFold Cross Validation "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import StratifiedKFold\n",
    "from sklearn.metrics import f1_score,classification_report,confusion_matrix\n",
    "#from sklearn import metrics\n",
    "\n",
    "def stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y):\n",
    "    global df_model_selection\n",
    "    \n",
    "    skf = StratifiedKFold(n_splits, random_state=12,shuffle=True)\n",
    "    \n",
    "    weighted_f1_score = []\n",
    "    #print(skf.split(X,y))\n",
    "    for train_index, test_index in skf.split(X,y):\n",
    "        X_train, X_test = X[train_index], X[test_index] \n",
    "        y_train, y_test = y[train_index], y[test_index]\n",
    "        \n",
    "        \n",
    "        model_obj.fit(X_train, y_train)\n",
    "        test_ds_predicted = model_obj.predict( X_test )     \n",
    "        weighted_f1_score.append(round(f1_score(y_true=y_test, y_pred=test_ds_predicted , average='weighted'),2))\n",
    "        \n",
    "    sd_weighted_f1_score = np.std(weighted_f1_score, ddof=1)\n",
    "    range_of_f1_scores = \"{}-{}\".format(min(weighted_f1_score),max(weighted_f1_score))    \n",
    "    df_model_selection = pd.concat([df_model_selection,pd.DataFrame([[process,model_name,sorted(weighted_f1_score),range_of_f1_scores,sd_weighted_f1_score]], columns =COLUMN_NAMES) ])\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:764: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "Please also refer to the documentation for alternative solver options:\n",
      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\ipykernel_launcher.py:17: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Process</th>\n",
       "      <th>Model Name</th>\n",
       "      <th>F1 Scores</th>\n",
       "      <th>Range of F1 Scores</th>\n",
       "      <th>Std Deviation of F1 Scores</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>Naive Bayes</td>\n",
       "      <td>[0.74, 0.75, 0.78, 0.78, 0.78, 0.79, 0.79, 0.8...</td>\n",
       "      <td>0.74-0.82</td>\n",
       "      <td>0.025927</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>Logistic Regression</td>\n",
       "      <td>[0.76, 0.76, 0.78, 0.79, 0.79, 0.79, 0.81, 0.8...</td>\n",
       "      <td>0.76-0.83</td>\n",
       "      <td>0.023664</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>Decesion Tree Classifier</td>\n",
       "      <td>[0.74, 0.74, 0.75, 0.75, 0.79, 0.79, 0.8, 0.8,...</td>\n",
       "      <td>0.74-0.81</td>\n",
       "      <td>0.029364</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>Random Forest Classifier</td>\n",
       "      <td>[0.71, 0.77, 0.8, 0.81, 0.81, 0.82, 0.82, 0.83...</td>\n",
       "      <td>0.71-0.85</td>\n",
       "      <td>0.040332</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>XGBoost Classifier</td>\n",
       "      <td>[0.76, 0.78, 0.79, 0.79, 0.8, 0.81, 0.82, 0.83...</td>\n",
       "      <td>0.76-0.85</td>\n",
       "      <td>0.028304</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>Gradient Boosting Classifier</td>\n",
       "      <td>[0.78, 0.79, 0.8, 0.82, 0.83, 0.83, 0.84, 0.84...</td>\n",
       "      <td>0.78-0.88</td>\n",
       "      <td>0.029155</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>XGBoost Random Forest Classifier</td>\n",
       "      <td>[0.73, 0.81, 0.81, 0.82, 0.83, 0.83, 0.84, 0.8...</td>\n",
       "      <td>0.73-0.88</td>\n",
       "      <td>0.042111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>Support Vector Machine Classifier</td>\n",
       "      <td>[0.56, 0.58, 0.59, 0.6, 0.6, 0.67, 0.67, 0.68,...</td>\n",
       "      <td>0.56-0.7</td>\n",
       "      <td>0.053800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>Stochastic Gradient Descent Classifier</td>\n",
       "      <td>[0.52, 0.58, 0.63, 0.67, 0.72, 0.76, 0.76, 0.7...</td>\n",
       "      <td>0.52-0.81</td>\n",
       "      <td>0.099225</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>Gausian Process Classifier</td>\n",
       "      <td>[0.6, 0.66, 0.67, 0.68, 0.71, 0.71, 0.76, 0.78...</td>\n",
       "      <td>0.6-0.82</td>\n",
       "      <td>0.071181</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>K Nearst Neighbour Classifier</td>\n",
       "      <td>[0.63, 0.65, 0.66, 0.69, 0.72, 0.74, 0.75, 0.7...</td>\n",
       "      <td>0.63-0.77</td>\n",
       "      <td>0.051865</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Stratified-KFold</td>\n",
       "      <td>Linear Discriminant Analysis</td>\n",
       "      <td>[0.77, 0.78, 0.78, 0.79, 0.79, 0.8, 0.81, 0.81...</td>\n",
       "      <td>0.77-0.82</td>\n",
       "      <td>0.016465</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Process                              Model Name  \\\n",
       "0  Stratified-KFold                             Naive Bayes   \n",
       "0  Stratified-KFold                     Logistic Regression   \n",
       "0  Stratified-KFold                Decesion Tree Classifier   \n",
       "0  Stratified-KFold                Random Forest Classifier   \n",
       "0  Stratified-KFold                      XGBoost Classifier   \n",
       "0  Stratified-KFold            Gradient Boosting Classifier   \n",
       "0  Stratified-KFold        XGBoost Random Forest Classifier   \n",
       "0  Stratified-KFold       Support Vector Machine Classifier   \n",
       "0  Stratified-KFold  Stochastic Gradient Descent Classifier   \n",
       "0  Stratified-KFold              Gausian Process Classifier   \n",
       "0  Stratified-KFold           K Nearst Neighbour Classifier   \n",
       "0  Stratified-KFold            Linear Discriminant Analysis   \n",
       "\n",
       "                                           F1 Scores Range of F1 Scores  \\\n",
       "0  [0.74, 0.75, 0.78, 0.78, 0.78, 0.79, 0.79, 0.8...          0.74-0.82   \n",
       "0  [0.76, 0.76, 0.78, 0.79, 0.79, 0.79, 0.81, 0.8...          0.76-0.83   \n",
       "0  [0.74, 0.74, 0.75, 0.75, 0.79, 0.79, 0.8, 0.8,...          0.74-0.81   \n",
       "0  [0.71, 0.77, 0.8, 0.81, 0.81, 0.82, 0.82, 0.83...          0.71-0.85   \n",
       "0  [0.76, 0.78, 0.79, 0.79, 0.8, 0.81, 0.82, 0.83...          0.76-0.85   \n",
       "0  [0.78, 0.79, 0.8, 0.82, 0.83, 0.83, 0.84, 0.84...          0.78-0.88   \n",
       "0  [0.73, 0.81, 0.81, 0.82, 0.83, 0.83, 0.84, 0.8...          0.73-0.88   \n",
       "0  [0.56, 0.58, 0.59, 0.6, 0.6, 0.67, 0.67, 0.68,...           0.56-0.7   \n",
       "0  [0.52, 0.58, 0.63, 0.67, 0.72, 0.76, 0.76, 0.7...          0.52-0.81   \n",
       "0  [0.6, 0.66, 0.67, 0.68, 0.71, 0.71, 0.76, 0.78...           0.6-0.82   \n",
       "0  [0.63, 0.65, 0.66, 0.69, 0.72, 0.74, 0.75, 0.7...          0.63-0.77   \n",
       "0  [0.77, 0.78, 0.78, 0.79, 0.79, 0.8, 0.81, 0.81...          0.77-0.82   \n",
       "\n",
       "   Std Deviation of F1 Scores  \n",
       "0                    0.025927  \n",
       "0                    0.023664  \n",
       "0                    0.029364  \n",
       "0                    0.040332  \n",
       "0                    0.028304  \n",
       "0                    0.029155  \n",
       "0                    0.042111  \n",
       "0                    0.053800  \n",
       "0                    0.099225  \n",
       "0                    0.071181  \n",
       "0                    0.051865  \n",
       "0                    0.016465  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.naive_bayes import BernoulliNB\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.ensemble import GradientBoostingClassifier\n",
    "from xgboost import XGBRFClassifier,XGBClassifier\n",
    "from sklearn.svm import SVC\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "from sklearn.linear_model import SGDClassifier\n",
    "from sklearn.multiclass import OneVsRestClassifier\n",
    "\n",
    "from sklearn.gaussian_process import GaussianProcessClassifier\n",
    "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n",
    "\n",
    "COLUMN_NAMES = [\"Process\",\"Model Name\", \"F1 Scores\",\"Range of F1 Scores\",\"Std Deviation of F1 Scores\"]\n",
    "df_model_selection = pd.DataFrame(columns=COLUMN_NAMES)\n",
    "\n",
    "process='Stratified-KFold'\n",
    "n_splits = 10\n",
    "X=finalFeatures\n",
    "y=label\n",
    "\n",
    "# 1.Naive Bayes\n",
    "model_NB=BernoulliNB()\n",
    "model_obj=model_NB\n",
    "model_name='Naive Bayes'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "# 2.Logistic Regression\n",
    "model_LR=LogisticRegression()\n",
    "model_obj=model_LR\n",
    "model_name='Logistic Regression'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "# 3.Decesion Tree Classifier\n",
    "model_DTC=DecisionTreeClassifier()\n",
    "model_obj=model_DTC\n",
    "model_name='Decesion Tree Classifier'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "# 4.Random Forest Classifier\n",
    "model_RFC=RandomForestClassifier()\n",
    "model_obj=model_RFC\n",
    "model_name='Random Forest Classifier'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "# 5.XGBoost Classifier\n",
    "model_XGBC=XGBClassifier()\n",
    "model_obj=model_XGBC\n",
    "model_name='XGBoost Classifier'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "# 6.Gradient Boosting Classifier\n",
    "model_GBC=GradientBoostingClassifier()\n",
    "model_obj=model_GBC\n",
    "model_name='Gradient Boosting Classifier'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "# 7.XGBoost Random Forest Classifier\n",
    "model_XGBRFC=XGBRFClassifier()\n",
    "model_obj=model_XGBRFC\n",
    "model_name='XGBoost Random Forest Classifier'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "# 8.Support Vector Machine Classifier\n",
    "model_SVC=SVC()\n",
    "model_obj=model_SVC\n",
    "model_name='Support Vector Machine Classifier'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "\n",
    "# 9.SGD Classifier\n",
    "model_sgd = OneVsRestClassifier(SGDClassifier())\n",
    "model_obj=model_sgd\n",
    "model_name='Stochastic Gradient Descent Classifier'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "#10.Gausian Process Classifier\n",
    "model_GPC = GaussianProcessClassifier()\n",
    "model_obj=model_GPC\n",
    "model_name='Gausian Process Classifier'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "#11.Gausian Process Classifier\n",
    "model_KNNC=KNeighborsClassifier()\n",
    "model_obj=model_KNNC\n",
    "model_name='K Nearst Neighbour Classifier'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "#12 Linear Discriminant Analysis\n",
    "model_LDA=LinearDiscriminantAnalysis()\n",
    "model_obj=model_LDA\n",
    "model_name='Linear Discriminant Analysis'\n",
    "stratified_K_fold_validation(model_obj, model_name, process, n_splits, X, y)\n",
    "\n",
    "#Exporting the results to csv\n",
    "#df_model_selection.to_csv(\"Model_statistics.csv\",index = False)\n",
    "df_model_selection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion sofar\n",
    "#from above Cross validation results it is understood that Gradient Boosting Classifier is giving better perfprmance than others classifiers\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train f1-Score: 0.88, Test f1-score: 0.84, for Sample Split: 1\n",
      "Train f1-Score: 0.9, Test f1-score: 0.83, for Sample Split: 2\n",
      "Train f1-Score: 0.9, Test f1-score: 0.81, for Sample Split: 3\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train f1-Score: 0.9, Test f1-score: 0.89, for Sample Split: 4\n",
      "Train f1-Score: 0.89, Test f1-score: 0.73, for Sample Split: 5\n",
      "Train f1-Score: 0.89, Test f1-score: 0.8, for Sample Split: 6\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train f1-Score: 0.9, Test f1-score: 0.79, for Sample Split: 7\n",
      "Train f1-Score: 0.9, Test f1-score: 0.88, for Sample Split: 8\n",
      "Train f1-Score: 0.89, Test f1-score: 0.85, for Sample Split: 9\n",
      "Train f1-Score: 0.9, Test f1-score: 0.78, for Sample Split: 10\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n"
     ]
    }
   ],
   "source": [
    "# Now lets try to get the Scores using StratifiedKFold Cross Validation\n",
    "\n",
    "#Initialize the algo\n",
    "model=GradientBoostingClassifier()\n",
    "\n",
    "#Initialize StratifiedKFold Method\n",
    "from sklearn.model_selection import StratifiedKFold\n",
    "kfold = StratifiedKFold(n_splits=10, \n",
    "              random_state=1,\n",
    "              shuffle=True)\n",
    "\n",
    "#Initialize For Loop \n",
    "\n",
    "i=0\n",
    "for train,test in kfold.split(finalFeatures,label):\n",
    "    i = i+1\n",
    "    X_train,X_test = finalFeatures[train],finalFeatures[test]\n",
    "    y_train,y_test = label[train],label[test]\n",
    "    \n",
    "    model.fit(X_train,y_train)\n",
    "    test_ds_predicted=model.predict(X_test)\n",
    "    train_ds_predicted=model.predict(X_train)\n",
    "    \n",
    "    test_f1_score=round(f1_score(y_true=y_test, y_pred=test_ds_predicted , average='weighted'),2)\n",
    "    train_f1_score=round(f1_score(y_true=y_train, y_pred=train_ds_predicted , average='weighted'),2)\n",
    "    \n",
    "    #print(\"Train Score: {}, Test score: {}, for Sample Split: {}\".format(model.score(X_train,y_train),model.score(X_test,y_test),i))\n",
    "    print(\"Train f1-Score: {}, Test f1-score: {}, for Sample Split: {}\".format(train_f1_score,test_f1_score,i))\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Lets extract the Train and Test sample for split 4\n",
    "from sklearn.model_selection import StratifiedKFold\n",
    "kfold = StratifiedKFold(n_splits=10, #n_splits should be equal to no of cv value in cross_val_score\n",
    "              random_state=1,\n",
    "              shuffle=True)\n",
    "i=0\n",
    "for train,test in kfold.split(finalFeatures,label):\n",
    "    i = i+1\n",
    "    if i == 4:\n",
    "        X_train,X_test,y_train,y_test = finalFeatures[train],finalFeatures[test],label[train],label[test]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train f1-Score: 0.88, Test f1-score: 0.94\n",
      "Train Accuracy Score is:0.9 and  Test Accuracy Score:0.89\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n"
     ]
    }
   ],
   "source": [
    "#Final Model\n",
    "finalModel=GradientBoostingClassifier()\n",
    "finalModel.fit(X_train,y_train)\n",
    "\n",
    "test_ds_predicted=model.predict(X_test)\n",
    "train_ds_predicted=model.predict(X_train)\n",
    "\n",
    "test_f1_score=round(f1_score(y_true=y_test, y_pred=test_ds_predicted , average='weighted'),2)\n",
    "train_f1_score=round(f1_score(y_true=y_train, y_pred=train_ds_predicted , average='weighted'),2)\n",
    "print(\"Train f1-Score: {}, Test f1-score: {}\".format(train_f1_score,test_f1_score))\n",
    "\n",
    "\n",
    "train_score=np.round(finalModel.score(X_train,y_train),2)\n",
    "test_score=np.round(finalModel.score(X_test,y_test),2)\n",
    "print('Train Accuracy Score is:{} and  Test Accuracy Score:{}'.format(train_score,test_score))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Confusion Matrix:\n",
      " [[527  22]\n",
      " [ 69 273]]\n",
      "\n",
      " Classification Report:\n",
      "               precision    recall  f1-score   support\n",
      "\n",
      "           0       0.88      0.96      0.92       549\n",
      "           1       0.93      0.80      0.86       342\n",
      "\n",
      "    accuracy                           0.90       891\n",
      "   macro avg       0.90      0.88      0.89       891\n",
      "weighted avg       0.90      0.90      0.90       891\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Confusion Matrix and Classification Report\n",
    "from sklearn.metrics import confusion_matrix,classification_report\n",
    "cm=confusion_matrix(y_true=label, y_pred=finalModel.predict(finalFeatures))\n",
    "CR=classification_report(y_true=label, y_pred=finalModel.predict(finalFeatures))\n",
    "print('Confusion Matrix:\\n',cm)\n",
    "print('\\n Classification Report:\\n',CR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Lets try to see if we can improve the model performance by Hyper parameter Tuning using Randomized Grid Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n",
      "C:\\Software\\Anaconda\\lib\\site-packages\\sklearn\\utils\\validation.py:73: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  return f(**kwargs)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "RandomizedSearchCV(cv=10, estimator=GradientBoostingClassifier(),\n",
       "                   param_distributions={'learning_rate': [0.05, 0.1, 0.5, 0.9,\n",
       "                                                          1],\n",
       "                                        'loss': ['deviance', 'exponential'],\n",
       "                                        'max_depth': [5, 6, 7, 8, 9, 10, 15, 20,\n",
       "                                                      30],\n",
       "                                        'n_estimators': [50, 100, 120, 150,\n",
       "                                                         200],\n",
       "                                        'subsample': [0.5, 0.6, 0.7, 0.8, 0.9,\n",
       "                                                      1]})"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.model_selection import RandomizedSearchCV\n",
    "#min_child_weight\n",
    "#colsample_bylevel\n",
    "#colsample_bytree\n",
    "#gamma\n",
    "param_grid={\n",
    "            #'booster':['gbtree','gblinear','dart'],\n",
    "            'max_depth':[5,6,7,8,9,10,15,20,30],\n",
    "            'n_estimators':[50,100,120,150,200],\n",
    "            'learning_rate':[0.05,0.1,0.5,0.9,1],\n",
    "            'loss':['deviance','exponential'],\n",
    "            #'criterion':[0.2,0.4,0.6,0.8,1.0],\n",
    "            #'colsample_bynode':[0.2,0.4,0.6,0.8,1.0],\n",
    "            'subsample':[0.5,0.6,0.7,0.8,0.9,1]\n",
    "            #'gamma':[0.1,0.3,0.5,0.8,1]\n",
    "           }\n",
    "model_GBC=GradientBoostingClassifier()\n",
    "RS = RandomizedSearchCV(estimator=model_GBC,param_distributions=param_grid,cv=10,n_iter=10)\n",
    "\n",
    "RS.fit(finalFeatures,label) \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 155,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8249812734082397"
      ]
     },
     "execution_count": 155,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RS.best_score_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 156,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,\n",
       "                           learning_rate=0.05, loss='deviance', max_depth=6,\n",
       "                           max_features=None, max_leaf_nodes=None,\n",
       "                           min_impurity_decrease=0.0, min_impurity_split=None,\n",
       "                           min_samples_leaf=1, min_samples_split=2,\n",
       "                           min_weight_fraction_leaf=0.0, n_estimators=50,\n",
       "                           n_iter_no_change=None, presort='deprecated',\n",
       "                           random_state=None, subsample=0.7, tol=0.0001,\n",
       "                           validation_fraction=0.1, verbose=0,\n",
       "                           warm_start=False)"
      ]
     },
     "execution_count": 156,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "RS.best_estimator_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 157,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train f1-Score: 0.88, Test f1-score: 0.94\n",
      "Train Accuracy Score is:0.92 and  Test Accuracy Score:0.92\n"
     ]
    }
   ],
   "source": [
    "#Final Model\n",
    "finalModel=GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,\n",
    "                           learning_rate=0.05, loss='deviance', max_depth=6,\n",
    "                           max_features=None, max_leaf_nodes=None,\n",
    "                           min_impurity_decrease=0.0, min_impurity_split=None,\n",
    "                           min_samples_leaf=1, min_samples_split=2,\n",
    "                           min_weight_fraction_leaf=0.0, n_estimators=50,\n",
    "                           n_iter_no_change=None, presort='deprecated',\n",
    "                           random_state=None, subsample=0.7, tol=0.0001,\n",
    "                           validation_fraction=0.1, verbose=0,\n",
    "                           warm_start=False)\n",
    "finalModel.fit(X_train,y_train)\n",
    "\n",
    "test_ds_predicted=model.predict(X_test)\n",
    "train_ds_predicted=model.predict(X_train)\n",
    "\n",
    "test_f1_score=round(f1_score(y_true=y_test, y_pred=test_ds_predicted , average='weighted'),2)\n",
    "train_f1_score=round(f1_score(y_true=y_train, y_pred=train_ds_predicted , average='weighted'),2)\n",
    "print(\"Train f1-Score: {}, Test f1-score: {}\".format(train_f1_score,test_f1_score))\n",
    "\n",
    "\n",
    "train_score=np.round(finalModel.score(X_train,y_train),2)\n",
    "test_score=np.round(finalModel.score(X_test,y_test),2)\n",
    "print('Train Accuracy Score is:{} and  Test Accuracy Score:{}'.format(train_score,test_score))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Making the Test data ready for Predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 158,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId     0.000000\n",
       "Pclass          0.000000\n",
       "Name            0.000000\n",
       "Sex             0.000000\n",
       "Age             9.652076\n",
       "SibSp           0.000000\n",
       "Parch           0.000000\n",
       "Ticket          0.000000\n",
       "Fare            0.112233\n",
       "Cabin          36.700337\n",
       "Embarked        0.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 158,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "testData=pd.read_csv('~/Public/skbc/Kaggle/test.csv')\n",
    "testData.isnull().sum()/len(dataSet)*100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 159,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>892</td>\n",
       "      <td>3</td>\n",
       "      <td>Kelly, Mr. James</td>\n",
       "      <td>male</td>\n",
       "      <td>34.5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>330911</td>\n",
       "      <td>7.8292</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Q</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Pclass              Name   Sex   Age  SibSp  Parch  Ticket  \\\n",
       "0          892       3  Kelly, Mr. James  male  34.5      0      0  330911   \n",
       "\n",
       "     Fare Cabin Embarked  \n",
       "0  7.8292   NaN        Q  "
      ]
     },
     "execution_count": 159,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "testData.head(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in Null Values for 'Age' with mean value\n",
    "testData.Age.fillna(np.round(testData.Age.mean()),inplace=True)\n",
    "\n",
    "# Fill in Null Values for 'Fare' with mean value\n",
    "testData.Fare.fillna(np.round(testData.Fare.mean()),inplace=True)\n",
    "\n",
    "# Lets make the Final features same as Train dataset\n",
    "finalTestdata=testData.drop(columns=['PassengerId','Name','Cabin','Ticket'])\n",
    "\n",
    "# Lets apply OneHotEncoding for Categorical features\n",
    "finalTestdata=pd.get_dummies(finalTestdata).values\n",
    "\n",
    "# Preddict the survivir based on test data\n",
    "survivor=finalModel.predict(finalTestdata)\n",
    "\n",
    "# Export the prediction as CSV file\n",
    "prediction=testData[['PassengerId']]\n",
    "prediction['Survived']=pd.DataFrame(survivor,columns=['Survived'])\n",
    "#prediction.to_csv('D:\\SarojOfficeWork\\AI\\Kaggle\\TitanicSurviver\\prediction.csv')\n",
    "prediction.to_csv('prediction.csv')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}