Proper way to start a Data / ML project
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "ea25cdf7-bdbc-3cf1-0737-bc51675e3374"
},
"source": [
"# Titanic Data Science Solutions\n",
"\n",
"This notebook is companion to the book [Data Science Solutions](https://startupsci.com). The notebook walks us through a typical workflow for solving data science competitions at sites like Kaggle.\n",
"\n",
"There are several excellent notebooks to study data science competition entries. However many will skip some of the explanation on how the solution is developed as these notebooks are developed by experts for experts. The objective of this notebook is to follow a step-by-step workflow, explaining each step and rationale for every decision we take during solution development.\n",
"\n",
"## Workflow stages\n",
"\n",
"The competition solution workflow goes through seven stages described in the Data Science Solutions book.\n",
"\n",
"1. Question or problem definition.\n",
"2. Acquire training and testing data.\n",
"3. Wrangle, prepare, cleanse the data.\n",
"4. Analyze, identify patterns, and explore the data.\n",
"5. Model, predict and solve the problem.\n",
"6. Visualize, report, and present the problem solving steps and final solution.\n",
"7. Supply or submit the results.\n",
"\n",
"The workflow indicates general sequence of how each stage may follow the other. However there are use cases with exceptions.\n",
"\n",
"- We may combine mulitple workflow stages. We may analyze by visualizing data.\n",
"- Perform a stage earlier than indicated. We may analyze data before and after wrangling.\n",
"- Perform a stage multiple times in our workflow. Visualize stage may be used multiple times.\n",
"- Drop a stage altogether. We may not need supply stage to productize or service enable our dataset for a competition.\n",
"\n",
"\n",
"## Question and problem definition\n",
"\n",
"Competition sites like Kaggle define the problem to solve or questions to ask while providing the datasets for training your data science model and testing the model results against a test dataset. The question or problem definition for Titanic Survival competition is [described here at Kaggle](https://www.kaggle.com/c/titanic).\n",
"\n",
"> Knowing from a training set of samples listing passengers who survived or did not survive the Titanic disaster, can our model determine based on a given test dataset not containing the survival information, if these passengers in the test dataset survived or not.\n",
"\n",
"We may also want to develop some early understanding about the domain of our problem. This is described on the [Kaggle competition description page here](https://www.kaggle.com/c/titanic). Here are the highlights to note.\n",
"\n",
"- On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. Translated 32% survival rate.\n",
"- One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew.\n",
"- Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.\n",
"\n",
"## Workflow goals\n",
"\n",
"The data science solutions workflow solves for seven major goals.\n",
"\n",
"**Classifying.** We may want to classify or categorize our samples. We may also want to understand the implications or correlation of different classes with our solution goal.\n",
"\n",
"**Correlating.** One can approach the problem based on available features within the training dataset. Which features within the dataset contribute significantly to our solution goal? Statistically speaking is there a [correlation](https://en.wikiversity.org/wiki/Correlation) among a feature and solution goal? As the feature values change does the solution state change as well, and visa-versa? This can be tested both for numerical and categorical features in the given dataset. We may also want to determine correlation among features other than survival for subsequent goals and workflow stages. Correlating certain features may help in creating, completing, or correcting features.\n",
"\n",
"**Converting.** For modeling stage, one needs to prepare the data. Depending on the choice of model algorithm one may require all features to be converted to numerical equivalent values. So for instance converting text categorical values to numeric values.\n",
"\n",
"**Completing.** Data preparation may also require us to estimate any missing values within a feature. Model algorithms may work best when there are no missing values.\n",
"\n",
"**Correcting.** We may also analyze the given training dataset for errors or possibly innacurate values within features and try to corrent these values or exclude the samples containing the errors. One way to do this is to detect any outliers among our samples or features. We may also completely discard a feature if it is not contribting to the analysis or may significantly skew the results.\n",
"\n",
"**Creating.** Can we create new features based on an existing feature or a set of features, such that the new feature follows the correlation, conversion, completeness goals.\n",
"\n",
"**Charting.** How to select the right visualization plots and charts depending on nature of the data and the solution goals. A good start is to read the Tableau paper on [Which chart or graph is right for you?](http://www.tableau.com/learn/whitepapers/which-chart-or-graph-is-right-for-you#ERAcoH5sEG5CFlek.99)."
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "56a3be4e-76ef-20c6-25e8-da16147cf6d7"
},
"source": [
"## Refactor Release 2017-Jan-29\n",
"\n",
"We are significantly refactoring the notebook based on (a) comments received by readers, (b) issues in porting notebook from Jupyter kernel (2.7) to Kaggle kernel (3.5), and (c) review of few more best practice kernels.\n",
"\n",
"### User comments\n",
"\n",
"- Combine training and test data for certain operations like converting titles across dataset to numerical values. (thanks @Sharan Naribole)\n",
"- Correct observation - nearly 30% of the passengers had siblings and/or spouses aboard. (thanks @Reinhard)\n",
"- Correctly interpreting logistic regresssion coefficients. (thanks @Reinhard)\n",
"\n",
"### Porting issues\n",
"\n",
"- Specify plot dimensions, bring legend into plot.\n",
"\n",
"\n",
"### Best practices\n",
"\n",
"- Performing feature correlation analysis early in the project.\n",
"- Using multiple plots instead of overlays for readability."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"_cell_guid": "5767a33c-8f18-4034-e52d-bf7a8f7d8ab8"
},
"outputs": [],
"source": [
"# data analysis and wrangling\n",
"import pandas as pd\n",
"import numpy as np\n",
"import random as rnd\n",
"\n",
"# visualization\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"# machine learning\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.svm import SVC, LinearSVC\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.naive_bayes import GaussianNB\n",
"from sklearn.linear_model import Perceptron\n",
"from sklearn.linear_model import SGDClassifier\n",
"from sklearn.tree import DecisionTreeClassifier"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "6b5dc743-15b1-aac6-405e-081def6ecca1"
},
"source": [
"## Acquire data\n",
"\n",
"The Python Pandas packages helps us work with our datasets. We start by acquiring the training and testing datasets into Pandas DataFrames. We also combine these datasets to run certain operations on both datasets together."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"_cell_guid": "e7319668-86fe-8adc-438d-0eef3fd0a982"
},
"outputs": [],
"source": [
"train_df = pd.read_csv('../input/train.csv')\n",
"test_df = pd.read_csv('../input/test.csv')\n",
"combine = [train_df, test_df]"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "3d6188f3-dc82-8ae6-dabd-83e28fcbf10d"
},
"source": [
"## Analyze by describing data\n",
"\n",
"Pandas also helps describe the datasets answering following questions early in our project.\n",
"\n",
"**Which features are available in the dataset?**\n",
"\n",
"Noting the feature names for directly manipulating or analyzing these. These feature names are described on the [Kaggle data page here](https://www.kaggle.com/c/titanic/data)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"_cell_guid": "ce473d29-8d19-76b8-24a4-48c217286e42"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['PassengerId' 'Survived' 'Pclass' 'Name' 'Sex' 'Age' 'SibSp' 'Parch'\n",
" 'Ticket' 'Fare' 'Cabin' 'Embarked']\n"
]
}
],
"source": [
"print(train_df.columns.values)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "cd19a6f6-347f-be19-607b-dca950590b37"
},
"source": [
"**Which features are categorical?**\n",
"\n",
"These values classify the samples into sets of similar samples. Within categorical features are the values nominal, ordinal, ratio, or interval based? Among other things this helps us select the appropriate plots for visualization.\n",
"\n",
"- Categorical: Survived, Sex, and Embarked. Ordinal: Pclass.\n",
"\n",
"**Which features are numerical?**\n",
"\n",
"Which features are numerical? These values change from sample to sample. Within numerical features are the values discrete, continuous, or timeseries based? Among other things this helps us select the appropriate plots for visualization.\n",
"\n",
"- Continous: Age, Fare. Discrete: SibSp, Parch."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"_cell_guid": "8d7ac195-ac1a-30a4-3f3f-80b8cf2c1c0f"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# preview the data\n",
"train_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "97f4e6f8-2fea-46c4-e4e8-b69062ee3d46"
},
"source": [
"**Which features are mixed data types?**\n",
"\n",
"Numerical, alphanumeric data within same feature. These are candidates for correcting goal.\n",
"\n",
"- Ticket is a mix of numeric and alphanumeric data types. Cabin is alphanumeric.\n",
"\n",
"**Which features may contain errors or typos?**\n",
"\n",
"This is harder to review for a large dataset, however reviewing a few samples from a smaller dataset may just tell us outright, which features may require correcting.\n",
"\n",
"- Name feature may contain errors or typos as there are several ways used to describe a name including titles, round brackets, and quotes used for alternative or short names."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"_cell_guid": "f6e761c2-e2ff-d300-164c-af257083bb46"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>886</th>\n",
" <td>887</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>Montvila, Rev. Juozas</td>\n",
" <td>male</td>\n",
" <td>27.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>211536</td>\n",
" <td>13.00</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>887</th>\n",
" <td>888</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Graham, Miss. Margaret Edith</td>\n",
" <td>female</td>\n",
" <td>19.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>112053</td>\n",
" <td>30.00</td>\n",
" <td>B42</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>888</th>\n",
" <td>889</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Johnston, Miss. Catherine Helen \"Carrie\"</td>\n",
" <td>female</td>\n",
" <td>NaN</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>W./C. 6607</td>\n",
" <td>23.45</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>889</th>\n",
" <td>890</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Behr, Mr. Karl Howell</td>\n",
" <td>male</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>111369</td>\n",
" <td>30.00</td>\n",
" <td>C148</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>890</th>\n",
" <td>891</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Dooley, Mr. Patrick</td>\n",
" <td>male</td>\n",
" <td>32.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>370376</td>\n",
" <td>7.75</td>\n",
" <td>NaN</td>\n",
" <td>Q</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Name \\\n",
"886 887 0 2 Montvila, Rev. Juozas \n",
"887 888 1 1 Graham, Miss. Margaret Edith \n",
"888 889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" \n",
"889 890 1 1 Behr, Mr. Karl Howell \n",
"890 891 0 3 Dooley, Mr. Patrick \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin Embarked \n",
"886 male 27.0 0 0 211536 13.00 NaN S \n",
"887 female 19.0 0 0 112053 30.00 B42 S \n",
"888 female NaN 1 2 W./C. 6607 23.45 NaN S \n",
"889 male 26.0 0 0 111369 30.00 C148 C \n",
"890 male 32.0 0 0 370376 7.75 NaN Q "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "8bfe9610-689a-29b2-26ee-f67cd4719079"
},
"source": [
"**Which features contain blank, null or empty values?**\n",
"\n",
"These will require correcting.\n",
"\n",
"- Cabin > Age > Embarked features contain a number of null values in that order for the training dataset.\n",
"- Cabin > Age are incomplete in case of test dataset.\n",
"\n",
"**What are the data types for various features?**\n",
"\n",
"Helping us during converting goal.\n",
"\n",
"- Seven features are integer or floats. Six in case of test dataset.\n",
"- Five features are strings (object)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"_cell_guid": "9b805f69-665a-2b2e-f31d-50d87d52865d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 891 entries, 0 to 890\n",
"Data columns (total 12 columns):\n",
"PassengerId 891 non-null int64\n",
"Survived 891 non-null int64\n",
"Pclass 891 non-null int64\n",
"Name 891 non-null object\n",
"Sex 891 non-null object\n",
"Age 714 non-null float64\n",
"SibSp 891 non-null int64\n",
"Parch 891 non-null int64\n",
"Ticket 891 non-null object\n",
"Fare 891 non-null float64\n",
"Cabin 204 non-null object\n",
"Embarked 889 non-null object\n",
"dtypes: float64(2), int64(5), object(5)\n",
"memory usage: 83.6+ KB\n",
"________________________________________\n",
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 418 entries, 0 to 417\n",
"Data columns (total 11 columns):\n",
"PassengerId 418 non-null int64\n",
"Pclass 418 non-null int64\n",
"Name 418 non-null object\n",
"Sex 418 non-null object\n",
"Age 332 non-null float64\n",
"SibSp 418 non-null int64\n",
"Parch 418 non-null int64\n",
"Ticket 418 non-null object\n",
"Fare 417 non-null float64\n",
"Cabin 91 non-null object\n",
"Embarked 418 non-null object\n",
"dtypes: float64(2), int64(4), object(5)\n",
"memory usage: 36.0+ KB\n"
]
}
],
"source": [
"train_df.info()\n",
"print('_'*40)\n",
"test_df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "859102e1-10df-d451-2649-2d4571e5f082"
},
"source": [
"**What is the distribution of numerical feature values across the samples?**\n",
"\n",
"This helps us determine, among other early insights, how representative is the training dataset of the actual problem domain.\n",
"\n",
"- Total samples are 891 or 40% of the actual number of passengers on board the Titanic (2,224).\n",
"- Survived is a categorical feature with 0 or 1 values.\n",
"- Around 38% samples survived representative of the actual survival rate at 32%.\n",
"- Most passengers (> 75%) did not travel with parents or children.\n",
"- Nearly 30% of the passengers had siblings and/or spouse aboard.\n",
"- Fares varied significantly with few passengers (<1%) paying as high as $512.\n",
"- Few elderly passengers (<1%) within age range 65-80."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"_cell_guid": "58e387fe-86e4-e068-8307-70e37fe3f37b"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>714.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>446.000000</td>\n",
" <td>0.383838</td>\n",
" <td>2.308642</td>\n",
" <td>29.699118</td>\n",
" <td>0.523008</td>\n",
" <td>0.381594</td>\n",
" <td>32.204208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>257.353842</td>\n",
" <td>0.486592</td>\n",
" <td>0.836071</td>\n",
" <td>14.526497</td>\n",
" <td>1.102743</td>\n",
" <td>0.806057</td>\n",
" <td>49.693429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.420000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>223.500000</td>\n",
" <td>0.000000</td>\n",
" <td>2.000000</td>\n",
" <td>20.125000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>7.910400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>446.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.000000</td>\n",
" <td>28.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>14.454200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>668.500000</td>\n",
" <td>1.000000</td>\n",
" <td>3.000000</td>\n",
" <td>38.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>31.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>891.000000</td>\n",
" <td>1.000000</td>\n",
" <td>3.000000</td>\n",
" <td>80.000000</td>\n",
" <td>8.000000</td>\n",
" <td>6.000000</td>\n",
" <td>512.329200</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass Age SibSp \\\n",
"count 891.000000 891.000000 891.000000 714.000000 891.000000 \n",
"mean 446.000000 0.383838 2.308642 29.699118 0.523008 \n",
"std 257.353842 0.486592 0.836071 14.526497 1.102743 \n",
"min 1.000000 0.000000 1.000000 0.420000 0.000000 \n",
"25% 223.500000 0.000000 2.000000 20.125000 0.000000 \n",
"50% 446.000000 0.000000 3.000000 28.000000 0.000000 \n",
"75% 668.500000 1.000000 3.000000 38.000000 1.000000 \n",
"max 891.000000 1.000000 3.000000 80.000000 8.000000 \n",
"\n",
" Parch Fare \n",
"count 891.000000 891.000000 \n",
"mean 0.381594 32.204208 \n",
"std 0.806057 49.693429 \n",
"min 0.000000 0.000000 \n",
"25% 0.000000 7.910400 \n",
"50% 0.000000 14.454200 \n",
"75% 0.000000 31.000000 \n",
"max 6.000000 512.329200 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df.describe()\n",
"# Review survived rate using `percentiles=[.61, .62]` knowing our problem description mentions 38% survival rate.\n",
"# Review Parch distribution using `percentiles=[.75, .8]`\n",
"# SibSp distribution `[.68, .69]`\n",
"# Age and Fare `[.1, .2, .3, .4, .5, .6, .7, .8, .9, .99]`"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "5462bc60-258c-76bf-0a73-9adc00a2f493"
},
"source": [
"**What is the distribution of categorical features?**\n",
"\n",
"- Names are unique across the dataset (count=unique=891)\n",
"- Sex variable as two possible values with 65% male (top=male, freq=577/count=891).\n",
"- Cabin values have several dupicates across samples. Alternatively several passengers shared a cabin.\n",
"- Embarked takes three possible values. S port used by most passengers (top=S)\n",
"- Ticket feature has high ratio (22%) of duplicate values (unique=681)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"_cell_guid": "8066b378-1964-92e8-1352-dcac934c6af3"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Ticket</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>891</td>\n",
" <td>891</td>\n",
" <td>891</td>\n",
" <td>204</td>\n",
" <td>889</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>891</td>\n",
" <td>2</td>\n",
" <td>681</td>\n",
" <td>147</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>Chronopoulos, Mr. Apostolos</td>\n",
" <td>male</td>\n",
" <td>CA. 2343</td>\n",
" <td>G6</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>1</td>\n",
" <td>577</td>\n",
" <td>7</td>\n",
" <td>4</td>\n",
" <td>644</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Name Sex Ticket Cabin Embarked\n",
"count 891 891 891 204 889\n",
"unique 891 2 681 147 3\n",
"top Chronopoulos, Mr. Apostolos male CA. 2343 G6 S\n",
"freq 1 577 7 4 644"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df.describe(include=['O'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "2cb22b88-937d-6f14-8b06-ea3361357889"
},
"source": [
"### Assumtions based on data analysis\n",
"\n",
"We arrive at following assumptions based on data analysis done so far. We may validate these assumptions further before taking appropriate actions.\n",
"\n",
"**Correlating.**\n",
"\n",
"We want to know how well does each feature correlate with Survival. We want to do this early in our project and match these quick correlations with modelled correlations later in the project.\n",
"\n",
"**Completing.**\n",
"\n",
"1. We may want to complete Age feature as it is definitely correlated to survival.\n",
"2. We may want to complete the Embarked feature as it may also correlate with survival or another important feature.\n",
"\n",
"**Correcting.**\n",
"\n",
"1. Ticket feature may be dropped from our analysis as it contains high ratio of duplicates (22%) and there may not be a correlation between Ticket and survival.\n",
"2. Cabin feature may be dropped as it is highly incomplete or contains many null values both in training and test dataset.\n",
"3. PassengerId may be dropped from training dataset as it does not contribute to survival.\n",
"4. Name feature is relatively non-standard, may not contribute directly to survival, so maybe dropped.\n",
"\n",
"**Creating.**\n",
"\n",
"1. We may want to create a new feature called Family based on Parch and SibSp to get total count of family members on board.\n",
"2. We may want to engineer the Name feature to extract Title as a new feature.\n",
"3. We may want to create new feature for Age bands. This turns a continous numerical feature into an ordinal categorical feature.\n",
"4. We may also want to create a Fare range feature if it helps our analysis.\n",
"\n",
"**Classifying.**\n",
"\n",
"We may also add to our assumptions based on the problem description noted earlier.\n",
"\n",
"1. Women (Sex=female) were more likely to have survived.\n",
"2. Children (Age<?) were more likely to have survived. \n",
"3. The upper-class passengers (Pclass=1) were more likely to have survived."
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "6db63a30-1d86-266e-2799-dded03c45816"
},
"source": [
"## Analyze by pivoting features\n",
"\n",
"To confirm some of our observations and assumptions, we can quickly analyze our feature correlations by pivoting features against each other. We can only do so at this stage for features which do not have any empty values. It also makes sense doing so only for features which are categorical (Sex), ordinal (Pclass) or discrete (SibSp, Parch) type.\n",
"\n",
"- **Pclass** We observe significant correlation (>0.5) among Pclass=1 and Survived (classifying #3). We decide to include this feature in our model.\n",
"- **Sex** We confirm the observation during problem definition that Sex=female had very high survival rate at 74% (classifying #1).\n",
"- **SibSp and Parch** These features have zero correlation for certain values. It may be best to derive a feature or a set of features from these individual features (creating #1)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"_cell_guid": "0964832a-a4be-2d6f-a89e-63526389cee9"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Pclass</th>\n",
" <th>Survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0.629630</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>0.472826</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>0.242363</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Pclass Survived\n",
"0 1 0.629630\n",
"1 2 0.472826\n",
"2 3 0.242363"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df[['Pclass', 'Survived']].groupby(['Pclass'], as_index=False).mean().sort_values(by='Survived', ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"_cell_guid": "68908ba6-bfe9-5b31-cfde-6987fc0fbe9a"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sex</th>\n",
" <th>Survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>female</td>\n",
" <td>0.742038</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>male</td>\n",
" <td>0.188908</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sex Survived\n",
"0 female 0.742038\n",
"1 male 0.188908"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df[[\"Sex\", \"Survived\"]].groupby(['Sex'], as_index=False).mean().sort_values(by='Survived', ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"_cell_guid": "01c06927-c5a6-342a-5aa8-2e486ec3fd7c"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SibSp</th>\n",
" <th>Survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0.535885</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>0.464286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0.345395</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>0.250000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>0.166667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>5</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>8</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SibSp Survived\n",
"1 1 0.535885\n",
"2 2 0.464286\n",
"0 0 0.345395\n",
"3 3 0.250000\n",
"4 4 0.166667\n",
"5 5 0.000000\n",
"6 8 0.000000"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df[[\"SibSp\", \"Survived\"]].groupby(['SibSp'], as_index=False).mean().sort_values(by='Survived', ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"_cell_guid": "e686f98b-a8c9-68f8-36a4-d4598638bbd5"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Parch</th>\n",
" <th>Survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>0.600000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0.550847</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>0.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0.343658</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>5</td>\n",
" <td>0.200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>6</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Parch Survived\n",
"3 3 0.600000\n",
"1 1 0.550847\n",
"2 2 0.500000\n",
"0 0 0.343658\n",
"5 5 0.200000\n",
"4 4 0.000000\n",
"6 6 0.000000"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df[[\"Parch\", \"Survived\"]].groupby(['Parch'], as_index=False).mean().sort_values(by='Survived', ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "0d43550e-9eff-3859-3568-8856570eff76"
},
"source": [
"## Analyze by visualizing data\n",
"\n",
"Now we can continue confirming some of our assumptions using visualizations for analyzing the data.\n",
"\n",
"### Correlating numerical features\n",
"\n",
"Let us start by understanding correlations between numerical features and our solution goal (Survived).\n",
"\n",
"A histogram chart is useful for analyzing continous numerical variables like Age where banding or ranges will help identify useful patterns. The histogram can indicate distribution of samples using automatically defined bins or equally ranged bands. This helps us answer questions relating to specific bands (Did infants have better survival rate?)\n",
"\n",
"Note that x-axis in historgram visualizations represents the count of samples or passengers.\n",
"\n",
"**Observations.**\n",
"\n",
"- Infants (Age <=4) had high survival rate.\n",
"- Oldest passengers (Age = 80) survived.\n",
"- Large number of 15-25 year olds did not survive.\n",
"- Most passengers are in 15-35 age range.\n",
"\n",
"**Decisions.**\n",
"\n",
"This simple analysis confirms our assumptions as decisions for subsequent workflow stages.\n",
"\n",
"- We should consider Age (our assumption classifying #2) in our model training.\n",
"- Complete the Age feature for null values (completing #1).\n",
"- We should band age groups (creating #3)."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"_cell_guid": "50294eac-263a-af78-cb7e-3778eb9ad41f"
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7ff74ffd3f98>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEclJREFUeJzt3X+QXWV9x/H3uhk1CSmz6g4EpFpG52sZOkPrOP6MBoQi\nFSdTo1J/8MOkklLtOKNOxREVA1XEptLWiHUEgSD+YjqajE5QUBSKFO2I1mq/GgfRmtBsyyIJMKmB\n9I97opd1s/fs7v3xnHvfr5lMzj3nnrOfPbvPfu/z3OeeM3bgwAEkSSrNYwYdQJKk2VigJElFskBJ\nkopkgZIkFckCJUkqkgVKklSkJYMOMEoi4jTgHcDDwHLgLmBDZt63yOOeA4xn5hWLPM6twAWZefMC\n9n028CFgP/AAcFZmTi0mj0bTkLeTMeDtwEXA72fmjsVkGXYWqD6JiMcC1wLHZ+auat0HgPXApsUc\nOzOvWnTAxbsKODsz74iItwB/A5w72EhqmhFoJ+8AxoCdgw7SBBao/llK69Xg8oMrMvPtB5cj4qfA\nyZm5IyJWAxdn5gsi4mbgTuAPgTuA6cx8X7XPBcAK4CFaP8vHHWL7u4DNwNOqx5/KzE0RsQz4NDAJ\n/Bh4/MzQEfEafrvQ3JOZf9b2nKcCSzPzjmrVZ4Hb53V2pJahbSeVD2fm/RHxhvmdltFkgeqTzPxl\nRLwHuDMibge+BlyfmVlj972Z+aKIOAG4Enhftf4M4NXAK6rHnzzE9jcDOzPzDRExDtweEV8BngM8\nlJnPjYiVtIZSZua+DriuQ76jgHvaHt8DrKzxfUmPMuTthMy8v8b3oYqTJPooMz8APAW4ovr/XyPi\nvBq73lbtfyfwuIg4NiKOA/Zn5vfbjn+o7ScCf1q9yryJ1ivApwF/ANxa7bsL+M/ufKeMAV5DSwsy\nQu1EHdiD6qOIWJaZ/wt8CvhURHyO1rj65Tz6D/pjZ+z6f23L19F6Jbic1lj9TLNt3wdszMzrZ+Q5\nCXikbdX4LJnrDF38nFYv6qCjgF/Mkk3qaIjbiebJAtUnEXEqcGlEvCAz91SrjwUOzuK5HzimenzS\nHIe6DriGVsN6ac3ttwKvAq6PiMcAfwtcDPwAeB6wOSKOAWLmweoMXWTmzyNiOiKen5n/ArwO2DrX\nPtJshrmdaP4c4uuTzLwB+DhwU0TcHBFfB14MvLF6yibgiojYTmua9qGOcxetV5FTB2c51di+Gdgb\nEd+kNXnhvsy8F9gCPCkibqE16+6Omcebh3No/WG5FXg28O5FHEsjatjbSUR8pBpCPBL4ZETctNBj\njYIxb7chSSqRPShJUpEsUJKkIlmgJElFskBJkorUl2nmU1N75pyJMTGxjOnpB/sRpSualLdJWaFZ\neetknZxcMVb3eMPUTpqUFZqVt0lZYXHtpIge1JIlv/W5t6I1KW+TskKz8vY7q+emd5qUt0lZYXF5\niyhQkiTNZIGSJBXJAiVJKpIFSpJUJAuUJKlIFihJUpG83UYXrbvkq3Nuv/L8ue4OIElqZw9KklQk\nC5QkqUgWKElSkSxQkqQiOUmij5xEIUn12YOSJBXJAiVJKpIFSpJUJAuUJKlIFihJUpEsUJKkInWc\nZh4Rq4HPAf9Rrfp34FJgCzAO7ALOzMx9PcrYN04Dl6Ry1O1BfT0zV1f//grYCGzOzFXADmBdzxJK\nkkbSQof4VgNbq+VtwMldSSNJUqXulSSOi4itwBOA9wLL24b0dgMr59p5YmIZS5aMz/kFJidX1Iwy\nOO0Ze5G3V+egCee2XZPydjPrsLSTg5qUFZqVt0lZYeF56xSoH9MqSp8FjgW+NmO/sU4HmJ5+cM7t\nk5MrmJraUyPKYB3M2Ku8vThmU87tQU3KWyfrfBrmsLQTaFZWaFbeJmWFxbWTjgUqM38BfKZ6+JOI\nuAd4VkQszcyHgKOBnfNKLElSBx3fg4qI10bE
26rlI4EjgE8Aa6unrAW29yyhJGkk1Rni2wpcFxFr\ngMcC5wHfAa6JiA3A3cDVvYsoSRpFdYb49gAvm2XTKd2PU7ZOn5OSJHWPV5KQJBXJAiVJKpIFSpJU\nJAuUJKlIFihJUpEsUJKkIlmgJElFskBJkopkgZIkFckCJUkqkgVKklQkC5QkqUgWKElSkSxQkqQi\nWaAkSUWyQEmSilTnjrpExFLg+8BFwE3AFmAc2AWcmZn7epZQkjSS6vagLgDurZY3ApszcxWwA1jX\ni2CSpNHWsUBFxDOA44AvVqtWA1ur5W3AyT1JJkkaaXWG+DYBbwLOrh4vbxvS2w2s7HSAiYllLFky\nPudzJidX1Igy3Hp1Dpp2bpuUt5tZh62dNCkrNCtvk7LCwvPOWaAi4izgm5l5V0TM9pSxOl9kevrB\nObdPTq5gampPnUMNtV6cg6ad2yblrZN1Pg1zmNpJk7JCs/I2KSssrp106kG9FDg2Ik4HngzsA/ZG\nxNLMfAg4Gtg578SSJHUwZ4HKzDMOLkfEhcBPgecBa4Frq/+39y6eJGlULeRzUO8Bzo6IW4AnAFd3\nN5IkSTU/BwWQmRe2PTyl+1EkSfoNryQhSSqSBUqSVCQLlCSpSBYoSVKRak+SkKT5WHfJVzs+58rz\nT+pDEjWVPShJUpEsUJKkIjnEJ6lYdYYJ63AosZnsQUmSimQPqkE6vZr0VaKkYWIPSpJUJAuUJKlI\nFihJUpEsUJKkIlmgJElFskBJkorUcZp5RCwDrgKOAB4PXAR8F9gCjAO7gDMzc1/vYkqSRk2dHtTL\ngG9n5ouAVwF/B2wENmfmKmAHsK53ESVJo6hjDyozP9P28Bjgv4DVwF9U67YBbwMu73Y4SdLoqn0l\niYi4DXgycDpwY9uQ3m5gZQ+ySZJGWO0ClZnPi4gTgGuBsbZNY4fY5dcmJpaxZMn4nM+ZnFxRN8rQ\nWuyFMQ91Dpt2bpuUt5tZR7GddOtisJ3UOW9NOrdNygoLz1tnksQzgd2Z+fPMvDMilgB7ImJpZj4E\nHA3snOsY09MPzvk1JidXMDW1Zx6xNZvZzmHTzm2T8tbJOp+GOUztpLQ/oHV+Tk06t03JCotrJ3Um\nSbwQeCtARBwBHAbcCKyttq8FttfMKklSLXWG+D4KXBERtwBLgTcC3wauiYgNwN3A1b2LKEkaRXVm\n8T0EvGaWTad0P44kSS3eD2qIeL8oScPESx1JkopkD0pqkFJ6yf2aHq7RZg9KklQkC5QkqUgWKElS\nkSxQkqQiWaAkSUWyQEmSimSBkiQVyQIlSSqSBUqSVCSvJKFfK+UqBZIE9qAkSYWyQEmSimSBkiQV\nyQIlSSpSrUkSEXEpsKp6/vuBbwFbgHFgF3BmZu7rVUhJ0ujp2IOKiBOB4zPzucBLgMuAjcDmzFwF\n7ADW9TSlJGnk1Bni+wbwymr5PmA5sBrYWq3bBpzc9WSSpJHWcYgvMx8GHqgerge+BJzaNqS3G1g5\n1zEmJpaxZMn4nF9ncnJFx7BanMWe4379jJr0u9DNrHXaSSdNOnf9VOe8NOncNSkrLDxv7Q/qRsQa\nWgXqj4Eft20a67Tv9PSDc26fnFzB1NSeulG0QIs9x/34GTXpd6FO1vk0zDrtpJOmnLt+q/Nzasq5\na1JWWFw7qTWLLyJOBd4JnJaZvwT2RsTSavPRwM7aaSVJqqHOJInDgQ8Cp2fmvdXqG4G11fJaYHtv\n4kmSRlWdIb4zgCcBn42Ig+vOBj4eERuAu4GrexNPkjSq6kyS+BjwsVk2ndL9OJIktXglCUlSkSxQ\nkqQieT+oEdLpfk/SsOr0u79t05o+JdF82IOSJBXJAiVJKpIFSpJUJAuUJKlITpJQbZ3eaL7y/JP6\nlES95oQalcAelCSpSPagJKlLHGXoLntQkqQiWaAkSUUqaojP7rEk6SB7UJKkIhXVg5KkQXjZW78w\n6AiahT0o
SVKRavWgIuJ44AvAhzLzwxFxDLAFGAd2AWdm5r7exZQkjZqOBSoilgP/CNzUtnojsDkz\nPxcR7wPWAZf3JmJ9TrKQpOFRZ4hvH/AnwM62dauBrdXyNuDk7saSJI26jj2ozNwP7I+I9tXL24b0\ndgMr5zrGxMQyliwZn/PrTE6u6BSl1nN6ub/m1q3z26SfUzez1mknnTTp3I2iUWwjsPC83ZjFN9bp\nCdPTD865fXJyBVNTezp+oTrP6eX+mls3zm/d34US1Mk6n4ZZp5100pRzN6pGrY3A4trJQmfx7Y2I\npdXy0Tx6+E+SpEVbaA/qRmAtcG31//auJZK0YHVuk+FkITVFnVl8zwQ2AU8FfhURrwBeC1wVERuA\nu4GrexlSkjR66kyS+Ddas/ZmOqXraTTU/BiApPnwShKSpCJZoCRJRfJiseqaOm/QL2Z/hwDVdE5i\nmR97UJKkIlmgJElFcohPjTHX8IjDItLwsQclSSpSo3pQvX4TXpIGzYkUv2EPSpJUJAuUJKlIjRri\nkxbKz1j9hkPdzTcqv8/2oCRJRbJASZKKZIGSJBXJAiVJKpKTJDQUfONfGj72oCRJRVpwDyoiPgQ8\nBzgAvDkzv9W1VJKk4tUZudi2ac2Cj7+gHlREvAh4emY+F1gP/MOCE0iSNIuFDvG9GPg8QGb+EJiI\niN/pWipJ0sgbO3DgwLx3ioiPAV/MzC9Uj28B1mfmj7qcT5I0oro1SWKsS8eRJAlYeIHaCRzZ9vgo\nYNfi40iS1LLQAvVl4BUAEfFHwM7M3NO1VJKkkbeg96AAIuIS4IXAI8AbM/O73QwmSRptCy5QkiT1\nkleSkCQVyQIlSSrSwC8W24RLJkXEpcAqWufr/cC3gC3AOK3Zi2dm5r7BJXy0iFgKfB+4CLiJsrO+\nFvhrYD/wbuB7FJg3Ig4DrgEmgMcB7wV+QB+y2ka6zzbSG91uJwPtQTXhkkkRcSJwfJXxJcBlwEZg\nc2auAnYA6wYYcTYXAPdWy8VmjYgnAu8BXgCcDqyh3LznAJmZJ9Kawfr39CGrbaRnbCO9cQ5dbCeD\nHuJrwiWTvgG8slq+D1gOrAa2Vuu2ASf3P9bsIuIZwHHAF6tVqyk0K60sN2bmnszclZnnUm7e/wGe\nWC1PVI9X0/ustpEus430VFfbyaAL1JHAVNvjKR79AeCBy8yHM/OB6uF64EvA8rYu6m5g5UDCzW4T\n8Ja2xyVnfSqwLCK2RsQtEfFiCs2bmZ8GfjcidtD6g/w2+pPVNtJ9tpEe6XY7GXSBmqnYSyZFxBpa\nje9NMzYVkzkizgK+mZl3HeIpxWStjNF6tfVyWkMDn+DRGYvJGxGvA36WmU8DTgI+POMp/cpazDmZ\nyTbSE41pI9D9djLoAtWISyZFxKnAO4HTMvOXwN7qTVaAo2l9HyV4KbAmIm4H/hx4F+VmBfhv4LbM\n3J+ZPwH2AHsKzft84AaA6kPpRwEP9CGrbaS7bCO91dV2MugCVfwlkyLicOCDwOmZefBN1RuBtdXy\nWmD7ILLNlJlnZOazMvM5wMdpzVAqMmvly8BJEfGY6s3gwyg37w7g2QAR8RRgL/AVep/VNtJFtpGe\n62o7GfiVJEq/ZFJEnAtcCLTfSuRsWr/cjwfuBl6fmb/qf7pDi4gLgZ/SejVzDYVmjYgNtIaFAC6m\nNT25uLzV9NkrgSNoTaV+F/BD+pDVNtIbtpHu63Y7GXiBkiRpNoMe4pMkaVYWKElSkSxQkqQiWaAk\nSUWyQEmSimSBGgIRsTIi9kfE+YPOIpXINtJMFqjhcDatS9qfM+AcUqlsIw3k56CGQET8CDgPuAo4\nIzNvi4jTgEto3VLgBuBNmfnkiJgAPgpMAocDmzLzusEkl/rDNtJM9qAaLiJeSOsT21+l9Wnt10fE\nGPBPwFnVfVkOb9vlYmB7Zp5E6+oEGyNiss+xpb6xjTSXBar51gNXZeYBWl
c6fhVwDHBY2yVxrm97\n/onAeRFxM6374fwK+L3+xZX6zjbSUAO/5bsWrrpx3VrgZxHx8mr1OK0G9kjbUx9uW94H/GVmfrs/\nKaXBsY00mz2oZns18PXMPC4zT8jME4Bzab0h/EhERPW8l7ftcyutV5BExNKI+EhE+EJFw8o20mAW\nqGZbD1w+Y931tG5nfRnw+Yi4gdYrwv3V9guBp0fErbTuePmdzNyPNJxsIw3mLL4hVd3d9HuZeVc1\ntLEhM08ddC6pFLaR8tltHV7jwD9HxP3V8nkDziOVxjZSOHtQkqQi+R6UJKlIFihJUpEsUJKkIlmg\nJElFskBJkor0/+AXXrJKE5HuAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7ff74ff8a208>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"g = sns.FacetGrid(train_df, col='Survived')\n",
"g.map(plt.hist, 'Age', bins=20)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "87096158-4017-9213-7225-a19aea67a800"
},
"source": [
"### Correlating numerical and ordinal features\n",
"\n",
"We can combine multiple features for identifying correlations using a single plot. This can be done with numerical and categorical features which have numeric values.\n",
"\n",
"**Observations.**\n",
"\n",
"- Pclass=3 had most passengers, however most did not survive. Confirms our classifying assumption #2.\n",
"- Infant passengers in Pclass=2 and Pclass=3 mostly survived. Further qualifies our classifying assumption #2.\n",
"- Most passengers in Pclass=1 survived. Confirms our classifying assumption #3.\n",
"- Pclass varies in terms of Age distribution of passengers.\n",
"\n",
"**Decisions.**\n",
"\n",
"- Consider Pclass for model training."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"_cell_guid": "916fdc6b-0190-9267-1ea9-907a3d87330d"
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgEAAAHUCAYAAACj/ftgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X20XXV54PFvzMUQLinGcqUQmCLVPpZlZ1xYi2gDoU2M\nIyAvUVijRBioItUWiixtfZky0alUFi9OZUAK8mpGcNoiLDBgUAtUa0FHBKuPRAiCiXItQUJIA4E7\nf5yd6eFyX/Y995x7zzm/72ctVs7Zb+d59j2/zbN/+3f2njMyMoIkSSrPi2Y7AEmSNDssAiRJKpRF\ngCRJhbIIkCSpUBYBkiQVyiJAkqRCDcx2AL0uIvYFEvhmNWkn4CHgjzLz8XHWORFYmpnHz0SM48Tw\nSuCLwH3jxRER6zNz3zGmrwTeCzwDLAD+GTg9M7dNM6Y/A+7NzJumuZ1HgN/LzPUtrHs48DHgaWAj\ncEJmbp1OPOp+tuO+a8cvBj4FnAbslJnbpxNLP7MnoD2GM3NJ9d8bgZ8CH53toMYTEYPAFcDNLay7\nN/CXwPLMXAL8Do0DyFHTjSszz57ugWM6ImJn4BLg2MxcDPwM+NPZikczznbcB+24ch7wvVmOoSfY\nE9AZtwOnAETEgcAFNM4sHwPe1bxgRBwNfBD4Nxp/j5WZuT4iTgOOB56q/jsemAd8HpgDzAc+m5mf\nG7W9W4EXj4rn7Mxc0/R+G7AMOBb4D1PMbWG1/fnAk5k5UsW24/NHqCrv5jOliFgPXAvsVy3695m5\nulrnUuDbwIHAncDyceZ/AbgYGAJ2A87NzNURsQdwHTC3Wm7O6KAj4gzgraMmfzczT296/3rgR5n5\nUPX+OuBsGgdLlcd23JvtGODDmflERFw2xf1SHIuANouIucAxwB3VpGuAozPzvog4HThs1CovAY7L\nzJ9ExJ8D7wfOBFYBv5mZP4+I5cBewFLgh5l5anXW+oejPz8z3zRZjFXX2PaImHJ+mXlvRFwHPBAR\n/wB8FfhiZj5cY/X7M/NDEXEUjYPo6ojYicY++SCNgwc0DpBjzf8EsCYzL6/Ogu6JiK/Q6PL7p2rb\nBwB/Mkbc59E4O5jIXjTO/nf4WTVNhbEdT6jb2zGZ+USNPISXA9plKCK+HhFfB74GbADOj4jdgZdk\n5n0AmXlBZn5h1Lo/B66sGuKJwO7V9MuANRHxEeDBzLwX+DKwNCKuAI4APtvZtMaWmX8MBPC3wAHA\n9yPiiBqrfqP692bgwOoAsAz4VmY+1rTcePMPBU6t9vNNNK5lvhz4bRpnHmTmd4BfTi/D/28O4H21\ny2E77s92rAnYE9Aew9V1teeputTGLbSq6vha4IDMvD8i3k/j2hyZeUZE/DrwFuD6iPhAZn45IvYH\nDgHeDpwOvHHUNut0I7YsIuYAO2fmBuBy4PKIeDfwbuDGUYuPjuNpgMx8OiJupnFmcBhwdfNCE8zf\nRmOg1t1jxPRc06S5Y8RdpxvxYZ5/5r8X8Mjobalv2Y77ox1rCiwCOigz/zUifhERr8vMuyLiTP79\n2iA0BuI8B6yvugWPBH4REQtpdI2tysyLIuJFwO9W09dn5tqI+Fq13kDzyNc63YjT9B7g6Ih4a2Y+\nXU3bD1hXvX4C2Ad4kEbF/+w42/k8jZHJr6u2WWf+nTSuf94dEfOBc2l0Gf4LcBBwU3XtdtfRG6vZ\njfgt4OUR8RuZ+WMa10hvmGQd9Tnbcc+1Y02BRUDnrQQ+HRHPAI9X748ByMzHImI1cBeNnyOdQ6Na\nXkrjwHJXRGyi0V12MvAy4OKI2Eajq/qvWvnpS0T8Lo2fz/wasLDqlrssM6+ecMWGvwEWAf8YEU/S\n+CnVD4AzqvlnA7dGxP3APTQOJGO5ncbI5lvH+UnSWPPPAi6NiDtpDK66pBq49Gnguoj4KvB94IEa\nebxAdeZyMo1rmNuBHwOfaWVb6ju247F1XTsG
iIi/A15avb0tIh7JzHe2ur1+NsdHCWs8Mc7viyX1\nDtuxJuLAQEmSCmVPgCRJhbInQJKkQlkESJJUKIsASZIKNSM/ERwe3lxr4MHChbuwadNTky/YQ/ot\np37LB8rOaWhowQvuzz6eOu245H3ZS/otp37LB6aW01Ta8Whd1RMwMPCCG0T1vH7Lqd/yAXPqh8/t\nJHPqfv2WD8xcTl1VBEiSpJljESBJUqEsAiRJKpRFgCRJhbIIkCSpUBYBkiQVyiJAkqRC1bpZUETM\nB+4DPg7cRuNZ2XOBjcDKcZ4jLUmSuljdnoCPAo9Vr1cBF2bmYmAdcFInApMkSZ01aREQEa8C9gdu\nqiYtAW6oXt8ILO1IZJIkqaPq9AScC5zR9H6wqfv/UWDPtkclSZI6bsIxARHxLuCbmflgRIy1SK2H\nFixcuEvt+yAPDS2otVwv6bec+i0fMKc66rZj92Vv6Lec+i0fmJmcJhsYeBiwX0QcDuwNbAOejIj5\nmbkVWARsmOxDpvAkJIaHN9datlf0W079lg+UndNUDjI1n0pY7L7sJf2WU7/lA1PLaTrFwoRFQGYe\nt+N1RJwFrAfeAKwArqn+XdPyp0uSpFnTyn0C/gI4ISLuAF4KXNnekCRJ0kyodZ8AgMw8q+ntsvaH\nIkmSZpJ3DJQkqVAWAZIkFcoiQJKkQlkESJJUKIsASZIKZREgSVKhLAIkSSqURYAkSYWyCJAkqVAW\nAZIkFcoiQJKkQlkESJJUKIsASZIKZREgSVKhLAIkSSqURYAkSYWyCJAkqVADky0QEbsAVwB7ADsD\nHwfuAa4G5gIbgZWZua1zYUqSpHar0xNwBHB3Zh4CHAucB6wCLszMxcA64KTOhShJkjph0p6AzLy2\n6e0+wCPAEuC91bQbgTOBi9odnCRJ6pxJi4AdIuIbwN7A4cDapu7/R4E9OxCbJEnqoDkjIyO1F46I\n1wBXAXtm5lA17RXAVZn5hvHW27792ZGBgbnTjVVS+82pu6DtWOpatdvxaHUGBr4WeDQzH87M70bE\nALA5IuZn5lZgEbBhom1s2vRUrWCGhhYwPLy51rK9ot9y6rd8oOychoYW1N5mnXZc8r7sJf2WU7/l\nA1PLaSrteLQ6AwMPBj4AEBF7ALsCa4EV1fwVwJqWI5AkSbOizpiAi4HLIuIOYD7wPuBu4KqIOAV4\nCLiycyFKkqROqPPrgK3AO8aYtaz94UiSpJniHQMlSSqURYAkSYWyCJAkqVAWAZIkFcoiQJKkQlkE\nSJJUKIsASZIKZREgSVKhLAIkSSqURYAkSYWyCJAkqVAWAZIkFcoiQJKkQlkESJJUKIsASZIKZREg\nSVKhLAIkSSrUQJ2FIuJTwOJq+U8CdwFXA3OBjcDKzNzWqSAlSVL7TdoTEBGHAq/OzIOANwMXAKuA\nCzNzMbAOOKmjUUqSpLarczngduDt1evHgUFgCXBDNe1GYGnbI5MkSR016eWAzHwW2FK9PRm4GVje\n1P3/KLBnZ8KTJEmdMmdkZKTWghFxJPBh4E3A/Zn5smr6K4CrMvMN4627ffuzIwMDc9sQrqQ2m1N3\nQdvx7Fl9yw9rL/uO5a/qYCTqUrXb8Wh1BwYuBz4CvDkzfxkRT0bE/MzcCiwCNky0/qZNT9UKZmho\nAcPDm2st2yv6Lad+ywfKzmloaEHtbdZpxyXvy07asqX+uOu6f/fZzqmd+i0fmFpOU2nHo9UZGLgb\ncA5weGY+Vk1eC6yoXq8A1rQcgSRJmhV1egKOA3YHrouIHdNOAC6NiFOAh4ArOxOeJEnqlDoDAy8B\nLhlj1rL2hyNJkmaKdwyUJKlQFgGSJBXKIkCSpEJZBEiSVCiLAEmSCmURIElSoSwCJEkqlEWAJEmF\nsgiQJKlQFgGSJBWq1lMEJal019/xQK3ljlq8X4cjkdrHngBJkgplESBJUqEsAiRJKpRFgCRJhXJg\noCT1kToD
GAcH57HsgEUzEI26Xa0iICJeDXwJOD8zPxMR+wBXA3OBjcDKzNzWuTAlSVK7TXo5ICIG\ngb8GbmuavAq4MDMXA+uAkzoTniRJ6pQ6PQHbgLcAH2qatgR4b/X6RuBM4KK2RiZJfa7uvQd6hfdS\n6D2TFgGZuR3YHhHNkwebuv8fBfbsQGySJKmD2jEwcM5kCyxcuAsDA3NrbWxoaMG0A+o2/ZZTv+UD\n5lRH3Xbcr/tycHBe7WXrqrvNTujE36kT+6iufv3edVqrRcCTETE/M7cCi4ANEy28adNTtTY6NLSA\n4eHNLYbUnfotp37LB8rOaSoHmTrtuJ/35ZYt9cY+TyX/uttst8HBeR35O3ViH9XRz9+7usu2qtX7\nBKwFVlSvVwBrWo5AkiTNikl7AiLitcC5wL7AMxHxNuCdwBURcQrwEHBlJ4OUpE6o+5v6qZyx99tg\nP/W3OgMDv03j1wCjLWt7NJIkacZ4x0DNijpnS/6MSJp99mz0N58dIElSoSwCJEkqlJcD1HZ2H0rd\nz3YqsCdAkqRi2RMgVRysKKk09gRIklQoiwBJkgrl5YAuN5Nd1HU+693H/Ke2fFa7+OhSjcVBb92t\n3X+fwcF5LDtgUVu3WQp7AiRJKpRFgCRJhfJyQB9wVLskqRX2BEiSVCh7AqaonQPRenHw0upbfjil\nx6p2i4n29VQfFTudz9rBnhlJ3cCeAEmSCmURIElSobwc0CE7uoTb3dXcql689NCNMXdjTN3Kezho\nJk2lbfqd+3ctFwERcT7wemAEOC0z72pbVJIkqeNaKgIi4hDglZl5UET8FvA54KC2RlZxkJWkHeyJ\nUa+r+x2eqbuztjom4A+A6wEy8wfAwoj4lbZFJUmSOq7VIuDXgOGm98PVNEmS1CPmjIyMTHmliLgE\nuCkzv1S9vxM4KTN/1Ob4JElSh7TaE7CB55/57wVsnH44kiRpprRaBNwKvA0gIg4ANmTm5rZFJUmS\nOq6lywEAEXE2cDDwHPC+zLynnYFJkqTOarkIkCRJvc3bBkuSVCiLAEmSCmURIElSoSwCJEkqlEWA\nJEmFsgiQJKlQFgGSJBXKIkCSpEJZBEiSVCiLAEmSCmURIElSoSwCJEkq1MBsB9DrImJfIIFvVpN2\nAh4C/igzHx9nnROBpZl5/EzEOMbn7wRcBPwWMB/435l5zhjLrc/MfceYvhJ4L/AMsAD4Z+D0zNw2\nzbj+DLg3M2+a5nYeAX4vM9e3sO7hwMeAp4GNwAmZuXU68aj72Y77rh2/GPgUcBqwU2Zun04s/cye\ngPYYzswl1X9vBH4KfHS2g5rAe4B5VaxvBP6kOghOKiL2Bv4SWJ6ZS4DfoXEAOWq6QWXm2dM9cExH\nROwMXAIcm5mLgZ8Bfzpb8WjG2Y77oB1XzgO+N8sx9AR7AjrjduAUgIg4ELiAxpnlY8C7mheMiKOB\nDwL/RuPvsTIz10fEacDxwFPVf8cD84DPA3NoVP6fzczPjdrercCLR8VzdmauaXr/N8DnADJza0Rs\nAX4VWF8jt4XV9ucDT2bmSBXbjs8foaq8m8+UImI9cC2wX7Xo32fm6mqdS4FvAwcCdwLLx5n/BeBi\nYAjYDTg3M1dHxB7AdcDcark5o4OOiDOAt46a/N3MPL3p/euBH2XmQ9X764CzaRwsVR7bcW+2Y4AP\nZ+YTEXFZjX1RNIuANouIucAxwB3VpGuAozPzvog4HThs1CovAY7LzJ9ExJ8D7wfOBFYBv5mZP4+I\n5cBewFLgh5l5anXW+oejPz8z3zRZjJn5dFO8x9A4OP3fOvll5r0RcR3wQET8A/BV4IuZ+XCN1e/P\nzA9FxFE0DqKrqy7Nw2gcQA+slvv8OPM/AazJzMsjYhC4JyK+QqPL75+qbR8A/MkYcZ9H4+xgInvR\nOPvf4WfVNBXGdjyhbm/HZOYTNfIQXg5ol6GI+HpEfB34GrABOD8idgdekp
n3AWTmBZn5hVHr/hy4\nsmqIJwK7V9MvA9ZExEeABzPzXuDLwNKIuAI4AvjsdIKOiLcBnwSOyczn6q6XmX8MBPC3wAHA9yPi\niBqrfqP692bgwOoAsAz4VmY+1rTcePMPBU6t9vNNNK5lvhz4bRpnHmTmd4Bf1s1lEnOAkTZtS93P\ndtyf7VgTsCegPYar62rPU3WpjVtoVdXxtcABmXl/RLyfxrU5MvOMiPh14C3A9RHxgcz8ckTsDxwC\nvB04nca1wOZt1ulGJCL+C40zlSWZubFuohExB9g5MzcAlwOXR8S7gXcDN45afHQcT1e5PR0RN9M4\nMzgMuLp5oQnmb6MxUOvuMWJqPvjNHSPuOt2ID/P8M/+9gEdGb0t9y3bcH+1YU2AR0EGZ+a8R8YuI\neF1m3hURZ/Lv1wahMRDnOWB91S14JPCLiFhIo2tsVWZeFBEvAn63mr4+M9dGxNeq9QaaR77W6UaM\niN8EPgwcnJmbppjWe4CjI+KtTd2R+wHrqtdPAPsAD9Ko+J8dZzufpzEy+XXVNuvMvxM4Frg7IuYD\n59LoMvwX4CDgpura7a6jN1azG/FbwMsj4jcy88c0rpHeMMk66nO2455rx5oCi4DOWwl8OiKeAR6v\n3h8DkJmPRcRq4C4aP0c6h0a1vJTGgeWuiNhEo7vsZOBlwMURsY1GV/VftfjTl9Oq7f99ROyYdk7N\nEb1/AywC/jEinqTxU6ofAGdU888Gbo2I+4F7aBxIxnI7cAVwa479k6Sx5p8FXBoRd9IYXHVJNXDp\n08B1EfFV4PvAAzXyeIHqzOVkGtcwtwM/Bj7TyrbUd2zHY+u6dgwQEX8HvLR6e1tEPJKZ72x1e/1s\nzsiIlzw1thjn98WSeoftWBNxYKAkSYWyJ0CSpELZEyBJUqEsAiRJKpRFgCRJhZqRnwgOD2+uNfBg\n4cJd2LTpqckX7CH9llO/5QNl5zQ0tOAF92cfT512XPK+7CX9llO/5QNTy2kq7Xi0ruoJGBh4wQ2i\nel6/5dRv+YA59cPndpI5db9+ywdmLqeuKgIkSdLMsQiQJKlQFgGSJBXKIkCSpEJZBEiSVCiLAEmS\nCmURIElSoWrdLCgi5gP3AR8HbqPxrOy5wEZg5TjPkZYkSV2sbk/AR4HHqtergAszczGwDjipE4FJ\nkqTOmrQIiIhXAfsDN1WTlgA3VK9vBJZ2JDJJktRRdXoCzgXOaHo/2NT9/yiwZ9ujkiRJHTfhmICI\neBfwzcx8MCLGWqTWQwsWLtyl9n2Qh4YW1Fqul/RbTv2WD5hTHXXbsfuyN/RbTv2WD8xMTpMNDDwM\n2C8iDgf2BrYBT0bE/MzcCiwCNkz2IVN4EhLDw5trLdsr+i2nfssHys5pKgeZmk8lLHZf9pJ+y6nf\n8oGp5TSdYmHCIiAzj9vxOiLOAtYDbwBWANdU/65p+dMlSdKsaeU+AX8BnBARdwAvBa5sb0iSJGkm\n1LpPAEBmntX0dln7Q5EkSTPJOwZKklQoiwBJkgplESBJUqEsAiRJKpRFgCRJhbIIkCSpUBYBkiQV\nyiJAkqRCWQRIklQoiwBJkgplESBJUqEsAiRJKpRFgCRJhbIIkCSpUBYBkiQVyiJAkqRCDUy2QETs\nAlwB7AHsDHwcuAe4GpgLbARWZua2zoUpSZLarU5PwBHA3Zl5CHAscB6wCrgwMxcD64CTOheiJEnq\nhEl7AjLz2qa3+wCPAEuA91bTbgTOBC5qd3CSJKlzJi0CdoiIbwB7A4cDa5u6/x8F9uxAbJIkqYPm\njIyM1F44Il4DXAXsmZlD1bRXAFdl5hvGW2/79mdHBgbmTjdWSe03p+6CtmOpa9Vux6PVGRj4WuDR\nzHw4M78bEQPA5oiYn5lbgUXAhom2sWnTU7WCGRpawPDw5lrL9op+y6nf8oGycxoaWlB7m3Xaccn7\nspf0W079lg9MLaeptOPR6gwMPBj4AE
BE7AHsCqwFVlTzVwBrWo5AkiTNijpjAi4GLouIO4D5wPuA\nu4GrIuIU4CHgys6FKEmSOqHOrwO2Au8YY9ay9ocjSZJmincMlCSpUBYBkiQVyiJAkqRCWQRIklQo\niwBJkgplESBJUqEsAiRJKpRFgCRJhbIIkCSpUBYBkiQVyiJAkqRCWQRIklQoiwBJkgplESBJUqEs\nAiRJKpRFgCRJhbIIkCSpUAN1FoqITwGLq+U/CdwFXA3MBTYCKzNzW6eClCRJ7TdpT0BEHAq8OjMP\nAt4MXACsAi7MzMXAOuCkjkYpSZLars7lgNuBt1evHwcGgSXADdW0G4GlbY9MkiR11KSXAzLzWWBL\n9fZk4GZgeVP3/6PAnp0JT5IkdcqckZGRWgtGxJHAh4E3Afdn5suq6a8ArsrMN4y37vbtz44MDMxt\nQ7iS2mxO3QVtx1LXqt2OR6s7MHA58BHgzZn5y4h4MiLmZ+ZWYBGwYaL1N216qlYwQ0MLGB7eXGvZ\nXtFvOfVbPlB2TkNDC2pvs047Lnlf9pJ+y6nf8oGp5TSVdjxanYGBuwHnAIdn5mPV5LXAiur1CmBN\nyxFIkqRZUacn4Dhgd+C6iNgx7QTg0og4BXgIuLIz4UmSpE6pMzDwEuCSMWYta384kiRppnjHQEmS\nCmURIElSoSwCJEkqlEWAJEmFsgiQJKlQFgGSJBXKIkCSpEJZBEiSVCiLAEmSCmURIElSoSwCJEkq\nlEWAJEmFsgiQJKlQFgGSJBXKIkCSpEJZBEiSVCiLAEmSCjVQZ6GIeDXwJeD8zPxMROwDXA3MBTYC\nKzNzW+fClCRJ7TZpERARg8BfA7c1TV4FXJiZX4yIvwROAi7qTIi6/o4HJpx/1OL9ZigSSVI/qXM5\nYBvwFmBD07QlwA3V6xuBpe0NS5IkddqkPQGZuR3YHhHNkwebuv8fBfbsQGySJKmDao0JmMScyRZY\nuHAXBgbm1trY0NCCaQfUbXbktPqWH467zDuWv2rceYOD8ybc/le+89OWttuqfv4b9ZN251S3HQ8N\nLZjwuw6d+V520mx8Pzq9D/vtO99v+cDM5NRqEfBkRMzPzK3AIp5/qeAFNm16qtZGh4YWMDy8ucWQ\nulNzTlu2jD92cqK8J1pvMu3en/3+N+oXdXOaykGmTjve8bmTfWd7aX/P1vejk/uw377z/ZYPTC2n\n6RQLrRYBa4EVwDXVv2tajkCSepADdtUP6vw64LXAucC+wDMR8TbgncAVEXEK8BBwZSeDlCRJ7Vdn\nYOC3afwaYLRlbY9GkiTNmHYMDFSPmqg7065MqbMmu5wgzQRvGyxJUqHsCegSnTor8GxDGlunB/bZ\n9tQL7AmQJKlQFgGSJBXKywEtmEo33+DgvGnd7EeSpE6xJ0CSpEJZBEiSVCiLAEmSCmURIElSoRwY\nqDGNN/hxcHAeyw5YNMPRqN/48J3pm+4+9G8gsCdAkqRiWQRIklQoLweorVq9Vep0uh5n4zPVWd3Q\nVb0jhl6918dE+3BwcN6sfv4OtsHZZ0+AJEmFsidAU9aJB6P4sBW1k9+n6ZuJ3phu6PEpXctFQESc\nD7weGAFOy8y72haVJEnquJYuB0TEIcArM/Mg4GTgf7Y1KkmS1HGt9gT8AXA9QGb+ICIWRsSvZOYT\n7QtN6qyJBn5N1A3pQESpOy65TDZ40zY3uVYHBv4aMNz0friaJkmSesSckZGRKa8UEZcAN2Xml6r3\ndwInZeaP2hyfJEnqkFZ7Ajbw/DP/vYCN0w9HkiTNlFaLgFuBtwFExAHAhszc3LaoJElSx7V0OQAg\nIs4GDgaeA96Xmfe0MzBJktRZLRcBkiSpt3nbYEmSCmURIElSoSwCJEkqlEWAJEmFsgiQJKlQFgGS\nJBXKIkCSpEJZBEiSVCiLAEmSCmURIElSoSwCJEkqlEWAJEmFGpjtAHpdROwLJPDNatJOwEPAH2Xm\n4+
OscyKwNDOPn4kYx/j83YDPAUPAPOCWzPxvYyy3PjP3HWP6SuC9wDPAAuCfgdMzc9s04/oz4N7M\nvGma23kE+L3MXN/CuocDHwOeBjYCJ2Tm1unEo+5nO+67dvxi4FPAacBOmbl9OrH0M3sC2mM4M5dU\n/70R+Cnw0dkOagLvAP45Mw8G3gi8MyJeU2fFiNgb+EtgeWYuAX6HxgHkqOkGlZlnT/fAMR0RsTNw\nCXBsZi4Gfgb86WzFoxlnO+6Ddlw5D/jeLMfQE+wJ6IzbgVMAIuJA4AIaZ5aPAe9qXjAijgY+CPwb\njb/HysxcHxGnAccDT1X/HU+j2v88MAeYD3w2Mz83anu3Ai8eFc/Zmblmx5vMvKhp3ktpFIPDNXNb\nWG1/PvBkZo5Use34/BGqyrv5TCki1gPXAvtVi/59Zq6u1rkU+DZwIHAnsHyc+V8ALqZx5rMbcG5m\nro6IPYDrgLnVcnNGBx0RZwBvHTX5u5l5etP71wM/ysyHqvfXAWfTOFiqPLbj3mzHAB/OzCci4rKa\n+6NYFgFtFhFzgWOAO6pJ1wBHZ+Z9EXE6cNioVV4CHJeZP4mIPwfeD5wJrAJ+MzN/HhHLgb2ApcAP\nM/PU6qz1D0d/fma+aQqxfgX4j8AHMvOnddbJzHsj4jrggYj4B+CrwBcz8+Eaq9+fmR+KiKNoHERX\nR8RONPbJB2kcPKBxgBxr/ieANZl5eUQMAvdUOZwG/FO17QOAPxkj7vNonB1MZC8aZ/87/KyapsLY\njifU7e2YzHyiRh7CywHtMhQRX4+IrwNfAzYA50fE7sBLMvM+gMy8IDO/MGrdnwNXVg3xRGD3avpl\nwJqI+AjwYGbeC3wZWBoRVwBHAJ+dTtCZuQx4NfCh6kyn7np/DATwt8ABwPcj4ogaq36j+vdm4MDq\nALAM+FZmPta03HjzDwVOrfbzTTSuZb4c+G0aZx5k5neAX9bNZRJzgJE2bUvdz3bcn+1YE7AnoD2G\nq+tqz1N1qY1baFXV8bXAAZl5f0S8n8a1OTLzjIj4deAtwPUR8YHM/HJE7A8cArwdOJ3GtcDmbU7a\njRgRBwMPZOYjmTkcEWuBg4FvTZZoRMwBds7MDcDlwOUR8W7g3cCNoxYfHcfTVW5PR8TNNM4MDgOu\nbl5ogvnbaAzUunuMmJ5rmjR3jLjrdCM+zPPP/PcCHhm9LfUt23F/tGNNgUVAB2Xmv0bELyLidZl5\nV0Scyb9fG4TGQJzngPVVt+CRwC8iYiGNrrFVmXlRRLwI+N1q+vrMXBsRX6vWG2ge+VqzG/EwGtcl\nT4+IARoimGl6AAAOp0lEQVTdd5+omdZ7gKMj4q2Z+XQ1bT9gXfX6CWAf4EEaFf+z42zn8zRGJr+u\n2mad+XcCxwJ3R8R84FwaXYb/AhwE3FSdCe06emM1uxG/Bbw8In4jM39M4xrpDZOsoz5nO+65dqwp\nsAjovJXApyPiGeDx6v0xAJn5WESsBu6i8XOkc2hUy0tpHFjuiohNNLrLTgZeBlwcEdtodFX/VYs/\nffkf1XbuoDEwaG1m3lxz3b8BFgH/GBFP0vgp1Q+AM6r5ZwO3RsT9wD00DiRjuR24Arg1x/5J0ljz\nzwIujYg7aRz8LqkGLn0auC4ivgp8H3igZi7PU525nEzjGuZ24MfAZ1rZlvqO7XhsXdeOASLi72gM\nlgS4LSIeycx3trq9fjZnZMRLnhpbjPP7Ykm9w3asiTgwUJKkQtkTIElSoewJkCSpUBYBkiQVyiJA\nkqRCzchPBIeHN9caeLBw4S5s2vTU5Av2kH7Lqd/ygbJzGhpa8IL7s4+nTjsueV/2kn7Lqd/yganl\nNJV2PFpX9QQMDLzgBlE9r99y6rd8wJz64XM7yZy6X7/lAzOXU1cVAZIkaeZYBEiSVCiLAEmSCmUR\nIElSoXyAUIdcf0fj2ReDg/PYsmWs52rAUYv3m8mQJEl6HnsCJEkq
lEWAJEmFqnU5ICLmA/cBHwdu\no/Gs7LnARmDlOM+RliRJXaxuT8BHgceq16uACzNzMbAOOKkTgUmSpM6atAiIiFcB+wM3VZOWADdU\nr28ElnYkMkmS1FF1egLOBc5oej/Y1P3/KLBn26OSJEkdN+GYgIh4F/DNzHwwIsZapNZDCxYu3KX2\nfZCHhhbUWq7bDQ7OG/N1s17NtVfjnog5Ta5uO3Zf9oZ+y6nf8oGZyWmygYGHAftFxOHA3sA24MmI\nmJ+ZW4FFwIbJPmQKT0JieHhzrWW73Y57A0x0n4BezLWf/kY7lJzTVA4yNZ9KWOy+7CX9llO/5QNT\ny2k6xcKERUBmHrfjdUScBawH3gCsAK6p/l3T8qdLkqRZ08odA/8CuCoiTgEeAq5sb0jdbcedACVJ\n6nW1i4DMPKvp7bL2hyJJkmaSdwyUJKlQFgGSJBXKIkCSpEJZBEiSVCiLAEmSCmURIElSoVq5T4Da\npO49B45avF+HI5EklcieAEmSCmURIElSoSwCJEkqlEWAJEmFsgiQJKlQFgGSJBXKIkCSpEJZBEiS\nVCiLAEmSCuUdA3tAnTsLeldBSdJUTVoERMQuwBXAHsDOwMeBe4CrgbnARmBlZm7rXJiSJKnd6lwO\nOAK4OzMPAY4FzgNWARdm5mJgHXBS50KUJEmdMGlPQGZe2/R2H+ARYAnw3mrajcCZwEXtDk6SJHVO\n7TEBEfENYG/gcGBtU/f/o8CeHYhNkiR1UO0iIDPfEBGvAa4B5jTNmjPOKv/fwoW7MDAwt9bnDA0t\nqBvSrBgcnDcj60zVTO63bv8btcKcJle3Hbsve0O/5dRv+cDM5FRnYOBrgUcz8+HM/G5EDACbI2J+\nZm4FFgEbJtrGpk1P1QpmaGgBw8Obay07W7Zsmdr4x8HBeVNepxUztd964W80VSXnNJWDTJ12XPK+\n7CX9llO/5QNTy2k6xUKdgYEHAx8AiIg9gF2BtcCKav4KYE3LEUiSpFlR53LAxcBlEXEHMB94H3A3\ncFVEnAI8BFzZuRAlSVIn1Pl1wFbgHWPMWtb+cCRJ0kzxtsGSJBXKIkCSpEJZBEiSVCiLAEmSCmUR\nIElSoSwCJEkqlEWAJEmFsgiQJKlQFgGSJBXKIkCSpEJZBEiSVCiLAEmSCmURIElSoSwCJEkqlEWA\nJEmFsgiQJKlQFgGSJBVqoM5CEfEpYHG1/CeBu4CrgbnARmBlZm7rVJCSJKn9Ju0JiIhDgVdn5kHA\nm4ELgFXAhZm5GFgHnNTRKCVJUtvVuRxwO/D26vXjwCCwBLihmnYjsLTtkUmSpI6a9HJAZj4LbKne\nngzcDCxv6v5/FNizM+FJkqROqTUmACAijqRRBLwJuL9p1pzJ1l24cBcGBubW+pyhoQV1Q5oVg4Pz\nZmSdqfrKd35aa7l3LH/VtD+r2/9GrTCnydVtx+7L3tBvOfVbPjAzOdUdGLgc+Ajw5sz8ZUQ8GRHz\nM3MrsAjYMNH6mzY9VSuYoaEFDA9vrrXsbNmyZWrjHwcH5015nU6a7v7thb/RVJWc01QOMnXaccn7\nspf0W079lg9MLafpFAuTFgERsRtwDrA0Mx+rJq8FVgDXVP+uaTkCdaXr73hgzOnNRc1Ri/ebyZAk\nSW1WpyfgOGB34LqI2DHtBODSiDgFeAi4sjPhSZKkTqkzMPAS4JIxZi1rfziSJGmmeMdASZIKZREg\nSVKhav9EUP1hvAF/kqTy2BMgSVKhLAIkSSqURYAkSYWyCJAkqVAWAZIkFcoiQJKkQlkESJJUKIsA\nSZIKZREgSVKhLAIkSSqURYAkSYWyCJAkqVBd+QChug+5OWrxfh2ORJKk/lWrCIiIVwNfAs7PzM9E\nxD7A1cBcYCOwMjO3dS5MSZLUbpNeDoiIQeCvgduaJq8CLszMxcA64KTOhCdJkjqlTk/ANuAtwIea\npi0B3lu9vhE4E7iorZG1UZ3L
C15akCSVZtIiIDO3A9sjonnyYFP3/6PAnh2ITZIkdVA7BgbOmWyB\nhQt3YWBgbq2NDQ0tYHBwXu1l66izvXZuqx3rdLMd+XzlOz9t2zbfsfxVbdtWK+r+/XtJu3Oq2477\nZV+uvuWHE86f7Ds72fp1ttFJ/fJ32qHf8oGZyanVIuDJiJifmVuBRcCGiRbetOmpWhsdGlrA8PBm\ntmypN8ZweHhzreXqbK+d22o2ODhvyut0s07lU3f/d8KO710/qZvTVA4yddpxP+3LHd/z8b7zk+XZ\nzuNOu/XT3wn6Lx+YWk7TKRZavU/AWmBF9XoFsKblCCRJ0qyYtCcgIl4LnAvsCzwTEW8D3glcERGn\nAA8BV3YySEm9ZbLBuA7EbXA/abbVGRj4bRq/BhhtWdujkSRJM6Yr7xgoqb91+gy47l1HpdL57ABJ\nkgplESBJUqG8HCBJLfCSg/qBPQGSJBXKngBJ6lL+hFCdZk+AJEmFsgiQJKlQPX05oJ0DcxzkM7vq\n7n+7PyWpfewJkCSpUBYBkiQVyiJAkqRCWQRIklQoiwBJkgplESBJUqEsAiRJKlRP3ydA6jXeD6Ee\nb5dbT6v3NxkcnMeWLdsA92XpWi4CIuJ84PXACHBaZt7VtqgkSVLHtVQERMQhwCsz86CI+C3gc8BB\nbY1MGkMn7uzYfFY0HZ5RzRzv8Nk+092X3fC9X33LDydsw90QY7dqdUzAHwDXA2TmD4CFEfErbYtK\nkiR1XKtFwK8Bw03vh6tpkiSpR8wZGRmZ8koRcQlwU2Z+qXp/J3BSZv6ozfFJkqQOabUnYAPPP/Pf\nC9g4/XAkSdJMabUIuBV4G0BEHABsyMzNbYtKkiR1XEuXAwAi4mzgYOA54H2ZeU87A5MkSZ3VchEg\nSZJ6m7cNliSpUBYBkiQVqiueHdAvtyCOiE8Bi2ns108CdwFXA3Np/HpiZWZO/9Z0Mywi5gP3AR8H\nbqPHc4qIdwIfBLYD/w34Hj2aU0TsClwFLATmAf8d+BdmIR/bcfeyDXe32WzHs94T0HwLYuBk4H/O\nckgtiYhDgVdXebwZuABYBVyYmYuBdcBJsxjidHwUeKx63dM5RcSvAn8B/B5wOHAkvZ3TiUBm5qE0\nfrHzaWYhH9tx17MNd7cTmaV2POtFAP1zC+LbgbdXrx8HBoElwA3VtBuBpTMf1vRExKuA/YGbqklL\n6O2clgJrM3NzZm7MzPfQ2zn9AvjV6vXC6v0SZj4f23GXsg33hFlrx91QBPTFLYgz89nM3FK9PRm4\nGRhs6r55FNhzVoKbnnOBM5re93pO+wK7RMQNEXFHRPwBPZxTZn4B+A8RsY7G/8DOZHbysR13L9tw\nl5vNdtwNRcBoc2Y7gOmIiCNpHDzeP2pWz+UVEe8CvpmZD46zSM/lRCPmXwWOodEFdznPz6OncoqI\n44GfZOYrgN8HPjNqkdnKp6f242j90o5tw71hNttxNxQBfXML4ohYDnwE+M+Z+UvgyWpADsAiGrn2\nksOAIyPin4A/BD5G7+f0c+Abmbk9M38MbAY293BObwRuAahu2LUXsGUW8rEddyfbcG+YtXbcDUVA\nX9yCOCJ2A84BDs/MHQNw1gIrqtcrgDWzEVurMvO4zHxdZr4euJTGyOKezonG9+33I+JF1QCjXent\nnNYBBwJExK8DTwJfYebzsR13Idtwz5i1dtwVdwzsh1sQR8R7gLOA5icpnkCj4e0MPAT818x8Zuaj\nm76IOAtYT6NavYoezikiTqHR1QvwCRo/AevJnKqfFn0O2IPGT9o+BvyAWcjHdtzdbMPdazbbcVcU\nAZIkaeZ1w+UASZI0CywCJEkqlEWAJEmFsgiQJKlQFgGSJBXKIkAvEBF7RsT2iPiz2Y5F0tTZhlWX\nRYDGcgKNx1ieOMtxSGqNbVi1eJ8AvUBE/Ag4FbgCOC4zvxER/xk4m8bjSG8B3p+Ze0fEQuBiYA
jY\nDTg3M1fPTuSSwDas+uwJ0PNExME07lj1VRp3q/qvETEH+Czwrup517s1rfIJYE1m/j6Nu8Wtioih\nGQ5bUsU2rKmwCNBoJwNXZOYIjadzHQvsA+zadBvY/9O0/KHAqRHxdRrPK38GePnMhStpFNuwahuY\n7QDUPSLiV2g8qOInEXFMNXkujYPEc02LPtv0ehvwR5l598xEKWk8tmFNlT0BavZfgH/IzP0z8zWZ\n+RrgPTQGGT0XEVEtd0zTOnfSONMgIuZHxP+KCItLaXbYhjUlFgFqdjJw0ahp/wfYH7gAuD4ibqFx\n5rC9mn8W8MqIuBO4Hfi/mbkdSbPBNqwp8dcBqiUijgS+l5kPVt2Mp2Tm8tmOS1I9tmGNxS4f1TUX\n+LuIeKJ6feosxyNpamzDegF7AiRJKpRjAiRJKpRFgCRJhbIIkCSpUBYBkiQVyiJAkqRCWQRIklSo\n/wdXjDfZQjYC1AAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7ff7684da550>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# grid = sns.FacetGrid(train_df, col='Pclass', hue='Survived')\n",
"grid = sns.FacetGrid(train_df, col='Survived', row='Pclass', size=2.2, aspect=1.6)\n",
"grid.map(plt.hist, 'Age', alpha=.5, bins=20)\n",
"grid.add_legend();"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "36f5a7c0-c55c-f76f-fdf8-945a32a68cb0"
},
"source": [
"### Correlating categorical features\n",
"\n",
"Now we can correlate categorical features with our solution goal.\n",
"\n",
"**Observations.**\n",
"\n",
"- Female passengers had much better survival rate than males. Confirms classifying (#1).\n",
"- Exception in Embarked=C where males had higher survival rate. This could be a correlation between Pclass and Embarked and in turn Pclass and Survived, not necessarily direct correlation between Embarked and Survived.\n",
"- Males had better survival rate in Pclass=3 when compared with Pclass=2 for C and Q ports. Completing (#2).\n",
"- Ports of embarkation have varying survival rates for Pclass=3 and among male passengers. Correlating (#1).\n",
"\n",
"**Decisions.**\n",
"\n",
"- Add Sex feature to model training.\n",
"- Complete and add Embarked feature to model training."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"_cell_guid": "db57aabd-0e26-9ff9-9ebd-56d401cdf6e8"
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7ff74c535ba8>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAATcAAAHUCAYAAABf8m8eAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xl8VPW5+PHPTCZ7IASSsARk50EWWV0QWaRq1Uq1SkWr\n1+q1i9ZqF1tv+7vtvdra1tZab9XW2ttbrV0sWqu4VVEBWUXZF+FhhxC2ENbsmcz8/jiTMAmZZBIy\nSybP+/XKK3PO+Z4z3wnhyfec7/K4/H4/xhiTaNyxroAxxkSCBTdjTEKy4GaMSUgW3IwxCcmCmzEm\nIVlwM8YkJE+sK2BaT0QGAAosb3ToTVV9NMxrLAQeVtX32liHNp8vIg8DXlV9sA3nuoGfAVOAaqAr\n8KyqPtnaa5nEZsGt4ypW1emxrkQM3AwIMFlV/SLSDXhXRN5Q1V0xrpuJIxbcEpCIlAIPAzOBFOCn\nwJdxgsLdqjovUHSmiDwAFAA/VtW/i8hw4BnAi9Mq+oGqviMiDwIDgf7A/Y3e71lgl6r+SETuBW7E\n+d3aAnxNVStE5CfANUAhUAZsbnSNEcBvm/g4N6nqwaDt7kAGkITT+jsOnN/an5FJfBbcElMmsFJV\nHwncPs5U1atF5Hbga0BdcPOo6hUiMgRYKiIvAr2AH6rqIhGZBDwJvBMoPxCYFmgxASAiDwGlgcB2\nAfA5YGqgzOPAl0TkHeAWnOBaC3xEo+Cmqp8A08P4bM8Ds4B9IvIe8B7wD1UtbeXPyCQ4C24dV14g\ncAV7QFU/CrxeEvi+D1gW9Do7qPy7AKq6PRCs8oADwKOBllYKkBtU/kNVDZ6vdzswHLggsD0dGAIs\nCFwvE6gBRgOrVLUKQEQWte6jnqaqJ4BpIjIKuAwnaP5MRC5S1T1tva5JPBbcOq6Wnrl5Q7x2Bb32\nNdrvB54CXlDVPwYCyBtBZaobvUcqTgCcgdOCqgJeU9WvBxcSkVmN3iupcWXDvS0VkWTAr6obgY3A\n/4jIX4EbgF81cb7ppCy4dW6fAl4TkWE4AbAY6AlsChyfjRPAQnkGOATMFZELgaXAfSKSpaqlIvI1\nYA3OLeh4EUnBCaDTgH8GX6iVt6VbgIegPtj1A+aEca7pRCy4dVxN3ZbuUtU7WnENr4jMxbmVvC/w\nnOwx4HkR2Y3TEro+sO9UUxdQ1Q0i8ivgOZwOg98AC0WkEtgPPKeq5SLyKrAC2AOsbUUdG7sHeEJE\nVuB0TGQA/1TV187imiYBuWzJI2NMIrIZCsaYhGTBzRiTkCy4GWMSkgU3Y0xCsuBmjElIHX4oSHHx\nKevuNZ1WXl4XV8ulOidruRljEpIFN2NMQrLgZoxJSBbcjDEJKaIdCoFVJeYCj6vqU42OXYaziGIt\n8Jaq/jiw/3HgIpwJ1t9Q1Y8jWUdjTGKKWHATkUychQ7fD1HkCeDTQBHwgYi8jLOe2FBVnSQi5wJ/\nBCZFqo7GmMQVydvSKuBqnJUhGhCRQcBRVS1UVR/wFs7yO58CXgVQ1c1Ajoh0jWAdjTEJKmItN1X1\n4iyp09ThXjhrh9U5DAzGWfV1VdD+4kDZkxGqZpuU11Tw0cHVLCpaxqHyYiRnKHePuYNkd4cfNmhM\nwoiX/42hBiK2OEAxJycDj+eMhV0jZtnelTz98V+o8lbV79Nj23jww0f47iV3MbTHwKjVxRgTWqyC\n236cFlmdgsC+6kb7++Cs6R/SsWPl7V65ULYc3cZTa/+InzMnRRyvPMnDC5/ge+d/g9z0HlGrk+nc\n8vK6xLoKcSsmQ0FUdTfQVUQGiIgHZwXXeYGvWQAiMh7Yr6pNrgAbC2/umtdkYKtT4a3k/b1tzn3S\n4czRV7ln/gPM0VdjXRVjzhDJ3tIJwGPAAKAmkCTkNZylsF8B7gZeCBSfo6pbga0iskpEluEkFLkn\nUvVrrZKKo+w80XJypSVFKzhZfYrM5EyykjPJ
TM4gMzkj6HUmWckZpHnScLs67jDDSm8Vi4uchPeL\ni5Zz7eCrSPM0l27BmOjq8MuMR2vi/O6Te3l05VMtFwyT2+Umw5NeH+yCv2cGfc8K+p6RnB4XAfFE\n1UkWFC7m3b0f1O/77oSvMyD7nBjWqnOyifOhxUuHQtzrmtK+zzZ8fh+lNWWU1pRxKMxzXLgCATEj\nZABseMwJmEnu9utwWXN4A3/65AVqfN4G+x9d9RQ3DLmGGedMbbf3MuZsWHALU/e0HIZ0G8j247ua\nLTe02yAkZwhlNeWU1pQ1+F5WU05lbWWb6+DHT5m3nDJvOVSEf15aUtrpVmFKBpmehq3ErJRMMj2B\n78kZZHoySE5KPuM6e0/u44+b/orP72viXeDl7W+Qk5bDuPzRbf2IxrQbC26t8JmBV/Dk2v8N+Z87\n05PBv507mx7pOSGv4fV5KaupoKymjLKaMkprygOvg4Ngw+1ybysiWRMqayuprK3kSOXRsM9JSUo5\nHfAC3/ecLAz52eu8s2c+Y/NG4XLZ3ZKJLXvm1kprizfyl80vUdEo4OSkZvPV826nX5eCdn/PWl8t\n5d6KZgNgU8ea69mNpIcmfY/c9O4xee/Oxp65hWYtt1YamzeK4TlDWXZgBS9ve6N+/wMT76VramRm\niiW5k+iSkkWXlKywz/H5fVR4K88Igo0DYONjLbXMwlHhbfuttzHtxYJbG6R5Urmg14QGwc3djg/t\n24Pb5a7vXAiX3++nsrYqKCAGtRCrS3mvcBHeRh0JTclJzT6bqhvTLiy4tZHH5cGFCz9+XLjwuDr+\nj9LlcpHuSSPdk9bkLAuvv5b3goZ/NCXFnczJ6lNkpWRGqprGhCX2g6Y6qDRPKlMKnNWYphRM6hQD\nWC87Zxo5qd2aLVPtq+GxVb9lc8nWKNXKmKZZh4JplSMVR/nTJ39n54ndDfZnejKcISoBbpebzw+9\nlql9bTm+SLIOhdAsuJk20aPbeGLt/9ZvP3LJf7H12A6e3zynwXO5S/tewvVDr4mLmRWJyIJbaPYb\nZ9qkoEufBtsul4sJPcfwzXFfpUvy6V7dBfuW8Mz6P1FpPagmyiy4mXY1MLs/3534dfpknl65amPJ\nZn61+mmOVR6PYc1MZ2PBzbRJXW8xcEZvcY/07nx7wtcY0f30KsxFpQf4xcon2XOyMOp1NZ2TBTfT\nJi31Fqd70rjrvNuZ1vfi+n0nq0/x+OrfsebwhqjW1XRO1qFgIm5h4VL+se21BtPBPjvoSq7of6nN\nQT1L1qEQmgU3ExUbj2zm2U1/o7L2dO6Ji3pN5Obh1+OxxDptZsEtNAtuJmqKSg/w9LpnOVZ1umNh\naLdBfGn0v5GVbDMa2sKCW2gRDW6hsseLSAHw16Cig4Dv4SSJeQnYFNi/QVXvbe49LLh1LCeqTvHM\nhucadCzkp+dy15g76JmRF8OadUwW3EKLWHATkWnAd1X1mrrs8ap6xnD1QIKYhcCVwETg66o6K9z3\nseDW8VTX1vDnzXNYfXh9/b4MTzpfHn0bw3IGx7BmHY8Ft9Ai2Vsabvb424GXVbU0gnUxcSQlKZk7\nRn6BKwd8qn5fubeCJ9f+L8v3fxzDmplEEsng1jirfF32+Ma+BPxf0PYIEXlNRJaIyOURrJ+JIbfL\nzcxBn+a2c2eT5HKWi/L5ffxly0u8uv2tdllXznRu0eymOqP5LCKTgC2qejKwaxvwEPAiznO4BSIy\nRFWrQ1002hnnTfu6Jm86g3sX8Mslz3CqugyAd/cu5ITvOPdeeAepnpQY19B0VJEMbo2zyjeVPf4a\n4L26DVUtAuYENneIyEGcbPQhs7JEM+O8iYxcevHt8ffwu/XPcqjcaex/tG8t/3niUe4673ayI7TC\ncSKwjPOhRfK2NJzs8ecD6+o2ROQWEflO4HUvoCdQFME6mjiRn5HLdybcw7CcIfX79p7axy9WPsm+\nU/tjWDPT
UUV6KMgjwFROZ48fB5wIZJxHRDYAl6nqocB2F+BvQDcgBXhIVd9q7j2stzSx1Ppq+bu+\nwrIDH9XvS0lK4d9HfoHRuSNiWLP4ZL2lodkgXhN3/H4/7xcu4tXtb9VP2XLh4vqh13Bp30tsylYQ\nC26hWXAzcWtd8Uae2/QC1b6a+n2XFFzEjUOvJSnOEvLEigW30JoNbiIytbmTVXVRu9eolSy4Jba9\np/bxu3XPcaL6ZP2+4TlDuXPUrWQkp8ewZvHBgltoLQW3xYGXqcBoYAuQBAiwQlWbDX7RYMEt8R2v\nOsHv1j1LYenpjoVeGfncPeaOJrN0dSYW3EJrtrdUVaeo6hRgMzBQVcep6nnAEGBnNCpoTLfUbL45\n/u4GHQoHyw/z6MqnzkhUY0ydcIeCDFHVg3UbqloIDIxMlYw5U5onla+Mvo1P9Tt9s1BaU8av1/ye\njw+uiWHNTLwKdxDvERF5AViCM6zjYsBGz5qocrvcXD/0Gnpm5PH3ra/g8/vw+rw898kLHC4v5uqB\nl1tPqqkXbsvtJmA+zrO2EcAy4PORqpQxzZlccCH3jLmTdE9a/b63dr/Hc5+8QE1tTTNnms4k7KEg\nIjIK5/b0VRHppqpxkcrIOhQ6r4Nlh3h63bMcqTxav29Qdn++MvqLdEnJaubMxGEdCqGF1XITkW8B\nf8SZ1A7wQxH5QcRqZUwYemX25LsT72Vw9oD6fTtP7OHRlU9yoOxQ7Cpm4kK4t6U346yoW/cn8rs4\nk96NiamslEzuHfcVzu85vn5fSeUxfrnyN2wu2RrDmplYCze4nVLV+gW2Aq9twS0TF5LdHr44YjbX\nDPx0/b7K2kp+u/6PLC5aHsOamVgKt7d0h4j8N85qutcDs4FPIlctY1rH5XJx1cBPkZ/Rg+c3v4jX\n58Xn9/F3fYVD5cVcP+Qa3C5L09uZhPuvfQ9QhrP80K3AisA+Y+LKhJ5j+ea4r9Il+XSHwoLCJTyz\n/k9UeitjWDMTbWH1lorIz4A/q2rctdast9Q0paTiKE+vf7ZBx0JBVm/uPu8OctK6xbBm7ct6S0ML\nt+VWCvxdRFaJyLdEpGckK2XM2eqR3p37J9zDiO5Sv6+o9AC/WPlkg7SCJnG1asmjQIq+2Tg9pYdV\n9epIVSxc1nIzzan11fLy9tf5YN+y+n3J7mS+OOImxuWPjmHN2oe13EJrbQ6FCpxnb+VAiynCQyVl\nDhzbDRQCtYFdt6hqUXPnGNNaSe4kbhx2Hfnpefxj22v48VPjq+EPG//MtYOu4vL+023KVoIKK7iJ\nyPdx8iGk4CwDfpuq7m7hnGnAUFWdVJeUGWiclPmq4HylYZ5jTKtN7zeZ3PTu/HHTX6mqdZKpzd35\nLw5VFHOzXI/HHc1EcCYawn3mlgPcoaqjVfVnLQW2gHCTMp/tOcaEZVTuudw/4R5yUk93KHx4YCVP\nrf0DpTVlMayZiYRmg5uI3BF4WQXMEpEfBX+1cO1wkjL/LpB8+RERcYV5jjFtVpDVm+9OvJf+XfvV\n79t2fCePrfwNh8uLmznTdDQttcXrZiF42+G9Gj/Y+C/gbZwpXa8CN4RxzhksKbNprTy68JNe3+Gp\nj/7Eh4WrAThccYRfrv4N35n8VUbmD4txDU17aDa4qeqfAi/TgedbOc6t2aTMqvp83WsReQtnGfNw\nEjk3YEmZTVvdMuRGurm78fae+QCUVZfz8MInuFmuZ1Kf81s8f46+yqKiZUwtuJjZcl2kq9skS8oc\nWthzS2n9OLeQSZlFJFtE3hGRlEDZacDG5s6JR3+Zp/z7I/P5yzyNdVVMG7hdbmYOvpLbzp1Nkstp\n/df6a/nLlpeYu+Nf+Pyhp09Xeqvq560uLlpOpbcqKnU24QsruKnqTwK5E24FsoE3A62t5s5ZBqwS\nkWXAE8A9InK7iHxOVU8AbwEfishSnGdr/2jqnDZ/sgirrPayYHURAAvWFF
FZ3R537iYWLuw9gXvH\nfplMT0b9vnl7FvB/G/9KdaBntTGv31ufU9WPH6/f/v3jTUTHuanq9xrtWhd07NfAr8M4Jy55a+t+\ntcHvd7ZNxzU0ZxDfmfh1nl7/Rw6XHwFgbfEGjq4+xl3n3U52qnXadzThLlb5fRFZBbyOExBvU9Vp\nEa2ZMVGWn5HLdyd8nWE5Q+r37T21j1+sfJJ9p/Y3c6aJR60Z5/bvrRznZkyHk5GcwdfH3MnFvU93\nKByvOsFjq3/LhiNxt26EaUa4we18VV3XcjFjOr4kdxJfGD6L6wZfjSswGqm6tppn1v+J+YWL8fv9\nVFsimrgX7jO3tYFBu8uA+iesqjo/IrUyJsZcLheX959OfkYuz216gWpfDX78vLztdT46sPqMAb9P\nr32Wm4Z/jn5dCmJUY9NYuC23scAU4D+AHwa+OnWCmIoq6x3rDMbkjeJbE+4mO+V0h0JhaRFVvoa9\nqLtP7eXx1U+z99S+aFfRhNCqJY/iUbSXPDpVXs3LH+xk+cYD1AT1kF5xfj9mTR+MJ8mWsk5Ex6tO\n8KtVT1MSlEawKQO6nsN3J349SrWyJY+aE+6qIIuBM4KIqk5t9xrFsVPl1fzsL6s5ePTMWRHzPi5k\nf0kZ991wngW4BNQtNZu8jNwWg9vuk3spKj1AQVbvKNXMhBLuM7fgW9AUYAbO6rydyiuLdjYZ2Ops\n3HmUxesPcOk4e+6SiMKdWH+g9KAFtzgQVnBT1Q8a7Xq3pRkKiaaiysuyTQdbLLdgdZEFtwSV7E4O\nq5wnKbxyJrLCvS0d1GjXOYA0VTZRHTpWTnVNy6la9xWXUlFVQ3qq/YInmhE9hnGo/HCzZZLdHoZ2\na/zfxcRCuLel7we++wNfJ4EHI1GheOVuxVLU335qGaMGdWf8sDzGDO5BRpoFukQwteBiFu9bjtdf\nG7LMRb3PJzM5I+RxEz3NBrfAKrh3qurAwPZdwN3ADpwVPDqNPrmZdM1M4WRZ0xOpg1XV1LJKi1ml\nxSS5XQzvn8P4YXmMG5pLt6zUKNTWREJ+Ri53jPwCz276W5MBbnjOUK4fck0Mamaa0uxQEBF5Adit\nqt8XkWHAh8DngcHADFW9KTrVDC2aQ0FeW7qLVxfvaraMJ8kVchK9CxhU0JXxw/IYPyyPnjn2F74j\nOlR2mHf3fMDyg6dzF9049DouKbiQJHd0F061oSChtXRbOkhVbw68ngW8pKrvA++LyBciW7X4c/VF\n/dl94BRrtx8JeXzm5AFs3HmU1VuLWbf9COVBg339wI6ik+woOslLC3ZQkJfJ+KF5TJA8+uVnWRam\nDqJnZj7XDb26QXCb0GtM1AObaV5LwS14uMd04P+Ctlt+up5gPElu7rl+FEs3HOS9lfvYV3z6x/Pl\nmSOYNNJZRHiCOAHLW+tDC4+zemsxa7YWc7y04S1tUXEZRcVlvL5sN7nZafUtuiEF2bjdFuiMORst\nBTePiOQDXXBS7M0GEJEswljPLRElud1MHdOH8cPyuO/Xi+v3jx7U44yyniQ3Iwd0Z+SA7txy+TB2\nHTjJ6q3FrN56hEONxssdOVHJvI8LmfdxIV0ykhk3NJfxw/I4t393kj02KNiY1mopuD0CfAJkAA+q\n6jERSQeWAP8b6colErfLxeA+2Qzuk82saYPZX1IeCHTF7DnYcCX1U+U1LFp3gEXrDpCWksR5g3sw\nflgeowf1ID3V8msaE46WEsT8S0R6A+mqejKwr0JEHlDVFntLW8g4fynwM5yM8wp8CZgKvARsChTb\noKr3tv5jxTeXy0VBbiYFuZnMvHgAR05UsGbrEVZvLWbrvuME9/FUVtfy0ebDfLT5MJ4kFyMGOENM\nxg7JpWtmSug3MaaTa7EZoKo1QE2jfeEEtpayx/8euFRV94nIS8CVOMuXf6Cqs1rxGTq83Ox0Lj+/\nH5ef34+T5dWs2+YEuk27j+GtPf1o01
vrZ/2OEtbvKMHlgqEF2fXP6XK7pcfwE3Q+HpcHFy78+HHh\nwuOyFnW8ieS/SIPs8SKSIyJd61qAwISg18VAD5zg1ql1zUhhypg+TBnTh4oqLxt3OT2v63ccoaLq\n9Ngqvx+27jvB1n0n+Pv87ZyTn1Uf6AryMq3nNcLSPKlMKZjEoqJlTCmYRJrHxi/Gm0gGt17AqqDt\nuuzxdbe3JwECt71X4KwRNxoYISKvAd2Bh1T13QjWsc08Sc4arX7A5XK221t6qofzh+dz/vB8arw+\ntuw95vS8bjtyxmDivYdL2Xu4lFeX7CK/W3p9oBtU0LVVsytM+GbLdTHLV2paFs229Bn/wwI9sa8D\nX1PVEhHZBjwEvAgMAhaIyBBVDTktIJYZ56+ePJA3l+7i6osH0q8gJ+Lv16d3NjMuHECtz4/uOcry\nDQf4cOMBDpY0bPAePl7B2x/t5e2P9pLTJZWLRvXmotG9GT04t117Xn/3z/W8uXQXn5k8kLuuP6/d\nrmtMe4jYYpUi8iBwQFWfCWzvBMYEJWbuCiwA/lNV3w5xjY+A2aoaclpAtBerjDd+v5/Cw6X1Q0yC\nx941lp7qYcyQHowf6vS8pqa0/Y9CZbWXe361qL7l+ptvTSUtxZ47RZvNUAgtkr+N83BaYc+EyB7/\nGPB4cGATkVuA3qr6SxHpBfQEiiJYxw7P5XJxTs8unNOzC9dNGcTh4xWs1mJWbytmx74TDVYYrajy\n8uGmQ3y46RDJHmcM3vhheYwdmktWeusm91veVhPvIrrMuIg8gjO8w4eTPX4ccAJ4BzgGLA8q/jfg\nhcD3bjiLYj6kqs2uG9fZW27NOVFaxZrtTs/r5t3HqPU1/aNyu1wM63e657V717QWr11aUdNgEPMT\n35jS6gBpzp613EKzHAqdRHmll/U7j7B66xE27Cihqib0sj0DenWpD3R9cs+ciLJz/0neXrGHlXp6\nZdq7rx3JxOH51ksbZRbcQrPg1gnVeGvZtNvpeV277QilFaFzcPbqnlEf6Ab27sK8jwuZM397k2Uv\nHV/ArZcPswAXRRbcQrPg1snV+nxs33eCVYHJ/SUnq0KWzUpPbjYQAvzbFcO4dHzf9q5m3PH5/Owr\nLqW6xkd+TnrMZotYcAvNgpup5/f72XPoVH3P6/4jZa2+Rk6XVL5z01jSUjykJieRkuxOqGxgfr+f\n91buY97Hexv8IZggeXx++mDyo7xGnwW30Cy4mZAOHj09uX/n/pMtnxBCkttVH+hSk5Oc1ylJpHrc\npCQnkZqSRIrH2Z+a4g56nUSKx326fHJgu/61Uz7JHZ3g6ff7+fM7ysK1+5s8npWezPdvHU/vHtFb\nMMeCW2gW3ExYDh0t5/u//zDW1WiSJ8kVCHRJTrD0uOuD4RlBNfA6pcljdee4g14n1a+tt2nXUR6b\ns7bZugw/pxsPfGF8ND42YMGtOTbq0oQlPyedHl3TKDlZGeuqnMFb68db622w6nF78iQ5Lc9qb8vr\ns27Ze5wDJWVRbb2ZpllwM2FxuVxMG9uHfy7a2Wy5yaN7cftVw6mu8VFVU0tVTW2j1w33Nd5uuryP\n6qCy0R4wXBc8w7X3UKkFtzhgwc2E7bKJfVmph9l7qOkpXjldUrl+6mCS3G7SU90RW1iz1uc7q+BZ\nVV1LtTdwvLqWKq8TPJ39Zx88k2yJ+Lhgwc2ELS3FwwM3j+OF97bx4ScHCVpqjlEDu/PFK4eT0yXy\nS/9EI3hWVfuo9jYMhm8s2836HSUt1M3F0H7dIlIv0zqJ00dvoiIjLZk7rxnBj/79wgb7v/LZkfTI\nbnnaVkeQ5HaTkeahW1YqPXMyOKdnF4YUZDNr+mBaGp98wbk9ybYVkuOCBTfTJl064X/gvnlZfPHK\n4Weu3RUwoFcXbrl8WFTrZEKz21JjWmHqmD70yc3kXyv2sGbr6fy1n5syiCsu6EdqsuUujRfWcjNt\nUr
cSMURuJeJ4NaQgm3uvP48Z4wsAmDG+gJmTB1hgizM2iNe02V/mKfNXFzFjfAG3XiGxrk6nZIN4\nQ7PgZkwHZsEtNLstNcYkpIh2KLSQlPky4Kc4SZnfUtUft3SOMcaEK2Itt+CkzMCdwBONijwB3ABM\nBq4QkRFhnGOMMWGJ5G1pg6TMQE4g4xUiMgg4qqqFquoD3gqUD3mOMca0RiSDWy+cRMx16pIyN3Xs\nMNC7hXOMMSZsMU3KHMaxFnuCYpmU2RgTvyIZ3PbTsNXVBzgQ4lhBYF91M+c06dix8uYOG5PQ8vK6\nxLoKcSuSt6XzgFkAjZMyq+puoKuIDBARD3BNoHzIc4wxpjVikpRZVV8RkanAzwNFX1bVXzZ1jqqu\na+49bBCv6cxsEG9oNkPBmA7MgltoNkPBGJOQLLgZYxKSBTdjTEKy4GaMSUgdvkPBGGOaYi03Y0xC\nsuBmjElIFtyMMQnJgpsxJiFZcDPGJCQLbsaYhGTBzRiTkCy4GWMSkgU3Y0xCsuBmjElI0cyhYNqJ\niAwAFFje6NCbqvpomNdYCDysqu+1sQ5tPl9EHga8qvpgG9/7SuCHOL+/ScBm4FuqeqQt1zOJyYJb\nx1WsqtNjXYloE5HzgKeBq1R1i4i4gO8BLwIzYlo5E1csuCUgESkFHgZmAinAT4EvAwLcrarzAkVn\nisgDOAl6fqyqfxeR4cAzgBfoCvxAVd8RkQeBgUB/4P5G7/cssEtVfyQi9wI34vxubQG+pqoVIvIT\nnFwZhUAZTmsr+BojgN828XFuUtWDQdsPAD9X1S0AquoXkZ+HONd0YhbcElMmsFJVHwncPs5U1atF\n5HbgaziJeAA8qnqFiAwBlorIizjZx36oqotEZBLwJPBOoPxAYFogoAAgIg8BpYHAdgHwOWBqoMzj\nwJdE5B3gFpzgWgt8RKPgpqqfANPD+GwjgccbnesDToT5szGdhAW3jisvELiCPaCqHwVeLwl83wcs\nC3qdHVT+XQBV3R4IVnk4qRQfDbS0UoDcoPIfqmrwGlm3A8OBCwLb04EhwILA9TKBGmA0sEpVqwBE\nZFHrPmoDtTjP2YxplgW3jqulZ27eEK+DE4r4Gu33A08BL6jqH0VkFPBGUJnqRu+RihMAZwDvAVXA\na6r69eBCIjKr0XudEZxacVu6AZiM0/oLPv8iVf2wifNNJ2XBrXP7FPCaiAzDCYDFQE9gU+D4bJwA\nFsozwCFgrohcCCwF7hORLFUtFZGvAWtwbkHHi0gKTgCdBvwz+EKtuC39BfCuiMyvS/soIt8BrgQu\nC+N800kP6MOnAAAgAElEQVRYcOu4mrot3aWqd7TiGl4RmYtzK3lf4DnZY8DzIrIb+BVwfWBfk8mx\nVXWDiPwKeA6nw+A3wEIRqQT2A8+parmIvAqsAPYAa1tRx8bvt1lErgd+IyKpOLe9a4Dr2npNk5hs\nmXFjTEKyGQrGmIRkwc0Yk5AsuBljEpIFN2NMQrLgZoxJSB1+KEhx8Snr7jWdVl5eF1fLpTona7kZ\nYxKSBTdjTELq8LelJjYOHStna+Fx/H4Y0KsL5/TsEusqGdOABTfTKidKq3juX1tYt6Okwf7BBV25\n46pz6ZObGaOaGdNQRKdfBVaVmAs8rqpPNTp2Gc4iirXAW6r648D+x4GLcCZYf0NVP27uPWLZoVDj\n9eFygSepc9zdl1XW8PDzqzh0tLzJ41npyfzgtgnk52REuWadl3UohBaxlpuIZOIsdPh+iCJPAJ8G\nioAPRORlnPXEhqrqJBE5F/gjMClSdWwLb62PD9buZ+GaIoqOlAEwrG82l03sxwTJw+VK3N+1eR8V\nhgxsAKUVNfxz0U7uunZUFGtlTNMieVtaBVwN/EfjAyIyCDiqqoWB7bdwlt/JA16F+tUfckSkq6qe\njGA9w1bj9fHky+vZuOtog/1b951g674TXHF+P2bPGBJXAc7v91Pj
9VHt9Tnfa2qpbvTdOV5LdY2v\n2WNrthW3+H6rtJiyyhoy05Kj8OmMCS1iwU1VvThL6jR1uBfO2mF1DgODcVZ9XRW0vzhQNi6C25vL\nd58R2ILN+7iQoX2zmSD5zV7HFwg4oYJNy4GoiX1124321Xh9RPO+vdbnp+REpQU3E3Px0qEQqqnT\nYhMoJycDjyfyq07XeH18sG5/i+X+PG8rK7YUU1VdS3VNLVWBr+qa2vp91V5fi9fpyFLSUsjLs95T\nE1uxCm77cVpkdQoC+6ob7e+Ds6Z/SMeOhX4G1J72HjrFidLGq2yf6WRZNSs3H4pCjdrOk+Qi2ZNE\nSrKbFI+blMDr0/tOHwve9/GWQ+w/0vLP+79+v4zLJvTjMxf3txZchNkfkdBiEtxUdbeIdA0kF96H\ns4LrLTi3pQ8Bz4jIeGC/qja5Amy0+SK8qKcnKRBokhsGm9S6oBM4dmYASiK50XmNvyd7Gu5zu9v2\nTHDEgBwe+cvqFm9zvbV+3v5oL4vX72fmxQO4dHxfkj2do0fZxI+IDQURkQnAY8AAnKWgi4DXcJbC\nfkVEpgI/DxR/WVV/GTjvEWAqTkKRe+rWyQ8lWkNBKqu9fOuppVRV1zZbLjszhU9fcM7poBRGIEr2\nuNsccKJt8fr9/Olf2mSwHz2oB3sOneJkWcMWbm52GrOmD+b84flx1dmSCGwoSGgdfpnxaI5z+/M8\nZcHqombLfPFKYdrYgijVKDYOHi1nweoitPAYPh8M6N2FS8cVMLB3VyqqvLzz0V7e/mgv1TUNny0O\n6tOVGy8dwrB+3WJU88RjwS00C26tUFZZwyN/XU1RcVmTx8cOyeWe60eR5LZbsGOnqpi7ZCeL1x+g\n8a/YuKG5zJo+mN49bDbD2bLgFpoFt1Yqq6zh1UW7WLJhP1VBLZOrJ/XnuksGdprZCuHaV1zKSwt2\nsGFnw+labpeLaeP6cO3kgXTNTIlR7To+C26hWXBroxNl1XzrSSepu8sFv/nWVNJS4mVkTfz5ZPdR\nXpy/nb2HSxvsT0tJ4qqL+nPF+f1ITbZE8q1lwS00C25n4S/zlPmri5gxvoBbr2hysLIJ4vP7+XDT\nQf65aCdHT1Y1OJbTJZXPTRnExaN6dZjOlXhgwS00C24m6qpranlv1T7eXL6biqqGvc9987K4ccZg\nRg3sEZvKdTAW3EKz4GZi5lR5Na8t3c3CNUXU+hr+M44c2J0bLx1Cv/ysGNWuY7DgFpoFNxNzh46W\n848PdrBKG07MdwEXj+7F56YMonvXtNhULs5ZcAvNgpuJG9v3nWDOgm3sKGq4TkKKx83l5/fj6ov6\nk55qnTbBLLiFZsHNxBW/388qLeYfH+zg8LGKBse6ZCTz2ckDmTa2jw25CbDgFpoFNxOXvLU+Fqwp\n4vWluymtqGlwrGf3DD4/fTDjhuZ2+ulcFtxCs+Bm4lp5ZQ1vfriHdz/eh7e24XSuoX2zuXHGEAb3\nyY5R7WLPgltoFtxMh1ByopJ/LtrJ8k0Hzzh2/vB8bpg+mPxu6VGtUzyMc7TgFpoFN9Oh7Dl4ihcX\nbGfznmMN9ie5XcwY35eZkweQlR75NeQqq73c86tF+IntDBULbqHZU1nTofTv1YXv3DSWb35+DAVB\naQRrfX7eXVnI9363nH+t2EONt/mlqc6Wt9Zfv66d3+9sm/hi/eqmw3G5XJw3uAcjB+awdMNBXlm8\ns36V5PIqLy8t2MH8VUXcMG0QF4zoibuTdzp0VtZyMx1WktvN1DF9eOQrk7jukoENJt6XnKzk969/\nwo//tJItjW5hTecQ0ZZbqATLIlIA/DWo6CDgezh5FF4CNgX2b1DVeyNZR9PxpaYk8dlLnPFvc5fs\n4oN1++vXkNtz8BS/eGENYwb3YNalQxrcyprEFsmkzNMIkWBZVYuA6YFyHmAhzhLkE4EPVHVWpOpl\nEld2Viq3XTmcyyb24x8Ld7B2
+5H6Y+t2lLB+ZwlTx/ThuksGkp2VGsOammiI5G3ppwhKsAzkiEjX\nJsrdjpNDobSJY8a0Wp/cTO6bdR4P3DyO/r1OZ4fy++GDtfv53jMfMnfJrhbzYZiOLZLBrXHi5boE\ny419Cfi/oO0RIvKaiCwRkcsjWD+T4Ib3z+GHX5zIV2aOoEfQxPuqmlrmLtnF936/nEXr9uPzWU9n\nIopmb+kZXVYiMgnYoqp1M6W34aT2exHnOdwCERmiqiEThkYrKbPpuGbmd+XTkwfxxpJdvPj+VsoC\n07lOlFbz3L+2MH9NEXdcM5IJrcjOldoow1ePHlm2XHqciWRwa5x4uakEy9cA79VtBJ7FzQls7hCR\ngzgJm3eFepNoJWU2Hd+UUT0ZN7g7byzbzfur9tWvIbf34Cke+sOHnNs/hxsvHdLgVjaUxvNdS0pK\nqSqPfgJqS8ocWiRvS+cBswCaSbB8PlCfl1REbhGR7wRe9wJ64uQ7NaZdZKUnc9OnhvKTr1zEBefm\nNzi2ec8xHnruY/739U2UnKiMUQ1Ne4lYcFPVZcAqEVkGPAHcIyK3i8jngor1Bg4Hbb8GTBORxcBc\n4O7mbkmNaav8buncde0o/vO2CQzt23Di/fJNh/j+7z/kpQXbKa+sCXEFE+9sbqnp9Px+P2u2HeGl\nhTs4dLThY46s9GRmXjyAS8cXNFhDrrSihvt+vbh++4lvTInKnNbGbG5paBbcjAnw1vpYtG4/c5fs\n4lR5wxZbfrd0Zk0fzATJY2vhcd5cvoeNu47WH//MpP4xWSnYgltozQY3EZna3Mmquqjda9RKFtxM\ne6uo8vKvFXt456NCarwN15DLzU7jSIjncf3ys/juzeOi2oKz4BZaS8Gtrt2dCowGtgBJgAArVLXZ\n4BcNFtxMpBw9Wckri3eybMNBwv0lu3BET7762ZERrVcwC26hNduhoKpTVHUKsBkYqKrjVPU8YAiw\nMxoVNCZWundN487PjOC/7zifkQO7h3XOyi2HOV5a1XJBE3Hh9pYOUdX6JVBVtRAYGJkqGRNfzunZ\nhftnj6VbVsuDdGt9frbvOxGFWpmWhPv084iIvAAsAXzAxYCNnjWdSrgZtxonmDaxEW7L7SZgPs6z\nthHAMuDzkaqUMfFoQBgzF1pTzkRWWMFNVSuA5cD8wPpqL9gqHqazuXR83xbLjByQQ8/uGVGojWlJ\nWMFNRL6Fsx7bQ4FdPxSRH0SsVsbEoXP753DZhNABrmtmCrd+OjZZsMyZwr0tvRlnRd26UYvfxZn0\nbkyncvNlQ7nt00JudlqD/ROG5fGD2ybQM8dabfEi3OB2SlXrRzMGXvuaKW9MQnK5XEwfV8APvjix\nwf4vXjWc3Ozo5k01zQu3t3SHiPw3zmq61wOzgU8iVy1j4ptl1Ip/4bbc7gHKcJYfuhVYEdjXqc3R\nV7ln/gPM0VdjXRVjTCPhttx+BPxZVX8Zycp0JJXeKhYXLQdgcdFyrh18FWkeSzpiTLwIN7iVAn8X\nkRrgL8DfVPVQ5KoV/7x+L3U5x/348fq9OFNwjTHxINxxbj8JzCm9FcgG3hSRtyJaM2OMOQutXXyq\nAufZWzlg2W2N6YBunHN3AU4CplJg/Yuzn07IHIdhBTcR+T5OPoQU4G/Abaq6O4zzmsw4Hzi2GygE\n6n6wt6hqUXPnGGPa7sY5d48EHgWu5HQ2uj03zrn7UeC3L85+uk2TYkUkGWfe+RZV/WJ71FVEBgD/\nUNWJLZUNJdyWWw5wh6quD/fCzWWcD3JV8DSuMM8xxrTSjXPuHgt8ADROjN4feApnGbNvtfHyvYHU\n9gps7aXZ4CYid6jqs0AVMEtEZgUfV9X/aub0BhnnRSRHRLoG5Shtr3OMMc24cc7dLuAPnBnYgn3z\nxjl3v/Ti7KeXteEtHgcGi8izQBecxpAHuFdV14vIDuB/ce7+tgOrcBbe2Kaqt4jIGOA3QA3O5I
AG\ni3KIyBTgp4HjhcCXw0kc1VKHQt0sBC/O7WPjr+aEk3H+d4HM8o+IiCvMc0ycsHF+HcaEwFdL7mrj\n9e8HFGcB27dV9VPA3cBjgeNJwGqcVJ6Tgd2qegEwRUS6Afk4gfBSYClwS6PrPwFcq6ozgEOEuSJR\nsy03Vf1T4GU68Lyqns2shMZDuv8LeBtnvuqrwA1hnHOGWGWcT6tqWLXcHll0Sc2Kej1ipbKm8vQ4\nv/3L+dKFnyctOa2FsxJHB8s4PzbMcuPO8n0uBvJE5NbAdvBE249U1S8ih4A1gX2HcUZfHAJ+LiIZ\nOMnb/1p3koj0BIYC/xQRcDoyj4RTmXCfuZ2i9ePcms04r6rP170ODCsZ3dI5TYlVxvnSmrIG20dK\nSqlM7jyLFJbWlJ0e5+f3c6D4OFnJnacDvYNlnA83+erZ5giuxmmBLW/imDfEaxfwa+Dnqvp2ICl7\ncCuhGihS1emtrUwkx7mFzDgvItki8o6I1P2pmwZsbO4cY+KJJ8lVf1vhcjnbcWwhhJXj5v2zfJ8V\nwHUAIjJCRL4d5nm5OPPXU4GrcUZlAKCqx+quF/h+r4icF85FW5txPuxxbs1lnFfVE8BbwIcishTn\n2do/mjqnlfUzJirSUjxcOr4AgEvHFZCWEt18pa3x4uyn9wD/bKFYNfD0Wb7Vk8CQQNa8PwDhpv58\nEufR1EuB11/EaUTVuRN4NnDdS3Ce77UorKTMTYxzeyGccW7REIvUfluPbWf+3sVsKNlcv++b4+5i\naM6gaFclZkpryviPxQ/Vb/98yn93qtvSeBFuar8b59zdA6dlNqaJw17glhdnP/1ie9Yt1lozzu3f\nVXVdJCsT7/x+Py9te40P9i0949j/rPkdN8v1XFJwUQxqZkzzXpz9dMmNc+6+BKcX806ch/SlwFzg\n8RdnP72mufM7onCD2/mq+kBEa9IBLC76sMnAVufv+gq9MnsypJtlPTTx58XZT5fizFB4NNZ1iYZw\ng9taEfkRTtar+h4VVZ0fkVrFIZ/fx/t7P2i2jB8/8wsXW3AzJg6EG9zqxslMCdrnx0n31ykcKi/m\nSOXRFsttKP4En9+H29XavhpjTHsKK7gFRg53atW14Q0B8uHjhS0vMy7/PCRnCEnu6A8wNsaEvyrI\nYpoYJ6OqU9u9RnGqR3p33C43Pn/LeXGWHfiYZQc+Jt2Tznm5IxibN4pzuw8jOSn6gzyNaWzm/XMb\nLHn0+mPXRn3JIxF5Dmf41xuReo9wb0uDc5SmADNwfjCdRlZyJuPyRrPqcPgdxhXeClYcXMWKg6tI\nTUphVI9zGZs/mpE9hpOaFLdTdUyCmnn/3CaXPJp5/9xHgd++/ti1CTXFJtzb0sZP0t/tjCvxfnbw\nVWw9toNTNU3H9d6ZPbmkz0VsKtmCHttOrf/0H8Sq2mpWHV7HqsPrSHYnM6KHMDZvFKNzzyXdYynh\nOqI5+iqLipYxteBiZst1sa5Os2bePzdiSx6JyO04s4xygZHAf+LkOh6BMwl+NnABkAb8TlX/EHRu\nEvB7nJZkMvBf7dVRGe5taePRqecAnS61dm56d7494W5e0FfYemx7g2Nj80Zxy/BZZCRnML3fZMpr\nKthw5BPWFm/kk6OK13d6Ol2Nr4Z1xRtZV7wRjysJ6T6UsXmjOS9vhA2E7SA6UoKgmffPDWvJo5n3\nz33p9ceubcuSR+CMm5sCfAn4Ps4k/NuBO4BPVPXbIpIO7AjUpc4XgAOqeqeI5OJ0UoY1vaol4d6W\n1s058we+TgIPtkcFOpr8jDy+Me4r7Dq+h1+u/k39/puH30BG8ulFEDKS07mw9wQu7D2BSm8Vm0q2\nsLZ4AxtLtjTonPD6a9lUsoVNJVt4Qd0M7TaIcfmjOS93FNmpYU2KNjHQwRIEtWbJo7YGt5WBVT8O\nAOtVtTawAkgq0D0wpbIayGt03sU4Sx9dEthOF5GUcNZra0
lLi1V2Be5U1YGB7btwRjjvwJnk3mnl\nZeaGXTbNk8qEnmOY0HMM1bU1bD66lTWHN7Cx5BMqvJX15Xx+H3psO3psO3P0VQZlD2Bc/mjG5o0i\nJ61bJD6G6RyiseRRqFU/BgCDgWmqWiMijZ/pVAM/UdUXzuK9m9RSy+0ZYDeAiAzDWQ3z8ziV/TVw\nU3tXKNGlJCUzJm8kY/JG4vV50WPbWXt4A+uObKKs5vTyTX787Dixix0ndvGPba/Rv2s/xuWNZmze\naPIyesTwE5gOKFpLHjVlIvBaILB9FkgKWg0InJVErgVeEJF84Juq+v/a441bCm6DVPXmwOtZwEuq\n+j7wvoh8oT0q0Jl53B5G9hjOyB7Ducl3PduP72Jt8QbWFm/kZHXDlZ72nCxkz8lCXt3xFgVZvZ1A\nlz+a3pk9Y1R704EsxHmc1NIk+7Nd8qgp7wFDReQDnJU/3qDh6iMvAjMCt61JtOPjrpaCW3ATcjrw\nf0HbLQ/4MmFLcich3Ycg3Yfw+WHXsuvEXtYWb2DN4Q0cqzreoGxR6QGKSg/wxq559MzIZ1zeKMbm\nj6ZvVh9crrheV8zEwOuPXbtn5v1z/0nTq13XafOSR6r6XNDrN3ACWIPXQR5v4hJfasv7tqSl4OYJ\nNBW74GShmg0gIllY3tKIcbvcDO42gMHdBnD9kGvYe2ofaw5vYG3xBoorShqUPVR+mLf3zOftPfPJ\nTevOmPxRjMs7j/5d+9oUMBPsqzjDPUItefRvrz927a7oVimyWgpujwCf4KyF/qCqHgt05y7ByWZj\nIszlctG/az/6d+3HtYOvYn/ZwfpAd6Cs4UrvRyqP8v7eRby/dxHdUrMZmzeKsXmjGdxtgAW6Tu71\nx64tmXn/3JBLHr3+2LWda8kjVf2XiPQG0uvS66lqhYg8oKot9pa2kJT5UuBnOFm0FKdpOhVnNc5N\ngWIbVPXe1n+sxORyuSjI6k1BVm+uGXQFh8oOs6Z4I2uLN1B4qqhB2eNVJ1i4bykL9y2lS3IWY/JG\nMjZ/NMO6Dbb5ru2goqaiwXY4i77G2uuPXWtLHgVT1Roa9baEGdhaSrD8e+BSVd0nIi/hTAkpBz5Q\n1VlnXtE01jMznyszZ3DlgBkcqTjqdEYc3siuk3salDtVU8qS/StYsn8FmZ4MRueOYGz+KIZ3H0ay\nO36Xx45HNT4vr2x/k6VFKxrs//Wa3/OF4dczKHtAbCpmzhDJ3+yWEixPCHpdDPTACW6mDXLTu3PZ\nOdO47JxpHK86wdrijaw9vIHtx3fVDzYFKPOW8+HBlXx4cCVpSamMyj2XsXmjGdlDSLH5rs3y+X38\nYcPzbCzZcsaxA2UHeWLN77lv3FcZlN0/BrUzjUUyuPXCySxdpy7Bct3t7UmAwG3vFcAPcdL7jRCR\n14DuwEOq+m4E69hmHpcHFy78+HHhwuOKnxZQt9RspvedzPS+kzlVXcq64o2sLd6IHtveYFWTytoq\nVh5ay8pDa0l2JzOyhzA2bzSjcs8l3dN5cpCGa23xxiYDW50an5cXt77Kf0y8z3qt40A0/0ee8a8d\n6Il9HfiaqpaIyDbgIZyxL4OABSIypLmpGLFKygxduGLIVN7Z/gFXDJlKv97hz1iIpjy6MKigN5/j\nckqry1hZtJ4V+9aw/uBmahrNd10bCIIet4fzep3LRX3HMbHPeWSlnu4Y9/v9LN27kje2NPybs+7E\nOq6Ry/B0sOd5tb5aKryVVNZUUV5TQaW3igpvJRU1gS9vpbOvppIlez9u8XqFp4oo9RxnUPdzolB7\n05ywsl+1hYg8iDMh9pnA9k5gTFDu0q7AAuA/VfXtENf4CJitqiG7qGOR/SoRVHor2ViyhbWHN7Cp\nZAvVvqYHsbtdboZ1G8zY/NGclzuCf+1+v37CeGMjeghfHf1FPBF8juf3+6nxeamqraLSW0VlrRN8\nqmqrqKytospbRUVtJV
XewHZ9uaoG5Sq9lVTVVjUI8O3ljhE3M7HX2SZvD0+42a86o0i23ObhtMKe\nCZFg+THg8eDAJiK3AL1V9Zci0gvoCTTsBjTtIs2TxsSeY5nYcyzVtdV8cnQraw9vYMORzVTWNpzv\nuuXYNrYc28bftfnUl5+UKO/uWchVAy9rsN/n91FVW910QPI2CkCBANU4aFUGBatwFgyNpWR7dhkX\nItZyAxCRR3CGd/hwEiyPA04A7wDHgOAmwN+AFwLfu+EsivmQqja7bpy13NpXjc+LHt3G2uKNrC/e\nRJm3dX08Sa4k+mT2ospX14qqCnuJ9lhz4SLNk0pqUippSamkedJIS0ol1eNsF5UeYF/p/mavkeJO\n5qeX/CBqa/RZyy20iAa3aLDgFjm1vlq2Hd/JmuINrDu8MeQinbHkcXucAJSUSlogCNUFo7oAVXfs\ndNBqWC41KY10TyrJ7uRmOwJKKo7y8IrHQt7CA8zoN4Ubhs6MxEdtkgW30Cy4mbBUeav49qIftsu1\nUpNSGgae4IBU11oKDkie00Gpcasq2gOSN5Uof9jwfJMBbnTuCL406taIPnNszIJbaBbcTNh+suJX\n7C872GyZZLeHG4d9jszkjIYBKRCgUpJSOvxUsJKKY7y/dxEfFJ1O0P1v597IBb3GR/2zWXALLX4G\nZ5m4N6VgEnO2vtJimYv7nB+lGsVGj/Qcrh50WYPgNir33A4ftBON/WuYsE3ucwGjepwb8ni/rD5c\nPfDyKNbImNAsuJmwJbmT+Mro25g56Eq6JGc1ODalz0V8Y/xdNrPBxA0LbqZVktxJXDlgBv/vgm82\n2H/N4E9bYDNxxYKbaRN3B5tmZTofC27GmIRkwc0Yk5AsuBnTBnVLXgFxt+SVcVhwM6YN0jypTClw\nFpaeUjCJNE/cZpvvtOzPjTFtNFuuY7ZcF+tqmBCs5WaMSUgW3IwxCcmCmzEmIVlwM8YkpIh2KLSQ\nlPky4Kc4SZnfUtUft3SOMcaEK2Itt+CkzMCdwBONijwB3ABMBq4QkRFhnGOMMWGJ5G1pg6TMQE4g\n4xUiMgg4qqqFquoD3gqUD3mOMca0RiSDWy+cRMx16pIyN3XsMNC7hXOMMSZsMU3KHMaxFpdQjl1S\n5s6tS00yLlz48eNyueid1420ZFvyyMSPSAa3/TRsdfUBDoQ4VhDYV93MOU06dqx1qedM+5lSMIlF\nRcuY0mcSp47XcIrQWaFMZOTldYl1FeJWTJIyq+puEekqIgOAfcA1wC1AbqhzTPyx6UcmnsUkKbOq\nviIiU4GfB4q+rKq/bOocVV3X3HtY9ivTmVn2q9AstZ8xHZgFt9BshoIxJiFZcDPGJCQLbsaYhGTB\nzRiTkDp8h4IxxjTFWm7GmIRkwc0Yk5AsuBljEpIFN2NMQrLgZoxJSBbcjDEJyYKbMSYhWXAzxiQk\nC27GmIRkwc0Yk5CimUPBtJPACsYKLG906E1VfTTMaywEHlbV99pYhzafLyIPA15VfbCN730B8CiQ\nifM7vAP4jqruasv1TGKy4NZxFavq9FhXItpEZAQwB7hGVTcF9t0IvC0io1W1OqYVNHHDglsCEpFS\n4GFgJpAC/BT4MiDA3ao6L1B0pog8gJOg58eq+ncRGQ48A3iBrsAPVPUdEXkQGAj0B+5v9H7PArtU\n9Ucici9wI87v1hbga6paISI/wcmVUQiUAZsbXWME8NsmPs5NqnowaPs/gV/UBTYAVX1RRG4CbgX+\n2IoflUlg9swtMWUCK1V1Mk4gmamqVwM/Br4WVM6jqlcA1wK/FhE3TvaxH6rqp4D7gJ8ElR8IXKqq\nq+p2iMhDQGkgsF0AfA6YqqqTgOPAl0RkGE4CoAuA64ChjSusqp+o6vQmvg42KjoO+KiJz7wcmBjm\nz8d0AtZy67jyAs+9gj2gqnX/8ZcEvu8DlgW9zg4q/y6Aqm4XEYA8nFSKjwZaWik4Gcnq
fKiqwWtk\n3Q4MxwlaANOBIcCCwPUygRpgNLBKVasARGRR6z5qA5WE/qNceRbXNQnGglvH1dIzN2+I18EJRXyN\n9vuBp4AXVPWPIjIKeCOoTOPnWak4AXAG8B5QBbymql8PLiQisxq91xlZtFtxW7oRmAR83Kjc+cBb\nTZxvOikLbp3bp4DXAreNXqAY6AnUPc+ajRPAQnkGOATMFZELgaXAfSKSpaqlIvI1YA3O87XxIpKC\nE0CnAf8MvpCqfoLT8mvJr4A3RWShqq4HEJFrcW5X/y2M800nYcGt42rqtnSXqt7Rimt4RWQuzq3k\nfarqF5HHgOdFZDdOILk+sK/J5NiqukFEfgU8h9Nh8BtgoYhUAvuB51S1XEReBVYAe4C1rahj4/db\nKyK3AX8WET9OK/ATYHrdba8xYMuMmw5MRGYATwATLLCZxqy31HRYqjofeBNYFei1NaaetdyMMQnJ\nWm7GmIRkwc0Yk5AsuBljElKHHwpSXHzKHhqaTisvr4ur5VKdk7XcjDEJyYKbMSYhWXAzxiSkDv/M\nLU1iMA0AABeDSURBVNr8fj/rj2xi0b7lFJ4qwu1yMyxnMNP7TWZQ9oBYVy8qth/fxcJ9S9l+bCc+\nfJzTpS9TCyYxOncELpc9AjLxIaKDeAOrSswFHlfVpxoduwxnEcVa4C1V/XFg/+PARTgTrL+hqo1X\nf2ggmh0KPr+Pv2x+iRUHVzV5/HNDPsNl50yLVnVi4u3d83l959tNHru49/ncPPwG3C67IYgW61AI\nLWK/hSKSCTwJvB+iyBPADcBk4AoRGSEi04ChgYUO7wyUiRvzCxeHDGwAr2x/ky1Ht0WxRtG1qWRL\nyMAGsOzAxyza1zitgzGxEcnb0irgauA/Gh8QkUHAUVUtDGy/hbP8Th7wKoCqbhaRHBHpqqonI1jP\nsNT6allYuLTFcv/Y9hqX9r0kCjWKvvmFi8MqM7XvJGu9mZiLWHBTVS/OkjpNHe6Fs3ZYncPAYJxV\nX4ObRsWBsjEPbofKizlWdbzFcgfKDvE3fTkKNYpPJZVHKS4/Qs/M/FhXxXRy8dKhEOq5QYvPE3Jy\nMvB4zljYtd2dTDoa8fdIFCdcxxiVNzjW1TCdXKyC236cFlmdgsC+6kb7++Cs6R/SsWPl7V65pnhq\n0vG4PXh93mbLJbuTOadL36jUKdr2nCzE62/+8wP8+sP/443N85nebzJjckeS5I78H5/OKi+vS6yr\nELdiEtxUdbeIdA0kF96Hs4LrLTi3pQ8Bz4jIeGC/qja5Amy0ZSSnMyF/TLMdCgCfHfRpZpwzNUq1\niq55exYwd8e/wiq748QudpzYRU5qN6b1vZiL+1xAZnJGhGtozGkRGwoiIhOAx4ABOBmQioDXcJbC\nfkVEpgI/DxR/WVV/GTjvEWAqTkKRe1R1XXPvE82hIMerTvDLlb8J+extUHZ/7hv7FZKTkqNVpaiq\nrq3mf9Y8w56ThU0ez/Rk0DW1CwfKDp1xLNmdzAW9xjO972T6ZPVq4mzTFjYUJLQOv1hltCfOH6s8\nzj+2vca64k34cd46xZ3MRb0ncu3gq0nzNJdPpeOr9FbyyvY3WXFwNTW+GgDcLjdj8kYxa+hMslO6\nsv34ThbuW9rgZxRMcoZwab9LGNljuPWqniULbqFZcGuj41UnKCo9gNvlpn+XfmQkp8eiGjFTXlPO\nnpP78OGjb1YfslO7nlGmpOIoHxQtY9n+j6nwVpxxPDetO9P6TWZS74mkezrXz6+9WHALzYKbibiq\n2mo+OriKhYVLOVh++IzjqUkpXNR7ItP6TqZnRl4MathxWXALzYKbiRq/38+WY9tYWLiEjSVbmiwz\nssdwpvedzLndh9k81TBYcAvNgpuJicPlxXywbxkfHlhJZe2ZWfl6ZuQzve9kLug1PuGfY54NC26h\nWXAzMfX/27vz6CrrO4/j75sAWchCWBLSsAt8lYoF
ESsgm7tTp1Vry6jUY5fpTLU9Pe3Y6Tqt2tN9\nr50zdebUaavVoVprN1q1si9KRcX9SwFREkD2QCB77vxx740huRtJ7sLN53VOTu7z3OdJvvcYv/x+\nz/N7vt/Gtiae3PM0q2vXs7/xYI/3iwYVMrf6fBaOmcuIouEZiDC7KbnFpuQmWaEj2MFLB19l1a71\nvHq4Z/GBAAHOGTmNRWMvZMqwSZqyhim5xabkJllnz/E3WVW7nk17NtMSXm7SVU1JNYvGzOO8qpkM\nydE1hclScotNyU2y1onWE6zfvYnVtRuiLpweOriYeW97Jwtq5lBROCzt8S3zR1hTt4EFNXNZYlen\n/feDkls8Sm6S9do72nnhwMusrF3HtiOv9Xg/L5DHzFHTWTT2QiaWjUvLlLWprZnb1nyZIEECBPju\ngjszcuNDyS22bKkKIhJTfl4+MyqnM6NyOruO7WZV7TqefvO5ziIGHcEONu/bwuZ9WxhXOobFYy9k\nZuU5DM5L3Z93W7Ct8+mLIMFwQQHd1c0mGrnJaelYSwPrdz/FmtqN1Lf0LPdXOqSE+TVzmF9zAWVD\n+r9yRkPrcT679o7O7W/N/wolg4f2++9JRCO32DRyk9NS6ZASrphwMZeOW8Sz+55nVe16Xjv6Ruf7\nx1oaWP7a4zy2cwXnVr2DxWMuZFxZbpaikuiU3OS0lp+Xz3mjZ3Le6JnsPPoGq3at55l9z9MebAeg\nLdjOpr3PsGnvM0wqn8CiMfOYMeps1ZgbAJTcJGdMKBvHzW8fxzWT38Xauo2srXuShtbjne/vqN/J\njvqdDCsoZ2HNXObWnJ+RqaSkh5Kb5JzygjKumnQ5l4+/iM37trBq1zp2NezufP9Icz2/2/Fnlu98\nnNlV57Jo7DxqSqozGLGkgpKb5KzB+aE6e+8cPYvt9TtZtWsdWw68REewA4DWjjY27NnEhj2bmDrs\nDBaNvZDpI89SjbkcoeQmOS8QCDB52EQmD5vIoabDrKndyPrdT3GiS425rUe2s/XIdkYUDmfhmLnM\nqZ494Gr05ZpUd5yP2j3ezGqAX3U5dBLwOUJNYh4EXgrvf8HdPxHvd2gpiPRGS3sLf9v7LCtr10Ut\niz4kfwgXjJ7FojHzorYpPNZyjM+t+2rntpaCZJ9U9lBYCHzG3a8ys7OAe8Kd5LsfNwhYBVwBnAd8\n3N2vS/b3KLlJXwSDQfzwNlbVrufFA69ELYs+bbixaGy4xhwB1tY9yeOvr+RQl0fCZow6m2snX5X2\nyiVKbrGlclp6Mcl1j7+ZUIOYhhgNnEVSJhAIcObwKZw5fAr7Txxkdd16Nu5+mqb2ps5jXj7kvHzI\nqSoeRemQUrYd2dHj5zy3/0W21+/k0+feQmXxyHR+BIkhlVdOu3eVj3SP7+4jwM+6bE8zs9+b2Toz\nuzSF8YmcZFTxCK6b8m6+Nu8LvG/qe3okqTdP7I+a2CKOtTSwzH+b6jAlSem8odBj+Gxmc4BXu4zm\n/k6ob+mvCV2HW2lmk929JdYPTVfHeRlIShlbfQXvnXEZW/a+zPKtK9my9+Wkznz18N9pK2ykurTn\ndTpJr1Qmt+5d5aN1j78K+Gtkw93rgGXhze1mtpdQN/qepSDC0tVxXgamMYPG89FpN7N3/D6+/fRd\nNEcpid7dc687g6rSc6dVHedjS+W09DHgOoA43eNnA51Nl83sRjO7Lfx6NFBFqJmzSEaNHlpJeZIP\n4OcHNJPIBilLbu6+AdhsZhuAHwO3mtnNZnZNl8Oqga693n4PLDSztcDvgI/Fm5KKpNPkYZMSHpMX\nyGNS+fg0RCOJqOSRSJLqGvbwjU0/jLpcJOK8qhl88O03pC0mLQWJLe41NzNbEO99d1/Tv+GIZK+a\nkmreP/Vqlm2Nfke0pqSaJVMzU25cekp0Q+Fr4e8FwHTgVSAfMOApIG7yE8k1C8bMoXpoFY++voJX\nDm3t3H/5+Iu4
bPxi9VjNInGvubn7fHefD7wCTHT3me5+DjAZiL3gRySHTamYxM1vv/6kfReNm6/E\nlmWSvaEw2d33RjbcfRcwMTUhiYj0XbLr3A6Y2QPAOqADmAtogZmIZK1kk9s/AUsJXXcLABuAe1MV\nlIhIXyU1LXX3RmAjsCJcgugBd29IaWQiIn2QVHIzs08B9xB67hPgP8zsSymLSkSkj5K9oXA9oaKT\nh8LbnyH0XKiISFZKNrkdc/eOyEb4dUec40VEMirZGwrbzewrQIWZXQssAZKrASMikgHJjtxuBY4T\nqtCxlNDTCbemKigRkb5KduR2J3Cvu383lcGIiPSXZJNbA/B/ZtYK3Afc7+49WwaJiGSJZNe5fS38\nTOlSoBz4k5ktT2lkIiJ9cKrFKhsJXXs7AaS/SaNIlhgUGEQg3BYkQIBBAfU3zzZJ/Rcxs88TKhk+\nBLgfuMnddyZxXtSmzOH3dgK7gPbwrhvdvS7eOSLZonBQAfNr5rCmbgPza+aoIkgWSvafmwrgg+7+\nfLI/ONyUeYq7z4k0ZQa6N2W+sutjXEmeI5IVltjVLDEVp8xWcaelZvbB8Mtm4Dozu7PrV4KffVJT\nZkJr5MpScI6ISA+JRm6RpxDaevGzRwObu2xHmjJ37Tj/UzObQKiU0ueTPEdEJKG4yc3dfxF+WQT8\n0t378lRC90YWXwb+Quh51UeA9yZxTg9qyiwi0SR7ze0Yp77OLW5TZnf/ZeR1eFnJ9ETnRKOmzDKQ\nqSlzbKlc5xazKbOZlZvZo2Y2JHzsQuDFeOeIiJyKU12ck/Q6N3ffYGaRpswdhJsyA/Xu/ttwcnzS\nzBqBZ4GH3D3Y/ZxTjE9EBEiyKXOUdW4PJLPOLR3UlFkGMjVlju1U1rl9yN23pDIYEZH+kuzjV7OV\n2ETkdJLsyO258KLdDUBLZKe7r0hJVCIifZRscpsR/j6/y74goOQmIlkpqRsK2Uw3FGQg0w2F2JKt\nCrKW0EjtJO6+oN8jEhHpB8lOS7v2KB0CXESoOq+ISFZKKrm5++puux5XJV4RyWbJTksndds1DrD+\nD0dEpH8kOy19Ivw9GP46CtyeioBERPpD3OQWLhT5YXefGN7+V+BjwHZCD7mLiGSlRE8o3A1UApjZ\nVODrwKcJJbYfpTY0EZHeSzQtneTu14dfXwc86O5PAE+Y2Q2pDU1EpPcSjdy6LvdYxMlPJHQgIpKl\nEo3cBplZJVBKqAvVEgAzK0F9S0UkiyVKbt8EXgaKgdvd/bCZFRFq6PI/qQ5ORKS3Ej5bamaDgSJ3\nP9pl32XunvBuaYKmzIuBbxBqyuzAR4AFwIPAS+HDXnD3T8T7HXq2VAYyPVsaW8J1bu7eCrR225dM\nYkvUYPm/gcXuXmtmDwJXECpfvtrdrzuFzyAi0kOyxSp7I1GD5VnuXht+vR8YkcJYRGSASWVyG00o\naUVEGiwDEJnmmlk1cBkQeVZ1mpn93szWmdmlKYxPRHLYqXa/6ose1wbCd2L/ANzi7gfN7O/AHcCv\ngUnASjOb7O4t3c+NUFNmEYkmlcktboPl8BT1z8AXI9fw3L0OWBY+ZLuZ7QVqgNdi/RI1ZZaBTE2Z\nY0vltDRRg+XvAT9w979EdpjZjWZ2W/j1aKAKqEthjCKSo1JaZtzMvkloeUekwfJMoB54FDgMbOxy\n+P3AA+HvwwgVxbzD3ePWjdNSEBnItBQkNvVQEDmNKbnFlsppqYhIxii5iUhOUnIT6aX7HnM+9M0V\n3PeYZzoUiULJTaQXmlraWPlM6Eb+ymfraGppy3BE0p2Sm0gvtLUHOxv5BoOhbckuSm4ikpOU3EQk\nJym5iUhOUnITOUUv7DjIXb95/qR9f9zwGieaWmOcIZmgJxRETsGjm95g2YptUd+rHlHMZ288l7Li\nIWmLR08oxKaRm0iSdu49GjOxAew5eIL7HtWat2yh5CaSpCeerk14zOat+zl8rD
kN0Ugi6SxWKZLV\nOjqCHGlo5kB9EwePNnGwvumk13sPJa4dGAzCtrp6Zp9ZmYaIJR4lNxkw2to7OBQlaR08Gto+fKyZ\n9o6+X8I93a9j5wolN8kZzS3tnYnq5MTVyMH6JuobWkhH2plQXZb4IEk5JbdeamhsZfeB4+TlBRhb\nWULB4IHVx6G5tZ1d+xro6AhSM2ooQwsHp/T3BYNBTjS3hRJWfRMHIsmry+uGxv5ZilFUkM+IskJG\nlBUysryIEeWFjCgv5HhjK79McMNg+qQRVA4r6pc4pG+U3E5R/fEWHlq5jade2UdbewcAxQWDWDjj\nbVw9fyKDc7xZTXNrO4+s3cGaLXtobA49LD4oP48LplVx3eIzer0MIhgMcvREa4/RVtdpY1NLe798\nhpKiwYwMJ6wRZaHvIyPfywspjpOo99c38ucn34j6XkVpAR+4fGq/xCh9l+oy4/E6zl8CfJ1Qx/nl\n7v7VROdEk851bkePt/D1ezez70hj1PfPGl/Bp97/Dgbl5+ZN6Na2Dr6/7Dl815Go71cNL+YLS8+l\nNEqC6+gIcvhY81sX6jtHXo0cONrMoaNNtLZ19DnGADCstKAzaY3okrQio7GCIb3/BygYDLLhxb38\naePrJ91guGBaFe9bPJmK0oI+f4ZToXVusaVs5JZEx/kfA5cTagCz2sx+A4xKcE5GPbxme8zEBvDK\n64dZ/dxuLp41Jo1Rpc/KZ2pjJjaANw+d4J7lr3DulFE9rnv118X6/LwAFaUFbyWr8pNHXsPLClP6\nj0sgEGDe9GrOOWMEn/zxus79N1w6lZKi1E7N5dSkclp6Usd5M6swszJ3P2pmk4BD7r4LwMyWh48f\nFeucFMaZlMbmNp586c2Exz20ajsv7DiYhojSKxgMxk1sEVu2HWTLtt5//sGD8nqOtsoj178KGVZS\nQF5e5gcrgUDmY5D4UpncRgObu2xHOs4fpWc3+n3AGcDIOOdk1JuHT9CSxLSpubWd57fnXnLrL5GL\n9SPLi04aeUWSV2nxYCUO6RcZ7TifxHsJ/8rT1XG+obXv14MGinGjSxlTWUJlRTGjKoqoqiimcngx\noyqKc2bqVtLcRiAQWrSbF4DRVWUUFej+XDbJVMf57u/VhPe1xDknqnR1nC/Mg/KSIdQ3tMQ9rqqi\niEUza9ISU7qteKaW/Uea4h4zvKyAL990XtSpY2NDE40N8c8/nSyeWcOKZ+pYNLOGhqONNGQgBnWc\njy2Vye0x4A7g7u4d5919p5mVmdkEoBa4CriR0LQ06jmZNig/j4vPHcPDa3bEPe76S6Zwzhkj0xRV\nelUOK+Kuh1+Ie8wls8ZmxTWxdFh6mbH0Mst0GBJDym4rufsGYLOZbSB0Z/RWM7vZzK4JH/IxQh3m\n1wLL3H1rtHNSFV9vXHnBOGbZqJjvv3vehJxNbAAzp47iXXPGx3z//LMquWz22DRGJBKb6rmdoo6O\nIE++vJeVz9bx+t5j5OUFOHNcBZecN4azJ45IZygZ88KOg/z16Vr8jcN0BINMqC7jopk1nD+tijzd\nDEgrrXOLTclN5DSm5BZbbi6lF5EBT8lNRHKSkpuI5KTT/pqbiEg0GrmJSE5SchORnKTkJiI5SclN\nRHKSkpuI5CQlNxHJSSpA1QdmdjbwO+AH7v6TTMeTbmb2bWA+ob+jb7j7wxkOKW3MrBj4OVAFFAJf\ndfc/ZjQoOYlGbr1kZkOBu4AnMh1LJpjZYuBsd58DXAH8MMMhpds/Ak+7+0Lg/cD3MxyPdKORW+81\nA/8AfDbTgWTIGmBT+PURYKiZ5bt7//Tfy3LuvqzL5lhCdQkliyi59ZK7twFtZgOzWGE4iR0Pb36Y\nUHvGAZHYugrXHhxDqOCqZBFNS6VPzOw9hJLbxzMdSya4+1zg3cB9ZqbyQ1lEyU16zcwuB74IXOnu\n9ZmOJ53MbJaZjQVw9+cIzYJil2mWtFNyk1
4xs3LgO8BV7n4o0/FkwALg3wDMrAooAQ5kNCI5iaqC\n9JKZzQK+B0wAWoE64NqB8j+6mX0UuB3Y2mX3Te7+RmYiSi8zKwJ+RuhmQhFwh7v/IbNRSVdKbiKS\nkzQtFZGcpOQmIjlJyU1EcpKSm4jkJCU3EclJevxKMLMJgAMbw7sGA68Dt7j7kSjH3wxc4u5L0xWj\nyKlScpOI/e6+KLJhZt8BvgTclrGIRPpAyU1iWQP8i5m9k1A5oxbgEHBT14PM7Brg34EmQn9PH3D3\nnWb2SWApcCL8tRQoAH4FBAgtfL3b3e9Jz8eRgUbX3KQHM8sHrgXWAvcB/xyuW7YaeFe3w4cBS9x9\nMbCctx6gv5PQo1kLCSXHtwFLgFfDI8SFQHGKP4oMYBq5ScQoM1sVfp1HKLH9L3Cbu78I4O4/hM5r\nbhFvAr8wszxgNG9dt/sZ8Bczewh40N23mlkrcIuZ/Rz4E3B3Sj+RDGgauUnEfndfFP5a4O5fBNqJ\n8zdiZoOBZcBHwyO0uyLvufungasJTWUfMbMr3f1VYBqh0eAlwKqUfRoZ8JTcJCZ3PwgcMLPZAGZ2\nm5nd0uWQUqAD2GlmhcB7gAIzqzCz24Fd7v5fwH8C55vZDcBsd/8rcAswzsw0e5CU0B+WJPIB4Efh\nKeWR8Pa1AO5+yMzuB/5GaOnId4B7CY3KSoG/mdlhQlVTPgxUAj81s2ZCNxW+Fa5oLNLvVBVERHKS\npqUikpOU3EQkJym5iUhOUnITkZyk5CYiOUnJTURykpKbiOQkJTcRyUn/D1gGgz3J9TyTAAAAAElF\nTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7ff74c5355f8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# grid = sns.FacetGrid(train_df, col='Embarked')\n",
"grid = sns.FacetGrid(train_df, row='Embarked', size=2.2, aspect=1.6)\n",
"grid.map(sns.pointplot, 'Pclass', 'Survived', 'Sex', palette='deep')\n",
"grid.add_legend()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "6b3f73f4-4600-c1ce-34e0-bd7d9eeb074a"
},
"source": [
"### Correlating categorical and numerical features\n",
"\n",
"We may also want to correlate categorical features (with non-numeric values) and numeric features. We can consider correlating Embarked (Categorical non-numeric), Sex (Categorical non-numeric), Fare (Numeric continuous), with Survived (Categorical numeric).\n",
"\n",
"**Observations.**\n",
"\n",
"- Higher fare paying passengers had better survival. Confirms our assumption for creating (#4) fare ranges.\n",
"- Port of embarkation correlates with survival rates. Confirms correlating (#1) and completing (#2).\n",
"\n",
"**Decisions.**\n",
"\n",
"- Consider banding Fare feature."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"_cell_guid": "a21f66ac-c30d-f429-cc64-1da5460d16a9"
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7ff74c3df780>"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgAAAAHUCAYAAABMP5BeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm4XHWd5/H3NYGYhGhf4CJLK0ujXwZtnXFaZIuETUBB\nmg4KPYhBEFAU7aFpenpEDWERZVhcogMNsonY6DgSRFYB2RFsARX4SuiwaCJcJEpCYujgnT/OudNF\nceveusupm9zzfj1PnlSd9VtV+Z186nd+dU5XX18fkiSpXl413gVIkqTOMwBIklRDBgBJkmrIACBJ\nUg0ZACRJqiEDgCRJNTR5vAuYCCJiCyCBu5pmXZ2ZZ7S5jVuAUzLzxhHWMOL1I+IUYHVmzh3Buq8C\nPg/MBF4EXgNcmJlfGWDZxzNziwGmHwp8FPh3YAbwE+DvMnPVcOtp2u7/AH6emVePcju/BnbOzMdH\nsO6+wGco3pslwJzMXDmaejQytlPb6SDrrgt8EfgUsE5mrh5NLWsLA8DY6c3MWeNdxDj4WyCAnTKz\nLyL+DLghIn6QmYuGWjki/hw4DfhPmbk8IrqAS4G/Bv5lNIVl5umjWX+0IuLVwHnADpn5RER8Gfjv\nFK9X48N2ajsdyFnAv453EZ1mAOiAiFgOnALsB6xL0ZCOpGiQH8vM68tF94uIE4DNgJMz89sRsQ1w\nLrCaIrWfmJnXRcRcYEtgc+Dvm/Z3IbAoM+dFxLHAByg+60eAYzJzZUScCuwLPAW8ADzctI1tga8N\n8HIOzszfNjxfH5gGTKL4dvJ74B3DeHu6y/dkKrA8M/uADzbU0UeZyCPiMGCPzPxgRDxOceDZqlz0\n/2bmt8p1zgd+CrwTuB3Yq8X8bwP/G+gBXgucmZnfiojXAVeUr+mnQFdz0RFxHPC+psn3Z+bfNTzf\nHvhVZj5RPr8COB0DwBrJdjqoidxOAf5nZj4fERcM4z1Z6xkAOmM6cF9mnl52Ae6Xme8pG8oxQP+B\nZXJmvjsitgbuiIgrgI2Bz2TmrRGxA/AV4Lpy+S2BXcpED0BEnETRQOdFxHbAAcC7ymXOBj4SEdcB\nh1Ac2F6i6Mp72YElMx8CZrXx2i4BDgR+HRE3AjcC383M5e28MZn58/J1/ltE/Bi4CfhOZj7VxuqP\nZuY/RsRfAx8CvhUR6wDvBU6gOLAAXNZi/inAtZl5YURMBx6IiBsougHvLrf9duCTA9R9FsW3hsFs\nCjQehH9bTtOayXbawgRvp2Tm8228jgnHQYBjpycibmn6s13D/NvLv38N3Nnw+LUNy9wAkJkL+7dJ\ncd74+Ii4DTgH2LBh+bvLJN7vMIpG86ny+Sxga+Dm8oC2M/B64C+Bn2bmqvJc160je8mQmX/IzF2A\nPYD7KA5Yj0bE5sPYxrEUB7n/A7wd+GVE7NfGqv3v4w+Bd5YHhz2BezLzuYblWs3fFfhY+d5cTXFu\nc0uK9+f2srZ/Bf7Q7msZQhfgtbfHl+3UdqqSPQBjZ6hzi6tbPG7stvpT0/Q+4KvA5Zn5jYh4C/CD\nhmVebNrHFIpuut0oEv4qYEFmfqJxoYg4sGlfk5qLbbdrsUzqfZn5C+AXwDkRcRkwmzaSd3ku8dWZ\nuRi4ELgwIo6k6Hq9qmnxdZuevwiQmS9GxA8pDqrvpTg3+f8NMn8VRVfrfQPUNNT7007X4lO8/Bv/\nphT/mWj82E5tpwOdAqglA8CaZXdgQUS8ieLg0wu8DvhlOf8gioNHK+cCTwNXRsQ7gTuAT0bEeuXA\nnWOAn1F0I749ipGvfcAuwPcaNzTMrsVHgJPg/x9oXk/7A4OOAg6IiPdlZv+Bciug/9vV8+X2FlF8\nE3ipxXYuoxih/I5ym+3Mv53ivOt9ETEVOJOiG/EhYAfg6vJ9XK95Y212Ld4DbBkRf5GZj1GcM10w\nxDpa89lOCxOlndaWAWDs9JRdVI0WZeaHh7GN
1RFxJUV34CfL84FnApeUg2nOAv6mnLZsoA2U5+rO\nAi6iGDw0H7glIv4ILAYuyswVEfF9iv+gngDuH0aNzT4OfDki7qEYpDQN+F5mtvsf3T9TDKa6oxyE\ntQ7Fge+4cv7pwPUR8SjwAMVBZiC3Urzm63PgnyUNNH8ucH5E3E5xwD6vHMT0JeCKiLiJ4qD+b22+\nlpcpv9EcQXFOczXwGMU3RY0f26nt9BUi4nsUAyUBfhQRv87MQ0a6vbVFl7cDVqdEi98XS1pz2E7r\nw0GAkiTVkD0AkiTVkD0AkiTVkAFAkqQaMgBIklRDa/TPAHt7lzlAQRonPT0zXnFt9VZsq9L4GE47\nbWYPgCRJNWQAkCSphgwAkiTVkAFAkqQaMgBIklRDlf0KICLWo7gDVTfFDRxOorh706UUt21cAhza\n4oYQkiSpQlX2ABwGZGbuChwIfAmYB8zPzJkUt5E8vML9S5KkFqoMAM8CG5SPu8vns/iP+6FfBexR\n4f4lSVILlQWAzPw28IaIWEhxj+fjgekNXf7PAJtUtX9JktRalWMAPgg8mZl7R8TbgAuaFhny6kXd\n3dOYPHlSJfVJGju2VWntU+WlgHcCrgPIzAciYlPghYiYmpkrgc2AxYNtYOnSFRWWJ2kwPT0z2l7W\ntiqNj+G002ZVjgFYCLwTICI2B5YDNwCzy/mzgWsr3L8kSWqhyh6Ac4FvRMSPy/18FHgYuCQijgae\nAC6ucP+SJKmFrr6+NfcmXt5hTBo/3g1QWvN5N0BJkjQsBgBJkmrIACBJUg0ZACRJqiEDgCRJNWQA\nkCSphgwAkiTVkAFAkqQaMgBIklRDBgBJkmrIACBJUg0ZACRJqiEDgCRJNWQAkCSphgwAkiTVkAFA\nkqQaMgBIklRDBgBJkmpocpUbj4hDgBOA1cBngQeBS4FJwBLg0MxcVWUNkiTplSrrAYiIDYDPATsD\n+wL7A/OA+Zk5E1gIHF7V/iVJUmtVngLYA7gxM5dl5pLMPAqYBSwo519VLiNJkjqsylMAWwDTImIB\n0A3MBaY3dPk/A2xS4f4lSVILVQaALmAD4ABgc+Dmclrj/EF1d09j8uRJ1VQnaczYVqW1T5UB4Gng\nzsxcDTwWEcuA1RExNTNXApsBiwfbwNKlKyosT9JgenpmtL2sbVUaH8Npp82qHANwPbBbRLyqHBC4\nHnAjMLucPxu4tsL9S5KkFioLAJn5G+C7wN3ANcCxFL8KmBMRtwHrAxdXtX9JktRaV19f33jX0FJv\n77I1tzhpguvpmTHkOJ1+tlVpfAynnTbzSoCSJNWQAUCSpBoyAEiSVEMGAEmSasgAIElSDRkAJEmq\nobYDQHlBn42rLEaSJHVGWwEgInYHHgNuKZ+fHRH7VliXJEmqULs9AKcC2wNLGp6fWElFkiSpcu0G\ngOWZ+XT/k8x8FnixmpIkSVLV2r0b4MqI2AXoiohu4GDgj9WVJUmSqtRuADgG+DrwDmAhcDtwVFVF\nSZKkarUbADbMTAf9SZI0QbQ7BuDMSquQJEkd1W4PwJMRcQtwNw2D/zLzs1UUJUmSqtVuAFhU/mnk\n/b8lDds1dzwy3iXUwj47bTPeJWgN11YAyMyTmqdFxBljX44kSeqEtgJAROwJnAZsUE6aAjwH/MMQ\n600FfgGcDPwIuBSYRHFBoUMzc9XIypYkSaPR7iDAU4BjgWeA/YALgOPaWO9EiqAAMA+Yn5kzKX5K\nePjwSpUkSWOl3QDwfGbeDbyYmb8sB/8NGgAiYhtgW+DqctIsYEH5+Cpgj+GXK0mSxkK7gwDXiYid\ngaURMQd4CNhyiHXOBD4BzCmfT2/o8n8G2GS4xUqSpLExaACIiLdm5oPA0cDGFDcBOgXYiGJMQKv1\nPgTclZmLImKgRbraKa67exqTJ09qZ1FJ42g4bXXatCkVVyOAnp4Z412C1nBD9QCcA+yWmQlkRNyU\nmbu1sd33
AluVtwz+c2AVsDwipmbmSmAzYPFQG1m6dEUbu5JUheH8BzKctrpihWN/O6G3d9l4l6AO\nGE3QGyoANH9Tb+ube2Ye1P84IuYCjwM7ArOBb5Z/X9tukZIkaWwNNQiw+WI/o7n4z+eAORFxG7A+\ncPEotiVJkkah3UGAI5aZcxue7ln1/iRJ0tCGCgA7RsSTDc83Kp93AX2Z+YbqSpMkSVUZKgAMOIRf\nkiSt3QYNAJn5RKcKkSRJndPulQAlSdIEYgCQJKmGDACSJNWQAUCSpBoyAEiSVEOVXwhIkjRxXHPH\nI+NdwoS3z07bdGQ/9gBIklRDBgBJkmrIACBJUg0ZACRJqiEDgCRJNeSvALRGcGRxZ3RqdLGkNZ89\nAJIk1ZABQJKkGqr0FEBEfBGYWe7n88C9wKXAJGAJcGhmrqqyBkmS9EqV9QBExK7AWzJzB2Bv4Bxg\nHjA/M2cCC4HDq9q/JElqrcpTALcC7y8f/x6YDswCFpTTrgL2qHD/kiSphcpOAWTmS8AL5dMjgB8C\nezV0+T8DbFLV/iVJUmuV/wwwIvanCADvBh5tmNU11Lrd3dOYPHlSVaVpDTJt2pTxLqEWenpmVLLd\n4bRVP+vOqOqz9vOrXlWfXbOqBwHuBXwa2Dsz/xARyyNiamauBDYDFg+2/tKlK4a1P39LXr2qfke+\nYoVjQTuht3dZ28sO5yA0nLbqZ90Zw/msh8PPr3pVtdNmVQ4CfC1wBrBvZj5XTr4RmF0+ng1cW9X+\nJUlSa1X2ABwEbAhcERH90+YA50fE0cATwMUV7l+SJLVQ5SDA84DzBpi1Z1X7lCRJ7fFKgJIk1ZAB\nQJKkGjIASJJUQwYASZJqyAAgSVINGQAkSaohA4AkSTVkAJAkqYYMAJIk1ZABQJKkGjIASJJUQwYA\nSZJqyAAgSVINGQAkSaohA4AkSTVkAJAkqYYMAJIk1ZABQJKkGprc6R1GxNnA9kAf8KnMvLfTNUiS\nVHcd7QGIiF2AN2bmDsARwJc7uX9JklTo9CmA3YHvA2Tmw0B3RLymwzVIklR7nQ4AGwO9Dc97y2mS\nJKmDuvr6+jq2s4g4D7g6M68sn98OHJ6Zv+pYEZIkqeM9AIt5+Tf+TYElHa5BkqTa63QAuB44ECAi\n3g4szsxlHa5BkqTa6+gpAICIOB14F/An4OOZ+UBHC5AkSZ0PAJIkafx5JUBJkmrIACBJUg0ZACRJ\nqiEDgCRJNWQAkCSphgwAkiTVkAFAkqQaMgBIklRDBgBJkmrIACBJUg0ZACRJqiEDgCRJNTR5vAuY\nCCJiCyCBu5pmXZ2ZZ7S5jVuAUzLzxhHWMOL1I+IUYHVmzh3hvvcGPkPx72kS8DDw3zPz2abl5gKP\nZ+ZFTdN7gK8BrwP6gFcD/5SZN42knobtbgx8JTPfP8rtfATYOTMPG8G6U4GLgE2BKcDJmXnVaOrR\nyNhObadDrL8d8C/AZZl54mhqWVsYAMZOb2bOGu8iOi0i3gp8HdgnMx+JiC7gfwBXALu1uZnTgDsz\n8+xym/8V+GpE7JiZI75dZWb+FhjVQWUMfBL4XWYeFBGvB+6KiB9l5opxrquubKe201eIiL8ATgGu\nG886Os0A0AERsZziH9d+wLoUDelIIICPZeb15aL7RcQJwGYU3xS/HRHbAOcCq4HXACdm5nVlSt8S\n2Bz4+6b9XQgsysx5EXEs8AGKz/oR4JjMXBkRpwL7Ak8BL1B8G2jcxrYUab/ZwWWD7XcC8IXMfAQg\nM/si4gst1m1l/fK1UW7jp8AOZR1zgcn9iTwiHgf2AHYu6+8Gvgd8MjOjXOb1wN3Au4AfA3sB3xtg\n/huA2cCxQBfQC3wkM38XEccAx5Tvz+LmgiNi/XK/zf4uM+9veL4PMLd8XU9FxCPAjsCIvkGqOrbT\nIU3kdrqEoq3295DUQm1e6DibDtyXmaeXXYD7ZeZ7IuIwin+8/QeWyZn57o
jYGrgjIq4ANgY+k5m3\nRsQOwFf4j5S6JbBL2ZgBiIiTgOXlQWU74ADgXeUyZwMfiYjrgEMoDmwvAT+h6cCSmQ8Bs9p4bW8G\nzm5a90/AH9p8bwBOBq6MiIOBHwE/BK4ttzOY/wy8OTNXRcSREfHWzHyQ4kB6OcVrIzN/GRErB5i/\nKfBpYLtyG58C/mdEzCtrelN5kLkSWNr0Gp+jvfdnU6DxQPzbcprWPLbTwU3YdtrfI9f/+dSFAWDs\n9JQHjUYnZOZPyse3l3//Griz4fFrG5a/ASAzF5b/EHsokukZ5TeBdYENG5a/u6nr7TBgG2C78vks\nYGvg5nJ704F/B/4S+GlmrgKIiFuH91Jf5iWK84kjlpn3R8RWFN8WdgXOoGjguwyx6r/2vwbgMuBA\n4EHgIOCopmUHmr8DsAlwXfn+TAEWUbxnj2fm78p1b6Y4iI2FLorzpxofttMRqlk7rQUDwNgZ6tzi\n6haPuxoe/6lpeh/wVeDyzPxGRLwF+EHDMi827WMKxcFnN4ou5lXAgsz8RONCEXFg075ecWAYRtfi\nz4GdKL6dNK6/fWbePcD6rxAR08oE/mPgx+VB9FHgbbzyP8t1Gx43vv7LgWvLbtVXlwerLYaYvznw\nk8zct6mev2Lo96fdrsWnKL7BPFI+35TiPxSND9vpy9e3ndaYAWDNsjuwICLeRHHw6aUYcfvLcv5B\nFAePVs4FnqbopnsncAfwyYhYLzOXl+fLfkbRjfj2iFiXouHuQlMjGUbX4heBGyLipsx8ACAijgf2\npjgHOKiImAQ8EhEfysxbyskbUhxAfg08T5nqI+LNwEYDbSczfx0RzwL/AHyzzfn3Av8cERtn5m8j\n4v0UB6tbga0i4s8oukh3B55p2l67pwB+APwtcFM50GhrXjkKXWsX22lhIrXTWjIAjJ2BuhYXZeaH\nh7GN1eV5rK0pBsv0RcSZwCXloJqzgL8ppy0baAOZ+fOIOIvip2f7AvOBWyLijxSDZC7KzBUR8X3g\nHuAJYMRJODMfjoi/AeZHxBSKrsufAX/d5vovRcT+FN2nJ1M07CnAkZn5TER8B/hwRNwG3Md/HGQH\nchnF692qnfmZubg8n/iDiFgBrADmZObS8tvNbRRdjY8D09p5PQP4GnB+RNxB8Q3l8Mz84wi3pdGz\nndpOXyEi3gccB2wBdEXEzsCpmXnDSLa3tujq6/N0pDojWvy+WNKaw3ZaH14JUJKkGrIHQJKkGrIH\nQJKkGjIASJJUQwYASZJqaI3+GWBv7zIHKEjjpKdnRtfQSxVsq9L4GE47bWYPgCRJNWQAkCSphgwA\nkiTVkAFAkqQaWqMHAUqaeK6545GhF9Ko7bPTNuNdgtZwlQWAiFgPuAToprhpxEnAQ8ClFDdFWQIc\n2nCfaEmS1CFVngI4DMjM3BU4EPgSMA+Yn5kzgYXA4RXuX5IktVBlAHgW2KB83F0+nwUsKKddRRv3\noZYkSWOvsgCQmd8G3hARC4FbgeOB6Q1d/s8Am1S1f0mS1FqVYwA+CDyZmXtHxNuAC5oWGfLqRd3d\n05g8eVIl9UkaO8Npq9OmTam4GgH09MwY7xK0hqvyVwA7AdcBZOYDEbEp8EJETM3MlcBmwOLBNrB0\n6YoKy5M0mOH8BzKctrpiheN+O6G3d9l4l6AOGE3Qq3IMwELgnQARsTmwHLgBmF3Onw1cW+H+JUlS\nC1X2AJwLfCMiflzu56PAw8AlEXE08ARwcYX7lyRJLVQWADJzOfCBAWbtWdU+JUlSe7wUsCRJNWQA\nkCSphgwAkiTVkAFAkqQaMgBIklRDBgBJkmqoyusASJImmGvueGS8S5jw9tlpm47sxx4ASZJqyAAg\nSVINGQAkSaohA4AkSTVkAJAkqYYMAJIk1ZABQJKkGjIASJJUQwYASZJqyAAgSVINVXop4Ig4BDgB\nWA18FngQuBSYBCwBDs3MVVXWIEmSXq
myHoCI2AD4HLAzsC+wPzAPmJ+ZM4GFwOFV7V+SJLVW5SmA\nPYAbM3NZZi7JzKOAWcCCcv5V5TKSJKnDqjwFsAUwLSIWAN3AXGB6Q5f/M8AmFe5fkiS1UGUA6AI2\nAA4ANgduLqc1zh9Ud/c0Jk+eVE11ksbMcNrqtGlTKq5GAD09MyrZrp9f9ar67JpVGQCeBu7MzNXA\nYxGxDFgdEVMzcyWwGbB4sA0sXbqiwvIkDWY4B6HhtNUVKxz32wm9vcsq2a6fX/WG89mNJixUOQbg\nemC3iHhVOSBwPeBGYHY5fzZwbYX7lyRJLVQWADLzN8B3gbuBa4BjKX4VMCcibgPWBy6uav+SJKm1\nSq8DkJnnAuc2Td6zyn1KkqSheSVASZJqyAAgSVINGQAkSaohA4AkSTVkAJAkqYYMAJIk1ZABQJKk\nGmo7AJRX9Nu4ymIkSVJntBUAImJ34DHglvL52RGxb4V1SZKkCrXbA3AqsD2wpOH5iZVUJEmSKtdu\nAFiemU/3P8nMZ4EXqylJkiRVrd17AayMiF2ArojoBg4G/lhdWZIkqUrtBoBjgK8D7wAWArcDR1VV\nlCRJqla7AWDDzHTQnyRJE0S7YwDOrLQKSZLUUe32ADwZEbcAd9Mw+C8zP1tFUZIkqVrtBoBF5Z9G\nfWNciyRJ6pC2AkBmntQ8LSLOGPtyJElSJ7QVACJiT+A0YINy0hTgOeAfhlhvKvAL4GTgR8ClwCSK\nCwodmpmrRla2JEkajXYHAZ4CHAs8A+wHXAAc18Z6J1IEBYB5wPzMnEnxU8LDh1eqJEkaK+0GgOcz\n827gxcz8ZTn4b9AAEBHbANsCV5eTZgELysdXAXsMv1xJkjQW2h0EuE5E7AwsjYg5wEPAlkOscybw\nCWBO+Xx6Q5f/M8Amwy1WkiSNjUEDQES8NTMfBI4GNqa4CdApwEYUYwJarfch4K7MXBQRAy3S1U5x\n3d3TmDx5UjuLShpHw2mr06ZNqbgaAfT0zKhku35+1avqs2s2VA/AOcBumZlARsRNmblbG9t9L7BV\necvgPwdWAcsjYmpmrgQ2AxYPtZGlS1e0sStJVRjOQWg4bXXFCsf+dkJv77JKtuvnV73hfHajCQtD\nBYDmb+ptfXPPzIP6H0fEXOBxYEdgNvDN8u9r2y2yXdfc8chYb1JN9tlpm/EuQZI0BoYaBNh8sZ/R\nXPznc8CciLgNWB+4eBTbkiRJo9DuIMARy8y5DU/3rHp/kiRpaEMFgB0j4smG5xuVz7uAvsx8Q3Wl\nSZKkqgwVAAYcwi9JktZugwaAzHyiU4VIkqTOafdKgJIkaQIxAEiSVEMGAEmSasgAIElSDRkAJEmq\nIQOAJEk1ZACQJKmGDACSJNWQAUCSpBoyAEiSVEMGAEmSasgAIElSDRkAJEmqIQOAJEk1NOjtgEcr\nIr4IzCz383ngXuBSYBKwBDg0M1dVWYMkSXqlynoAImJX4C2ZuQOwN3AOMA+Yn5kzgYXA4VXtX5Ik\ntVblKYBbgfeXj38PTAdmAQvKaVcBe1S4f0mS1EJlpwAy8yXghfLpEcAPgb0auvyfATapav+SJKm1\nSscAAETE/hQB4N3Aow2zuoZat7t7GpMnT2p7X9OmTRl2fRqenp4Z412C1kDDaau2086oqq36+VWv\nU8fZqgcB7gV8Gtg7M/8QEcsjYmpmrgQ2AxYPtv7SpSuGtb8VKxxPWLXe3mXjXYI6ZDgHoeG0Vdtp\nZ1TVVv38qjecz240YaHKQYCvBc4A9s3M58rJNwKzy8ezgWur2r8kSWqtyh6Ag4ANgSsion/aHOD8\niDgaeAK4uML9S5KkFqocBHgecN4As/asap+SJKk9lQ8ClNpxzR2PjHcJtbDPTtuMdwmS1hBeCliS\npBoyAEiSVEMGAEmSasgAIElSDRkAJEmqIQOAJEk1ZACQJKmGDACSJNWQAUCSpBoyAEiSVEMGAEmS\nas
gAIElSDRkAJEmqIQOAJEk1ZACQJKmGDACSJNXQ5E7vMCLOBrYH+oBPZea9na5BkqS662gPQETs\nArwxM3cAjgC+3Mn9S5KkQqdPAewOfB8gMx8GuiPiNR2uQZKk2ut0ANgY6G143ltOkyRJHdTV19fX\nsZ1FxHnA1Zl5Zfn8duDwzPxVx4qQJEkd7wFYzMu/8W8KLOlwDZIk1V6nA8D1wIEAEfF2YHFmLutw\nDZIk1V5HTwEARMTpwLuAPwEfz8wHOlqAJEnqfACQJEnjzysBSpJUQwYASZJqyAAgSVINGQAkSaoh\nA4AkSTVkAJAkqYYMAJIk1ZABQJKkGjIASJJUQwYASZJqyAAgSVINGQAkSaqhyeNdwNouIrYAErir\nadbVmXlGm9u4BTglM28cYQ0jXj8iTgFWZ+bcEe57O+AMYDrFv6fHgOMzc9EAyz6emVsMMH0f4J+A\nl8rtLAKOzszfj6Smhu0eBkzKzAtGuZ3bgRMz85YRrPtO4GxgNfAC8KHM7B1NPRoZ26ptdYh1u4B/\nBE4G/lNmLhxNLWsDA8DY6M3MWeNdRKdFxLbAvwD7ZuYvy2kfAK6NiL/MzBfb2Ma6wDeBt2TmknLa\nF4AjgDNHU19mXjSa9cfIRcCczPxJRBwHnAocNb4l1Zpt1bbayj8BXcDi8S6kUwwAFYuI5cApwH7A\nusBpwJFAAB/LzOvLRfeLiBOAzYCTM/PbEbENcC7Ft8fXUCTb6yJiLrAlsDnw9037uxBYlJnzIuJY\n4AMUn/MjwDGZuTIiTgX2BZ6i+Fb6cNM2tgW+NsDLOTgzf9vw/NPAF/sPKACZeUVEHAx8EPhGG2/R\nVIpvEtMbtvGPDbU8DuyRmQsjYhbFt6edy29S9wP/BfgJsDQzTyvXORGYAawsX/uUFvM/A8wHti6f\nX56ZZ0bENODbQA/wKPDq5qIj4r/xyv/If5uZBzcsswUwNTN/Uk66Ari7jfdE48C2OqQJ21ZLX83M\n5yPiyDbeiwnBAFC96cB9mXl62RD2y8z3lF1exwD9B5XJmfnuiNgauCMirgA2Bj6TmbdGxA7AV4Dr\nyuW3BHbJzL6IACAiTgKWlweU7YADgHeVy5wNfCQirgMOoTiovUTRIF92UMnMh4BZbby2/wKcNcD0\nu4C/oo2DSmb+ISI+B9wfEXcDNwPfzcxsY//LM3OXiPjP5b5OK6cfBPwtcGD5/LIW8z8FLM7MIyNi\nEnB3RNwAbA+szMwdImITim7O5rq/BXxriPo2BRoPwr8FNmnjdWl82FYHMcHbKpn5fBuvY0JxEODY\n6ImIW5qoZ9ghAAAPN0lEQVT+bNcw//by718DdzY8fm3DMjcANJx36gGWAMdHxG3AOcCGDcvfnZl9\nDc8PA95L0VCgOChsDdxcHsx2Bl4P/CXw08xclZmrgVtH9pIB+COt/w39sd2NZOYXKL4hXVD+fU9E\nfKyNVe8s178fmBIRW5XfiFZn5i8att9q/q7AAeX78yOKbw9bU7xHt5frLqH4RjYWuoC+IZdSlWyr\nA89rS43aai3YAzA2hjqvuLrF466Gx39qmt4HfJWiq+sbEfEW4AcNyzSfs5tC0W25G3AjsApYkJmf\naFwoIg5s2tek5mKH0a34C2AH4N6m5d4B/HCA9QcUEdMy83fA5cDlEfEdinOKX+fl/2Gu27Rq43vw\nLYpvEdMpzlM2G2j+KmBeZn63qZ7dGPo9aqdb8SmKXoB+mwK/GaA2dY5t9eVsqzVmAFhz7A4siIg3\nURx4eoHXAf3n7A6iOHC0ci7wNHBlFCPP7wA+GRHrZebyiDgG+BlFF+LbywE9fcAuwPcaNzSMbsWz\ngKsj4pbMfBAgIvan6G48tI31iYi9gC9GxM6ZuaycvBXQ/+3qeYpvQwspDpitfAu4hOKg8d42599O\ncd71uxHxKuB/UZwDfgjYEZgfEa+n6IJ9mXa6FTPzqYhYGhE7ZeYd
FOdaFwy2jtYKttUJ1lbrygAw\nNnrKrqlGizLzw8PYxuqIuJKiW+uT5bnAM4FLysE1ZwF/U05bNtAGMvPnEXEWxcjzfSkGzdwSEX+k\nGNl6UWauiIjvA/cAT1AMzhmRzLw/Ij4EXBoRfRTp+yFgVmauanMb15UH0h9FxAqKb1RPAx8vFzkT\nuCAifkVxoGy1nUVlDb39I5TbmD8feHNE3FXW/oPMfC4iLgXeV3bnLqI49zpShwFfLff9HDBnFNvS\n6NlWbasDioivAdtSjOe4LCKWZ+buI93e2qCrr89Tkhq9sivuy8B/bXVAiRa/LZbUObZV9XMQoMZE\nZt4EXA38tBzhLGkNZFtVP3sAJEmqIXsAJEmqIQOAJEk1ZACQJKmG1uifAfb2LnOAgjROenpmdA29\nVMG2Ko2P4bTTZvYASJJUQwYASZJqyAAgSVINGQAkSaohA4AkSTVU2a8AImI9ijs6dVPcGeskiptP\nXEpxM4clwKHt3ohCkiSNnSp7AA4DMjN3pbi385eAecD8zJxJccvIwyvcvyRJaqHKAPAssEH5uLt8\nPov/uB/6VcAeFe5fkiS1UFkAyMxvA2+IiIXArcDxwPSGLv9ngE2q2r8kSWqtyjEAHwSezMy9I+Jt\nwAVNiwx59aLu7mlMnjypkvokjR3bqrT2qfJSwDsB1wFk5gMRsSnwQkRMzcyVwGbA4sE2sHTpigrL\nkzSYnp4ZbS9rW5XGx3DaabMqxwAsBN4JEBGbA8uBG4DZ5fzZwLUV7l+SJLVQZQ/AucA3IuLH5X4+\nCjwMXBIRRwNPABdXuH9JktRCV1/fmnsTL+8wJo0f7wYorfm8G6AkSRoWA4AkSTVkAJAkqYYMAJIk\n1ZABQJKkGjIASJJUQwYASZJqyAAgSVINGQAkSaohA4AkSTVkAJAkqYYMAJIk1ZABQJKkGjIASJJU\nQwYASZJqyAAgSVINGQAkSaohA4AkSTU0ucqNR8QhwAnAauCzwIPApcAkYAlwaGauqrIGSZL0SpX1\nAETEBsDngJ2BfYH9gXnA/MycCSwEDq9q/5IkqbUqTwHsAdyYmcsyc0lmHgXMAhaU868ql5EkSR1W\n5SmALYBpEbEA6AbmAtMbuvyfATapcP+SJKmFKgNAF7ABcACwOXBzOa1x/qC6u6cxefKkaqqTNGZs\nq9Lap8oA8DRwZ2auBh6LiGXA6oiYmpkrgc2AxYNtYOnSFRWWJ2kwPT0z2l7WtiqNj+G002ZVjgG4\nHtgtIl5VDghcD7gRmF3Onw1cW+H+JUlSC5UFgMz8DfBd4G7gGuBYil8FzImI24D1gYur2r8kSWqt\nq6+vb7xraKm3d9maW5w0wfX0zBhynE4/26o0PobTTpt5JUBJkmrIACBJUg0ZACRJqiEDgCRJNWQA\nkCSphgwAkiTVUNsBoLygz8ZVFiNJkjqjrQAQEbsDjwG3lM/Pjoh9K6xLkiRVqN0egFOB7YElDc9P\nrKQiSZJUuXYDwPLMfLr/SWY+C7xYTUmSJKlq7d4NcGVE7AJ0RUQ3cDDwx+rKkiRJVWo3ABwDfB14\nB7AQuB04qqqiJElStdoNABtmpoP+JEmaINodA3BmpVVIkqSOarcH4MmIuAW4m4bBf5n52SqKkiRJ\n1Wo3ACwq/zTy/t+SJK2l2goAmXlS87SIOGPsy5EkSZ3QVgCIiD2B04ANyklTgOeAfxhivanAL4CT\ngR8BlwKTKC4odGhmrhpZ2ZIkaTTaHQR4CnAs8AywH3ABcFwb651IERQA5gHzM3MmxU8JDx9eqZIk\naay0GwCez8y7gRcz85fl4L9BA0BEbANsC1xdTpoFLCgfXwXsMfxyJUnSWGh3EOA6EbEzsDQi5gAP\nAVsOsc6ZwCeAOeXz6Q1d/s8Amwy3WEmSNDYGDQAR8dbMfBA4GtiY4iZApwAbUYwJaLXeh4C7MnNR\nRAy0SFc7xXV3T2Py5EntLCpp
HNlWpbXPUD0A5wC7ZWYCGRE3ZeZubWz3vcBW5S2D/xxYBSyPiKmZ\nuRLYDFg81EaWLl3Rxq4kVaGnZ0bby9pWpfExnHbabKgA0PxNva1v7pl5UP/jiJgLPA7sCMwGvln+\nfW27RUqSpLE11CDA5ov9jObiP58D5kTEbcD6wMWj2JYkSRqFdgcBjlhmzm14umfV+5MkSUMbKgDs\nGBFPNjzfqHzeBfRl5huqK02SJFVlqAAw4BB+SZK0dhs0AGTmE50qRJIkdU67VwKUJEkTiAFAkqQa\nMgBIklRDBgBJkmrIACBJUg0ZACRJqiEDgCRJNWQAkCSphgwAkiTVkAFAkqQaMgBIklRDBgBJkmrI\nACBJUg0ZACRJqqFBbwc8WhHxRWBmuZ/PA/cClwKTgCXAoZm5qsoaJEnSK1XWAxARuwJvycwdgL2B\nc4B5wPzMnAksBA6vav+SJKm1Kk8B3Aq8v3z8e2A6MAtYUE67Ctijwv1LkqQWKjsFkJkvAS+UT48A\nfgjs1dDl/wywSVX7lyRJrVU6BgAgIvanCADvBh5tmNU11Lrd3dOYPHlSVaVJGiO2VWntU/UgwL2A\nTwN7Z+YfImJ5REzNzJXAZsDiwdZfunRFleVJGkRPz4y2l7WtSuNjOO20WZWDAF8LnAHsm5nPlZNv\nBGaXj2cD11a1f0mS1FqVPQAHARsCV0RE/7Q5wPkRcTTwBHBxhfuXJEktdPX19Y13DS319i5bc4uT\nJrienhlDjtPpZ1uVxsdw2mkzrwQoSVINGQAkSaohA4AkSTVU+XUAOumaOx4Z7xImvH122ma8S5Ak\njYEJFQAkrfkM6p1hWNdQDACSpLYZ4KrXqfDmGABJkmrIHgCtEfxW0Rl2C0vqZw+AJEk1ZACQJKmG\nDACSJNWQAUCSpBoyAEiSVEMGAEmSasgAIElSDRkAJEmqIQOAJEk1ZACQJKmGOn4p4Ig4G9ge6AM+\nlZn3droGSZLqrqM9ABGxC/DGzNwBOAL4cif3L0mSCp0+BbA78H2AzHwY6I6I13S4BkmSaq/TAWBj\noLfheW85TZIkdVBXX19fx3YWEecBV2fmleXz24HDM/NXHStCkiR1vAdgMS//xr8psKTDNUiSVHud\nDgDXAwcCRMTbgcWZuazDNUiSVHsdPQUAEBGnA+8C/gR8PDMf6GgBkiSp8wFAkiSNP68EKElSDRkA\nJEmqIQPAKEXEOhFxT0RcPIbb3CIi7hur7al6EXFRROw73nWoNduqbKcvZwAYvU2AKZk5Z7wLkTQo\n26rUoOM3A5qAzgb+IiIuBGYA3RTv67GZ+WBEPAb8M8XPHxcCPwXeDzyamYdExNuA+cC/U/wy4v2N\nG4+ImcBp5fyngCMz88WOvLKaiojDgF2ADYE3A58G/hbYFjgEOAjYDng18L8z8/yGdScB5wFbAesA\nn83MmzpZv1qyrU4gttPRswdg9P4eSODfgGszc3fgY8CZ5fxJwL8C7wB2Ah7PzO2AmRHxZ8BGFAeg\nXYE7KP7hNvoysH9m7gY8TdNBR5V5I/A+4PPAPwEHlI8/TPEZ7gzMBOY1rfffgCXl5/nXwDkdq1hD\nsa1OPLbTUbAHYOzsCPRExAfL59Ma5v0kM/si4mngZ+W0Z4DXUhwovhAR0yiujHhZ/0oR8TqKf+Df\niwiA6cCzlb4K9buv/MyWAA9m5kvl5zcFWD8i7gReBHqa1tuR4j+MncvnUyNiXb8JrlFsqxOH7XQU\nDABj50WKbwd3DTBvdYvHXcCXgC9k5rURcTywXtM2f5OZs8a6WA2p1We2BfAXwC6Z+e8RsbxpvReB\nUzPz8orr08jZVicO2+koeApg7NxD0ZVERGwbEce1ud6GwGMRMQV4D7Bu/4zMXNq/vfLvYyPirWNa\ntYbrr4CnyoPK+4BJEbFuw/x7gP0BImKjiDhtPIrUoGyrE5/ttA0GgLHzFWDriLgNOB+4dRjrfR
/4\nTvl4DkV3Y78jgAvL7e5McQ5T4+dG4I0R8WOKbxg/AL7eMP8KYHnZ9XgVcFvnS9QQbKsTn+20DV4K\nWJKkGrIHQJKkGjIASJJUQwYASZJqyAAgSVINGQAkSaohLwSkEYuIfSguv/kSxZXPFgFHZ+bvx7Uw\nSS9jW9VA7AHQiJQX1fgmcFBm7lpeM/1xit9CS1pD2FbVij0AGqmpFN8kpvdPyMx/BCivgHYmxV22\n1gE+QXEDlnuBfTLzsYi4iOI63l/tcN1S3dhWNSB7ADQimfkH4HPA/RFxY0R8Osq7oFDcJOWj5XXR\njwHOL5f/BPDViJgFbEZxa1VJFbKtqhWvBKhRiYgNgHcDuwIfoLjn+qeBOxsW2wyIzPxTRJwH7AXs\nnJlPdbpeqa5sq2rmKQCNWERMy8zfAZcDl0fEd4DzgFWD3BVtY2Alxb3VPahIHWBb1UA8BaARiYi9\ngLsiYkbD5K0o7qH+eES8p1zuTRHx2fLxHOB3wPuBC8q7qkmqkG1VrXgKQCMWEccChwIrKO6X/jTw\nKYpvDl8G+igGFh0HPAHcDOyQmc9FxKnAlMw8fjxql+rEtqqBGAAkSaohTwFIklRDBgBJkmrIACBJ\nUg0ZACRJqiEDgCRJNWQAkCSphgwAkiTVkAFAkqQa+n/UpFoHqheEMQAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7ff74c751cf8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# grid = sns.FacetGrid(train_df, col='Embarked', hue='Survived', palette={0: 'k', 1: 'w'})\n",
"grid = sns.FacetGrid(train_df, row='Embarked', col='Survived', size=2.2, aspect=1.6)\n",
"grid.map(sns.barplot, 'Sex', 'Fare', alpha=.5, ci=None)\n",
"grid.add_legend()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "cfac6291-33cc-506e-e548-6cad9408623d"
},
"source": [
"## Wrangle data\n",
"\n",
"We have collected several assumptions and decisions regarding our datasets and solution requirements. So far we did not have to change a single feature or value to arrive at these. Let us now execute our decisions and assumptions for correcting, creating, and completing goals.\n",
"\n",
"### Correcting by dropping features\n",
"\n",
"This is a good starting goal to execute. By dropping features we are dealing with fewer data points. Speeds up our notebook and eases the analysis.\n",
"\n",
"Based on our assumptions and decisions we want to drop the Cabin (correcting #2) and Ticket (correcting #1) features.\n",
"\n",
"Note that where applicable we perform operations on both training and testing datasets together to stay consistent."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"_cell_guid": "da057efe-88f0-bf49-917b-bb2fec418ed9"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Before (891, 12) (418, 11) (891, 12) (418, 11)\n"
]
},
{
"data": {
"text/plain": [
"('After', (891, 10), (418, 9), (891, 10), (418, 9))"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(\"Before\", train_df.shape, test_df.shape, combine[0].shape, combine[1].shape)\n",
"\n",
"train_df = train_df.drop(['Ticket', 'Cabin'], axis=1)\n",
"test_df = test_df.drop(['Ticket', 'Cabin'], axis=1)\n",
"combine = [train_df, test_df]\n",
"\n",
"\"After\", train_df.shape, test_df.shape, combine[0].shape, combine[1].shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "6b3a1216-64b6-7fe2-50bc-e89cc964a41c"
},
"source": [
"### Creating new feature extracting from existing\n",
"\n",
"We want to analyze if Name feature can be engineered to extract titles and test correlation between titles and survival, before dropping Name and PassengerId features.\n",
"\n",
"In the following code we extract Title feature using regular expressions. The RegEx pattern `(\\w+\\.)` matches the first word which ends with a dot character within Name feature. The `expand=False` flag returns a DataFrame.\n",
"\n",
"**Observations.**\n",
"\n",
"When we plot Title, Age, and Survived, we note the following observations.\n",
"\n",
"- Most titles band Age groups accurately. For example: Master title has Age mean of 5 years.\n",
"- Survival among Title Age bands varies slightly.\n",
"- Certain titles mostly survived (Mme, Lady, Sir) or did not (Don, Rev, Jonkheer).\n",
"\n",
"**Decision.**\n",
"\n",
"- We decide to retain the new Title feature for model training."
]
},
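{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the title extraction described above, the snippet below runs on a toy Series instead of the real dataset. The variable names (`names`, `pattern`, `titles`, `titles_df`) are illustrative assumptions and not part of the notebook. It also shows that, with a single capture group, `expand=False` yields a Series while `expand=True` yields a one-column DataFrame.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy names in the 'Surname, Title. Given names' layout of the Name feature.\n",
"names = pd.Series([\n",
"    'Braund, Mr. Owen Harris',\n",
"    'Heikkinen, Miss. Laina',\n",
"    'Futrelle, Mrs. Jacques Heath (Lily May Peel)',\n",
"])\n",
"\n",
"# Capture the run of letters between a space and the first following dot.\n",
"pattern = r' ([A-Za-z]+)\\.'\n",
"\n",
"titles = names.str.extract(pattern, expand=False)    # one group -> Series\n",
"titles_df = names.str.extract(pattern, expand=True)  # always a DataFrame\n",
"\n",
"print(type(titles).__name__, titles.tolist())\n",
"print(type(titles_df).__name__, titles_df.shape)\n",
"```\n",
"\n",
"Both calls extract the same titles; only the container type differs, which matters when the result is assigned to a single column."
]
},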
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"_cell_guid": "df7f0cd4-992c-4a79-fb19-bf6f0c024d4b"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>Sex</th>\n",
" <th>female</th>\n",
" <th>male</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Title</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Capt</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Col</th>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Countess</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Don</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Dr</th>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Jonkheer</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Lady</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Major</th>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Master</th>\n",
" <td>0</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Miss</th>\n",
" <td>182</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mlle</th>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mme</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mr</th>\n",
" <td>0</td>\n",
" <td>517</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mrs</th>\n",
" <td>125</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Ms</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Rev</th>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Sir</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Sex female male\n",
"Title \n",
"Capt 0 1\n",
"Col 0 2\n",
"Countess 1 0\n",
"Don 0 1\n",
"Dr 1 6\n",
"Jonkheer 0 1\n",
"Lady 1 0\n",
"Major 0 2\n",
"Master 0 40\n",
"Miss 182 0\n",
"Mlle 2 0\n",
"Mme 1 0\n",
"Mr 0 517\n",
"Mrs 125 0\n",
"Ms 1 0\n",
"Rev 0 6\n",
"Sir 0 1"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for dataset in combine:\n",
" dataset['Title'] = dataset.Name.str.extract(' ([A-Za-z]+)\\.', expand=False)\n",
"\n",
"pd.crosstab(train_df['Title'], train_df['Sex'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "908c08a6-3395-19a5-0cd7-13341054012a"
},
"source": [
"We can replace many titles with a more common name or classify them as `Rare`."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"_cell_guid": "553f56d7-002a-ee63-21a4-c0efad10cfe9"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Title</th>\n",
" <th>Survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Master</td>\n",
" <td>0.575000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Miss</td>\n",
" <td>0.702703</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Mr</td>\n",
" <td>0.156673</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Mrs</td>\n",
" <td>0.793651</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Rare</td>\n",
" <td>0.347826</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Title Survived\n",
"0 Master 0.575000\n",
"1 Miss 0.702703\n",
"2 Mr 0.156673\n",
"3 Mrs 0.793651\n",
"4 Rare 0.347826"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for dataset in combine:\n",
" dataset['Title'] = dataset['Title'].replace(['Lady', 'Countess','Capt', 'Col',\\\n",
" \t'Don', 'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer', 'Dona'], 'Rare')\n",
"\n",
" dataset['Title'] = dataset['Title'].replace('Mlle', 'Miss')\n",
" dataset['Title'] = dataset['Title'].replace('Ms', 'Miss')\n",
" dataset['Title'] = dataset['Title'].replace('Mme', 'Mrs')\n",
" \n",
"train_df[['Title', 'Survived']].groupby(['Title'], as_index=False).mean()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "6d46be9a-812a-f334-73b9-56ed912c9eca"
},
"source": [
"We can convert the categorical titles to ordinal."
]
},
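{
"cell_type": "markdown",
"metadata": {},
"source": [
"The conversion can be sketched on a toy Series as follows. This is only an illustration: `'Unknown'` is a hypothetical label that does not appear in the notebook, and the final `astype(int)` is an added assumption to keep the codes integral. The point is that `map` leaves unmapped labels as NaN, which `fillna(0)` then sends to a catch-all code.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"title_mapping = {'Mr': 1, 'Miss': 2, 'Mrs': 3, 'Master': 4, 'Rare': 5}\n",
"\n",
"# 'Unknown' is a hypothetical label that is absent from the mapping.\n",
"titles = pd.Series(['Mr', 'Mrs', 'Rare', 'Unknown'])\n",
"\n",
"# map() returns NaN for unmapped labels; fillna(0) assigns them code 0.\n",
"codes = titles.map(title_mapping).fillna(0).astype(int)\n",
"print(codes.tolist())\n",
"```\n",
"\n",
"Reserving 0 for unmapped titles keeps the column free of missing values even if the test set contains a title that the training set did not."
]
},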
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"_cell_guid": "67444ebc-4d11-bac1-74a6-059133b6e2e8"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>Title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>C</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>S</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>S</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Fare Embarked Title \n",
"0 0 7.2500 S 1 \n",
"1 0 71.2833 C 3 \n",
"2 0 7.9250 S 2 \n",
"3 0 53.1000 S 3 \n",
"4 0 8.0500 S 1 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"title_mapping = {\"Mr\": 1, \"Miss\": 2, \"Mrs\": 3, \"Master\": 4, \"Rare\": 5}\n",
"for dataset in combine:\n",
" dataset['Title'] = dataset['Title'].map(title_mapping)\n",
" dataset['Title'] = dataset['Title'].fillna(0)\n",
"\n",
"train_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "f27bb974-a3d7-07a1-f7e4-876f6da87e62"
},
"source": [
"Now we can safely drop the Name feature from training and testing datasets. We also do not need the PassengerId feature in the training dataset."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"_cell_guid": "9d61dded-5ff0-5018-7580-aecb4ea17506"
},
"outputs": [
{
"data": {
"text/plain": [
"((891, 9), (418, 9))"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df = train_df.drop(['Name', 'PassengerId'], axis=1)\n",
"test_df = test_df.drop(['Name'], axis=1)\n",
"combine = [train_df, test_df]\n",
"train_df.shape, test_df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "2c8e84bb-196d-bd4a-4df9-f5213561b5d3"
},
"source": [
"### Converting a categorical feature\n",
"\n",
"Now we can convert features which contain strings to numerical values. This is required by most model algorithms. Doing so will also help us in achieving the feature completing goal.\n",
"\n",
"Let us start by converting Sex feature to a new feature called Gender where female=1 and male=0."
]
},
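{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of this kind of conversion on toy data is shown below. The names `sex`, `sex_mapping`, and `encoded` are illustrative assumptions. The explicit check is an added safeguard, not something the notebook does: an unexpected label would map to NaN, and casting NaN to `int` raises an error.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"sex = pd.Series(['male', 'female', 'female', 'male'])\n",
"sex_mapping = {'female': 1, 'male': 0}\n",
"\n",
"encoded = sex.map(sex_mapping)\n",
"\n",
"# An unexpected label would become NaN and make astype(int) fail,\n",
"# so check for gaps before casting.\n",
"assert encoded.notna().all(), 'unmapped Sex label found'\n",
"encoded = encoded.astype(int)\n",
"print(encoded.tolist())\n",
"```\n",
"\n",
"On the Titanic data the Sex feature holds only the two labels, so the check passes; it mainly guards against typos or new categories in other datasets."
]
},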
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"_cell_guid": "c20c1df2-157c-e5a0-3e24-15a828095c96"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>Title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>C</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>S</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>S</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived Pclass Sex Age SibSp Parch Fare Embarked Title\n",
"0 0 3 0 22.0 1 0 7.2500 S 1\n",
"1 1 1 1 38.0 1 0 71.2833 C 3\n",
"2 1 3 1 26.0 0 0 7.9250 S 2\n",
"3 1 1 1 35.0 1 0 53.1000 S 3\n",
"4 0 3 0 35.0 0 0 8.0500 S 1"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for dataset in combine:\n",
" dataset['Sex'] = dataset['Sex'].map( {'female': 1, 'male': 0} ).astype(int)\n",
"\n",
"train_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "d72cb29e-5034-1597-b459-83a9640d3d3a"
},
"source": [
"### Completing a numerical continuous feature\n",
"\n",
"Now we should start estimating and completing features with missing or null values. We will first do this for the Age feature.\n",
"\n",
"We can consider three methods to complete a numerical continuous feature.\n",
"\n",
"1. A simple way is to generate random numbers between mean and [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation).\n",
"\n",
"2. More accurate way of guessing missing values is to use other correlated features. In our case we note correlation among Age, Gender, and Pclass. Guess Age values using [median](https://en.wikipedia.org/wiki/Median) values for Age across sets of Pclass and Gender feature combinations. So, median Age for Pclass=1 and Gender=0, Pclass=1 and Gender=1, and so on...\n",
"\n",
"3. Combine methods 1 and 2. So instead of guessing age values based on median, use random numbers between mean and standard deviation, based on sets of Pclass and Gender combinations.\n",
"\n",
"Method 1 and 3 will introduce random noise into our models. The results from multiple executions might vary. We will prefer method 2."
]
},
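{
"cell_type": "markdown",
"metadata": {},
"source": [
"Method 2 can be sketched compactly with a grouped median. This is one possible formulation under stated assumptions, not necessarily how the notebook fills Age: the toy frame `df` and its values are made up, and `group_median` is an illustrative name. `groupby(...).transform('median')` returns a Series aligned with the original rows, so it can be passed directly to `fillna`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Toy frame; column names mirror the notebook, the values are made up.\n",
"df = pd.DataFrame({\n",
"    'Pclass': [1, 1, 1, 3, 3, 3],\n",
"    'Sex': [0, 0, 0, 1, 1, 1],\n",
"    'Age': [40.0, 50.0, np.nan, 20.0, np.nan, 30.0],\n",
"})\n",
"\n",
"# Median Age within each (Pclass, Sex) group, aligned to the original rows.\n",
"group_median = df.groupby(['Pclass', 'Sex'])['Age'].transform('median')\n",
"\n",
"# Fill only the missing entries; observed ages are left untouched.\n",
"df['Age'] = df['Age'].fillna(group_median)\n",
"print(df['Age'].tolist())\n",
"```\n",
"\n",
"Because the medians are deterministic, repeated runs give identical results, which is the reason given above for preferring method 2 over the random variants."
]
},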
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"_cell_guid": "c311c43d-6554-3b52-8ef8-533ca08b2f68"
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7ff74bfede80>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAgEAAAHUCAYAAACj/ftgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X20XXV95/H3JRdDuI14levIw1Sk1S/DYo1drKI8TDDU\nQGih8hCR1YEAJQJV6UCR5UwFChNbpVARu6RCFBCxTIsOQ6FotIAKtoxm1mqrzOAXeQiIiXApQZKA\ngcCdP86O3lyS3H1uzvPv/Vori3N29j77+7s3v8Nnf/c++wxNTEwgSZLKs0O3C5AkSd1hCJAkqVCG\nAEmSCmUIkCSpUIYASZIKZQiQJKlQw90uoDQRsReQwH3Voh2Bx4APZuazW9nmNGBBZp7ciRq3UsNb\ngS8D92+tjohYmZl7bWH5YuAPgJeAucD3gHMzc0P7Kv7Fvo8GLgJeBFYDp2bmC+3erwab87jj8/g1\nwGXAOcCOmbmx3fsshZ2A7hjPzPnVn0OAnwAXdruorYmIEeALwFdnsO2ewMeBhZk5H/hNGm8gx7aw\nxK3teydgGfC+zJwH/BT4o3bvV8VwHndgHleuAL7foX0VxU5Ab7gHOAsgIt4JXEnjyPUZ4JTJK0bE\nccBHgJ/T+P0tzsyVEXEOcDLwfPXnZGA28NfAEDAHuCYzr5vyet8AXjOlnkszc/mk5xuAw4H3Ab/a\n5NhGq9efA6zLzImqtk37Pwy4uKrxJeAM4BXgLuCAzFwTEXcDV2Tm30/a7jzgPVP29S+Zee6k5wcC\nD2bmY9Xzm4FLabyZSa3mPG7PPAb4aGY+FxHXNlm3pmEI6LKImAUcD9xbLfoScFxm3h8R5wJHTdnk\ndcCJmfl4RPwxcDZwPrAUeFtmPhkRC4HdgQXADzPzA9VR8fun7j8zj5iuxqr1tjEimh5fZv4gIm4G\nHomIbwN3A1/OzB9HxM7A1cBBmflMRBwD/EVmLoqIy4BLI+I+4NHJbxzV615B4+hgW3ancfS/yU+r\nZVJLOY/bOo/JzOeaLlq1GAK6YywivlU93oHGG8enImJX4HWZeT9AZl4JvziXuMmTwA0RsQPwJn55\nTvJaYHlEfIXG5HwwIl4CPhgRXwDuAK5p66i2IjP/MCI+ASwE3g1cEhEn0RjLbsAt1RvTLGCi2mZZ\nRHwNOBg4pEWlDG16fakFnMfdmcdqIUNAd4xX59U2ExETbOM6jYjYEfhbYP/M/FFEnE3j3ByZeV5E\nvBn4HeDWiPhwZn4tIvYF3gWcAJzLlIlYs404YxExBOyUmauA64HrI+IMGu3Ci4DHt/KzGKZxtDQE\n7AI8N+Xv67QRf8zmR/67A09s14CkX3Ied2Yeq40MAT0kM/8tIp6OiAMyc0VEnM8vzw1C40KcV4CV\nVVvwGODpiBilcdXs0sz8bHV08Y5q+crMvDMivlltNzz5yto6bcTtdCZwXES8JzNfrJbtDTwEPAjs\nGhH7VW3TQ4F9MnMZcAGwHPgucF1EHFGdh9xUd5024neBt0TEr2XmwzTOYd7W0tFJUziPWz6P1UaG\ngN6zGPh01QJ8tnp+PEB1vu0mYAWNjyNdDtxI45zhXGBFRKyhcWHOEuCNwNURsYFGEv/zmXy0JiLe\nQePjOW8CRqsW6LWZeWONzT8H7AH8Y0Sso/FRqgeA8zLzhYg4Gbg2In5erX9mRLy9GvMBmfli9dGk\nDwGfaabuatslwE0RsRF4uNnXkGbIedyieVzVfgvw+urpXRHxRGae1Ozr6NWG/CphtUps5fPFkvqH\n87gs3idAkqRC2QmQJKlQdgIkSSqUIUCSpEIZAiRJKlRHPiI4Pr621oUHo6M7s2bN89Ov2EcGbUyD\nNh4oe0xjY3OH6r5mnXlc8s+ynwzamAZtPNDcmJqZx1P1VCdgeHhWt0touUEb06CNBxzTIOy3nRxT\n7xu08UDnxtRTIUCSJHWOIUCSpEIZAiRJKpQh
QJKkQhkCJEkqlCFAkqRCGQIkSSqUIUCSpEIZAiRJ\nKpQhQJKkQhkCJEkqlCFAkqRCGQIkSSqUIUCSpEIZAiRJKpQhQJKkQhkCJEkqlCFAkqRCGQIkSSqU\nIUCSpEIZAiRJKpQhQJKkQhkCJEkqlCFAkqRCGQIkSSqUIUCSpEIZAiRJKtRwnZUiYg5wP/Ax4C7g\nRmAWsBpYnJkb2lahJElqi7qdgAuBZ6rHS4GrMnMe8BBwejsKkyRJ7TVtCIiIfYB9gTuqRfOB26rH\ntwML2lKZJElqqzqdgE8C5016PjKp/f8UsFvLq5IkSW23zWsCIuIU4L7MfDQitrTKUJ2djI7uzPDw\nrFoFjY3NrbVePxm0MQ3aeMAx1VF3Hvuz7A+DNqZBGw90ZkzTXRh4FLB3RBwN7AlsANZFxJzMfAHY\nA1g13U7WrHm+VjFjY3MZH19ba91+MWhjGrTxQNljauZNps48Lvln2U8GbUyDNh5obkzbExa2GQIy\n88RNjyPiEmAlcDCwCPhS9d/lM967JEnqmpncJ+Bi4NSIuBd4PXBDa0uSJEmdUOs+AQCZecmkp4e3\nvhRJktRJ3jFQkqRCGQIkSSqUIUCSpEIZAiRJKpQhQJKkQhkCJEkqlCFAkqRCGQIkSSqUIUCSpEIZ\nAiRJKpQhQJKkQhkCJEkqlCFAkqRCGQIkSSqUIUCSpEIZAiRJKpQhQJKkQhkCJEkqlCFAkqRCGQIk\nSSqUIUCSpEIZAiRJKpQhQJKkQhkCJEkqlCFAkqRCGQIkSSqUIUCSpEIZAiRJKpQhQJKkQhkCJEkq\nlCFAkqRCGQIkSSrUcJ2VIuIyYF61/ieAFcCNwCxgNbA4Mze0q0hJktR603YCIuIwYL/MPAg4ErgS\nWApclZnzgIeA09tapSRJark6pwPuAU6oHj8LjADzgduqZbcDC1pemSRJaqtpTwdk5svA+urpEuCr\nwMJJ7f+ngN3aU54kSWqXoYmJiVorRsQxwEeBI4AfZeYbq+W/DnwxMw/e2rYbN748MTw8qwXlSmqx\noborOo+lnlV7Hk9V98LAhcAFwJGZ+bOIWBcRczLzBWAPYNW2tl+z5vlaxYyNzWV8fG2tdfvFoI1p\n0MYDZY9pbGxu7desM49L/ln2k0Eb06CNB5obUzPzeKo6FwbuAlwOHJ2Zz1SL7wQWVY8XActnXIEk\nSeqKOp2AE4FdgZsjYtOyU4HPR8RZwGPADe0pT5IktUudCwOXAcu28FeHt74cSZLUKd4xUJKkQhkC\nJEkqlCFAkqRCGQIkSSqUIUCSpEIZAiRJKpQhQJKkQhkCJEkqVK3vDpBm6tZ7H6m13rHz9m5zJZKk\nqewESJJUKEOAJEmFMgRIklQoQ4AkSYUyBEiSVCg/HdBj2nE1vVfoS5K2xE6AJEmFshOgX6jTMRgZ\nmc3h++/RlX1vYsdCklrDToAkSYWyEyBJPW66TpndMc2UnQBJkgplJ0CSptHuI/FmromRWslOgCRJ\nhbIT0Kc8cpB6R6+fs59a38jIbNav3/CL592uT91jJ0CSpELZCVDf8a6Katb2dM5GRma3sJL26HZn\nsNc7Ido6OwGSJBXKTkAHdDulS5K0JXYCJEkqlCFAkqRCeTpgO9T9wp1B0y+nN+rWOYi/I6kZ3b4Z\nkhcOdo+dAEmSCmUnQGoDvxpZqm97u4tTb340lXNs6+wESJJUqBl3AiLiU8CBwARwTmauaFVRHkWp\nV/XL9RBSK/nvfnDNqBMQEe8C3pqZBwFLgL9saVWSJKntZtoJeDdwK0BmPhARoxHx2sx8rnWltZbd\nBfW7Zv4Nn3H829tYSfO6ffV5t/V6fSrXTK8JeBMwPun5eLVMkiT1iaGJiYmmN4qIZcAdmfl31fPv\nAKdn5oMtrk+SJLXJTDsBq9j8yH93YPX2lyNJkjplpiHgG8B7ASJif2BVZq5tWVWSJKntZnQ6ACAi\nLgUOBV4B
PpSZ/9rKwiRJUnvNOARIkqT+5h0DJUkqlCFAkqRCGQIkSSqUIUCSpEIZAiRJKpQhQJKk\nQhkCJEkqlCFAkqRCGQIkSSqUIUCSpEIZAiRJKpQhQJKkQg13u4DSRMReQAL3VYt2BB4DPpiZz25l\nm9OABZl5cidq3ML+dwQ+C/wHYA7wPzLz8i2stzIz99rC8sXAHwAvAXOB7wHnZuaGdtZd7fto4CLg\nRWA1cGpmvtDu/WqwOY87Po9fA1wGnAPsmJkb273PUtgJ6I7xzJxf/TkE+AlwYbeL2oYzgdlVrYcA\n/6V6E5xWROwJfBxYmJnzgd+k8QZybHtK3WzfOwHLgPdl5jzgp8AftXu/KobzuAPzuHIF8P0O7aso\ndgJ6wz3AWQAR8U7gShpHrs8Ap0xeMSKOAz4C/JzG729xZq6MiHOAk4Hnqz8nA7OBvwaGaCT/azLz\nuimv9w3gNVPquTQzl096/jngOoDMfCEi1gNvAFbWGNto9fpzgHWZOVHVtmn/hwEXVzW+BJwBvALc\nBRyQmWsi4m7gisz8+0nbnQe8Z8q+/iUzz530/EDgwcx8rHp+M3ApjTczqdWcx+2ZxwAfzcznIuLa\nGrWqCYaALouIWcDxwL3Voi8Bx2Xm/RFxLnDUlE1eB5yYmY9HxB8DZwPnA0uBt2XmkxGxENgdWAD8\nMDM/UB0Vv3/q/jPziOlqzMwXJ9V7PI03p3+uM77M/EFE3Aw8EhHfBu4GvpyZP46InYGrgYMy85mI\nOAb4i8xcFBGXAZdGxH3Ao5PfOKrXvYLG0cG27E7j6H+Tn1bLpJZyHrd1HpOZz9WpU80zBHTHWER8\nq3q8A403jk9FxK7A6zLzfoDMvBJ+cS5xkyeBGyJiB+BN/PKc5LXA8oj4Co3J+WBEvAR8MCK+ANwB\nXLM9RUfEe4E/o9ESfKXudpn5hxHxCWAh8G7gkog4qRrLbsAtEQEwC5iotlkWEV8DDqbRumyFoU2v\nL7WA87g781gtZAjojvHqvNpmImKCbVynUV3Y87fA/pn5o4g4m8a5OTLzvIh4M/A7wK0R8eHM/FpE\n7Au8CzgBOJcpE7FmG5GI+D0aRyrzM3N13YFGxBCwU2auAq4Hro+IM2i0Cy8CHt/Kz2KYxtHSELAL\n8NyUv6/TRvwxmx/57w48Ubd2aRrO487MY7WRIaCHZOa/RcTTEXFAZq6IiPP55blBaFyI8wqwsmoL\nHgM8HRGjNK6aXZqZn62OLt5RLV+ZmXdGxDer7YYnX1lbp40YEW8DPgocmplrmhzWmcBxEfGeSe3I\nvYGHgAeBXSNiv6pteiiwT2YuAy4AlgPfBa6LiCOq85Cb6q7TRvwu8JaI+LXMfJjGOczbmqxfaorz\nuOXzWG1kCOg9i4FPVy3AZ6vnxwNU59tuAlbQ+DjS5cCNNM4ZzgVWRMQaGhfmLAHeCFwdERtoJPE/\nn+FHa86pXv9/Ve0+gMsz844a234O2AP4x4hYR+OjVA8A51UXJ50MXBsRP6/WPzMi3l6N+YDMfDEa\nH036EPCZZoqutl0C3BQRG4GHm30NaYacxy2axwARcQvw+urpXRHxRGae1Ozr6NWGJiY8RarWiK18\nvlhS/3Ael8X7BEiSVCg7AZIkFcpOgCRJhTIESJJUKEOAJEmF6shHBMfH19a68GB0dGfWrHl++hX7\nyKCNadDGA2WPaWxs7lDd16wzj0v+WfaTQRvToI0HmhtTM/N4qp7qBAwPz+p2CS03aGMatPGAYxqE\n/baTY+p9gzYe6NyYeioESJKkzjEESJJUKEOAJEmFMgRIklQoQ4AkSYUyBEiSVChDgCRJhTIESJJU\nKEOAJEmFMgRIklQoQ4AkSYUyBEiSVChDgCRJhTIESJJUKEOAJEmFMgRIklQoQ4AkSYUyBEiSVChD\ngCRJhTIESJJUKEOAJEmFMgRIklQoQ4AkSYUyBEiSVChDgCRJhTIESJJUqO
E6K0XEHOB+4GPAXcCN\nwCxgNbA4Mze0rUJJktQWdTsBFwLPVI+XAldl5jzgIeD0dhQmSZLaa9oQEBH7APsCd1SL5gO3VY9v\nBxa0pTJJktRWdToBnwTOm/R8ZFL7/ylgt5ZXJUmS2m6b1wRExCnAfZn5aERsaZWhOjsZHd2Z4eFZ\ntQoaG5tba71+MmhjGrTxgGOqo+489mfZHwZtTIM2HujMmKa7MPAoYO+IOBrYE9gArIuIOZn5ArAH\nsGq6naxZ83ytYsbG5jI+vrbWuv1i0MY0aOOBssfUzJtMnXlc8s+ynwzamAZtPNDcmLYnLGwzBGTm\niZseR8QlwErgYGAR8KXqv8tnvHdJktQ1M7lPwMXAqRFxL/B64IbWliRJkjqh1n0CADLzkklPD299\nKZIkqZO8Y6AkSYUyBEiSVChDgCRJhTIESJJUKEOAJEmFMgRIklQoQ4AkSYUyBEiSVChDgCRJhTIE\nSJJUKEOAJEmFMgRIklQoQ4AkSYUyBEiSVChDgCRJhTIESJJUKEOAJEmFMgRIklQoQ4AkSYUyBEiS\nVChDgCRJhTIESJJUKEOAJEmFMgRIklQoQ4AkSYUyBEiSVChDgCRJhTIESJJUKEOAJEmFMgRIklQo\nQ4AkSYUyBEiSVChDgCRJhRqus1JEXAbMq9b/BLACuBGYBawGFmfmhnYVKUmSWm/aTkBEHAbsl5kH\nAUcCVwJLgasycx7wEHB6W6uUJEktV+d0wD3ACdXjZ4ERYD5wW7XsdmBByyuTJEltNe3pgMx8GVhf\nPV0CfBVYOKn9/xSwW3vKkyRJ7TI0MTFRa8WIOAb4KHAE8KPMfGO1/NeBL2bmwVvbduPGlyeGh2e1\noFxJLTZUd0XnsdSzas/jqepeGLgQuAA4MjN/FhHrImJOZr4A7AGs2tb2a9Y8X6uYsbG5jI+vrbVu\nvxi0MQ3aeKDsMY2Nza39mnXmcck/y34yaGMatPFAc2NqZh5PVefCwF2Ay4GjM/OZavGdwKLq8SJg\n+YwrkCRJXVGnE3AisCtwc0RsWnYq8PmIOAt4DLihPeVJkqR2qXNh4DJg2Rb+6vDWlyNJkjrFOwZK\nklQoQ4AkSYUyBEiSVChDgCRJhTIESJJUKEOAJEmFMgRIklQoQ4AkSYUyBEiSVKhaXyCkzrr13kem\nXefYeXt37HUkSYPJToAkSYWyE9Cn6hzlS5K0LXYCJEkqlJ0ASeqAbXXvvDZH3WInQJKkQhkCJEkq\nlKcDJKlFvGBX/cZOgCRJhbIT0EEeJUiSeomdAEmSCtWTnQBvdytJUvvZCZAkqVA92QnoR1vrXoyM\nzGb9+g0drkbSoJiuM2pXVNvDToAkSYWyE1DDIF/V3+zYttbZ8GhEkvqPnQBJkgplJ0CSumyQu43q\nbXYCJEkqlJ0A9ZS6R0Reg9Bfeu1rdHutnu0xky7CyMhsDt9/jzZUo35jJ0CSpELZCVDHtPK8p3eV\nHBy9dlQ+tZ5BvdfH9szHbf1eeu33qW2zEyBJUqFm3AmIiE8BBwITwDmZuaJlVanv9OLVzXVrmu5I\nz6OX/tSL/ya1dd4ZsTtm1AmIiHcBb83Mg4AlwF+2tCpJktR2M+0EvBu4FSAzH4iI0Yh4bWY+17rS\ntp/njdUKrTqi9N9a8zy/3JtmOifszvSemV4T8CZgfNLz8WqZJEnqE0MTExNNbxQRy4A7MvPvquff\nAU7PzAdbXJ8kSWqTmXYCVrH5kf/uwOrtL0eSJHXKTEPAN4D3AkTE/sCqzFzbsqokSVLbzeh0AEBE\nXAocCrwCfCgz/7WVhUmSpPaacQiQJEn9zTsGSpJUKEOAJEmFMgRIklQoQ4AkSYUyBEiSVChDgCRJ\nhTIESJJUKEOAJEmFMgRIklQoQ4AkSYUyBEiSVChDgCRJhRrudgGliYi9gATuqxbtCDwGfDAzn93K\nNqcBCzLz5E7UuIX97wJcB4wBs4GvZ+
afbGG9lZm51xaWLwb+AHgJmAt8Dzg3Mze0s+5q30cDFwEv\nAquBUzPzhXbvV4PNedzxefwa4DLgHGDHzNzY7n2Wwk5Ad4xn5vzqzyHAT4ALu13UNvxn4HuZeShw\nCHBSRPxGnQ0jYk/g48DCzJwP/CaNN5Bj21Tr5H3vBCwD3peZ84CfAn/U7v2qGM7jDszjyhXA9zu0\nr6LYCegN9wBnAUTEO4EraRy5PgOcMnnFiDgO+Ajwcxq/v8WZuTIizgFOBp6v/pxMI+3/NTAEzAGu\nyczrprzeN4DXTKnn0sxcvulJZn520t+9nkZ4HK85ttHq9ecA6zJzoqpt0/4PAy6uanwJOAN4BbgL\nOCAz10TE3cAVmfn3k7Y7D3jPlH39S2aeO+n5gcCDmflY9fxm4FIab2ZSqzmP2zOPAT6amc9FxLU1\n61VNhoAui4hZwPHAvdWiLwHHZeb9EXEucNSUTV4HnJiZj0fEHwNnA+cDS4G3ZeaTEbEQ2B1YAPww\nMz9QHRW/f+r+M/OIJmr9B+A/Ah/OzJ/U2SYzfxARNwOPRMS3gbuBL2fmjyNiZ+Bq4KDMfCYijgH+\nIjMXRcRlwKURcR/w6OQ3jup1r6BxdLAtu9M4+t/kp9UyqaWcx22dx2Tmc3XHp+Z4OqA7xiLiWxHx\nLeCbwCrgUxGxK/C6zLwfIDOvzMy/mbLtk8AN1UQ8Ddi1Wn4tsDwiLqAx2X4AfA1YEBFfAH4XuGZ7\nis7Mw4H9gP9aHenU3e4PgQD+J7A/8H8j4ner19oNuKX6WZxP43wlmbkM+FXgw7SuhT8ETLTotSTn\ncXfmsVrITkB3jFfn1TYTERNsI5hFxI7A3wL7Z+aPIuJsGufmyMzzIuLNwO8At0bEhzPzaxGxL/Au\n4ATgXBrnAie/5rRtxIg4FHgkM5/IzPGIuBM4FPjudAONiCFgp8xcBVwPXB8RZ9BoF14EPL6Vn8Uw\njaOlIWAX4Lkpf1+njfhjNj/y3x14YrqapZqcx52Zx2ojQ0APycx/i4inI+KAzFwREefzy3OD0LgQ\n5xVgZdUWPAZ4OiJGaVw1uzQzPxsROwDvqJavzMw7I+Kb1XbDk6+srdlGPIrGeclzq0n9TuBPaw7r\nTOC4iHhPZr5YLdsbeAh4ENg1Ivar2qaHAvtURw8XAMtpvEFdFxFHVOchN9Vdp434XeAtEfFrmfkw\njXOYt9WsW5oR53HL57HayBDQexYDn46Il4Bnq+fHA1Tn224CVtD4ONLlwI00zhnOBVZExBoaF+Ys\nAd4IXB0RG2gk8T+f4Udr/qx6nXtpXBh0Z2Z+tea2nwP2AP4xItbR+CjVA8B5mflCRJwMXBsRP6/W\nPzMi3l6N+YDMfLH6aNKHgM80U3S17RLgpojYCDzc7GtIM+Q8btE8BoiIW2hczAhwV0Q8kZknNfs6\nerWhiQlPkao1YiufL5bUP5zHZfHCQEmSCmUnQJKkQtkJkCSpUIYASZIKZQiQJKlQHfmI4Pj42loX\nHoyO7syaNc9Pv2IfGbQxDdp4oOwxjY3NHar7mnXmcck/y34yaGMatPFAc2NqZh5P1VOdgOHhWd0u\noeUGbUyDNh5wTIOw33ZyTL1v0MYDnRtTT4UASZLUOYYASZIKZQiQJKlQhgBJkgrlFwi12K33PrLZ\n85GR2axfv+FV6x07b+9OlSRJ0hbZCZAkqVCGAEmSCmUIkCSpUIYASZIKZQiQJKlQhgBJkgplCJAk\nqVCGAEmSCmUIkCSpUIYASZIKZQiQJKlQhgBJkgplCJAkqVCGAEmSCmUIkCSpUIYASZIKZQiQJKlQ\nhgBJkgplCJAkqVCGAEmSCmUIkCSpUIYASZIKNVxnpYiYA9wPfAy4C7gRmAWsBhZn5oa2VShJktqi\nbifgQuCZ6vFS4KrMnAc8BJzejsIkSVJ7TRsCImIfYF/gjmrRfOC26vHtwIK2VCZJktqqTifgk8B5\nk5
6PTGr/PwXs1vKqJElS223zmoCIOAW4LzMfjYgtrTJUZyejozszPDyrVkFjY3NrrderRkZm11rW\nz+Ps59q3xjFNr+489mfZHwZtTIM2HujMmKa7MPAoYO+IOBrYE9gArIuIOZn5ArAHsGq6naxZ83yt\nYsbG5jI+vrbWur1q/frNr5EcGZn9qmVA345zEH5HU5U8pmbeZOrM45J/lv1k0MY0aOOB5sa0PWFh\nmyEgM0/c9DgiLgFWAgcDi4AvVf9dPuO9S5KkrpnJfQIuBk6NiHuB1wM3tLYkSZLUCbXuEwCQmZdM\nenp460uRJEmd5B0DJUkqlCFAkqRCGQIkSSpU7WsC1Fq33vtIrfWOnbd3myuRJJXKToAkSYUyBEiS\nVChDgCRJhTIESJJUKEOAJEmFMgRIklQoQ4AkSYUyBEiSVChDgCRJhTIESJJUKEOAJEmF8rsDBoDf\nQyBJmgk7AZIkFcoQIElSoQwBkiQVyhAgSVKhDAGSJBXKECBJUqEMAZIkFcoQIElSoQwBkiQVyhAg\nSVKhvG1wj6t7S2BJkpplJ0CSpEIZAiRJKpQhQJKkQnlNQBM8Py9JGiR2AiRJKlStTkBEXAbMq9b/\nBLACuBGYBawGFmfmhnYVKUmSWm/aTkBEHAbsl5kHAUcCVwJLgasycx7wEHB6W6uUJEktV+d0wD3A\nCdXjZ4ERYD5wW7XsdmBByyuTJEltNe3pgMx8GVhfPV0CfBVYOKn9/xSwW3vKkyRJ7VL70wERcQyN\nEHAE8KNJfzU03bajozszPDyr1n7GxubWLanjRkZmd3S7VmvVz7aXf0cz5ZimV3ce+7PsD4M2pkEb\nD3RmTHUvDFwIXAAcmZk/i4h1ETEnM18A9gBWbWv7NWuer1XM2NhcxsfX1lq3G9avb/7ax5GR2TPa\nrh1a8bPt9d/RTJQ8pmbeZOrM45J/lv1k0MY0aOOB5sa0PWGhzoWBuwCXA0dn5jPV4juBRdXjRcDy\nGVcgSZK6ok4n4ERgV+DmiNi07FTg8xFxFvAYcEN7ypMkSe1S58LAZcCyLfzV4a0vR5IkdYp3DJQk\nqVCGAEmSCmUIkCSpUIYASZIK5VcJF6TuVyEfO2/vNlciSeoFdgIkSSqUIUCSpEIZAiRJKpQhQJKk\nQhkCJEkqVM99OsAr2LtvW7+Dyd+K6O9AkvqbnQBJkgrVc50A9Y86XRu7BepVdbuOk7tfU/nvW/3O\nToAkSYWyEyBJXTJdN8JOg9rNToAkSYUyBEiSVChDgCRJhfKaAEkt1wufHKl79b9UMjsBkiQVaqA7\nAd59UFI2Xx2LAAAFxElEQVQ72W1Qv7MTIElSoQa6EyCpd/kZean77ARIklQoOwFqK6/L0EzZKZDa\nz06AJEmFshOgnmDHQGqPLc2tyd+M6Jwqm50ASZIKZScAP+srqX+1+/3LazMGm50ASZIKZQiQJKlQ\nfXs6wBa+JEnbx06AJEmFmnEnICI+BRwITADnZOaKllUlDSg/Ctk6JXQDSxijumtGnYCIeBfw1sw8\nCFgC/GVLq5IkSW03007Au4FbATLzgYgYjYjXZuZzrStNerV2HBlNvnHKTHnkrn61vXOqzvbtnh83\nff2H25zDzs+tm+k1AW8Cxic9H6+WSZKkPjE0MTHR9EYRsQy4IzP/rnr+HeD0zHywxfVJkqQ2mWkn\nYBWbH/nvDqze/nIkSVKnzDQEfAN4L0BE7A+sysy1LatKkiS13YxOBwBExKXAocArwIcy819bWZgk\nSWqvGYcASZLU37xjoCRJhTIESJJUqJ74AqFBuQVxRFwGzKPxc/0EsAK4EZhF49MTizNz++5K0wUR\nMQe4H/gYcBd9PqaIOAn4CLAR+BPg+/TpmCLiV4AvAqPAbOC/A/+PLozHedy7nMO9rZvzuOudgEG5\nBXFEHAbsV43jSOBKYClwVWbOAx4CTu9iidvjQuCZ6nFfjyki3gBc
DPwn4GjgGPp7TKcBmZmH0fjE\nzqfpwnicxz3POdzbTqNL87jrIYAptyAGRiPitd0taUbuAU6oHj8LjADzgduqZbcDCzpf1vaJiH2A\nfYE7qkXz6e8xLQDuzMy1mbk6M8+kv8f0NPCG6vFo9Xw+nR+P87hHOYf7QtfmcS+EgIG4BXFmvpyZ\n66unS4CvAiOT2jdPAbt1pbjt80ngvEnP+31MewE7R8RtEXFvRLybPh5TZv4N8KsR8RCN/4GdT3fG\n4zzuXc7hHtfNedwLIWCqoW4XsD0i4hgabx5nT/mrvhtXRJwC3JeZj25llb4bE42a3wAcT6MFdz2b\nj6OvxhQRJwOPZ+avA78FfGbKKt0aT1/9HKcalHnsHO4P3ZzHvRACBuYWxBGxELgA+O3M/Bmwrrog\nB2APGmPtJ0cBx0TE/wbeD1xE/4/pSeCfMnNjZj4MrAXW9vGYDgG+DlDdsGt3YH0XxuM87k3O4f7Q\ntXncCyFgIG5BHBG7AJcDR2fmpgtw7gQWVY8XAcu7UdtMZeaJmXlAZh4IfJ7GlcV9PSYa/95+KyJ2\nqC4w+hX6e0wPAe8EiIg3A+uAf6Dz43Ee9yDncN/o2jzuiTsGDsItiCPiTOASYPI3KZ5KY+LtBDwG\n/H5mvtT56rZfRFwCrKSRVr9IH48pIs6i0eoF+FMaHwHryzFVHy26Dvh3ND7SdhHwAF0Yj/O4tzmH\ne1c353FPhABJktR5vXA6QJIkdYEhQJKkQhkCJEkqlCFAkqRCGQIkSSqUIUCvEhG7RcTGiPhv3a5F\nUvOcw6rLEKAtOZXG11ie1uU6JM2Mc1i1eJ8AvUpEPAh8APgCcGJm/lNE/DZwKY2vI/06cHZm7hkR\no8DVwBiwC/DJzLypO5VLAuew6rMToM1ExKE07lh1N427Vf1+RAwB1wCnVN93vcukTf4UWJ6Zv0Xj\nbnFLI2Ksw2VLqjiH1QxDgKZaAnwhMydofDvX+4B/D/zKpNvAfmXS+ocBH4iIb9H4vvKXgLd0rlxJ\nUziHVdtwtwtQ74iI19L4oorHI+L4avEsGm8Sr0xa9eVJjzcAH8zM/9OZKiVtjXNYzbIToMl+D/h2\nZu6bmb+Rmb8BnEnjIqNXIiKq9Y6ftM13aBxpEBFzIuKvIsJwKXWHc1hNMQRosiXAZ6cs+wqwL3Al\ncGtEfJ3GkcPG6u8vAd4aEd8B7gH+OTM3IqkbnMNqip8OUC0RcQzw/cx8tGoznpWZC7tdl6R6nMPa\nEls+qmsWcEtEPFc9/kCX65HUHOewXsVOgCRJhfKaAEmSCmUIkCSpUIYASZIKZQiQJKlQhgBJkgpl\nCJAkqVD/Hy4ht1Cp0SRwAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7ff74bfed8d0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# grid = sns.FacetGrid(train_df, col='Pclass', hue='Gender')\n",
"grid = sns.FacetGrid(train_df, row='Pclass', col='Sex', size=2.2, aspect=1.6)\n",
"grid.map(plt.hist, 'Age', alpha=.5, bins=20)\n",
"grid.add_legend()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "a4f166f9-f5f9-1819-66c3-d89dd5b0d8ff"
},
"source": [
"Let us start by preparing an empty array to contain guessed Age values based on Pclass x Gender combinations."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"_cell_guid": "9299523c-dcf1-fb00-e52f-e2fb860a3920"
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0., 0., 0.],\n",
" [ 0., 0., 0.]])"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"guess_ages = np.zeros((2,3))\n",
"guess_ages"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "ec9fed37-16b1-5518-4fa8-0a7f579dbc82"
},
"source": [
"Now we iterate over Sex (0 or 1) and Pclass (1, 2, 3) to calculate guessed values of Age for the six combinations."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"_cell_guid": "a4015dfa-a0ab-65bc-0cbe-efecf1eb2569"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>Title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>22</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>38</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>C</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>26</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>S</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>35</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>S</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>35</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived Pclass Sex Age SibSp Parch Fare Embarked Title\n",
"0 0 3 0 22 1 0 7.2500 S 1\n",
"1 1 1 1 38 1 0 71.2833 C 3\n",
"2 1 3 1 26 0 0 7.9250 S 2\n",
"3 1 1 1 35 1 0 53.1000 S 3\n",
"4 0 3 0 35 0 0 8.0500 S 1"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for dataset in combine:\n",
" for i in range(0, 2):\n",
" for j in range(0, 3):\n",
" guess_df = dataset[(dataset['Sex'] == i) & \\\n",
" (dataset['Pclass'] == j+1)]['Age'].dropna()\n",
"\n",
" # age_mean = guess_df.mean()\n",
" # age_std = guess_df.std()\n",
" # age_guess = rnd.uniform(age_mean - age_std, age_mean + age_std)\n",
"\n",
" age_guess = guess_df.median()\n",
"\n",
" # Convert random age float to nearest .5 age\n",
" guess_ages[i,j] = int( age_guess/0.5 + 0.5 ) * 0.5\n",
" \n",
" for i in range(0, 2):\n",
" for j in range(0, 3):\n",
" dataset.loc[ (dataset.Age.isnull()) & (dataset.Sex == i) & (dataset.Pclass == j+1),\\\n",
" 'Age'] = guess_ages[i,j]\n",
"\n",
" dataset['Age'] = dataset['Age'].astype(int)\n",
"\n",
"train_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "dbe0a8bf-40bc-c581-e10e-76f07b3b71d4"
},
"source": [
"Let us create Age bands and determine correlations with Survived."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"_cell_guid": "725d1c84-6323-9d70-5812-baf9994d3aa1"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>AgeBand</th>\n",
" <th>Survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>(-0.08, 16]</td>\n",
" <td>0.550000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>(16, 32]</td>\n",
" <td>0.337374</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>(32, 48]</td>\n",
" <td>0.412037</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>(48, 64]</td>\n",
" <td>0.434783</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>(64, 80]</td>\n",
" <td>0.090909</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" AgeBand Survived\n",
"0 (-0.08, 16] 0.550000\n",
"1 (16, 32] 0.337374\n",
"2 (32, 48] 0.412037\n",
"3 (48, 64] 0.434783\n",
"4 (64, 80] 0.090909"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df['AgeBand'] = pd.cut(train_df['Age'], 5)\n",
"train_df[['AgeBand', 'Survived']].groupby(['AgeBand'], as_index=False).mean().sort_values(by='AgeBand', ascending=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "ba4be3a0-e524-9c57-fbec-c8ecc5cde5c6"
},
"source": [
"Let us replace Age with ordinals based on these bands."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"_cell_guid": "797b986d-2c45-a9ee-e5b5-088de817c8b2"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>Title</th>\n",
" <th>AgeBand</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" <td>(16, 32]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>C</td>\n",
" <td>3</td>\n",
" <td>(32, 48]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>S</td>\n",
" <td>2</td>\n",
" <td>(16, 32]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>S</td>\n",
" <td>3</td>\n",
" <td>(32, 48]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" <td>(32, 48]</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived Pclass Sex Age SibSp Parch Fare Embarked Title AgeBand\n",
"0 0 3 0 1 1 0 7.2500 S 1 (16, 32]\n",
"1 1 1 1 2 1 0 71.2833 C 3 (32, 48]\n",
"2 1 3 1 1 0 0 7.9250 S 2 (16, 32]\n",
"3 1 1 1 2 1 0 53.1000 S 3 (32, 48]\n",
"4 0 3 0 2 0 0 8.0500 S 1 (32, 48]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for dataset in combine: \n",
" dataset.loc[ dataset['Age'] <= 16, 'Age'] = 0\n",
" dataset.loc[(dataset['Age'] > 16) & (dataset['Age'] <= 32), 'Age'] = 1\n",
" dataset.loc[(dataset['Age'] > 32) & (dataset['Age'] <= 48), 'Age'] = 2\n",
" dataset.loc[(dataset['Age'] > 48) & (dataset['Age'] <= 64), 'Age'] = 3\n",
" dataset.loc[ dataset['Age'] > 64, 'Age']\n",
"train_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "004568b6-dd9a-ff89-43d5-13d4e9370b1d"
},
"source": [
"We can not remove the AgeBand feature."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"_cell_guid": "875e55d4-51b0-5061-b72c-8a23946133a3"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>Title</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>C</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>S</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>S</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived Pclass Sex Age SibSp Parch Fare Embarked Title\n",
"0 0 3 0 1 1 0 7.2500 S 1\n",
"1 1 1 1 2 1 0 71.2833 C 3\n",
"2 1 3 1 1 0 0 7.9250 S 2\n",
"3 1 1 1 2 1 0 53.1000 S 3\n",
"4 0 3 0 2 0 0 8.0500 S 1"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df = train_df.drop(['AgeBand'], axis=1)\n",
"combine = [train_df, test_df]\n",
"train_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "1c237b76-d7ac-098f-0156-480a838a64a9"
},
"source": [
"### Create new feature combining existing features\n",
"\n",
"We can create a new feature for FamilySize which combines Parch and SibSp. This will enable us to drop Parch and SibSp from our datasets."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"_cell_guid": "7e6c04ed-cfaa-3139-4378-574fd095d6ba"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>FamilySize</th>\n",
" <th>Survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>0.724138</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>0.578431</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>0.552795</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7</td>\n",
" <td>0.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0.303538</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0.200000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" <td>0.136364</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>11</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" FamilySize Survived\n",
"3 4 0.724138\n",
"2 3 0.578431\n",
"1 2 0.552795\n",
"6 7 0.333333\n",
"0 1 0.303538\n",
"4 5 0.200000\n",
"5 6 0.136364\n",
"7 8 0.000000\n",
"8 11 0.000000"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for dataset in combine:\n",
" dataset['FamilySize'] = dataset['SibSp'] + dataset['Parch'] + 1\n",
"\n",
"train_df[['FamilySize', 'Survived']].groupby(['FamilySize'], as_index=False).mean().sort_values(by='Survived', ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "842188e6-acf8-2476-ccec-9e3451e4fa86"
},
"source": [
"We can create another feature called IsAlone."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"_cell_guid": "5c778c69-a9ae-1b6b-44fe-a0898d07be7a"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>IsAlone</th>\n",
" <th>Survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>0.505650</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0.303538</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" IsAlone Survived\n",
"0 0 0.505650\n",
"1 1 0.303538"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for dataset in combine:\n",
" dataset['IsAlone'] = 0\n",
" dataset.loc[dataset['FamilySize'] == 1, 'IsAlone'] = 1\n",
"\n",
"train_df[['IsAlone', 'Survived']].groupby(['IsAlone'], as_index=False).mean()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "e6b87c09-e7b2-f098-5b04-4360080d26bc"
},
"source": [
"Let us drop Parch, SibSp, and FamilySize features in favor of IsAlone."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"_cell_guid": "74ee56a6-7357-f3bc-b605-6c41f8aa6566"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>Title</th>\n",
" <th>IsAlone</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>7.2500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>71.2833</td>\n",
" <td>C</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>7.9250</td>\n",
" <td>S</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>53.1000</td>\n",
" <td>S</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>8.0500</td>\n",
" <td>S</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived Pclass Sex Age Fare Embarked Title IsAlone\n",
"0 0 3 0 1 7.2500 S 1 0\n",
"1 1 1 1 2 71.2833 C 3 0\n",
"2 1 3 1 1 7.9250 S 2 1\n",
"3 1 1 1 2 53.1000 S 3 0\n",
"4 0 3 0 2 8.0500 S 1 1"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df = train_df.drop(['Parch', 'SibSp', 'FamilySize'], axis=1)\n",
"test_df = test_df.drop(['Parch', 'SibSp', 'FamilySize'], axis=1)\n",
"combine = [train_df, test_df]\n",
"\n",
"train_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "f890b730-b1fe-919e-fb07-352fbd7edd44"
},
"source": [
"We can also create an artificial feature combining Pclass and Age."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"_cell_guid": "305402aa-1ea1-c245-c367-056eef8fe453"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Age*Class</th>\n",
" <th>Age</th>\n",
" <th>Pclass</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>6</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Age*Class Age Pclass\n",
"0 3 1 3\n",
"1 2 2 1\n",
"2 3 1 3\n",
"3 2 2 1\n",
"4 6 2 3\n",
"5 3 1 3\n",
"6 3 3 1\n",
"7 0 0 3\n",
"8 3 1 3\n",
"9 0 0 2"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for dataset in combine:\n",
" dataset['Age*Class'] = dataset.Age * dataset.Pclass\n",
"\n",
"train_df.loc[:, ['Age*Class', 'Age', 'Pclass']].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "13292c1b-020d-d9aa-525c-941331bb996a"
},
"source": [
"### Completing a categorical feature\n",
"\n",
"Embarked feature takes S, Q, C values based on port of embarkation. Our training dataset has two missing values. We simply fill these with the most common occurance."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"_cell_guid": "bf351113-9b7f-ef56-7211-e8dd00665b18"
},
"outputs": [
{
"data": {
"text/plain": [
"'S'"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"freq_port = train_df.Embarked.dropna().mode()[0]\n",
"freq_port"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"_cell_guid": "51c21fcc-f066-cd80-18c8-3d140be6cbae"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Embarked</th>\n",
" <th>Survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>C</td>\n",
" <td>0.553571</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Q</td>\n",
" <td>0.389610</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>S</td>\n",
" <td>0.339009</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Embarked Survived\n",
"0 C 0.553571\n",
"1 Q 0.389610\n",
"2 S 0.339009"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for dataset in combine:\n",
" dataset['Embarked'] = dataset['Embarked'].fillna(freq_port)\n",
" \n",
"train_df[['Embarked', 'Survived']].groupby(['Embarked'], as_index=False).mean().sort_values(by='Survived', ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "f6acf7b2-0db3-e583-de50-7e14b495de34"
},
"source": [
"### Converting categorical feature to numeric\n",
"\n",
"We can now convert the EmbarkedFill feature by creating a new numeric Port feature."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"_cell_guid": "89a91d76-2cc0-9bbb-c5c5-3c9ecae33c66"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>Title</th>\n",
" <th>IsAlone</th>\n",
" <th>Age*Class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>7.2500</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>71.2833</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>7.9250</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>53.1000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>8.0500</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived Pclass Sex Age Fare Embarked Title IsAlone Age*Class\n",
"0 0 3 0 1 7.2500 0 1 0 3\n",
"1 1 1 1 2 71.2833 1 3 0 2\n",
"2 1 3 1 1 7.9250 0 2 1 3\n",
"3 1 1 1 2 53.1000 0 3 0 2\n",
"4 0 3 0 2 8.0500 0 1 1 6"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for dataset in combine:\n",
" dataset['Embarked'] = dataset['Embarked'].map( {'S': 0, 'C': 1, 'Q': 2} ).astype(int)\n",
"\n",
"train_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "e3dfc817-e1c1-a274-a111-62c1c814cecf"
},
"source": [
"### Quick completing and converting a numeric feature\n",
"\n",
"We can now complete the Fare feature for single missing value in test dataset using mode to get the value that occurs most frequently for this feature. We do this in a single line of code.\n",
"\n",
"Note that we are not creating an intermediate new feature or doing any further analysis for correlation to guess missing feature as we are replacing only a single value. The completion goal achieves desired requirement for model algorithm to operate on non-null values.\n",
"\n",
"We may also want round off the fare to two decimals as it represents currency."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"_cell_guid": "3600cb86-cf5f-d87b-1b33-638dc8db1564"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>Title</th>\n",
" <th>IsAlone</th>\n",
" <th>Age*Class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>892</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>7.8292</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>893</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>7.0000</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>894</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>9.6875</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>895</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>8.6625</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>896</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>12.2875</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Pclass Sex Age Fare Embarked Title IsAlone Age*Class\n",
"0 892 3 0 2 7.8292 2 1 1 6\n",
"1 893 3 1 2 7.0000 0 3 0 6\n",
"2 894 2 0 3 9.6875 2 1 1 6\n",
"3 895 3 0 1 8.6625 0 1 1 3\n",
"4 896 3 1 1 12.2875 0 3 0 3"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test_df['Fare'].fillna(test_df['Fare'].dropna().median(), inplace=True)\n",
"test_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "4b816bc7-d1fb-c02b-ed1d-ee34b819497d"
},
"source": [
"We can not create FareBand."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"_cell_guid": "0e9018b1-ced5-9999-8ce1-258a0952cbf2"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>FareBand</th>\n",
" <th>Survived</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>[0, 7.91]</td>\n",
" <td>0.197309</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>(7.91, 14.454]</td>\n",
" <td>0.303571</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>(14.454, 31]</td>\n",
" <td>0.454955</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>(31, 512.329]</td>\n",
" <td>0.581081</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" FareBand Survived\n",
"0 [0, 7.91] 0.197309\n",
"1 (7.91, 14.454] 0.303571\n",
"2 (14.454, 31] 0.454955\n",
"3 (31, 512.329] 0.581081"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_df['FareBand'] = pd.qcut(train_df['Fare'], 4)\n",
"train_df[['FareBand', 'Survived']].groupby(['FareBand'], as_index=False).mean().sort_values(by='FareBand', ascending=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "d65901a5-3684-6869-e904-5f1a7cce8a6d"
},
"source": [
"Convert the Fare feature to ordinal values based on the FareBand."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"_cell_guid": "385f217a-4e00-76dc-1570-1de4eec0c29c"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>Title</th>\n",
" <th>IsAlone</th>\n",
" <th>Age*Class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived Pclass Sex Age Fare Embarked Title IsAlone Age*Class\n",
"0 0 3 0 1 0 0 1 0 3\n",
"1 1 1 1 2 3 1 3 0 2\n",
"2 1 3 1 1 1 0 2 1 3\n",
"3 1 1 1 2 3 0 3 0 2\n",
"4 0 3 0 2 1 0 1 1 6\n",
"5 0 3 0 1 1 2 1 1 3\n",
"6 0 1 0 3 3 0 1 1 3\n",
"7 0 3 0 0 2 0 4 0 0\n",
"8 1 3 1 1 1 0 3 0 3\n",
"9 1 2 1 0 2 1 3 0 0"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for dataset in combine:\n",
" dataset.loc[ dataset['Fare'] <= 7.91, 'Fare'] = 0\n",
" dataset.loc[(dataset['Fare'] > 7.91) & (dataset['Fare'] <= 14.454), 'Fare'] = 1\n",
" dataset.loc[(dataset['Fare'] > 14.454) & (dataset['Fare'] <= 31), 'Fare'] = 2\n",
" dataset.loc[ dataset['Fare'] > 31, 'Fare'] = 3\n",
" dataset['Fare'] = dataset['Fare'].astype(int)\n",
"\n",
"train_df = train_df.drop(['FareBand'], axis=1)\n",
"combine = [train_df, test_df]\n",
" \n",
"train_df.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "27272bb9-3c64-4f9a-4a3b-54f02e1c8289"
},
"source": [
"And the test dataset."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"_cell_guid": "d2334d33-4fe5-964d-beac-6aa620066e15"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>Title</th>\n",
" <th>IsAlone</th>\n",
" <th>Age*Class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>892</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>893</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>894</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>895</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>896</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>897</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>898</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>899</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>900</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>901</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Pclass Sex Age Fare Embarked Title IsAlone Age*Class\n",
"0 892 3 0 2 0 2 1 1 6\n",
"1 893 3 1 2 0 0 3 0 6\n",
"2 894 2 0 3 1 2 1 1 6\n",
"3 895 3 0 1 1 0 1 1 3\n",
"4 896 3 1 1 1 0 3 0 3\n",
"5 897 3 0 0 1 0 1 1 0\n",
"6 898 3 1 1 0 2 2 1 3\n",
"7 899 2 0 1 2 0 1 0 2\n",
"8 900 3 1 1 0 1 3 1 3\n",
"9 901 3 0 1 2 0 1 0 3"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test_df.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "69783c08-c8cc-a6ca-2a9a-5e75581c6d31"
},
"source": [
"## Model, predict and solve\n",
"\n",
"Now we are ready to train a model and predict the required solution. There are 60+ predictive modelling algorithms to choose from. We must understand the type of problem and solution requirement to narrow down to a select few models which we can evaluate. Our problem is a classification and regression problem. We want to identify relationship between output (Survived or not) with other variables or features (Gender, Age, Port...). We are also perfoming a category of machine learning which is called supervised learning as we are training our model with a given dataset. With these two criteria - Supervised Learning plus Classification and Regression, we can narrow down our choice of models to a few. These include:\n",
"\n",
"- Logistic Regression\n",
"- KNN or k-Nearest Neighbors\n",
"- Support Vector Machines\n",
"- Naive Bayes classifier\n",
"- Decision Tree\n",
"- Random Forrest\n",
"- Perceptron\n",
"- Artificial neural network\n",
"- RVM or Relevance Vector Machine"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"_cell_guid": "0acf54f9-6cf5-24b5-72d9-29b30052823a"
},
"outputs": [
{
"data": {
"text/plain": [
"((891, 8), (891,), (418, 8))"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train = train_df.drop(\"Survived\", axis=1)\n",
"Y_train = train_df[\"Survived\"]\n",
"X_test = test_df.drop(\"PassengerId\", axis=1).copy()\n",
"X_train.shape, Y_train.shape, X_test.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "579bc004-926a-bcfe-e9bb-c8df83356876"
},
"source": [
"Logistic Regression is a useful model to run early in the workflow. Logistic regression measures the relationship between the categorical dependent variable (feature) and one or more independent variables (features) by estimating probabilities using a logistic function, which is the cumulative logistic distribution. Reference [Wikipedia](https://en.wikipedia.org/wiki/Logistic_regression).\n",
"\n",
"Note the confidence score generated by the model based on our training dataset."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"_cell_guid": "0edd9322-db0b-9c37-172d-a3a4f8dec229"
},
"outputs": [
{
"data": {
"text/plain": [
"80.359999999999999"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Logistic Regression\n",
"\n",
"logreg = LogisticRegression()\n",
"logreg.fit(X_train, Y_train)\n",
"Y_pred = logreg.predict(X_test)\n",
"acc_log = round(logreg.score(X_train, Y_train) * 100, 2)\n",
"acc_log"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "3af439ae-1f04-9236-cdc2-ec8170a0d4ee"
},
"source": [
"We can use Logistic Regression to validate our assumptions and decisions for feature creating and completing goals. This can be done by calculating the coefficient of the features in the decision function.\n",
"\n",
"Positive coefficients increase the log-odds of the response (and thus increase the probability), and negative coefficients decrease the log-odds of the response (and thus decrease the probability).\n",
"\n",
"- Sex is highest positivie coefficient, implying as the Sex value increases (male: 0 to female: 1), the probability of Survived=1 increases the most.\n",
"- Inversely as Pclass increases, probability of Survived=1 decreases the most.\n",
"- This way Age*Class is a good artificial feature to model as it has second highest negative correlation with Survived.\n",
"- So is Title as second highest positive correlation."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"_cell_guid": "e545d5aa-4767-7a41-5799-a4c5e529ce72"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Feature</th>\n",
" <th>Correlation</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Sex</td>\n",
" <td>2.201527</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Title</td>\n",
" <td>0.398234</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Age</td>\n",
" <td>0.287163</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Embarked</td>\n",
" <td>0.261762</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>IsAlone</td>\n",
" <td>0.129140</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Fare</td>\n",
" <td>-0.085150</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Age*Class</td>\n",
" <td>-0.311200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Pclass</td>\n",
" <td>-0.749007</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Feature Correlation\n",
"1 Sex 2.201527\n",
"5 Title 0.398234\n",
"2 Age 0.287163\n",
"4 Embarked 0.261762\n",
"6 IsAlone 0.129140\n",
"3 Fare -0.085150\n",
"7 Age*Class -0.311200\n",
"0 Pclass -0.749007"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coeff_df = pd.DataFrame(train_df.columns.delete(0))\n",
"coeff_df.columns = ['Feature']\n",
"coeff_df[\"Correlation\"] = pd.Series(logreg.coef_[0])\n",
"\n",
"coeff_df.sort_values(by='Correlation', ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "ac041064-1693-8584-156b-66674117e4d0"
},
"source": [
"Next we model using Support Vector Machines which are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training samples, each marked as belonging to one or the other of **two categories**, an SVM training algorithm builds a model that assigns new test samples to one category or the other, making it a non-probabilistic binary linear classifier. Reference [Wikipedia](https://en.wikipedia.org/wiki/Support_vector_machine).\n",
"\n",
"Note that the model generates a confidence score which is higher than Logistics Regression model."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"_cell_guid": "7a63bf04-a410-9c81-5310-bdef7963298f"
},
"outputs": [
{
"data": {
"text/plain": [
"83.840000000000003"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Support Vector Machines\n",
"\n",
"svc = SVC()\n",
"svc.fit(X_train, Y_train)\n",
"Y_pred = svc.predict(X_test)\n",
"acc_svc = round(svc.score(X_train, Y_train) * 100, 2)\n",
"acc_svc"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "172a6286-d495-5ac4-1a9c-5b77b74ca6d2"
},
"source": [
"In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression. A sample is classified by a majority vote of its neighbors, with the sample being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor. Reference [Wikipedia](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm).\n",
"\n",
"KNN confidence score is better than Logistics Regression but worse than SVM."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"_cell_guid": "ca14ae53-f05e-eb73-201c-064d7c3ed610"
},
"outputs": [
{
"data": {
"text/plain": [
"84.739999999999995"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"knn = KNeighborsClassifier(n_neighbors = 3)\n",
"knn.fit(X_train, Y_train)\n",
"Y_pred = knn.predict(X_test)\n",
"acc_knn = round(knn.score(X_train, Y_train) * 100, 2)\n",
"acc_knn"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "810f723d-2313-8dfd-e3e2-26673b9caa90"
},
"source": [
"In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Naive Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables (features) in a learning problem. Reference [Wikipedia](https://en.wikipedia.org/wiki/Naive_Bayes_classifier).\n",
"\n",
"The model generated confidence score is the lowest among the models evaluated so far."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"_cell_guid": "50378071-7043-ed8d-a782-70c947520dae"
},
"outputs": [
{
"data": {
"text/plain": [
"72.280000000000001"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Gaussian Naive Bayes\n",
"\n",
"gaussian = GaussianNB()\n",
"gaussian.fit(X_train, Y_train)\n",
"Y_pred = gaussian.predict(X_test)\n",
"acc_gaussian = round(gaussian.score(X_train, Y_train) * 100, 2)\n",
"acc_gaussian"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "1e286e19-b714-385a-fcfa-8cf5ec19956a"
},
"source": [
"The perceptron is an algorithm for supervised learning of binary classifiers (functions that can decide whether an input, represented by a vector of numbers, belongs to some specific class or not). It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector. The algorithm allows for online learning, in that it processes elements in the training set one at a time. Reference [Wikipedia](https://en.wikipedia.org/wiki/Perceptron)."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"_cell_guid": "ccc22a86-b7cb-c2dd-74bd-53b218d6ed0d"
},
"outputs": [
{
"data": {
"text/plain": [
"78.0"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Perceptron\n",
"\n",
"perceptron = Perceptron()\n",
"perceptron.fit(X_train, Y_train)\n",
"Y_pred = perceptron.predict(X_test)\n",
"acc_perceptron = round(perceptron.score(X_train, Y_train) * 100, 2)\n",
"acc_perceptron"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"_cell_guid": "a4d56857-9432-55bb-14c0-52ebeb64d198"
},
"outputs": [
{
"data": {
"text/plain": [
"79.010000000000005"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Linear SVC\n",
"\n",
"linear_svc = LinearSVC()\n",
"linear_svc.fit(X_train, Y_train)\n",
"Y_pred = linear_svc.predict(X_test)\n",
"acc_linear_svc = round(linear_svc.score(X_train, Y_train) * 100, 2)\n",
"acc_linear_svc"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"_cell_guid": "dc98ed72-3aeb-861f-804d-b6e3d178bf4b"
},
"outputs": [
{
"data": {
"text/plain": [
"77.329999999999998"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Stochastic Gradient Descent\n",
"\n",
"sgd = SGDClassifier()\n",
"sgd.fit(X_train, Y_train)\n",
"Y_pred = sgd.predict(X_test)\n",
"acc_sgd = round(sgd.score(X_train, Y_train) * 100, 2)\n",
"acc_sgd"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "bae7f8d7-9da0-f4fd-bdb1-d97e719a18d7"
},
"source": [
"This model uses a decision tree as a predictive model which maps features (tree branches) to conclusions about the target value (tree leaves). Tree models where the target variable can take a finite set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. Reference [Wikipedia](https://en.wikipedia.org/wiki/Decision_tree_learning).\n",
"\n",
"The model confidence score is the highest among models evaluated so far."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"_cell_guid": "dd85f2b7-ace2-0306-b4ec-79c68cd3fea0"
},
"outputs": [
{
"data": {
"text/plain": [
"86.760000000000005"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Decision Tree\n",
"\n",
"decision_tree = DecisionTreeClassifier()\n",
"decision_tree.fit(X_train, Y_train)\n",
"Y_pred = decision_tree.predict(X_test)\n",
"acc_decision_tree = round(decision_tree.score(X_train, Y_train) * 100, 2)\n",
"acc_decision_tree"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "85693668-0cd5-4319-7768-eddb62d2b7d0"
},
"source": [
"The next model Random Forests is one of the most popular. Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees (n_estimators=100) at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Reference [Wikipedia](https://en.wikipedia.org/wiki/Random_forest).\n",
"\n",
"The model confidence score is the highest among models evaluated so far. We decide to use this model's output (Y_pred) for creating our competition submission of results."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"_cell_guid": "f0694a8e-b618-8ed9-6f0d-8c6fba2c4567"
},
"outputs": [
{
"data": {
"text/plain": [
"86.760000000000005"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Random Forest\n",
"\n",
"random_forest = RandomForestClassifier(n_estimators=100)\n",
"random_forest.fit(X_train, Y_train)\n",
"Y_pred = random_forest.predict(X_test)\n",
"random_forest.score(X_train, Y_train)\n",
"acc_random_forest = round(random_forest.score(X_train, Y_train) * 100, 2)\n",
"acc_random_forest"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "f6c9eef8-83dd-581c-2d8e-ce932fe3a44d"
},
"source": [
"### Model evaluation\n",
"\n",
"We can now rank our evaluation of all the models to choose the best one for our problem. While both Decision Tree and Random Forest score the same, we choose to use Random Forest as they correct for decision trees' habit of overfitting to their training set. "
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"_cell_guid": "1f3cebe0-31af-70b2-1ce4-0fd406bcdfc6"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Model</th>\n",
" <th>Score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Random Forest</td>\n",
" <td>86.76</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Decision Tree</td>\n",
" <td>86.76</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>KNN</td>\n",
" <td>84.74</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Support Vector Machines</td>\n",
" <td>83.84</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Logistic Regression</td>\n",
" <td>80.36</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Linear SVC</td>\n",
" <td>79.01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Perceptron</td>\n",
" <td>78.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Stochastic Gradient Decent</td>\n",
" <td>77.33</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Naive Bayes</td>\n",
" <td>72.28</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Model Score\n",
"3 Random Forest 86.76\n",
"8 Decision Tree 86.76\n",
"1 KNN 84.74\n",
"0 Support Vector Machines 83.84\n",
"2 Logistic Regression 80.36\n",
"7 Linear SVC 79.01\n",
"5 Perceptron 78.00\n",
"6 Stochastic Gradient Decent 77.33\n",
"4 Naive Bayes 72.28"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"models = pd.DataFrame({\n",
" 'Model': ['Support Vector Machines', 'KNN', 'Logistic Regression', \n",
" 'Random Forest', 'Naive Bayes', 'Perceptron', \n",
" 'Stochastic Gradient Decent', 'Linear SVC', \n",
" 'Decision Tree'],\n",
" 'Score': [acc_svc, acc_knn, acc_log, \n",
" acc_random_forest, acc_gaussian, acc_perceptron, \n",
" acc_sgd, acc_linear_svc, acc_decision_tree]})\n",
"models.sort_values(by='Score', ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"_cell_guid": "28854d36-051f-3ef0-5535-fa5ba6a9bef7"
},
"outputs": [],
"source": [
"submission = pd.DataFrame({\n",
" \"PassengerId\": test_df[\"PassengerId\"],\n",
" \"Survived\": Y_pred\n",
" })\n",
"# submission.to_csv('../output/submission.csv', index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "fcfc8d9f-e955-cf70-5843-1fb764c54699"
},
"source": [
"Our submission to the competition site Kaggle results in scoring 3,883 of 6,082 competition entries. This result is indicative while the competition is running. This result only accounts for part of the submission dataset. Not bad for our first attempt. Any suggestions to improve our score are most welcome."
]
},
{
"cell_type": "markdown",
"metadata": {
"_cell_guid": "aeec9210-f9d8-cd7c-c4cf-a87376d5f693"
},
"source": [
"## References\n",
"\n",
"This notebook has been created based on great work done solving the Titanic competition and other sources.\n",
"\n",
"- [A journey through Titanic](https://www.kaggle.com/omarelgabry/titanic/a-journey-through-titanic)\n",
"- [Getting Started with Pandas: Kaggle's Titanic Competition](https://www.kaggle.com/c/titanic/details/getting-started-with-random-forests)\n",
"- [Titanic Best Working Classifier](https://www.kaggle.com/sinakhorami/titanic/titanic-best-working-classifier)"
]
}
],
"metadata": {
"_change_revision": 3,
"_is_fork": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}