@Z30G0D
Last active December 31, 2017 10:07
This file describes the titanic competition
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Titanic - Kaggle Competition\n",
"## Hey all!\n",
"This is my first Python notebook, walking through my code for the Kaggle \"Titanic\" contest.\n",
"The competition page is __[here](https://www.kaggle.com/c/titanic#evaluation)__.\n",
"\n",
"I am still in the early stages of my Machine Learning journey, so this could be a nice tutorial for people who are at the same stage.\n",
"\n",
"The goal of this competition is to correctly classify the passengers who survived or did not survive the disaster.\n",
"\n",
"We will start by importing the relevant packages and examining our data."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"# Image and HTML are both exposed by IPython.display\n",
"from IPython.display import Image, HTML\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We first load the data and look at how it is represented."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"train = pd.read_csv('train.csv')\n",
"test = pd.read_csv('test.csv')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train.head(n=5)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((891, 12), (418, 11))"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train.shape, test.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, the training set has 891 samples and 12 columns, while the test set has 418 samples and 11 columns (it has no \"Survived\" column, which is the target to predict).\n",
"\n",
"Let's visualize the data to see how each feature relates to survival, so we can choose which features to use later on.\n",
"From what I gather from most participants, the key to this competition is handling the features well enough to help the classifier obtain decent results."
]
},
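{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before plotting, a quick sanity check can help. The `NaN` entries in the `Cabin` column of `train.head()` above hint that some values are missing. The cell below is only a minimal sketch, assuming `train` is the DataFrame loaded earlier: it counts the missing values in each column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: number of missing (NaN) values per training column.\n",
"# Assumes `train` was loaded with pd.read_csv('train.csv') as above.\n",
"train.isnull().sum()"
]
},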
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def Visualize_Sex(train):\n",
"    \"\"\"Plot survivor and non-survivor counts, split by sex.\"\"\"\n",
"    # Count passengers of each sex among survivors and non-survivors\n",
"    survived = train[train['Survived'] == 1]['Sex'].value_counts()\n",
"    not_survived = train[train['Survived'] == 0]['Sex'].value_counts()\n",
"    df = pd.DataFrame([survived, not_survived])\n",
"    df.index = ['Survived', 'Not survived']\n",
"    # One horizontal bar per sex within each outcome group\n",
"    df.plot(kind='barh', color=['r', 'b'])\n",
"    plt.show()\n",
"    return 0"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAacAAAD8CAYAAADT0WsYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAEe5JREFUeJzt3XuQnXV9x/H3JxKJkQjlYgsETZQMCRAuEkSg03KpQpVbKShtnELBUgZasYMwAoaJMOpYY1VQ1FBsgAG5CqQMKoJkELEkWXIhFBGCAaO0QJRIUgNJ+PWP8wQ3yYbsbnb3/JJ9v2bO7PP8zu8853u+k+wnz++cnCelFCRJqsmQdhcgSdK6DCdJUnUMJ0lSdQwnSVJ1DCdJUnUMJ0lSdQwnSVJ1DCdJUnUMJ0lSdbZqdwGbqx133LGMGjWq3WVI0majo6PjxVLKTt2Zazj10qhRo5g9e3a7y5CkzUaSZ7o712U9SVJ1DCdJUnUMJ0lSdQwnSVJ1DCdJUnUMJ0lSdQwnSVJ1DCdJUnUMJ0lSdQwnSVJ1DCdJUnUMJ0lSdQwnSVJ1DCdJUnUMJ0lSdQwnSVJ1DCdJUnUMJ0lSdQwnSVJ1DCdJUnUMJ0lSdQwnSVJ1DCdJUnUMJ0lSdQwnSVJ1DCdJUnW2ancBm6uODkjaXYUkDZxSBu65PHOSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFVno+GUpCT5Uqf9TyaZvJHHnJBkzz6or9uS3J1kuz44zuQkn+yLmiRJvdOdM6dXgBOT7NiD454A9Hk4JXnThu4rpXywlPJSXz+nJGngdSecVgFTgX9Z944k70xyX5L5zc93JDkEOA74YpK5Sd69zmNOTrIgybwkDzRjpyX5Wqc5dyU5rNleluTSJA8DFyW5udO8w5L8Z7O9KMmOSb6Q5OxOcyYnOa/ZPj/JrKbez3Sac3GSJ5LcC+zRjZ5IkvrRVt2c93VgfpJ/XWf8a8C1pZRrkpwOXF5KOSHJdOCuUsqtXRzrEuCoUsqvurkM91ZgQSnlkiRbAU8neWspZTnwEeCmdebfCHwFuLLZ/zBwdJIPAGOA9wIBpif5M2A5cAqwP61+PAJ0dKMuSVI/6VY4lVJ+l+Ra4OPA7zvddTBwYrN9HbBueHXlJ8C05gzou92Yvxq4raljVZLvA8cmuRX4EHDBOrXOSfL2JLsAOwG/LaU8m+TjwAeAOc3UbWiF1Qjg9lLK/wE0wdqlJGcCZwK8A3iGdKP8zVAp7a5A0iDX3TMnaJ2NPAL8xxvM2ehvtVLKWUkOohUsc5PsR2vpsPMS47BO2ytKKas77d8EnAP8BphVSnm5i6e5FTgJ+BNaZ1LQOlv6fCnlW50nJvlEd+puap9Ka4mTCYm/wSWpn3T7o+SllN8ANwNndBp+iNaSGMBE4MFm+2VaZyTrSfLuUsrDpZRLgBeB3YBFwH5JhiTZjdbS24bMAN4D/APrL+mtcWNT10m0ggrgB8DpSbZp6tg1yduBB4C/SvKWJCOAY9/guSVJA6AnZ04AXwL+qdP+x4FvJzkfeAH4+2b8RuCqZintpFLKwk6P+WKSMbTOZO4D5jXjvwAeBRbQOkPrUilldZK7gNOAUzcw57EmaH5VSnmuGbsnyTjgp0kAlgEfLaU8kuQmYC7wDPDjbnVCktRvUnx/oVcmJGV2u4voL/6ZkNQPknSUUiZ0Z67fECFJqo7hJEmqjuEkSaqO4SRJqo7hJEmqjuEkSaqO4SRJqo7hJEmqjuEkSaqO4SRJqo7hJEmqjuEkSaqO4SRJqo7hJEmqjuEkSaqO4SRJqo7hJEmqjuEkSaqO4SRJqo7hJEmqjuEkSaqO4SRJqs5W7S5gs3
XAATB7drurkKQtkmdOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOps1e4CNlcdHZC0u4r6ldLuCiRtjjxzkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFXHcJIkVcdwkiRVx3CSJFVnwMMpycVJHksyP8ncJAf1wTGPS/KpPqpvWV8cR5LUe1sN5JMlORg4BnhPKeWVJDsCb+7mY7cqpazq6r5SynRget9VKklqp4E+c9oZeLGU8gpAKeXFUsqvkyxqgookE5LMaLYnJ5ma5B7g2iQPJ9lrzcGSzEhyQJLTknwtybbNsYY09w9P8sskQ5O8O8n3k3Qk+XGSsc2c0Ul+mmRWkssGuB+SpC4MdDjdA+yW5OdJrkzy5914zAHA8aWUvwVuBD4MkGRnYJdSSseaiaWUpcA8YM1xjwV+UEpZCUwF/rmUcgDwSeDKZs5XgW+UUg4E/ueNCklyZpLZSWa/gw4K8baRG/HWpzdpkBjQcCqlLKMVNmcCLwA3JTltIw+bXkr5fbN9M3Bys/1h4JYu5t8EfKTZPqV5jm2AQ4BbkswFvkXrLA7gUOA7zfZ1G6l/aillQillwk4bKVqS1HsD+p4TQCllNTADmJHkUeBUYBV/CMph6zxkeafH/irJkiT70Aqgf+ziKaYDn0+yPa0g/BHwVuClUsp+Gyqrly9HktQPBvTMKckeScZ0GtoPeAZYRCtIAP56I4e5EbgA2LaU8ui6dzZnZzNpLdfdVUpZXUr5HfCLJCc3dSTJvs1DfkLrDAtgYs9flSSprw30e07bANck+e8k84E9gcnAZ4CvJvkxsHojx7iVVpjc/AZzbgI+2vxcYyJwRpJ5wGPA8c34ucA5SWYB2/bs5UiS+kNKcUWrNyYkZXa7i9Dg499XbcaSdJRSJnRnrt8QIUmqjuEkSaqO4SRJqo7hJEmqjuEkSaqO4SRJqo7hJEmqjuEkSaqO4SRJqo7hJEmqjuEkSarOgF8yQ5K2JCtXrmTx4sWsWLGi3aVUY9iwYYwcOZKhQ4f2+hiGkyRtgsWLFzNixAhGjRpFvFoxpRSWLFnC4sWLGT16dK+P47KeJG2CFStWsMMOOxhMjSTssMMOm3wmaThJ0iYymNbWF/0wnCRpM3f55Zczbtw4Jk7sn4t5T548mSlTpvTLsTfE95wkqS/19VlUNy4weeWVV/K9731vk97jqY3hJEmbsbPOOounn36a4447jlNOOYWFCxfy6KOPsmrVKiZPnszxxx/PtGnTuOOOO1i9ejULFizgvPPO49VXX+W6665j66235u6772b77bfnqquuYurUqbz66qvsvvvuXHfddQwfPnyt51u4cCHnnHMOL7zwAsOHD+eqq65i7Nixff66XNaTpM3YN7/5TXbZZRfuv/9+li9fzhFHHMGsWbO4//77Of/881m+fDkACxYs4IYbbmDmzJlcfPHFDB8+nDlz5nDwwQdz7bXXAnDiiScya9Ys5s2bx7hx47j66qvXe74zzzyTK664go6ODqZMmcLZZ5/dL6/LMydJ2kLcc889TJ8+/fX3h1asWMGzzz4LwOGHH86IESMYMWIE2267LcceeywA48
ePZ/78+UArwD796U/z0ksvsWzZMo466qi1jr9s2TIeeughTj755NfHXnnllX55LYaTJG0hSincdttt7LHHHmuNP/zww2y99dav7w8ZMuT1/SFDhrBq1SoATjvtNO644w723Xdfpk2bxowZM9Y6zmuvvcZ2223H3Llz+/eF4LKeJG0xjjrqKK644gpK8yGKOXPm9OjxL7/8MjvvvDMrV67k+uuvX+/+t73tbYwePZpbbrkFaIXhvHnzNr3wLhhOvXXAAa1P0XjzNpA36Q1MmjSJlStXss8++7D33nszadKkHj3+sssu46CDDuL973//Bj/kcP3113P11Vez7777stdee3HnnXf2RenrSfEPfK9MmDChzJ49u91lSGqzxx9/nHHjxrW7jOp01ZckHaWUCd15vGdOkqTqGE6SpOoYTpKk6hhOkqTqGE6SpOoYTpKk6hhOkjSIzZgxg2OOOabdZazHry+SpD7UhitmbJE8c5KkzdyiRYsYO3YsH/vYx9h7772ZOHEi9957L4ceeihjxoxh5syZzJw5k0MOOYT999+fQw45hCeeeGK94yxfvpzTTz+dAw88kP3337/fvv2hOwwnSdoCPPXUU5x77rnMnz+fn/3sZ9xwww08+OCDTJkyhc997nOMHTuWBx54gDlz5nDppZdy0UUXrXeMz372sxu85MZAc1lPkrYAo0ePZvz48QDstddeHHnkkSRh/PjxLFq0iKVLl3Lqqafy5JNPkoSVK1eud4wNXXKjHV/PZDhJ0hZgY5fEmDRpEocffji33347ixYt4rDDDlvvGBu65EY7uKwnSYPA0qVL2XXXXQGYNm1al3M29ZIbfclwkqRB4IILLuDCCy/k0EMPZfXq1V3O2dRLbvQlL5nRS14yQxJ4yYwN8ZIZkqQtjuEkSaqO4SRJqo7hJEmbyPfu19YX/TCcJGkTDBs2jCVLlhhQjVIKS5YsYdiwYZt0HP8TriRtgpEjR7J48WJeeOGFdpdSjWHDhjFy5MhNOobhJEmbYOjQoYwePbrdZWxxXNaTJFXHcJIkVcdwkiRVx68v6qUkLwPrX61r8NoReLHdRVTEfqzNfqxtsPbjnaWUnboz0Q9E9N4T3f2OqMEgyWz78Qf2Y232Y232Y+Nc1pMkVcdwkiRVx3DqvantLqAy9mNt9mNt9mNt9mMj/ECEJKk6njlJkqpjOPVQkqOTPJHkqSSfanc9AyHJt5M8n2RBp7Htk/wwyZPNzz9qxpPk8qY/85O8p32V948kuyW5P8njSR5Lcm4zPih7kmRYkplJ5jX9+EwzPjrJw00/bkry5mZ862b/qeb+Ue2sv78keVOSOUnuavYHdT96ynDqgSRvAr4O/CWwJ/A3SfZsb1UDYhpw9DpjnwLuK6WMAe5r9qHVmzHN7UzgGwNU40BaBZxXShkHvA84p/lzMFh78gpwRCllX2A/4Ogk7wO+AHy56cdvgTOa+WcAvy2l7A58uZm3JToXeLzT/mDvR48YTj3zXuCpUsrTpZRXgRuB49tcU78rpTwA/Gad4eOBa5rta4ATOo1fW1r+C9guyc4DU+nAKKU8V0p5pNl+mdYvoF0ZpD1pXteyZndocyvAEcCtzfi6/VjTp1uBI5NkgModEElGAh8C/r3ZD4O4H71hOPXMrsAvO+0vbsYGoz8upTwHrV/WwNub8UHVo2YJZn/gYQZxT5olrLnA88APgYXAS6WUVc2Uzq/59X409y8FdhjYivvdV4ALgNea/R0Y3P3oMcOpZ7r614wfd1zboOlRkm2A24BPlFJ+90ZTuxjbonpSSlldStkPGElrhWFcV9Oan1t0P5IcAzxfSunoPNzF1EHRj94ynHpmMbBbp/2RwK/bVEu7/e+apanm5/PN+KDoUZKhtILp+lLKd5vhQd0TgFLKS8AMWu/FbZdkzVekdX7Nr/ejuX9b1l823pwdChyXZBGtpf8jaJ1JDdZ+9Irh1DOzgDHNp27eDJwCTG9zTe0yHTi12T4VuL
PT+N81n1B7H7B0zVLXlqJ5P+Bq4PFSyr91umtQ9iTJTkm2a7bfAvwFrffh7gdOaqat2481fToJ+FHZgv7DZSnlwlLKyFLKKFq/I35USpnIIO1Hr5VSvPXgBnwQ+DmtNfWL213PAL3m7wDPAStp/SvvDFpr4vcBTzY/t2/mhtYnGhcCjwIT2l1/P/TjT2ktu8wH5ja3Dw7WngD7AHOafiwALmnG3wXMBJ4CbgG2bsaHNftPNfe/q92voR97cxhwl/3o+c1viJAkVcdlPUlSdQwnSVJ1DCdJUnUMJ0lSdQwnSVJ1DCdJUnUMJ0lSdQwnSVJ1/h/5oJ7/F7HGhQAAAABJRU5ErkJggg==\n",
"text/plain": [
"<matplotlib.figure.Figure at 0xb6ef780>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"a = Visualize_Sex(train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\"It is easy to notice that...\" (god, I love this phrase) the majority of the men on the ship did not survive the disaster. This suggests that sex is a crucial feature in this data set.\n",
"![Image of Yaktocat](https://i.pinimg.com/originals/03/29/a4/0329a484e1378fbadf0125de449dd42e.jpg)\n"
]
},
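{
"cell_type": "markdown",
"metadata": {},
"source": [
"The counts in the bar chart can also be turned into survival rates. The cell below is a minimal sketch, assuming `train` is the DataFrame loaded earlier: since `Survived` is either 0 or 1, its mean within each group is the fraction of that group that survived."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: fraction of passengers who survived, per sex.\n",
"# Assumes `train` was loaded with pd.read_csv('train.csv') as above.\n",
"train.groupby('Sex')['Survived'].mean()"
]
},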
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's examine the fare feature."
]
},
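{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def plot_fare(data):\n",
"    \"\"\"Plot a stacked histogram of fares, split by survival.\"\"\"\n",
"    plt.figure()\n",
"    # Bins of width 10; the extra step keeps the highest fare inside the last bin\n",
"    fare_bins = np.arange(0, data['Fare'].max() + 10, 10)\n",
"    survived_fare = data[data['Survived'] == 1]['Fare']\n",
"    not_survived_fare = data[data['Survived'] == 0]['Fare']\n",
"    plt.hist([survived_fare, not_survived_fare], stacked=True, bins=fare_bins,\n",
"             color=['y', 'g'], label=['Survived', 'Not Survived'])\n",
"    plt.xlabel('Fare')\n",
"    plt.ylabel('Number of passengers')\n",
"    plt.legend(loc='upper right')\n",
"    plt.show()\n",
"    return 0"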
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0xb92db70>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"a = plot_fare(train)"
]
},
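{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put numbers next to the histogram, one option is to bin the fares and compute the survival rate in each bin. This is only a sketch: the bin edges below are values picked for illustration, not something given by the competition, and `train` is assumed to be the DataFrame loaded earlier."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical bin edges, chosen for illustration only\n",
"fare_edges = [0, 10, 30, 100, 600]\n",
"# include_lowest=True keeps fares of exactly 0 in the first bin\n",
"fare_group = pd.cut(train['Fare'], bins=fare_edges, include_lowest=True)\n",
"# Survival rate and passenger count in each fare range\n",
"train.groupby(fare_group)['Survived'].agg(['mean', 'count'])"
]
},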
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}