@rebeccabilbro
Created November 4, 2015 01:45
{"nbformat_minor": 0, "cells": [{"source": "# TITANIC: Wrangling the Passenger Manifest\n\n## Exploratory Analysis with Pandas\n\nThis tutorial is based on the Kaggle Competition,\n\"Predicting Survival Aboard the Titanic\"\nhttps://www.kaggle.com/c/titanic\n\n___Be sure to read the README before you begin!___\n\nSee also: \nhttp://www.analyticsvidhya.com/blog/2014/08/baby-steps-python-performing-exploratory-analysis-python/ \nhttp://www.analyticsvidhya.com/blog/2014/09/data-munging-python-using-pandas-baby-steps-python/", "cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": "code", "source": "import pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport pandas.io.sql as pd_sql\nimport sqlite3 as sql\n\n%matplotlib inline", "outputs": [], "metadata": {"collapsed": true, "trusted": false}}, {"source": "Here's a ```sqlite``` database for you to store the data once it's ready:", "cell_type": "markdown", "metadata": {}}, {"execution_count": 2, "cell_type": "code", "source": "con = sql.connect(\"titanic.db\") ", "outputs": [], "metadata": {"collapsed": true, "trusted": false}}, {"source": "__=>YOUR TURN!__\n\nUse ```pandas``` to open up the csv.\n\nRead the documentation to find out how: \nhttp://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html", "cell_type": "markdown", "metadata": {}}, {"execution_count": 3, "cell_type": "code", "source": "df = pd.read_csv(\"../titanic/data/train.csv\") ", "outputs": [], "metadata": {"collapsed": true, "trusted": false}}, {"source": "### Exploring the Tabular Data", "cell_type": "markdown", "metadata": {}}, {"source": "The file we'll be exploring today, ```train.csv```, is the training set -- it represents\na subset of the full passenger manifest dataset. 
The rest of the data is in another\nfile called ```test.csv``` - we'll use that later (when we get to Machine Learning).\nLet's take a look...", "cell_type": "markdown", "metadata": {}}, {"source": "__=>YOUR TURN!__\n\nUse ```pandas``` to view the \"head\" of the file with the first 10 rows.\n\nRead the documentation to find out how: \nhttp://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html", "cell_type": "markdown", "metadata": {}}, {"execution_count": 4, "cell_type": "code", "source": "df.head(10)", "outputs": [{"execution_count": 4, "output_type": "execute_result", "data": {"text/plain": " PassengerId Survived Pclass \\\n0 1 0 3 \n1 2 1 1 \n2 3 1 3 \n3 4 1 1 \n4 5 0 3 \n5 6 0 3 \n6 7 0 1 \n7 8 0 3 \n8 9 1 3 \n9 10 1 2 \n\n Name Sex Age SibSp \\\n0 Braund, Mr. Owen Harris male 22 1 \n1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38 1 \n2 Heikkinen, Miss. Laina female 26 0 \n3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 1 \n4 Allen, Mr. William Henry male 35 0 \n5 Moran, Mr. James male NaN 0 \n6 McCarthy, Mr. Timothy J male 54 0 \n7 Palsson, Master. Gosta Leonard male 2 3 \n8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27 0 \n9 Nasser, Mrs. Nicholas (Adele Achem) female 14 1 \n\n Parch Ticket Fare Cabin Embarked \n0 0 A/5 21171 7.2500 NaN S \n1 0 PC 17599 71.2833 C85 C \n2 0 STON/O2. 
3101282 7.9250 NaN S \n3 0 113803 53.1000 C123 S \n4 0 373450 8.0500 NaN S \n5 0 330877 8.4583 NaN Q \n6 0 17463 51.8625 E46 S \n7 1 349909 21.0750 NaN S \n8 2 347742 11.1333 NaN S \n9 0 237736 30.0708 NaN C ", "text/html": "<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>PassengerId</th>\n <th>Survived</th>\n <th>Pclass</th>\n <th>Name</th>\n <th>Sex</th>\n <th>Age</th>\n <th>SibSp</th>\n <th>Parch</th>\n <th>Ticket</th>\n <th>Fare</th>\n <th>Cabin</th>\n <th>Embarked</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>0</td>\n <td>3</td>\n <td>Braund, Mr. Owen Harris</td>\n <td>male</td>\n <td>22</td>\n <td>1</td>\n <td>0</td>\n <td>A/5 21171</td>\n <td>7.2500</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>1</td>\n <td>1</td>\n <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n <td>female</td>\n <td>38</td>\n <td>1</td>\n <td>0</td>\n <td>PC 17599</td>\n <td>71.2833</td>\n <td>C85</td>\n <td>C</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>1</td>\n <td>3</td>\n <td>Heikkinen, Miss. Laina</td>\n <td>female</td>\n <td>26</td>\n <td>0</td>\n <td>0</td>\n <td>STON/O2. 3101282</td>\n <td>7.9250</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>1</td>\n <td>1</td>\n <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n <td>female</td>\n <td>35</td>\n <td>1</td>\n <td>0</td>\n <td>113803</td>\n <td>53.1000</td>\n <td>C123</td>\n <td>S</td>\n </tr>\n <tr>\n <th>4</th>\n <td>5</td>\n <td>0</td>\n <td>3</td>\n <td>Allen, Mr. William Henry</td>\n <td>male</td>\n <td>35</td>\n <td>0</td>\n <td>0</td>\n <td>373450</td>\n <td>8.0500</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>5</th>\n <td>6</td>\n <td>0</td>\n <td>3</td>\n <td>Moran, Mr. 
James</td>\n <td>male</td>\n <td>NaN</td>\n <td>0</td>\n <td>0</td>\n <td>330877</td>\n <td>8.4583</td>\n <td>NaN</td>\n <td>Q</td>\n </tr>\n <tr>\n <th>6</th>\n <td>7</td>\n <td>0</td>\n <td>1</td>\n <td>McCarthy, Mr. Timothy J</td>\n <td>male</td>\n <td>54</td>\n <td>0</td>\n <td>0</td>\n <td>17463</td>\n <td>51.8625</td>\n <td>E46</td>\n <td>S</td>\n </tr>\n <tr>\n <th>7</th>\n <td>8</td>\n <td>0</td>\n <td>3</td>\n <td>Palsson, Master. Gosta Leonard</td>\n <td>male</td>\n <td>2</td>\n <td>3</td>\n <td>1</td>\n <td>349909</td>\n <td>21.0750</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>8</th>\n <td>9</td>\n <td>1</td>\n <td>3</td>\n <td>Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)</td>\n <td>female</td>\n <td>27</td>\n <td>0</td>\n <td>2</td>\n <td>347742</td>\n <td>11.1333</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>9</th>\n <td>10</td>\n <td>1</td>\n <td>2</td>\n <td>Nasser, Mrs. Nicholas (Adele Achem)</td>\n <td>female</td>\n <td>14</td>\n <td>1</td>\n <td>0</td>\n <td>237736</td>\n <td>30.0708</td>\n <td>NaN</td>\n <td>C</td>\n </tr>\n </tbody>\n</table>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"source": "__What do you see?__\n - Are there any missing values?\n - What kinds of values/numbers/text are there?\n - Are the values continuous or categorical?\n - Are some variables more sparse than others?\n - Are there multiple values in a single column?", "cell_type": "markdown", "metadata": {}}, {"source": "__=>YOUR TURN!__\n\nUse ```pandas``` to run summary statistics on the data.\n\nRead the documentation to find out how: \nhttp://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html", "cell_type": "markdown", "metadata": {}}, {"execution_count": 5, "cell_type": "code", "source": "df.describe()", "outputs": [{"execution_count": 5, "output_type": "execute_result", "data": {"text/plain": " PassengerId Survived Pclass Age SibSp \\\ncount 891.000000 891.000000 891.000000 
714.000000 891.000000 \nmean 446.000000 0.383838 2.308642 29.699118 0.523008 \nstd 257.353842 0.486592 0.836071 14.526497 1.102743 \nmin 1.000000 0.000000 1.000000 0.420000 0.000000 \n25% 223.500000 0.000000 2.000000 20.125000 0.000000 \n50% 446.000000 0.000000 3.000000 28.000000 0.000000 \n75% 668.500000 1.000000 3.000000 38.000000 1.000000 \nmax 891.000000 1.000000 3.000000 80.000000 8.000000 \n\n Parch Fare \ncount 891.000000 891.000000 \nmean 0.381594 32.204208 \nstd 0.806057 49.693429 \nmin 0.000000 0.000000 \n25% 0.000000 7.910400 \n50% 0.000000 14.454200 \n75% 0.000000 31.000000 \nmax 6.000000 512.329200 ", "text/html": "<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>PassengerId</th>\n <th>Survived</th>\n <th>Pclass</th>\n <th>Age</th>\n <th>SibSp</th>\n <th>Parch</th>\n <th>Fare</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>count</th>\n <td>891.000000</td>\n <td>891.000000</td>\n <td>891.000000</td>\n <td>714.000000</td>\n <td>891.000000</td>\n <td>891.000000</td>\n <td>891.000000</td>\n </tr>\n <tr>\n <th>mean</th>\n <td>446.000000</td>\n <td>0.383838</td>\n <td>2.308642</td>\n <td>29.699118</td>\n <td>0.523008</td>\n <td>0.381594</td>\n <td>32.204208</td>\n </tr>\n <tr>\n <th>std</th>\n <td>257.353842</td>\n <td>0.486592</td>\n <td>0.836071</td>\n <td>14.526497</td>\n <td>1.102743</td>\n <td>0.806057</td>\n <td>49.693429</td>\n </tr>\n <tr>\n <th>min</th>\n <td>1.000000</td>\n <td>0.000000</td>\n <td>1.000000</td>\n <td>0.420000</td>\n <td>0.000000</td>\n <td>0.000000</td>\n <td>0.000000</td>\n </tr>\n <tr>\n <th>25%</th>\n <td>223.500000</td>\n <td>0.000000</td>\n <td>2.000000</td>\n <td>20.125000</td>\n <td>0.000000</td>\n <td>0.000000</td>\n <td>7.910400</td>\n </tr>\n <tr>\n <th>50%</th>\n <td>446.000000</td>\n <td>0.000000</td>\n <td>3.000000</td>\n <td>28.000000</td>\n <td>0.000000</td>\n <td>0.000000</td>\n <td>14.454200</td>\n </tr>\n <tr>\n <th>75%</th>\n <td>668.500000</td>\n 
<td>1.000000</td>\n <td>3.000000</td>\n <td>38.000000</td>\n <td>1.000000</td>\n <td>0.000000</td>\n <td>31.000000</td>\n </tr>\n <tr>\n <th>max</th>\n <td>891.000000</td>\n <td>1.000000</td>\n <td>3.000000</td>\n <td>80.000000</td>\n <td>8.000000</td>\n <td>6.000000</td>\n <td>512.329200</td>\n </tr>\n </tbody>\n</table>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"source": "__What can we infer from the summary statistics?__\n - How many missing values does the 'Age' column have?\n - What's the age distribution?\n - What percent of the passengers survived?\n - How many passengers belonged to Class 3?\n - Are there any outliers in the 'Fare' column?", "cell_type": "markdown", "metadata": {}}, {"source": "__=>YOUR TURN!__\n\nUse ```pandas``` to get the median for the Age column.\n\nRead the documentation to find out how: \nhttp://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.median.html", "cell_type": "markdown", "metadata": {}}, {"execution_count": 6, "cell_type": "code", "source": "df['Age'].median()", "outputs": [{"execution_count": 6, "output_type": "execute_result", "data": {"text/plain": "28.0"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"source": "__=>YOUR TURN!__\n\nUse ```pandas``` to find the number of unique values in the Ticket column.\n\nRead the documentation to find out how: \nhttp://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.nunique.html", "cell_type": "markdown", "metadata": {}}, {"execution_count": 7, "cell_type": "code", "source": "df['Ticket'].nunique()", "outputs": [{"execution_count": 7, "output_type": "execute_result", "data": {"text/plain": "681"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"source": "### Visually Exploring the Data", "cell_type": "markdown", "metadata": {}}, {"source": "Let's look at a histogram of the age distribution.\nWhat can you tell from the graph?", "cell_type": "markdown", "metadata": 
{}}, {"execution_count": 8, "cell_type": "code", "source": "fig = plt.figure()\nax = fig.add_subplot(111)\nax.hist(df['Age'], bins = 10, range = (df['Age'].min(),df['Age'].max()))\nplt.title('Age distribution')\nplt.xlabel('Age')\nplt.ylabel('Count of Passengers')\nplt.show()", "outputs": [{"output_type": "display_data", "data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEZCAYAAAB8culNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XucHGWd7/HPl0BQAhhACLdAIhIgGlgQwlVoEFlcuR1v\n3MSIrrtHVNTjwU04K4xHV0FdLy8Vz5EVjBiygiCHBMEESGM8KjdBLiEGkAhBMuG2AYUsgfz2j3om\n0+nqmem51FTN8H2/Xv2iq7q66jvDpH9dz/NUPYoIzMzMGm1UdgAzM6seFwczM8txcTAzsxwXBzMz\ny3FxMDOzHBcHMzPLcXEwSyR1SLosPd9F0vOSNET7/p6kf07Pa5IeG4r9pv29VdLSodqfGbg4WEVJ\nqkt6RtLYYTzs+ot+IuLRiNgi+rgQSNIHJS3uc8cRH42ILw5FSEnrJL2hYd+LI2LPodi3WRcXB6sc\nSZOA6cAq4IRSwwwBSUX8OxuSMxqznrg4WBV9ALgRuAyY0fiCpG0kzZO0WtJtkr7Y+M1d0p6SFkp6\nWtJSSe/t6SCSJku6RdJzkhYAr294bVL6hr5RWv6gpIfTtn+UdJqkPYH/AxycmqCeSdv+MDUj/VzS\nX4Aj07ovNB1/lqQnJT0i6bSG9XVJH25YXn92IumXafXv0zHf29xMJWmvtI9nJd0n6fiG134o6buS\n5qef5beNZyFmXVwcrIo+APwEuAL4W0nbNbz2XeB5YAJZ4fgAqTlI0jhgIfBjYFvgFOAiSXv1cJzL\ngduBbYAvpP3lmpHSfr8FHBsRWwIHA3dHxFLgH4HfpCaorRvedirwhYjYHPhV2m/jvrdPx90xHff7\nknZPrzVvu15EHJ6e7p2OeWVT1k2AecAN6XfwCWCOpCkNm50MdABbAQ8B/9Lyt2Ovai4OVimSDgN2\nAq6NiAeBJcBp6bUxwLuA8yNiTUQ8AMymu4nlOOCRiJgdEesi4m7gaiB39iBpF2B/4HMRsTYiFpN9\nqPbUXLMOmCbptRHRGRFLunbVYtsAromI3wBExH/2sG3XsX8JXEf2oT1YBwHjIuKCiHg5IhYB88mK\nVZerI+KOiHgFmAP8zRAc10YZFwermhnAgoh4Pi1fSXfT0rbAxkDjSJ8VDc93BQ5MzSnPSnqWrLBM\naHGcHYFnI+LFhnV/ahUoIv5K9sH934E/pyaZPfr4OfoajdTq2Dv08Z527Nji2H9K6yErXJ0Nr70I\nbD4Ex7VRZuOyA5h1kfRa4H3ARpKeSKs3BcZLmkZ2FvEyMBF4ML0+sWEXjwK3RMQxbRzuCWArSZtF\nxAtp3a7AK602jogFwAJJm5I1w1wMHE4PzT89aNy21bHvSc//Coxr2Hb7fhzjz8BESWoYabUr4KGu\n1i8+c7AqOYnsw38vYJ/02AtYDMxIzSBXAx2SXps6hM+g+0P3OmCKpPdL2iQ9DkjbbSAi/gTcAXw+\nbXcYWbNUjqTtJJ2Y+h7Wkn14dxWRTmDn1Na//i2tdtNifdex3wq8k+wsCeBu4F3pZ3wj8OGm93UC\nu7XKCtwKvAB8Nu27ln6uf+8lm1mOi4NVyQeASyJiRUSsSo9O4DvAaWnk0MeB1wEryfob5gIvAaSm\nqGPIOqIfJzs7+DLQ
07USpwEHAs8A56X9NeoqOhsBn077fBp4K/DR9NpNwP3ASkmrGt7XfEbRvO4J\n4Fmyb/qXAf8YEcvSa99IP1MncClZB3vjezuA2anp7D2N+46Il4DjgXcAT5L97s5o2HdP2cw2oKIm\n+5F0Cdm3oVURMS2tm072x7oJ2TfEsyLi9vTaLOBDZN/Izk6n8Wa9knQhsF1EnFl2FrPRpMgzh0uB\nY5vWfYVshMa+ZN/UvgIgaSpZh9/U9J6LCrpwyEY4SXtI2luZ6WRfKH5Wdi6z0aawD+A0NPDZptVP\nkDUJAIwnO00HOBGYm4b1LScbez29qGw2om0BXAX8hawd/WsRcW25kcxGn+EerTQT+JWkr5EVpoPT\n+h2B3zZst4JsrLvZBiLiDmD3Pjc0s0EZ7qabH5D1J+xC1sF3SS/bupPMzKwkw33mMD0ijk7Pfwr8\nW3r+OBuOV9+Z7ian9SS5YJiZDUBE9GsY83CfOTwk6Yj0/Ciga3jdtcApksZKmkzWbHBbqx1EROUf\n559/fukZnNM5ndMZux4DUdiZg6S5wBHA69MdI88D/gH4brrK9MW0TEQskXQF3VfAnhUD/YnMzGzQ\nCisOEXFqDy8d2MP2XwK+VFSe0UqDnKjs85///JDkcC03G118LUEBarXaMB8xBvhYNIj3Nj6KNfy/\nz4FxzqE1EnKOhIwDVdgV0kXY8F5iBl1nDmX/TuQzB7MKk0RUvEPazMxGABcHMzPLcXEwM7McFwcz\nM8txcTAzsxwXBzMzy3FxMDOznOG+8Z6NUoO9Unso+FoLs6Hj4mBDpOwP5vKLk9lo4mYlMzPLcXEw\nM7McFwczM8txcTAzsxwXBzMzyymsOEi6RFKnpHub1n9C0gOS7pN0YcP6WZIelLRU0jFF5TIzs74V\nOZT1UuDbwI+6Vkg6EjgB2Dsi1kraNq2fCpwMTAV2Am6UNCUi1hWYz8zMelDYmUNELAaebVr9UeDL\nEbE2bfNkWn8iMDci1kbEcuAhYHpR2czMrHfD3eewO3C4pN9KqkvaP63fEVjRsN0KsjMIMzMrwXBf\nIb0xsFVEHCTpAOAK4A09bNvyktuOjo71z2u12qiew9XMbCDq9Tr1en1Q+yh0DmlJk4B5ETEtLV8P\nXBARt6Tlh4CDgL8HiIgL0vobgPMj4tam/XkO6SZVmUO6Chn8t2HW2kiYQ/oa4CgASVOAsRHxFHAt\ncIqksZImkzU/3TbM2czMLCmsWUnSXOAIYBtJjwHnAZcAl6ThrS8BHwCIiCWSrgCWAC8DZ/kUwcys\nPIU2Kw01NyvluVmpO4P/NsxaGwnNSmZmNgK4OJiZWY6Lg5mZ5bg4mJlZjouDmZnluDiYmVmOi4OZ\nmeW4OJiZWY6Lg5mZ5bg4mJlZjouDmZnluDiYmVmOi4OZmeW4OJiZWY6Lg5mZ5bg4mJlZTmHFQdIl\nkjrTrG/Nr31G0jpJWzesmyXpQUlLJR1TVC4zM+tbkWcOlwLHNq+UNBF4O/CnhnVTgZOBqek9F0ny\nWY2ZWUkK+wCOiMXAsy1e+jrw2aZ1JwJzI2JtRCwHHgKmF5XNzMx6N6zfziWdCKyIiHuaXtoRWNGw\nvALYadiCmZnZBjYergNJ2gw4l6xJaf3qXt7Scrb4jo6O9c9rtRq1Wm0I0pmZjR71ep16vT6ofSii\n5WfwkJA0CZgXEdMkTQNuBF5IL+8MPA4cCJwJEBEXpPfdAJwfEbc27S+KzDsSSaKHOjqcKSqRwX8b\nZq1JIiJ6+zKeM2zNShFxb0RMiIjJETGZrOlov4joBK4FTpE0VtJkYHfgtuHKZmZmGypyKOtc4NfA\nFEmPSTqzaZP1X/MiYglwBbAEuB44y6cIZmblKbRZaai5WSnPzUrdGfy3YdZapZuVzMxs5HBxMDOz\nHBcHMzPLcXEwM7McFwczM8txcTAzsxwXBzMzy+mzOEjaXNKY9HwPSSdI2qT4aGZmVp
Z2zhx+CWwq\naSfgF8AZwA+LDGVmZuVqpzgoIl4A3gVcFBHvBd5cbCwzMytTW30Okg4GTgeu68/7zMxsZGrnQ/5T\nwCzgZxFxv6TdgEXFxjIzszL1euO91BH9lYj4zPBF6plvvJfnG+91Z/DfhllrQ37jvYh4BThU2SeQ\nmZm9SrQzTejdwP+TdCXds7hFRFxdXCwzMytTO8XhNcAzwFFN610czMxGqcIm+5F0CfBOYFVETEvr\nvgocB7wEPAycGRGr02uzgA8BrwBnR8SCFvt0n0MT9zl0Z/DfhllrhUz2k66KvknS/Wl5b0n/3Ma+\nLwWObVq3AHhTROwDLCMbBYWkqcDJwNT0noskebismVlJ2vkAvhg4l+zbPsC9wKl9vSkiFgPPNq1b\nGBHr0uKtwM7p+YnA3IhYGxHLgYeA6W1kMzOzArRTHDaLiFu7FlK7ztohOPaHgJ+n5zsCKxpeWwHs\nNATHMDOzAWinQ/pJSW/sWpD0HuCJwRxU0v8CXoqIy3vZrGUDckdHx/rntVqNWq02mChmZqNOvV6n\nXq8Pah99dkinK6K/DxxC1kz0CHB6av7p672TgHldHdJp3QeBjwBvi4g1ad1MgIi4IC3fAJzfeMaS\n1rtDuok7pLsz+G/DrLWBdEj3eeYQEQ8Db5M0DtgoIp4fRMBjgXOAI7oKQ3ItcLmkr5M1J+0O3DbQ\n49irU1Wu1XSRstGgz+Ig6TM0fC1M/wBXA3dGxN29vG8ucATwekmPAeeTjU4aCyxM+/lNRJwVEUsk\nXQEsAV4GzvIpgvVfFf5kqlGgzAarnWaly4H9gXlkf/nvJBuxtCvw04i4sOiQDVlcM5q4WalKGcDN\nW1ZFA2lWaqc4LAbeERF/Scubk40yOpbs7GGvAebtNxeHPBeHKmUAFwerokIuggO2pfsaB8iGsU5I\nEwCtaf0WMzMbydoZyjoHuFXSNWRfz44n6zweR9ZHYGZmo0xb91aSdABwKNl5+/+PiDuKDtZDDjcr\nNXGzUpUygJuVrIoK6XNIOx4DbE92phEAEfHoQEIOhotDnotDlTKAi4NVUSHXOUj6BNkw1FVkd0zt\nMq31O8zMbKRrZ7TSw8D0iHh6eCL1msVnDk185lClDOAzB6uiokYrPQo8N7BIZmY2ErUzWukRYJGk\n6+ge0hoR8fXiYpmZWZnaKQ6PpsfY9DAzs1Gu7WlCJY2LiL8WnKevDO5zaOI+hyplAPc5WBUVNU3o\nIZKWAEvT8j6SLhpgRjMzGwHa6ZD+Jtl9lJ4CiIjfk91t1czMRql2ikOrC95eLiCLmZlVRFsd0pIO\nBZA0FjgbeKDQVGZmVqp2zhw+CnyMbIa2x4F903KvJF0iqVPSvQ3rtpa0UNIySQskjW94bZakByUt\nlXRM/38UMzMbKm2PVur3jqW3An8BftQ1h7SkrwBPRcRXJP0TsFVEzJQ0FbgcOICsCN0ITImIdU37\n9GilJh6tVKUM4NFKVkVFjVb6qqQtJW0i6SZJT0k6o6/3RcRi4Nmm1ScAs9Pz2cBJ6fmJwNyIWBsR\ny4GHgOnt/hBmZja02mlWOiYingOOA5YDuwHnDPB4EyKiMz3vBCak5zsCKxq2W0F2BmFmZiVopzh0\ndVofRzZn9GqG4Pw9tQ/1th+fm5uZlaSd0UrzJC0lmxL0o5K2Y+DTg3ZK2j4iVkragew24JB1dE9s\n2G7ntC6no6Nj/fNarUatVhtgFDOz0aler1Ov1we1j3Yn+9kGWB0RL6fpQbeIiJVtvG8SMK+pQ/rp\niLhQ0kxgfFOH9HS6O6Tf2Nz77A7pPHdIVykDuEPaqqioDun3AmtTYfgc8GOyPoK+3jcX+DWwh6TH\nJJ0JXAC8XdIy4Ki0TEQsAa4gm5P6euAsVwEzs/K0M9nPvRExTdJhwBeBrwHnRcSwjybymUOezxyq\nlAF85mBVVNRkP11Tgx4HXBwR84FN+hvOzMxGjn
aKw+OSvg+cDFwn6TVtvs/MzEaodpqVxpHdlfWe\niHgwjTKaFhELhiNgUxY3KzVxs1KVMoCblayKBtKs1J/JfrYDXtO13OJOrYVzcchzcahSBnBxsCoq\narTSCZIeJJtL+hayq6SvH1BCMzMbEdrpO/gicDCwLCImA28Dbi00lZmZlaqd4rA2Ip4CNpI0JiIW\nAfsXnMvMzErUzu0znpW0BbAYmCNpFdmtuM3MbJRqd7TSi8AY4HRgS2BORDxdfLxcFndIN3GHdJUy\ngDukrYqGdLSSpIOA/wu8EbgH+HC6zUVpXBzyXByqlAFcHKyKhnq00neB/wlsA3wd+MYgspmZ2QjS\nW3HYKCIWRsSaiLgS2G64QpmZWbl665B+naR3kZ2vNy9HRFxdeDozMytFb30OP2TDRtwNGnUj4sxC\nk7XO5D6HJu5zqFIGcJ+DVVGht8+oAheHPBeHKmUAFweroqJu2W1mZq8ypRQHSbMk3S/pXkmXS9pU\n0taSFkpaJmmBpPFlZDMzs16KQ5oeFElvGMoDpnmlPwLsl+aWHgOcAswEFkbEFOCmtGxmZiXo7czh\n3PTfq4b4mM8Ba4HNJG0MbAb8GTgBmJ22mQ2cNMTHNTOzNvU2lPVpSQuByZLmNb0WEXHCQA4YEc9I\n+lfgUbLbcvwiIhZKmhARnWmzTmDCQPZvZmaD11tx+DtgP+DHwNfovt4BBjEsRNJuwKeAScBq4EpJ\n72/cJiJCkod8mJmVpMfiEBEvAb+VdHBEPClp87R+sHdk3R/4ddeN+yRdTTZfxEpJ20fEyjQV6apW\nb+7o6Fj/vFarUavVBhnHzGx0qdfr1Ov1Qe2jnbuyTgN+RHaPJYAngRkRcd+ADijtA8wBDgDWAD8E\nbgN2BZ6OiAslzQTGR8TMpvf6Oocmvs6hShnA1zlYFRVyEZyk3wDnpkl+kFQDvhQRhwwi6GeBGcA6\n4HfA3wNbAFcAu5BNRfq+iPiPpve5ODRxcahSBnBxsCoqqjj8PiL26WvdcHBxyHNxqFIGcHGwKhpI\ncWhnJrhHJH0OuIzsX+DpwB8HkM/MzEaIdq6Q/hDZ7bqvJrvmYdu0zszMRinfeG+Ec7NSlTKAm5Ws\ninzjPTMzGxIuDmZmltNncZB0WIt1hxYTx8zMqqCdM4dvt1j3naEOYmZm1dHjUFZJBwOHANtK+h90\n31tpC9wcZWY2qvV2ncNYskIwJv23y3PAe4oMZTaSZSPIyuURUzZY7VwhPSkilg9PnN55KGueh7JW\nKQNUI4eH09qGirpCelNJF5PdYrtr+4iIo/qZz8zMRoh2zhzuAb5HdoO8V9LqiIg7C87WKovPHJr4\nzKFKGaAaOXzmYBsq6sxhbUR8b4CZzMxsBGpn1NE8SR+TtIOkrbsehSczM7PStNOstJwW58kRMbmg\nTL1lcbNSEzcrVSkDVCOHm5VsQ4XM51AlLg55Lg5VygDVyOHiYBsqpM9B0gxanzn8qD8HatrneODf\ngDelfZ8JPAj8hGy60OW0mAnOzMyGRzt9Dgc0PA4HOoATBnncbwE/j4i9gL2BpcBMYGFETAFuSstm\nZlaCfjcrpW/9P4mIvx3QAaXXAXdFxBua1i8FjoiITknbA/WI2LNpGzcrNXGzUpUyQDVyuFnJNjRc\n8zm8AAymM3oy8KSkSyX9TtLFksYBEyKiM23TCUwYxDHMzGwQ2ulzmNewuBEwFbhikMfcD/h4RNwu\n6Zs0NSFFREhq+dWno6Nj/fNarUatVhtEFDOz0ader1Ov1we1j3aGstbS0wBeBh6NiMcGfMCsyeg3\nXUNh03wRs4A3AEdGxEpJOwCL3KzUNzcrVSkDVCOHm5VsQ4U0K0VEnazDeEtgK+A/B5Sue38rgcck\nTUmrjgbuB+YBM9K6GcA1gzmOmZkNXDtnDu8DvgrcklYdDpwTEVcO+KDSPmRDWccCD5MNZR1D1ly1\nCz0MZfWZQ5
7PHKqUAaqRw2cOtqFCLoJLN947OiJWpeVtgZsiYu8BJx0gF4c8F4cqZYBq5HBxsA0V\nNVpJwJMNy0/TPSucmZmNQu3clfUG4BeSLicrCicD1xeayszMStXWRXCS3g0cmhYXR8TPCk3Vc464\n8847Oe+8C3nllb63L9Kmm8LVV89lo43KnU7bzUpVygDVyOFmJdvQkN5bSdLuZBem/SoirgKuSusP\nk7RbRDw8uLgD88QTT3Dzzct48cWy765xChGXl5zBzKwYvTUrfZPs+oNmz6XXji8kURvGjt2RF188\nuazDAyCdVurxzcyK1FubyISIuKd5ZVo37HM5mJnZ8OmtOIzv5bXXDHUQMzOrjt6Kwx2S/qF5paSP\nAHcWF8nMzMrWW5/Dp4CfSTqd7mLwFmBT4L8VHczMzMrTY3FIN8A7BDgSeDPZ+Lz5EXHzcIUzM7Ny\n9HoRXLpXxc3pYWZmrxLlXsFlZmaV5OJgZmY57dxbyXqw8cb+9ZnZ6ORPt0Er+x42vkGu5WX33CqX\n7+80srk4mI1KZX8wl1+cbHBK63OQNEbSXZLmpeWtJS2UtEzSAkm9XaFtZmYFKrND+pPAErq/4swE\nFkbEFOCmtGxmZiUopThI2hn4O7J5pLvOP08AZqfns4GTSohmZmaUd+bwDeAcYF3DugkR0ZmedwIT\nhj2VmZkBJXRISzoOWBURd0mqtdomIkJSyx61OXPmsGbNMqADqKWHmZl1qdfr1Ov1Qe2jrWlCh5Kk\nLwFnAC+T3fp7S+Bq4ACglu7ptAOwKCL2bHpvzJ8/n9NPv4jVq68b1tzNpDFErKMao0KcoRoZoBo5\nqpHBQ1mrYyDThA57s1JEnBsREyNiMnAKcHNEnAFcC8xIm80ArhnubGZmlqnC7TO6vl5cALxd0jLg\nqLRsZmYlKPUiuIi4BbglPX8GOLrMPGZmlqnCmYOZmVWMi4OZmeW4OJiZWY6Lg5mZ5bg4mJlZjouD\nmZnluDiYmVmOi4OZmeW4OJiZWY6Lg5mZ5bg4mJlZjouDmZnluDiYmVmOi4OZmeW4OJiZWU6p8zmY\n2egl9WtWysJ4utKBGfYzB0kTJS2SdL+k+ySdndZvLWmhpGWSFkgaP9zZzGwoRQUeNlBlNCutBT4d\nEW8CDgI+JmkvYCawMCKmADelZTMzK8GwF4eIWBkRd6fnfwEeAHYCTgBmp81mAycNdzYzM8uU2iEt\naRKwL3ArMCEiOtNLncCEkmKZmb3qldYhLWlz4CrgkxHxfGPnVUSEpJYNhnPmzGHNmmVAB1BLDzMz\n61Kv16nX64Pah8royZe0CTAfuD4ivpnWLQVqEbFS0g7AoojYs+l9MX/+fE4//SJWr75u2HNvmGUM\nEesov9NLzlCZDFCNHM7QTR6tRDZyLCL6NXysjNFKAn4ALOkqDMm1wIz0fAZwzXBnMzOzTBnNSocC\n7wfukXRXWjcLuAC4QtKHgeXA+0rIZmZmlFAcIuJX9HzGcvRwZjEzs9Z8+wwzM8txcTAzsxwXBzMz\ny3FxMDOzHBcHMzPLcXEwM7McFwczM8txcTAzsxwXBzMzy/E0oWY2qlVhutKRePM/FwczG+XK/mAu\nvzgNhJuVzMwsx8XBzMxyXBzMzCzHxcHMzHIqVRwkHStpqaQHJf1T2XnMzF6tKlMcJI0BvgMcC0wF\nTpW0V7mpBqpedoA21csO0KZ62QHaVC87QJvqZQdoU73sAG2olx2gMJUpDsB04KGIWB4Ra4F/B04s\nOdMA1csO0KZ62QHaVC87QJvqZQdoU73sAG2qlx2gDfW2tpJU6mMgqnSdw07AYw3LK4ADS8piZjaE\nRt61FlUqDm3/9tasuZ0ttzy+yCx9eu65daUe38ysSKrKZd2SDgI6IuLYtDwLWBcRFzZsU42wZmYj\nTET06/ShSsVhY+APwNuAPwO3AadGxAOlBjMzexWqTLNSRLws6ePAL4AxwA9c
GMzMylGZMwczM6uO\nKg1l7VVVL5CTdImkTkn3NqzbWtJCScskLZA0vsyMKdNESYsk3S/pPklnVy2rpNdIulXS3ZKWSPpy\n1TI2kjRG0l2S5qXlyuWUtFzSPSnnbRXOOV7STyU9kP7fH1i1nJL2SL/HrsdqSWdXLWfKOiv9W79X\n0uWSNu1vzhFRHCp+gdylZLkazQQWRsQU4Ka0XLa1wKcj4k3AQcDH0u+wMlkjYg1wZET8DbA3cKSk\nw6qUsckngSV0j7SrYs4AahGxb0RMT+uqmPNbwM8jYi+y//dLqVjOiPhD+j3uC7wFeAH4GRXLKWkS\n8BFgv4iYRtZMfwr9zRkRlX8ABwM3NCzPBGaWnashzyTg3oblpcCE9Hx7YGnZGVtkvgY4uqpZgc2A\n24E3VTEjsDNwI3AkMK+q/9+BR4BtmtZVKifwOuCPLdZXKmdTtmOAxVXMCWxNNrhnK7J+5XnA2/ub\nc0ScOdD6ArmdSsrSjgkR0ZmedwITygzTLH2z2Be4lYpllbSRpLtTlkURcT8Vy5h8AzgHaLzgpYo5\nA7hR0h2SPpLWVS3nZOBJSZdK+p2kiyWNo3o5G50CzE3PK5UzIp4B/hV4lGzk539ExEL6mXOkFIcR\n22seWZmuTH5JmwNXAZ+MiOcbX6tC1ohYF1mz0s7A4ZKObHq99IySjgNWRcRd9HDpaRVyJodG1gzy\nDrKmxLc2vliRnBsD+wEXRcR+wF9pavKoSE4AJI0FjgeubH6tCjkl7QZ8iqxFY0dgc0nvb9ymnZwj\npTg8DkxsWJ5IdvZQVZ2StgeQtAOwquQ8AEjahKwwXBYR16TVlcwaEauB68jadquW8RDgBEmPkH17\nPErSZVQvJxHxRPrvk2Tt49OpXs4VwIqIuD0t/5SsWKysWM4u7wDuTL9TqN7vc3/g1xHxdES8DFxN\n1jTfr9/nSCkOdwC7S5qUqvbJwLUlZ+rNtcCM9HwGWft+qSQJ+AGwJCK+2fBSZbJKen3XCApJryVr\nJ72LCmUEiIhzI2JiREwma164OSLOoGI5JW0maYv0fBxZO/m9VCxnRKwEHpM0Ja06GrifrK28Mjkb\nnEp3kxJU7PdJ1rdwkKTXpn/3R5MNnOjf77Psjp1+dLK8g6yT5SFgVtl5GnLNJWvXe4msX+RMsg6h\nG4FlwAJgfAVyHkbWPn432QfuXWSjrCqTFZgG/C5lvAc4J62vTMYWmY8Arq1iTrK2/LvT476ufzdV\ny5ky7UM2AOH3ZN90X1fRnOOAp4AtGtZVMednyQrsvcBsYJP+5vRFcGZmljNSmpXMzGwYuTiYmVmO\ni4OZmeW4OJiZWY6Lg5mZ5bg4mJlZjouDWZsknSRpnaQ9ys5iVjQXB7P2nQrMT/81G9VcHMzakG5Y\neCDwcbLbt3TdQfaiNEHNAknXSXp3eu0tkurpbqg3dN3TxmykcHEwa8+JZHOKPEp2e+n9gHcBu0Y2\nQc0ZZDc3i3SDw28D746I/ckmhPqXknKbDcjGZQcwGyFOJZvDAbJbNZ9K9u/nCoCI6JS0KL2+B9kk\nRTdm9z1jDNn9t8xGDBcHsz5I2ppsxrc3SwqyD/sguwV2y/kcgPsj4pBhimg25NysZNa39wA/iohJ\nETE5InYhJN2UAAAAnUlEQVQhm37zGeDdykwAamn7PwDbSjoIsnk0JE0tI7jZQLk4mPXtFLKzhEZX\nkc3Du4LsXvmXkd1ufHVErCUrKBemKU/vIuuPMBsxfMtus0GQNC4i/ippG7I5uQ+JiLJnAjMbNPc5\nmA3O/DR73Vjgf7sw2GjhMwczM8txn4OZmeW4OJiZWY6Lg5mZ5bg4mJlZjouDmZnluDiYmVnOfwF+\nC2qRw1kMzQAAAABJRU5ErkJggg==\n", "text/plain": "<matplotlib.figure.Figure at 0x852e810>"}, "metadata": {}}], 
"metadata": {"collapsed": false, "trusted": false}}, {"source": "Now let's look at a histogram of the fares.\nWhat does it tell you?", "cell_type": "markdown", "metadata": {}}, {"execution_count": 9, "cell_type": "code", "source": "fig = plt.figure()\nax = fig.add_subplot(111)\nax.hist(df['Fare'], bins = 10, range = (df['Fare'].min(),df['Fare'].max()))\nplt.title('Fare distribution')\nplt.xlabel('Fare')\nplt.ylabel('Count of Passengers')\nplt.show()", "outputs": [{"output_type": "display_data", "data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAEZCAYAAACJjGL9AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHyJJREFUeJzt3Xu4XVV97vHvGyAC4RJSMCEXJBWipNWKl6ggdWtpih5M\nqKcC3p5YOT1WVNpqrYlPW5PHNlpPW+2pxSr1klJMjQoUvCYgC+2pgCgIElKIJULA7CAgIEhN4D1/\nzLGT5XbvuVd29lx77Z338zzryZhj3n4jhPVbY44x55RtIiIihjNlvAOIiIjelkQRERG1kigiIqJW\nEkVERNRKooiIiFpJFBERUSuJImIQSSslXVjKx0h6WJLG6NgfkfSnpdwn6a6xOG453imSNo3V8SIG\nJFFET5G0RdKj5cv5YUkPSZrV5TB23Vxk+07bh3qEG44kvUHSN0Y8sP1m238xFkFKekLSL7cd+xu2\nnz4Wx45ot/94BxAxiIHTbX9tNDsP/PIf6Yt9PEiaYvuJsT7sGB8v4hekRxE9T9J0SV+QtF3S/ZIu\nlzSnbX1L0l9I+n/AI8B8SU+XtEHSfZI2SXpVzfHnS7q69F7WA0e2rTu2/HKfUpbfIOn7Zdv/kvQa\nSU8H/hF4YekF3V+2/VS51PQlST8BXlLq3jvo/Csk3SvpDkmvGdSuc9qWd/VaJH29VH+3nPNVgy9l\nSTqhHOMBSd+T9Iq2dZ+S9A/l7/UhSde0904i2iVRRC8a/Ct5CvBx4Jjy+Snw4UHbvA74X8AhwH3A\nBuBfgKOAs4HzJZ0wzPk+DXwL+CXgvcAy2i4/7QpKmgb8HXCa7cOAFwI32t4EvAn4ZrlMNaNtt1cD\n77V9CPDv5bjtx55Vzju7nPdjko4v6wZvu4vtXy/FZ5ZzfnZQrAcAlwNfKX8HbwMukrSgbbOzgJXA\nEcBm4C+H/NuJfV4SRfQaAZeWX8EPSLrY9v22L7H9mO2fAKuBF7ftY+BTtm8tl3ZOA+6wvcb2E7Zv\nBC4GfqFXIekY4LnAn9neYfsbVF+ww13SeQJ4hqSDbPfb3tgW92AGLrX9TQDb/z3MtgPn/jrwRaov\n8L31AmCa7ffb3mn7KuALVIlrwMW2r7f9OHAR8KwxOG9MQkkU0WsMLLV9RPm8UtLBkj5aBrofBK4G\nDh80E6l99tBTgOe3JZsHgNcAM4c432zgAds/bav7wZCB2Y9QfYn/PnBPuWzztBHaM9KspqHOffQI\n+3Ri9hDn/kGph+rvub9t3U+pemMRvyCJIiaCdwALgEW2D6fqTYif/2XefonmTuDqtmRzRLk885Yh\njv1D4AhJB7fVPYXhL/mst72Y6pLRJuCCIc4/kvZthzr3PaX8CDCtbd2ezP66B5g3KJk+Bbh7D44R\nASRRxMRwCNUv3gclzQDeM8Q27V+IXwAWSHqdpAPK53ll0Pnn2P4BcD2wqmz3IuD0oYKQ9GRJS
8tY\nxQ6qL/LHy+p+YG4ZGxgqpva6wfUD5z4F+B/AwHjDjcArJR0k6TjgnEH79QNPHSpW4FrgUeBPyrH7\nSrv+tSa2iCElUcRE8CHgIOBHwH8AX+YXf8G33/vwE2Ax1SD23VS9hvcBU4c5/muA5wP3A38OrBnm\n2FOAPyrHvA84BXhzWXclcAuwTdL2tv2GirO97ofAA1Q9gAuBN9m+raz7IPAzqoTwSarB+fZ9VwJr\nyuW132k/tu2fAa8AXgbcSzX4//q2Yw8XW8QvUJPTzSWtoJqN8gRwM/C7VF3pz1B1g7cAZ9r+cdv2\nb6T6lXae7fWNBRcRER1pLFFIOhb4GnCC7f+W9BngS8CvAD+y/QFJ7wKOsL1c0kKqaYrPA+YAVwAL\nGrhBKSIi9kCTl54eorqOe7Ck/YGDqbrXS9jdtV8DnFHKS4G1ZZrgFqp53YsajC8iIjrQWKKwfT/w\nN1QzUO4Bfmx7AzDT9sC0vH52T1mcDWxtO8RWqp5FRESMo8YShaSnAn8IHEuVBA6R9Lr2bcrzeOqu\nfWVwLSJinDX5UMDnAv9h+z4ASRdTPfJgm6RZtrdJOhoYmCFyNzCvbf+5DDHnW1KSR0TEKNge1bTo\nJscoNgEvKHPABZwKbKR6PMKyss0y4NJSvgw4W9JUSfOB44Hrhjqw7Un7ec973jPuMaR9ad++1rZ9\noX17o7Eehe3vSvpnqpuZngC+A3wMOBRYV56KuQU4s2y/UdI6qmSyEzjXe9u6iIjYa42+j8L2B4AP\nDKq+n6p3MdT2q6ke+BYRET0id2b3mL6+vvEOoVFp38Q1mdsGk799e6PRO7ObIClXpCIi9pAk3IOD\n2RERMQkkUURERK0kioiIqJVEERERtZIoIiKiVhJFRETUSqKIiIhaSRQREVEriSIiImolUURERK0k\nioiIqJVEERERtRp9zHhTpk2b0bVzTZkCV175FRYtWtS1c0ZE9JIJmSgefXRz18512GG/xc6dO7t2\nvoiIXjMhEwV0r0chHdC1c0VE9KKMUURERK0kioiIqNVoopD0NEk3tH0elHSepBmSNki6TdJ6SdPb\n9lkh6XZJmyQtbjK+iIgYWaOJwvZ/2j7R9onAc4BHgUuA5cAG2wuAK8sykhYCZwELgdOA8yWl1xMR\nMY66+SV8KrDZ9l3AEmBNqV8DnFHKS4G1tnfY3gJsBjIvNSJiHHUzUZwNrC3lmbb7S7kfmFnKs4Gt\nbftsBeZ0J7yIiBhKVxKFpKnAK4DPDl5n24Brdq9bFxERDevWfRQvA75t+96y3C9plu1tko4Gtpf6\nu4F5bfvNLXWDrGwr95VPREQMaLVatFqtMTmWqh/0zZL0r8CXba8pyx8A7rP9V5KWA9NtLy+D2Z+m\nGpeYA1wBHOe2ICW5m52Mww8/iS996a856aSTunbOiIixJgnbGs2+jfcoJE2jGsj+vbbq9wPrJJ0D\nbAHOBLC9UdI6YCOwEzjX3chkERExrMYThe1HgCMH1d1PlTyG2n41sLrpuCIiojO5RyEiImolUURE\nRK0kioiIqJVEERERtZIoIiKiVhJFRETUSqKIiIhaSRQREVEriSIiImolUURERK0kioiIqJVEERER\ntZIoIiKiVhJFRETUSqKIiIhaSRQREVEriSIiImolUURERK0kioiIqNV4opA0XdLnJN0qaaOk50ua\nIWmDpNskrZc0vW37FZJul7RJ0uKm44uIiHrd6FH8HfAl2ycAzwQ2AcuBDbYXAFeWZSQtBM4CFgKn\nAedLSq8nImIcNfolLOlw4BTbnwCwvdP2g8ASYE3ZbA1wRikvBdba3mF7C7AZWNRkjBERUa/pX+vz\ngXslfVLSdyRdIGkaMNN2f9mmH5hZyrOBrW37bwXmNBxjRETU2L8Lx3828Fbb35L0IcplpgG2Lck1\nxxhi3cq2cl/5RETEgFarRavVGpNjNZ0otgJbbX+rLH8OW
AFskzTL9jZJRwPby/q7gXlt+88tdYOs\nbCreiIhJoa+vj76+vl3Lq1atGvWxGr30ZHsbcJekBaXqVOAW4HJgWalbBlxaypcBZ0uaKmk+cDxw\nXZMxRkREvaZ7FABvAy6SNBX4PvC7wH7AOknnAFuAMwFsb5S0DtgI7ATOtV13WSoiIhrWeKKw/V3g\neUOsOnWY7VcDqxsNKiIiOpZ7FCIiotaIiULSIZL2K+WnSVoi6YDmQ4uIiF7QSY/i68CTJM0Bvgq8\nHvhUk0FFRETv6CRRyPajwCuB822/CvjVZsOKiIhe0dEYhaQXAq8Fvrgn+0VExMTXyRf+H1LdJHeJ\n7VskPRW4qtmwIiKiV9ROjy2D2EtsLxmos/194LymA4uIiN5Q26Ow/ThwsiR1KZ6IiOgxndxwdyPw\nb5I+Czxa6mz74ubCioiIXtFJojgQuB946aD6JIqIiH3AiInC9hu6EEdERPSoTu7MfpqkKyXdUpaf\nKelPmw8tIiJ6QSfTYy8A3g38rCzfDLy6sYgiIqKndJIoDrZ97cBCeez3juZCioiIXtJJorhX0nED\nC5J+B/hhcyFFREQv6WTW01uBjwFPl3QPcAfV4zwiImIf0Mmsp+8DvyFpGjDF9sPNhxUREb1ixEQh\n6R2A25YBHgS+bfvG5kKLiIhe0MkYxXOA3wfmAHOBNwEvAy6Q9K4GY4uIiB7QSaKYBzzb9jtsv50q\ncTwZeDHwhpF2lrRF0k2SbpB0XambIWmDpNskrZc0vW37FZJul7RJ0uJRtSoiIsZMJ4niKHbfQwHV\n1NiZ5WVGj3Wwv4E+2yfaXlTqlgMbbC8ArizLSFoInAUsBE4DzpeUd19ERIyjTmY9XQRcK+lSQMAr\ngE+Xwe2NHZ5n8NNnl1D1SADWAC2qZLEUWGt7B7BF0mZgEXBNh+eJiIgx1smsp/dK+gpwMlXv4E22\nry+rO5kma+AKSY8DH7V9AVWPpL+s7wdmlvJsfj4pbKUaG4mIiHHSSY8C4DvAPWV7SzrG9p0d7nuy\n7R9KOgrYIGlT+0rbluRh9oW2GVe7rWwr95VPREQMaLVatFqtMTlWJ9Nj3wa8B9gOPN626hmdnMD2\nD8uf90q6hOpSUr+kWba3STq6HBvgbqrB8wFzS90gKzs5dUTEPquvr4++vr5dy6tWrRr1sTp9Z/bT\nbC+0/YyBTycHl3SwpENLeRqwmOqhgpcBy8pmy4BLS/ky4GxJUyXNB44Hruu8ORERMdY6ufR0J/DQ\nKI8/E7ik3KS3P3CR7fWSrgfWSToH2AKcCWB7o6R1VIPkO4Fzy0MIIyJinHSSKO4ArpL0RXZPk7Xt\nvx1pR9t3AM8aov5+4NRh9lkNrO4groiI6IJOexR3AlPLJyIi9iGdTI9dCdUYg+1HGo8oIiJ6Siev\nQj1J0kZgU1n+NUnnNx5ZRET0hE5mPX2I6nEaPwKw/V1231UdERGTXEfPURri5rqdDcQSERE9qKPB\nbEknA0iaCpwH3NpoVBER0TM66VG8GXgL1TOX7gZOLMsREbEP6GTW073Aa7oQS0RE9KBOZj39H0mH\nSTpA0pWSfiTp9d0ILiIixl8nl54W234IOJ3qcRtPBd7ZZFAREdE7OkkUA5enTgc+Z/tBhnz0d0RE\nTEadzHq6vLxD4jHgzZKeTGevQI2IiElgxB6F7eVUb7d7ru2fAY9QvbI0IiL2AZ0MZr8K2GF7p6Q/\nA/6F6pWlERGxD+hkjOLPbT8k6UXAbwAfB/6x2bAiIqJXdJIoBl5/ejpwge0vAAc0F1JERPSSThLF\n3ZI+BpwFfFHSgR3uFxERk0AnX/hnAl+lup/ix8AR5D6KiIh9Rieznh6x/XngQUnHUF122tR4ZBER\n0RM6mfW0RNLtVO/Ovprq7uwvd3oCSftJukHS5WV5hqQNkm6TtF7S9LZtV0i6XdImSYv3uDURETHm\nOrn09BfAC4HbbM+nm
vl07R6c4w+Ajey+m3s5sMH2AuDKsoykhVTjIAupXpR0vqSMhUREjLNOvoh3\n2P4RMEXSfravAp7bycElzQVeDvwToFK9BFhTymuAM0p5KbDW9g7bW4DNwKKOWhEREY3p5BEeD0g6\nFPgGcJGk7cBPOjz+B6kGvg9rq5tpu7+U+4GZpTwbuKZtu61U78CIiIhx1EmiWAr8FPgj4LVUX/qr\nRtpJ0unAdts3SOobahvbllT3gMFh1q1sK/eVT0REDGi1WrRarTE51rCJQtILgI8CxwE3AefY/tQe\nHPskYImklwMHAodJuhDolzTL9jZJRwPby/Z3A/Pa9p9b6oawcg/CiIjY9/T19dHX17dredWqEX/f\nD6tujOIfgD8Gfgn4W6rLSB2z/W7b88oA+NnA12y/HrgMWFY2WwZcWsqXAWdLmippPnA8cN2enDMi\nIsZeXaKYYnuD7cdsfxZ48l6ea+Ay0vuB35R0G/DSsoztjcA6qhlSXwbOtZ33XkREjLO6MYrDJb2S\n3bOV2pdt++JOT2L7aqp7MLB9P3DqMNutBlZ3etyIiGheXaL4OvCKmuWOE0VERExcwyYK22/oYhwR\nEdGjcudzRETUSqKIiIhawyaK8gpUJP1y98KJiIheU9ejeHf58/PdCCQiInpT3ayn+yRtAOYPPCK8\njW0vaTCuiIjoEXWJ4uXAs4F/Af6a3fdTwLDPYIqIiMmmbnrsz4BrJL3Q9r2SDin1nT45NiIiJoFO\nZj3NknQD1aM1Nkr6tqRfbTiuiIjoEZ0kio8Bb7d9jO1jgHeUuoiI2Ad0kigOLm+1A8B2C5jWWEQR\nEdFTOnlx0R2S/gy4kGpA+7XAfzUaVURE9IxOehRvpHrE+MVU91QcVeoiImIfMGKPojwW/G1diCUi\nInpQnvUUERG1kigiIqLWiIlC0ouGqDu5mXAiIqLXdNKj+Psh6j481oFERERvGnYwW9ILgZOAoyS9\nnd3PejqUznoiB1K9J/tJwFTg32yvkDQD+AzwFGALcKbtH5d9VlDNqHocOM/2+lG2KyIixkjdF/5U\nqqSwX/nzkPJ5CPidkQ5s+zHgJbafBTwTeEm5jLUc2GB7AXBlWUbSQuAsYCFwGnC+pIyhRESMs7qH\nAl4NXC3pU7a3jObgth8txalUCecBYAnw4lK/BmhRJYulwFrbO4AtkjYDi4BrRnPuiIgYG53cmf0k\nSRcAx7Ztb9svHWnH0iP4DvBU4CO2b5E003Z/2aQfmFnKs/n5pLAVmNNBfBER0aBOEsVngY8A/0Q1\ndgAdvo/C9hPAsyQdDnxV0ksGrbekumMNs25lW7mvfCIiYkCr1aLVao3JsTpJFDtsf2RvTmL7QUlf\nBJ4D9EuaZXubpKOB7WWzu4F5bbvNLXVDWLk34URETHp9fX309fXtWl61atWoj9XJYPHlkt4i6WhJ\nMwY+I+0k6UhJ00v5IOA3gRuAy4BlZbNlwKWlfBlwtqSpkuYDxwPX7WF7IiJijHXSo3gD1SWgPx5U\nP3+E/Y4G1pRxiinAhbavLC9BWifpHMr0WADbGyWto3pB0k7gXNt55WpExDjr5KGAx47mwLZvpnrn\n9uD6+4FTh9lnNbB6NOeLiIhmjJgoJC1jiEFl2//cSEQREdFTOrn09Dx2J4qDgJdSTXlNooiI2Ad0\ncunpre3LZYD6M41FFBERPWU0j8h4lJEHsiMiYpLoZIzi8rbFKVTPYlrXWEQREdFTOhmj+Jvyp6mm\nrd5p+67mQoqIiF4y4qUn2y1gE3AYcATw3w3HFBERPaST90qcCVwLvIrq5rjrJL2q6cAiIqI3dHLp\n6U+B59neDiDpKKr3SHy2ycAiIqI3dDLrScC9bcv3sfttdxERMcl10qP4CtUjwj9NlSDOAr7caFQR\nEdEzOrnh7p2S/idwcqn6qO1Lmg0rIiJ6xbCJQtLxwEzb/27788DnS/2LJD3V9ve7FWR
ERIyfujGK\nDwEPDVH/UFkXERH7gLpEMdP2TYMrS10e4RERsY+oSxTTa9YdONaBREREb6pLFNdL+t+DKyX9HvDt\n5kKKiIheUjfr6Q+BSyS9lt2J4TnAk4DfbjqwiIjoDcMmCtvbJJ0EvAT4VaqHAn7B9te6FVxERIy/\n2juzXfma7f9r++/3NElImifpKkm3SPqepPNK/QxJGyTdJml9eRnSwD4rJN0uaZOkxaNrVkREjJXR\nvLhoT+wA/sj2rwAvAN4i6QRgObDB9gKq50YtB5C0kOrO74XAacD5kpqOMSIiajT6JWx7m+0bS/kn\nwK3AHGAJsKZstgY4o5SXAmtt77C9BdgMLGoyxoiIqNe1X+uSjgVOpHpk+Uzb/WVVPzCzlGcDW9t2\n20qVWCIiYpx08lDAvSbpEKpHgPyB7Yel3Q+ftW1Jrtl9iHUr28p95RMREQNarRatVmtMjtV4opB0\nAFWSuND2paW6X9KsMrPqaGB7qb8bmNe2+9xSN8jKxuKNiJgM+vr66Ovr27W8atWqUR+r0UtPqroO\nHwc22m5/PtRlwLJSXgZc2lZ/tqSpkuYDxwPXNRljRETUa7pHcTLwOuAmSTeUuhXA+4F1ks4BtlC9\nYhXbGyWtAzYCO4FzbdddloqIiIY1mihs/zvD91pOHWaf1cDqxoKKiIg9knsUIiKiVhJFRETUSqKI\niIhaSRQREVEriSIiImolUURERK0kioiIqJVEERERtZIoIiKiVhJFRETUSqKIiIhaSRQREVEriSIi\nImolUURERK0kioiIqJVEERERtZIoIiKiVhJFRETUajRRSPqEpH5JN7fVzZC0QdJtktZLmt62boWk\n2yVtkrS4ydgiIqIzTfcoPgmcNqhuObDB9gLgyrKMpIXAWcDCss/5ktLjiYgYZ41+Edv+BvDAoOol\nwJpSXgOcUcpLgbW2d9jeAmwGFjUZX0REjGz/cTjnTNv9pdwPzCzl2cA1bdttBeZ0M7DhnHzyyV0/\np+2unzMiYijjkSh2sW1Jdd+IPfRt2c1Q1MVzRUTUG49E0S9plu1tko4Gtpf6u4F5bdvNLXVDWNlW\n7iufiIgY0Gq1aLVaY3IsNX2JQ9KxwOW2n1GWPwDcZ/uvJC0HptteXgazP001LjEHuAI4zoMCrHog\n3ft1f/jhJ/Hgg9+k2z2KXHqKiLEkCdujulzRaI9C0lrgxcCRku4C/hx4P7BO0jnAFuBMANsbJa0D\nNgI7gXMHJ4mIiOi+xnsUYy09ioiIPbc3PYrcpxAREbWSKCIiolYSRURE1EqiiIiIWkkUERFRK4ki\nIiJqJVFEREStJIqIiKiVRBEREbWSKCIiolYSRURE1EqiiIiIWkkUERFRK4kiIiJqJVFEREStcX1n\ndgxP6v57s/MOjIgYShJFz+r2l3b3E1NETAy59BQREbWSKCIiolbPJQpJp0naJOl2Se8a73giIvZ1\nPZUoJO0HfBg4DVgIvFrSCeMbVbe1xjuARrVarfEOoVGTuX2TuW0w+du3N3oqUQCLgM22t9jeAfwr\nsHScY+qy1ngH0KiB/xkljcunW+2bjCZz22Dyt29v9NqspznAXW3LW4Hnj1Ms0bjuz+zqRrJYtWrV\nzy1n2nFMdL2WKDr6P+qww17RdBy7PPbYrV0713jr1r0bg79Iu6vpL+2V5TMg047HWpP/Tuv+be7L\nCV+91HhJLwBW2j6tLK8AnrD9V23b9E7AERETiO1RZdleSxT7A/8J/AZwD3Ad8Grb+87P+oiIHtNT\nl55s75T0VuCrwH7Ax5MkIiLGV0/1KCIiovf02vTYWhP9ZjxJn5DUL+nmtroZkjZIuk3SeknT29at\nKG3dJGnx+ETdOUnzJF0l6RZJ35N0XqmfFG2UdKCkayXdKGmjpPeV+knRPqjuZZJ0g6TLy/JkatsW\nSTeV9l1X6iZT+6ZL+pykW8u/z+ePWftsT4gP1aW
ozcCxwAHAjcAJ4x3XHrbhFOBE4Oa2ug8Af1LK\n7wLeX8oLSxsPKG3eDEwZ7zaM0L5ZwLNK+RCq8aYTJlkbDy5/7g9cA7xokrXv7cBFwGWT8N/nHcCM\nQXWTqX1rgDe2/fs8fKzaN5F6FBP+Zjzb3wAeGFS9hOo/MOXPM0p5KbDW9g7bW6j+Qy7qRpyjZXub\n7RtL+SfArVT3xkymNj5ailOpfrw8wCRpn6S5wMuBf2L3vN5J0bY2g2f9TIr2STocOMX2J6Aa77X9\nIGPUvomUKIa6GW/OOMUylmba7i/lfmBmKc+mauOACdVeScdS9Z6uZRK1UdIUSTdSteMq27cwedr3\nQeCdwBNtdZOlbVDdRHOFpOsl/V6pmyztmw/cK+mTkr4j6QJJ0xij9k2kRDHpR91d9Qnr2jkh/g4k\nHQJ8HvgD2w+3r5vobbT9hO1nAXOBX5f0kkHrJ2T7JJ0ObLd9A8PcJThR29bmZNsnAi8D3iLplPaV\nE7x9+wPPBs63/WzgEWB5+wZ7076JlCjuBua1Lc/j5zPiRNUvaRaApKOB7aV+cHvnlrqeJukAqiRx\noe1LS/WkaiNA6dZ/EXgOk6N9JwFLJN0BrAVeKulCJkfbALD9w/LnvcAlVJdaJkv7tgJbbX+rLH+O\nKnFsG4v2TaREcT1wvKRjJU0FzgIuG+eYxsJlwLJSXgZc2lZ/tqSpkuYDx1PdgNizVD1b4ePARtsf\nals1Kdoo6ciBWSOSDgJ+E7iBSdA+2++2Pc/2fOBs4Gu2X88kaBuApIMlHVrK04DFwM1MkvbZ3gbc\nJWlBqToVuAW4nLFo33iP1O/hqP7LqGbSbAZWjHc8o4h/LdUd5z+jGm/5XWAGcAVwG7AemN62/btL\nWzcBvzXe8XfQvhdRXd++keoL9AaqR8ZPijYCzwC+U9p3E/DOUj8p2tcW84vZPetpUrSN6hr+jeXz\nvYHvj8nSvhLvrwHfAr4LXEw162lM2pcb7iIiotZEuvQUERHjIIkiIiJqJVFEREStJIqIiKiVRBER\nEbWSKCIiolZPvbgooldJepzq3okBS23fOV7xRHRT7qOI6ICkh20fuof7CHY9Yydiwsqlp4hRkDRN\n0hWSvl1ehrOk1B8r6T8lraF6RMQ8Se+UdJ2k70paOa6BR4xCLj1FdOYgSTeU8n8BZwK/bfthSUcC\n32T3s8eOA15v+7ry5rDjbC+SNAX4N0mnuHo3ScSEkEQR0ZmfunpENbDrKbnvK4+qfgKYLenJZfUP\nbA88YG0xsLgtyUyjSiRJFDFhJFFEjM5rgSOBZ9t+vDye+8Cy7pFB277P9se6Gl3EGMoYRcToHEb1\nop/Hy8uLnjLMdl8F3lgebY2kOZKO6laQEWMhPYqIzgyeuXQRcLmkm6jelXLrUNva3iDpBOCbZRLU\nw8DrgHubDTdi7GR6bERE1Mqlp4iIqJVEERERtZIoIiKiVhJFRETUSqKIiIhaSRQREVEriSIiImol\nUURERK3/D7JXM1Ml5KC+AAAAAElFTkSuQmCC\n", "text/plain": "<matplotlib.figure.Figure at 0x8692c70>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"source": "### Dealing with Missing Values\n\nPart of data wrangling is figuring out how to deal with missing values.\nBut before you decide, think about which variables are likely to be predictive\nof survival. 
Which ones do you think will be the best predictors?\n\n__Age__\nAge is likely to play a role, so we'll probably want to estimate or 'impute'\nthe missing values in some way.\n\n__Fare__\nThere are a lot of extremes on the high end and low end for ticket fares.\nHow should we handle them?\n\n__Other Variables__\nWhat do YOU think?", "cell_type": "markdown", "metadata": {}}, {"source": "__=>YOUR TURN!__\n\nUse ```pandas``` to count the null values in the Cabin column.\n\nRead the documentation to find out how: \nhttp://pandas.pydata.org/pandas-docs/stable/generated/pandas.isnull.html \nhttp://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sum.html", "cell_type": "markdown", "metadata": {}}, {"execution_count": 10, "cell_type": "code", "source": "df['Cabin'].isnull().sum()", "outputs": [{"execution_count": 10, "output_type": "execute_result", "data": {"text/plain": "687"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"source": "__=>YOUR TURN!__\n\nUse ```pandas``` to drop the Ticket column.\n\nRead the documentation to find out how: \nhttp://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html", "cell_type": "markdown", "metadata": {}}, {"execution_count": null, "cell_type": "code", "source": "# errors='ignore' keeps this cell from raising a ValueError if it is re-run after Ticket is already dropped\ndf = df.drop(['Ticket'], axis=1, errors='ignore')\ndf.head(20)", "outputs": [], "metadata": {"collapsed": false, "trusted": false}}, {"source": "__=>YOUR TURN!__\n\nUse ```pandas``` to calculate the mean age and fill all the null values in the Age column with that number.\n\nRead the documentation to find out how: \nhttp://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mean.html \nhttp://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html", "cell_type": "markdown", "metadata": {}}, {"execution_count": 13, "cell_type": "code", "source": "# Series.mean() skips NaN values by default\nmean_age = df['Age'].mean()\ndf['Age'] = df['Age'].fillna(mean_age)\ndf.head(10)", "outputs": [{"execution_count": 13, "output_type": "execute_result", "data": {"text/plain": " PassengerId 
Survived Pclass \\\n0 1 0 3 \n1 2 1 1 \n2 3 1 3 \n3 4 1 1 \n4 5 0 3 \n5 6 0 3 \n6 7 0 1 \n7 8 0 3 \n8 9 1 3 \n9 10 1 2 \n\n Name Sex Age \\\n0 Braund, Mr. Owen Harris male 22.000000 \n1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.000000 \n2 Heikkinen, Miss. Laina female 26.000000 \n3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.000000 \n4 Allen, Mr. William Henry male 35.000000 \n5 Moran, Mr. James male 29.699118 \n6 McCarthy, Mr. Timothy J male 54.000000 \n7 Palsson, Master. Gosta Leonard male 2.000000 \n8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.000000 \n9 Nasser, Mrs. Nicholas (Adele Achem) female 14.000000 \n\n SibSp Parch Fare Cabin Embarked \n0 1 0 7.2500 NaN S \n1 1 0 71.2833 C85 C \n2 0 0 7.9250 NaN S \n3 1 0 53.1000 C123 S \n4 0 0 8.0500 NaN S \n5 0 0 8.4583 NaN Q \n6 0 0 51.8625 E46 S \n7 3 1 21.0750 NaN S \n8 0 2 11.1333 NaN S \n9 1 0 30.0708 NaN C ", "text/html": "<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>PassengerId</th>\n <th>Survived</th>\n <th>Pclass</th>\n <th>Name</th>\n <th>Sex</th>\n <th>Age</th>\n <th>SibSp</th>\n <th>Parch</th>\n <th>Fare</th>\n <th>Cabin</th>\n <th>Embarked</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1</td>\n <td>0</td>\n <td>3</td>\n <td>Braund, Mr. Owen Harris</td>\n <td>male</td>\n <td>22.000000</td>\n <td>1</td>\n <td>0</td>\n <td>7.2500</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>1</th>\n <td>2</td>\n <td>1</td>\n <td>1</td>\n <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n <td>female</td>\n <td>38.000000</td>\n <td>1</td>\n <td>0</td>\n <td>71.2833</td>\n <td>C85</td>\n <td>C</td>\n </tr>\n <tr>\n <th>2</th>\n <td>3</td>\n <td>1</td>\n <td>3</td>\n <td>Heikkinen, Miss. 
Laina</td>\n <td>female</td>\n <td>26.000000</td>\n <td>0</td>\n <td>0</td>\n <td>7.9250</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4</td>\n <td>1</td>\n <td>1</td>\n <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n <td>female</td>\n <td>35.000000</td>\n <td>1</td>\n <td>0</td>\n <td>53.1000</td>\n <td>C123</td>\n <td>S</td>\n </tr>\n <tr>\n <th>4</th>\n <td>5</td>\n <td>0</td>\n <td>3</td>\n <td>Allen, Mr. William Henry</td>\n <td>male</td>\n <td>35.000000</td>\n <td>0</td>\n <td>0</td>\n <td>8.0500</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>5</th>\n <td>6</td>\n <td>0</td>\n <td>3</td>\n <td>Moran, Mr. James</td>\n <td>male</td>\n <td>29.699118</td>\n <td>0</td>\n <td>0</td>\n <td>8.4583</td>\n <td>NaN</td>\n <td>Q</td>\n </tr>\n <tr>\n <th>6</th>\n <td>7</td>\n <td>0</td>\n <td>1</td>\n <td>McCarthy, Mr. Timothy J</td>\n <td>male</td>\n <td>54.000000</td>\n <td>0</td>\n <td>0</td>\n <td>51.8625</td>\n <td>E46</td>\n <td>S</td>\n </tr>\n <tr>\n <th>7</th>\n <td>8</td>\n <td>0</td>\n <td>3</td>\n <td>Palsson, Master. Gosta Leonard</td>\n <td>male</td>\n <td>2.000000</td>\n <td>3</td>\n <td>1</td>\n <td>21.0750</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>8</th>\n <td>9</td>\n <td>1</td>\n <td>3</td>\n <td>Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)</td>\n <td>female</td>\n <td>27.000000</td>\n <td>0</td>\n <td>2</td>\n <td>11.1333</td>\n <td>NaN</td>\n <td>S</td>\n </tr>\n <tr>\n <th>9</th>\n <td>10</td>\n <td>1</td>\n <td>2</td>\n <td>Nasser, Mrs. 
Nicholas (Adele Achem)</td>\n <td>female</td>\n <td>14.000000</td>\n <td>1</td>\n <td>0</td>\n <td>30.0708</td>\n <td>NaN</td>\n <td>C</td>\n </tr>\n </tbody>\n</table>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": false}}, {"source": "### Save Your Work\n...you will need it in a few weeks!", "cell_type": "markdown", "metadata": {}}, {"source": "__=>YOUR TURN!__\n\nUse ```pandas``` to write your dataframe to our sqlite database.\n\nRead the documentation to find out how: \nhttp://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html", "cell_type": "markdown", "metadata": {}}, {"execution_count": 17, "cell_type": "code", "source": "# if_exists=\"replace\" overwrites the table, so re-running this cell does not fail\ndf.to_sql(\"training_data\", con, if_exists=\"replace\")", "outputs": [], "metadata": {"collapsed": false, "trusted": false}}, {"execution_count": null, "cell_type": "code", "source": "", "outputs": [], "metadata": {"collapsed": true, "trusted": false}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.10", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}