@arghyadeep99
Created September 6, 2019 19:35
Logistic Regression.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Logistic Regression.ipynb",
"version": "0.3.2",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/arghyadeep99/7c51e86331b3512ba9684023474ac27f/logistic-regression.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gpfV-EvZPiP0",
"colab_type": "text"
},
"source": [
"# Logistic Regression\n",
"\n",
"####While we used linear regression to deal with continuous data, that helped in predicting a future value on the basis of past data, logistic regression is different. Logsitic regression is used when the dependent variable is dichotomous (binary). For example: positive-negative, 0-1, pass-fail, benign-malignant, etc. It is assumed that the data for such dichotomous nature would be independent and that there's no correlation of data between two classes.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "avQGnJNbTvNN",
"colab_type": "text"
},
"source": [
"Many will think, why not linear regression? That's because linear data plotted on graph may look something like this:\n",
"\n",
"\n",
"<figure>\n",
"<center>\n",
"<img src='https://drive.google.com/uc?id=192bLivyYTXRLQaJsG8h-8jphieelQ-HX'/>\n",
" </center>\n",
" <center><figcaption><b>Linear Regression to fit dichotomous data</b></figcaption></center>\n",
"</figure>\n",
"\n",
"But what if we are faced with data having outlier points? \n",
"\n",
"<figure>\n",
"<center>\n",
"<img src='https://drive.google.com/uc?id=1Bk_tgAS1gMjEKqh6CVo6iZ3vNYzLf6Hx'/>\n",
" </center>\n",
" <center><figcaption><b>Linear Regression to fit dichotomous data with outliers</b></figcaption></center>\n",
"</figure>\n",
"\n",
"It is clearly visible that the line shifts just because of one outlier. This increases confusion of the model. \n",
"\n",
"Hence, Logistic regression is used."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lPX6x1cRX81E",
"colab_type": "text"
},
"source": [
"In logistic regression, we use logistic functions that are used to plot probabilistic models. \n",
"\n",
"Sigmoid function is a logistic function that's used in logistic regression. This is how it looks: \n",
"\n",
"\n",
"<figure>\n",
"<center>\n",
"<img src='https://drive.google.com/uc?id=1ro0WaW1kszK3InLQNVoKAoFD9K370REO'/>\n",
" </center>\n",
" <center><figcaption><b>Sigmoid function</b></figcaption></center>\n",
"</figure>\n",
"\n",
"### Let's plot this graph using Python."
]
},
{
"cell_type": "code",
"metadata": {
"id": "r0ZJewoXXrKQ",
"colab_type": "code",
"colab": {}
},
"source": [
"import math\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import warnings\n",
"\n",
"warnings.simplefilter('ignore')\n",
"\n",
"def sigmoid(x):\n",
" a = []\n",
" for item in x:\n",
" a.append(1/(1+math.exp(-item)))\n",
" return a"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "v1PijRvKaf2E",
"colab_type": "code",
"outputId": "4213cfe1-17ab-4622-e920-f0e267891e3b",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 269
}
},
"source": [
"x = np.arange(-10., 10., 0.2)\n",
"sig = sigmoid(x)\n",
"plt.plot(x,sig)\n",
"plt.grid(True)\n",
"plt.show()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3XmYVPWd7/H3t3egmx1aBARRMICi\n2LigiYqiopmRTKIJmeuSiRmSTJjHTGbmjl7vdYzOzJ1MJnNv5saM2ZyM2YhmXFBRMKbJ4g7I1iza\nLEI3dDe0LN3QW1V97x9VOGWnl+qmqs+p6s/reerps/yq6sOp098+/Oqc8zN3R0REckte0AFERCT9\nVNxFRHKQiruISA5ScRcRyUEq7iIiOUjFXUQkB6m4i4jkIBV3EZEcpOIuIpKDCoJ647Fjx/rUqVP7\n9dzjx48zbNiw9AZKk7BmC2suCG825eq7sGYLay7oe7Z169YdcvdxvTZ090AeFRUV3l+VlZX9fm6m\nhTVbWHO5hzebcvVdWLOFNZd737MBaz2FGqtuGRGRHKTiLiKSg1TcRURykIq7iEgOUnEXEclBvRZ3\nM3vEzBrMbEs3683M/tXMqs1sk5ldmP6YIiLSF6kcuf8QWNTD+huA6YnHUuDfTj2WiIicil4vYnL3\n35jZ1B6aLAYeTZx/+ZqZjTSzCe5+IE0ZRSTHxWJOWyRGeyRGWyRKWyRGRzRGJOa0J013RGNEY04k\n5sQ6/3QnGnNiDjGPLz85vWNvB3tf3UMs5jgQ85PX+IBz8icfmD/JEzMn25ycfn89ntQ2eXmSTsOZ\nXjOznPMnj0zDluueufc+hmqiuD/r7ud2se5Z4B/d/XeJ+ZeAv3H3tV20XUr86J7y8vKK5cuX9yt0\nc3MzpaWl/XpupoU1W1hzQXizKVdq3J3jHXC03Wk4eoJIfgnN7c7xDudEBI53OC0RpzUCrVGnJQLt\nUactCm1RpyMKkUEwlLMlTd82q4irzygE+v55LliwYJ27z+ut3YDefsDdvwt8F2DevHl+1VVX9et1\n1qxZQ3+fm2lhzRbWXBDebMoV5+40NLWx+9Bx3m08zp7GE+w/0pJ4tHKwqY32aCzR2oC2959blJ/H\n8CGFDC8pYFhxAWOKCxhWnM+QogKGFuYzpCifksJ8igvyKC7Mo7ggPl1UkEdxQR6F+XkU5Fn8Z75R\nkBf/mZ9n5FviZ/LDjDwz8vKI/0xMv/rKq1x++WWJZWAYlhdPa2aJn4nliSr8/s/kZYn2J6dPMkue\nTl7Tu0x9nuko7rXA5KT5SYllIpJlOqIx3q5vYlPNUTbXHuXtuiZ21DfR1Bp5v01BnjFhZAmnjxjC\nxWeOpnx4CePKihlbWkTtzu0suOwiRg0tYuTQQkoK8wP81/yXEcXG2NLioGMMqHQU9xXAMjNbDlwC\nHFV/u0h2iERjvLXvCK/tbOS13Y2se/cwrR3xo/DhJQV8aMJwFl9wOjPKy5g2tpQpY4YyYUQJBfld\nn4ux5sg7zJwwfCD/CdKNXou7mf0MuAoYa2Y1wN8ChQDu/jCwErgRqAZOAH+SqbAicuqOt0V4aXsD\nL22rp3J7A8daI5jBzNOGs+SiM5h7xkjOnzSSKWOG9rmLQcIjlbNlPt3Lege+lLZEIpJ2sZjzys5G\nnlhfw/Nb6mjpiDJ6WBHXzT6Naz40nvlnjWHk0KKgY0oaBXY/dxHJvONtEX6xroZ/f3k3expPUFZS\nwMfmns7HLpjIvKmjyc/TkXmuUnEXyUFNrR1877e7+eHLuznWGuGCySP55rUzuH72aaH5klMyS8Vd\nJIe0dkT58Wvv8lBlNYdPdHD97HKWXnEWFVNGBR1NBpiKu0iOeGXnIe59cgu7Dx3nI9PH8tfXn8Oc\nSZm9ClLCS8VdJMsdPdHB36/cymNra5gyZiiP
fvZirpjR+xCbkttU3EWy2Lp3D7Psp+tpaGrjC1ee\nxV3XTGdIkfrURcVdJCu5O4+8vIf/vXIbE0aW8OSfXaYuGPkAFXeRLNMedZb97C2e23SA62aV8/Vb\nzmfEkMKgY0nIqLiLZJFjrR18Y20rOw6f4O4bPsTnr5imq0ilSyruIlmi/lgrdzzyBtVHYnxzyQUs\nvmBi0JEkxFTcRbJA/bFWbnn4VRqb2/iLihIVdumVBsgWCbnDx9u57Qev09jcxo8/dwnnjtXZMNI7\nFXeREGtui/CZf3+DPY0n+N4d85h7hq40ldSouIuEVEc0xtJH17Jl/zG+/ccXctlZY4OOJFlExV0k\npP7+uW28srORr31iDgtnlQcdR7KMirtICD2+dh8/fGUPd374TG6umBR0HMlCKu4iIbNx3xHufWoL\nl501hntu+FDQcSRLqbiLhMjREx188cfrGFdazP/79NxuxyoV6Y3OcxcJkfufqaK+qY0nvngZY0qL\ng44jWUyHBSIhsXLzAZ58q5Y/v/pszp+sm4DJqVFxFwmBhqZW7n1yM3MmjeBLC84OOo7kABV3kYC5\nO//jic2caI/yL588n0L1s0saaC8SCdjzW+r45bYG/vr6czh7fFnQcSRHqLiLBOhEe4S/e3YrMycM\n5zOXTQ06juQQFXeRAH27cif7j7by4OLZOu1R0kp7k0hAdh86znd/s4uPz53IvKmjg44jOUbFXSQA\n7s5Xn6miqCCPu3UVqmSAirtIAH5XfYg1Ow5y1zXTGT+8JOg4koNU3EUGmLvz9VU7mDhyCLdfNiXo\nOJKjVNxFBtiqqno21RzlroXTKS7QqEqSGSkVdzNbZGY7zKzazO7uYv0ZZlZpZm+Z2SYzuzH9UUWy\nXzTm/MuLO5g2bhgfn6txUCVzei3uZpYPPATcAMwCPm1mszo1+5/AY+4+F1gCfDvdQUVywYqNtbxd\n38xXrp2hUx8lo1LZuy4Gqt19l7u3A8uBxZ3aODA8MT0C2J++iCK5oSMa4/+8+A4zJwznxnMnBB1H\nclwqxX0isC9pviaxLNn9wK1mVgOsBP48LelEcsjTG/az970T/OW1M8jLs6DjSI4zd++5gdnNwCJ3\n/1xi/jbgEndfltTmK4nX+oaZzQd+AJzr7rFOr7UUWApQXl5esXz58n6Fbm5uprS0tF/PzbSwZgtr\nLghvtnTmirnzv15uwYAHLx+CWf+Le1i3F4Q3W1hzQd+zLViwYJ27z+u1obv3+ADmA6uS5u8B7unU\npgqYnDS/Cxjf0+tWVFR4f1VWVvb7uZkW1mxhzeUe3mzpzPXLrXU+5W+e9SfW7zvl1wrr9nIPb7aw\n5nLvezZgrfdSt909pW6ZN4HpZnammRUR/8J0Rac2e4FrAMxsJlACHEzhtUUGhe/8ehcTRw7hD+ac\nHnQUGSR6Le7uHgGWAauAbcTPiqkyswfM7KZEs78E/tTMNgI/Az6T+AsjMuite/cwb+x5jzs/fKbu\n1S4DJqUxVN19JfEvSpOX3Zc0vRW4PL3RRHLDd369kxFDCvnURZODjiKDiA4jRDJo58FmXtxWz+3z\npzCsWOPRy8BRcRfJoB+9+i6FeXncPn9q0FFkkFFxF8mQ420R/nNdDTeedxrjyoqDjiODjIq7SIY8\nvWE/TW0RbpuvOz/KwFNxF8kAd+fRV/cwc8JwLjxjVNBxZBBScRfJgPV7D7O9ronbLp1ySlejivSX\nirtIBvzo1XcpKy5g8QW6aEmCoeIukmaHmttYubmOT1RM0umPEhgVd5E0+891NbRHY9x66RlBR5FB\nTMVdJI3cncfX1VAxZRRnjy8LOo4MYiruImm0Yd8RqhuauaViUtBRZJBTcRdJo8fX1VBSmMdH52ik\nJQmWirtImrR2RHlm435uOHcCZSWFQceRQU7FXSRNVlXV0dQaUZeMhIKKu0iaPL62hkmjhnDptDFB\nRxFRcRdJ
h9ojLby88xCfuHCSBr+WUFBxF0mDp96qxR1uVpeMhISKu8gpcneeequWi6aOYvLooUHH\nEQFU3EVO2fa6Jt5paOamCyYGHUXkfSruIqfo6Q37KcgzPnqezm2X8FBxFzkFsZjzzMb9fGT6WEYP\nKwo6jsj7VNxFTsG6vYepPdLCYnXJSMiouIucghUb9lNSmMe1s8qDjiLyASruIv3UEY3x3OYDLJxZ\nrvu2S+iouIv00++qD/He8XZ1yUgoqbiL9NMzG/czvKSAK2aMDTqKyO9RcRfph/ZIjBe31nPd7NMo\nLsgPOo7I71FxF+mHl6sP0dQa4cbzTgs6ikiXVNxF+uG5zQcoKyngw2ePCzqKSJdU3EX6qD0SY3VV\nHdfOKqeoQL9CEk7aM0X66OWdhzjWGtHtBiTUUiruZrbIzHaYWbWZ3d1Nm0+a2VYzqzKzn6Y3pkh4\nPL/5AGXFBXx4us6SkfDq9coLM8sHHgKuBWqAN81shbtvTWozHbgHuNzdD5vZ+EwFFglSRzTG6q31\nLJxVrrNkJNRSOXK/GKh2913u3g4sBxZ3avOnwEPufhjA3RvSG1MkHF7Z2ciREx3cqC4ZCTlz954b\nmN0MLHL3zyXmbwMucfdlSW2eAt4GLgfygfvd/YUuXmspsBSgvLy8Yvny5f0K3dzcTGlpab+em2lh\nzRbWXBDebF3lemRLG28ciPCvVw+lKD+Y4fTCur0gvNnCmgv6nm3BggXr3H1erw3dvccHcDPw/aT5\n24BvdWrzLPAkUAicCewDRvb0uhUVFd5flZWV/X5upoU1W1hzuYc3W+dckWjML3xgtS/76fpgAiWE\ndXu5hzdbWHO59z0bsNZ7qdvunlK3TC0wOWl+UmJZshpghbt3uPtu4kfx01N4bZGssXbPezQeb2fR\nbF24JOGXSnF/E5huZmeaWRGwBFjRqc1TwFUAZjYWmAHsSmNOkcC9UFVHUUEeV52jC5ck/Hot7u4e\nAZYBq4BtwGPuXmVmD5jZTYlmq4BGM9sKVAJ/7e6NmQotMtDcndVV9Vwxfaxu7ytZIaW91N1XAis7\nLbsvadqBryQeIjlnS+0xao+0cNdC9TZKdtAVqiIpWFVVR36esXCmRlyS7KDiLpKCF6rquHjqaA2C\nLVlDxV2kF9UNzVQ3NLPoXJ0lI9lDxV2kF6uq6gC4bra6ZCR7qLiL9GL11nrOnzSCCSOGBB1FJGUq\n7iI9qDvaysZ9R7hOFy5JllFxF+nBi9vqAbheXTKSZVTcRXqwuqqOaWOHcda4cN50SqQ7Ku4i3Tje\n4by6s5FrZ5djFswdIEX6S8VdpBubD0aJxJzrZqm/XbKPirtIN9Y3RBhbWszcySODjiLSZyruIl1o\ni0TZdDDKtbPKyctTl4xkHxV3kS68urOR1qguXJLspeIu0oXVW+spyYfLzhoTdBSRflFxF+kkFnNe\n3FrPuWPzKS7IDzqOSL+ouIt0sqHmCAeb2qgo16Ackr1U3EU6WV1VT0GeMWecjtole6m4i3Syemsd\n888aw7BCnSUj2UvFXSRJdUMzuw4e57pZOktGspuKu0iSF7fGbxS2UMVdspyKu0iS1VvrmKN7t0sO\nUHEXSWg41spbe4+oS0Zygoq7SMLJe7drYA7JBSruIgmrq+qZOmYo08fr3u2S/VTcRYBjrR28svMQ\n180+Tfdul5yg4i4CVG5voCPqXK8uGckRKu4iwAtb6hhfpnu3S+5QcZdBr7UjypodB7lutu7dLrlD\nxV0Gvd+8fZCWjqi6ZCSnqLjLoLeqqp7hJQVcOk33bpfckVJxN7NFZrbDzKrN7O4e2n3CzNzM5qUv\nokjmdERj/HJbPQtnllOYr2MdyR297s1mlg88BNwAzAI+bWazumhXBtwFvJ7ukCKZ8sbu9zja0qEL\nlyTnpHKocjFQ7e673L0dWA4s7qLdg8DXgNY05hPJqBe21FFSmMeVM8YFHU
UkrVIp7hOBfUnzNYll\n7zOzC4HJ7v5cGrOJZFQs5qyqquPKGeMYUqSBOSS3mLv33MDsZmCRu38uMX8bcIm7L0vM5wG/Aj7j\n7nvMbA3wV+6+tovXWgosBSgvL69Yvnx5v0I3NzdTWhrOS8TDmi2suSC4bG8fjvIPr7fy+TnFzD/9\n94fUC+s2C2suCG+2sOaCvmdbsGDBOnfv/XtNd+/xAcwHViXN3wPckzQ/AjgE7Ek8WoH9wLyeXrei\nosL7q7Kyst/PzbSwZgtrLvfgsv3t01t8+r0r/VhLe5frw7rNwprLPbzZwprLve/ZgLXeS91295S6\nZd4EppvZmWZWBCwBViT9cTjq7mPdfaq7TwVeA27yLo7cRcIiFnNe2FLHFdPHUVZSGHQckbTrtbi7\newRYBqwCtgGPuXuVmT1gZjdlOqBIJry17zB1x1r56BydJSO56fc7Grvg7iuBlZ2W3ddN26tOPZZI\nZj23qY6i/DyumamBOSQ36aoNGXRiMef5LQe4YsZYhqtLRnKUirsMOhtqjnDgaCs3nDsh6CgiGaPi\nLoPOyk0HKMw3FmqsVMlhKu4yqLg7z2+p4yPTxzFiiLpkJHepuMugsn7vYWqPtPDR89QlI7lNxV0G\nlac37Ke4II/rz9UpkJLbVNxl0OiIxnhu0wEWziqntDils4BFspaKuwwaL1cfovF4O4vPPz3oKCIZ\np+Iug8aKjfsZXlLAlefo9r6S+1TcZVBo7YiyaksdN543geIC3d5Xcp+KuwwKL21r4Hh7lJvUJSOD\nhIq7DApPb6hlfFkxl2gQbBkkVNwl5x050c6aHQf5gzmnk59nQccRGRAq7pLzVmzcT3s0xicqJvbe\nWCRHqLhLznt8bQ2zJgxn9ukjgo4iMmBU3CWnba87xubao9wyb1LQUUQGlIq75LTH19ZQmG8svkBd\nMjK4qLhLzuqIxnjqrVoWzixn9LCioOOIDCgVd8lZv9reQOPxdm6uUJeMDD4q7pKzHl9bw7iyYq6c\nodsNyOCj4i45qaGplcodDXx87kQK8rWby+CjvV5y0s/f2Ec05nzqoslBRxEJhIq75JxINMZP39jL\nR6aPZdq40qDjiARCxV1yzkvbGzhwtJVbL50SdBSRwKi4S8758WvvcvqIEq750Pigo4gERsVdcsqu\ng8389p1D/PElZ+iLVBnUtPdLTvnxa3spzDc+qS9SZZBTcZeccaI9wuPr9rHo3AmMLysJOo5IoFTc\nJWf8Yl0NTa0Rbp+vL1JFVNwlJ0SiMb73211ceMZI5k0ZFXQckcCpuEtOWLmljn3vtfD5K8/CTKMt\niaRU3M1skZntMLNqM7u7i/VfMbOtZrbJzF4yM/2/WAaMu/OdX+9k2rhhXDuzPOg4IqHQa3E3s3zg\nIeAGYBbwaTOb1anZW8A8d58D/AL4p3QHFenOy9WNVO0/xuevmEaexkgVAVI7cr8YqHb3Xe7eDiwH\nFic3cPdKdz+RmH0N0D1WZcA8/OudjC8r5mNzNSCHyEnm7j03MLsZWOTun0vM3wZc4u7Lumn/LaDO\n3f+ui3VLgaUA5eXlFcuXL+9X6ObmZkpLw3nPkLBmC2suOLVsu49G+eqrrXxyRiE3TkvvgBxh3WZh\nzQXhzRbWXND3bAsWLFjn7vN6bejuPT6Am4HvJ83fBnyrm7a3Ej9yL+7tdSsqKry/Kisr+/3cTAtr\ntrDmcj+1bLf/4HU//6ur/GhLe/oCJYR1m4U1l3t4s4U1l3vfswFrvZf66u4pdcvUAsmX+01KLPsA\nM1sI3Avc5O5tKbyuyCl5c897/Prtg3zhyrMYXlIYdByRUEmluL8JTDezM82sCFgCrEhuYGZzge8Q\nL+wN6Y8p8kHuztdf2MG4smLumD816DgiodNrcXf3CLAMWAVsAx5z9yoze8DMbko0+zpQCjxuZhvM\nbEU3LyeSFr955xBv7HmPP7/6bIYU5Q
cdRyR0ClJp5O4rgZWdlt2XNL0wzblEuuXu/POqHUwcOYQl\nF50RdByRUNIVqpJ1ntt8gM21R/nywukUFWgXFumKfjMkq5xoj/APz21j5oTh/JHOaxfploq7ZJWH\nKqvZf7SVBxbP1mAcIj3Qb4dkjd2HjvO93+zm43MnctHU0UHHEQk1FXfJCu7OV5+poqggj7tv+FDQ\ncURCT8VdssKqqjrW7DjIlxdOZ/xwjbIk0hsVdwm9Q81t3PvkFmZNGM4dl00NOo5IVkjpPHeRoLg7\n9z65mabWCD/90wso1JeoIinRb4qE2hPra1lVVc9fXT+Dc04rCzqOSNZQcZfQqj3Swv0rqrh46mju\n/PC0oOOIZBUVdwmltkiUL/1kPTF3/vmW88nXCEsifaI+dwkdd+e+p6rYsO8ID996IWeMGRp0JJGs\noyN3CZ2fvL6Xn6/dx5cWnMWicycEHUckK6m4S6i8uec9vvpMFVedM46vXHtO0HFEspaKu4TG1v3H\nuPOHbzJ51FC+uWSu+tlFToGKu4TC7kPHuf2R1xlWXMCjd17MiCEaNk/kVKi4S+AaW2Lc+v3XcYcf\n3XkJk0bpC1SRU6WzZSRQOw828w+vt9Lu+fxs6aWcPb406EgiOUHFXQLz1t7DfPaHbxKJOT9deinn\nThwRdCSRnKHiLoH41fZ6vvSTtxhXVsyXZhdw3iQVdpF0Up+7DKhozPnG6h3c+R9rOWv8MH7xxfmU\nD9NuKJJuOnKXAdPQ1MpdP9vAq7sa+dS8yXx18WxKCvPZGnQwkRyk4i4Z5+48+VYtDz67lZaOKP98\ny/ncXDEp6FgiOU3FXTJqb+MJ7n1qM7995xAXnjGSr31iDtPLdetekUxTcZeMaGxu46HKnfz4tXcp\nKsjjwcWz+W+XTCFPV52KDAgVd0mrg01t/OjVPfzgd7tp6YhyS8Vk/uLaGZw2QuOeigwkFXdJi637\nj/HvL+/m6Q37aY/GuOHc0/jL687RRUkiAVFxl3472NTGio37eWJ9DVX7j1FSmMcnL5rEn1x+JmeN\nU1EXCZKKu6TM3aluaOal7Q38cms96/ceJuZw3sQR/O0fzuJjF0xk1LCioGOKCCru0oOOaIwddU1s\n2HeE13e/x2u7GjnY1AbA7NOHs+zq6fzhnAk6+0UkhFIq7ma2CPgmkA98393/sdP6YuBRoAJoBD7l\n7nvSG1UyJRZz6pta2XXwOG/XN/F2fTPb646xdf8x2iIxAMaXFTN/2hgunTaGK88Zx8SRQwJOLSI9\n6bW4m1k+8BBwLVADvGlmK9w9+cLCO4HD7n62mS0BvgZ8KhOBpW/aIlGOnOjg3WNR1uxo4GBTGw1N\nbdQeaeHAkRZqj7TwbuOJ94s4wIghhZxzWhm3XjqFOZNGcP6kkUwZMxQzncYoki1SOXK/GKh2910A\nZrYcWAwfuGp8MXB/YvoXwLfMzNzd05g168ViTtSdaCz+iJz8GY3REXOiUacjFqMjGqMj4rRHo7RF\nYrQnHq2RGK0dUdo6orR0RDnRHqWlPcrx9gjH26I0tUZobuvgaEuEYy0dHG3poLkt8l8BXnnz/clR\nQws5feQQpowZxpUzxjFlzDCmjhnGjPJSxpUVq5CLZLlUivtEYF/SfA1wSXdt3D1iZkeBMcChdIRM\n9tib+/i/vz3B0PW/JvF+XbbzbmZOTrr7B9qcfBnHcU+aT2rnHl8fe3/9yel4m1jM6YhEyPvVC8Qc\nou54opjHMvRnrrggj2HFBZQWFzCsuICy4gImjhzCzAlljBhSyJhhRYwaVsT+3e9w9fwLGVdawriy\nYoYU5WcmkIiEwoB+oWpmS4GlAOXl5axZs6bPr1HbEKF8SIyCvJb/et1U3vsDObpefnLGsPeXm/3+\nc+1kU0vcVjOxLM8g0uEUFRpmRl7S8jyLPzcv6ZFvRv7J6TzITywryIOCxHxhnlGYmC/MN4ryoDAf\niv
ON4nzI+8ARdjTxaPvgP74FRpS20rR7E03ArhS210Bqbm7u176QacrVd2HNFtZckMFsnji67O4B\nzAdWJc3fA9zTqc0qYH5iuoD4Ebv19LoVFRXeX5WVlf1+bqaFNVtYc7mHN5ty9V1Ys4U1l3vfswFr\nvZe67e4p3c/9TWC6mZ1pZkXAEmBFpzYrgDsS0zcDv0qEEBGRAPTaLePxPvRlxI/O84FH3L3KzB4g\n/hdkBfAD4EdmVg28R/wPgIiIBCSlPnd3Xwms7LTsvqTpVuCW9EYTEZH+0vhmIiI5SMVdRCQHqbiL\niOQgFXcRkRyk4i4ikoMsqNPRzewg8G4/nz6WDNzaIE3Cmi2suSC82ZSr78KaLay5oO/Zprj7uN4a\nBVbcT4WZrXX3eUHn6EpYs4U1F4Q3m3L1XVizhTUXZC6bumVERHKQiruISA7K1uL+3aAD9CCs2cKa\nC8KbTbn6LqzZwpoLMpQtK/vcRUSkZ9l65C4iIj0IbXE3s1vMrMrMYmY2r9O6e8ys2sx2mNn13Tz/\nTDN7PdHu54nbFWci58/NbEPiscfMNnTTbo+ZbU60W5uJLJ3e734zq03KdmM37RYltmO1md2d6VyJ\n9/y6mW03s01m9qSZjeym3YBss962gZkVJz7n6sQ+NTVTWZLec7KZVZrZ1sTvwV1dtLnKzI4mfcb3\ndfVaGcrX42djcf+a2GabzOzCAch0TtK22GBmx8zsy53aDNg2M7NHzKzBzLYkLRttZi+a2TuJn6O6\nee4diTbvmNkdXbXpVSo3fQ/iAcwEzgHWAPOSls8CNgLFwJnATiC/i+c/BixJTD8MfHEAMn8DuK+b\ndXuAsQO4/e4H/qqXNvmJ7TcNKEps11kDkO06oCAx/TXga0Fts1S2AfBnwMOJ6SXAzwdgG00ALkxM\nlwFvd5HrKuDZgdqn+vLZADcCzxMfjOxS4PUBzpcP1BE/JzyQbQZcAVwIbEla9k/A3Ynpu7va94HR\nxAdMGw2MSkyP6uv7h/bI3d23ufuOLlYtBpa7e5u77waqiQ/i/T6Lj+58NfHBugH+A/hYJvMm3vOT\nwM8y+T5p9v7g5+7eDpwc/Dyj3H21u58cufs1YFKm37MHqWyDxcT3IYjvU9dYhkcQd/cD7r4+Md0E\nbCM+VnG2WAw86nGvASPNbMIAvv81wE537++FkqfM3X9DfHyLZMn7Und16XrgRXd/z90PAy8Ci/r6\n/qEt7j3oasDuzjv9GOBIUgHpqk26fQSod/d3ulnvwGozW5cYS3YgLEv8l/iRbv77l8q2zLTPEj/C\n68pAbLNUtsEHBoAHTg4APyAS3UBzgde7WD3fzDaa2fNmNnugMtH7ZxP0vrWE7g+0gtpmAOXufiAx\nXQeUd9EmLdtuQAfI7szMfgnIAeTSAAAC00lEQVSc1sWqe9396YHO050Uc36ano/aP+zutWY2HnjR\nzLYn/rJnJBfwb8CDxH8JHyTeZfTZU3m/dGU7uc3M7F4gAvykm5dJ+zbLNmZWCvwn8GV3P9Zp9Xri\n3Q7Nie9UngKmD1C00H42ie/XbiI+3nNnQW6zD3B3N7OMna4YaHF394X9eFotMDlpflJiWbJG4v8N\nLEgcaXXVJmW95TSzAuDjQEUPr1Gb+NlgZk8S7w44pV+GVLefmX0PeLaLValsy35JYZt9BvgD4BpP\ndDR28Rpp32ZdSGUbnGxTk/isRxDfxzLKzAqJF/afuPsTndcnF3t3X2lm3zazse6e8XuopPDZZGzf\nSsENwHp3r++8IshtllBvZhPc/UCim6qhiza1xL8bOGkS8e8e+yQbu2VWAEsSZzCcSfyv7hvJDRLF\nopL4YN0QH7w7k/8TWAhsd/earlaa2TAzKzs5TfwLxS1dtU2XTv2bf9TN+6Uy+Hkmsi0C/jtwk7uf\n6KbNQG2zUA4An+jT/wGwzd3/pZs2p53s+zezi4n/Pg/EH51UPpsV
wO2Js2YuBY4mdUdkWrf/iw5q\nmyVJ3pe6q0urgOvMbFSiO/W6xLK+GYhvjfvzIF6QaoA2oB5YlbTuXuJnOOwAbkhavhI4PTE9jXjR\nrwYeB4ozmPWHwBc6LTsdWJmUZWPiUUW8ayLT2+9HwGZgU2KHmtA5V2L+RuJnYuwciFyJ96wm3qe4\nIfF4uHO2gdxmXW0D4AHif3wAShL7UHVin5o2ANvow8S71DYlbacbgS+c3NeAZYlts5H4F9OXDdDn\n1+Vn0ymbAQ8ltulmks54y3C2YcSL9YikZYFsM+J/YA4AHYladifx72peAt4BfgmMTrSdB3w/6bmf\nTexv1cCf9Of9dYWqiEgOysZuGRER6YWKu4hIDlJxFxHJQSruIiI5SMVdRCQHqbiLiOQgFXcRkRyk\n4i4ikoP+P7xoo1i73lcEAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ItbgATl7tE0G",
"colab_type": "text"
},
"source": [
"### As an example, we will start working on the famous Titanic Dataset hosted on Kaggle. This example will also help us understand some data pre processing and how to draw inference and make next steps.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "oykQKWNotUX1",
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np \n",
"import pandas as pd \n",
"import seaborn as sns\n",
"%matplotlib inline\n",
"from matplotlib import pyplot as plt\n",
"from matplotlib import style\n",
"\n",
"# Algorithms\n",
"from sklearn import linear_model\n",
"from sklearn.linear_model import LogisticRegression\n"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "7BHDjA76uPFa",
"colab_type": "code",
"outputId": "314974ac-1b09-4d28-af20-b36c51cddc1e",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 122
}
},
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code\n",
"\n",
"Enter your authorization code:\n",
"··········\n",
"Mounted at /content/drive\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "VKbFxzy8tX_q",
"colab_type": "code",
"colab": {}
},
"source": [
"test_df = pd.read_csv(\"/content/drive/My Drive/ML-DL101 Datasets/test_titanic.csv\")\n",
"train_df = pd.read_csv(\"/content/drive/My Drive/ML-DL101 Datasets/train_titanic.csv\")"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "e7Pb5nNeu-8Q",
"colab_type": "code",
"outputId": "9b75e656-1665-4e47-b26d-aabf41ed3a8b",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 306
}
},
"source": [
"train_df.info()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 891 entries, 0 to 890\n",
"Data columns (total 12 columns):\n",
"PassengerId 891 non-null int64\n",
"Survived 891 non-null int64\n",
"Pclass 891 non-null int64\n",
"Name 891 non-null object\n",
"Sex 891 non-null object\n",
"Age 714 non-null float64\n",
"SibSp 891 non-null int64\n",
"Parch 891 non-null int64\n",
"Ticket 891 non-null object\n",
"Fare 891 non-null float64\n",
"Cabin 204 non-null object\n",
"Embarked 889 non-null object\n",
"dtypes: float64(2), int64(5), object(5)\n",
"memory usage: 83.6+ KB\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "h0UBHi6ivEUe",
"colab_type": "code",
"outputId": "023943f5-8143-4266-b033-f452daf1dd59",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 297
}
},
"source": [
"train_df.describe()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>714.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>446.000000</td>\n",
" <td>0.383838</td>\n",
" <td>2.308642</td>\n",
" <td>29.699118</td>\n",
" <td>0.523008</td>\n",
" <td>0.381594</td>\n",
" <td>32.204208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>257.353842</td>\n",
" <td>0.486592</td>\n",
" <td>0.836071</td>\n",
" <td>14.526497</td>\n",
" <td>1.102743</td>\n",
" <td>0.806057</td>\n",
" <td>49.693429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.420000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>223.500000</td>\n",
" <td>0.000000</td>\n",
" <td>2.000000</td>\n",
" <td>20.125000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>7.910400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>446.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.000000</td>\n",
" <td>28.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>14.454200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>668.500000</td>\n",
" <td>1.000000</td>\n",
" <td>3.000000</td>\n",
" <td>38.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>31.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>891.000000</td>\n",
" <td>1.000000</td>\n",
" <td>3.000000</td>\n",
" <td>80.000000</td>\n",
" <td>8.000000</td>\n",
" <td>6.000000</td>\n",
" <td>512.329200</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass ... SibSp Parch Fare\n",
"count 891.000000 891.000000 891.000000 ... 891.000000 891.000000 891.000000\n",
"mean 446.000000 0.383838 2.308642 ... 0.523008 0.381594 32.204208\n",
"std 257.353842 0.486592 0.836071 ... 1.102743 0.806057 49.693429\n",
"min 1.000000 0.000000 1.000000 ... 0.000000 0.000000 0.000000\n",
"25% 223.500000 0.000000 2.000000 ... 0.000000 0.000000 7.910400\n",
"50% 446.000000 0.000000 3.000000 ... 0.000000 0.000000 14.454200\n",
"75% 668.500000 1.000000 3.000000 ... 1.000000 0.000000 31.000000\n",
"max 891.000000 1.000000 3.000000 ... 8.000000 6.000000 512.329200\n",
"\n",
"[8 rows x 7 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "EW2l3yuIvJ2q",
"colab_type": "code",
"outputId": "b658bfb4-84c5-438b-a45e-49339b1dda5f",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
}
},
"source": [
"train_df.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass ... Fare Cabin Embarked\n",
"0 1 0 3 ... 7.2500 NaN S\n",
"1 2 1 1 ... 71.2833 C85 C\n",
"2 3 1 3 ... 7.9250 NaN S\n",
"3 4 1 1 ... 53.1000 C123 S\n",
"4 5 0 3 ... 8.0500 NaN S\n",
"\n",
"[5 rows x 12 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 8
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7h7PKBWuwTAR",
"colab_type": "text"
},
"source": [
"The head of the data gives us an indication of various parameters that need to be converted into numeric form for prediction. "
]
},
{
"cell_type": "code",
"metadata": {
"id": "6LsdW4ojwigB",
"colab_type": "code",
"outputId": "00a0ec08-1dee-4815-eaa1-93c3b6ae3ee4",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
}
},
"source": [
"total = train_df.isnull().sum().sort_values(ascending=False)\n",
"percent_1 = train_df.isnull().sum()/train_df.isnull().count()*100\n",
"percent_2 = (round(percent_1, 1)).sort_values(ascending=False)\n",
"missing_data = pd.concat([total, percent_2], axis=1, keys=['Total', '% missing'])\n",
"missing_data.head(5)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Total</th>\n",
" <th>% missing</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Cabin</th>\n",
" <td>687</td>\n",
" <td>77.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Age</th>\n",
" <td>177</td>\n",
" <td>19.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Embarked</th>\n",
" <td>2</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Fare</th>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Ticket</th>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Total % missing\n",
"Cabin 687 77.1\n",
"Age 177 19.9\n",
"Embarked 2 0.2\n",
"Fare 0 0.0\n",
"Ticket 0 0.0"
]
},
"metadata": {
"tags": []
},
"execution_count": 9
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LHm7kQY8w7Zs",
"colab_type": "text"
},
"source": [
"From above data, we can make following inferences:\n",
"\n",
"1. Around 77% of cabin data is missing. This % is huge. So, we can't afford to drop data row-wise. It's better to delete this column entirely since majority values are missing anyways.\n",
"\n",
"2. Embarked value can be easily filled.\n",
"\n",
"3. With some common sense, we can eliminate variables like PassengerId, Name and Ticket as we don't expect them to have much corelation with survival chance."
]
},
{
"cell_type": "code",
"metadata": {
"id": "Vr0zZm8mPZFu",
"colab_type": "code",
"outputId": "285cee9b-07a5-4ecc-92c6-6762e4afa491",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 295
}
},
"source": [
"survived = 'survived'\n",
"not_survived = 'not survived'\n",
"fig, axes = plt.subplots(nrows=1, ncols=2,figsize=(10, 4))\n",
"women = train_df[train_df['Sex']=='female']\n",
"men = train_df[train_df['Sex']=='male']\n",
"ax = sns.distplot(women[women['Survived']==1].Age.dropna(), bins=18, label = survived, ax = axes[0], kde =False)\n",
"ax = sns.distplot(women[women['Survived']==0].Age.dropna(), bins=40, label = not_survived, ax = axes[0], kde =False)\n",
"ax.legend()\n",
"ax.set_title('Female')\n",
"ax = sns.distplot(men[men['Survived']==1].Age.dropna(), bins=18, label = survived, ax = axes[1], kde = False)\n",
"ax = sns.distplot(men[men['Survived']==0].Age.dropna(), bins=40, label = not_survived, ax = axes[1], kde = False)\n",
"ax.legend()\n",
"_ = ax.set_title('Male')"
],
"execution_count": 0,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlMAAAEWCAYAAABCPBKqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAHyNJREFUeJzt3X+01XWd7/HnW6AwNVBkef2FoFOO\nqEh5FMwy0gYpHY0bBqioLYvyx8rubZxxujVjt+5aOcumabx3nOwa+FtQs7xqmpmlMxbjgfAHoDkO\niCiFoZI6WoDv+8fe0BHOkX32Z++z9+E8H2udxd7f/d3f7/u7v+e8ee3v/u7PNzITSZIk1WeHVhcg\nSZLUnxmmJEmSChimJEmSChimJEmSChimJEmSChimJEmSChim1HYi4uKIuLbVdUjSliJidERkRAxu\ndS1qH4YpbSUiVkTEaxHxSpefvVpdlySVqva3P0TE7ltM/2U1JI1uTWXqzwxT6smfZ+bOXX6ea3VB\nktQgy4GZm+5ExKHAO1pXjvo7w5RqFhETI+LBiHgpIh6OiEldHvtpRHyt+vgrEfH/ImJERFwXEb+L\niIe6vuOLiG9FxDPVxxZGxAfqWa8k1eEa4Iwu988Ert50JyJOqB6p+l21T13c04IiYlhEXBkRqyPi\n2WofHNS80tWODFOqSUTsDdwBfA3YDfgL4JaIGNllthnALGBv4ADg58Cc6vzLgL/tMu9DwPjqY9cD\nN0XE0DrXK0m98QvgnRFxUDX4zAC6nqf5KpWwNRw4ATgnIj7Ww7LmAhuAPwHeA0wGPtWkutWmDFPq\nyferR4JeiojvA6cDd2bmnZn5RmbeA3QCH+3ynDmZ+VRmrgN+CDyVmT/OzA3ATVQaDQCZeW1mrs3M\nDZn5DeDtwIHd1FHLeiWptzYdnfozKm/2nt30QGb+NDMfrfacR4AbgA9uuYCI2INKL/p8Zr6amWuA\nb1IJZxpA/DaCevKxzPzxpjsR8U/AKRHx513mGQLc1+X+b7rcfq2b+zt3Wd5fAGcDewEJvBN40wmh\nVfvVsF5J6q1rgPuBMXT5iA8gIiYAXwcOAd5G5c3eTd0sYz8q/Wh1RGyatgPwTHNKVrsyTKlWzwDX\nZOanSxdUPT/qL4HjgCWZ+UZEvAhEN7M3bL2StElmPh0Ry6kcWTp7i4evB/438JHMfD0i/oHu3+w9\nA/we2L16BF4DlB/zqVbXAn8eEcdHxKCIGBoRkyJinzqWtQuVcwyeBwZHxN9QOTLV7PVKUldnA8dm\n5qtbTN8FeKEapI4ETu3uyZm5GvgR8I2IeGdE7BARB0TEVh8JavtmmFJNMvMZ4GTgi1RC0DPAhdT3\nO3Q3cBfwK+Bp4HV6OCze4PVK0mbVczw7u3noXOB/RsTLwN8A899iMWdQ+ShwKfAicDOwZ6NrVXuL\nzGx1DZIkSf2W7+4lSZIKGKYkSZIKGKYkSZIKGKYkSZIK9Ok4U7vvvnuOHj26L1cpqcUWLlz428zs\n95f/sX9JA0+t/atPw9To0aPp7OzuW6iStlcR8XSra2gE+5c08NTav/yYT5IkqYBhSpIkqYBhSpIk\nqYAXOla/t379elatWsXrr7/e6lIGtKFDh7LPPvswZMiQVpci9Sv2sNYr7V+GKfV7q1atYpdddmH0\n6NFERKvLGZAyk7Vr17Jq1SrGjBnT6nKkfsUe1lqN6F9+zKd+7/XXX2fEiBE2oRaKCEaMGOE7a6kO\n9rDWakT/Mkxpu2ATaj33gVQ//35aq/T1N0xJkiQV8JwpbXeuX7Cyocs7dcKohi6vVrfddhtLly7l\noosuKl7WzjvvzCuvvNKAqiQ12/bQwwZa/zJMbUfq+QNsVVBQxYYNGxg8uPs/w5NOOomTTjqpjyuS\nuuics/W0jk/2fR1qS/avP/JjPqkBXn31VU44
4QQOO+wwDjnkEObNm8fo0aP57W9/C0BnZyeTJk0C\n4OKLL2bWrFkcffTRzJo1i4kTJ7JkyZLNy5o0aRKdnZ3MnTuX888/n3Xr1rHffvvxxhtvbF7Xvvvu\ny/r163nqqaeYMmUKhx9+OB/4wAd4/PHHAVi+fDlHHXUUhx56KF/60pf69sWQ1K/Yv8oZpqQGuOuu\nu9hrr714+OGHeeyxx5gyZcpbzr906VJ+/OMfc8MNNzB9+nTmz58PwOrVq1m9ejUdHR2b5x02bBjj\nx4/nZz/7GQC33347xx9/PEOGDGH27NlcdtllLFy4kEsvvZRzzz0XgAsuuIBzzjmHRx99lD333LNJ\nWy1pe2D/KmeYkhrg0EMP5Z577uGv/uqveOCBBxg2bNhbzn/SSSex4447AvCJT3yCm2++GYD58+cz\nbdq0reafPn068+bNA+DGG29k+vTpvPLKKzz44IOccsopjB8/ns985jOsXr0agH/9139l5syZAMya\nNath2ylp+2P/Kuc5U1IDvPvd72bRokXceeedfOlLX+K4445j8ODBmw9tbzl+yU477bT59t57782I\nESN45JFHmDdvHv/8z/+81fJPOukkvvjFL/LCCy+wcOFCjj32WF599VWGDx/O4sWLu63Jr1pLqoX9\nq5xHpqQGeO6553jHO97B6aefzoUXXsiiRYsYPXo0CxcuBOCWW255y+dPnz6dv/u7v2PdunWMGzdu\nq8d33nlnjjjiCC644AJOPPFEBg0axDvf+U7GjBnDTTfdBFRG8X344YcBOProo7nxxhsBuO666xq5\nqZK2M/avch6Z0nanFd9QfPTRR7nwwgvZYYcdGDJkCJdffjmvvfYaZ599Nl/+8pc3n7zZk2nTpnHB\nBRfw5S9/ucd5pk+fzimnnMJPf/rTzdOuu+46zjnnHL72ta+xfv16ZsyYwWGHHca3vvUtTj31VC65\n5BJOPvnkBm2lpL7Q1z3M/lUuMrPPVtbR0ZGdnZ19tr6BZqAOjbBs2TIOOuigVpchut8XEbEwMzt6\neEq/MSD7V3dDI/TEIRPqZg9rDyX9y4/5JEmSChimJEmSChimJEmSChimJEmSChimJEmSChimJEmS\nCjjOlLY/vfk6dy2a8JXvuXPnMnnyZPbaa6+GL7sn73vf+3jwwQeLl3PWWWdx4okndnvZCEkNYA/r\nVjv3MI9MSS0wd+5cnnvuuYYuMzM3X/6hO41oQpIE9rAtGaakQitWrOCggw7i05/+NAcffDCTJ0/m\ntddeA2Dx4sVMnDiRcePGMXXqVF588UVuvvlmOjs7Oe200xg/fvzmeTf5x3/8R8aOHcu4ceOYMWMG\nABdffDGXXnrp5nkOOeQQVqxYwYoVKzjwwAM544wzOOSQQ/jqV7/KhRdeuHm+uXPncv755wOVSzoA\nzJgxgzvuuGPzPGeddRY333wzGzdu5MILL+SII45g3LhxfPvb3wYqDe7888/nwAMP5MMf/jBr1qxp\nwqsoqVXsYeUMU1IDPPnkk5x33nksWbKE4cOHb76W1RlnnMEll1zCI488wqGHHspXvvIVpk2bRkdH\nB9dddx2LFy/efPX1Tb7+9a/zy1/+kkceeaTbi4Z2t+5zzz2XJUuWcO6553LrrbdufmzevHmbm9km\n06dPZ/78+QD84Q9/4N577+WEE07gyiuvZNiwYTz00EM89NBDfOc732H58uXceuutPPHEEyxdupSr\nr766rd8dSqqPPayMYUpqgDFjxjB+/HgADj/8cFasWMG6det46aWX+OAHPwjAmWeeyf3337/NZY0b\nN47TTjuNa6+9lsGDt31a43777cfEiRMBGDlyJPvvvz+/+MUvWLt2LY8//jhHH330m+b/yEc+wn33\n3cfvf/97fvjDH3LMMcew44478qMf/Yirr76a8ePHM2HCBNauXcuTTz7J/fffz8yZMxk0aBB77bUX\nxx57bG9f
Hkltzh5WZpthKiL2jYj7ImJpRCyJiAuq0y+OiGcjYnH156MNr07qJ97+9rdvvj1o0CA2\nbNhQ97LuuOMOzjvvPBYtWsQRRxzBhg0bGDx48JvOJXj99dc3395pp53e9PwZM2Ywf/58brnlFqZO\nnUpEvOnxoUOHMmnSJO6++27mzZvH9OnTgcqh8Msuu4zFixezePFili9fzuTJk+vejnZg/5JqYw8r\nU8uRqQ3AFzJzLDAROC8ixlYf+2Zmjq/+3Nm0KqV+aNiwYey666488MADAFxzzTWb3+HtsssuvPzy\ny1s954033uCZZ57hQx/6EJdccgnr1q3jlVdeYfTo0SxatAiARYsWsXz58h7XO3XqVH7wgx9www03\nbHV4fJPp06czZ84cHnjgAaZMmQLA8ccfz+WXX8769esB+NWvfsWrr77KMcccw7x589i4cSOrV6/m\nvvvuq/9F6Xv2L6lO9rDabfP4W2auBlZXb78cEcuAvRteidQobXT1+quuuorPfvaz/Od//if7778/\nc+ZUvvJ81lln8dnPfpYdd9yRn//855vPOdi4cSOnn34669atIzP53Oc+x/Dhw/n4xz/O1VdfzcEH\nH8yECRN497vf3eM6d911Vw466CCWLl3KkUce2e08kydPZtasWZx88sm87W1vA+BTn/oUK1as4L3v\nfS+ZyciRI/n+97/P1KlT+clPfsLYsWMZNWoURx11VINfpeaxf6lfsof1ux4WmVn7zBGjgfuBQ4D/\nDpwF/A7opPLu78VunjMbmA0watSow59++unSmtWD6xes7PVzTp0wqgmV9K1ly5Zx0EEHtboM0f2+\niIiFmdnRopK61jEa+1fv9Ga8o2YEgO7W30ZBo1HsYe2hpH/VfAJ6ROwM3AJ8PjN/B1wOHACMp/LO\n7xvdPS8zr8jMjszsGDlyZK2rk6SGsX9JaqaawlREDKHSiK7LzO8BZOZvMnNjZr4BfAfo/licJLWQ\n/UtSs9Xybb4ArgSWZebfd5m+Z5fZpgKPNb48qTa9+bhazdGO+8D+pf6iHf9+BpLS17+Wa/MdDcwC\nHo2IxdVpXwRmRsR4IIEVwGeKKpHqNHToUNauXcuIESO2+gqt+kZmsnbtWoYOHdrqUrZk/1Lbs4e1\nViP6Vy3f5vsXoLu961eJ1Rb22WcfVq1axfPPP9/qUga0oUOHss8++7S6jDexf6k/sIe1Xmn/quXI\nlNTWhgwZwpgxY1pdhiTVxR7W/xmmJEnN0eqhFaQ+4rX5JEmSChimJEmSChimJEmSChimJEmSChim\nJEmSChimJEmSChimJEmSChimJEmSChimJEmSChimJEmSChimJEmSChimJEmSChimJEmSChimJEmS\nCgxudQEDxfULVvb6OadOGNWESiSpDXXOaXUFUt08MiVJklTAMCVJklTAMCVJklTAMCVJklTAMCVJ\nklTAMCVJklTAoRHUJxwaQpK0vfLIlCRJUgHDlCRJUgHDlCRJUgHDlCRJUoFthqmI2Dci7ouIpRGx\nJCIuqE7fLSLuiYgnq//u2vxyJal29i9JfaGWI1MbgC9k5lhgInBeRIwFLgLuzcx3AfdW70tSO7F/\nSWq6bYapzFydmYuqt18GlgF7AycDV1Vnuwr4WLOKlKR62L8k9YVenTMVEaOB9wALgD0yc3X1oV8D\ne/TwnNkR0RkRnc8//3xBqZJUP/uXpGapOUxFxM7ALcDnM/N3XR/LzASyu+dl5hWZ2ZGZHSNHjiwq\nVpLqYf+S1Ew1hamIGEKlEV2Xmd+rTv5NROxZfXxPYE1zSpSk+tm/JDVbLd/mC+BKYFlm/n2Xh24D\nzqzePhP4QePLk6T62b8k9YVars13NDALeDQiFlenfRH4OjA/Is4GngY+0ZwSJalu9i9JTbfNMJWZ\n/wJEDw8f19hyJKlx7F+S+oIjoEuSJBWo5WM+6U2uX7Cy1SVIktQ2PDIlSZ
JUwDAlSZJUwDAlSZJU\nwDAlSZJUwDAlSZJUwDAlSZJUwDAlSZJUwDAlSZJUwDAlSZJUwDAlSZJUwDAlSZJUwDAlSZJUwDAl\nSZJUwDAlSZJUYHCrC5AkqSk652w9reOTfV+HtnsemZIkSSpgmJIkSSpgmJIkSSpgmJIkSSpgmJIk\nSSpgmJIkSSrg0AiSpO6HEWhHpcMdOFyCmsAjU5IkSQUMU5IkSQUMU5IkSQUMU5IkSQW2GaYi4rsR\nsSYiHusy7eKIeDYiFld/PtrcMiWpPvYwSc1Wy5GpucCUbqZ/MzPHV3/ubGxZktQwc7GHSWqibYap\nzLwfeKEPapGkhrOHSWq2knGmzo+IM4BO4AuZ+WJ3M0XEbGA2wKhRowpWp2a4fsHKVpcgtco2e1hJ\n/6rnb+vUCQO7Ry5Y3vvMO2HMbk2oROqdek9Avxw4ABgPrAa+0dOMmXlFZnZkZsfIkSPrXJ0kNVRN\nPcz+JakWdYWpzPxNZm7MzDeA7wBHNrYsSWoee5ikRqorTEXEnl3uTgUe62leSWo39jBJjbTNc6Yi\n4gZgErB7RKwC/haYFBHjgQRWAJ9pYo2SVDd7mKRm22aYysyZ3Uy+sgm1SFLD2cMkNZsjoEuSJBUo\nGRpBait+FV1qoc45ra5AahmPTEmSJBUwTEmSJBUwTEmSJBUwTEmSJBUwTEmSJBUwTEmSJBUwTEmS\nJBUwTEmSJBUwTEmSJBUwTEmSJBUwTEmSJBUwTEmSJBUwTEmSJBUwTEmSJBUY3OoCenL9gpW9fs6p\nE0Y1oRJJUrtasPwFntrY/f8XB6x8YatpE8bstvWMnXO6X3jHJ0tK0wDikSlJkqQChilJkqQChilJ\nkqQChilJkqQChilJkqQChilJkqQCbTs0guobHmJ7MtC3X5LUP3hkSpIkqYBhSpIkqYBhSpIkqYBh\nSpIkqcA2w1REfDci1kTEY12m7RYR90TEk9V/d21umZJUH3uYpGar5cjUXGDKFtMuAu7NzHcB91bv\nS1I7mos9TFITbTNMZeb9wJaX3j4ZuKp6+yrgYw2uS5Iawh4mqdnqHWdqj8xcXb39a2CPnmaMiNnA\nbIBRo0bVuTpJaqiaepj9q384YOVNzVlw55ytp3V8sjnrUr9WfAJ6ZiaQb/H4FZnZkZkdI0eOLF2d\nJDXUW/Uw+5ekWtQbpn4TEXsCVP9d07iSJKnp7GGSGqbeMHUbcGb19pnADxpTjiT1CXuYpIapZWiE\nG4CfAwdGxKqIOBv4OvBnEfEk8OHqfUlqO/YwSc22zRPQM3NmDw8d1+BaJKnh7GGSms0R0CVJkgrU\nOzSCJEk9WrB8y6G9BpjuhlXoicMt9HsemZIkSSpgmJIkSSpgmJIkSSpgmJIkSSpgmJIkSSpgmJIk\nSSpgmJIkSSrgOFOSpAGjnvGvJozZ7a1n6M2YUrXqaZmOSdWWPDIlSZJUwDAlSZJUwDAlSZJUwDAl\nSZJUwDAlSZJUwDAlSZJUwKERpD5w/YKVvX7OqRNGNaESqffqGU5gu1XjMAi9ec2e2ljpD3X/zXdX\nk0Mo9CmPTEmSJBUwTEmSJBUwTEmSJBUwTEmSJBUwTEmSJBUwTEmSJBXYroZG8Ovn6q16fmf6Sm9r\n83dZ6p8OWHlT5cag3VpbiOrmkSlJkqQChilJkqQChilJkqQChilJkqQCRSegR8QK4GVgI7AhMzsa\nUZQk9QV7mKRGaMS3+T6Umb9twHIkqRXsYZKK+DGfJElSgdIjUwn8KCIS+HZmXrHlDBExG5gNMGpU\n2Tg4m8fi6OKpUacULbNbnXO2ntbxyc03mzU2Ua3bV/o69NnrKLW/t+xhjehf/r2p7W3j/zxtW+mR\nqfdn5nuBjwDnRcQxW86QmVdkZkdmdowcObJwdZLUUG/Zw+xfkmpRFKYy89nqv2uAW4EjG1GUJPUF\ne5ikRqg7TEXEThGxy6bbwGTgsUYVJk
nNZA+T1Cgl50ztAdwaEZuWc31m3tWQqiSp+exhkhqi7jCV\nmf8BHNbAWiSpz9jDJDWKQyNIkiQVaMSgnZL6qXqG+Th1QtkQJ+pZb/eH+6JvLFj+QqtLaIzuhkBo\n1nIH2NAKHpmSJEkqYJiSJEkqYJiSJEkqYJiSJEkqYJiSJEkqYJiSJEkqYJiSJEkq4DhT6tEBK2/a\natpTo06pab7udPfcUrXW2O7raIR6xoxSazTsb2vQbn+8XeO4PtcvWFnz36z6Vi3jWT218c1/5z2O\nNdasMaUarac6+9k4VR6ZkiRJKmCYkiRJKmCYkiRJKmCYkiRJKmCYkiRJKmCYkiRJKuDQCC3S6K8m\n97S8Wr/CX2s9/fUr1f1leAOpN970Vfrl39jqcX/HtaVahl8A3vT7NGHMbm8xYy/0l+Ea6uCRKUmS\npAKGKUmSpAKGKUmSpAKGKUmSpAKGKUmSpAKGKUmSpALb5dAIvbki+4KVW03uXpeviR6wjWW2k3ar\np1692Y6+GOah5LnN+Lq6Qz9IA9P1C1ZywMoahzvoT7obRqHjk1tNur7m/8QrTp0wqt6K3pJHpiRJ\nkgoYpiRJkgoYpiRJkgoYpiRJkgoUhamImBIRT0TEv0fERY0qSpL6gj1MUiPUHaYiYhDwf4CPAGOB\nmRExtlGFSVIz2cMkNUrJkakjgX/PzP/IzD8ANwInN6YsSWo6e5ikhojMrO+JEdOAKZn5qer9WcCE\nzDx/i/lmA7Ordw8EntjGoncHfltXUc1jTbWxptoMtJr2y8yRTVp23WrpYXX0r03acR83g9u5/RgI\n2wi9386a+lfTB+3MzCuAK2qdPyI6M7OjiSX1mjXVxppqY039R2/71yYD5fV0O7cfA2EboXnbWfIx\n37PAvl3u71OdJkn9gT1MUkOUhKmHgHdFxJiIeBswA7itMWVJUtPZwyQ1RN0f82Xmhog4H7gbGAR8\nNzOXNKCmXh9S7wPWVBtrqo01tYEm9jAYOK+n27n9GAjbCE3azrpPQJckSZIjoEuSJBUxTEmSJBVo\nmzDVLpd1iIjvRsSaiHisy7TdIuKeiHiy+u+ufVjPvhFxX0QsjYglEXFBG9Q0NCL+LSIertb0ler0\nMRGxoLoP51VP6u1TETEoIn4ZEbe3Q00RsSIiHo2IxRHRWZ3Wsn1XXf/wiLg5Ih6PiGURcVSra9pe\ntEsfa7R27EPN1G59pBkGSh+IiP9W/Z19LCJuqP7/1fD92RZhKtrrsg5zgSlbTLsIuDcz3wXcW73f\nVzYAX8jMscBE4Lzqa9PKmn4PHJuZhwHjgSkRMRG4BPhmZv4J8CJwdh/WtMkFwLIu99uhpg9l5vgu\nY5u0ct8BfAu4KzP/FDiMyuvV6pr6vTbrY43Wjn2omdqxjzTadt8HImJv4HNAR2YeQuWLJjNoxv7M\nzJb/AEcBd3e5/9fAX7ewntHAY13uPwHsWb29J/BEC2v7AfBn7VIT8A5gETCByqiyg7vbp31Uyz5U\nGsCxwO1AtEFNK4Ddt5jWsn0HDAOWU/3ySTvUtL38tFsfa/K2tlUfavC2tV0facI2Dog+AOwNPAPs\nRmX0gtuB45uxP9viyBR/3OBNVlWntYs9MnN19favgT1aUUREjAbeAyxodU3Vw+CLgTXAPcBTwEuZ\nuaE6Syv24T8Afwm8Ub0/og1qSuBHEbEwKpcmgdbuuzHA88Cc6scY/zcidmpxTduLdu9jDdFOfahJ\n2rGPNNqA6AOZ+SxwKbASWA2sAxbShP3ZLmGq38hKlO3z8SQiYmfgFuDzmfm7VteUmRszczyVd3FH\nAn/al+vfUkScCKzJzIWtrKMb78/M91L56Oe8iDim64Mt2HeDgfcCl2fme4BX2eJQfqt+x9X+2q0P\nNVob95FGGxB9oHrO18lUwuNewE5sfRpPQ7RLmGr3yzr8JiL2BKj+u6YvVx4RQ6g0sOsy83vtUNMm\nmf
kScB+VQ6XDI2LTQLB9vQ+PBk6KiBXAjVQO0X+rxTVtemdEZq4BbqUSPFu571YBqzJzQfX+zVSa\nalv8PvVz7d7HirRzH2qgtuwjTTBQ+sCHgeWZ+Xxmrge+R2UfN3x/tkuYavfLOtwGnFm9fSaV8wX6\nREQEcCWwLDP/vk1qGhkRw6u3d6Ry7sQyKqFqWitqysy/zsx9MnM0ld+fn2Tmaa2sKSJ2iohdNt0G\nJgOP0cJ9l5m/Bp6JiAOrk44Dlraypu1Iu/exurVjH2qGduwjzTCA+sBKYGJEvKP6O7xpOxu/P1t9\ngliXE8U+CvyKyrk3/6OFddxA5bPV9VTS+9lUPjO/F3gS+DGwWx/W834qh1ofARZXfz7a4prGAb+s\n1vQY8DfV6fsD/wb8O3AT8PYW7cNJwO2trqm67oerP0s2/V63ct9V1z8e6Kzuv+8Du7a6pu3lp136\nWBO2q+36UB9sc1v0kSZu34DoA8BXgMer/1ddA7y9GfvTy8lIkiQVaJeP+SRJkvolw5QkSVIBw5Qk\nSVIBw5QkSVIBw5QkSVIBw5QaIiI+FhEZES0dCV2S6mEPUwnDlBplJvAv1X8lqb+xh6luhikVq16v\n6/1UBjidUZ22Q0T8U0Q8HhH3RMSdETGt+tjhEfGz6sV/7950+QJJagV7mEoZptQIJwN3ZeavgLUR\ncTjwX4HRwFhgFpVr9226vtdlwLTMPBz4LvC/WlG0JFXZw1Rk8LZnkbZpJpWLgULl4qAzqfxu3ZSZ\nbwC/joj7qo8fCBwC3FO5VBKDqFy+R5JaxR6mIoYpFYmI3ahcWf3QiEgqjSWBW3t6CrAkM4/qoxIl\nqUf2MDWCH/Op1DTgmszcLzNHZ+a+wHLgBeDj1fMO9qBy0VCAJ4CREbH5kHlEHNyKwiUJe5gawDCl\nUjPZ+h3cLcB/AVYBS4FrgUXAusz8A5XmdUlEPEzl6vPv67tyJelN7GEqFpnZ6hq0nYqInTPzlYgY\nAfwbcHRm/rrVdUlSLexhqpXnTKmZbo+I4cDbgK/ahCT1M/Yw1cQjU5IkSQU8Z0qSJKmAYUqSJKmA\nYUqSJKmAYUqSJKmAYUqSJKnA/wdhpYOuQMTQigAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 720x288 with 2 Axes>"
]
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "GFJX-UZD1jNe",
"colab_type": "code",
"outputId": "f2a875ed-b7ff-485c-b884-21b8a1977c4a",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 300
}
},
"source": [
"sns.barplot(x='Pclass', y='Survived', data=train_df)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f98e6d519b0>"
]
},
"metadata": {
"tags": []
},
"execution_count": 11
},
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAErVJREFUeJzt3XGQnHd93/H3x+eoBHCSgtWRx5Kw\nAqLUIZ5QLupM3QFCcCuSGSlTIBVxk3iGojITAW0GhGkbFURpJyIlk1ClQW08IUxAGGibS6tGpdgB\n4mKjExgbyRG9yICkckG2MdiERpb97R/36NflfLpb2ffc3lnv18yO9vntb3c/Ozujzz3Ps8/zpKqQ\nJAngklEHkCQtH5aCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1l446wIW6/PLL66qr\nrhp1DElaUQ4fPnxfVa1eaN6KK4WrrrqKycnJUceQpBUlyVeHmefmI0lSYylIkppeSyHJ5iTHkkwl\nuXGOx389yZ3d7ctJHuwzjyRpfr3tU0gyBuwFrgNOAoeSTFTV0XNzquqfDsx/I/CivvJIkhbW55rC\nJmCqqo5X1RlgP7B1nvmvBT7cYx5J0gL6LIUrgRMDyye7scdJ8hxgA3BLj3kkSQtYLjuatwEfq6pH\n53owyfYkk0kmT58+vcTRJOni0WcpnALWDSyv7cbmso15Nh1V1b6qGq+q8dWrFzz2QpL0BPV58Noh\nYGOSDcyUwTbg52ZPSvIC4K8Cn+0xy4qwc+dOpqenWbNmDXv27Bl1HEkXod5KoarOJtkBHATGgJuq\n6kiS3cBkVU10U7cB+6uq+sqyUkxPT3Pq1PlWpiSpf72e5qKqDgAHZo3tmrX8jj4zSJKGt1x2NEuS\nlgFLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRYCpKkxlKQJDWWgiSpsRQk\nSY2lIElqLAVJUtPrRXZG7cVv/b1RR7ggl933EGPA1+57aEVlP/yeXxh1BEmLxDUFSVJjKUiSGktB\nktRYCpKkptdSSLI5ybEkU0luPM+cn01yNMmRJB/qM48kaX69/fooyRiwF7gOOAkcSjJRVUcH5mwE\n3g5cW1XfTPLX+sojSVpYn2sKm4CpqjpeVWeA/cDWWXNeD+ytqm8CVNU3eswjSVpAn6VwJXBiYPlk\nNzbo+cDzk9yW5PYkm3vMI0lawKgPXrsU2Ai8DFgLfDrJj1bVg4OTkmwHtgOsX79+qTNK0kWjzzWF\nU8C6geW13digk8BEVT1SVfcCX2amJL5HVe2rqvGqGl+9enVvgSXpYtdnKRwCNibZkGQVsA2YmDXn\nvzCzlkCSy5nZnHS8x0ySpHn0VgpVdRbYARwE7gFurqojSXYn2dJNOwjcn+QocCvw1qq6v69MkqT5\n9bpPoaoOAAdmje0auF/AL3c3SdKIeUSzJKmxFCRJjaUgSWosBUlSYylIkppRH9GsAY+tesb3/CtJ\nS81SWEa+s/HvjjqCpIucm48kSY2lIElq3HwkLYKdO3cyPT3NmjVr2LNnz6jjSE+YpSAtgunpaU6d\nmn0SYGnlcfORJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJElNr6WQZHOS\nY0mmktw4x+M3JDmd5M7u9o/6zCNJml9v5z5KMgbsBa4DTgKHkkxU1dFZUz9SVTv6yiFJGl6fawqb\ngKmqOl5VZ4D9wNYe30+S9CT1WQpXAicGlk92Y7O9KsldST6WZF2PeSRJCxj1juY/BK6qqmuATwAf\nmGtSku1JJpNMnj59ekkDStLFpM9SOAUM/uW/thtrqur+qvrLbvE/Ai+e64Wqal9VjVfV+OrVq3sJ\nK0nqtxQOARuTbEiyCtgGTAxOSHLFwOIW4J4e
80iSFtDbr4+q6mySHcBBYAy4qaqOJNkNTFbVBPCm\nJFuAs8ADwA195ZEkLazXy3FW1QHgwKyxXQP33w68vc8MkqThjXpHsyRpGbEUJEmNpSBJanrdpyA9\nGV/b/aOjjjC0sw88C7iUsw98dUXlXr/r7lFH0DLjmoIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktRY\nCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJKaec+SmuQhoM73eFX9wKInkiSNzLylUFWXASR5\nF/B14INAgOuBK3pPJ0laUsNuPtpSVb9VVQ9V1ber6t8DW/sMJklaesOWwneSXJ9kLMklSa4HvtNn\nMEnS0hu2FH4O+Fngz7vba7qxeSXZnORYkqkkN84z71VJKsn4kHkkST0Y6nKcVfUVLnBzUZIxYC9w\nHXASOJRkoqqOzpp3GfBm4I4LeX1J0uIbak0hyfOTfDLJl7rla5L8iwWetgmYqqrjVXUG2M/cxfIu\n4FeB/3sBuSVJPRh289F/AN4OPAJQVXcB2xZ4zpXAiYHlk91Yk+RvAuuq6r8NmUOS1KOhNh8BT6+q\nzyUZHDv7ZN44ySXAe4Ebhpi7HdgOsH79+ifztlIvLn/aY8DZ7l9p5Rq2FO5L8ly6A9mSvJqZ4xbm\ncwpYN7C8ths75zLghcAfd2WzBphIsqWqJgdfqKr2AfsAxsfHz3swnTQqb7nmwVFHkBbFsKXwS8z8\np/yCJKeAe5k5gG0+h4CNSTYwUwbbGPjFUlV9C7j83HKSPwbeMrsQJElLZ9hS+GpVvSLJM4BLquqh\nhZ5QVWeT7AAOAmPATVV1JMluYLKqJp54bElSH4YthXuT/BHwEeCWYV+8qg4AB2aN7TrP3JcN+7qS\npH4M++ujFwD/k5nNSPcm+XdJ/k5/sSRJozBUKVTVX1TVzVX194EXAT8AfKrXZJKkJTf09RSSvDTJ\nbwGHgacxc9oLSdJTyFD7FJJ8BfgCcDPw1qryZHiS9BQ07I7ma6rq270mkSSN3EJXXttZVXuAdyd5\n3EFjVfWm3pJJkpbcQmsK93T/ekCZJF0EFroc5x92d++uqs8vQR5J0ggN++ujf5vkniTvSvLCXhNJ\nkkZm2OMUfgL4CeA08P4kdw9xPQVJ0goz9HEKVTVdVb8JvAG4E5jzdBWSpJVr2Cuv/Y0k70hyN/A+\n4H8xcypsSdJTyLDHKdzEzOU0/15V/Z8e80iSRmjBUkgyBtxbVb+xBHkkSSO04OajqnoUWJdk1RLk\nkSSN0NDXUwBuSzIBtPMeVdV7e0klSRqJYUvhz7rbJcxcW1mS9BQ0VClU1Tv7DiJJGr1hT519KzDX\nCfFevuiJJEkjM+zmo7cM3H8a8Crg7OLHkSSN0rCbjw7PGrotyed6yCNJGqFhj2h+1sDt8iSbgR8c\n4nmbkxxLMpXkxjkef0N3HqU7k/xJkqufwGeQJC2SYTcfHeb/71M4C3wFeN18T+gOetsLXAecBA4l\nmaiqowPTPlRVv93N3wK8F9g8dHpJ0qKad00hyY8nWVNVG6rqh4F3An/a3Y7O91xgEzBVVcer6gwz\np8nYOjhh1iU+n8EcO7MlSUtnoc1H7wfOACR5CfBvgA8A3wL2LfDcK4ETA8snu7HvkeSXkvwZsAfw\n8p6SNEILlcJYVT3Q3f8HwL6q+nhV/QrwvMUIUFV7q+q5wNuAOa/RkGR7kskkk6dPn16Mt5UkzWHB\nUkhybr/DTwK3DDy20P6IU8C6geW13dj57Ad+Zq4HqmpfVY1X1fjq1asXeFtJ0hO1UCl8GPhUkj8A\nvgt8BiDJ85jZhDSfQ8DGJBu6k+ltAyYGJyTZOLD408D/voDskqRFNu9f+1X17iSfBK4A/kdVndsR\nfAnwxgWeezbJDuAgMAbcVFVHkuwGJqtqAtiR5BXAI8A3gV98ch9HkvRkLPiT1Kq6fY6xLw/z4lV1\nADgwa2zX
wP03D/M6ktSnnTt3Mj09zZo1a9izZ8+o44zUsMcpSNJT1vT0NKdOzbfL8+Ix1BHNkqSL\ng6UgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVLjuY8kLbpr33ftqCNc\nkFUPruISLuHEgydWVPbb3njbor+mawqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSGktBktT0\nWgpJNic5lmQqyY1zPP7LSY4muSvJJ5M8p888kqT59VYKScaAvcArgauB1ya5eta0LwDjVXUN8DFg\nT195JEkL63NNYRMwVVXHq+oMsB/YOjihqm6tqr/oFm8H1vaYR5K0gD5L4UrgxMDyyW7sfF4H/Pce\n80iSFrAsToiX5B8C48BLz/P4dmA7wPr165cwmaSLQT29eIzHqKfXqKOMXJ+lcApYN7C8thv7Hkle\nAfxz4KVV9ZdzvVBV7QP2AYyPj/utSVpUj1z7yKgjLBt9bj46BGxMsiHJKmAbMDE4IcmLgPcDW6rq\nGz1mkSQNobdSqKqzwA7gIHAPcHNVHUmyO8mWbtp7gGcCH01yZ5KJ87ycJGkJ9LpPoaoOAAdmje0a\nuP+KPt9fknRhPKJZktRYCpKkxlKQJDWWgiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAk\nNZaCJKmxFCRJjaUgSWosBUlSYylIkhpLQZLUWAqSpMZSkCQ1loIkqbEUJElNr6WQZHOSY0mmktw4\nx+MvSfL5JGeTvLrPLJKkhfVWCknGgL3AK4GrgdcmuXrWtK8BNwAf6iuHJGl4l/b42puAqao6DpBk\nP7AVOHpuQlV9pXvssR5zSJKG1OfmoyuBEwPLJ7sxSdIytSJ2NCfZnmQyyeTp06dHHUeSnrL6LIVT\nwLqB5bXd2AWrqn1VNV5V46tXr16UcJKkx+uzFA4BG5NsSLIK2AZM9Ph+kqQnqbdSqKqzwA7gIHAP\ncHNVHUmyO8kWgCQ/nuQk8Brg/UmO9JVHkrSwPn99RFUdAA7MGts1cP8QM5uVJEnLwIrY0SxJWhqW\ngiSpsRQkSY2lIElqLAVJUmMpSJIaS0GS1FgKkqTGUpAkNZaCJKmxFCRJjaUgSWosBUlSYylIkhpL\nQZLUWAqSpMZSkCQ1loIkqbEUJEmNpSBJaiwFSVLTaykk2ZzkWJKpJDfO8fhfSfKR7vE7klzVZx5J\n0vx6K4UkY8Be4JXA1cBrk1w9a9rrgG9W1fOAXwd+ta88kqSF9bmmsAmYqqrjVXUG2A9snTVnK/CB\n7v7HgJ9Mkh4zSZLm0WcpXAmcGFg+2Y3NOaeqzgLfAp7dYyZJ0jwuHXWAYSTZDmzvFh9OcmyUeXp2\nOXDfqENciPzaL446wnKx4r47/qUr5gNW3PeXN13Q9/ecYSb1WQqngHUDy2u7sbnmnExyKfCDwP2z\nX6iq9gH7esq5rCSZrKrxUefQhfO7W9n8/mb0ufnoELAxyYYkq4BtwMSsORPAuT8zXw3cUlXVYyZJ\n0jx6W1OoqrNJdgAHgTHgpqo6kmQ3MFlVE8DvAB9MMgU8wExxSJJGJP5hvrwk2d5tLtMK43e3svn9\nzbAUJEmNp7mQJDWWwjKR5KYk30jypVFn0YVJsi7JrUmOJjmS5M2jzqThJXlaks8l+WL3/b1z1JlG\nyc1Hy0SSlwAPA79XVS8cdR4NL8kVwBVV9fkklwGHgZ+pqqMjjqYhdGdReEZVPZzk+4A/Ad5cVbeP\nONpIuKawTFTVp5n5BZZWmKr6elV9vrv/EHAPjz96X8tUzXi4W/y+7nbR/rVsKUiLqDvT74uAO0ab\nRBciyViSO4FvAJ+oqov2+7MUpEWS5JnAx4F/UlXfHnUeDa+qHq2qH2PmzAubkly0m3AtBWkRdNui\nPw78flX9p1Hn0RNTVQ8CtwKbR51lVCwF6UnqdlT+DnBPVb131Hl0YZKsTv
JD3f3vB64D/nS0qUbH\nUlgmknwY+Czw15OcTPK6UWfS0K4Ffh54eZI7u9tPjTqUhnYFcGuSu5g5Z9snquq/jjjTyPiTVElS\n45qCJKmxFCRJjaUgSWosBUlSYylIkhpLQZolyaPdz0q/lOSjSZ4+z9x3JHnLUuaT+mQpSI/33ar6\nse5stWeAN4w6kLRULAVpfp8BngeQ5BeS3NWdd/+DsycmeX2SQ93jHz+3hpHkNd1axxeTfLob+5Hu\nHP53dq+5cUk/lXQeHrwmzZLk4ap6ZpJLmTmf0R8Bnwb+M/C3q+q+JM+qqgeSvAN4uKp+Lcmzq+r+\n7jX+FfDnVfW+JHcDm6vqVJIfqqoHk7wPuL2qfj/JKmCsqr47kg8sDXBNQXq87+9OozwJfI2Z8xq9\nHPhoVd0HUFVzXfvihUk+05XA9cCPdOO3Ab+b5PXAWDf2WeCfJXkb8BwLQcvFpaMOIC1D3+1Oo9zM\nnPNuQb/LzBXXvpjkBuBlAFX1hiR/C/hp4HCSF1fVh5Lc0Y0dSPKPq+qWRfwM0hPimoI0nFuA1yR5\nNkCSZ80x5zLg691ptK8/N5jkuVV1R1XtAk4D65L8MHC8qn4T+APgmt4/gTQE1xSkIVTVkSTvBj6V\n5FHgC8ANs6b9CjNXXDvd/XtZN/6ebkdygE8CXwTeBvx8kkeAaeBf9/4hpCG4o1mS1Lj5SJLUWAqS\npMZSkCQ1loIkqbEUJEmNpSBJaiwFSVJjKUiSmv8H55UenBCzHqQAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "nKYfbR-N1vsd",
"colab_type": "code",
"outputId": "fc9457a6-25f2-437b-cd6e-b3f20a10bdda",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 68
}
},
"source": [
"# Combine SibSp and Parch into a relatives count; note that not_alone is 1 when the passenger travels alone (relatives == 0) and 0 otherwise\n",
"data = [train_df, test_df]\n",
"for dataset in data:\n",
" dataset['relatives'] = dataset['SibSp'] + dataset['Parch']\n",
" dataset.loc[dataset['relatives'] > 0, 'not_alone'] = 0\n",
" dataset.loc[dataset['relatives'] == 0, 'not_alone'] = 1\n",
" dataset['not_alone'] = dataset['not_alone'].astype(int)\n",
"train_df['not_alone'].value_counts()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"1 537\n",
"0 354\n",
"Name: not_alone, dtype: int64"
]
},
"metadata": {
"tags": []
},
"execution_count": 12
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "GiygRG_X2GXR",
"colab_type": "code",
"colab": {}
},
"source": [
"train_df = train_df.drop(['PassengerId','Cabin'], axis=1)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "L_iMmf8W2wQp",
"colab_type": "code",
"outputId": "0c233087-5040-49f0-d7ca-eae26e787e88",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"data = [train_df, test_df]\n",
"\n",
"# Use the training-set mean and standard deviation of Age for both frames\n",
"mean = train_df[\"Age\"].mean()\n",
"std = train_df[\"Age\"].std()\n",
"\n",
"for dataset in data:\n",
"    is_null = dataset[\"Age\"].isnull().sum()\n",
"    # draw one random integer age per missing value from [mean - std, mean + std)\n",
"    rand_age = np.random.randint(int(mean - std), int(mean + std), size=is_null)\n",
"    # fill NaN values in the Age column with the random values generated\n",
"    age_slice = dataset[\"Age\"].copy()\n",
"    age_slice[np.isnan(age_slice)] = rand_age\n",
"    # store this frame's own imputed ages back as integers\n",
"    dataset[\"Age\"] = age_slice.astype(int)\n",
"train_df[\"Age\"].isnull().sum()\n"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0"
]
},
"metadata": {
"tags": []
},
"execution_count": 14
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "9oiazbi23F3i",
"colab_type": "code",
"outputId": "e7ef4238-7ad0-4077-dba9-817377088d47",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 102
}
},
"source": [
"train_df['Embarked'].describe()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"count 889\n",
"unique 3\n",
"top S\n",
"freq 644\n",
"Name: Embarked, dtype: object"
]
},
"metadata": {
"tags": []
},
"execution_count": 15
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "91r6YkKR3NJU",
"colab_type": "code",
"colab": {}
},
"source": [
"common_value = 'S'\n",
"data = [train_df, test_df]\n",
"\n",
"for dataset in data:\n",
" dataset['Embarked'] = dataset['Embarked'].fillna(common_value)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "2feOofoi3P3K",
"colab_type": "code",
"outputId": "f3a91856-21c0-405c-8215-9ae05afa7057",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 306
}
},
"source": [
"train_df.info()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 891 entries, 0 to 890\n",
"Data columns (total 12 columns):\n",
"Survived 891 non-null int64\n",
"Pclass 891 non-null int64\n",
"Name 891 non-null object\n",
"Sex 891 non-null object\n",
"Age 891 non-null int64\n",
"SibSp 891 non-null int64\n",
"Parch 891 non-null int64\n",
"Ticket 891 non-null object\n",
"Fare 891 non-null float64\n",
"Embarked 891 non-null object\n",
"relatives 891 non-null int64\n",
"not_alone 891 non-null int64\n",
"dtypes: float64(1), int64(7), object(4)\n",
"memory usage: 83.6+ KB\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "aWEXlXZc3UQf",
"colab_type": "code",
"colab": {}
},
"source": [
"train_df = train_df.drop(['Ticket','Name'], axis=1)\n",
"test_df = test_df.drop(['Ticket','Name'], axis=1)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "QZhOp69G38gC",
"colab_type": "code",
"colab": {}
},
"source": [
"genders = {\"male\": 0, \"female\": 1}\n",
"data = [train_df, test_df]\n",
"\n",
"for dataset in data:\n",
" dataset['Sex'] = dataset['Sex'].map(genders)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "USyX_tSg4C9N",
"colab_type": "code",
"colab": {}
},
"source": [
"ports = {\"S\": 0, \"C\": 1, \"Q\": 2}\n",
"data = [train_df, test_df]\n",
"\n",
"for dataset in data:\n",
" dataset['Embarked'] = dataset['Embarked'].map(ports)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ApXleNFj4Df1",
"colab_type": "code",
"outputId": "4a146c54-e5e6-450e-f7e2-d87283fd13f4",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 153
}
},
"source": [
"data = [train_df, test_df]\n",
"for dataset in data:\n",
" dataset['Age'] = dataset['Age'].astype(int)\n",
" dataset.loc[ dataset['Age'] <= 11, 'Age'] = 0\n",
" dataset.loc[(dataset['Age'] > 11) & (dataset['Age'] <= 18), 'Age'] = 1\n",
" dataset.loc[(dataset['Age'] > 18) & (dataset['Age'] <= 22), 'Age'] = 2\n",
" dataset.loc[(dataset['Age'] > 22) & (dataset['Age'] <= 27), 'Age'] = 3\n",
" dataset.loc[(dataset['Age'] > 27) & (dataset['Age'] <= 33), 'Age'] = 4\n",
" dataset.loc[(dataset['Age'] > 33) & (dataset['Age'] <= 40), 'Age'] = 5\n",
" dataset.loc[(dataset['Age'] > 40) & (dataset['Age'] <= 66), 'Age'] = 6\n",
" dataset.loc[ dataset['Age'] > 66, 'Age'] = 6\n",
"\n",
"# let's see how it's distributed \n",
"train_df['Age'].value_counts()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"4 170\n",
"6 154\n",
"5 146\n",
"3 134\n",
"2 122\n",
"1 97\n",
"0 68\n",
"Name: Age, dtype: int64"
]
},
"metadata": {
"tags": []
},
"execution_count": 22
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "H8sr8fSz4rbt",
"colab_type": "code",
"outputId": "9b42d801-183e-447b-d601-e47057aae644",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 359
}
},
"source": [
"train_df.head(10)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>relatives</th>\n",
" <th>not_alone</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>51</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>21</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>11</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>30</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived Pclass Sex Age ... Fare Embarked relatives not_alone\n",
"0 0 3 0 2 ... 7 0 1 0\n",
"1 1 1 1 5 ... 71 1 1 0\n",
"2 1 3 1 3 ... 7 0 0 1\n",
"3 1 1 1 5 ... 53 0 1 0\n",
"4 0 3 0 5 ... 8 0 0 1\n",
"5 0 3 0 4 ... 8 2 0 1\n",
"6 0 1 0 6 ... 51 0 0 1\n",
"7 0 3 0 0 ... 21 0 4 0\n",
"8 1 3 1 3 ... 11 0 2 0\n",
"9 1 2 1 1 ... 30 1 1 0\n",
"\n",
"[10 rows x 10 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 24
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "jSkVyUP34t77",
"colab_type": "code",
"colab": {}
},
"source": [
"X_train = train_df.drop([\"Survived\",\"Fare\"], axis=1)\n",
"Y_train = train_df[\"Survived\"]\n",
"X_test = test_df.drop([\"PassengerId\",\"Cabin\",\"Fare\"], axis=1).copy()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "WzFHuTHj5u2g",
"colab_type": "code",
"outputId": "83b6c27d-db12-4f08-e5a8-5e4cd3eec328",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
}
},
"source": [
"X_train.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Embarked</th>\n",
" <th>relatives</th>\n",
" <th>not_alone</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Pclass Sex Age SibSp Parch Embarked relatives not_alone\n",
"0 3 0 2 1 0 0 1 0\n",
"1 1 1 5 1 0 1 1 0\n",
"2 3 1 3 0 0 0 0 1\n",
"3 1 1 5 1 0 0 1 0\n",
"4 3 0 5 0 0 0 0 1"
]
},
"metadata": {
"tags": []
},
"execution_count": 27
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "9pOckEeq50PC",
"colab_type": "code",
"outputId": "ce603a97-d8dd-4f34-9ab1-4c5f98649ecc",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
}
},
"source": [
"X_test.head()"
],
"execution_count": 0,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Embarked</th>\n",
" <th>relatives</th>\n",
" <th>not_alone</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Pclass Sex Age SibSp Parch Embarked relatives not_alone\n",
"0 3 0 2 0 0 2 0 1\n",
"1 3 1 5 1 0 0 1 0\n",
"2 2 0 3 0 0 2 0 1\n",
"3 3 0 5 0 0 0 0 1\n",
"4 3 1 5 1 1 0 2 0"
]
},
"metadata": {
"tags": []
},
"execution_count": 28
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "xfCIAPAh5Opm",
"colab_type": "code",
"outputId": "35504ebc-9a7c-4091-cbd2-63063b6a1fa2",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"logreg = LogisticRegression()\n",
"logreg.fit(X_train, Y_train)\n",
"\n",
"Y_pred = logreg.predict(X_test)\n",
"\n",
"# score() is evaluated on the training data here, so this is training accuracy, not held-out accuracy\n",
"acc_log = round(logreg.score(X_train, Y_train) * 100, 2)\n",
"print(\"Accuracy via logistic regression: \",acc_log)"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Accuracy via logistic regression: 80.58\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "dEt0at6c6Vbc",
"colab_type": "code",
"outputId": "2e2f2dff-50dd-4723-9fca-f8b4f5c07f99",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 85
}
},
"source": [
"from sklearn.model_selection import cross_val_score\n",
"\n",
"scores = cross_val_score(logreg, X_train, Y_train, cv=10, scoring = \"accuracy\")\n",
"print(\"Scores:\", scores)\n",
"print(\"Mean:\", scores.mean())\n",
"print(\"Standard Deviation:\", scores.std())"
],
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
"text": [
"Scores: [0.78888889 0.76666667 0.7752809 0.84269663 0.80898876 0.79775281\n",
" 0.79775281 0.7752809 0.82022472 0.80681818]\n",
"Mean: 0.7980351265463625\n",
"Standard Deviation: 0.021880254046102804\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dnKHfSxW7ZpK",
"colab_type": "text"
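},
"source": [
"The logistic regression model reaches 80.58% accuracy on the training data. Under 10-fold cross-validation the mean accuracy is 79.80% with a standard deviation of 2.19%, which is the more reliable estimate of how the model performs on unseen passengers."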
]
}
]
}
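
The difference between the training accuracy and the cross-validated accuracy reported in the notebook can be checked by hand. The sketch below uses only Python's standard library: it recomputes the mean and standard deviation from the ten fold scores printed by `cross_val_score` and prints them next to the training accuracy. The names `cv_scores`, `train_accuracy`, `cv_mean` and `cv_std` are illustrative and do not appear in the notebook.

```python
from statistics import fmean, pstdev

# Fold accuracies copied from the notebook's cross_val_score output (cv=10).
cv_scores = [
    0.78888889, 0.76666667, 0.7752809, 0.84269663, 0.80898876,
    0.79775281, 0.79775281, 0.7752809, 0.82022472, 0.80681818,
]

# Accuracy reported by logreg.score(X_train, Y_train), i.e. measured on the
# same rows the model was fitted on.
train_accuracy = 0.8058

cv_mean = fmean(cv_scores)
# Population standard deviation, which is what NumPy's .std() computes by default.
cv_std = pstdev(cv_scores)

print(f"Training accuracy:        {train_accuracy:.2%}")
print(f"Cross-validated accuracy: {cv_mean:.2%} +/- {cv_std:.2%}")
```

The cross-validated figure is the one to quote when describing how well the model is likely to do on new data, since every fold is scored on rows the model did not see during fitting.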