@arghyadeep99
Created August 15, 2020 10:04
Logistic Regression.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Logistic Regression.ipynb",
"provenance": [],
"collapsed_sections": [],
"toc_visible": true,
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/arghyadeep99/505aa7e0427d1cc2eb2ce0730b4ce875/logistic-regression.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gpfV-EvZPiP0",
"colab_type": "text"
},
"source": [
"# Logistic Regression\n",
"\n",
"#### Linear regression deals with a continuous target: it predicts a future value on the basis of past data. Logistic regression is different. It is used when the dependent variable is dichotomous (binary), for example positive-negative, 0-1, pass-fail or benign-malignant. The observations are assumed to be independent of one another, with no correlation between the data of the two classes.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "avQGnJNbTvNN",
"colab_type": "text"
},
"source": [
"Why not simply use linear regression? A straight line fitted to dichotomous data may look something like this:\n",
"\n",
"\n",
"<figure>\n",
"<center>\n",
"<img src='https://drive.google.com/uc?id=192bLivyYTXRLQaJsG8h-8jphieelQ-HX'/>\n",
" </center>\n",
" <center><figcaption><b>Linear Regression to fit dichotomous data</b></figcaption></center>\n",
"</figure>\n",
"\n",
"But what if the data contains outliers?\n",
"\n",
"<figure>\n",
"<center>\n",
"<img src='https://drive.google.com/uc?id=1Bk_tgAS1gMjEKqh6CVo6iZ3vNYzLf6Hx'/>\n",
" </center>\n",
" <center><figcaption><b>Linear Regression to fit dichotomous data with outliers</b></figcaption></center>\n",
"</figure>\n",
"\n",
"The fitted line clearly shifts because of a single outlier. This also moves the point where the line crosses 0.5, which can push some points onto the wrong side of the decision threshold.\n",
"\n",
"Hence, logistic regression is used."
]
},
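{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of a single outlier can be reproduced with a few made-up points. The cell below is only a sketch under stated assumptions: the data, the helper name `threshold_at_half` and the use of `np.polyfit` for the straight-line fit are illustrative and not part of the original figures. It fits a line to binary labels and reports where that line crosses 0.5, first without and then with one extreme point. On this toy data the crossing point moves from 4.5 to roughly 5.5, so the point at x = 5, which has label 1, ends up below the 0.5 threshold."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"# Made-up dichotomous data: label 0 for small x, label 1 for large x\n",
"x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])\n",
"y = np.array([0., 0., 0., 0., 1., 1., 1., 1.])\n",
"\n",
"def threshold_at_half(x, y):\n",
"    # Hypothetical helper: fit y = slope * x + intercept by least squares\n",
"    # and return the x value at which the fitted line equals 0.5\n",
"    slope, intercept = np.polyfit(x, y, 1)\n",
"    return (0.5 - intercept) / slope\n",
"\n",
"print('threshold without outlier:', threshold_at_half(x, y))\n",
"\n",
"# Add one extreme point that still belongs to class 1\n",
"x_out = np.append(x, 30.)\n",
"y_out = np.append(y, 1.)\n",
"print('threshold with outlier:   ', threshold_at_half(x_out, y_out))"
],
"execution_count": null,
"outputs": []
},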
{
"cell_type": "markdown",
"metadata": {
"id": "lPX6x1cRX81E",
"colab_type": "text"
},
"source": [
"Logistic regression passes a linear combination of the inputs through a logistic function, which maps any real number to a value between 0 and 1 that can be read as a probability.\n",
"\n",
"The sigmoid function is the logistic function used in logistic regression. This is how it looks:\n",
"\n",
"\n",
"<figure>\n",
"<center>\n",
"<img src='https://drive.google.com/uc?id=1ro0WaW1kszK3InLQNVoKAoFD9K370REO'/>\n",
" </center>\n",
" <center><figcaption><b>Sigmoid function</b></figcaption></center>\n",
"</figure>\n",
"\n",
"### Let's plot this graph using Python."
]
},
{
"cell_type": "code",
"metadata": {
"id": "r0ZJewoXXrKQ",
"colab_type": "code",
"colab": {}
},
"source": [
"import math\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import warnings\n",
"\n",
"warnings.simplefilter('ignore')\n",
"\n",
"def sigmoid(x):\n",
"    # Apply 1 / (1 + e^(-item)) to every element of x and collect the results\n",
"    a = []\n",
"    for item in x:\n",
"        a.append(1/(1+math.exp(-item)))\n",
"    return a"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "v1PijRvKaf2E",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 265
},
"outputId": "c0f807ba-b54e-42e5-8b2b-af19a22982b3"
},
"source": [
"x = np.arange(-10., 10., 0.2)\n",
"sig = sigmoid(x)\n",
"plt.plot(x,sig)\n",
"plt.grid(True)\n",
"plt.show()"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deZhU9Z3v8fe3d5ZmkaVFQBAFA7hh456oKCqaGclETMjcGDMxQ5IJ85hJZu6Y671OojNzJ5OZuTdzNWMS42TMRjTjgoqCMW1M3AHZmkWbRaChuwFZuoFequp7/6gDU3aq6eqmqs+p6s/reerps/yq6sOp098+/Oqc8zN3R0RE8l9R2AFERCQ7VNBFRAqECrqISIFQQRcRKRAq6CIiBaIkrDceOXKkT5w4sVfPPXz4MIMGDcpuoCyJarao5oLoZlOunotqtqjmgp5nW7FixV53H5V2pbuH8qiurvbeqqmp6fVzcy2q2aKayz262ZSr56KaLaq53HueDVjuXdRVdbmIiBQIFXQRkQKhgi4iUiBU0EVECoQKuohIgei2oJvZw2bWZGbrulhvZvavZlZnZmvM7MLsxxQRke5kcoT+I2DOCdbfCEwOHguAfzv5WCIi0lPdXljk7i+b2cQTNJkLPBKcH/m6mQ0zszHuvjtLGUWkwCUSTlssQXssQVssTlssQUc8QSzhtKdMd8QTxBNOLOEkOv90J55wEg4JTy4/Nr1pewfbX9tGIuE4kPBj1+CAc+wnH5g/xoOZY22OTR9fj6e0TV2eotNtyq+dWsX544dlYct9kLl3fz/0oKA/4+7npFn3DPAP7v67YP5F4K/dfXmatgtIHsVTVVVVvWjRol6FbmlpYfDgwb16bq5FNVtUc0F0sylXZtydwx1wsN1pOniEWHEFLe3O4Q7nSAwOdzhHY05rDFrjztEYtMedtji0xZ2OOMT6wbAMljJ927Qyrjm9FOj55zlr1qwV7j4z3bo+vfTf3b8PfB9g5syZfvXVV/fqdV566SV6+9xci2q2qOaC6GZTriR3p6m5ja17D/PevsNs23eEXQeOBo9W9jS30R5PBK0NaDv+3LLiIoYMKGVIRQmDyksYUV7CoPJiBpSVMLC0mAFlxVSUFlNeUkR5aRHlJcnpspIiykuKKC0uoqTIkj+LjZKi5M/iIqPYgp+pDzOKzCgqIvkzmH7t1de44orLg2VgGFaUTGtmwc9geVB5j/9MXRa0PzZ9jFnqdOqa7mXz88xGQa8HxqfMjwuWiUie6YgneKexmTU7D7K2/iDvNDSzqbGZ5tbY8TYlRcaYYRWcNnQAF59xClVDKhhVWc7IwWXUb97IrMsvYvjAMoYNLKWitDjEf81/GVpujBxcHnaMnMtGQV8MLDSzRcAlwEH1n4vkh1g8wds7DvD65n28vnUfK97bT2tH8mh7SEUJHxozhLkXnMaUqkomjRzMhBEDGTO0gpLi9OdTvHTgXaaOGdKX/wRJ0W1BN7OfA1cDI81sJ/A3QCmAuz8ILAFuAuqAI8Cf5CqsiJy8w20xXtzYxIsbGqnZ2MSh1hhmMPXUIcy/6HRmnD6M88cNY8KIgT3uPpBwZXKWy6e6We/Al7OWSESyLpFwXt28j8dX7uS5dQ0c7YhzyqAyrp9+Ktd+aDSXnTmCYQPLwo4pJym0+6GLSO4dbovxyxU7+fdXtrJt3xEqK0r42IzT+NgFY5k58RSKi3QEXkhU0EUKUHNrBz/47VZ+9MpWDrXGuGD8ML5z3RRumH5qZL6olOxTQRcpIK0dcX7y+ns8UFPH/iMd3DC9igVXnkn1hOFhR5M+oIIuUiBe3byXu59Yx9a9h/nI5JH81Q1nc9647F+NKNGlgi6S5w4e6eDvlqzn0eU7mTBiII987mKunJJ+yEkpbCroInlsxXv7WfizlTQ1t/HFq87kzmsnM6BMfeT9lQq6SB5ydx5+ZRv/e8kGxgyr4I
k/u1zdK6KCLpJv2uPOwp+/zbNrdnP9tCq+fev5DB1QGnYsiQAVdJE8cqi1g39e3sqm/Ue468YP8YUrJ+lqTjlOBV0kTzQeauX2h9+k7kCC78y/gLkXjA07kkSMCrpIHmg81MqtD77GvpY2/qK6QsVc0tIg0SIRt/9wO7f98A32tbTxk89fwjkjdRaLpKeCLhJhLW0xPvvvb7Jt3xF+cPtMZpyuKz6layroIhHVEU+w4JHlrNt1iO/+8YVcfubIsCNJxKmgi0TU3z27gVc37+Nbt5zH7GlVYceRPKCCLhJBjy3fwY9e3cYdHz6DedXjwo4jeUIFXSRiVu84wN1PruPyM0fw9Rs/FHYcySMq6CIRcvBIB1/6yQpGDS7n/31qRpdjd4qko/PQRSLkG0/X0tjcxuNfupwR/WCUesku/fkXiYgla3fzxNv1/Pk1Z3H+eN1oS3pOBV0kApqaW7n7ibWcN24oX551VthxJE+poIuEzN35H4+v5Uh7nH/5xPmUqt9cekl7jkjInlvXwK82NPFXN5zNWaMrw44jeUwFXSRER9pj/O0z65k6ZgifvXxi2HEkz6mgi4TouzWb2XWwlfvmTtcpinLStAeJhGTr3sN8/+UtfHzGWGZOPCXsOFIAVNBFQuDufPPpWspKirhLV4NKlqigi4Tgd3V7eWnTHu68djKjh1SEHUcKhAq6SB9zd769dBNjhw3gM5dPCDuOFBAVdJE+trS2kTU7D3Ln7MmUl2j0IckeFXSRPhRPOP/ywiYmjRrEx2doXFDJrowKupnNMbNNZlZnZnelWX+6mdWY2dtmtsbMbsp+VJH8t3h1Pe80tvDV66boNEXJum73KDMrBh4AbgSmAZ8ys2mdmv1P4FF3nwHMB76b7aAi+a4jnuD/vPAuU8cM4aZzxoQdRwpQJocIFwN17r7F3duBRcDcTm0cGBJMDwV2ZS+iSGF4atUutr9/hK9dN4WiIgs7jhQgc/cTNzCbB8xx988H87cBl7j7wpQ2Y4BlwHBgEDDb3Vekea0FwAKAqqqq6kWLFvUqdEtLC4MHD+7Vc3Mtqtmimguimy2buRLu/K9XjmLAfVcMwKz3BT2q2wuimy2quaDn2WbNmrXC3WemXenuJ3wA84CHUuZvA+7v1OarwNeC6cuA9UDRiV63urrae6umpqbXz821qGaLai736GbLZq5frW/wCX/9jD++csdJv1ZUt5d7dLNFNZd7z7MBy72LuppJl0s9MD5lflywLNUdwKPBH4jXgApgZAavLdIvfO83Wxg7bAB/cN5pYUeRApZJQX8LmGxmZ5hZGckvPRd3arMduBbAzKaSLOh7shlUJF+teG8/b257nzs+fIbudS451e3e5e4xYCGwFNhA8myWWjO718xuDpp9DfhTM1sN/Bz4bPBfA5F+73u/2czQAaV88qLx3TcWOQkZDRLt7kuAJZ2W3ZMyvR64IrvRRPLf5j0tvLChkYWzzmJQucZkl9zS//9EcujHr71HaVERn7lsYthRpB9QQRfJkcNtMf5zxU5uOvdURlWWhx1H+gEVdJEceWrVLprbYtx2me6oKH1DBV0kB9ydR17bxtQxQ7jw9OFhx5F+QgVdJAdWbt/PxoZmbrt0wkldFSrSEyroIjnw49feo7K8hLkX6EIi6Tsq6CJZtreljSVrG7ilepxOVZQ+pYIukmX/uWIn7fEEn7709LCjSD+jgi6SRe7OYyt2Uj1hOGeNrgw7jvQzKugiWbRqxwHqmlq4tXpc2FGkH1JBF8mix1bspKK0iI+epxGJpO+poItkSWtHnKdX7+LGc8ZQWVEadhzph1TQRbJkaW0Dza0xdbdIaFTQRbLkseU7GTd8AJdOGhF2FOmnVNBFsqD+wFFe2byXWy4cpwGgJTQq6CJZ8OTb9bjDPHW3SIhU0EVOkrvz5Nv1XDRxOONPGRh2HOnHVNBFTtLGhmbebWrh5gvGhh1F+jkVdJGT9NSqXZQUGR89V+eeS7hU0EVOQi
LhPL16Fx+ZPJJTBpWFHUf6ORV0kZOwYvt+6g8cZa66WyQCVNBFTsLiVbuoKC3iumlVYUcRUUEX6a2OeIJn1+5m9tQq3fdcIkEFXaSXfle3l/cPt6u7RSJDBV2kl55evYshFSVcOWVk2FFEABV0kV5pjyV4YX0j108/lfKS4rDjiAAq6CK98krdXppbY9x07qlhRxE5TgVdpBeeXbubyooSPnzWqLCjiByngi7SQ+2xBMtqG7huWhVlJfoVkujQ3ijSQ69s3suh1pgu9ZfIyaigm9kcM9tkZnVmdlcXbT5hZuvNrNbMfpbdmCLR8dza3VSWl/DhyTq7RaKl26shzKwYeAC4DtgJvGVmi919fUqbycDXgSvcfb+Zjc5VYJEwdcQTLFvfyOxpVTq7RSInkyP0i4E6d9/i7u3AImBupzZ/Cjzg7vsB3L0puzFFouHVzfs4cKSDm9TdIhFk7n7iBmbzgDnu/vlg/jbgEndfmNLmSeAd4AqgGPiGuz+f5rUWAAsAqqqqqhctWtSr0C0tLQwePLhXz821qGaLai6IbrZ0uR5e18abu2P86zUDKSsOZ6i5qG4viG62qOaCnmebNWvWCnefmXalu5/wAcwDHkqZvw24v1ObZ4AngFLgDGAHMOxEr1tdXe29VVNT0+vn5lpUs0U1l3t0s3XOFYsn/MJ7l/nCn60MJ1AgqtvLPbrZoprLvefZgOXeRV3NpMulHhifMj8uWJZqJ7DY3TvcfSvJo/XJGf25EckTy7e9z77D7cyZrouJJJoyKehvAZPN7AwzKwPmA4s7tXkSuBrAzEYCU4AtWcwpErrnaxsoKyni6rN1MZFEU7cF3d1jwEJgKbABeNTda83sXjO7OWi2FNhnZuuBGuCv3H1frkKL9DV3Z1ltI1dOHqlb5UpkZbRnuvsSYEmnZfekTDvw1eAhUnDW1R+i/sBR7pytnkSJLl0pKpKBpbUNFBcZs6dqZCKJLhV0kQw8X9vAxRNP0UDQEmkq6CLdqGtqoa6phTnn6OwWiTYVdJFuLK1tAOD66epukWhTQRfpxrL1jZw/bihjhg4IO4rICamgi5xAw8FWVu84wPW6mEjygAq6yAm8sKERgBvU3SJ5QAVd5ASW1TYwaeQgzhwVzRs7iaRSQRfpwuEO57XN+7huehVm4dxZUaQnVNBFurB2T5xYwrl+mvrPJT+ooIt0YWVTjJGDy5kxfljYUUQyooIukkZbLM6aPXGum1ZFUZG6WyQ/qKCLpPHa5n20xnUxkeQXFXSRNJatb6SiGC4/c0TYUUQypoIu0kki4bywvpFzRhZTXlIcdhyRjKmgi3SyaucB9jS3UV2lgSwkv6igi3SyrLaRkiLjvFE6Opf8ooIu0smy9Q1cduYIBpXq7BbJLyroIinqmlrYsucw10/T2S2Sf1TQRVK8sD55M67ZKuiSh1TQRVIsW9/Aebr3ueQpFXSRQNOhVt7efkDdLZK3VNBFAsfufa7BLCRfqaCLBJbVNjJxxEAmj9a9zyU/qaCLAIdaO3h1816un36q7n0ueUsFXQSo2dhER9y5Qd0tksdU0EWA59c1MLpS9z6X/KaCLv1ea0eclzbt4frpuve55DcVdOn3Xn5nD0c74upukbyngi793tLaRoZUlHDpJN37XPKbCrr0ax3xBL/a0MjsqVWUFuvXQfJbRnuwmc0xs01mVmdmd52g3S1m5mY2M3sRRXLnza3vc/Bohy4mkoLQbUE3s2LgAeBGYBrwKTOblqZdJXAn8Ea2Q4rkyvPrGqgoLeKqKaPCjiJy0jI5Qr8YqHP3Le7eDiwC5qZpdx/wLaA1i/lEciaRcJbWNnDVlFEMKNNgFpL/zN1P3MBsHjDH3T8fzN8GXOLuC1PaXAjc7e63mNlLwF+6+/I0r7UAWABQVVVVvWjRol6FbmlpYfDgaF6eHdVsUc0F4WV7Z3+cv3+jlS+cV85lp/3+cHNR3WZRzQXRzRbVXNDzbLNmzVrh7um7td39hA9gHv
BQyvxtwP0p80XAS8DEYP4lYGZ3r1tdXe29VVNT0+vn5lpUs0U1l3t42f7mqXU++e4lfuhoe9r1Ud1mUc3lHt1sUc3l3vNswHLvoq5m0uVSD4xPmR8XLDumEjgHeMnMtgGXAov1xahEWSLhPL+ugSsnj6KyojTsOCJZkUlBfwuYbGZnmFkZMB9YfGylux9095HuPtHdJwKvAzd7mi4Xkah4e8d+Gg618tHzdHaLFI5uC7q7x4CFwFJgA/Cou9ea2b1mdnOuA4rkwrNrGigrLuLaqRrMQgrH738TlIa7LwGWdFp2Txdtrz75WCK5k0g4z63bzZVTRjJE3S1SQHRpnPQ7q3YeYPfBVm48Z0zYUUSySgVd+p0la3ZTWmzM1tihUmBU0KVfcXeeW9fARyaPYugAdbdIYVFBl35l5fb91B84ykfPVXeLFB4VdOlXnlq1i/KSIm44R6crSuFRQZd+oyOe4Nk1u5k9rYrB5Rmd4CWSV1TQpd94pW4v+w63M/f808KOIpITKujSbyxevYshFSVcdbZulSuFSQVd+oXWjjhL1zVw07ljKC/RrXKlMKmgS7/w4oYmDrfHuVndLVLAVNClX3hqVT2jK8u5RANBSwFTQZeCd+BIOy9t2sMfnHcaxUUWdhyRnFFBl4K3ePUu2uMJbqkeG3YUkZxSQZeC99jynUwbM4Tppw0NO4pITqmgS0Hb2HCItfUHuXXmuLCjiOScCroUtMeW76S02Jh7gbpbpPCpoEvB6ognePLtemZPreKUQWVhxxHJORV0KVi/3tjEvsPtzKtWd4v0DyroUrAeW76TUZXlXDVFl/pL/6CCLgWpqbmVmk1NfHzGWEqKtZtL/6A9XQrSL97cQTzhfPKi8WFHEekzKuhScGLxBD97czsfmTySSaMGhx1HpM+ooEvBeXFjE7sPtvLpSyeEHUWkT6mgS8H5yevvcdrQCq790Oiwo4j0KRV0KShb9rTw23f38seXnK4vQ6Xf0R4vBeUnr2+ntNj4hL4MlX5IBV0KxpH2GI+t2MGcc8YwurIi7DgifU4FXQrGL1fspLk1xmcu05eh0j+poEtBiMUT/OC3W7jw9GHMnDA87DgioVBBl4KwZF0DO94/yheuOhMzjUok/VNGBd3M5pjZJjOrM7O70qz/qpmtN7M1Zvaimen/vNJn3J3v/WYzk0YN4rqpVWHHEQlNtwXdzIqBB4AbgWnAp8xsWqdmbwMz3f084JfAP2Y7qEhXXqnbR+2uQ3zhykkUacxQ6ccyOUK/GKhz9y3u3g4sAuamNnD3Gnc/Esy+Duh+pdJnHvzNZkZXlvOxGRrEQvo3c/cTNzCbB8xx988H87cBl7j7wi7a3w80uPvfplm3AFgAUFVVVb1o0aJehW5paWHw4GjeoyOq2aKaC04u29aDcb75WiufmFLKTZOyO4hFVLdZVHNBdLNFNRf0PNusWbNWuPvMtCvd/YQPYB7wUMr8bcD9XbT9NMkj9PLuXre6utp7q6amptfPzbWoZotqLveTy/aZH77h539zqR882p69QIGobrOo5nKPbrao5nLveTZguXdRVzPpcqkHUi+7Gxcs+wAzmw3cDdzs7m2Z/rUR6a23tr3Pb97ZwxevOpMhFaVhxxEJXSYF/S1gspmdYWZlwHxgcWoDM5sBfI9kMW/KfkyRD3J3vv38JkZVlnP7ZRPDjiMSCd0WdHePAQuBpcAG4FF3rzWze83s5qDZt4HBwGNmtsrMFnfxciJZ8fK7e3lz2/v8+TVnMaCsOOw4IpFQkkkjd18CLOm07J6U6dlZziXSJXfnn5ZuYuywAcy/6PSw44hEhq4Ulbzz7NrdrK0/yFdmT6asRLuwyDH6bZC8cqQ9xt8/u4GpY4bwRzrvXOQDVNAlrzxQU8eug63cO3e6BrAQ6US/EZI3tu49zA9e3srHZ4zloomnhB1HJHJU0CUvuDvffLqWspIi7rrxQ2HHEYkkFXTJC0trG3hp0x6+Mnsyo4doNCKRdF
TQJfL2trRx9xPrmDZmCLdfPjHsOCKRldF56CJhcXfufmItza0xfvanF1CqL0JFuqTfDom0x1fWs7S2kb+8YQpnn1oZdhyRSFNBl8iqP3CUbyyu5eKJp3DHhyeFHUck8lTQJZLaYnG+/NOVJNz5p1vPp1gjEYl0S33oEjnuzj1P1rJqxwEe/PSFnD5iYNiRRPKCjtAlcn76xnZ+sXwHX551JnPOGRN2HJG8oYIukfLWtvf55tO1XH32KL563dlhxxHJKyroEhnrdx3ijh+9xfjhA/nO/BnqNxfpIRV0iYStew/zmYffYFB5CY/ccTFDB2hIOZGeUkGX0O07muDTD72BO/z4jksYN1xfgor0hs5ykVBt3tPC37/RSrsX8/MFl3LW6MFhRxLJWyroEpq3t+/ncz96i1jC+dmCSzln7NCwI4nkNRV0CcWvNzby5Z++zajKcr48vYRzx6mYi5ws9aFLn4onnH9etok7/mM5Z44exC+/dBlVg7QbimSDjtClzzQ1t3Lnz1fx2pZ9fHLmeL45dzoVpcWsDzuYSIFQQZecc3eeeLue+55Zz9GOOP906/nMqx4XdiyRgqOCLjm1fd8R7n5yLb99dy8Xnj6Mb91yHpOrdBtckVxQQZec2NfSxgM1m/nJ6+9RVlLEfXOn898umUCRrv4UyRkVdMmqPc1t/Pi1bfzwd1s52hHn1urx/MV1Uzh1qMYBFck1FXTJivW7DvHvr2zlqVW7aI8nuPGcU/na9WfrQiGRPqSCLr22p7mNxat38fjKndTuOkRFaRGfuGgcf3LFGZw5SoVcpK+poEvG3J26phZe3NjEr9Y3snL7fhIO544dyt/84TQ+dsFYhg8qCzumSL+lgi5d6ogn2NTQzKodB3hj6/u8vmUfe5rbAJh+2hAWXjOZPzxvjM5aEYkIFXQhkXAam1vZsucw7zQ2805jCxsbDrF+1yHaYgkARleWc9mkEVw6aQRXnT2KscMGhJxaRDrLqKCb2RzgO0Ax8JC7/0On9eXAI0A1sA/4pLtvy25U6Y22WJwDRzp471CclzY1sae5jabmNuoPHGX3gaPUHzjKe/uOHC/cAEMHlHL2qZV8+tIJnDduKOePG8aEEQMx0ymHIlHWbUE3s2LgAeA6YCfwlpktdvfUK7bvAPa7+1lmNh/4FvDJXATOZ4mEE3cnnkg+Ysd+xhN0JJx43OlIJOiIJ+iIOe3xOG2xBO3BozWWoLUjTltHnKMdcY60xznaHudwe4zDbXGaW2O0tHVw8GiMQ0c7OHi0g5a22H8FePWt45PDB5Zy2rABTBgxiKumjGLCiEFMHDGIKVWDGVVZruItkocyOUK/GKhz9y0AZrYImAsfuAXHXOAbwfQvgfvNzNzds5gVgEff2sH//e0RBq78DZD8oi4d72Lm2KS7f6DNsZdxHPeU+ZR27sn1iePrj00n2yQSTkcsRtGvnyfhEHfHgwKeyPqWSCovKWJQeQmDy0sYVF5CZXkJY4cNYOqYSoYOKGXEoDKGDypj19Z3ueayCxk1uIJRleUMKCvOTSARCY11V3PNbB4wx90/H8zfBlzi7gtT2qwL2uwM5jcHbfZ2eq0FwAKAqqqq6kWLFvU48NtNMV7e3kpJyX/9LcrkWDK1TerBp6VpZNjx5Wa//1w71tSC21UGy4oMYh0dlJWWYmYUpSwvsuRzi1IexWYUH5suguJgWUkRlATzpUVGaTBfWmyUFUFpMZQXG+XFUJThkXRLSwuDB0fzVMKoZlOunotqtqjmgp5nmzVr1gp3n5l2pQdHkV09gHkk+82Pzd8G3N+pzTpgXMr8ZmDkiV63urrae6umpqbXz821qGaLai736GZTrp6Larao5nLveTZguXdRVzO5EXU9MD5lflywLG0bMysBhpL8clRERPpIJgX9LWCymZ1hZmXAfGBxpzaLgduD6XnAr4O/JCIi0ke6/VLU3WNmthBYSvK0xYfdvdbM7iV56L8Y+CHwYzOrA94nWfRFRK
QPZXQeursvAZZ0WnZPynQrcGt2o4mISE9oMEcRkQKhgi4iUiBU0EVECoQKuohIgej2StGcvbHZHuC9Xj59JLC321bhiGq2qOaC6GZTrp6Larao5oKeZ5vg7qPSrQitoJ8MM1vuXV36GrKoZotqLohuNuXquahmi2ouyG42dbmIiBQIFXQRkQKRrwX9+2EHOIGoZotqLohuNuXquahmi2ouyGK2vOxDFxGR35evR+giItKJCrqISIGIbEE3s1vNrNbMEmY2s9O6r5tZnZltMrMbunj+GWb2RtDuF8Gtf3OR8xdmtip4bDOzVV2022Zma4N2y3ORpdP7fcPM6lOy3dRFuznBdqwzs7tynSt4z2+b2UYzW2NmT5jZsC7a9ck2624bmFl58DnXBfvUxFxlSXnP8WZWY2brg9+DO9O0udrMDqZ8xveke60c5TvhZ2NJ/xpsszVmdmEfZDo7ZVusMrNDZvaVTm36bJuZ2cNm1hSM6HZs2Slm9oKZvRv8HN7Fc28P2rxrZrena5NWVyNfhP0ApgJnAy8BM1OWTwNWA+XAGSRHRypO8/xHgfnB9IPAl/og8z8D93SxbhvdjOKU5SzfAP6ymzbFwfabBJQF23VaH2S7HigJpr8FfCusbZbJNgD+DHgwmJ4P/KIPttEY4MJguhJ4J02uq4Fn+mqf6slnA9wEPEdyFMZLgTf6OF8x0EDyIpxQthlwJXAhsC5l2T8CdwXTd6Xb94FTgC3Bz+HB9PBM3jOyR+juvsHdN6VZNRdY5O5t7r4VqCM5kPVxlhyy/hqSA1YD/AfwsVzmDd7zE8DPc/k+WXZ8AHB3bweODQCeU+6+zN1jwezrJEfBCksm22AuyX0IkvvUtcHnnTPuvtvdVwbTzcAGYGwu3zPL5gKPeNLrwDAzG9OH738tsNnde3s1+klz95dJjg+RKnVf6qou3QC84O7vu/t+4AVgTibvGdmCfgJjgR0p8zv5/R19BHAgpWika5NtHwEa3f3dLtY7sMzMVgSDZfeFhcF/dx/u4r92mWzLXPscySO5dPpim2WyDY63CfapgyT3sT4RdPHMAN5Is/oyM1ttZs+Z2fS+ykT3n03Y+9Z8uj64CmubAVS5++5gugGoStOm19suo1VW6HcAAALHSURBVAEucsXMfgWcmmbV3e7+VF/n6UqGOT/FiY/OP+zu9WY2GnjBzDYGf8Fzkgv4N+A+kr9495HsDvrcybxftrId22ZmdjcQA37axctkfZvlGzMbDPwn8BV3P9Rp9UqSXQotwXckTwKT+yhaZD+b4Puym4Gvp1kd5jb7AHd3M8vqeeOhFnR3n92Lp2UyaPU+kv/FKwmOqNK1yVh3OS05MPbHgeoTvEZ98LPJzJ4g+V/9k/oFyHT7mdkPgGfSrMpkW/ZKBtvss8AfANd60HGY5jWyvs3S6Mkg6DutDwdBN7NSksX8p+7+eOf1qQXe3ZeY2XfNbKS75/wmVBl8NjnbtzJwI7DS3Rs7rwhzmwUazWyMu+8OuqCa0rSpJ9nXf8w4kt8ldisfu1wWA/ODMw/OIPnX9c3UBkGBqCE5YDUkB7DO5RH/bGCju+9Mt9LMBplZ5bFpkl8KrkvXNls69Vf+URfvl8kA4LnINgf478DN7n6kizZ9tc0iOQh60Ef/Q2CDu/9LF21OPdaXb2YXk/x97os/NJl8NouBzwRnu1wKHEzpasi1Lv+3HNY2S5G6L3VVl5YC15vZ8KCr9PpgWff64tveXn5D/Eck+47agEZgacq6u0membAJuDFl+RLgtGB6EslCXwc8BpTnMOuPgC92WnYasCQly+rgUUuy2yHX2+/HwFpgTbATjemcK5i/ieQZFJv7IlfwnnUk+whXBY8HO2fry22WbhsA95L8gwNQEexDdcE+NakPttGHSXaXrUnZTjcBXzy2rwELg22zmuSXy5f30eeX9rPplM2AB4JtupaUM9VynG0QyQI9NGVZKNuM5B+V3UBHUMvuIPndy4vAu8CvgF
OCtjOBh1Ke+7lgf6sD/iTT99Sl/yIiBSIfu1xERCQNFXQRkQKhgi4iUiBU0EVECoQKuohIgVBBFxEpECroIiIF4v8DMUNczyVSIaIAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
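{
"cell_type": "markdown",
"metadata": {},
"source": [
"The curve plotted above is $\\sigma(x) = \\frac{1}{1 + e^{-x}}$. The loop-based `sigmoid` works for the range used here, but `math.exp` raises an `OverflowError` for strongly negative inputs such as -1000. The cell below is a hedged sketch of a vectorized alternative. The name `sigmoid_stable` is an assumption for illustration, not part of the original notebook. For negative inputs it uses the algebraically equivalent form $\\frac{e^{x}}{1 + e^{x}}$, so the exponential is never evaluated on a large positive number. If SciPy is available, `scipy.special.expit` computes the same function."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"def sigmoid_stable(x):\n",
"    # Hypothetical helper, not from the original notebook\n",
"    x = np.atleast_1d(np.asarray(x, dtype=float))\n",
"    out = np.empty_like(x)\n",
"    pos = x >= 0\n",
"    # For x >= 0, exp(-x) <= 1, so the textbook form cannot overflow\n",
"    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))\n",
"    # For x < 0, use e^x / (1 + e^x), where exp(x) < 1\n",
"    ex = np.exp(x[~pos])\n",
"    out[~pos] = ex / (1.0 + ex)\n",
"    return out\n",
"\n",
"# Extreme inputs no longer raise; the middle value is sigmoid(0) = 0.5\n",
"print(sigmoid_stable([-1000.0, 0.0, 1000.0]))"
],
"execution_count": null,
"outputs": []
},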
{
"cell_type": "markdown",
"metadata": {
"id": "ItbgATl7tE0G",
"colab_type": "text"
},
"source": [
"### As an example, we will work with the well-known Titanic dataset hosted on Kaggle. It also lets us practice some data preprocessing and see how to draw inferences from the data and decide on next steps.\n",
"\n",
"Download from:\n",
"https://www.kaggle.com/c/titanic/data or from this Drive link: https://drive.google.com/drive/folders/19mWZBWwEy1aRwHweSG4zlM81rOuGJ3QE?usp=sharing\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "oykQKWNotUX1",
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np \n",
"import pandas as pd \n",
"import seaborn as sns\n",
"%matplotlib inline\n",
"from matplotlib import pyplot as plt\n",
"from matplotlib import style\n",
"\n",
"# Algorithms\n",
"from sklearn import linear_model\n",
"from sklearn.linear_model import LogisticRegression\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "7BHDjA76uPFa",
"colab_type": "code",
"colab": {}
},
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "VKbFxzy8tX_q",
"colab_type": "code",
"colab": {}
},
"source": [
"test_df = pd.read_csv(\"/content/drive/My Drive/ML-DL101 Datasets/test_titanic.csv\")\n",
"train_df = pd.read_csv(\"/content/drive/My Drive/ML-DL101 Datasets/train_titanic.csv\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "KSPLWYdjr4N-",
"colab_type": "code",
"colab": {}
},
"source": [
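"# Alternative to the Drive paths above: read the CSVs from the working directory,\n",
"# e.g. after downloading them from Kaggle. Run only one of the two loading cells.\n",
"test_df = pd.read_csv(\"test.csv\")\n",
"train_df = pd.read_csv(\"train.csv\")"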
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "e7Pb5nNeu-8Q",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 340
},
"outputId": "a731bbbd-ef38-4996-d434-5fabcf2a94bf"
},
"source": [
"train_df.info()"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 891 entries, 0 to 890\n",
"Data columns (total 12 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 PassengerId 891 non-null int64 \n",
" 1 Survived 891 non-null int64 \n",
" 2 Pclass 891 non-null int64 \n",
" 3 Name 891 non-null object \n",
" 4 Sex 891 non-null object \n",
" 5 Age 714 non-null float64\n",
" 6 SibSp 891 non-null int64 \n",
" 7 Parch 891 non-null int64 \n",
" 8 Ticket 891 non-null object \n",
" 9 Fare 891 non-null float64\n",
" 10 Cabin 204 non-null object \n",
" 11 Embarked 889 non-null object \n",
"dtypes: float64(2), int64(5), object(5)\n",
"memory usage: 83.7+ KB\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "h0UBHi6ivEUe",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 297
},
"outputId": "b771855a-bb1b-44fa-ed49-a0ec5b0fbf3d"
},
"source": [
"train_df.describe()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>714.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" <td>891.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>446.000000</td>\n",
" <td>0.383838</td>\n",
" <td>2.308642</td>\n",
" <td>29.699118</td>\n",
" <td>0.523008</td>\n",
" <td>0.381594</td>\n",
" <td>32.204208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>257.353842</td>\n",
" <td>0.486592</td>\n",
" <td>0.836071</td>\n",
" <td>14.526497</td>\n",
" <td>1.102743</td>\n",
" <td>0.806057</td>\n",
" <td>49.693429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.420000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>223.500000</td>\n",
" <td>0.000000</td>\n",
" <td>2.000000</td>\n",
" <td>20.125000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>7.910400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>446.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3.000000</td>\n",
" <td>28.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>14.454200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>668.500000</td>\n",
" <td>1.000000</td>\n",
" <td>3.000000</td>\n",
" <td>38.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>31.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>891.000000</td>\n",
" <td>1.000000</td>\n",
" <td>3.000000</td>\n",
" <td>80.000000</td>\n",
" <td>8.000000</td>\n",
" <td>6.000000</td>\n",
" <td>512.329200</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass ... SibSp Parch Fare\n",
"count 891.000000 891.000000 891.000000 ... 891.000000 891.000000 891.000000\n",
"mean 446.000000 0.383838 2.308642 ... 0.523008 0.381594 32.204208\n",
"std 257.353842 0.486592 0.836071 ... 1.102743 0.806057 49.693429\n",
"min 1.000000 0.000000 1.000000 ... 0.000000 0.000000 0.000000\n",
"25% 223.500000 0.000000 2.000000 ... 0.000000 0.000000 7.910400\n",
"50% 446.000000 0.000000 3.000000 ... 0.000000 0.000000 14.454200\n",
"75% 668.500000 1.000000 3.000000 ... 1.000000 0.000000 31.000000\n",
"max 891.000000 1.000000 3.000000 ... 8.000000 6.000000 512.329200\n",
"\n",
"[8 rows x 7 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "EW2l3yuIvJ2q",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"outputId": "bc261a4a-c6dd-4df7-d75e-771f8bcf194d"
},
"source": [
"train_df.head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PassengerId</th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Ticket</th>\n",
" <th>Fare</th>\n",
" <th>Cabin</th>\n",
" <th>Embarked</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Braund, Mr. Owen Harris</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>A/5 21171</td>\n",
" <td>7.2500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>PC 17599</td>\n",
" <td>71.2833</td>\n",
" <td>C85</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Heikkinen, Miss. Laina</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>STON/O2. 3101282</td>\n",
" <td>7.9250</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>113803</td>\n",
" <td>53.1000</td>\n",
" <td>C123</td>\n",
" <td>S</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Allen, Mr. William Henry</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>373450</td>\n",
" <td>8.0500</td>\n",
" <td>NaN</td>\n",
" <td>S</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PassengerId Survived Pclass ... Fare Cabin Embarked\n",
"0 1 0 3 ... 7.2500 NaN S\n",
"1 2 1 1 ... 71.2833 C85 C\n",
"2 3 1 3 ... 7.9250 NaN S\n",
"3 4 1 1 ... 53.1000 C123 S\n",
"4 5 0 3 ... 8.0500 NaN S\n",
"\n",
"[5 rows x 12 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7h7PKBWuwTAR",
"colab_type": "text"
},
"source": [
"The head of the data shows which columns are non-numeric (for example Sex and Embarked) and will need to be converted into numeric form before they can be used for prediction."
]
},
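{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to do this conversion is sketched below on a tiny frame, so that `train_df` itself stays untouched. The frame `demo` and the chosen encodings (0/1 for Sex, indicator columns for Embarked) are assumptions for illustration; the Sex and Embarked values are the ones shown in the first five rows above."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import pandas as pd\n",
"\n",
"# Sex and Embarked values of the five rows printed by train_df.head()\n",
"demo = pd.DataFrame({\n",
"    'Sex': ['male', 'female', 'female', 'female', 'male'],\n",
"    'Embarked': ['S', 'C', 'S', 'S', 'S'],\n",
"})\n",
"\n",
"# Two categories: map them to 0/1 (which one gets 1 is an arbitrary choice)\n",
"demo['Sex'] = demo['Sex'].map({'male': 0, 'female': 1})\n",
"\n",
"# Several unordered categories: one indicator column per category\n",
"demo = pd.get_dummies(demo, columns=['Embarked'], dtype=int)\n",
"\n",
"print(demo)"
],
"execution_count": null,
"outputs": []
},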
{
"cell_type": "code",
"metadata": {
"id": "6LsdW4ojwigB",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"outputId": "34c00273-4d28-4bc9-a8d4-7c32fed40a55"
},
"source": [
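"# Number of missing values per column, largest first\n",
"total = train_df.isnull().sum().sort_values(ascending=False)\n",
"# Share of missing values per column, in percent\n",
"percent_1 = train_df.isnull().sum()/train_df.isnull().count()*100\n",
"percent_2 = (round(percent_1, 1)).sort_values(ascending=False)\n",
"# Combine both into one table, aligned on the column names\n",
"missing_data = pd.concat([total, percent_2], axis=1, keys=['Total', '% missing'])\n",
"missing_data.head(5)"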
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Total</th>\n",
" <th>% missing</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Cabin</th>\n",
" <td>687</td>\n",
" <td>77.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Age</th>\n",
" <td>177</td>\n",
" <td>19.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Embarked</th>\n",
" <td>2</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Fare</th>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Ticket</th>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Total % missing\n",
"Cabin 687 77.1\n",
"Age 177 19.9\n",
"Embarked 2 0.2\n",
"Fare 0 0.0\n",
"Ticket 0 0.0"
]
},
"metadata": {
"tags": []
},
"execution_count": 8
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LHm7kQY8w7Zs",
"colab_type": "text"
},
"source": [
"From the above data, we can make the following inferences:\n",
"\n",
"1. Around 77% of the Cabin values are missing. That share is too large to drop the affected rows, so it is better to delete the column entirely.\n",
"\n",
"2. Embarked has only 2 missing values, so it can be filled easily.\n",
"\n",
"3. With some common sense, we can eliminate variables like PassengerId, Name and Ticket, as we don't expect them to have much correlation with the chance of survival."
]
},
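{
"cell_type": "markdown",
"metadata": {},
"source": [
"These steps can be sketched as follows. This is an illustration on a toy frame, not necessarily the cleaning used later in this notebook: the frame `demo`, its missing entries, filling Embarked with the most frequent value and filling Age with the median are all assumptions."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Toy frame: names, tickets and the missing Age/Embarked entries are made up\n",
"demo = pd.DataFrame({\n",
"    'PassengerId': [1, 2, 3, 4, 5],\n",
"    'Name': ['A', 'B', 'C', 'D', 'E'],\n",
"    'Ticket': ['T1', 'T2', 'T3', 'T4', 'T5'],\n",
"    'Cabin': [np.nan, 'C85', np.nan, 'C123', np.nan],\n",
"    'Age': [22.0, 38.0, np.nan, 35.0, 35.0],\n",
"    'Embarked': ['S', 'C', 'S', np.nan, 'S'],\n",
"})\n",
"\n",
"# Drop the mostly empty column and the identifier-like columns\n",
"demo = demo.drop(columns=['Cabin', 'PassengerId', 'Name', 'Ticket'])\n",
"\n",
"# Assumption: fill the few missing Embarked values with the most frequent port\n",
"demo['Embarked'] = demo['Embarked'].fillna(demo['Embarked'].mode()[0])\n",
"\n",
"# Assumption: fill missing ages with the median age\n",
"demo['Age'] = demo['Age'].fillna(demo['Age'].median())\n",
"\n",
"print(demo)"
],
"execution_count": null,
"outputs": []
},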
{
"cell_type": "code",
"metadata": {
"id": "Vr0zZm8mPZFu",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 295
},
"outputId": "b7385e8d-d5cf-4a4b-d45c-4a2eebe8509c"
},
"source": [
"survived = 'survived'\n",
"not_survived = 'not survived'\n",
"fig, axes = plt.subplots(nrows=1, ncols=2,figsize=(10, 4))\n",
"# Split the passengers by sex\n",
"women = train_df[train_df['Sex']=='female']\n",
"men = train_df[train_df['Sex']=='male']\n",
"# Note: sns.distplot is deprecated in newer seaborn releases; sns.histplot is its replacement\n",
"# Left panel: age histograms of surviving and non-surviving women\n",
"ax = sns.distplot(women[women['Survived']==1].Age.dropna(), bins=18, label = survived, ax = axes[0], kde =False)\n",
"ax = sns.distplot(women[women['Survived']==0].Age.dropna(), bins=40, label = not_survived, ax = axes[0], kde =False)\n",
"ax.legend()\n",
"ax.set_title('Female')\n",
"# Right panel: the same histograms for men\n",
"ax = sns.distplot(men[men['Survived']==1].Age.dropna(), bins=18, label = survived, ax = axes[1], kde = False)\n",
"ax = sns.distplot(men[men['Survived']==0].Age.dropna(), bins=40, label = not_survived, ax = axes[1], kde = False)\n",
"ax.legend()\n",
"_ = ax.set_title('Male')"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlAAAAEWCAYAAACpC6mpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAeaklEQVR4nO3dfZBcdZ3v8feXJBokmECYYnkKE1hlE0IYZSDBuIjghiDcYK7BJEAACzfKQxn3WtxlXd2LV7dKtnBdZe9F8WICSCABRCjwCREFnyKTGCAPIIuJIRIJBozAgibhe//oTnZIZjJ9erqne2ber6qp6T59+pzv6TPznc+cPv07kZlIkiSpcns1ugBJkqT+xgAlSZJUkAFKkiSpIAOUJElSQQYoSZKkggxQkiRJBRmg1FQiojUiMiKGNroWSdqVPUo7GKC0m4hYFxGvRMRLnb4ObnRdktRb5f7254g4YJfpvywHo9bGVKb+xgCl7vy3zBzR6euZRhckSTWyFpiz405EHAO8qXHlqD8yQKkiETEyIq6PiI0R8duI+GxEDCk/dmFE/CQivhARf4iIX0fEO8rTn46ITRFxQadlnVH+b++P5cevrGa9klSlm4DzO92/ALhxxx17lCphgFKlFgLbgL8E3gZMBT7U6fFJwKPAaGARcCtwfHn+84B/j4gR5XlfptS8RgFnABdHxPuqXK8kFfVz4M0RMa4cdmYDX+/0uD1KPQqvhaddRcQ64ABKTQHgZ8ApwKjMfKU8zxxgXma+OyIuBP4xM99SfuwYSmHqLzLz2fK0zcCpmbmii/X9G5CZ+Xfl8w/WAsMohbH13a23DpsuaYAr97cPAZOBfYAfAR8HTge2AmMzc90uz7FHaTd+ikDdeV9mfh8gIk4ATgM2RsSOx/cCnu40/7Odbr8CsCM8dZo2ory8ScDngAnAG4A3Ard1UcPhlJrUntYrSdW4CXgQGEunt+/AHqXKGKBUiaeBPwEHZOa2nmauwCLg34HTM/PV8n93B3QxX63XK0kAZOZvImIt8F7gol0etkepR54DpR5l5kbge8DnI+LNEbFXRBwZEe+qcpH7As+XG9MJwDl9tF5J6uwi4JTMfHmX6fYo9cgApUqdT+lQ9mrgBeB24KAql3UJ8L8j4kXgn4AlfbReSdopM5/KzI4uHrJHqUeeRC5JklSQR6AkSZIKMkBJkiQVZICSJEkqyAAlSZJUUJ+OA3XAAQdka2trX65SUoMtW7bs95nZ0ug6esv+JQ0+e+pffRqgWltb6ejo6hOjkgaqiPhNo2uoBfuXNPjsqX/5Fp4kSVJBBihJkqSCDFCSJEkFeTFh9Xtbt25lw4YNvPrqq40uZVAbPnw4hx56KMOGDWt0KVK/Yg9rvGr6lwFK/d6GDRvYd999aW1tJSIaXc6glJls3ryZDRs2MHbs2EaXI/Ur9rDGqrZ/+Rae+r1XX32V0aNH23gaKCIYPXq0/0FLVbCHNVa1/csApQHBxtN47gOpev7+NFY1r78BSpIkqSDPgdKAs2jp+pou75xJY2q6vErdfffdrF69miuuuKLXyxoxYgQvvfRSDaqSVG8DoYcNhv5lgBpAqvmla1Q4UMm2bdsYOrTrX8Pp06czffr0Pq5I6qRjwe7T2j/Y93WoKQ32/uVbeFINvPzyy5xxxhkce+yxTJgwgcWLF9Pa2srvf/97ADo6Ojj55JMBuPLKK5k7dy5Tpkxh7ty5TJ48mVWrVu1c1sknn0xHRwcLFy7ksssuY8uWLRx++OG89tprO9d12GGHsXXrVp566immTZvGcccdx1//9V/z+OOPA7B27VpOPPFEjjnmGD75yU/27YshqV+xf1XHACXVwHe+8x0OPvhgHnnkEVauXMm0adP2OP/q1av5/ve/zy233MKsWbNYsmQJABs3bmTjxo20t7fvnHfkyJG0tbXxox/9CIB77rmH0047jWHDhjFv3jyuue
Yali1bxtVXX80ll1wCwPz587n44ot57LHHOOigg+q01ZIGAvtXdQxQUg0cc8wx3Hffffz93/89Dz30ECNHjtzj/NOnT2fvvfcG4AMf+AC33347AEuWLGHmzJm7zT9r1iwWL14MwK233sqsWbN46aWX+OlPf8rZZ59NW1sbH/7wh9m4cSMAP/nJT5gzZw4Ac+fOrdl2Shp47F/V8RwoqQbe+ta3snz5cr71rW/xyU9+klNPPZWhQ4fuPGy96/gi++yzz87bhxxyCKNHj+bRRx9l8eLFfPnLX95t+dOnT+cTn/gEzz//PMuWLeOUU07h5ZdfZtSoUaxYsaLLmvxYtKRK2L+q4xEoqQaeeeYZ3vSmN3Heeedx+eWXs3z5clpbW1m2bBkAd9xxxx6fP2vWLP7lX/6FLVu2MHHixN0eHzFiBMcffzzz58/nzDPPZMiQIbz5zW9m7Nix3HbbbUBpNN1HHnkEgClTpnDrrbcCcPPNN9dyUyUNMPav6ngESgNOIz5Z+Nhjj3H55Zez1157MWzYMK699lpeeeUVLrroIj71qU/tPAGzOzNnzmT+/Pl86lOf6naeWbNmcfbZZ/PDH/5w57Sbb76Ziy++mM9+9rNs3bqV2bNnc+yxx/LFL36Rc845h6uuuoqzzjqrRlspqS/0dQ+zf1UnMrPPVtbe3p4dHR19tr7BZrAOY7BmzRrGjRvX6DJE1/siIpZlZns3T+k3BmX/6moYg+44vEHV7GHNoWj/8i08SZKkggxQkiRJBRmgJEmSCjJASZIkFWSAkiRJKsgAJUmSVJDjQGngKfLR60rU4ePZCxcuZOrUqRx88ME1X3Z33vGOd/DTn/6018u58MILOfPMM7u8ZIOkGrCHdanZephHoKQGWLhwIc8880xNl5mZOy+90JVaNB5JAnsYGKCkXlu3bh3jxo3jb//2bzn66KOZOnUqr7zyCgArVqxg8uTJTJw4kRkzZvDCCy9w++2309HRwbnnnktbW9vOeXf40pe+xPjx45k4cSKzZ88G4Morr+Tqq6/eOc+ECRNYt24d69at46ijjuL8889nwoQJfOYzn+Hyyy/fOd/ChQu57LLLgNLlFABmz57Nvffeu3OeCy+8kNtvv53t27dz+eWXc/zxxzNx4kS+8pWvAKWmdtlll3HUUUfxnve8h02bNtXhVZTUKPaw6higpBp48sknufTSS1m1ahWjRo3aee2o888/n6uuuopHH32UY445hk9/+tPMnDmT9vZ2br75ZlasWLHzquY7fO5zn+OXv/wljz76aJcX5uxq3ZdccgmrVq3ikksu4c4779z52OLFi3c2sB1mzZrFkiVLAPjzn//M/fffzxlnnMH111/PyJEjefjhh3n44Yf56le/ytq1a7nzzjt54oknWL16NTfeeGPT/RcoqffsYcUZoKQaGDt2LG1tbQAcd9xxrFu3ji1btvCHP/yBd73rXQBccMEFPPjggz0ua+LEiZx77rl8/etfZ+jQnk9TPPzww5k8eTIALS0tHHHEEfz85z9n8+bNPP7440yZMuV1859++uk88MAD/OlPf+Lb3/42J510EnvvvTff+973uPHGG2lra2PSpEls3ryZJ598kgcffJA5c+YwZMgQDj74YE455ZSiL4+kJmcPK67HABURh0XEAxGxOiJWRcT88vQrI+K3EbGi/PXemlQk9UNvfOMbd94eMmQI27Ztq3pZ9957L5deeinLly/n+OOPZ9u2bQwdOvR15wa8+uqrO2/vs88+r3v+7NmzWbJkCXfccQczZswgIl73+PDhwzn55JP57ne/y+LFi5k1axZQOsx9zTXXsGLFClasWMHatWuZOnVq1dvRDOxfUmXsYcVVcgRqG/DxzBwPTAYujYjx5ce+kJlt5a9v1a1KqR8aOXIk++23Hw899BAAN910087/5Pbdd19efPHF3Z7z2muv8fTTT/Pud7+bq666ii1btvDSSy/R2trK8uXLAVi+fDlr167tdr0zZs
zgrrvu4pZbbtnt0PcOs2bNYsGCBTz00ENMmzYNgNNOO41rr72WrVu3AvCrX/2Kl19+mZNOOonFixezfft2Nm7cyAMPPFD9i9L37F9Slexhe9bjsbXM3AhsLN9+MSLWAIfUZO1SPTTRVeFvuOEGPvKRj/Cf//mfHHHEESxYUPp48oUXXshHPvIR9t57b372s5/tPIdg+/btnHfeeWzZsoXM5KMf/SijRo3i/e9/PzfeeCNHH300kyZN4q1vfWu369xvv/0YN24cq1ev5oQTTuhynqlTpzJ37lzOOuss3vCGNwDwoQ99iHXr1vH2t7+dzKSlpYVvfvObzJgxgx/84AeMHz+eMWPGcOKJJ9b4Vaof+5f6JXtYv+hhkZmVzxzRCjwITAD+B3Ah8Eegg9J/eS908Zx5wDyAMWPGHPeb3/ymtzWrG4uWri/8nHMmjalDJX1rzZo1jBs3rtFliK73RUQsy8z2BpXUuY5W7F/FFBmPqB5/9LtafxOFi1qxhzWHov2r4pPII2IEcAfwscz8I3AtcCTQRuk/vM939bzMvC4z2zOzvaWlpdLVSVLN2L8k1VpFASoihlFqPjdn5jcAMvPZzNyema8BXwW6Ps4mSQ1k/5JUD5V8Ci+A64E1mfmvnaYf1Gm2GcDK2pcnVabIW9Gqj2bcB/Yv9RfN+PszmFTz+ldyLbwpwFzgsYhYUZ72CWBORLQBCawDPlx47VINDB8+nM2bNzN69OjdPu6qvpGZbN68meHDhze6lF3Zv9T07GGNVW3/quRTeD8GutqjfuxXTeHQQw9lw4YNPPfcc40uZVAbPnw4hx56aKPLeB37l/oDe1jjVdO/KjkCJTW1YcOGMXbs2EaXIUlVsYf1TwYoSVJ9NHoYBKmOvBaeJElSQQYoSZKkggxQkiRJBRmgJEmSCjJASZIkFWSAkiRJKsgAJUmSVJABSpIkqSADlCRJUkEGKEmSpIIMUJIkSQUZoCRJkgoyQEmSJBVkgJIkSSpoaKMLGCwWLV1f+DnnTBpTh0okqQl1LGh0BVIhHoGSJEkqyAAlSZJUkAFKkiSpIAOUJElSQQYoSZKkggxQkiRJBTmMgfqEwzhIkgYSj0BJkiQVZICSJEkqyAAlSZJUkAFKkiSpoB4DVEQcFhEPRMTqiFgVEfPL0/ePiPsi4sny9/3qX64kVc7+JaleKjkCtQ34eGaOByYDl0bEeOAK4P7MfAtwf/m+JDUT+5ekuugxQGXmxsxcXr79IrAGOAQ4C7ihPNsNwPvqVaQkVcP+JaleCp0DFRGtwNuApcCBmbmx/NDvgAO7ec68iOiIiI7nnnuuF6VKUvXsX5JqqeIAFREjgDuAj2XmHzs/lpkJZFfPy8zrMrM9M9tbWlp6VawkVcP+JanWKgpQETGMUvO5OTO/UZ78bEQcVH78IGBTfUqUpOrZvyTVQyWfwgvgemBNZv5rp4fuBi4o374AuKv25UlS9exfkuqlkmvhTQHmAo9FxIrytE8AnwOWRMRFwG+AD9SnREmqmv1LUl30GKAy88dAdPPwqbUtR5Jqx/4lqV4ciVySJKmgSt7Ck15n0dL1jS5BkqSG8giUJElSQQYoSZKkggxQkiRJBRmgJEmSCjJASZIkFWSAkiRJKsgAJUmSVJABSpIkqSADlCRJUkEGKEmSpIIMUJIkSQUZoCRJkgoyQEmSJBVkgJIkSSpoaKMLkCSpLjoW7D6t/YN9X4cGJI9ASZIkFWSAkiRJKsgAJUmSVJABSpIkqSADlCRJUkEGKEmSpIIcxkCS1PVH/ptRb4cmcGgD1YhHoCRJkgoyQEmSJBVkgJIkSSrIACVJklRQjwEqIr4WEZsiYmWnaVdGxG8jYkX56731LVOSqmMPk1QPlRyBWghM62L6FzKzrfz1rdqWJUk1sxB7mKQa6zFAZeaDwPN9UIsk1Zw9TFI99GYcqMsi4nygA/h4Zr7Q1UwRMQ+YBzBmzJherE71sGjp+kaXIDVKjz2sN/2rmt+tcy
YN7h65dG3xnDtp7P51qETqWbUnkV8LHAm0ARuBz3c3Y2Zel5ntmdne0tJS5eokqaYq6mH2L0ndqSpAZeazmbk9M18DvgqcUNuyJKl+7GGSequqABURB3W6OwNY2d28ktRs7GGSeqvHc6Ai4hbgZOCAiNgA/C/g5IhoAxJYB3y4jjVKUtXsYZLqoccAlZlzuph8fR1qkaSas4dJqgdHIpckSSqoN8MYSE3Fj41LDdSxoNEVSH3KI1CSJEkFGaAkSZIKMkBJkiQVZICSJEkqyAAlSZJUkAFKkiSpIAOUJElSQQYoSZKkggxQkiRJBRmgJEmSCjJASZIkFWSAkiRJKsgAJUmSVJABSpIkqaChjS6gO4uWri/8nHMmjalDJZKkZrV07fM8tb3rvxdHrn9+t2mTxu6/+4wdC7peePsHe1OaBjiPQEmSJBVkgJIkSSrIACVJklSQAUqSJKkgA5QkSVJBBihJkqSCmnYYA1U3lMNAMti3X5LUvDwCJUmSVJABSpIkqSADlCRJUkEGKEmSpIJ6DFAR8bWI2BQRKztN2z8i7ouIJ8vf96tvmZJUHXuYpHqo5AjUQmDaLtOuAO7PzLcA95fvS1IzWog9TFKN9RigMvNBYNdLWp8F3FC+fQPwvhrXJUk1YQ+TVA/VjgN1YGZuLN/+HXBgdzNGxDxgHsCYMWOqXJ0k1VRFPcz+1T8cuf62+iy4Y8Hu09o/WJ91qd/p9UnkmZlA7uHx6zKzPTPbW1paers6SaqpPfUw+5ek7lQboJ6NiIMAyt831a4kSao7e5ikXqk2QN0NXFC+fQFwV23KkaQ+YQ+T1CuVDGNwC/Az4KiI2BARFwGfA/4mIp4E3lO+L0lNxx4mqR56PIk8M+d089CpNa5FkmrOHiapHhyJXJIkqaBqhzGQJKlbS9fuOvTWINPVEAjdcWiEfskjUJIkSQUZoCRJkgoyQEmSJBVkgJIkSSrIACVJklSQAUqSJKkgA5QkSVJBjgMlSRo0qhmfatLY/fc8Q5ExnyrV3TIdM6ppeARKkiSpIAOUJElSQQYoSZKkggxQkiRJBRmgJEmSCjJASZIkFeQwBlIfWLR0feHnnDNpTB0qkYqr5qP/A1aFQxYUec2e2l7qD1X/zndVk8Md1J1HoCRJkgoyQEmSJBVkgJIkSSrIACVJklSQAUqSJKkgA5QkSVJBA2oYAz8qrqKq+ZnpK0Vr82dZ6p+OXH9b6caQ/RtbiArxCJQkSVJBBihJkqSCDFCSJEkFGaAkSZIK6tVJ5BGxDngR2A5sy8z2WhQlSX3BHiapWrX4FN67M/P3NViOJDWCPUxSYb6FJ0mSVFBvj0Al8L2ISOArmXndrjNExDxgHsCYMb0bp2bnWBmdPDXm7F4ts0sdC3af1v7BnTfrNXZQpdvX29ehz15HqfntsYfVon/5+6am18PfPHWtt0eg3pmZbwdOBy6NiJN2nSEzr8vM9sxsb2lp6eXqJKmm9tjD7F+SutOrAJWZvy1/3wTcCZxQi6IkqS/YwyRVq+oAFRH7RMS+O24DU4GVtSpMkurJHiapN3pzDtSBwJ0RsWM5izLzOzWpSpLqzx4mqWpVB6jM/DVwbA1rkaQ+Yw+T1BsOYyBJklRQLQbSlNRPVTMkxzmTejccibpXdH+4L/rG0rXPN7qE2uhquIJ6LXcQDIPgEShJkqSCDFCSJEkFGaAkSZIKMkBJkiQVZICSJEkqyAAlSZJUkAFKkiSpIMeBUreOXH/bbtOeGnN2RfN1pavn9lalNTb7OmqhmjGd1Bg1+90asv9/3a5w3J1FS9dX/DurvlXJeFNPbX/973m3Y4HVa8ynWuuuzn4wjpRHoCRJkgoyQEmSJBVkgJIkSSrIACVJklSQAUqSJKkgA5QkSVJBDmPQILX+GHF3y6v04/aV1tNfP/7cX4YikIp43cfe135+t8f9GdeuKhkqAXjdz9OksfvvYcYC+svQChXyCJQkSVJBBihJkqSCDFCSJEkFGaAkSZ
IKMkBJkiQVZICSJEkqaEAOY1DkSucVX8C+00c6j+xhmc2k2eqpVpHt6IshGXrz3Hp8tNxhGqTBadHS9Ry5vsKhCfqTroY8aP/gbpMWVfxHvOScSWOqrWg3HoGSJEkqyAAlSZJUkAFKkiSpIAOUJElSQb0KUBExLSKeiIj/iIgralWUJPUFe5ikalUdoCJiCPB/gNOB8cCciBhfq8IkqZ7sYZJ6ozdHoE4A/iMzf52ZfwZuBc6qTVmSVHf2MElVi8ys7okRM4Fpmfmh8v25wKTMvGyX+eYB88p3jwKe6GHRBwC/r6qo+rGmylhTZQZbTYdnZkudll21SnpYFf1rh2bcx/Xgdg4cg2Ebofh2dtu/6j6QZmZeB1xX6fwR0ZGZ7XUsqTBrqow1Vcaa+o+i/WuHwfJ6up0Dx2DYRqjtdvbmLbzfAod1un9oeZok9Qf2MElV602Aehh4S0SMjYg3ALOBu2tTliTVnT1MUtWqfgsvM7dFxGXAd4EhwNcyc1UNaip8uLwPWFNlrKky1tQE6tjDYPC8nm7nwDEYthFquJ1Vn0QuSZI0WDkSuSRJUkEGKEmSpIKaJkA1yyUVIuJrEbEpIlZ2mrZ/RNwXEU+Wv+/Xh/UcFhEPRMTqiFgVEfOboKbhEfGLiHikXNOny9PHRsTS8j5cXD4xt09FxJCI+GVE3NMMNUXEuoh4LCJWRERHeVrD9l15/aMi4vaIeDwi1kTEiY2uaaBolj5Wa83Yh+qp2fpIPQyWPhARf1f+mV0ZEbeU/37VZH82RYCK5rqkwkJg2i7TrgDuz8y3APeX7/eVbcDHM3M8MBm4tPzaNLKmPwGnZOaxQBswLSImA1cBX8jMvwReAC7qw5p2mA+s6XS/GWp6d2a2dRp7pJH7DuCLwHcy86+AYym9Xo2uqd9rsj5Wa83Yh+qpGftIrQ34PhARhwAfBdozcwKlD4vMplb7MzMb/gWcCHy30/1/AP6hgfW0Ais73X8COKh8+yDgiQbWdhfwN81SE/AmYDkwidLorkO72qd9VMuhlH7pTwHuAaIJaloHHLDLtIbtO2AksJbyB0iaoaaB8tVsfazO29pUfajG29Z0faQO2zgo+gBwCPA0sD+lUQfuAU6r1f5siiNQ/NdG7rChPK1ZHJiZG8u3fwcc2IgiIqIVeBuwtNE1lQ9xrwA2AfcBTwF/yMxt5VkasQ//DfifwGvl+6OboKYEvhcRy6J0WRBo7L4bCzwHLCi/RfH/ImKfBtc0UDR7H6uJZupDddKMfaTWBkUfyMzfAlcD64GNwBZgGTXan80SoPqNLEXWPh/7ISJGAHcAH8vMPza6pszcnpltlP5bOwH4q75c/64i4kxgU2Yua2QdXXhnZr6d0ts6l0bESZ0fbMC+Gwq8Hbg2M98GvMwuh+kb9TOu5tdsfajWmriP1Nqg6APlc7jOohQYDwb2YfdTdKrWLAGq2S+p8GxEHARQ/r6pL1ceEcMoNa2bM/MbzVDTDpn5B+ABSodBR0XEjsFZ+3ofTgGmR8Q64FZKh9+/2OCadvwHRGZuAu6kFDYbue82ABsyc2n5/u2UGmlT/Dz1c83ex3qlmftQDTVlH6mDwdIH3gOszcznMnMr8A1K+7gm+7NZAlSzX1LhbuCC8u0LKL3/3yciIoDrgTWZ+a9NUlNLRIwq396b0rkQaygFqZmNqCkz/yEzD83MVko/Pz/IzHMbWVNE7BMR++64DUwFVtLAfZeZvwOejoijypNOBVY3sqYBpNn7WNWasQ/VQzP2kXoYRH1gPTA5It5U/hnesZ212Z+NPsmr08le7wV+Relcmn9sYB23UHqvdCullH4RpffA7weeBL4P7N+H9byT0mHUR4EV5a/3NrimicAvyzWtBP6pPP0I4BfAfwC3AW9s0D48Gbin0TWV1/1I+WvVjp/rRu678vrbgI7y/vsmsF+jaxooX83Sx+qwXU3Xh/pgm5uij9Rx+wZFHwA+DTxe/lt1E/DGWu1PL+UiSZJUUL
O8hSdJktRvGKAkSZIKMkBJkiQVZICSJEkqyAAlSZJUkAFKNRER74uIjIiGjkguSdWwh6koA5RqZQ7w4/J3Sepv7GEqxAClXitfH+udlAYdnV2etldE/N+IeDwi7ouIb0XEzPJjx0XEj8oX2P3ujksHSFIj2MNUDQOUauEs4DuZ+Stgc0QcB/x3oBUYD8yldK28HdfTugaYmZnHAV8D/rkRRUtSmT1MhQ3teRapR3MoXXATShfgnEPpZ+u2zHwN+F1EPFB+/ChgAnBf6dJEDKF06RxJahR7mAozQKlXImJ/SlcsPyYiklIzSeDO7p4CrMrME/uoREnqlj1M1fItPPXWTOCmzDw8M1sz8zBgLfA88P7yeQQHUrowJ8ATQEtE7DwcHhFHN6JwScIepioZoNRbc9j9P7U7gL8ANgCrga8Dy4EtmflnSg3rqoh4hNJV3d/Rd+VK0uvYw1SVyMxG16ABKiJGZOZLETEa+AUwJTN/1+i6JKkS9jDtiedAqZ7uiYhRwBuAz9h4JPUz9jB1yyNQkiRJBXkOlCRJUkEGKEmSpIIMUJIkSQUZoCRJkgoyQEmSJBX0/wG9AFxclRQbfwAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 720x288 with 2 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "GFJX-UZD1jNe",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 296
},
"outputId": "971bda42-7cc3-424c-f4a4-02a4e08b83aa"
},
"source": [
"sns.barplot(x='Pclass', y='Survived', data=train_df)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f16f09c1898>"
]
},
"metadata": {
"tags": []
},
"execution_count": 10
},
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAASwUlEQVR4nO3dcZBdZ33e8e9jOSrBOKFgdeSxZKyAKHWoJ5SNmKk7hBDcimRGyhRI5bpJPENRmUFAmwFh2kYFUdqJSMkkVGlQGk8IEzAG2mbTqlEpdoC42GgFxkZSTBUZkFQ2rG0MNqGRZf/6xx7Ry+pq98res1er9/uZubP3vOe9Z39Xd0bPnvfc876pKiRJ7bpo3AVIksbLIJCkxhkEktQ4g0CSGmcQSFLjLh53Aefqsssuq6uuumrcZUjSsnLgwIEHqmrVsH3LLgiuuuoqpqamxl2GJC0rSb56tn0ODUlS4wwCSWqcQSBJjTMIJKlxvQZBko1J7ktyJMlNQ/b/WpK7u8eXkzzcZz2SpDP19q2hJCuA3cB1wHFgf5LJqjp0uk9V/bOB/m8EXtRXPZKk4fo8I9gAHKmqo1V1ErgF2DxP/+uBD/dYjyRpiD6D4Arg2MD28a7tDEmeA6wDbjvL/q1JppJMzczMLHqhktSy8+WGsi3Ax6rq8WE7q2oPsAdgYmLigl1AYfv27UxPT7N69Wp27do17nIkNaLPIDgBrB3YXtO1DbMFeEOPtSwL09PTnDhxtn8iSepHn0ND+4H1SdYlWcnsf/aTczsleQHwV4HP9liLJOkseguCqjoFbAP2AYeBW6vqYJKdSTYNdN0C3FKumSlJY9HrNYKq2gvsndO2Y872O/qsQZI0P+8slqTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDXOIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqXHny5rFvXjxW39v3CWck0sfeIQVwNceeGRZ1X7gPb8w7hIkPQWeEUhS4wwCSWqcQSBJjTMIJKlxBoEkNa7XIEiyMcl9SY4kueksfX4uyaEkB5N8qM96JEln6u3ro0lWALuB64DjwP4kk1V1aKDPeuDtwLVV9c0kf62veiRJw/V5RrABOFJVR6vqJHALsHlOn9cBu6vqmwBV9Y0e65EkDdFnEFwBHBvYPt61DXo+8PwkdyS5M8nGYQdKsjXJVJKpmZmZnsqVpDaN+2LxxcB64GXA9cBvJ3nm3E5VtaeqJqpqYtWqVUtcoiRd2PoMghPA2oHtNV3boOPAZFU9VlX3A19mNhgkSUukzyDYD6xPsi7JSmALMDmnz39h9myAJJcxO1R0tMeaJElz9BYEVXUK2AbsAw4Dt1bVwSQ7k2zquu0DHkxyCLgdeGtVPdhXTZKkM/U6+2hV7QX2zmnbMfC8gF/qHpKkMRj3xWJJ0pgZBJLUOINAkhpnEEhS4y7opSqXmydWXvJ9PyVpKRgE55HvrP+74y5BUoMcGpKkxhkEktQ4g0CSGmcQSFLjvFgsLYLt27czPT3N6tWr2bVr17jLkc6JQSAtgunpaU6cmDvLurQ8ODQkSY0zCCSpcQaBJDXOIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmN6zUIkmxMcl+SI0luGrL/xiQzSe7uHv+4z3okSWfqbYqJJCuA3cB1wHFgf5LJqjo0p+tHqmpbX3VIkubX5xnBBuBIVR2tqpPALcDmHn+fJOlJ6DMIrgCODWwf79rmelWSe5J8LMnaYQdKsjXJVJKpmZmZPmqVpGaN+2LxHwJXVdU1wCeADwzrVFV7qmqiqiZWrVq1pAVK0oWuzyA4AQz+hb+ma/ueqnqwqv6y2/yPwIt7rEeSNESfQbAfWJ9kXZKVwBZgcrBDkssHNjcBh3usR5I0RG/fGqqqU0m2AfuAFcDNVXUwyU5gqqomgTcl2QScAh4CbuyrHknScL
2uUFZVe4G9c9p2DDx/O/D2PmuQJM1v3BeLJUljZhBIUuNcvF7nra/t/JvjLmFkpx56FnAxpx766rKq+8od9467BJ0HPCOQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklq3LyzjyZ5BKiz7a+qH1r0iiRJS2reIKiqSwGSvAv4OvBBIMANwOXzvFSStEyMOjS0qap+s6oeqapvV9V/ADb3WZgkaWmMGgTfSXJDkhVJLkpyA/CdPguTJC2NUYPgHwI/B/x593hN1zavJBuT3JfkSJKb5un3qiSVZGLEeiRJi2SkpSqr6iuc41BQkhXAbuA64DiwP8lkVR2a0+9S4M3AXedyfEnS4hjpjCDJ85N8MsmXuu1rkvzLBV62AThSVUer6iRwC8PD5F3ArwD/9xzqliQtklGHhn4beDvwGEBV3QNsWeA1VwDHBraPd23fk+RvAWur6r+NWIckaZGNNDQEPL2qPpdksO3UU/nFSS4C3gvcOELfrcBWgCuvvPKp/FqpF5c97QngVPdTWl5GDYIHkjyX7uayJK9m9r6C+ZwA1g5sr+naTrsUeCHwx13ArAYmk2yqqqnBA1XVHmAPwMTExFlvcJPG5S3XPDzuEqQnbdQgeAOz/xG/IMkJ4H5mbyqbz35gfZJ1zAbAFga+aVRV3wIuO72d5I+Bt8wNAUlSv0YNgq9W1SuSXAJcVFWPLPSCqjqVZBuwD1gB3FxVB5PsBKaqavLJly1JWiyjBsH9Sf4I+Ahw26gHr6q9wN45bTvO0vdlox5XkrR4Rv3W0AuA/8nsENH9Sf59kr/TX1mSpKUyUhBU1V9U1a1V9feBFwE/BHyq18okSUti5PUIkvxEkt8EDgBPY3bKCUnSMjfSNYIkXwG+ANwKvLWqnHBOki4Qo14svqaqvt1rJZKksVhohbLtVbULeHeSM27kqqo39VaZJGlJLHRGcLj76U1eknSBWmipyj/snt5bVZ9fgnokSUts1G8N/bskh5O8K8kLe61IkrSkRr2P4CeBnwRmgPcnuXeE9QgkScvAyPcRVNV0Vf0G8HrgbmDoVBGSpOVl1BXK/kaSdyS5F3gf8L+YnVZakrTMjXofwc3MLjX596rq//RYjyRpiS0YBN0i9PdX1a8vQT2SpCW24NBQVT0OrE2ycgnqkSQtsZHXIwDuSDIJfG+eoap6by9VSZKWzKhB8Gfd4yJm1xqWJF0gRgqCqnpn34VIksZj1GmobweGTTr38kWvSJK0pEYdGnrLwPOnAa8CTi1+OZKkpTbq0NCBOU13JPlcD/VIkpbYqHcWP2vgcVmSjcAPj/C6jUnuS3IkyU1D9r++m7fo7iR/kuTqJ/EeJElPwahDQwf4/9cITgFfAV473wu6G9F2A9cBx4H9SSar6tBAtw9V1W91/TcB7wU2jly9JOkpm/eMIMmPJ1ldVeuq6keAdwJ/2j0OzfdaYANwpKqOVtVJZqeo2DzYYc7yl5cw5IK0JKlfCw0NvR84CZDkpcC/BT4AfAvYs8BrrwCODWwf79q+T5I3JPkzYBcwdOnLJFuTTCWZmpmZWeDXSpLOxUJBsKKqHuqe/wNgT1V9vKp+GXjeYhRQVbur6rnA24ChaxxU1Z6qmqiqiVWrVi3Gr5UkdRYMgiSnryP8FHDbwL6Fri+cANYObK/p2s7mFuBnFzimJGmRLRQEHwY+leQPgO8CnwFI8jxmh4fmsx9Yn2RdN2HdFmBysEOS9QObPwP873OoXZK0CBZavP7dST4JXA78j6o6fTH3IuCNC7z2VJJtwD5gBXBzVR1MshOYqqpJYFuSVwCPAd8EfvGpvR1J0rla8OujVXXnkLYvj3LwqtoL7J3TtmPg+ZtHOY4k9Wn79u1MT0+zevVqdu3aNe5yltyo9xFI0gVrenqaEyfmu4R5YRt58XpJ0oXJIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqX
EGgSQ1ziCQpMYZBJLUOOcakrTorn3fteMu4ZysfHglF3ERxx4+tqxqv+ONdyzKcTwjkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDWu1yBIsjHJfUmOJLlpyP5fSnIoyT1JPpnkOX3WI0k6U29BkGQFsBt4JXA1cH2Sq+d0+wIwUVXXAB8DdvVVjyRpuD7PCDYAR6rqaFWdBG4BNg92qKrbq+ovus07gTU91iNJQ9XTiycueYJ6eo27lLHoc66hK4BjA9vHgZfM0/+1wH8ftiPJVmArwJVXXrlY9UkSAI9d+9i4Sxir8+JicZJ/BEwA7xm2v6r2VNVEVU2sWrVqaYuTpAtcn2cEJ4C1A9trurbvk+QVwL8AfqKq/rLHeiRJQ/R5RrAfWJ9kXZKVwBZgcrBDkhcB7wc2VdU3eqxFknQWvQVBVZ0CtgH7gMPArVV1MMnOJJu6bu8BngF8NMndSSbPcjhJUk96XZimqvYCe+e07Rh4/oo+f78kaWHnxcViSdL4GASS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDXOIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWpcr0GQZGOS+5IcSXLTkP0vTfL5JKeSvLrPWiRJw/UWBElWALuBVwJXA9cnuXpOt68BNwIf6qsOSdL8Lu7x2BuAI1V1FCDJLcBm4NDpDlX1lW7fEz3WIUmaR59DQ1cAxwa2j3dt5yzJ1iRTSaZmZmYWpThJ0qxlcbG4qvZU1URVTaxatWrc5UjSBaXPIDgBrB3YXtO1SZLOI30GwX5gfZJ1SVYCW4DJHn+fJOlJ6C0IquoUsA3YBxwGbq2qg0l2JtkEkOTHkxwHXgO8P8nBvuqRJA3X57eGqKq9wN45bTsGnu9ndshIkjQmy+JisSSpPwaBJDXOIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIa12sQJNmY5L4kR5LcNGT/X0nykW7/XUmu6rMeSdKZeguCJCuA3cArgauB65NcPafba4FvVtXzgF8DfqWveiRJw/V5RrABOFJVR6vqJHALsHlOn83AB7rnHwN+Kkl6rEmSNMfFPR77CuDYwPZx4CVn61NVp5J8C3g28MBgpyRbga3d5qNJ7uul4vPDZcx5/+e7/OovjruE88Wy++z4V/7dNWDZfX550zl9fs85244+g2DRVNUeYM+461gKSaaqamLcdejc+dktby1/fn0ODZ0A1g5sr+nahvZJcjHww8CDPdYkSZqjzyDYD6xPsi7JSmALMDmnzyRwelzh1cBtVVU91iRJmqO3oaFuzH8bsA9YAdxcVQeT7ASmqmoS+B3gg0mOAA8xGxata2II7ALlZ7e8Nfv5xT/AJalt3lksSY0zCCSpcQbBeSLJzUm+keRL465F5ybJ2iS3JzmU5GCSN4+7Jo0uydOSfC7JF7vP753jrmmpeY3gPJHkpcCjwO9V1QvHXY9Gl+Ry4PKq+nySS4EDwM9W1aExl6YRdLMZXFJVjyb5AeBPgDdX1Z1jLm3JeEZwnqiqTzP7zSktM1X19ar6fPf8EeAws3fNaxmoWY92mz/QPZr6C9kgkBZRN4Pui4C7xluJzkWSFUnuBr4BfKKqmvr8DAJpkSR5BvBx4J9W1bfHXY9GV1WPV9WPMTsDwoYkTQ3PGgTSIujGlj8O/H5V/adx16Mnp6oeBm4HNo67lqVkEEhPUXex8XeAw1X13nHXo3OTZFWSZ3bPfxC4DvjT8Va1tAyC80SSDwOfBf56kuNJXjvumjSya4GfB16e5O7u8dPjLkojuxy4Pck9zM6R9omq+q9jrmlJ+fVRSWqcZwSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCKQ5kj
zefQX0S0k+muTp8/R9R5K3LGV90mIzCKQzfbeqfqybBfYk8PpxFyT1ySCQ5vcZ4HkASX4hyT3dvPUfnNsxyeuS7O/2f/z0mUSS13RnF19M8umu7Ue7OfDv7o65fknflTTAG8qkOZI8WlXPSHIxs/MH/RHwaeA/A3+7qh5I8qyqeijJO4BHq+pXkzy7qh7sjvGvgT+vqvcluRfYWFUnkjyzqh5O8j7gzqr6/SQrgRVV9d2xvGE1zzMC6Uw/2E1JPAV8jdl5hF4OfLSqHgCoqmFrR7wwyWe6//hvAH60a78D+N0krwNWdG2fBf55krcBzzEENE4Xj7sA6Tz03W5K4u+ZnVduQb/L7MpkX0xyI/AygKp6fZKXAD8DHEjy4qr6UJK7ura9Sf5JVd22iO9BGplnBNJobgNek+TZAEmeNaTPpcDXuympbzjdmOS5VXVXVe0AZoC1SX4EOFpVvwH8AXBN7+9AOgvPCKQRVNXBJO8GPpXkceALwI1zuv0ysyuTzXQ/L+3a39NdDA7wSeCLwNuAn0/yGDAN/Jve34R0Fl4slqTGOTQkSY0zCCSpcQaBJDXOIJCkxhkEktQ4g0CSGmcQSFLj/h/Iug2c5RJ8IgAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "nKYfbR-N1vsd",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 68
},
"outputId": "d12854c4-0c03-4202-b914-e1cee493808b"
},
"source": [
"#Combining SibSp and Parch to show if someone is alone or not\n",
"data = [train_df, test_df]\n",
"for dataset in data:\n",
" dataset['relatives'] = dataset['SibSp'] + dataset['Parch']\n",
" dataset.loc[dataset['relatives'] > 0, 'not_alone'] = 0\n",
" dataset.loc[dataset['relatives'] == 0, 'not_alone'] = 1\n",
" dataset['not_alone'] = dataset['not_alone'].astype(int)\n",
"train_df['not_alone'].value_counts()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"1 537\n",
"0 354\n",
"Name: not_alone, dtype: int64"
]
},
"metadata": {
"tags": []
},
"execution_count": 11
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "GiygRG_X2GXR",
"colab_type": "code",
"colab": {}
},
"source": [
"train_df = train_df.drop(['PassengerId','Cabin'], axis=1)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "L_iMmf8W2wQp",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "09b49d16-2b6c-435d-ec80-4752299fbcf6"
},
"source": [
"data = [train_df, test_df]\n",
"\n",
"for dataset in data:\n",
" mean = train_df[\"Age\"].mean()\n",
" std = test_df[\"Age\"].std()\n",
" is_null = dataset[\"Age\"].isnull().sum()\n",
" # compute random numbers between the mean, std and is_null\n",
" rand_age = np.random.randint(mean - std, mean + std, size = is_null)\n",
" # fill NaN values in Age column with random values generated\n",
" age_slice = dataset[\"Age\"].copy()\n",
" age_slice[np.isnan(age_slice)] = rand_age\n",
" dataset[\"Age\"] = age_slice\n",
" dataset[\"Age\"] = train_df[\"Age\"].astype(int)\n",
"train_df[\"Age\"].isnull().sum()\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0"
]
},
"metadata": {
"tags": []
},
"execution_count": 13
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "9oiazbi23F3i",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 102
},
"outputId": "28f2957b-4b1d-495f-d737-f15801b24fe7"
},
"source": [
"train_df['Embarked'].describe()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"count 889\n",
"unique 3\n",
"top S\n",
"freq 644\n",
"Name: Embarked, dtype: object"
]
},
"metadata": {
"tags": []
},
"execution_count": 14
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "91r6YkKR3NJU",
"colab_type": "code",
"colab": {}
},
"source": [
"common_value = 'S'\n",
"data = [train_df, test_df]\n",
"\n",
"for dataset in data:\n",
" dataset['Embarked'] = dataset['Embarked'].fillna(common_value)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "2feOofoi3P3K",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 340
},
"outputId": "0985b3d9-3963-4ed7-e94d-6d17ca115845"
},
"source": [
"train_df.info()"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 891 entries, 0 to 890\n",
"Data columns (total 12 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Survived 891 non-null int64 \n",
" 1 Pclass 891 non-null int64 \n",
" 2 Name 891 non-null object \n",
" 3 Sex 891 non-null object \n",
" 4 Age 891 non-null int64 \n",
" 5 SibSp 891 non-null int64 \n",
" 6 Parch 891 non-null int64 \n",
" 7 Ticket 891 non-null object \n",
" 8 Fare 891 non-null float64\n",
" 9 Embarked 891 non-null object \n",
" 10 relatives 891 non-null int64 \n",
" 11 not_alone 891 non-null int64 \n",
"dtypes: float64(1), int64(7), object(4)\n",
"memory usage: 83.7+ KB\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "aWEXlXZc3UQf",
"colab_type": "code",
"colab": {}
},
"source": [
"train_df = train_df.drop(['Ticket','Name'], axis=1)\n",
"test_df = test_df.drop(['Ticket','Name'], axis=1)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "QZhOp69G38gC",
"colab_type": "code",
"colab": {}
},
"source": [
"genders = {\"male\": 0, \"female\": 1}\n",
"data = [train_df, test_df]\n",
"\n",
"for dataset in data:\n",
" dataset['Sex'] = dataset['Sex'].map(genders)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "USyX_tSg4C9N",
"colab_type": "code",
"colab": {}
},
"source": [
"ports = {\"S\": 0, \"C\": 1, \"Q\": 2}\n",
"data = [train_df, test_df]\n",
"\n",
"for dataset in data:\n",
" dataset['Embarked'] = dataset['Embarked'].map(ports)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ApXleNFj4Df1",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 153
},
"outputId": "549cc6f8-d6ff-4adf-dc5c-99b730d07a38"
},
"source": [
"data = [train_df, test_df]\n",
"for dataset in data:\n",
" dataset['Age'] = dataset['Age'].astype(int)\n",
" dataset.loc[ dataset['Age'] <= 11, 'Age'] = 0\n",
" dataset.loc[(dataset['Age'] > 11) & (dataset['Age'] <= 18), 'Age'] = 1\n",
" dataset.loc[(dataset['Age'] > 18) & (dataset['Age'] <= 22), 'Age'] = 2\n",
" dataset.loc[(dataset['Age'] > 22) & (dataset['Age'] <= 27), 'Age'] = 3\n",
" dataset.loc[(dataset['Age'] > 27) & (dataset['Age'] <= 33), 'Age'] = 4\n",
" dataset.loc[(dataset['Age'] > 33) & (dataset['Age'] <= 40), 'Age'] = 5\n",
" dataset.loc[(dataset['Age'] > 40) & (dataset['Age'] <= 66), 'Age'] = 6\n",
" dataset.loc[ dataset['Age'] > 66, 'Age'] = 6\n",
"\n",
"# let's see how it's distributed \n",
"train_df['Age'].value_counts()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"6 161\n",
"4 161\n",
"5 158\n",
"3 139\n",
"2 117\n",
"1 87\n",
"0 68\n",
"Name: Age, dtype: int64"
]
},
"metadata": {
"tags": []
},
"execution_count": 20
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "H8sr8fSz4rbt",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 359
},
"outputId": "344de772-830b-41be-d3c2-1f81658bd537"
},
"source": [
"train_df.head(10)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Fare</th>\n",
" <th>Embarked</th>\n",
" <th>relatives</th>\n",
" <th>not_alone</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.4583</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>51.8625</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>21.0750</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>11.1333</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>30.0708</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived Pclass Sex Age ... Fare Embarked relatives not_alone\n",
"0 0 3 0 2 ... 7.2500 0 1 0\n",
"1 1 1 1 5 ... 71.2833 1 1 0\n",
"2 1 3 1 3 ... 7.9250 0 0 1\n",
"3 1 1 1 5 ... 53.1000 0 1 0\n",
"4 0 3 0 5 ... 8.0500 0 0 1\n",
"5 0 3 0 3 ... 8.4583 2 0 1\n",
"6 0 1 0 6 ... 51.8625 0 0 1\n",
"7 0 3 0 0 ... 21.0750 0 4 0\n",
"8 1 3 1 3 ... 11.1333 0 2 0\n",
"9 1 2 1 1 ... 30.0708 1 1 0\n",
"\n",
"[10 rows x 10 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 21
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "jSkVyUP34t77",
"colab_type": "code",
"colab": {}
},
"source": [
"X_train = train_df.drop([\"Survived\",\"Fare\"], axis=1)\n",
"Y_train = train_df[\"Survived\"]\n",
"X_test = test_df.drop([\"PassengerId\",\"Cabin\",\"Fare\"], axis=1).copy()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "WzFHuTHj5u2g",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"outputId": "50652674-359d-409c-8b58-d5ed34934996"
},
"source": [
"X_train.head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Embarked</th>\n",
" <th>relatives</th>\n",
" <th>not_alone</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Pclass Sex Age SibSp Parch Embarked relatives not_alone\n",
"0 3 0 2 1 0 0 1 0\n",
"1 1 1 5 1 0 1 1 0\n",
"2 3 1 3 0 0 0 0 1\n",
"3 1 1 5 1 0 0 1 0\n",
"4 3 0 5 0 0 0 0 1"
]
},
"metadata": {
"tags": []
},
"execution_count": 23
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "9pOckEeq50PC",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"outputId": "7a60830f-0307-402b-f381-1e02685165b2"
},
"source": [
"X_test.head()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Pclass</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>SibSp</th>\n",
" <th>Parch</th>\n",
" <th>Embarked</th>\n",
" <th>relatives</th>\n",
" <th>not_alone</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Pclass Sex Age SibSp Parch Embarked relatives not_alone\n",
"0 3 0 2 0 0 2 0 1\n",
"1 3 1 5 1 0 0 1 0\n",
"2 2 0 3 0 0 2 0 1\n",
"3 3 0 5 0 0 0 0 1\n",
"4 3 1 5 1 1 0 2 0"
]
},
"metadata": {
"tags": []
},
"execution_count": 24
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "xfCIAPAh5Opm",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "2b109c00-914a-4fd3-e784-ab0793fad077"
},
"source": [
"logreg = LogisticRegression()\n",
"logreg.fit(X_train, Y_train)\n",
"\n",
"Y_pred = logreg.predict(X_test)\n",
"\n",
"acc_log = round(logreg.score(X_train, Y_train) * 100, 2)\n",
"print(\"Accuracy via logistic regression: \",acc_log)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Accuracy via logistic regression: 80.13\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "dEt0at6c6Vbc",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 85
},
"outputId": "02b4db5c-cec2-40d3-ef27-e135f3d940ac"
},
"source": [
"from sklearn.model_selection import cross_val_score\n",
"\n",
"scores = cross_val_score(logreg, X_train, Y_train, cv=10, scoring = \"accuracy\")\n",
"print(\"Scores:\", scores)\n",
"print(\"Mean:\", scores.mean())\n",
"print(\"Standard Deviation:\", scores.std())"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Scores: [0.74444444 0.78651685 0.76404494 0.82022472 0.80898876 0.78651685\n",
" 0.80898876 0.80898876 0.82022472 0.84269663]\n",
"Mean: 0.7991635455680399\n",
"Standard Deviation: 0.027602996254681635\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "Zv_GhUWjAqvr",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
}
]
}