@albertovilla
Created January 12, 2019 19:01
intro-neural-networks/student-admissions/StudentAdmissions.ipynb
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Predicting Student Admissions with Neural Networks\nIn this notebook, we predict student admissions to graduate school at UCLA based on three pieces of data:\n- GRE Scores (Test)\n- GPA Scores (Grades)\n- Rank (1-4) of the applicant's undergraduate institution, where 1 is the most prestigious\n\nThe dataset originally came from here: http://www.ats.ucla.edu/\n\n## Loading the data\nTo load the data and format it nicely, we will use two very useful packages called Pandas and NumPy. You can read the documentation here:\n- https://pandas.pydata.org/pandas-docs/stable/\n- https://docs.scipy.org/"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Importing pandas and numpy\nimport pandas as pd\nimport numpy as np\n\n# Reading the csv file into a pandas DataFrame\ndata = pd.read_csv('student_data.csv')\n\n# Printing out the first 10 rows of our data\ndata[:10]",
"execution_count": 1,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 1,
"data": {
"text/plain": " admit gre gpa rank\n0 0 380 3.61 3\n1 1 660 3.67 3\n2 1 800 4.00 1\n3 1 640 3.19 4\n4 0 520 2.93 4\n5 1 760 3.00 2\n6 1 560 2.98 1\n7 0 400 3.08 2\n8 1 540 3.39 3\n9 0 700 3.92 2",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>admit</th>\n <th>gre</th>\n <th>gpa</th>\n <th>rank</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>380</td>\n <td>3.61</td>\n <td>3</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>660</td>\n <td>3.67</td>\n <td>3</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1</td>\n <td>800</td>\n <td>4.00</td>\n <td>1</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1</td>\n <td>640</td>\n <td>3.19</td>\n <td>4</td>\n </tr>\n <tr>\n <th>4</th>\n <td>0</td>\n <td>520</td>\n <td>2.93</td>\n <td>4</td>\n </tr>\n <tr>\n <th>5</th>\n <td>1</td>\n <td>760</td>\n <td>3.00</td>\n <td>2</td>\n </tr>\n <tr>\n <th>6</th>\n <td>1</td>\n <td>560</td>\n <td>2.98</td>\n <td>1</td>\n </tr>\n <tr>\n <th>7</th>\n <td>0</td>\n <td>400</td>\n <td>3.08</td>\n <td>2</td>\n </tr>\n <tr>\n <th>8</th>\n <td>1</td>\n <td>540</td>\n <td>3.39</td>\n <td>3</td>\n </tr>\n <tr>\n <th>9</th>\n <td>0</td>\n <td>700</td>\n <td>3.92</td>\n <td>2</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Plotting the data\n\nFirst let's make a plot of our data to see how it looks. In order to have a 2D plot, let's ignore the rank."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Importing matplotlib\nimport matplotlib.pyplot as plt\n%matplotlib inline\n\n# Function to help us plot\ndef plot_points(data):\n    X = np.array(data[[\"gre\", \"gpa\"]])\n    y = np.array(data[\"admit\"])\n    # Boolean masks select the rows belonging to each class\n    admitted = X[y == 1]\n    rejected = X[y == 0]\n    plt.scatter(rejected[:, 0], rejected[:, 1], s = 25, color = 'red', edgecolor = 'k')\n    plt.scatter(admitted[:, 0], admitted[:, 1], s = 25, color = 'cyan', edgecolor = 'k')\n    plt.xlabel('Test (GRE)')\n    plt.ylabel('Grades (GPA)')\n    \n# Plotting the points\nplot_points(data)\nplt.show()",
"execution_count": 2,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Roughly, it looks like students with high grades and test scores were admitted, while those with low scores weren't, but the data is not as cleanly separable as we had hoped. Maybe it would help to take the rank into account? Let's make 4 plots, one for each rank."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Separating the ranks\ndata_rank1 = data[data[\"rank\"]==1]\ndata_rank2 = data[data[\"rank\"]==2]\ndata_rank3 = data[data[\"rank\"]==3]\ndata_rank4 = data[data[\"rank\"]==4]\n\n# Plotting the graphs\nplot_points(data_rank1)\nplt.title(\"Rank 1\")\nplt.show()\nplot_points(data_rank2)\nplt.title(\"Rank 2\")\nplt.show()\nplot_points(data_rank3)\nplt.title(\"Rank 3\")\nplt.show()\nplot_points(data_rank4)\nplt.title(\"Rank 4\")\nplt.show()",
"execution_count": 3,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>"
},
"metadata": {
"needs_background": "light"
}
},
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>"
},
"metadata": {
"needs_background": "light"
}
},
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>"
},
"metadata": {
"needs_background": "light"
}
},
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "This looks more promising, as it seems that the lower the rank, the higher the acceptance rate. Let's use the rank as one of our inputs. Since the rank is a category rather than a quantity, we should one-hot encode it.\n\n## One-hot encoding the rank\nUse the `get_dummies` function in pandas in order to one-hot encode the data.\n\nHint: To drop a column, you can use [`one_hot_data.drop()`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html)."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
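"source": "# Make dummy variables for rank: columns=['rank'] replaces the original rank column\n# with rank_1 ... rank_4 and keeps the other columns (dtype=int gives 0/1 values)\none_hot_data = pd.get_dummies(data, columns=['rank'], dtype=int)\n\n# TODO: Drop the previous rank column\n# Not needed here: get_dummies(columns=['rank']) has already removed it\n\n# Print the first 10 rows of our data\none_hot_data[:10]",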
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 9,
"data": {
"text/plain": " admit gre gpa rank_1 rank_2 rank_3 rank_4\n0 0 380 3.61 0 0 1 0\n1 1 660 3.67 0 0 1 0\n2 1 800 4.00 1 0 0 0\n3 1 640 3.19 0 0 0 1\n4 0 520 2.93 0 0 0 1\n5 1 760 3.00 0 1 0 0\n6 1 560 2.98 1 0 0 0\n7 0 400 3.08 0 1 0 0\n8 1 540 3.39 0 0 1 0\n9 0 700 3.92 0 1 0 0",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>admit</th>\n <th>gre</th>\n <th>gpa</th>\n <th>rank_1</th>\n <th>rank_2</th>\n <th>rank_3</th>\n <th>rank_4</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>380</td>\n <td>3.61</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>660</td>\n <td>3.67</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1</td>\n <td>800</td>\n <td>4.00</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1</td>\n <td>640</td>\n <td>3.19</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>0</td>\n <td>520</td>\n <td>2.93</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>5</th>\n <td>1</td>\n <td>760</td>\n <td>3.00</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>6</th>\n <td>1</td>\n <td>560</td>\n <td>2.98</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>7</th>\n <td>0</td>\n <td>400</td>\n <td>3.08</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>8</th>\n <td>1</td>\n <td>540</td>\n <td>3.39</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>9</th>\n <td>0</td>\n <td>700</td>\n <td>3.92</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
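{
"metadata": {},
"cell_type": "markdown",
"source": "A minimal, self-contained sketch of what `pd.get_dummies(..., columns=['rank'])` does, using a hypothetical three-row frame `toy` made up for illustration. Passing `columns=` replaces the listed column with one indicator column per category and keeps the other columns, which is why no separate `drop` is needed. The `dtype=int` argument is an assumption added here so the indicators come out as 0/1, since recent pandas versions return booleans by default. Note that only categories that actually occur get a column, so a subset of the data that lacks some rank would produce fewer columns."
},
{
"metadata": {},
"cell_type": "code",
"source": "import pandas as pd\n\n# Hypothetical toy data using the same column names as student_data.csv\ntoy = pd.DataFrame({'admit': [0, 1, 1], 'gre': [380, 660, 800], 'rank': [3, 3, 1]})\n\n# One indicator column per rank value that actually occurs (here 1 and 3)\nencoded = pd.get_dummies(toy, columns=['rank'], dtype=int)\nprint(list(encoded.columns))\nprint(encoded)",
"execution_count": null,
"outputs": []
},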
{
"metadata": {},
"cell_type": "markdown",
"source": "## Scaling the data\nThe next step is to scale the data. The range for grades is 1.0-4.0, whereas the range for test scores is roughly 200-800, which is much larger. Features on such different scales make training harder: the test score dominates the weighted sum, and gradient descent tends to converge more slowly. Let's fit both features into the range 0-1 by dividing the grades by 4.0 and the test scores by 800."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Making a copy of our data\nprocessed_data = one_hot_data.copy()\n\n# Scale the columns\nprocessed_data['gre'] = processed_data['gre'] / 800\nprocessed_data['gpa'] = processed_data['gpa'] / 4.0\n\n# Printing the first 10 rows of our processed data\nprocessed_data[:10]",
"execution_count": 10,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 10,
"data": {
"text/plain": " admit gre gpa rank_1 rank_2 rank_3 rank_4\n0 0 0.475 0.9025 0 0 1 0\n1 1 0.825 0.9175 0 0 1 0\n2 1 1.000 1.0000 1 0 0 0\n3 1 0.800 0.7975 0 0 0 1\n4 0 0.650 0.7325 0 0 0 1\n5 1 0.950 0.7500 0 1 0 0\n6 1 0.700 0.7450 1 0 0 0\n7 0 0.500 0.7700 0 1 0 0\n8 1 0.675 0.8475 0 0 1 0\n9 0 0.875 0.9800 0 1 0 0",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>admit</th>\n <th>gre</th>\n <th>gpa</th>\n <th>rank_1</th>\n <th>rank_2</th>\n <th>rank_3</th>\n <th>rank_4</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>0.475</td>\n <td>0.9025</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>0.825</td>\n <td>0.9175</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1</td>\n <td>1.000</td>\n <td>1.0000</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1</td>\n <td>0.800</td>\n <td>0.7975</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>4</th>\n <td>0</td>\n <td>0.650</td>\n <td>0.7325</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>5</th>\n <td>1</td>\n <td>0.950</td>\n <td>0.7500</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>6</th>\n <td>1</td>\n <td>0.700</td>\n <td>0.7450</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>7</th>\n <td>0</td>\n <td>0.500</td>\n <td>0.7700</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>8</th>\n <td>1</td>\n <td>0.675</td>\n <td>0.8475</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>9</th>\n <td>0</td>\n <td>0.875</td>\n <td>0.9800</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
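{
"metadata": {},
"cell_type": "markdown",
"source": "Dividing by the largest possible score maps both features into the range 0-1. As a hedged check of the arithmetic, the sketch below scales the first two applicants from the data (GRE 380 and 660, GPA 3.61 and 3.67). The helper `scale_scores` is hypothetical, and its maxima of 800 and 4.0 are taken from the text above; the results should match rows 0 and 1 of `processed_data`."
},
{
"metadata": {},
"cell_type": "code",
"source": "import numpy as np\n\ndef scale_scores(gre, gpa, gre_max=800.0, gpa_max=4.0):\n    # Divide each score by its maximum possible value so both land in [0, 1]\n    gre_scaled = np.asarray(gre, dtype=float) / gre_max\n    gpa_scaled = np.asarray(gpa, dtype=float) / gpa_max\n    return gre_scaled, gpa_scaled\n\ngre_scaled, gpa_scaled = scale_scores([380, 660], [3.61, 3.67])\nprint(gre_scaled, gpa_scaled)",
"execution_count": null,
"outputs": []
},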
{
"metadata": {},
"cell_type": "markdown",
"source": "## Splitting the data into Training and Testing"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "In order to test our algorithm, we'll split the data into a Training and a Testing set. The size of the testing set will be 10% of the total data."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "sample = np.random.choice(processed_data.index, size=int(len(processed_data)*0.9), replace=False)\n# sample holds index labels, so select the rows with .loc (.iloc expects positions)\ntrain_data, test_data = processed_data.loc[sample], processed_data.drop(sample)\n\nprint(\"Number of training samples is\", len(train_data))\nprint(\"Number of testing samples is\", len(test_data))\nprint(train_data[:10])\nprint(test_data[:10])",
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"text": "Number of training samples is 360\nNumber of testing samples is 40\n admit gre gpa rank_1 rank_2 rank_3 rank_4\n304 0 0.275 0.7075 0 0 1 0\n5 1 0.950 0.7500 0 1 0 0\n245 0 1.000 0.9775 0 0 1 0\n330 0 0.925 1.0000 0 0 1 0\n189 0 0.625 0.8375 0 1 0 0\n7 0 0.500 0.7700 0 1 0 0\n215 1 0.825 0.7275 0 0 1 0\n320 0 0.575 0.7850 0 0 1 0\n132 0 0.725 0.8500 0 1 0 0\n335 1 0.775 0.9275 1 0 0 0\n admit gre gpa rank_1 rank_2 rank_3 rank_4\n0 0 0.475 0.9025 0 0 1 0\n12 1 0.950 1.0000 1 0 0 0\n19 1 0.675 0.9525 1 0 0 0\n29 0 0.650 0.8225 1 0 0 0\n32 0 0.750 0.8500 0 0 1 0\n41 1 0.725 0.8300 0 1 0 0\n45 1 0.575 0.8625 0 0 1 0\n52 0 0.925 0.8425 0 0 0 1\n59 0 0.750 0.7050 0 0 0 1\n60 1 0.775 0.7950 0 1 0 0\n",
"name": "stdout"
}
]
},
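{
"metadata": {},
"cell_type": "markdown",
"source": "The split above draws a new random sample on every run, so the exact rows (and therefore the final accuracy) change between runs. Below is a hedged sketch of a reproducible 90/10 split. It assumes NumPy 1.17 or later for the `default_rng` Generator API, and the helper `split_train_test`, the seed value and the 400-row stand-in frame are all hypothetical. Because the sampled values are index labels, rows are selected with `.loc` and removed with `drop`."
},
{
"metadata": {},
"cell_type": "code",
"source": "import numpy as np\nimport pandas as pd\n\ndef split_train_test(df, train_fraction=0.9, seed=0):\n    # A seeded generator picks the same rows on every run\n    rng = np.random.default_rng(seed)\n    n_train = int(len(df) * train_fraction)\n    # Sample index labels without replacement, then select rows by label\n    train_idx = rng.choice(df.index.to_numpy(), size=n_train, replace=False)\n    return df.loc[train_idx], df.drop(train_idx)\n\n# Hypothetical 400-row frame standing in for processed_data\ndf = pd.DataFrame({'admit': np.arange(400) % 2, 'gre': np.linspace(0.25, 1.0, 400)})\ntrain_a, test_a = split_train_test(df, seed=42)\ntrain_b, test_b = split_train_test(df, seed=42)\nprint(len(train_a), len(test_a), train_a.index.equals(train_b.index))",
"execution_count": null,
"outputs": []
},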
{
"metadata": {},
"cell_type": "markdown",
"source": "## Splitting the data into features and targets (labels)\nNow, as a final step before the training, we'll split the data into features (X) and targets (y)."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "features = train_data.drop('admit', axis=1)\ntargets = train_data['admit']\nfeatures_test = test_data.drop('admit', axis=1)\ntargets_test = test_data['admit']\n\nprint(features[:10])\nprint(targets[:10])",
"execution_count": 12,
"outputs": [
{
"output_type": "stream",
"text": " gre gpa rank_1 rank_2 rank_3 rank_4\n304 0.275 0.7075 0 0 1 0\n5 0.950 0.7500 0 1 0 0\n245 1.000 0.9775 0 0 1 0\n330 0.925 1.0000 0 0 1 0\n189 0.625 0.8375 0 1 0 0\n7 0.500 0.7700 0 1 0 0\n215 0.825 0.7275 0 0 1 0\n320 0.575 0.7850 0 0 1 0\n132 0.725 0.8500 0 1 0 0\n335 0.775 0.9275 1 0 0 0\n304 0\n5 1\n245 0\n330 0\n189 0\n7 0\n215 1\n320 0\n132 0\n335 1\nName: admit, dtype: int64\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Training the 2-layer Neural Network\nThe following function trains the 2-layer neural network. First, we'll write some helper functions."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Activation (sigmoid) function\ndef sigmoid(x):\n    return 1 / (1 + np.exp(-x))\n\n# Derivative of the sigmoid\ndef sigmoid_prime(x):\n    return sigmoid(x) * (1 - sigmoid(x))\n\n# Cross-entropy (log-loss) error for a single prediction\ndef error_formula(y, output):\n    return - y * np.log(output) - (1 - y) * np.log(1 - output)",
"execution_count": 13,
"outputs": []
},
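{
"metadata": {},
"cell_type": "markdown",
"source": "A quick numerical check of two properties the helpers rely on, as a self-contained sketch that repeats the three definitions above. First, the derivative satisfies $\\sigma'(h) = \\sigma(h)\\,(1-\\sigma(h))$, which a central finite difference confirms. Second, the cross-entropy `error_formula` is small when the predicted probability agrees with the label and large when it does not. The test points `h` and the step size `eps` are arbitrary choices."
},
{
"metadata": {},
"cell_type": "code",
"source": "import numpy as np\n\ndef sigmoid(x):\n    return 1 / (1 + np.exp(-x))\n\ndef sigmoid_prime(x):\n    return sigmoid(x) * (1 - sigmoid(x))\n\ndef error_formula(y, output):\n    return -y * np.log(output) - (1 - y) * np.log(1 - output)\n\nh = np.array([-2.0, 0.0, 1.5])\neps = 1e-6\n# Central finite-difference approximation of the derivative at h\nnumeric = (sigmoid(h + eps) - sigmoid(h - eps)) / (2 * eps)\nprint(np.max(np.abs(numeric - sigmoid_prime(h))))\n\n# Cross-entropy for label 1: confident and right vs. confident and wrong\nprint(error_formula(1, 0.9), error_formula(1, 0.1))",
"execution_count": null,
"outputs": []
},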
{
"metadata": {},
"cell_type": "markdown",
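"source": "## Backpropagate the error\nNow it's your turn to shine. Write the error term. Remember that this is given by the equation $$ (y-\\hat{y}) \\sigma'(h) $$ where $h = \\sum_i w_i x_i$ is the input to the output unit and $\\hat{y} = \\sigma(h)$ is its output. Because $\\sigma'(h) = \\sigma(h)\\,(1-\\sigma(h))$, the error term can be computed as $(y-\\hat{y})\\,\\hat{y}\\,(1-\\hat{y})$."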
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
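"source": "# Write the error term formula\ndef error_term_formula(x, y, output):\n    # The error term is (y - y_hat) * sigma'(h) with h = np.dot(x, weights).\n    # Since output = sigmoid(h), sigma'(h) = output * (1 - output); calling\n    # sigmoid_prime(x) would wrongly apply the derivative to the raw inputs.\n    return (y - output) * output * (1 - output)",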
"execution_count": 15,
"outputs": []
},
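{
"metadata": {},
"cell_type": "markdown",
"source": "To make the error term concrete, here is a hedged sketch of a single record's contribution to the weight update. The record `x_example` and label `y_example` copy row 0 of `processed_data` (GRE 0.475, GPA 0.9025, rank 3, not admitted), while the zero weights `w_example` are a hypothetical starting point. It computes $h = w \\cdot x$, the output $\\hat{y} = \\sigma(h)$, the error term $(y-\\hat{y})\\,\\hat{y}\\,(1-\\hat{y})$, and the step `error_term * x` that `train_nn` accumulates in `del_w`. With zero weights $h = 0$, so $\\hat{y} = 0.5$ and the error term is $-0.5 \\times 0.25 = -0.125$."
},
{
"metadata": {},
"cell_type": "code",
"source": "import numpy as np\n\ndef sigmoid(x):\n    return 1 / (1 + np.exp(-x))\n\n# Row 0 of processed_data: scaled gre, scaled gpa, then one-hot rank_1..rank_4\nx_example = np.array([0.475, 0.9025, 0.0, 0.0, 1.0, 0.0])\ny_example = 0                # this applicant was not admitted\nw_example = np.zeros(6)      # hypothetical starting weights\n\nh = np.dot(x_example, w_example)\noutput = sigmoid(h)\n# sigma'(h) written in terms of the output: output * (1 - output)\nerror_term = (y_example - output) * output * (1 - output)\nstep = error_term * x_example  # this record's contribution to del_w\nprint(output, error_term)\nprint(step)",
"execution_count": null,
"outputs": []
},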
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Neural Network hyperparameters\nepochs = 1000\nlearnrate = 0.5\n\n# Training function\ndef train_nn(features, targets, epochs, learnrate):\n    \n    # Use the same seed to make debugging easier\n    np.random.seed(42)\n\n    n_records, n_features = features.shape\n    last_loss = None\n\n    # Initialize weights\n    weights = np.random.normal(scale=1 / n_features**.5, size=n_features)\n\n    for e in range(epochs):\n        del_w = np.zeros(weights.shape)\n        for x, y in zip(features.values, targets):\n            # Loop through all records, x is the input, y is the target\n\n            # Activation of the output unit\n            # Notice we multiply the inputs and the weights here \n            # rather than storing h as a separate variable \n            output = sigmoid(np.dot(x, weights))\n\n            # The error: cross-entropy loss for this record (not used in the update)\n            error = error_formula(y, output)\n\n            # The error term\n            error_term = error_term_formula(x, y, output)\n\n            # The gradient descent step, the error times the gradient times the inputs\n            del_w += error_term * x\n\n        # Update the weights here. The learning rate times the \n        # change in weights, divided by the number of records to average\n        weights += learnrate * del_w / n_records\n\n        # Printing out the mean square error on the training set\n        if e % (epochs // 10) == 0:\n            out = sigmoid(np.dot(features, weights))\n            loss = np.mean((out - targets) ** 2)\n            print(\"Epoch:\", e)\n            if last_loss and last_loss < loss:\n                print(\"Train loss: \", loss, \" WARNING - Loss Increasing\")\n            else:\n                print(\"Train loss: \", loss)\n            last_loss = loss\n            print(\"=========\")\n    print(\"Finished training!\")\n    return weights\n    \nweights = train_nn(features, targets, epochs, learnrate)",
"execution_count": 16,
"outputs": [
{
"output_type": "stream",
"text": "Epoch: 0\nTrain loss: 0.2757128714751906\n=========\nEpoch: 100\nTrain loss: 0.20673003134146126\n=========\nEpoch: 200\nTrain loss: 0.2037436009279713\n=========\nEpoch: 300\nTrain loss: 0.20219327535892873\n=========\nEpoch: 400\nTrain loss: 0.20134601874349142\n=========\nEpoch: 500\nTrain loss: 0.2008415972190445\n=========\nEpoch: 600\nTrain loss: 0.20050851797287458\n=========\nEpoch: 700\nTrain loss: 0.20026413768351276\n=========\nEpoch: 800\nTrain loss: 0.2000676042868315\n=========\nEpoch: 900\nTrain loss: 0.19989814978961387\n=========\nFinished training!\n",
"name": "stdout"
}
]
},
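{
"metadata": {},
"cell_type": "markdown",
"source": "The loop over records in `train_nn` can also be written with matrix operations. The sketch below compares the two approaches for a single epoch on random stand-in data. The helpers `epoch_loop` and `epoch_vectorized` are hypothetical, the error term is taken as $(y-\\hat{y})\\,\\hat{y}\\,(1-\\hat{y})$, and vectorizing is offered as a common speed-up rather than as what this notebook does. Both helpers return the summed update `del_w`, so `weights += learnrate * del_w / n_records` could follow either one unchanged."
},
{
"metadata": {},
"cell_type": "code",
"source": "import numpy as np\n\ndef sigmoid(x):\n    return 1 / (1 + np.exp(-x))\n\ndef epoch_loop(X, y, weights):\n    # Per-record accumulation, as in train_nn\n    del_w = np.zeros(weights.shape)\n    for x_i, y_i in zip(X, y):\n        output = sigmoid(np.dot(x_i, weights))\n        del_w += (y_i - output) * output * (1 - output) * x_i\n    return del_w\n\ndef epoch_vectorized(X, y, weights):\n    # The same sum in one shot: X transposed times the vector of error terms\n    outputs = sigmoid(X @ weights)\n    error_terms = (y - outputs) * outputs * (1 - outputs)\n    return X.T @ error_terms\n\nrng = np.random.default_rng(0)\nX_demo = rng.random((50, 6))              # stand-in feature matrix\ny_demo = rng.integers(0, 2, size=50)      # stand-in 0/1 labels\nw_demo = rng.normal(scale=6 ** -0.5, size=6)\nprint(np.allclose(epoch_loop(X_demo, y_demo, w_demo), epoch_vectorized(X_demo, y_demo, w_demo)))",
"execution_count": null,
"outputs": []
},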
{
"metadata": {},
"cell_type": "markdown",
"source": "## Calculating the Accuracy on the Test Data"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Calculate accuracy on test data\ntest_out = sigmoid(np.dot(features_test, weights))\npredictions = test_out > 0.5\naccuracy = np.mean(predictions == targets_test)\nprint(\"Prediction accuracy: {:.3f}\".format(accuracy))",
"execution_count": 17,
"outputs": [
{
"output_type": "stream",
"text": "Prediction accuracy: 0.575\n",
"name": "stdout"
}
]
}
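,
{
"metadata": {},
"cell_type": "markdown",
"source": "An accuracy figure is easier to judge next to a trivial baseline, such as always predicting whichever class is more common in the labels. The sketch below is hypothetical: `accuracy_with_baseline` and the probabilities and labels are made up for illustration, and the 0.5 threshold mirrors the cell above. To apply it to this model, one could pass `test_out` and `targets_test`."
},
{
"metadata": {},
"cell_type": "code",
"source": "import numpy as np\n\ndef accuracy_with_baseline(probabilities, labels, threshold=0.5):\n    labels = np.asarray(labels)\n    predictions = (np.asarray(probabilities) > threshold).astype(int)\n    accuracy = np.mean(predictions == labels)\n    # Accuracy of always predicting the majority class of the labels\n    majority = int(np.mean(labels) >= 0.5)\n    baseline = np.mean(labels == majority)\n    return accuracy, baseline\n\n# Hypothetical probabilities and labels, for illustration only\nprobs = [0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.45, 0.8]\nlabels = [0, 1, 1, 0, 0, 0, 0, 1]\nacc, base = accuracy_with_baseline(probs, labels)\nprint('model accuracy: {:.3f}, majority baseline: {:.3f}'.format(acc, base))",
"execution_count": null,
"outputs": []
}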
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.6.8",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"gist": {
"id": "",
"data": {
"description": "intro-neural-networks/student-admissions/StudentAdmissions.ipynb",
"public": false
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}