@andychase
Last active April 8, 2016 15:21
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning Assignment 1\n",
"\n",
"Andy Chase \n",
"Brandon Edwards \n",
"Daniel Kirkpatrick \n",
"\n",
"April 9th 2015"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we import [pandas](http://pandas.pydata.org) and [numpy](https://docs.scipy.org/doc/). These are very popular numerical computation libraries for the Python programming language. Matplotlib is also used for a simple plot in question 5."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import matplotlib"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I converted the data to CSV. Here pandas reads the files and imports them as [DataFrames][1]. DataFrames were used because they carry column headers, which make the results easy to read.\n",
"\n",
"\n",
"[1]: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"d = pandas.read_csv('housing_train.csv')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"test_data = pandas.read_csv('housing_test.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The answers (the `MEDV` column) are \"popped\" off, that is, removed from the DataFrame and saved. Since each is a single column, pandas returns them as [Series][1].\n",
"\n",
"[1]: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"test_answers = test_data.pop('MEDV')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"y = d.pop('MEDV')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 1\n",
"\n",
"Next we need to add the \"dummy\" column with all ones. This is done in pandas like so:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"d[\"dummy\"] = 1"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"test_data[\"dummy\"] = 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas doesn't support inverting DataFrames (which are kind of like matrices). To accomplish this, we convert the DataFrame into a NumPy array and use the NumPy inversion function. I figured this out by knowing what I wanted to accomplish, googling how to perform the task, reading StackOverflow, etc.\n",
"\n",
"I made a quick function here to make it clear later what's happening."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def data_frame_invert(df):\n",
"    # df.values is the underlying NumPy array (as_matrix() was removed in pandas 1.0)\n",
"    return numpy.linalg.inv(df.values)"
]
},
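{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged aside: explicitly inverting a matrix works here, but a common alternative is to solve the linear system directly. The sketch below uses `numpy.linalg.solve` on a small made-up matrix and checks that it agrees with the explicit-inverse route. The helper name `solve_normal_equations` and the toy data are assumptions for illustration, not part of the assignment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy\n",
"\n",
"def solve_normal_equations(X, y):\n",
"    # Solve (X^T X) w = X^T y for w without forming the inverse explicitly.\n",
"    return numpy.linalg.solve(X.T.dot(X), X.T.dot(y))\n",
"\n",
"# Toy data (made up): y = 2*x + 1 exactly, with a column of ones for the intercept.\n",
"X_toy = numpy.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])\n",
"y_toy = numpy.array([1.0, 3.0, 5.0])\n",
"\n",
"w_solve = solve_normal_equations(X_toy, y_toy)\n",
"w_inverse = numpy.linalg.inv(X_toy.T.dot(X_toy)).dot(X_toy.T.dot(y_toy))\n",
"\n",
"# Both routes should agree up to floating-point error.\n",
"print(w_solve, numpy.allclose(w_solve, w_inverse))"
]
},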
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 2\n",
"\n",
"This performs the maths: $w = (X^T X)^{-1} X^T y$, where $X$ is the DataFrame `d` (including the dummy column) and $y$ is the `MEDV` column."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"output_w = data_frame_invert(d.T.dot(d)).dot(d.T.dot(y))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I convert the output (which ends up being a NumPy array) back into a pandas Series so that I can keep the column names."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"output_w = pandas.Series(output_w, index=d.columns)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"CRIM -0.101137\n",
"ZN 0.045894\n",
"INDUS -0.002730\n",
"CHAS 3.072013\n",
"NOX -17.225407\n",
"RM 3.711252\n",
"AGE 0.007159\n",
"DIS -1.599002\n",
"RAD 0.373623\n",
"TAX -0.015756\n",
"PTRATIO -1.024177\n",
"B 0.009693\n",
"LSTAT -0.585969\n",
"dummy 39.584321\n",
"dtype: float64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output_w"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I write a quick function to calculate the SSE. The code here is kind of opaque if you've never seen Python, so let's break it down line by line.\n",
"\n",
"    def get_sse(d, y, output_w):\n",
"\n",
"This first line just says \"here's a function with the variables `d`, `y`, and `output_w`\".\n",
"\n",
"        for answer, (_, testing_row) in zip(y, d.iterrows()):\n",
"\n",
"`d.iterrows()` takes the DataFrame passed in as `d` and goes through it one row at a time. If we iterated over the DataFrame directly instead of calling `iterrows`, pandas would go through the column labels, which isn't what we want.\n",
"\n",
"Each item is a pair of the form `(index, row)`. I don't need the index, so I used the variable `_` to indicate that I'm planning on ignoring it. `_` isn't special; it's a variable like `d` or `y`, but by convention you use `_` for something you are ignoring.\n",
"\n",
"`zip(y, d.iterrows())` pairs each row with the corresponding entry in the answers. Think of it like a zipper: the left side of the zipper is the answers and the right side is the data rows.\n",
"\n",
"            predicted_value = testing_row.dot(output_w)\n",
"            yield (answer - predicted_value)**2\n",
"\n",
"`x**2` in Python means $x^2$. The `yield` keyword means this function produces its results one at a time, one squared error per row (it's actually a generator, but imagine it as a list that's generated as it's used)."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def get_sse(d, y, output_w):\n",
"    for answer, (_, testing_row) in zip(y, d.iterrows()):\n",
"        predicted_value = testing_row.dot(output_w)\n",
"        yield (answer - predicted_value)**2"
]
},
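{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, the same quantity can be computed without a Python-level loop. The sketch below is a hedged alternative, not the method used in this notebook: `get_sse_vectorized` and the toy data are names made up for illustration. It reorders the columns of the DataFrame to match the index of the weight Series before multiplying, so the weights line up with the right columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas\n",
"\n",
"def get_sse_vectorized(data, answers, weights):\n",
"    # Align the columns with the weight index, predict every row at once,\n",
"    # then sum the squared residuals.\n",
"    predictions = data[weights.index].values.dot(weights.values)\n",
"    residuals = answers.values - predictions\n",
"    return float((residuals ** 2).sum())\n",
"\n",
"# Toy data (made up): predictions are 2*x + 1 = [1, 3, 5]; the last answer is off by 1.\n",
"toy_data = pandas.DataFrame({'x': [0.0, 1.0, 2.0], 'dummy': [1.0, 1.0, 1.0]})\n",
"toy_answers = pandas.Series([1.0, 3.0, 6.0])\n",
"toy_weights = pandas.Series([2.0, 1.0], index=['x', 'dummy'])\n",
"\n",
"print(get_sse_vectorized(toy_data, toy_answers, toy_weights))"
]
},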
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 3\n",
"\n",
"Next we sum up all the values in that generated list to get the sum of squared errors. The `sum` function is a built-in function in Python that can sum up a list or generator.\n",
"\n",
"We are using the `test_data` and `test_answers` to calculate the error here, not the training data."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1675.2309659483587"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(get_sse(test_data, test_answers, output_w))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 4\n",
"\n",
"Repeat the experiment, but pop off the dummy variable and don't use it this time."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"_ = d.pop(\"dummy\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"_ = test_data.pop(\"dummy\")"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"output_w = data_frame_invert(d.T.dot(d)).dot(d.T.dot(y))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"output_w = pandas.Series(output_w, index=d.columns)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1797.625624999007"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(get_sse(test_data, test_answers, output_w))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Question 5"
]
},
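{
"cell_type": "markdown",
"metadata": {},
"source": [
"Question 5 adds a multiple of the identity matrix to $X^T X$ before inverting. Reading this off the code below (so the notation is a reconstruction, not a quote from the assignment), the weights being computed are\n",
"\n",
"$$w(\\lambda) = (X^T X + \\lambda I)^{-1} X^T y$$\n",
"\n",
"which, assuming the standard ridge-regression setup, minimizes $\\sum_i (y_i - w^T x_i)^2 + \\lambda |w|^2$. In the code, $\\lambda$ is `modifer_integer` and $I$ is `identity`. Note that, as coded, the identity matrix covers the dummy column too, so the intercept weight is penalized along with the others."
]
},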
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"d[\"dummy\"] = 1"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"test_data[\"dummy\"] = 1"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"identity = numpy.identity(len(d.columns))"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_modifer_sse(modifer_integer):\n",
"    # Add modifer_integer * I to X^T X, solve for the weights, and score on the test data\n",
"    modifier = modifer_integer*identity\n",
"    output_w = data_frame_invert(d.T.dot(d) + modifier).dot(d.T.dot(y))\n",
"    output_w = pandas.Series(output_w, index=d.columns)\n",
"    return sum(get_sse(test_data, test_answers, output_w))"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1661.8917627327075"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_modifer_sse(.5)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"x_axis = numpy.arange(0.0, 2.0, 0.01)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"y_axis = [(i, get_modifer_sse(i)) for i in x_axis]"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.20999999999999999, 1649.5930012875315)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"min(y_axis, key=lambda _: _[1])"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEACAYAAAC6d6FnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XuYVNWV9/HvUiASxWjUCBGRoE0ElYiAEFRoTVR0EsU7\nqGiIowJGZ5KM0eRNBB0TERkjOkq8NYiXRlSCEgE1QgEjQquooKABRQUUBAKCXKTpXu8f+2DKtulL\ndVeduvw+z1MPXfucOrWqnmKvc/beZ29zd0REpPDsFncAIiISDyUAEZECpQQgIlKglABERAqUEoCI\nSIFSAhARKVA1JgAzKzGz1Wa2MKlsvJm9Hj2WmdnrUfnJZvaqmS2I/j0x6TVdzGyhmS0xs1Hp+zgi\nIlJXVtN9AGZ2AvA5MM7dj6pm+0hgg7vfbGZHA6vcfZWZHQE85+6to/3KgF+4e5mZTQHudPdp6fhA\nIiJSNzVeAbj7bGB9ddvMzIDzgdJo3zfcfVW0eRHQ3MyamlkroIW7l0XbxgF9GyN4ERFJXUP6AE4A\nVrv7e9VsOwd4zd3LgYOAFUnbVkZlIiISoyYNeG1/4LGqhVHzz3Dg5AYcW0RE0iylBGBmTYCzgGOq\nlLcGJgID3H1ZVLwSaJ20W+uorLrjamIiEZEUuLvV9zWpNgH9GFjs7h/vLDCzfYBngevc/eWkoD4B\nNppZ96jfYAAwaVcHdnc9GukxdOjQ2GPIl4e+S32fcTx27HASCeeqq5yWLZ3OnZ1bbnGWLv3qfqmq\n8QrAzEqB3sB+ZrYcuMHdxwAXEHX+JvkFcCgw1MyGRmUnu/taYAgwFmgOTHGNABIRqZY7zJ0Ljz0G\nTz4JBx4I558Ps2ZBUVHjvleNCcDd+++ifGA1ZTcDN+9i/9eArw0jFRGRYNEiePRRKC2Fb3wDLrwQ\nZs6E9u3T954N6QSWLFdcXBx3CHlD32Xj0vcZfPQRjB8fzvbXroX+/eGpp+Doo8Hq3aJffzXeCJZp\nZubZFI+ISGNbuzY07Tz2WDjrP+eccLZ/wgmwW4q9smaGp9AJrAQgIpJmW7fC00+HJp5Zs+C000Kl\nf+qpobmnoZQARESyiDvMmQMPPRTO+Lt2hQEDoG9faNGicd8r1QSgPgARkUb00Ucwblx47LYbXHop\nLFgArVvX/tpMUwIQEWmgzZth4kQYOxbeeCMM23z4YTj22Mx05qZKCUBEJAWVlTB7dmji+etfoWdP\nGDQIfvpT2GOPuKOrG/UBiIjUw6pV4Uz/gQdCRT9wIFx0EbRsGV9M6gMQEUmTigp44QW47z6YMSMM\n3Xz00exv4qmNEoCIyC6sWAElJfDgg3DAAXD55eHsf++9446scSgBiIgk2bEDnn0W7r8/DOPs1w8m\nTYLOneOOrPEpAYiIEIZv3ndfOONv2zac7T/+OOy5Z9yRpY8SgIgUrMpKePFFuPvuMKLnoovg+efh\nyCPjjiwzlABEpOBs2BDa8kePDiN5rroqdOrm89l+dZQARKRgvPlmONt/4okwH09JSRi/n8sjeRpC\nCUBE8tr27WGK5bvvhg8/hCuvhMWL4x23ny2UAEQkL61dGzp1774bDj8cfv3rcJduE9V6X9JXISJ5\n5e23YdSo0Mxz9tkwbRocpfUIq6UEICI5r7IyVPR33AELF8KQIfDuu/Cd78QdWXarcf0ZMysxs9Vm\ntjCpbLyZvR49lpnZ61H5fmY2w8w2mdldVY7TxcwWmtkSMxuVno8iIoVm8+YwkqdjR/h//w8uvhg+\n+AD+8AdV/nVR2wJkY4A+yQXu3s/dO7t7Z+Cp6AGwFfg98F/VHGc0cJm7FwFFZtanmn1EROpk9epQ\n4R9ySBi3f++9MH8+XHJJ46ywVShqTADuPhtYX902MzPgfKA02neLu78EfFFlv1ZAC3cvi4rGAX0b\nGLeIFKAlS8IonsMPh/XrYe7cMBVz796FO5Sz
IVJcghiAE4DV7v5elfKq8zkfBKxIer4yKhMRqZOy\nMjj33DBm/8ADQ/v+PffAYYfFHVlua0gncH/gscYKREQkmTtMnQojRsCyZWEY59ixsNdecUeWP1JK\nAGbWBDgLOKYOu68EklfDbB2VVWvYsGFf/l1cXExxcXEqIYpIjiovh/HjQ8W/227wm9+EJRabNo07\nsuyRSCRIJBINPk6tK4KZWVtgsrsflVTWB7jO3U+sZv+fAV3c/eqksnnANUAZ8Cxwp7tPq+a1WhFM\npEB98UU4wx8+PMzGef31cMopatuvi7SsCGZmpUBvYD8zWw7c4O5jgAuIOn+r7P8B0AJoZmZ9gZPd\n/R1gCDAWaA5Mqa7yF5HCtGVLWF5xxAjo1AkeeQSOOy7uqAqD1gQWkVhs2gR/+Qvcfjv06BGGdXbt\nGndUuUlrAotITtiwAe66KzxOOimM49dUDfFoyDBQEZE6W7cOfv/7MHTzvffCAizjx6vyj5MSgIik\n1YYNMHQofP/74Q7esrLQ2fv978cdmSgBiEhabNoEf/wjFBWF9XbLysJC6+3axR2Z7KQEICKNassW\nuO220NSzaBG89BKMGaOKPxupE1hEGsW2bWFStuHD4fjjYfp0OOKIuKOSmigBiEiDbN8ODz4YmnuO\nOSZM33D00XFHJXWhBCAiKamshAkTwvj9ww6DiRPh2GPjjkrqQwlAROrt73+H664Lc/Xcf38Yzy+5\nRwlAROps/vwwR8+yZfCnP4UpmjVXT+7SKCARqdX778OFF8K//RucdVYY3XPeear8c50SgIjs0qef\nwjXXQLdu0KFDWJFr8GBNzZwvlABE5Gu2bg1NPB07hrP8xYvDQutajCW/qA9ARL7kDo8/Htr5u3aF\nefPg0EPjjkrSRQlARIAwVcMvfxnO/h96KCy0LvlNTUAiBW7FChgwIHTu/vu/wyuvqPIvFEoAIgVq\n82YYNgx+8AM45BB4910YOBB23z3uyCRT1AQkUmAqK+HRR+F3vwtz9syfHxKAFB4lAJEC8tpr8Itf\nQEVF6Ozt2TPuiCRONTYBmVmJma02s4VJZePN7PXosczMXk/a9lszW2Jm75jZKUnlXcxsYbRtVHo+\niojsyrp1MGhQuJHriitg7lxV/lJ7H8AYoE9ygbv3c/fO7t4ZeCp6YGYdgQuAjtFr7jH78j7B0cBl\n7l4EFJnZV44pIulRURGmaO7YMdy8tXhxaOffTb1/Qi1NQO4+28zaVrctqtzPB06Mis4ESt29HPjA\nzJYC3c3sQ6CFu5dF+40D+gLTGh6+iOzKvHlw1VXQvHlYeP0HP4g7Isk2DTkPOAFY7e7vRc+/C6xI\n2r4COKia8pVRuYikwZo1cNllYVjnf/4nzJqlyl+q15BO4P7AY40VyE7Dhg378u/i4mKKi4sb+y1E\n8lJlJZSUhNE9F18M77wDe+8dd1SSDolEgkQi0eDjmLvXvENoAprs7kcllTUhnNUf4+4fR2XXA7j7\n8Oj5NGAo8CEww907ROX9gd7uPqia9/La4hGRr3v7bbjyStixI7T564y/sJgZ7l7vuVlTbQL6MbB4\nZ+UfeQboZ2bNzOx7QBFQ5u6rgI1m1j3qNxgATErxfUUkydat4Yy/uBguuigswK7KX+qqtmGgpcAc\noL2ZLTezgdGmC4DS5H3dfREwAVgETAWGJJ3ODwEeAJYAS91dHcAiDfTcc3DkkWGu/gULwjTNuotX\n6qPWJqBMUhOQSO1WrQqTts2bB3ffDaedFndEErdMNwGJSIa5w4MPQqdOYeqGt95S5S8No6kgRHLA\n+++HO3g3bAgLsnfqFHdEkg90BSCSxSoq4I474Nhj4ZRTwhQOqvylsegKQCRLLVoUbuhq2hTmzIH2\n7eOOSPKNrgBEskx5Odx8M/TqFRZqSSRU+Ut66ApAJIvMnw8//zm0ahX+btMm7ogkn+kKQCQLlJfD\njTdCnz5h
iOeUKar8Jf10BSASs0WL4JJLYP/9w1l/69ZxRySFQlcAIjGpqICRI0Nb/xVXwNSpqvwl\ns3QFIBKD996Dn/0MzKCsDNq1izsiKUS6AhDJIHcYPRq6d4ezzw4jfFT5S1x0BSCSIR9/HJZjXL8e\nZs+GDh3ijkgKna4ARDJg0iQ45hj44Q/DTV2q/CUb6ApAJI02b4Zf/QpeeAEmToSePeOOSORfdAUg\nkiavvRbO+rdtgzfeUOUv2UcJQKSRVVTArbeGqZpvugkeekhr80p2UhOQSCNavjzc1FVZCa++qrt5\nJbvpCkCkkUycCF27wqmnwvTpqvwl++kKQKSBtm2Da6+FZ5+FyZPD3P0iuaC2ReFLzGy1mS2sUn61\nmS02s7fM7NaorJmZjTGzBWb2hpn1Ttq/i5ktNLMlZjYqPR9FJPOWLAmdu6tWhXl8VPlLLqmtCWgM\n0Ce5wMxOBM4AOrn7kcDIaNPlQKW7dwJOBv4n6WWjgcvcvQgoMrOvHFMkFz32WKj8L78cJkyAffaJ\nOyKR+qmxCcjdZ5tZ2yrFg4Fb3L082mdNVN4BmLGzzMw2mFk3YAXQwt3Lov3GAX2BaY3yCUQybMsW\nuOaacDfvCy/A0UfHHZFIalLpBC4CepnZXDNLmFnXqPxN4Awz293Mvgd0AVoDBxGSwE4rozKRnPP2\n26GZZ9u2MMpHlb/kslQ6gZsA+7p7j+gMfwLQDighXAW8CnwIzAEqAK/PwYcNG/bl38XFxRQXF6cQ\nokjjGzs2dPaOGPGvmTxF4pBIJEgkEg0+jrnXXD9HTUCT3f2o6PlUYLi7z4yeLwW6u/u6Kq97CbgM\n+AyY7u4dovL+QG93H1TNe3lt8Yhk2rZt/2ryefJJOOKIuCMS+Sozw93rfUqSShPQJOCk6E3bA83c\nfZ2ZNTezPaPyk4Fyd3/H3T8BNppZdzMzYEB0DJGst2wZHHccbNgQ5u1X5S/5pLZhoKWEppz2Zrbc\nzAYSmnraRUNDS4FLot0PBF4zs0XAtYSKfqchwAPAEmCpu6sDWLLelCnQowcMGACPPw4tWsQdkUjj\nqrUJKJPUBCTZoKIiLNBeUgLjx8Pxx8cdkUjNUm0C0p3AIknWroWLLoLt28Mon5Yt445IJH00F5BI\npKwMunQJQztfeEGVv+Q/XQGIEJp7rr8e7r0Xzjor7mhEMkMJQApaeTn88pfhjH/WLDj88LgjEskc\nJQApWGvWwHnnwV57heafb30r7ohEMkt9AFKQ5s+Hbt3CCJ+nn1blL4VJVwBScEpLw529o0fDuefG\nHY1IfJQApGBUVMBvfxumc3jxRejUKe6IROKlBCAFYf166NcvJIFXXoH99os7IpH4qQ9A8t6SJWFK\nh44dYdo0Vf4iOykBSF6bMSN09P7Xf8Gf/wxNdM0r8iUlAMlbDzwQmn1KS8OyjSLyVTofkrxTUQG/\n+Q1Mnhzm8G/fPu6IRLKTEoDklU2b4MILYfNmmDsXvv3tuCMSyV5qApK88eGHYfGWVq3guedU+YvU\nRglA8sLcufDDH8LAgWFCt6ZN445IJPupCUhy3lNPwaBBMGYM/OQncUcjkjuUACSn3XEHjBwZmnyO\nOSbuaERyixKA5KSKCvjVr+Dvf4eXXoJDDok7IpHcU9ui8CVmtjpaAD65/GozW2xmb5nZrVHZHmZW\namYLzGyRmV2ftH8XM1toZkvMbFR6PooUii1bwiRuCxeq8hdpiNo6gccAfZILzOxE4Aygk7sfCYyM\nNvUDcPdOQBfgSjNrE20bDVzm7kVAkZl95ZgidfXpp3DSSWEO/2nTYJ994o5IJHfVmADcfTawvkrx\nYOAWdy+P9lkTlX8C7GlmuwN7AtuBjWbWCmjh7mXRfuOAvo0UvxSQf/wDevaEk0+GceOgWbO4IxLJ\nbakMAy0CepnZXDNLmFlXAHd/DthISAQfALe5+wbgIGBF0utXRmUidfbSS9
CrV5jO+b//G8zijkgk\n96XSCdwE2Nfde5hZN2AC0M7MLgaaA62AbwOzzezF+h582LBhX/5dXFxMcXFxCiFKPvnrX+GKK+CR\nR+DUU+OORiR+iUSCRCLR4OOYu9e8g1lbYLK7HxU9nwoMd/eZ0fOlQA/gJmCOuz8SlT8ITAX+D5jh\n7h2i8v5Ab3cfVM17eW3xSGG591648Ub42980zFNkV8wMd6/3dXEqTUCTgJOiN20PNHX3tcA7SeV7\nEpLCO+6+itAX0N3MDBgQHUNkl9zhpptgxAiYNUuVv0g61NgEZGalQG9gPzNbDtwAlAAl0dDQ7cCl\n0e73Ag9G5bsBJe7+VrRtCDCW0EQ0xd2nNfYHkfxRURHW7J0zJ7T9t2wZd0Qi+anWJqBMUhOQbNsG\nAwbAunUwaRLsvXfcEYlkv0w2AYmkxWefwWmnhb+nTlXlL5JuSgCSFVatguJiOOIIGD8evvGNuCMS\nyX9KABK7pUvDPP7nnAN33QW77x53RCKFQZPBSawWLoQ+fWDo0DDWX0QyRwlAYjNvHpxxBtx5J1xw\nQdzRiBQeJQCJxfTp0K8fjB0Lp58edzQihUl9AJJxzzwTKv8nnlDlLxInJQDJqEcfDW39U6ZA795x\nRyNS2NQEJBlzzz1wyy2h+adjx7ijERElAMmI4cPh/vth5kxo1y7uaEQElAAkzdzDHP6TJ8Ps2fDd\n78YdkYjspAQgaVNZCVddBa++Gs78998/7ohEJFnWdQKvr7oApeSkigoYOBAWLYIXX1TlL5KNsi4B\nLFgQdwTSUDt2wKWXwsqVmtRNJJtlXQJ48824I5CG2LEDLrkEPv00tPt/85txRyQiu5J1fQBKALlr\nxw64+GLYsAGefhqaN487IhGpiRKANIrycrjoIti0KSzksscecUckIrXJuhXBvvlN57PPoEnWpSbZ\nlfJy6N8ftm6Fp55S5S+SaXmzIlibNmHkiOSG7dvDTJ5ffAETJ6ryF8klNSYAMysxs9XRQu/J5Veb\n2WIze8vMhkdlF5nZ60mPCjPrFG3rYmYLzWyJmY2q6T27dg3jxiX7bd8O558fhnw++aRW8RLJNbVd\nAYwB+iQXmNmJwBlAJ3c/EvgfAHd/1N07u3tnYACwzN13DuocDVzm7kVAkZl95ZjJunWDV15J7cNI\n5nzxBZx7LpiFWT1V+YvknhoTgLvPBqremjUYuMXdy6N91lTz0guBUgAzawW0cPeyaNs4oO+u3lNX\nANlv27awfGPTpjBhAjRrFndEIpKKVPoAioBeZjbXzBJm1rWafc4nSgDAQcCKpG0ro7JqHX00vP12\nOMOU7LNtG5x9dhjiOX58SAIikptSGWvTBNjX3XuYWTdgAvDl/I5m1h3Y4u4pdeWOGDGMFi3g6qvh\nwguLKS4uTuUwkgZbt8JZZ8G3vgWPPKLKXyQuiUSCRCLR4OPUOgzUzNoCk939qOj5VGC4u8+Mni8F\nurv7uuj5n4HV7r6zc7gVMN3dO0TP+wO93X1QNe/l7s5ll4W+gEFf20PisnUrnHkm7LcfPPywhumK\nZJNMDgOdBJwUvWl7oFlS5b8bcB4wfufO7v4JsNHMupuZETqIJ9X0Bl27qiM4m2zZAj/9KRxwgCp/\nkXxS2zDQUmAO0N7MlpvZQKAEaBcNDS0FLkl6SS/gI3f/oMqhhgAPAEuApe4+rab37d4d5s6t1+eQ\nNNm8GX7yE2jVCsaNU+Uvkk+y7k5gd2fHDth3X/joo/CvxGNn5d+mDZSUwO67xx2RiFQnb+4EhnCW\neeyxugqI0+efw+mnQ9u2qvxF8lVWJgCAnj3hpZfijqIwbdoEp50GRUXw4IOq/EXyVVYngDlz4o6i\n8GzcGCr/Dh3gvvtgt6z9hYhIQ2VlHwCEpSHbtAn/quMxMzZuhD59oFMnuOceVf4iuSKv+gAgdP4e\ncoiWiMyUzz6DU08Nd2Kr8hcpDFn93/
y442DWrLijyH8bNsApp4T7L+6+W5W/SKHI6v/qJ54IM2bE\nHUV+W78eTj4ZevSAO+8Ms3uKSGHI2j4AgFWrQmfk2rUaiZIO//xnqPx79YLbb1flL5Kr8q4PAKBl\ny3AH6htvxB1J/lm3Dn78YyguVuUvUqiyOgGAmoHSYe1a+NGPQgIYOVKVv0ihyvoEcNJJMH163FHk\njzVrQuXfpw/ceqsqf5FCltV9ABDOVg89NPyr+ecb5tNPQ+V/xhlw882q/EXyRV72AQDsvz8cdhi8\n/HLckeS21avD1VTfvqr8RSTI+gQAYWqCqVPjjiJ3rVoV+lLOOQduukmVv4gESgB57pNPQuV/wQVw\n442q/EXkX7K+DwBgxw74znfgrbfgu9+NIbAc9fHHofK/+GL4wx/ijkZE0iVv+wAgTAZ38skwrcZ1\nxCTZypVhjP+ll6ryF5Hq5UQCgNAMNGVK3FHkhhUrQuX/85/D734XdzQikq1yogkIwvj1ww4LHZrN\nm2c4sByyfHlo9rnySrj22rijEZFMSEsTkJmVmNnqaAH45PKrzWyxmb1lZrcmlXcys5ej8gVm1iwq\n72JmC81siZmNqm+QAAccAMccA88/n8qrC8MHH4Qz/8GDVfmLSO1qawIaA/RJLjCzE4EzgE7ufiQw\nMipvAjwMXBGV9wZ2RC8bDVzm7kVAkZl95Zh1dfbZMHFiKq/Mf++9Fyr///gP+PWv445GRHJBjQnA\n3WcD66sUDwZucffyaJ81UfkpwAJ3XxiVr3f3SjNrBbRw97Jov3FA31SC7dsXJk+G8vJUXp2/3n03\nVP7XXw/XXBN3NCKSK1LpBC4CepnZXDNLmFnXpHI3s2lm9pqZ7WyEOAhYkfT6lVFZvR18cFioXJPD\n/cuiReEO3xtvhEGD4o5GRHJJKqvtNgH2dfceZtYNmAC0A5oCxwNdga3Ai2b2GvBZfQ4+bNiwL/8u\nLi6muLj4K9svuABKS8MKVoVuwYIwqduIEWGsv4gUhkQiQSKRaPBxah0FZGZtgcnuflT0fCow3N1n\nRs+XAj2AHwGnufvPovLfA9uAR4AZ7t4hKu8P9Hb3r52v1jQKaKdPPoGOHcM4929+s+4fNN/Mnw+n\nnw6jRoWkKCKFK5M3gk0CToretD3QzN3XAs8DR5lZ86hDuDfwtruvAjaaWXczM2BAdIyUtGoFxx4L\nzzyT6hFyX1lZuC/inntU+YtI6mobBloKzAHam9lyMxsIlADtoqGhpcAlEDp9gduBV4DXgdfcfecM\nPkOAB4AlwFJ3b9A9vRdfDI880pAj5K45c+AnP4EHHgijokREUpUzN4Il+/zz0CG8aFG4IigU06dD\nv34wblxo+xcRgTyfC6iqvfaC886DkpK4I8mcSZNC5f/EE6r8RaRx5OQVAIRO0LPOgvffh913T3Ng\nMRs7Fn77W/jb36BLl7ijEZFsU1BXABCmhWjZMv/XCbjjDhg6NNz7oMpfRBpTziYACDc+3X133FGk\nhzvccAOMHg2zZ8Phh8cdkYjkm5xtAgLYtg3atYPnnoOjjkpjYBlWWRmmdJgzJ6yB8J3vxB2RiGSz\ngmsCAthjj1BRjhgRdySN54svwjDXBQtCs48qfxFJl5y+AgDYsAEOPTR0Ch9ySJoCy5D160PH9re/\nDY8+qnUPRKRuCvIKAGCffeDyy+GPf4w7kob54AM47jjo3DkM9VTlLyLplvNXAAD//Ce0bx/azNu3\nT0Ngafbqq3DmmXDddZrOWUTqL9UrgLxIAAB/+hO8+SY8/ngjB5VmkyeHtXvvuy80/4iI1FfBJ4DN\nm8PZ/xNPQM+ejRxYGrjDLbeEYaxPPQU9esQdkYjkqlQTQCrrAWSlPfeEkSPDerivvQZNsviTff45\n/OxnsGJFmNnzoJSWxxERaZic7wRO1q9fWDz+rrvijmTX3nsPfvhD+Na3YOZMVf4iEp+8aQLa6R//\nCE
1A//d/2Xf37PPPw4ABYWqHwYPB6n3BJiLydQXfB5DsL3+B+++Hl1+GZs0aIbAGKi8P0zo8/DA8\n9hj06hV3RCKST5QAkrhD377Qpk38zUHvvw/9+8P++4dZPQ84IN54RCT/FOyNYNUxg4cegr//He69\nN54Y3MOqXT16hATwt7+p8heR7JLFY2UaZp99whj7448P00afeWbm3nvZsnB38mefwYsv5tdEdSKS\nP/LyCmCnww6DZ5+FK67IzCLyX3wBt90G3brBqaeGPghV/iKSrWpbFL7EzFZHC8Anl19tZovN7C0z\nuzUqa2tmW83s9ehxT9L+XcxsoZktMbNR6fko1evSJTS/XHkl/O//puc93OHJJ6FDhzB3/8svw7XX\nZve9CCIiNXYCm9kJwOfAOHc/Kio7EfgdcLq7l5vZAe6+xszaApN37lflOGXAL9y9zMymAHe6+7Rq\n9muUTuDqLFsGP/1pWEls1CjYd9+GH7OyMiSX4cPDnci33w4/+lHDjysiUh9p6QR299nA+irFg4Fb\n3L082mdNLYG1Alq4e1lUNA7oW99AG+p734N582DvvUOzzNixUFGR2rE2bQodvEccATfeGCZwmz9f\nlb+I5JZU+gCKgF5mNtfMEmbWNWnb96Lmn4SZHR+VHQSsSNpnZVSWcXvuGZqBJkyAkpJwo9htt8FH\nH9X+2vXrwxj+s86C1q3h6afDPD6vvhruQM73helFJP+k0krdBNjX3XuYWTdgAtAO+Bg42N3Xm9kx\nwCQzO6K+Bx82bNiXfxcXF1NcXJxCiDXr2TNMw/Dyy+FMvmvXMDXDD34QrhT23js072zcGObpX7Qo\nzNvTqxecd15IHo3RhCQikopEIkEikWjwcWq9Eaxq276ZTQWGu/vM6PlSoLu7r6vyuhnAr4FPgOnu\n3iEq7w/0dvdB1bxX2voAalJZCW+/HR4ffhiaeMxCIjjkkHCl0LGjOnVFJDtlcjbQScBJwEwzaw80\nc/d1ZrY/sN7dK8ysHaGp6H1332BmG82sO1AGDADuTOF902a33UK/gIZsikghqTEBmFkp0BvYz8yW\nAzcAJUBJNDR0O3BJtHsv4CYzKwcqgSvdfUO0bQgwFmgOTKluBJCIiGRWXs4FJCJSSDQXkIiI1IsS\ngIhIgVICEBEpUEoAIiIFSglARKRAKQGIiBQoJQARkQKlBCAiUqCUAERECpQSgIhIgVICEBEpUEoA\nIiIFSglARKRAKQGIiBQoJQARkQKlBCAiUqCUAERECpQSgIhIgaoxAZhZiZmtjtb/TS6/2swWm9lb\nZnZrlW1sYFxFAAAD4UlEQVRtzOxzM/t1UlkXM1toZkvMbFTjfgQREUlFbVcAY4A+yQVmdiJwBtDJ\n3Y8ERlZ5ze3As1XKRgOXuXsRUGRmfZC0SyQScYeQN/RdNi59n9mhxgTg7rOB9VWKBwO3uHt5tM+a\nnRvMrC/wPrAoqawV0MLdy6KicUDfhocutdF/ssaj77Jx6fvMDqn0ARQBvcxsrpklzKwrgJntBfwG\nGFZl/4OAFUnPV0ZlIiISoyYpvmZfd+9hZt2ACUA7QsX/Z3ffYmbWiDGKiEg6uHuND6AtsDDp+VSg\nd9LzpcD+wCxgWfRYD6wDhgAtgcVJ+/cH/rKL93I99NBDDz3q/6itLq/ukcoVwCTgJGCmmbUHmrn7\nWqDXzh3MbCiwyd3viZ5vNLPuQBkwALizugO7u64cREQypMYEYGalQG9gPzNbDtwAlAAl0dDQ7cAl\ndXifIcBYoDkwxd2nNSRoERFpOIuaXkREpMDEciewmfUxs3eiG8Ou28U+d0bb3zSzzpmOMVfU9l2a\nWbGZfWZmr0eP38cRZy7Y1Y2PVfbR77KOavs+9dusOzM72MxmmNnb0Q241+xiv/r9PlPpOGjIA9id\n0HHcFmgKvAF0qLLP6YSmIoDuwNxMx5kLjzp+l8XAM3HHmgsP4ASg
M0mDHqps1++ycb9P/Tbr/l22\nBI6O/t4LeLcx6s04rgCOBZa6+wcebiYbD5xZZZ8zgIcA3H0esI+ZHZjZMHNCXb5LAHWu14FXf+Nj\nMv0u66EO3yfot1kn7r7K3d+I/v4cWAx8t8pu9f59xpEADgKWJz1fwddvDKtun9ZpjisX1eW7dKBn\ndEk4xcw6Ziy6/KPfZePSbzMFZtaWcGU1r8qmev8+UxkG2lB17XWuemag3uqvq8t3Mh842MMNeqcR\nhvG2T29YeU2/y8aj32Y9RTMuPAn8R3Ql8LVdqjyv8fcZxxXASuDgpOcH89WpIqrbp3VUJl9V63fp\n7pvcfUv091SgqZl9O3Mh5hX9LhuRfpv1Y2ZNgaeAR9x9UjW71Pv3GUcCeJUwI2hbM2sGXAA8U2Wf\nZ4juLzCzHsAGd1+d2TBzQq3fpZkduHNqDjM7ljD095+ZDzUv6HfZiPTbrLvoe3oQWOTud+xit3r/\nPjPeBOTuO8zsF8BzhFEsD7r7YjO7Mtp+r7tPMbPTzWwpsBkYmOk4c0FdvkvgXGCwme0AtgD9Ygs4\nyyXd+Lh/dOPjUMLoKv0uU1Db94l+m/VxHHAxsMDMXo/Kfge0gdR/n7oRTESkQGlJSBGRAqUEICJS\noJQAREQKlBKAiEiBUgIQESlQSgAiIgVKCUBEpEApAYiIFKj/D3SXgS8jUbMGAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10ab0ff28>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot = plt.plot(\n",
" x_axis, \n",
" [i[1] for i in y_axis],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"X-axis: the identity matrix coefficient $\\lambda$  \n",
"Y-axis: SSE on the test data for the corresponding weight vector\n",
"\n",
"As $\\lambda$ approaches 1, the error goes up, possibly indicating that the model is too generalized (underfitting); too close to 0, the model appears to be too complex (overfitting). The best $\\lambda$ on this grid is approximately 0.21, with an SSE of about 1649.59."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 6"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"modifier = 10*identity\n",
"output_w = data_frame_invert(d.T.dot(d) + modifier).dot(d.T.dot(y))\n",
"output_w = pandas.Series(output_w, index=d.columns)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"CRIM -0.098055\n",
"ZN 0.051645\n",
"INDUS -0.021914\n",
"CHAS 2.497114\n",
"NOX 0.009656\n",
"RM 5.449644\n",
"AGE 0.000539\n",
"DIS -1.018682\n",
"RAD 0.238522\n",
"TAX -0.013011\n",
"PTRATIO -0.403960\n",
"B 0.016906\n",
"LSTAT -0.514954\n",
"dummy 2.732832\n",
"dtype: float64"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output_w"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
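"In general the weight values get weaker (closer to 0). The largest weights shrink the most (`dummy` drops from about 39.58 to 2.73 and `NOX` from about -17.23 to 0.01), although a few, such as `RM`, actually grow in magnitude."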
]
},
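{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to check this shrinkage numerically is to track the Euclidean length of $w$ as $\\lambda$ grows. The sketch below does that on a tiny made-up data set (not the housing data). `ridge_weights`, `X_small`, and `y_small` are hypothetical names, and it uses `numpy.linalg.solve` in place of the explicit inverse used above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy\n",
"\n",
"def ridge_weights(X, y, lam):\n",
"    # Closed-form weights for the modified objective: (X^T X + lam*I)^{-1} X^T y.\n",
"    n_features = X.shape[1]\n",
"    return numpy.linalg.solve(X.T.dot(X) + lam * numpy.identity(n_features), X.T.dot(y))\n",
"\n",
"# Toy data (made up): y = 2*x + 1, with a column of ones for the intercept.\n",
"X_small = numpy.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])\n",
"y_small = numpy.array([1.0, 3.0, 5.0])\n",
"\n",
"# Print the length of the weight vector for increasing lambda values.\n",
"for lam in [0.0, 1.0, 10.0, 100.0]:\n",
"    weights = ridge_weights(X_small, y_small, lam)\n",
"    print(lam, numpy.linalg.norm(weights))"
]
},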
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In problem 6 we observed that the optimal $w$ (for the modified objective) decreases in length as $\\lambda$ increases. In fact, one can show using matrix norms that the expression for this optimal $w$ (given in problem 5) has a length that goes as (is asymptotic to) $\\lambda^{-1}$. Therefore this optimal $w$ goes to zero as $\\lambda$ goes to infinity.\n",
"\n",
"We can also see suggestions of this from the modified objective function itself, without explicitly solving for the optimal $w$ as a function of $\\lambda$. Generally speaking, this modified objective function penalizes large $w$ lengths, with the penalty increasing in severity as $\\lambda$ increases (holding $w$, $x$, and $y$ constant). More precisely, the partial derivative of the objective with respect to $\\lambda$ is $|w|^2$. This implies that a $w$ other than the current optimum has no chance of becoming the minimizer at a larger $\\lambda$ unless its length does not exceed that of the current optimal $w$. Indeed, $w$ must get shorter, as we see from the explicit form in problem 5."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}