// This may or may not help you understand what's going
// on in week one of the Stanford Machine Learning course,
// particularly if you're coming from JavaScript programming.
// The lecture deals with linear regression. You may remember
// a concept called "line of best fit" from middle school.
// A refresher: you have a set of points with x and y co-ordinates
// and you have to draw a straight line through them. It doesn't
// have to intersect all (or any) of the points, but it does need
// to represent the general layout of your dataset. The idea is
// that this will let you make guesses for the y position of a
// new point when you only know the x, or vice versa. It often
// won't be 100% accurate, but it'll give you a good idea, and a
// good sense of how large the error is likely to be.
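// For example, if your points all sit near the line y = 1 + 2x,
// then for a new point where you only know that x = 10, the line
// lets you guess y ≈ 1 + 2 * 10 = 21.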
// A linear regression algorithm is just a way to make a computer
// figure out what the line of best fit for a given dataset is.
// The linear regression algorithm I describe below comes from
// the Stanford Machine Learning course on Coursera. I found it
// to be a bit too mathsy, so I'm writing it out in code, which
// may help some people understand.
// Linear regression algorithms return a function that takes an
// x and returns a guess for the corresponding y value, based on
// the *specific* dataset you trained against. This function is
// called a "model". There are many other kinds of models for
// various other kinds of machine learning algorithms, but because
// we're focusing on linear regression, this model will be the
// function of a line. You may remember this as `y = mx + c`, where
// `m` is the slope of the line and `c` is the vertical offset of
// the line (the y-intercept), which you can think about as 'the
// value of the equation when x is 0'. In the course, this is
// written as hθ(x) = θ1 + θ2 * x.
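// To make that concrete: the line y = 1 + 2x has slope m = 2 and
// vertical offset c = 1, so plugging in x = 3 gives
// y = 1 + 2 * 3 = 7.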
(() => {
  // You'll need to make lots of linear functions, so
  // let's make that easy.
  function prepareLinearFunction (theta1, theta2) {
    return x => theta1 + theta2 * x;
  }
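  // As a quick sanity check (this constant is just for
  // illustration, it isn't used below): the y = 1 + 2x line from
  // the example above would be built and used like this.
  const exampleLine = prepareLinearFunction(1, 2);
  // exampleLine(0) returns 1, and exampleLine(3) returns 7.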
  // We're assuming in this example that your dataset contains
  // two columns, and therefore we have two thetas. You could
  // generalise the above function to take more columns into
  // account, but for now, let's focus on just two columns.
  // Let's define a dataset to learn from. It is possible to
  // model this dataset exactly with a line, so we should end
  // up with essentially no error.
  const dataset = [
    [0, 1],
    [1, 3],
    [2, 5],
    [3, 7]
  ];
  // Each row is an [x, y] pair. You'll notice that the 0th element
  // of each row happens to equal that row's index - I could have
  // just used the index, but I wanted to keep the x values explicit.
  // Next you'll need a "cost function", which is just a
  // fancy name for a function that measures how well a
  // model is performing at its task. To calculate this,
  // the function needs both of the current thetas, and
  // the dataset it's training against.
  function costFunction(theta1, theta2, dataset) {
    // Here we generate our model using the function we
    // wrote earlier, giving it our current theta values.
    const model = prepareLinearFunction(theta1, theta2);
    // We apply a procedure to our model for each row
    // in the dataset.
    return dataset.map((row) => {
      // First we get the difference between what the
      // data row says about the mapping and what our
      // model says.
      const difference = model(row[0]) - row[1];
      // We then return the square of that difference.
      return Math.pow(difference, 2);
    }) // Now we have an array of the squared differences between
      // each example's y value and what we predicted for its x
      // value. The following line sums that array, divides the sum
      // by twice the length of the dataset (the 1/(2m) scaling the
      // course uses in its cost function), and returns the result.
      .reduce((x, y) => x + y, 0) / (dataset.length * 2);
  }
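  // To make that concrete (using the dataset above): with both
  // thetas at zero the model predicts 0 everywhere, so the squared
  // differences are 1, 9, 25 and 49. Those sum to 84, and
  // 84 / (4 * 2) = 10.5, which is the first error in the example
  // output at the bottom of this file. In other words,
  // costFunction(0, 0, dataset) returns 10.5.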
  // Ok, now we have a cost function. Next up, we implement
  // "gradient descent", which is just a procedure for modifying
  // our theta values in a way that will decrease the overall "cost"
  // (otherwise known as "error") of our model.
  function gradientDescent(learningRate, thetaOne, thetaTwo, dataset) {
    // First, we calculate the error of the model that our current
    // values of theta create.
    const error = costFunction(thetaOne, thetaTwo, dataset);
    // Then we use the learning rate and the gradient of the cost
    // function to update each theta. The gradient descent update
    // works roughly as follows, where dJ/dθj is the partial
    // derivative of the cost function J with respect to θj:
    // θj = θj - learningRate * dJ/dθj
    // For our two thetas, the worked-out derivatives are:
    // dJ/dθ1 = (1/m) * Σ (hθ(xi) - yi)
    // dJ/dθ2 = (1/m) * Σ (hθ(xi) - yi) * xi
    // where m is the number of rows in the dataset.
    // The thetas must be updated simultaneously - don't get into a
    // situation where you update one theta and then use that new
    // value while calculating the update for the next one. It would
    // probably still work, but it's not proper gradient descent.
    // In the lecture, we aren't taught how to differentiate the cost
    // function ourselves; we're given the derivatives already worked
    // out, as above. That's why the two calculations below look a
    // little bit like the cost function with some subtle differences
    // - they're what you get when you differentiate it.
    const model = prepareLinearFunction(thetaOne, thetaTwo);
    // The partial derivative with respect to theta1
    const a = dataset.map((row) => {
      return model(row[0]) - row[1];
    }).reduce((x, y) => x + y, 0) / dataset.length;
    // The partial derivative with respect to theta2
    const b = dataset.map((row) => {
      return (model(row[0]) - row[1]) * row[0];
    }).reduce((x, y) => x + y, 0) / dataset.length;
    const newThetas = [
      thetaOne - learningRate * a,
      thetaTwo - learningRate * b
    ];
    // Return the error as well as the new thetas so that we can
    // keep all our logging in the same main function.
    return { newThetas, error };
  }
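  // As a one-step illustration (again using the dataset above):
  // starting from thetas of (0, 0), the derivatives work out to
  // a = -4 and b = -8.5, so with a learning rate of 0.2,
  // gradientDescent(0.2, 0, 0, dataset) returns
  // { newThetas: [0.8, 1.7000000000000002], error: 10.5 } - the
  // first step you'll see in the example output below.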
  // Let's run through the whole training process.
  (function linearRegress(dataset) {
    // Initialize thetas to zero. It's not important that
    // they're zero - they could start out as any number -
    // but zero works just fine.
    const initialThetaOne = 0;
    const initialThetaTwo = 0;
    const learningRate = 0.2;
    // Create a named function so that we can recurse
    // through the theta values that gradient descent gives
    // us until they converge.
    (function reg(learningRate, thetaOne, thetaTwo, dataset) {
      console.log(`Current thetas: ${thetaOne}, ${thetaTwo}`);
      // Run gradient descent
      const { error, newThetas } = gradientDescent(learningRate, thetaOne, thetaTwo, dataset);
      console.log(`Those thetas gave us an error of ${error}.`);
      if (error < 0.01) {
        // Our error is acceptably small! That means our thetas have
        // "converged", or "settled", on values that make sense
        // for the dataset. Because our dataset fits perfectly on a
        // line, the true thetas are exact whole numbers, but gradient
        // descent only approaches them gradually, and floating point
        // rounding means it will never land on them exactly - so we
        // accept a little bit of error instead.
        console.log(`Convergence! Line of best fit is: y = ${thetaOne} + ${thetaTwo}x.`);
        return;
      }
      // Recursively call this function with updated thetas until
      // convergence happens and the condition above is met.
      console.log("Running gradient descent again to reduce the error.");
      reg(learningRate, newThetas[0], newThetas[1], dataset);
    })(learningRate, initialThetaOne, initialThetaTwo, dataset);
  })(dataset);
})();
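// To try this yourself, save the file and run it with Node
// (for example `node linear-regression.js` - any filename works).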
// Example output:
// Current thetas: 0, 0
// Those thetas gave us an error of 10.5.
// Running gradient descent again to reduce the error.
// Current thetas: 0.8, 1.7000000000000002
// Those thetas gave us an error of 0.26749999999999985.
// Running gradient descent again to reduce the error.
// Current thetas: 0.93, 1.9700000000000002
// Those thetas gave us an error of 0.007174999999999992.
// Convergence! Line of best fit is: y = 0.93 + 1.9700000000000002x.
// You can pretty clearly see how close that line is to y = 1 + 2x, which
// is the equation I used to generate the dataset.
// Hopefully you now understand linear regression a little better than
// you did previously. If you enjoyed this post, please let me know, and
// I'll consider continuing to write up more for the rest of the course.
// The course includes things like multivariate linear and non-linear
// regression, including polynomial regression and neural networks, so
// there should be fun stuff for all. I'm @hughrawlinson on Twitter -
// let me know if you'd like me to write those posts.