// This may or may not help you understand what's going
// on in week one of the Stanford Machine Learning course,
// particularly if you're coming from JavaScript programming.
// The lecture deals with linear regression. You may remember
// a concept called "line of best fit" from middle school.
// A refresher: you have a set of points with x and y co-ordinates
// and you have to draw a straight line through them. It doesn't
// have to intersect all (or any) of the points, but it does need
// to represent the general layout of your dataset. The idea is
// that this will let you make guesses for the y position of a
// new point when you only know the x, or vice versa. It often
// won't be 100% accurate, but it'll give you a good idea, and a
// good sense of how large the error is likely to be.
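// For example, if your points all sit near the line y = 1 + 2x,
// then for a new point where you only know that x = 10, the line
// lets you guess y ≈ 1 + 2 * 10 = 21.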
// A linear regression algorithm is just a way to make a computer
// figure out what the line of best fit for a given dataset is.
// The linear regression algorithm I describe below comes from
// the Stanford Machine Learning course on Coursera. I found it
// to be a bit too mathsy, so I'm writing it out in code, which
// may help some people understand.
// Linear regression algorithms return a function that takes an
// x and returns a guess for the corresponding y value, based on
// the *specific* dataset you trained against. This function is
// called a "model". There are many other kinds of models for
// various other kinds of machine learning algorithms, but because
// we're focusing on linear regression, this model will be the
// function of a line. You may remember this as `y = mx + c`, where
// `m` is the slope of the line and `c` is the vertical offset of
// the line (the y-intercept), which you can think about as 'the
// value of the equation when x is 0'. In the course, this is
// written as hθ(x) = θ1 + θ2 * x.
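// To make that concrete: the line y = 1 + 2x has slope m = 2 and
// vertical offset c = 1, so plugging in x = 3 gives
// y = 1 + 2 * 3 = 7.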
(() => {
  // You'll need to make lots of linear functions, so
  // let's make that easy.
  function prepareLinearFunction (theta1, theta2) {
    return x => theta1 + theta2 * x;
  }
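  // As a quick sanity check (this constant is just for
  // illustration, it isn't used below): the y = 1 + 2x line from
  // the example above would be built and used like this.
  const exampleLine = prepareLinearFunction(1, 2);
  // exampleLine(0) returns 1, and exampleLine(3) returns 7.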
  // We're assuming in this example that your dataset contains
  // two columns, and therefore we have two thetas. You could
  // generalise the above function to take more columns into
  // account, but for now, let's focus on just two columns.
  // Let's define a dataset to learn from. It is possible to
  // model this dataset exactly with a line, so we should end
  // up with essentially no error.
  const dataset = [
    [0, 1],
    [1, 3],
    [2, 5],
    [3, 7]
  ];
  // Each row is an [x, y] pair. You'll notice that the 0th element
  // of each row happens to equal that row's index - I could have
  // just used the index, but I wanted to keep the x values explicit.
  // Next you'll need a "cost function", which is just a
  // fancy name for a function that measures how well a
  // model is performing at its task. To calculate this,
  // the function needs both of the current thetas, and
  // the dataset it's training against.
  function costFunction(theta1, theta2, dataset) {
    // Here we generate our model using the function we
    // wrote earlier, giving it our current theta values.
    const model = prepareLinearFunction(theta1, theta2);
    // We apply a procedure to our model for each row
    // in the dataset.
    return dataset.map((row) => {
      // First we get the difference between what the
      // data row says about the mapping and what our
      // model says.
      const difference = model(row[0]) - row[1];
      // We then return the square of that difference.
      return Math.pow(difference, 2);
    }) // Now we have an array of the squared differences between
      // each example's y value and what we predicted for its x
      // value. The following line sums that array, divides the sum
      // by twice the length of the dataset (the 1/(2m) scaling the
      // course uses in its cost function), and returns the result.
      .reduce((x, y) => x + y, 0) / (dataset.length * 2);
  }
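  // To make that concrete (using the dataset above): with both
  // thetas at zero the model predicts 0 everywhere, so the squared
  // differences are 1, 9, 25 and 49. Those sum to 84, and
  // 84 / (4 * 2) = 10.5, which is the first error in the example
  // output at the bottom of this file. In other words,
  // costFunction(0, 0, dataset) returns 10.5.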
  // Ok, now we have a cost function. Next up, we implement
  // "gradient descent", which is just a procedure for modifying
  // our theta values in a way that will decrease the overall "cost"
  // (otherwise known as "error") of our model.
  function gradientDescent(learningRate, thetaOne, thetaTwo, dataset) {
    // First, we calculate the error of the model that our current
    // values of theta create.
    const error = costFunction(thetaOne, thetaTwo, dataset);
    // Then we use the learning rate and the gradient of the cost
    // function to update each theta. The gradient descent update
    // works roughly as follows, where dJ/dθj is the partial
    // derivative of the cost function J with respect to θj:
    // θj = θj - learningRate * dJ/dθj
    // For our two thetas, the worked-out derivatives are:
    // dJ/dθ1 = (1/m) * Σ (hθ(xi) - yi)
    // dJ/dθ2 = (1/m) * Σ (hθ(xi) - yi) * xi
    // where m is the number of rows in the dataset.
    // The thetas must be updated simultaneously - don't get into a
    // situation where you update one theta and then use that new
    // value while calculating the update for the next one. It would
    // probably still work, but it's not proper gradient descent.
    // In the lecture, we aren't taught how to differentiate the cost
    // function ourselves; we're given the derivatives already worked
    // out, as above. That's why the two calculations below look a
    // little bit like the cost function with some subtle differences
    // - they're what you get when you differentiate it.
    const model = prepareLinearFunction(thetaOne, thetaTwo);
    // The partial derivative with respect to theta1
    const a = dataset.map((row) => {
      return model(row[0]) - row[1];
    }).reduce((x, y) => x + y, 0) / dataset.length;
    // The partial derivative with respect to theta2
    const b = dataset.map((row) => {
      return (model(row[0]) - row[1]) * row[0];
    }).reduce((x, y) => x + y, 0) / dataset.length;
    const newThetas = [
      thetaOne - learningRate * a,
      thetaTwo - learningRate * b
    ];
    // Return the error as well as the new thetas so that we can
    // keep all our logging in the same main function.
    return { newThetas, error };
  }
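  // As a one-step illustration (again using the dataset above):
  // starting from thetas of (0, 0), the derivatives work out to
  // a = -4 and b = -8.5, so with a learning rate of 0.2,
  // gradientDescent(0.2, 0, 0, dataset) returns
  // { newThetas: [0.8, 1.7000000000000002], error: 10.5 } - the
  // first step you'll see in the example output below.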
  // Let's run through the whole training process.
  (function linearRegress(dataset) {
    // Initialize thetas to zero. It's not important that
    // they're zero - they could start out as any number -
    // but zero works just fine.
    const initialThetaOne = 0;
    const initialThetaTwo = 0;
    const learningRate = 0.2;
    // Create a named function so that we can recurse
    // through the theta values that gradient descent gives
    // us until they converge.
    (function reg(learningRate, thetaOne, thetaTwo, dataset) {
      console.log(`Current thetas: ${thetaOne}, ${thetaTwo}`);
      // Run gradient descent
      const { error, newThetas } = gradientDescent(learningRate, thetaOne, thetaTwo, dataset);
      console.log(`Those thetas gave us an error of ${error}.`);
      if (error < 0.01) {
        // Our error is acceptably small! That means our thetas have
        // "converged", or "settled", on values that make sense
        // for the dataset. Because our dataset fits perfectly on a
        // line, the true thetas are exact whole numbers, but gradient
        // descent only approaches them gradually, and floating point
        // rounding means it will never land on them exactly - so we
        // accept a little bit of error instead.
        console.log(`Convergence! Line of best fit is: y = ${thetaOne} + ${thetaTwo}x.`);
        return;
      }
      // Recursively call this function with updated thetas until
      // convergence happens and the condition above is met.
      console.log("Running gradient descent again to reduce the error.");
      reg(learningRate, newThetas[0], newThetas[1], dataset);
    })(learningRate, initialThetaOne, initialThetaTwo, dataset);
  })(dataset);
})();
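// To try this yourself, save the file and run it with Node
// (for example `node linear-regression.js` - any filename works).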
// Example output:
// Current thetas: 0, 0
// Those thetas gave us an error of 10.5.
// Running gradient descent again to reduce the error.
// Current thetas: 0.8, 1.7000000000000002
// Those thetas gave us an error of 0.26749999999999985.
// Running gradient descent again to reduce the error.
// Current thetas: 0.93, 1.9700000000000002
// Those thetas gave us an error of 0.007174999999999992.
// Convergence! Line of best fit is: y = 0.93 + 1.9700000000000002x.
// You can pretty clearly see how close that line is to y = 1 + 2x, which
// is the equation I used to generate the dataset.
// Hopefully you now understand linear regression a little better than
// you did previously. If you enjoyed this post, please let me know, and
// I'll consider continuing to write up more for the rest of the course.
// The course includes things like multivariate linear and non-linear
// regression, including polynomial regression and neural networks, so
// there should be fun stuff for all. I'm @hughrawlinson on Twitter -
// let me know if you'd like me to write those posts.