@oskarth, August 25, 2012
Intro to gradient descent, and why feature scaling leads to better convergence

Machine learning gives computers the ability to learn without being explicitly programmed.

Linear Regression

Linear regression is a simple model for finding the best-fitting line through some data.

Example: you have a data set with commute time, sleep time, salary and happiness (on a scale of 1-10). From the three features - commute time, sleep time and salary - you want to predict the happiness level.
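As a minimal sketch of what such a model looks like (the numbers and the NumPy setup are my own assumptions, not part of the original example), happiness is predicted as a weighted sum of the features plus an intercept:

```python
import numpy as np

# Hypothetical data: each row is [commute time (minutes), sleep time (hours), salary]
X = np.array([
    [15.0, 8.0, 45_000.0],
    [90.0, 6.5, 120_000.0],
    [30.0, 7.0, 60_000.0],
])
y = np.array([8.0, 5.0, 7.0])  # happiness on a 1-10 scale

def predict(X, theta, bias):
    """Linear regression hypothesis: happiness is approximated by X @ theta + bias."""
    return X @ theta + bias
```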

Cost function and finding a good fit

How well does a given line predict the data? We measure this with a cost function (essentially the sum of the (squared) differences between what the line predicts at each point and the real value), which we want to minimize. Once the cost converges to a minimum we know we have a good fit. How do we find it? Gradient descent.
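As a sketch (assuming the common mean-squared-error cost and a hand-picked learning rate alpha), the cost measures how far the line's predictions are from the data, and one gradient descent step moves the parameters a small way against the gradient:

```python
import numpy as np

def cost(X, y, theta, bias):
    """Mean squared error: the average squared delta between prediction and data."""
    errors = (X @ theta + bias) - y
    return (errors ** 2).mean() / 2

def gradient_step(X, y, theta, bias, alpha=0.01):
    """One step of gradient descent: nudge theta and bias against the gradient."""
    errors = (X @ theta + bias) - y
    grad_theta = X.T @ errors / len(y)
    grad_bias = errors.mean()
    return theta - alpha * grad_theta, bias - alpha * grad_bias
```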

Topographic map as a contour plot

The lines in this picture represent points of equal height. Imagine walking up one of these hills, each step taking the steepest path (where the lines are closest together). That's essentially gradient ascent (gradient descent if we walk downhill instead).

Proportionality and feature scaling

Going back to our example: commute time is on a scale of 10 minutes to 2 hours, while salary is in the 5-6 figure range - the features are obviously not on proportional scales. Feature scaling is a way of normalizing our values so that they are.

Using x' = (x - min) / (max - min), with x = 15 (minutes of commute), min = 0 and max = 120, we get x' = (15 - 0) / (120 - 0) = 0.125
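A small sketch of the same formula applied column-wise with NumPy (the 0-120 minute commute scale is the example above):

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature (column) into [0, 1]: x' = (x - min) / (max - min)."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

# Commute times in minutes, on a 0-120 scale
commute = np.array([[0.0], [15.0], [120.0]])
print(min_max_scale(commute))  # the 15-minute commute becomes 0.125
```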

Convergence speed

Feature scaling helps with convergence speed. Without proportional features the contours of the cost function are stretched out, so gradient descent takes an uneven, zigzagging path and needs more steps to reach the minimum.
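To tie the pieces together, here is a minimal sketch that scales the features first and then runs gradient descent until the cost stops improving (the learning rate, tolerance and iteration cap are assumptions, not tuned values):

```python
import numpy as np

def fit(X, y, alpha=0.1, tol=1e-9, max_iters=100_000):
    """Gradient descent for linear regression on min-max scaled features."""
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # feature scaling
    theta = np.zeros(X.shape[1])
    bias = 0.0
    prev_cost = float("inf")
    for _ in range(max_iters):
        errors = (X @ theta + bias) - y
        theta = theta - alpha * X.T @ errors / len(y)
        bias = bias - alpha * errors.mean()
        current_cost = (errors ** 2).mean() / 2
        if prev_cost - current_cost < tol:  # cost barely changed: converged
            break
        prev_cost = current_cost
    return theta, bias
```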

[Figures: contour plots of the descent path with "Proportional" and "Not proportional" features, plus a "3D intuition" view of the cost surface.]
