Machine learning gives computers the ability to learn without being explicitly programmed.
Linear regression is a simple model that finds the line of best fit for some data.
Example: you have a dataset with commute time, sleep time, salary, and happiness (1-10). From the three features (commute time, sleep time, and salary) you want to predict the happiness level.
How well does this line predict the data? We measure it with a cost function: essentially the sum of the squared differences between what the line predicts at each point and the real value. We want to minimize that cost. How do we find the minimum? Gradient descent; once it converges, we know we have a good fit.
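A minimal sketch in Python, assuming the standard squared-error cost for a line y = w*x + b (the variable names and the toy data below are illustrative, not from the notes):

```python
def cost(w, b, xs, ys):
    """Mean squared error between the line w*x + b and the data points."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

# Toy data, made up for illustration: hours slept vs. happiness.
xs = [4, 6, 7, 8, 9]
ys = [3, 5, 6, 7, 8]
print(cost(1.0, -1.0, xs, ys))  # lower cost means a better fit; here it is 0.0
```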
The lines in this picture are contour lines, each marking a certain height. Imagine walking up one of these hills, taking each step along the steepest path (where the contour lines are closest together). That's essentially gradient ascent (descent if we walk downhill instead).
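Here is one way that downhill walk could look in code: a sketch of plain gradient descent on the squared-error cost above, with an assumed learning rate and step count.

```python
# Gradient descent for the line y = w*x + b, assuming the squared-error
# cost J(w, b) = (1/2n) * sum((w*x + b - y)^2).

xs = [4, 6, 7, 8, 9]  # toy feature values (illustrative)
ys = [3, 5, 6, 7, 8]  # toy targets; the best fit here is y = x - 1

def gradient_step(w, b, lr=0.01):
    """Take one step against the gradient of the cost, i.e. one step downhill."""
    n = len(xs)
    dw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n  # dJ/dw
    db = sum((w * x + b - y) for x, y in zip(xs, ys)) / n      # dJ/db
    return w - lr * dw, b - lr * db

w, b = 0.0, 0.0
for _ in range(20_000):
    w, b = gradient_step(w, b)
print(w, b)  # approaches the best-fit values w = 1, b = -1
```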
Going back to our example: commute time is on a scale of 10 minutes to 2 hours, while salary is in the 5-6 figure range, so the features are obviously not on comparable scales. Feature scaling is a way of normalizing our values.
Using x' = (x - min) / (max - min), with x = 15, min = 0, max = 120, we get x' = (15 - 0) / (120 - 0) = 0.125.
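The same formula in code (a one-liner sketch; the 0-120 bounds are the ones the notes pick for commute minutes):

```python
def min_max_scale(x, lo, hi):
    """Rescale x from the range [lo, hi] to [0, 1]."""
    return (x - lo) / (hi - lo)

# The worked example from above: a 15-minute commute on the 0-120 minute scale.
print(min_max_scale(15, 0, 120))  # 0.125
```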
Feature scaling speeds up convergence: without it, features on very different scales stretch the cost surface into a long, narrow valley, and gradient descent zig-zags down it instead of heading straight for the minimum.