@jaceklaskowski
Last active June 21, 2020 07:10
Machine Learning for the very Impatient

How much of machine learning is statistics and vice versa?

Notes taken while learning from https://www.coursera.org/learn/machine-learning/home/welcome

  • machine learning = teaching a computer to learn concepts using data — without being explicitly programmed.
  • Supervised learning = "right answers" given
  • Regression problem
    • continuous valued output
    • deduce the function for a given data set and predict other values
    • "in regression problems, we are taking input variables and trying to map the output onto a continuous expected result function."
    • Univariate linear regression is used when you want to predict a single output value from a single input value.
    • From https://en.wikipedia.org/wiki/Regression: Regression analysis is a statistical technique for estimating the relationships among variables.
  • Classification problem
    • discrete valued output
    • From https://en.wikipedia.org/wiki/Statistical_classification: the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
  • Feature = attribute
    • used to predict others
  • Unsupervised learning = all data is unlabeled, or every example has the same label
  • Unstructured data
  • Can you find some structure in the dataset?
  • An Unsupervised Learning algorithm can group data into partitions called clusters
  • We're not telling the algorithm anything about the structure of the data set or giving it right answers/examples = we know nothing in advance
  • Clustering algorithm
  • Cocktail party algorithm
  • The Hypothesis Function
  • We measure the accuracy of our hypothesis function using a cost function.
  • training examples
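The hypothesis and cost function mentioned above can be sketched for univariate linear regression. This is a minimal sketch, assuming the course's conventions: hθ(x) = θ0 + θ1·x and the squared-error cost J(θ); the toy data set is made up for illustration.

```python
# Univariate linear regression: hypothesis h(x) = theta0 + theta1 * x
# and the squared-error cost J(theta) over m training examples.

def hypothesis(theta0, theta1, x):
    """Predict a continuous-valued output for input x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """J(theta) = (1 / 2m) * sum of squared prediction errors."""
    m = len(xs)
    return sum((hypothesis(theta0, theta1, x) - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)

# Toy data set where y = 2x, so theta0 = 0, theta1 = 2 gives zero cost.
xs, ys = [1, 2, 3], [2, 4, 6]
print(cost(0, 2, xs, ys))  # 0.0
print(cost(0, 1, xs, ys))  # non-zero: the fit is worse
```

Minimizing J(θ) over the training examples is exactly what gradient descent (next section) does.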

Gradient Descent (multiple variables)

  • an iterative optimization algorithm used to fit linear regression (and many other models)
  • minimize the cost function $J(\theta)$
  • partial derivative = the slope of the cost function along one parameter; each step moves every $\theta_j$ in the direction that decreases $J(\theta)$, which is why it is central to the algorithm
  • $\alpha$ - learning rate
  • feature scaling = bringing features onto a similar range so gradient descent converges more quickly
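The update rule from the lectures, θj := θj − α·(1/m)·Σ(hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)·xj⁽ⁱ⁾, can be sketched with plain Python lists. This is a minimal sketch; the data set, α, and iteration count are made-up illustration values, and each row of X includes the bias feature x0 = 1.

```python
# Gradient descent for linear regression with multiple features.
# Update rule: theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij

def gradient_descent(X, y, alpha=0.1, iters=1000):
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        # prediction error h(x_i) - y_i for every training example
        errors = [sum(t * xj for t, xj in zip(theta, xi)) - yi
                  for xi, yi in zip(X, y)]
        # simultaneous update of every theta_j
        theta = [t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
                 for j, t in enumerate(theta)]
    return theta

# Tiny data set where y = 1 + 2*x1 (first column is the bias x0 = 1)
X = [[1, 0], [1, 1], [1, 2]]
y = [1, 3, 5]
theta = gradient_descent(X, y)
print(theta)  # converges close to [1.0, 2.0]
```

On this toy data the features are already on similar scales; with features of very different ranges the same loop would need far more iterations (or a smaller α), which is the point of feature scaling.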

Normal Equation

  • Works well for a smaller set of features, e.g. n < 10^5, since it requires inverting the n×n matrix $X^T X$, which is roughly O(n^3)
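The normal equation solves for θ in one step, with no learning rate, iterations, or feature scaling. A minimal sketch using NumPy (the data set is the same made-up toy example; `np.linalg.pinv` is used so a non-invertible $X^T X$ is handled gracefully):

```python
# Normal equation: theta = (X^T X)^(-1) X^T y
# One-shot solve; inverting X^T X is ~O(n^3), hence the small-n advice above.
import numpy as np

def normal_equation(X, y):
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Toy data set where y = 1 + 2*x1 (first column is the bias x0 = 1)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(normal_equation(X, y))  # [1. 2.]
```

For large n, gradient descent's O(kn²)-ish cost per pass usually beats the cubic matrix inversion.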