@jaceklaskowski
Last active June 21, 2020 07:10
Machine Learning for the very Impatient

How much of machine learning is statistics and vice versa?

Notes taken while learning from https://www.coursera.org/learn/machine-learning/home/welcome

  • machine learning = teaching a computer to learn concepts using data — without being explicitly programmed.
  • Supervised learning = "right answers" given
  • Regression problem
    • continuous valued output
    • deduce the function for a given data set and predict other values
    • "in regression problems, we are taking input variables and trying to map the output onto a continuous expected result function."
    • Univariate linear regression is used when you want to predict a single output value from a single input value.
    • From https://en.wikipedia.org/wiki/Regression: Regression analysis is a statistical technique for estimating the relationships among variables.
  • Classification problem
    • discrete valued output
    • From https://en.wikipedia.org/wiki/Statistical_classification: the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
  • Feature = attribute
    • used to predict others
  • Unsupervised learning = all data is unlabeled, or every example has the same label
  • Unstructured data
  • Can you find some structure in the dataset?
  • An Unsupervised Learning algorithm can group data into partitions called clusters
  • We're not telling the algorithm anything about the structure of the data set or giving it right answers/examples = we know nothing in advance
  • Clustering algorithm
  • Cocktail party algorithm
  • The Hypothesis Function
  • We measure the accuracy of our hypothesis function using a cost function.
  • training examples
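The hypothesis and cost function mentioned above can be sketched for univariate linear regression. This is a minimal sketch, assuming the course's conventions: hθ(x) = θ0 + θ1·x and the squared-error cost J(θ); the toy data set is made up for illustration.

```python
# Univariate linear regression: hypothesis h(x) = theta0 + theta1 * x
# and the squared-error cost J(theta) over m training examples.

def hypothesis(theta0, theta1, x):
    """Predict a continuous-valued output for input x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """J(theta) = (1 / 2m) * sum of squared prediction errors."""
    m = len(xs)
    return sum((hypothesis(theta0, theta1, x) - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)

# Toy data set where y = 2x, so theta0 = 0, theta1 = 2 gives zero cost.
xs, ys = [1, 2, 3], [2, 4, 6]
print(cost(0, 2, xs, ys))  # 0.0
print(cost(0, 1, xs, ys))  # non-zero: the fit is worse
```

Minimizing J(θ) over the training examples is exactly what gradient descent (next section) does.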

Gradient Descent (multiple variables)

  • an iterative optimization algorithm used to fit linear regression (and many other models)
  • minimize the cost function $J(\theta)$
  • partial derivative = the slope of the cost function along one parameter; each step moves every $\theta_j$ in the direction that decreases $J(\theta)$, which is why it is central to the algorithm
  • $\alpha$ - learning rate
  • feature scaling = bringing features onto a similar range so gradient descent converges more quickly
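The update rule from the lectures, θj := θj − α·(1/m)·Σ(hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)·xj⁽ⁱ⁾, can be sketched with plain Python lists. This is a minimal sketch; the data set, α, and iteration count are made-up illustration values, and each row of X includes the bias feature x0 = 1.

```python
# Gradient descent for linear regression with multiple features.
# Update rule: theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij

def gradient_descent(X, y, alpha=0.1, iters=1000):
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        # prediction error h(x_i) - y_i for every training example
        errors = [sum(t * xj for t, xj in zip(theta, xi)) - yi
                  for xi, yi in zip(X, y)]
        # simultaneous update of every theta_j
        theta = [t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
                 for j, t in enumerate(theta)]
    return theta

# Tiny data set where y = 1 + 2*x1 (first column is the bias x0 = 1)
X = [[1, 0], [1, 1], [1, 2]]
y = [1, 3, 5]
theta = gradient_descent(X, y)
print(theta)  # converges close to [1.0, 2.0]
```

On this toy data the features are already on similar scales; with features of very different ranges the same loop would need far more iterations (or a smaller α), which is the point of feature scaling.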

Normal Equation

  • Works well for a smaller set of features, e.g. n < 10^5, since it requires inverting the n×n matrix $X^T X$, which is roughly O(n^3)
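The normal equation solves for θ in one step, with no learning rate, iterations, or feature scaling. A minimal sketch using NumPy (the data set is the same made-up toy example; `np.linalg.pinv` is used so a non-invertible $X^T X$ is handled gracefully):

```python
# Normal equation: theta = (X^T X)^(-1) X^T y
# One-shot solve; inverting X^T X is ~O(n^3), hence the small-n advice above.
import numpy as np

def normal_equation(X, y):
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Toy data set where y = 1 + 2*x1 (first column is the bias x0 = 1)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
print(normal_equation(X, y))  # [1. 2.]
```

For large n, gradient descent's O(kn²)-ish cost per pass usually beats the cubic matrix inversion.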