@arturmkrtchyan
Last active October 13, 2016 16:27
Machine Learning Introduction

Machine Learning Algorithm Families

Supervised Learning (e.g. classification, anomaly detection, regression)

  • Input data is called training data and has a known label or result such as spam/not-spam or a stock price at a time.
  • A model is prepared through a training process where it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model achieves a desired level of accuracy on the training data.
  • Example problems are classification and regression.
  • Example algorithms include Logistic Regression and the Back Propagation Neural Network.
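The supervised workflow above — predict, compare against the known label, correct — can be sketched with logistic regression trained by gradient descent. This is a minimal illustration on made-up 1-D data, not a production implementation.

```python
# Minimal sketch of supervised learning: logistic regression fit by
# stochastic gradient descent on a toy 1-D dataset (data is illustrative).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=1000):
    """Fit weight w and bias b by minimising log loss with gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)   # predicted probability of class 1
            w -= lr * (p - y) * x    # gradient of log loss w.r.t. w
            b -= lr * (p - y)        # gradient of log loss w.r.t. b
    return w, b

# The labels are known in advance -- that is what makes this supervised.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
predict = lambda x: 1 if sigmoid(w * x + b) >= 0.5 else 0
```

Each training example's error directly corrects the parameters, which is the "corrected when those predictions are wrong" step in miniature.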

Unsupervised Learning (e.g. clustering and dimensionality reduction)

  • Input data is not labelled and does not have a known result.
  • A model is prepared by deducing structures present in the input data. This may be to extract general rules, it may be a mathematical process to systematically reduce redundancy, or it may be to organize data by similarity.
  • Example problems are clustering, dimensionality reduction and association rule learning.
  • Example algorithms include: the Apriori algorithm and k-Means.
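Since Apriori is named above, here is a minimal sketch of its core idea: count itemsets level by level, keeping only those that meet a minimum support, and exploit the fact that every subset of a frequent itemset must itself be frequent. The transactions are illustrative.

```python
# Minimal sketch of the Apriori frequent-itemset search (toy transactions).
from itertools import combinations

def apriori(transactions, min_support):
    items = sorted({i for t in transactions for i in t})
    frequent, k = {}, 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Generate size-k candidates whose (k-1)-subsets are all frequent.
        candidates, seen = [], set()
        for a, b in combinations(list(level), 2):
            u = a | b
            if (len(u) == k and u not in seen
                    and all(frozenset(s) in level for s in combinations(u, k - 1))):
                seen.add(u)
                candidates.append(u)
    return frequent

transactions = [frozenset(t) for t in (
    {"milk", "bread"}, {"milk", "bread", "eggs"},
    {"bread", "eggs"}, {"milk", "eggs"},
)]
freq = apriori(transactions, min_support=2)
```

No labels are involved: the algorithm deduces structure (co-occurrence rules) directly from the data, which is what makes it unsupervised.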

Algorithms Grouped By Similarity

Algorithms are often grouped by similarity in terms of their function (how they work) — for example, tree-based methods and neural-network-inspired methods.

Regression Algorithms

Regression is concerned with modelling the relationship between variables; the model is iteratively refined using a measure of the error in its predictions.

The most popular regression algorithms are:

  • Ordinary Least Squares Regression (OLSR)
  • Linear Regression
  • Logistic Regression
  • Stepwise Regression
  • Multivariate Adaptive Regression Splines (MARS)
  • Locally Estimated Scatterplot Smoothing (LOESS)
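For the simplest case on the list — ordinary least squares with one predictor — the error-minimising fit has a closed form, sketched here on made-up data:

```python
# Minimal sketch of ordinary least squares for a single predictor.
# The slope and intercept minimise the sum of squared prediction errors.
def ols(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 8.0]   # roughly y = 2x
slope, intercept = ols(xs, ys)
```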

Instance-based Algorithms

Instance-based learning models a decision problem with instances or examples of training data that are deemed important or required by the model.

The most popular instance-based algorithms are:

  • k-Nearest Neighbour (kNN)
  • Learning Vector Quantization (LVQ)
  • Self-Organizing Map (SOM)
  • Locally Weighted Learning (LWL)
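k-Nearest Neighbours is the clearest example of the instance-based idea: the "model" is just the stored training examples, and a query is classified by a majority vote among its k closest neighbours. A minimal sketch on toy 2-D data:

```python
# Minimal sketch of k-Nearest Neighbours classification (toy 2-D data).
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of ((x, y), label). Returns the majority label of the
    k training points closest to `query` in Euclidean distance."""
    by_distance = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
```

There is no training step at all: the instances themselves are the model, which is exactly what distinguishes this family.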

Decision Tree Algorithms

Decision tree methods construct a model of decisions made based on actual values of attributes in the data.

The most popular decision tree algorithms are:

  • Classification and Regression Tree (CART)
  • Iterative Dichotomiser 3 (ID3)
  • C4.5 and C5.0 (different versions of a powerful approach)
  • Chi-squared Automatic Interaction Detection (CHAID)
  • Decision Stump
  • M5
  • Conditional Decision Trees
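The simplest member of this family, the decision stump, shows the core mechanism: pick the single attribute threshold that best splits the data. A minimal sketch on illustrative 1-D data:

```python
# Minimal sketch of a decision stump: a one-level decision tree that
# chooses the threshold (and polarity) minimising misclassifications.
def fit_stump(xs, ys):
    """xs: 1-D feature values, ys: 0/1 labels. Returns (threshold, polarity)."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (0, 1):
            # Predict `polarity` when x >= t, otherwise 1 - polarity.
            errors = sum((polarity if x >= t else 1 - polarity) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    return best[1], best[2]

xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]
t, pol = fit_stump(xs, ys)
predict = lambda x: pol if x >= t else 1 - pol
```

Full tree learners such as CART and C4.5 apply this kind of split recursively, using impurity measures (Gini, information gain) rather than raw error counts.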

Bayesian Algorithms

Bayesian methods are those that explicitly apply Bayes’ Theorem for problems such as classification and regression.

The most popular Bayesian algorithms are:

  • Naive Bayes
  • Gaussian Naive Bayes
  • Multinomial Naive Bayes
  • Averaged One-Dependence Estimators (AODE)
  • Bayesian Belief Network (BBN)
  • Bayesian Network (BN)
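Gaussian Naive Bayes, named above, applies Bayes' Theorem by modelling each feature as an independent normal distribution per class and choosing the class with the highest posterior. A minimal single-feature sketch on made-up data:

```python
# Minimal sketch of Gaussian Naive Bayes for one feature (toy data).
import math
from collections import defaultdict

def fit_gnb(xs, ys):
    """Return {class: (prior, mean, variance)} estimated from the data."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    n = len(xs)
    params = {}
    for c, vals in groups.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        params[c] = (len(vals) / n, mean, var)
    return params

def predict_gnb(params, x):
    # Compare log posteriors: log prior + log Gaussian likelihood.
    def log_post(prior, mean, var):
        return (math.log(prior) - 0.5 * math.log(2 * math.pi * var)
                - (x - mean) ** 2 / (2 * var))
    return max(params, key=lambda c: log_post(*params[c]))

xs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
ys = ["low", "low", "low", "high", "high", "high"]
params = fit_gnb(xs, ys)
```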

Clustering Algorithms

Clustering, like regression, describes both a class of problem and a class of methods.

The most popular clustering algorithms are:

  • k-Means
  • k-Medians
  • Expectation Maximisation (EM)
  • Hierarchical Clustering
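k-Means, the first algorithm on the list, alternates between assigning points to their nearest centroid and recomputing each centroid as the mean of its cluster. A minimal 1-D sketch on illustrative data:

```python
# Minimal sketch of k-means on 1-D data: iterate assignment and update
# steps until the centroids stop moving (or an iteration cap is hit).
def kmeans(points, k, iters=100):
    centroids = points[:k]   # naive initialisation: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)

points = [1.0, 1.1, 0.9, 8.0, 8.2, 7.8]
centroids = kmeans(points, k=2)
```

The first-k initialisation here is the simplest possible choice; practical implementations use random restarts or k-means++ seeding to avoid poor local minima.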

Deep Learning Algorithms

Deep Learning methods are a modern update to Artificial Neural Networks that exploit abundant cheap computation.

They are concerned with building much larger and more complex neural networks, and many methods address semi-supervised learning problems where large datasets contain very little labelled data.

The most popular deep learning algorithms are:

  • Deep Boltzmann Machine (DBM)
  • Deep Belief Networks (DBN)
  • Convolutional Neural Network (CNN)
  • Stacked Auto-Encoders
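The building block behind CNNs is the convolution: sliding a small learned filter across the input so the same weights detect a pattern wherever it appears. A minimal 1-D sketch with a hand-picked edge-detecting filter (the weights are illustrative, not learned):

```python
# Minimal sketch of a 1-D "valid" convolution, the core CNN operation.
def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A difference filter responds strongly where the signal jumps.
signal = [0, 0, 0, 1, 1, 1]
edges = conv1d(signal, [-1, 1])
```

In a real CNN the filter weights are learned by backpropagation and many filters are stacked in layers, but the sliding weighted sum is exactly this.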

10 Most Influential Algorithms

  • C4.5 (decision tree)
  • k-means (clustering)
  • Support vector machines (next to C4.5, a classifier to try out first)
  • Apriori (association rule learning --> recommendation engine)
  • EM (i.e. expectation-maximization for clustering)
  • PageRank (network analysis; think of the PageRank in Google's search engine)
  • AdaBoost (boosting, and thus an ensemble learning algorithm; it combines multiple weak learners into a stronger one)
  • kNN (aka k-Nearest Neighbors, thus classification)
  • Naive Bayes (family of classification algorithms assuming that all features are independent of one another given the class)
  • CART (aka classification and regression trees, thus a classifier)
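Of the algorithms above, PageRank is the one not yet sketched elsewhere in these notes. Here is a minimal power-iteration version on a tiny made-up link graph (the damping factor 0.85 is the customary choice):

```python
# Minimal sketch of PageRank by power iteration on a toy link graph.
def pagerank(links, damping=0.85, iters=100):
    """links: {page: [pages it links to]}. Returns {page: rank}, ranks sum ~1."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:   # dangling page: spread its rank over all pages
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
```

Page "c" ends up ranked highest here because both "a" and "b" link to it; a page's rank depends on the ranks of the pages pointing at it, which is the recursive idea PageRank captures.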