edcote/oreilly_hands_on_ml.md

## oreilly_hands_on_ml.md

Link to book: https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291
Chapter 1

ML is the field of study that gives computers the ability to learn without being explicitly programmed.
A spam filter based on ML techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham (non-spam) examples.
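To make the idea concrete, here is a toy sketch (not the book's code, and not how a production spam filter works) that counts word frequencies in a few hypothetical spam and ham e-mails and ranks words by how much more often they appear in spam. The e-mails, the function name `spam_word_ratios`, and the add-one smoothing are illustrative assumptions.

```python
from collections import Counter

def spam_word_ratios(spam_emails, ham_emails, smoothing=1.0):
    """Map each word to (relative frequency in spam) / (relative frequency in ham)."""
    spam_counts = Counter(word for email in spam_emails for word in email.lower().split())
    ham_counts = Counter(word for email in ham_emails for word in email.lower().split())
    vocab = set(spam_counts) | set(ham_counts)
    # Add-one (Laplace) smoothing avoids dividing by zero for words never seen in ham.
    spam_total = sum(spam_counts.values()) + smoothing * len(vocab)
    ham_total = sum(ham_counts.values()) + smoothing * len(vocab)
    return {
        word: ((spam_counts[word] + smoothing) / spam_total)
        / ((ham_counts[word] + smoothing) / ham_total)
        for word in vocab
    }

# Hypothetical training e-mails
spam = ["free money now", "claim your free prize", "free free offer"]
ham = ["meeting moved to monday", "see you at the meeting", "lunch today"]

ratios = spam_word_ratios(spam, ham)
print(max(ratios, key=ratios.get))  # free
print(ratios["meeting"] < 1)        # True: "meeting" is more typical of ham
```

Words with a ratio well above 1 act as spam indicators; a real filter would learn from far more data and combine many such signals.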
A second area where ML shines is problems that are either too complex for traditional approaches or have no known algorithm, such as speech recognition.
Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent.
Types of ML systems:

Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and reinforcement learning)
Whether or not they can learn incrementally on the fly (online vs. batch learning)
Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model (instance-based vs. model-based learning)

With supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels. A typical supervised learning task is classification. Another task is to predict a numerical value (e.g., the price of an item given a set of features). This is called regression.
In ML, an attribute is a data type (e.g., Mileage), while a feature generally means an attribute plus its value (e.g., Mileage = 15,000).
Examples of supervised ML algorithms:

k-Nearest Neighbors
Linear Regression
Logistic Regression
Support Vector Machines
Decision Trees and Random Forests
Neural Networks (some architectures can also be unsupervised or semisupervised)

With unsupervised learning, the training data is unlabeled. The system tries to learn without a teacher. Here are some common algorithms:

Clustering (k-Means, Hierarchical Cluster Analysis, etc.)
Visualization and dimensionality reduction (Principal Component Analysis, etc.)

A clustering algorithm can be used to detect groups of similar log entries. At no point do you tell the algorithm which group an entry belongs to; it finds those connections without your help. A hierarchical clustering algorithm may also subdivide each group into smaller groups.
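A minimal from-scratch k-Means sketch, assuming toy 2D points (hypothetical data) and a fixed seed for the random initialization. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points; no labels are ever provided. The book relies on scikit-learn for this, so treat the function below as an illustration only.

```python
import math
import random

def k_means(points, k, n_iterations=100, seed=42):
    """Cluster 2D points into k groups; returns (centroids, cluster index of each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points

    def nearest(point):
        return min(range(k), key=lambda i: math.dist(point, centroids[i]))

    for _ in range(n_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest(point)].append(point)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, [nearest(point) for point in points]

# Hypothetical unlabeled data with two obvious groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)  # the first three points share one cluster index, the last three the other
```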
Dimensionality reduction simplifies the data without losing too much information, for example by merging several correlated features into one.
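Semisupervised learning can deal with partially labeled data, usually a lot of unlabeled data and a little labeled data. Google Photos is a good example: first it recognizes, without supervision, that the same person shows up in several photos; then it only needs you to identify each person once to name everyone in every photo.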
In reinforcement learning, an agent observes the environment, selects and performs actions, and gets rewards (or penalties) in return. It must then learn by itself the best strategy, called a policy, to get the most reward over time.
In batch learning, the system is incapable of learning incrementally. It must be trained using all the available data.
In online learning, the system is trained incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches.
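The sketch below simulates online learning under stated assumptions: a hypothetical stream of mini-batches drawn from the noiseless line y = 2x + 1, and a one-feature linear model updated with a single gradient-descent step per mini-batch. Each mini-batch is used once and then discarded, which is what distinguishes this from batch learning. The function names and the learning rate are illustrative choices.

```python
import random

def sgd_step(w, b, batch, learning_rate=0.1):
    """One gradient-descent step on the mean squared error of a single mini-batch."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b

def mini_batch_stream(n_batches, batch_size=4, seed=0):
    """Hypothetical data arriving over time, drawn from the line y = 2x + 1 (no noise)."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        xs = [rng.random() for _ in range(batch_size)]
        yield [(x, 2 * x + 1) for x in xs]

w, b = 0.0, 0.0
for batch in mini_batch_stream(2000):
    w, b = sgd_step(w, b, batch)  # learn from this mini-batch, then discard it
print(round(w, 2), round(b, 2))  # close to 2 and 1
```

The learning rate controls how fast the model adapts to new data: a higher value adapts quickly but also forgets older data quickly.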
Instance-based learning means learning the examples by heart. A spam filter built this way remembers the known spam e-mails and uses a measure of similarity to decide whether a new e-mail is close enough to them to be flagged as spam.
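A minimal sketch of instance-based learning, assuming a very basic similarity measure (the number of distinct words two e-mails have in common) and a tiny hypothetical set of stored examples. No model is built: each prediction compares the new e-mail against every stored example and copies the label of the most similar one (1-nearest neighbor).

```python
def shared_words(a, b):
    """Very basic similarity measure: number of distinct words two e-mails have in common."""
    return len(set(a.lower().split()) & set(b.lower().split()))

# Hypothetical remembered examples: "training" simply means storing them.
known_emails = [
    ("win a free prize now", "spam"),
    ("free money click here", "spam"),
    ("team meeting at noon", "ham"),
    ("please review the report", "ham"),
]

def classify(email):
    """1-nearest neighbor: copy the label of the most similar stored e-mail."""
    _, label = max(known_emails, key=lambda example: shared_words(email, example[0]))
    return label

print(classify("claim your free prize"))   # spam
print(classify("report for the meeting"))  # ham
```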
Model-based learning builds a model from the training examples, then uses that model to make predictions.
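For contrast, here is a model-based sketch: a one-feature linear model y = θ0 + θ1·x fitted with the closed-form least-squares solution on hypothetical data points. Once the two parameters are learned, making a prediction only needs θ0 and θ1, not the training data.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = theta0 + theta1 * x (closed form for a single feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    theta1 = cov_xy / var_x
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

# Hypothetical training data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
theta0, theta1 = fit_linear(xs, ys)  # about 1.15 and 1.94
prediction = theta0 + theta1 * 5.0   # the model alone is enough to predict a new case
print(prediction)                    # about 10.85
```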
Challenges of ML:

Insufficient quantity of training data
Nonrepresentative training data
Poor quality data
Irrelevant features
Overfitting (the model performs well on the training data, but does not generalize well)
Underfitting the training data (the opposite: the model is too simple to learn the underlying structure of the data)

Overfitting happens when the model is too complex relative to the amount and noisiness of the training data. Possible solutions are:

Simplify the model (e.g., a linear model rather than a high-degree polynomial model)
Gather more training data
Reduce noise in training data (fix data errors, remove outliers)

After training a model, it is important to validate how well it works on new cases. Split the data into a training set and a test set. The error rate on new cases is called the generalization error, and evaluating the model on the test set gives an estimate of it. If the training error is low (i.e., the model makes few mistakes on the training set) but the generalization error is high, the model is overfitting the training data.
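A minimal sketch of the train/test procedure, assuming hypothetical noisy data (y = 3x plus Gaussian noise) and a deliberately overfitting "model" that memorizes the training set and answers with the label of the closest training x. Its training error is exactly zero, while its error on the held-out test set is not, which is the overfitting signature described above. The helper names are illustrative, not a library API.

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data, then hold out the first test_ratio of it as the test set."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def mean_absolute_error(predict, dataset):
    return sum(abs(predict(x) - y) for x, y in dataset) / len(dataset)

def fit_memorizer(train):
    """Overly complex hypothetical 'model': returns the label of the closest training x."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]

# Hypothetical noisy data: y = 3x + Gaussian noise
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.uniform(0, 10)
    data.append((x, 3 * x + rng.gauss(0, 1)))

train_set, test_set = train_test_split(data)
memorizer = fit_memorizer(train_set)
training_error = mean_absolute_error(memorizer, train_set)
generalization_error = mean_absolute_error(memorizer, test_set)
print(training_error)        # 0.0: every training point is memorized
print(generalization_error)  # above 0: the memorized noise does not generalize
```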
Chapter 2

This chapter serves as an end-to-end exercise to predict the median housing price in any district. There is an ML project checklist in Appendix B (check that out).
You need to select a performance measure. A typical performance measure for regression problems is the Root Mean Square Error (RMSE), which gives a higher weight to large errors.
If there are many outliers (e.g., outlier districts), the Mean Absolute Error (MAE) may be preferred, since it is less sensitive to them.
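Both measures are short enough to write out directly. In this sketch (hypothetical labels and predictions), both prediction sets have the same total absolute error, but squaring inside the RMSE makes the single large miss count much more, while the MAE stays the same.

```python
import math

def rmse(predictions, labels):
    """Root Mean Square Error: squaring gives large errors a higher weight."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels))

def mae(predictions, labels):
    """Mean Absolute Error: every unit of error counts the same."""
    return sum(abs(p - y) for p, y in zip(predictions, labels)) / len(labels)

labels = [100, 200, 300, 400]
small_errors = [110, 190, 310, 390]  # off by 10 everywhere
one_outlier = [100, 200, 300, 440]   # perfect except one miss by 40

print(rmse(small_errors, labels), mae(small_errors, labels))  # 10.0 10.0
print(rmse(one_outlier, labels), mae(one_outlier, labels))    # 20.0 10.0
```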
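If the dataset is not too large, you can compute the standard correlation coefficient (Pearson's r) between every pair of attributes. It ranges from -1 to 1: values close to 1 mean a strong positive correlation, values close to -1 a strong negative correlation, and values close to 0 no linear correlation.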
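Pearson's r is the covariance of the two attributes divided by the product of their standard deviations, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²). A plain-Python sketch on hypothetical values (in the book, pandas computes the whole matrix at once):

```python
import math

def pearson_r(xs, ys):
    """Standard correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
r_pos = pearson_r(xs, [2, 4, 6, 8, 10])   # perfect positive linear relationship: 1 (up to rounding)
r_neg = pearson_r(xs, [10, 8, 6, 4, 2])   # perfect negative linear relationship: -1 (up to rounding)
r_none = pearson_r(xs, [3, 1, 4, 1, 3])   # no linear relationship: 0 (up to rounding)
print(r_pos, r_neg, r_none)
```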
Another way to evaluate a model is K-fold cross-validation: the training set is randomly split into K distinct subsets called folds, then the model is trained and evaluated K times, each time evaluated on a different fold and trained on the other K-1 folds. The result is an array containing K evaluation scores.
Cross validation allows you to get not only an estimate of your model's performance, but a measure of how precise this estimate is (i.e., its standard deviation).
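A from-scratch sketch of K-fold cross-validation, assuming RMSE as the score and a hypothetical baseline model that always predicts the mean training label (any `fit` function returning a predictor would do). scikit-learn provides this ready-made; the helpers below are illustrative only. The mean of the K scores is the performance estimate and their standard deviation shows how precise it is.

```python
import math
import random
import statistics

def k_fold_indices(n_samples, k, seed=42):
    """Randomly split the indices 0..n_samples-1 into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_val_rmse(xs, ys, k, fit):
    """Train on k-1 folds, compute RMSE on the held-out fold; repeat for each of the k folds."""
    scores = []
    for val_idx in k_fold_indices(len(xs), k):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        squared_errors = [(predict(xs[i]) - ys[i]) ** 2 for i in val_idx]
        scores.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return scores

def fit_mean_baseline(train_xs, train_ys):
    """Hypothetical baseline model that ignores x and always predicts the mean training label."""
    mean_y = statistics.mean(train_ys)
    return lambda x: mean_y

xs = list(range(20))
ys = [2 * x + (1 if x % 2 else -1) for x in xs]  # hypothetical data
scores = cross_val_rmse(xs, ys, k=5, fit=fit_mean_baseline)
print(len(scores))  # 5: one score per fold
print(statistics.mean(scores), statistics.stdev(scores))  # estimate and its spread
```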
Chapter 3

This chapter is on classification and uses the MNIST data set.
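An example of a binary classifier is a "5-detector": does the image show a 5, yes or no? A good starting point is a Stochastic Gradient Descent (SGD) classifier. This classifier relies on randomness during training (hence "stochastic"), so fix the random seed (the random_state parameter in scikit-learn) if you want reproducible results.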
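A from-scratch sketch of the idea, not scikit-learn's implementation: a linear binary classifier trained with SGD on the hinge loss (the loss scikit-learn's SGDClassifier uses by default), without regularization. The two features per sample are hypothetical stand-ins for the 784 pixel values of an MNIST image. Because the only randomness is the shuffling order, two runs with the same seed produce identical models.

```python
import random

def train_sgd_classifier(samples, labels, n_epochs=20, learning_rate=0.1, seed=42):
    """Linear binary classifier trained with SGD on the hinge loss (labels are +1 or -1)."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(n_epochs):
        rng.shuffle(order)  # the "stochastic" part: visit instances in a random order
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # wrong side of, or too close to, the decision boundary
                w = [wj + learning_rate * y * xj for wj, xj in zip(w, x)]
                b += learning_rate * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical 2-feature samples: +1 = "is a 5", -1 = "not a 5"
samples = [(2.0, 3.0), (3.0, 2.0), (2.5, 2.5), (-2.0, -3.0), (-3.0, -2.0), (-2.5, -2.5)]
labels = [1, 1, 1, -1, -1, -1]

run_1 = train_sgd_classifier(samples, labels, seed=7)
run_2 = train_sgd_classifier(samples, labels, seed=7)
print(run_1 == run_2)                          # True: same seed, same shuffling, same model
print([predict(*run_1, x) for x in samples])  # [1, 1, 1, -1, -1, -1]
```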
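Evaluating a classifier is often trickier than evaluating a regressor. Cross-validation can be used to measure accuracy, but accuracy is not a good metric on skewed datasets: only about 10% of the MNIST images are 5s, so a classifier that always answers "not 5" is right about 90% of the time. A better way is to use a confusion matrix. The general idea is to count the number of times instances of class A are classified as class B. This method gives a lot of information. A more concise metric is the accuracy of the positive predictions, called the precision of the classifier: true positives / (true positives + false positives).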
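A plain-Python sketch of these metrics on hypothetical 5-detector results (True means "is a 5"). Rows of the confusion matrix are actual classes and columns are predicted classes; the last lines show how a useless "never 5" classifier still reaches a respectable accuracy on skewed data.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Row = actual class, column = predicted class; each cell counts instances."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def precision(actual, predicted, positive):
    """True positives / (true positives + false positives)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    return tp / (tp + fp) if tp + fp else 0.0

# Hypothetical results: True = "is a 5"
actual = [True, True, True, False, False, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False, False, False]

print(confusion_matrix(actual, predicted, classes=[False, True]))  # [[6, 1], [1, 2]]
print(precision(actual, predicted, positive=True))                 # 2 / (2 + 1)

never_5 = [False] * len(actual)
accuracy = sum(a == p for a, p in zip(actual, never_5)) / len(actual)
print(accuracy)  # 0.7 without detecting a single 5
```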
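Multiclass classifiers can distinguish between more than two classes. Some algorithms (such as Random Forest classifiers or naive Bayes classifiers) can handle multiple classes directly, whereas others (such as Support Vector Machine classifiers or linear classifiers) are strictly binary classifiers.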
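One way to create a system that can classify the 10 digits is to train 10 binary classifiers, one for each digit (a 0-detector, a 1-detector, and so on), then select the class whose classifier outputs the highest decision score. This is called the one-versus-all (OvA) strategy. [..]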
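The OvA decision rule itself is short. In this sketch the "trained" binary detectors are hypothetical hand-set linear scoring functions over 2 features and 3 classes (MNIST would need 10 detectors over 784 pixel features); the point is only that each detector scores the input and the highest score wins.

```python
def make_linear_scorer(weights, bias=0.0):
    """Hypothetical stand-in for a trained binary classifier's decision function."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias

def one_versus_all_predict(detectors, x):
    """Ask every binary detector for its score and return the class with the highest one."""
    scores = {cls: score(x) for cls, score in detectors.items()}
    return max(scores, key=scores.get)

# Hypothetical 3-class example with 2 features
detectors = {
    0: make_linear_scorer([1.0, 0.0]),
    1: make_linear_scorer([0.0, 1.0]),
    2: make_linear_scorer([-1.0, -1.0]),
}
print(one_versus_all_predict(detectors, (3.0, 1.0)))    # 0
print(one_versus_all_predict(detectors, (0.2, 0.9)))    # 1
print(one_versus_all_predict(detectors, (-1.0, -1.0)))  # 2
```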