Analyze + Model Part 1

Organize

Analyze & Model - the process of understanding, diagnosing, and refining a machine learning model with the help of interactive visualization

Regression Models

Linear Regression Model

Linear Regression - a supervised machine learning algorithm where the predicted output is continuous and has a constant slope
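
A minimal sketch of fitting a linear regression with scikit-learn; the arrays below are made-up toy data (an assumption, not from these notes):

```python
# Fit a line to toy data and make a continuous prediction.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # single input feature
y = np.array([2.1, 4.0, 6.2, 7.9])           # continuous target

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)         # constant slope and intercept
print(model.predict([[5.0]]))                # continuous predicted output
```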

Classifier Evaluation

Logistic Regression Model

Logistic Regression - a classification algorithm used to assign observations to a discrete set of classes. Unlike linear regression which outputs continuous number values, logistic regression transforms its output using the logistic sigmoid function to return a probability value which can then be mapped to two or more discrete classes.

Examples of Categorical Data

  • Binary (Pass/Fail)
  • Multi (Cats, Dogs, Sheep)
  • Ordinal (Low, Medium, High)

Sigmoid (logistic) function curve
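
A minimal logistic regression sketch with scikit-learn, assuming a made-up toy binary dataset; predict_proba exposes the sigmoid's probability output before it is mapped to a discrete class:

```python
# Binary classification on toy data: probability first, then class label.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])             # binary classes (e.g. Fail/Pass)

clf = LogisticRegression()
clf.fit(X, y)

print(clf.predict_proba([[3.5]]))            # sigmoid output: P(class 0), P(class 1)
print(clf.predict([[3.5]]))                  # probability mapped to a discrete class
```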

Testing model performance

  • Train-test split
  • Cross Validation (good for hyperparameter tuning)
    • K Fold Cross Validation - split the training data into K folds; train on K-1 folds, validate on the remaining fold, repeat K times, and average the scores

Cross Validation

  1. Split the data into 3 parts: a Training Set (used to fit the model), a Cross-Validation Set (used to validate and tune the trained model), and a Test Set (held out for the final evaluation) - see the sketch below
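
A sketch of that three-way split using two calls to scikit-learn's train_test_split; the 60/20/20 ratios and the toy data are illustrative assumptions:

```python
# Carve out the test set first, then split the remainder into train/validation.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(-1, 1)   # toy feature matrix
y = np.array([0, 1] * 10)          # toy binary labels

X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 12 4 4 -> 60% / 20% / 20%
```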

Model tuning

Pick the model that gives you the best performance and attempt to tune it with regularization and hyperparameter tuning

  • Underfitting vs overfitting
  • Regularization
  • Hyperparameter tuning
  • Cross validation

Feature Engineering

  • Feature normalization

Model Tuning - the process of maximizing a model's performance without overfitting or creating too high of a variance

Underfitting - when a model is unable to capture the relationship between the input and output variables accurately

Overfitting - when a statistical model fits exactly against its training data. You can avoid this with regularization.

  • Regularization - a technique for tuning a model by adding an additional penalty term to the error function
  • Hyperparameter - a parameter whose value is used to control the learning process (e.g. logistic regression's penalty and C parameters)

Hyperparameter Optimization - the process of searching for the hyperparameter values that give the best (cross-validated) model performance, e.g. via grid search or random search

Use cross-validation to evaluate your regularization and hyperparameter tuning
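
A sketch of that idea, assuming scikit-learn's GridSearchCV with a made-up grid of C values for logistic regression and 5-fold cross-validation:

```python
# Tune the regularization strength C with 5-fold cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X = np.random.RandomState(0).randn(100, 3)   # toy features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy binary labels

param_grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l2"]}
search = GridSearchCV(LogisticRegression(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # hyperparameters with the best CV score
print(search.best_score_)    # mean cross-validated accuracy
```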

Feature Normalization - rescaling a feature to a fixed range, typically [0, 1] (min-max scaling)

Feature Standardization - rescaling a feature to have zero mean and unit variance
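
A small sketch of both scalers with scikit-learn (the data is made up): MinMaxScaler performs min-max normalization, StandardScaler performs standardization.

```python
# Rescale toy features two ways and compare the results.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

print(MinMaxScaler().fit_transform(X))    # each column squeezed into [0, 1]
print(StandardScaler().fit_transform(X))  # each column: mean 0, unit variance
```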

https://ml-cheatsheet.readthedocs.io/en/latest/logistic_regression.html


Analyze + Model Part 2

Predictive Modeling with Machine Learning

Terminology


Training data - data you use to train a machine learning algorithm

  • Input features (Predictors)
  • Output features (Response, Outcome)

Supervised data - training data that contains Input and Output features

Supervised Learning Problem - the machine learning task of learning a function that maps an input to an output based on example input-output pairs (i.e. supervised data).

Examples of Supervised Learning models

  • Regression model - the output is a continuous variable; requires the prediction of a quantity
  • Classification model - requires that examples be classified into one of two or more classes. Example: the Titanic dataset's Survived (1) or Not Survived (0)

Unsupervised data - training data that only contains Input features data

Unsupervised Learning Problems - the goal is to identify meaningful patterns in the training data, which doesn't contain output data

Examples of Unsupervised Learning models

  • K-means clustering
  • Reinforcement learning (often treated as its own paradigm rather than as unsupervised learning)

Classification Task - requires that examples be classified into one of two or more classes. Example: the Titanic dataset's Survived (1) or Not Survived (0)

Types of Classification Tasks

  • Binary Classification Problem - judged by accuracy, along with precision and recall
  • Multi-label Classification
  • Classification Predictive Modeling

Determining Performance

Confusion Matrix - a table that is often used to describe the performance of a classification model. Use the following table to compute the performance metrics below.

                     Confusion Matrix

+-----------------+---------------------+---------------------+
|                 | Predicted Negative  | Predicted Positive  |
+-----------------+---------------------+---------------------+
| Actual Negative | True Negative (TN)  | False Positive (FP) |
+-----------------+---------------------+---------------------+
| Actual Positive | False Negative (FN) | True Positive (TP)  |
+-----------------+---------------------+---------------------+
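
For reference, scikit-learn's confusion_matrix uses the same layout as the table above (rows = actual, columns = predicted); the labels below are made-up toy values:

```python
# Build a confusion matrix from toy actual/predicted labels.
from sklearn.metrics import confusion_matrix

y_actual    = [0, 0, 1, 1, 1, 0, 1, 0]
y_predicted = [0, 1, 1, 1, 0, 0, 1, 0]

# With labels=[0, 1] the result is [[TN, FP], [FN, TP]].
print(confusion_matrix(y_actual, y_predicted, labels=[0, 1]))
```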

Examples of Performance metrics

  • Precision (aka Positive Predictive Value) - What proportion of positive identifications was actually correct?

    Precision = TP/ (TP + FP)
    
  • Recall - What proportion of actual positives was identified correctly?

    Recall = TP/ (TP + FN)
    
  • Accuracy - the percentage of correct predictions for the test data

    Accuracy = (TP + TN) / (TP + TN + FP + FN)
    
  • Negative Predictive Value (NPV) - What proportion of negative identifications was actually correct?

    NPV = TN/(TN + FN)
    
  • Sensitivity (aka Recall) - True Positive Rate. The proportion of actual positives which got predicted correctly

    Sensitivity = TP/(TP + FN)
    
  • Specificity - True Negative Rate. The proportion of actual negatives which got predicted correctly

    Specificity = TN/(TN + FP)
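
A small sketch that mirrors the formulas above using raw counts from a confusion matrix (the counts are made-up numbers for illustration):

```python
# Compute the metrics directly from TP/TN/FP/FN counts.
TP, TN, FP, FN = 40, 45, 5, 10

precision   = TP / (TP + FP)                  # positive predictive value
recall      = TP / (TP + FN)                  # sensitivity / true positive rate
accuracy    = (TP + TN) / (TP + TN + FP + FN)
npv         = TN / (TN + FN)                  # negative predictive value
specificity = TN / (TN + FP)                  # true negative rate

print(precision, recall, accuracy, npv, specificity)
```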
    

Evaluating Your Classifier

If you are given supervised data you can perform an 80%/20% split in order to train and evaluate your algorithm (see the sketch after these steps).

  1. Split the supervised data into an 80/20 split
  2. Use the 80% portion to train a model
  3. Use the 20% portion to test your trained model
  4. Compare the predicted outputs for the 20% portion to the actual values and compute your scores (Precision, Recall)
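
A sketch of those four steps with scikit-learn, assuming a bundled toy dataset rather than the Titanic data:

```python
# 80/20 split, train, predict, and score a classifier end to end.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# 1. Split the supervised data 80/20.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 2. Train a model on the 80% portion (features standardized first).
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

# 3. Predict on the held-out 20% portion.
y_pred = clf.predict(X_test)

# 4. Compare predictions to the actual values.
print(precision_score(y_test, y_pred), recall_score(y_test, y_pred))
```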

Classifier Evaluation


A feature is an individual, measurable detail (attribute) of the data

Statistical modeling is a subfield of mathematics that seeks out relationships between variables in order to predict an outcome.

Classifier - a type of machine learning algorithm used to assign a class label to a data input

  • Logistic Regression
  • Support Vector Machine
  • Neural Networks
  • Random Forest