A loss function is a method of evaluating how well a specific algorithm models the given data. If predictions deviate too much from the actual results, the loss function will produce a large positive number. Gradually, with the help of an optimization function, the model learns to make better predictions and reduce the overall loss.
The cost function is the average of the losses. You first compute one loss per data point, based on your prediction and your ground-truth label. Then you average these losses, which gives you the cost.
Due to the squaring, predictions that are far from the actual values are penalized much more heavily than less deviated ones. MSE also has nice mathematical properties that make it easy to compute gradients. L2 loss is sensitive to outliers, but it yields a stable, closed-form solution (obtained by setting its derivative to 0).
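As a minimal sketch (the function name is my own, using NumPy), MSE and its sensitivity to outliers look like this:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

# A single outlier (error of 3) dominates the loss because of the squaring:
# squared errors are [0, 0, 9], so the mean is 3.0.
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 6.0]))  # 3.0
```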
Like MSE, MAE measures the magnitude of the error without considering its direction. Unlike MSE, MAE is not differentiable at zero, and minimizing it may require more complicated tools such as linear programming. On the other hand, MAE is more robust to outliers since it does not square the errors.
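A matching sketch for MAE (same hypothetical helper style as above) shows how the same outlier now contributes only linearly:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

# Same data as the MSE example: absolute errors are [0, 0, 3],
# so the outlier no longer dominates the average.
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 6.0]))  # 1.0
```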
Huber loss is less sensitive to outliers in the data than squared error loss, and unlike MAE it is differentiable at 0. It is basically absolute error that becomes quadratic when the error is small.
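A sketch of the standard Huber definition (quadratic inside a threshold delta, linear outside; the function name is mine):

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    err = np.asarray(y_true, float) - np.asarray(y_pred, float)
    small = np.abs(err) <= delta
    quadratic = 0.5 * err ** 2
    linear = delta * (np.abs(err) - 0.5 * delta)
    return np.mean(np.where(small, quadratic, linear))

print(huber([0.0], [0.5]))  # 0.125 (quadratic region: 0.5 * 0.5**2)
print(huber([0.0], [3.0]))  # 2.5   (linear region: 1 * (3 - 0.5))
```

The two pieces are chosen so the loss and its derivative match at |error| = delta, which is what makes it differentiable everywhere.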
log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and to abs(x) - log(2) for large x.
This means that 'logcosh' works mostly like the mean squared error, but will not be so strongly affected by the occasional wildly incorrect prediction.
It has all the advantages of Huber loss and, unlike Huber loss, it is twice differentiable everywhere.
But log-cosh loss isn't perfect.
It still suffers from the problem that the gradient and Hessian for very large off-target predictions are constant, resulting in the absence of useful splits for XGBoost.
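A numerically safe sketch of log-cosh (my own helper; `np.logaddexp` avoids overflowing `cosh` for large errors), verifying the two approximations quoted above:

```python
import numpy as np

def log_cosh(y_true, y_pred):
    """log(cosh(error)), averaged over the batch."""
    err = np.asarray(y_pred, float) - np.asarray(y_true, float)
    # log(cosh(x)) = log((e^x + e^-x) / 2) = logaddexp(x, -x) - log(2)
    return np.mean(np.logaddexp(err, -err) - np.log(2.0))

# ~ x**2 / 2 for a small error of 0.1 ...
print(log_cosh([0.0], [0.1]), 0.1 ** 2 / 2)       # both ~ 0.005
# ... and ~ |x| - log(2) for a large error of 10.
print(log_cosh([0.0], [10.0]), 10 - np.log(2.0))  # both ~ 9.307
```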
It can also reveal whether the model has a positive or a negative bias.
Cross-entropy loss increases as the predicted probability diverges from the actual label. An important aspect of this is that cross-entropy heavily penalizes predictions that are confident but wrong.
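A sketch of the binary case (helper name and the clipping epsilon are my own choices) makes the "confident but wrong" penalty concrete:

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """-[y log p + (1 - y) log(1 - p)], averaged over the batch."""
    y = np.asarray(y_true, float)
    p = np.clip(np.asarray(p_pred, float), eps, 1 - eps)  # avoid log(0)
    return np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

# True label is 1. An unsure prediction costs little;
# a confident wrong one costs almost ten times more.
print(binary_cross_entropy([1], [0.6]))   # ~0.51
print(binary_cross_entropy([1], [0.01]))  # ~4.61
```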
In simple terms, the score of the correct category should be greater than the score of every incorrect category by some safety margin (usually one). Hence hinge loss is used for maximum-margin classification, most notably for support vector machines. Although not differentiable everywhere, it is a convex function, which makes it easy to work with the usual convex optimizers used in the machine learning domain.
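A sketch of the multiclass hinge loss for a single example (the Weston-Watkins form; function name is mine), summing margin violations over the incorrect classes:

```python
import numpy as np

def multiclass_hinge(scores, correct, margin=1.0):
    """Sum over j != correct of max(0, s_j - s_correct + margin)."""
    scores = np.asarray(scores, float)
    margins = np.maximum(0.0, scores - scores[correct] + margin)
    margins[correct] = 0.0  # the correct class never penalizes itself
    return margins.sum()

# Correct class (index 0) beats both others by at least 1: zero loss.
print(multiclass_hinge([5.0, 3.0, 2.0], correct=0))  # 0.0
# Class 1 is inside the margin: loss = max(0, 3.5 - 4.0 + 1) = 0.5
print(multiclass_hinge([4.0, 3.5, 1.0], correct=0))  # 0.5
```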
A measure of how one probability distribution differs from a second, reference probability distribution.
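A sketch of KL divergence over discrete distributions (helper name is mine); note it is asymmetric, so D_KL(P || Q) generally differs from D_KL(Q || P):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum p * log(p / q) over the support of P."""
    p = np.asarray(p, float)
    q = np.clip(np.asarray(q, float), eps, None)  # avoid division by zero
    mask = p > 0  # by convention, 0 * log(0) contributes 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

uniform = [0.5, 0.5]
skewed = [0.9, 0.1]
print(kl_divergence(uniform, uniform))  # 0.0 (identical distributions)
print(kl_divergence(uniform, skewed))   # ~0.51
```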
Embedding loss function
A more efficient loss function for Siamese NN
So as long as the negative is further from the anchor than the positive plus the margin alpha, the loss is zero and the algorithm gains nothing from pulling the positive and the anchor closer together.
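A sketch of the triplet loss for one (anchor, positive, negative) triple, using squared Euclidean distance (one common choice; the function name is mine):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """max(0, d(a, p) - d(a, n) + alpha), d = squared Euclidean distance."""
    a, p, n = (np.asarray(x, float) for x in (anchor, positive, negative))
    d_pos = np.sum((a - p) ** 2)
    d_neg = np.sum((a - n) ** 2)
    return max(0.0, d_pos - d_neg + alpha)

a = [0.0, 0.0]
# Negative already further than positive + alpha: zero loss, no gradient.
print(triplet_loss(a, [0.1, 0.0], [2.0, 0.0]))  # 0.0
# Negative closer than the positive: loss = 1.0 - 0.25 + 0.2 = 0.95
print(triplet_loss(a, [1.0, 0.0], [0.5, 0.0]))  # 0.95
```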
It can also be used for Siamese NNs.
where m > 0 is a margin. The margin defines a radius around GW(X): dissimilar pairs contribute to the loss only if their distance falls within this radius.
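A sketch of the contrastive loss for a single pair of embeddings (function name and the `same` flag are my own; this follows the usual formulation with Euclidean distance D between the two embeddings):

```python
import numpy as np

def contrastive_loss(x1, x2, same, m=1.0):
    """Similar pairs (same=1) are pulled together; dissimilar pairs
    are pushed out past the margin radius m."""
    d = np.linalg.norm(np.asarray(x1, float) - np.asarray(x2, float))
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, m - d) ** 2

# Dissimilar pair already outside the margin radius: zero loss.
print(contrastive_loss([0.0, 0.0], [2.0, 0.0], same=0))  # 0.0
# Similar pair at distance 0.5: loss = 0.5 * 0.5**2 = 0.125
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], same=1))  # 0.125
```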
Minimax refers to an optimization strategy in two-player turn-based games for minimizing the loss or cost for the worst case of the other player.
discriminator: maximize log D(x) + log(1 – D(G(z)))
generator: minimize log(1 – D(G(z)))
In other words, D and G play the following two-player minimax game with value function V(G, D): min over G, max over D of V(D, G) = E_x[log D(x)] + E_z[log(1 – D(G(z)))].
In practice, this loss function for the generator saturates. This means that if the generator cannot learn as quickly as the discriminator, the discriminator wins, the game ends, and the model cannot be trained effectively.
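A sketch of both objectives for a single sample (function names are mine), including the common non-saturating alternative, where the generator maximizes log D(G(z)) instead of minimizing log(1 – D(G(z))):

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator maximizes log D(x) + log(1 - D(G(z)));
    written here as a loss to minimize."""
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss_saturating(d_fake):
    """Original generator objective: minimize log(1 - D(G(z)))."""
    return np.log(1.0 - d_fake)

def g_loss_nonsaturating(d_fake):
    """Non-saturating variant: maximize log D(G(z))."""
    return -np.log(d_fake)

# Early in training D easily rejects fakes (D(G(z)) near 0):
# the saturating loss is nearly flat there, the alternative is not.
print(g_loss_saturating(0.01))     # ~ -0.01 (tiny gradient)
print(g_loss_nonsaturating(0.01))  # ~ 4.61  (strong signal)
```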