Neural networks that perform classification (predict the class of an input) produce a vector of C raw numbers for every input, where C is the total number of classes, for example 10 for MNIST. This vector is called the logit vector.
If we feed an MNIST image of the digit 3 to a classification model, it may produce a logit vector like this (top row added to show the class ids):
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
-121.2 | -212.3 | 81.1 | 171.1 | -55.0 | 132.5 | -13.2 | 63.5 | 99.2 | -10.9 |
We can take the element-wise softmax() to get probabilities for each class id:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
1.13e-127 | 3.10e-167 | 8.19e-40 | 1.0 | 6.39e-99 | 1.7e-17 | 9.11e-81 | 1.86e-47 | 5.94e-32 | 9.08e-80 |
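The softmax step above can be sketched in NumPy. Subtracting the maximum logit before exponentiating is the standard stability trick: it does not change the result, but keeps np.exp from seeing large positive arguments (which can overflow for big logits).

```python
import numpy as np

# Logit vector from the table above (classes 0-9).
logits = np.array([-121.2, -212.3, 81.1, 171.1, -55.0,
                   132.5, -13.2, 63.5, 99.2, -10.9])

def softmax(z):
    # Subtract the max logit so np.exp never sees a large positive
    # argument; the normalization cancels the shift exactly.
    e = np.exp(z - z.max())
    return e / e.sum()

probs = softmax(logits)
print(probs.argmax())  # class 3 wins, with probability ~1.0
```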
The actual class id (the target value) of 3 can be encoded into a one-hot vector as:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
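One-hot encoding the target is a one-liner in NumPy; a minimal sketch using the fact that row i of the identity matrix is exactly the one-hot vector for class i:

```python
import numpy as np

num_classes = 10
target = 3

# Row `target` of the identity matrix is the one-hot vector for that class.
one_hot = np.eye(num_classes)[target]
print(one_hot)  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```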
- For computing the loss with logits and targets, the suitable loss function is cross entropy loss, which computes `np.dot(target, -np.log(softmax(logits)))`.
- For computing the loss with probabilities and targets, the probabilities first need to be converted to log-probabilities and then passed to the negative log likelihood function, which simply performs `np.dot(target, -log_probabilities)`.
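As a sanity check, both routes compute the same loss value. A minimal NumPy sketch with made-up logits for three classes:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # made-up logits for 3 classes
target = np.array([0.0, 1.0, 0.0])   # one-hot target for class 1

# Route 1: cross entropy loss straight from the logits.
ce = np.dot(target, -np.log(softmax(logits)))

# Route 2: probabilities -> log-probabilities -> negative log likelihood.
log_probs = np.log(softmax(logits))
nll = np.dot(target, -log_probs)

print(np.isclose(ce, nll))  # True: both routes compute the same loss
```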
When performing binary classification, there are two mutually exclusive classes (C = 2). The network is designed to output a single logit, and the target class id is either 0 or 1. The softmax() operation then reduces to a single sigmoid() evaluation.
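The reduction is easy to verify numerically: treat the single logit z as the score of class 1 and an implicit 0 as the score of class 0; the two-class softmax probability of class 1 then equals sigmoid(z). A minimal sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 1.7  # a single made-up logit for the positive class

# Two-class softmax over the logit pair [0, z]:
e = np.exp(np.array([0.0, z]))
p_class1 = e[1] / e.sum()

print(np.isclose(p_class1, sigmoid(z)))  # True
```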
- For computing the loss with a logit and a target, the suitable loss function is binary cross entropy with logits, which computes `target * -log(sigmoid(logit)) + (1 - target) * -log(1 - sigmoid(logit))`.
- For computing the loss with probabilities, if the network has a sigmoid layer built into it, the suitable loss function is binary cross entropy loss, which computes `target * -log(p) + (1 - target) * -log(1 - p)` on the predicted probability p.
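The two binary losses can be checked against each other. The `max(z, 0) - z*t + log1p(exp(-|z|))` form below is the usual numerically stable rearrangement of binary cross entropy with logits; this is a sketch, not any particular library's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_with_logits(z, t):
    # Stable rearrangement of t*-log(sigmoid(z)) + (1-t)*-log(1-sigmoid(z)):
    # it never evaluates exp of a positive number, so large |z| is safe.
    return max(z, 0.0) - z * t + np.log1p(np.exp(-abs(z)))

def bce(p, t):
    # Binary cross entropy on a probability p (e.g. after a sigmoid layer).
    return t * -np.log(p) + (1 - t) * -np.log(1 - p)

z = 1.7  # made-up logit
for t in (0.0, 1.0):
    print(np.isclose(bce_with_logits(z, t), bce(sigmoid(z), t)))  # True, True
```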