- Binary cross-entropy is the recommended loss function for binary classification. It should typically be used when the neural network is designed to predict the probability of the outcome. In such cases, the output layer has a single unit with a sigmoid activation function.
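As a minimal sketch in plain Python (the function name `binary_cross_entropy` and the clamping constant are illustrative choices, not from the text), the loss for a single example could look like:

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Clamp the predicted probability to avoid log(0).
    p = min(max(p_pred, eps), 1.0 - eps)
    # -[y log(p) + (1 - y) log(1 - p)] for a single example
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))
```

A confident correct prediction (e.g. `p_pred` near 1 when `y_true` is 1) yields a loss near zero, while a confident wrong prediction is penalized heavily.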
- Cross-entropy is the recommended loss function for multi-class classification. It should typically be used when the neural network is designed to predict the probability of each of the classes. In such cases, the output layer has softmax units (one for each class).
- The squared loss function should be used for regression problems. The output layer in this case has a single unit.
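For completeness, the squared loss for a single example is just the squared difference between target and prediction (the function name is illustrative):

```python
def squared_loss(y_true, y_pred):
    # (y - y_hat)^2 for a single regression example
    return (y_true - y_pred) ** 2
```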
- Sigmoid units can be used in the output layer in conjunction with binary cross-entropy for binary classification problems. The output of this unit models a Bernoulli distribution over the output y conditioned on the input x.
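The sigmoid squashes any real-valued score into (0, 1), so its output can be read as the Bernoulli parameter P(y = 1 | x). A minimal sketch:

```python
import math

def sigmoid(z):
    # Maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))
```

For example, a score of 0 maps to a probability of exactly 0.5, and large negative scores map to probabilities near 0.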
- The softmax layer is typically used as an output layer for multi-class classification tasks in conjunction with the cross-entropy loss function. The softmax layer normalizes the outputs of the previous layer so that they sum to one. Typically, the units of the previous layer model an un-normalized score of how likely the input is to belong to a particular class. The softmax layer normalizes these scores so that the output represents a probability for every class.
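This normalization can be sketched in a few lines of plain Python; subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability (result is unchanged).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    # Normalized outputs sum to one and can be read as class probabilities.
    return [e / total for e in exps]
```

Larger un-normalized scores map to larger probabilities, and the outputs always form a valid probability distribution.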
- The Rectified Linear Unit (ReLU) is typically used in conjunction with a linear transformation. The ReLU unit is the more commonly used hidden unit in recent times. Results show that ReLU units lead to large and consistent gradients, which helps gradient-based learning.
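A minimal sketch of a ReLU hidden unit applied on top of a linear transformation (the names `relu` and `relu_unit`, and the weight/bias arguments, are illustrative):

```python
def relu(z):
    # max(0, z): passes positive values through, zeroes out negatives
    return max(0.0, z)

def relu_unit(w, x, b):
    # Linear transformation w.x + b followed by the ReLU nonlinearity
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return relu(z)
```

For positive inputs the unit is linear with gradient 1, which is the source of the large, consistent gradients mentioned above.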
- The hyperbolic tangent (tanh) unit is typically used in conjunction with a linear transformation. The hyperbolic tangent unit is also commonly used as a hidden unit.
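A tanh hidden unit can be sketched the same way, assuming a weight vector and bias as above (the name `tanh_unit` is illustrative); its output lies in (-1, 1) and is centered at zero:

```python
import math

def tanh_unit(w, x, b):
    # Linear transformation w.x + b followed by the tanh nonlinearity
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.tanh(z)
```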