Neural network notes taken from "Deep Learning with Python" by Nikhil Ketkar

Summary of Loss Functions

  • Binary cross entropy is the recommended loss function for binary classification. It should typically be used when the neural network is designed to predict the probability of the positive outcome. In such cases, the output layer has a single unit with a sigmoid activation function.

  • Cross entropy is the recommended loss function for multi-class classification. It should typically be used when the neural network is designed to predict the probability of each of the classes. In such cases, the output layer has softmax units (one for each class).

  • The squared loss function should be used for regression problems. The output layer in this case has a single unit, usually with a linear (identity) activation. A sketch of all three losses follows this list.
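
The book's own code is not reproduced here; the snippet below is a minimal sketch of the three loss/output-layer pairings using PyTorch. The choice of PyTorch, the tensor shapes, and the random data are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Binary classification: single sigmoid output unit + binary cross entropy.
binary_logits = torch.randn(4, 1)                    # raw scores from the last linear layer
binary_probs  = torch.sigmoid(binary_logits)         # probabilities in (0, 1)
binary_target = torch.tensor([[1.], [0.], [1.], [0.]])
bce = nn.BCELoss()(binary_probs, binary_target)

# Multi-class classification: one unit per class + cross entropy.
# nn.CrossEntropyLoss applies log-softmax internally, so it takes raw scores.
multi_logits = torch.randn(4, 3)                     # 3 classes
multi_target = torch.tensor([0, 2, 1, 0])            # class indices
ce = nn.CrossEntropyLoss()(multi_logits, multi_target)

# Regression: single linear output unit + squared (MSE) loss.
reg_output = torch.randn(4, 1)
reg_target = torch.randn(4, 1)
mse = nn.MSELoss()(reg_output, reg_target)

print(bce.item(), ce.item(), mse.item())
```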

Types of activation functions

  • Sigmoid units can be used in the output layer in conjunction with binary cross entropy for binary classification problems. The output of this unit models a Bernoulli distribution over the output y conditioned on x.

  • The softmax layer is typically used as an output layer for multi-class classification tasks in conjunction with the cross entropy loss function. The softmax layer normalizes the outputs of the previous layer so that they sum to one. Typically, the units of the previous layer model an un-normalized score of how likely the input is to belong to a particular class; the softmax layer normalizes these scores so that the output represents the probability of every class.

  • The Rectified Linear Unit (ReLU) is typically used in conjunction with a linear transformation and has become a common choice for hidden units in recent years. Results show that ReLU units lead to large and consistent gradients, which helps gradient-based learning.

  • The hyperbolic tangent unit is also typically used in conjunction with a linear transformation and is commonly used as a hidden unit (see the sketch after this list).
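
As a companion to the list above, here is a minimal sketch of where each activation typically sits in a network, again assuming PyTorch; the layer sizes and random inputs are illustrative only.

```python
import torch
import torch.nn as nn

# Hidden layers: a linear transformation followed by ReLU or tanh.
hidden = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),    # ReLU hidden unit: large, consistent gradients
    nn.Linear(32, 32), nn.Tanh(),    # hyperbolic tangent hidden unit
)

x = torch.randn(8, 10)               # batch of 8 inputs with 10 features
h = hidden(x)

# Output for binary classification: one sigmoid unit (Bernoulli over y given x).
p_binary = torch.sigmoid(nn.Linear(32, 1)(h))

# Output for multi-class classification: softmax normalizes the un-normalized
# class scores from the previous layer so that each row sums to one.
scores = nn.Linear(32, 5)(h)         # 5 classes
p_classes = torch.softmax(scores, dim=1)

print(p_binary.shape, p_classes.sum(dim=1))  # each row of p_classes sums to 1
```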
