Some PyTorch loss examples
{
"cells": [
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'1.4.0'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"import numpy as np\n",
"\n",
"torch.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# L1Loss - Mean Absolute Error"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What does it mean?\n",
"It measures the numerical distance between the estimated and actual value. It is the simplest form of error metric. The absolute value of the error is taken because if we don’t then negatives will cancel out the positives. This isn’t useful to us, rather it makes it more unreliable.\n",
"The lower the value of MAE, better is the model. We can not expect its value to be zero, because it might not be practically useful. This leads to wastage of resources. For example, if our model’s loss is within 5% then it is alright in practice, and making it more precise may not really be useful.\n",
"### When to use it?\n",
"+ Regression problems\n",
"+ Simplistic model\n",
"+ As neural networks are usually used for complex problems, this function is rarely used."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"x = torch.randn(2, 3)\n",
"y = torch.randn(2, 3)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[-0.4672, -0.2739, -1.0714],\n",
" [-0.3790, 1.1721, 0.6467]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 0.5286, 1.0452, 0.3380],\n",
" [-0.7827, -0.1144, 1.2989]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor(1.0111)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loss = nn.L1Loss()\n",
"\n",
"loss(x, y)"
]
},
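{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, `nn.L1Loss` with its default `reduction='mean'` should match the mean of the element-wise absolute differences; a minimal sketch using the `x` and `y` defined above:\n",
"```python\n",
"manual_mae = (x - y).abs().mean()  # mean of element-wise |x - y|\n",
"assert torch.isclose(manual_mae, nn.L1Loss()(x, y))\n",
"```"
]
},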
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MSELoss - Mean Square Error Loss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What does it mean?\n",
"The squaring of the difference of prediction and actual value means that we’re amplifying large losses. If the classifier is off by 200, the error is 40000 and if the classifier is off by 0.1, the error is 0.01. This penalizes the model when it makes large mistakes and incentivizes small errors.\n",
"### When to use it?\n",
"+ Regression problems.\n",
"+ The numerical value features are not large.\n",
"+ Problem is not very high dimensional."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 1.8940, -1.1273, 0.6175],\n",
" [-0.1853, 1.0409, -0.0589]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = torch.randn(2, 3)\n",
"y = torch.randn(2, 3)\n",
"\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[-0.2626, -0.7741, -0.1262],\n",
" [ 0.3246, 0.2262, 0.9972]])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor(1.2279)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loss = nn.MSELoss()\n",
"\n",
"loss(x, y)"
]
},
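{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, with the default `reduction='mean'`, `nn.MSELoss` should equal the mean of the squared element-wise differences; a minimal sketch using the `x` and `y` above:\n",
"```python\n",
"manual_mse = ((x - y) ** 2).mean()  # mean of element-wise squared differences\n",
"assert torch.isclose(manual_mse, nn.MSELoss()(x, y))\n",
"```"
]
},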
{
"cell_type": "markdown",
"metadata": {},
"source": [
"------------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CrossEntropyLoss - Cross-Entropy Loss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What does it mean?\n",
"Cross-entropy as a loss function is used to learn the probability distribution of the data. While other loss functions like squared loss penalize wrong predictions, cross entropy gives a greater penalty when incorrect predictions are predicted with high confidence. What differentiates it with negative log loss is that cross entropy also penalizes wrong but confident predictions and correct but less confident predictions, while negative log loss does not penalize according to the confidence of predictions.\n",
"### When to use it?\n",
"+ Classification tasks\n",
"+ For making confident model i.e. model will not only predict accurately, but it will also do so with higher probability.\n",
"+ For higher precision/recall values."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 0.3764, -1.1333, 0.2292, -1.1961],\n",
" [ 1.2088, -1.3007, 0.2191, 0.0599]])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = torch.randn(2, 4)\n",
"y = torch.LongTensor(2).random_(4)\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([1, 0])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor(1.4550)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loss = nn.CrossEntropyLoss()\n",
"\n",
"loss(x, y)"
]
},
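{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the relationship with `NLLLoss` concrete, `nn.CrossEntropyLoss` on raw logits should match `nn.NLLLoss` applied to the log-softmax of those logits; a minimal sketch using the `x` and `y` above:\n",
"```python\n",
"log_probs = nn.LogSoftmax(dim=1)(x)  # convert logits to log-probabilities\n",
"assert torch.isclose(nn.NLLLoss()(log_probs, y), nn.CrossEntropyLoss()(x, y))\n",
"```"
]
},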
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----------------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# NLLLoss - Negative Log-Likelihood Loss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What does it mean?\n",
"It maximizes the overall probability of the data. It penalizes the model when it predicts the correct class with smaller probabilities and incentivizes when the prediction is made with higher probability. The logrithm does the penalizing part here. Smaller the probabilities, higher will be its logrithm. The negative sign is used here because the probabilities lie in the range [0, 1] and the logrithms of values in this range is negative. So it makes the loss value to be positive.\n",
"### When to use it?\n",
"+ Classification.\n",
"+ Smaller quicker training.\n",
"+ Simple tasks."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[-0.0231, 0.6845, -0.2956, -1.4282],\n",
" [-0.1547, 0.3717, -0.9786, 1.3589],\n",
" [-0.1835, 1.5115, 0.7340, 1.0231]])"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = torch.randn(3, 4)\n",
"y = torch.LongTensor(3).random_(4)\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([3, 0, 3])"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor(0.1866)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loss = nn.NLLLoss()\n",
"\n",
"loss(x, y)"
]
},
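{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the cell above feeds raw random scores into `nn.NLLLoss`; the intended input is log-probabilities (with raw scores the result is just the negated mean of the selected entries and can even be negative). A minimal sketch of the usual pattern, reusing the `x` and `y` above:\n",
"```python\n",
"log_probs = nn.LogSoftmax(dim=1)(x)  # turn raw scores into log-probabilities\n",
"nll = nn.NLLLoss()(log_probs, y)     # equivalent to nn.CrossEntropyLoss()(x, y)\n",
"```"
]
},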
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---------------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MarginRankingLoss - Margin Ranking Loss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What does it mean?\n",
"The prediction y of the classifier is based on the ranking of the inputs x1 and x2. Assuming margin to have the default value of 0, if y and (x1-x2) are of the same sign, then the loss will be zero. This means that x1/x2 was ranked higher(for y=1/-1), as expected by the data. If y and (x1-x2) are of the opposite sign, then the loss will be the non-zero value given by y * (x1-x2). This means that either x2 was ranked higher when x1 should have been ranked higher or vice versa. Although its usage in Pytorch in unclear as much open source implementations and examples are not available as compared to other loss functions.\n",
"### When to use it?\n",
"+ GANs.\n",
"+ Ranking tasks."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(tensor([ 0.0551, 0.2549, -0.5013]),\n",
" tensor([ 0.5706, 1.3610, -0.8076]),\n",
" tensor([-1., 1., -1.]))"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x1 = torch.randn(3)\n",
"x2 = torch.randn(3)\n",
"y = torch.FloatTensor(np.random.choice([1, -1], 3))\n",
"\n",
"x1, x2, y"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor(0.4708)"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loss = nn.MarginRankingLoss()\n",
"\n",
"loss(x1, x2, y)"
]
},
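{
"cell_type": "markdown",
"metadata": {},
"source": [
"The formula can also be checked by hand; a minimal sketch over the `x1`, `x2` and `y` above, with the default margin of 0:\n",
"```python\n",
"margin = 0.0\n",
"manual = torch.clamp(-y * (x1 - x2) + margin, min=0).mean()  # max(0, -y*(x1-x2) + margin), averaged\n",
"assert torch.isclose(manual, nn.MarginRankingLoss(margin=margin)(x1, x2, y))\n",
"```"
]
},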
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--------------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# HingeEmbeddingLoss - Hinge Embedding Loss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What does it mean?\n",
"The prediction y of the classifier is based on the value of the input x. Assuming margin to have the default value of 1, if y=-1, then the loss will be maximum of 0 and (1 — x). If x > 0 loss will be x itself (higher value), if 0<x<1 loss will be 1 — x (smaller value) and if x < 0 loss will be 0 (minimum value). For y =1, the loss is as high as the value of x.\n",
"### When to use it?\n",
"+ Learning nonlinear embeddings\n",
"+ Semi-supervised learning\n",
"+ Where similarity or dissimilar of two inputs is to be measured."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[-1.4491, -0.2287, 0.4343],\n",
" [ 0.3124, -0.1245, -0.6946]])"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = torch.randn(2, 3)\n",
"y = torch.FloatTensor(np.random.choice([-1, 1], (2, 3)))\n",
"\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[-1., 1., 1.],\n",
" [ 1., -1., 1.]])"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor(0.5662)"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loss = nn.HingeEmbeddingLoss()\n",
"\n",
"loss(x, y)"
]
},
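{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, the value can be reproduced by hand; a minimal sketch over the `x` and `y` above, with the default margin of 1:\n",
"```python\n",
"margin = 1.0\n",
"manual = torch.where(y == 1, x, torch.clamp(margin - x, min=0)).mean()  # x where y=1, max(0, margin-x) where y=-1\n",
"assert torch.isclose(manual, nn.HingeEmbeddingLoss(margin=margin)(x, y))\n",
"```"
]
},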
{
"cell_type": "markdown",
"metadata": {},
"source": [
"------------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"here are more losses with examples - thanks to yang\n",
"\n",
"https://gist.github.com/zeroows/e06a7b7cad53fcb7b103dd0fd6c44d45"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}