@sergeyprokudin
Last active January 18, 2024 00:58
Multivariate Gaussian Negative LogLikelihood Loss Keras
import keras.backend as K
import numpy as np


def gaussian_nll(ytrue, ypreds):
    """Keras implementation of the multivariate Gaussian negative log-likelihood loss.

    This implementation assumes a diagonal covariance matrix.

    Parameters
    ----------
    ytrue: tf.tensor of shape [n_samples, n_dims]
        ground-truth values
    ypreds: tf.tensor of shape [n_samples, n_dims*2]
        predicted mu and log(sigma) values (e.g. from your neural network)

    Returns
    -------
    neg_log_likelihood: float
        negative log-likelihood averaged over samples

    This loss can then be used as the target loss for any Keras model, e.g.:
        model.compile(loss=gaussian_nll, optimizer='Adam')
    """
    n_dims = int(int(ypreds.shape[1]) / 2)
    mu = ypreds[:, 0:n_dims]
    logsigma = ypreds[:, n_dims:]

    mse = -0.5 * K.sum(K.square((ytrue - mu) / K.exp(logsigma)), axis=1)
    sigma_trace = -K.sum(logsigma, axis=1)
    log2pi = -0.5 * n_dims * np.log(2 * np.pi)

    log_likelihood = mse + sigma_trace + log2pi

    return K.mean(-log_likelihood)
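For anyone sanity-checking the loss: because the covariance is diagonal, the multivariate log-likelihood is just the sum of per-dimension univariate Gaussian log-densities, so a NumPy mirror of gaussian_nll can be verified against scipy.stats.norm. The function below is an illustrative mirror (same [mu | log(sigma)] packing), not part of the gist itself:

```python
import numpy as np

def gaussian_nll_np(ytrue, ypreds):
    """NumPy mirror of gaussian_nll: diagonal-covariance Gaussian NLL.

    ypreds packs [mu | log(sigma)] along the last axis, matching the
    Keras version above.
    """
    n_dims = ypreds.shape[1] // 2
    mu = ypreds[:, :n_dims]
    logsigma = ypreds[:, n_dims:]

    mse = -0.5 * np.sum(np.square((ytrue - mu) / np.exp(logsigma)), axis=1)
    sigma_trace = -np.sum(logsigma, axis=1)
    log2pi = -0.5 * n_dims * np.log(2 * np.pi)

    log_likelihood = mse + sigma_trace + log2pi
    return np.mean(-log_likelihood)
```

Each per-dimension term is -0.5*((y-mu)/sigma)^2 - log(sigma) - 0.5*log(2*pi), which is exactly what mse, sigma_trace, and log2pi add up to.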
@sitmo
Copy link

sitmo commented May 28, 2019

Thanks for sharing your code!

There is a little error: in mse you need to divide by "K.exp(2*logsigma)" instead of "K.exp(logsigma)".

Oops, it's fine as it is. I had misread the brackets! Perfect, sorry for the noise!

@gledsonmelotti
Copy link

Hi. I'm sorry for the inconvenience. Could you provide a complete example of a neural network using this cost function? I could not understand how to obtain the mean and variance parameters from the network. Thanks in advance for your attention, Gledson.

@sergeyprokudin
Copy link
Author


Hi Gledson!

Take a look at this small network: https://gist.github.com/sergeyprokudin/bb66fff8c672f8caab6bbb1056c7bd20

you can compile this model with

model.compile(loss=gaussian_nll, optimizer='Adam')

and use the predict_prob function to get mean and variance estimates.

Hope this helps,

Sergey
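(For readers without access to the linked gist: splitting the network output back into a mean and a standard deviation might look like the sketch below. The name split_prediction is mine, and the [mu | log(sigma)] packing is assumed to match what gaussian_nll expects.)

```python
import numpy as np

def split_prediction(ypreds):
    """Split a [n_samples, n_dims*2] prediction into (mu, sigma).

    Assumes the last axis packs [mu | log(sigma)], matching gaussian_nll;
    sigma is recovered by exponentiating the second half.
    """
    ypreds = np.asarray(ypreds)
    n_dims = ypreds.shape[1] // 2
    mu = ypreds[:, :n_dims]
    sigma = np.exp(ypreds[:, n_dims:])
    return mu, sigma
```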

@gledsonmelotti
Copy link

gledsonmelotti commented May 15, 2020

Hello sergeyprokudin, thank you very much. I have another doubt: don't you use softmax to predict multiple classes?
Can I do it with softmax and softplus?
mean = model.add(Dense(n_outputs, activation='softmax'))
sigma = model.add(Dense(n_outputs, activation='softplus'))
model = Model(x_input, output([mean, sigma]))

@sergeyprokudin
Copy link
Author

sergeyprokudin commented May 15, 2020


I'm afraid you are confusing regression and classification tasks. If you are interested in classification, you don't need the Gaussian negative log-likelihood loss defined in this gist: you can use the standard categorical crossentropy loss (https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy) and softmax activations to get valid class probabilities that sum to 1. You don't need to model sigmas separately, as (in theory) your softmax outputs already provide confidence estimates. In practice, however, you might want to calibrate them (see https://arxiv.org/abs/1706.04599 for a discussion of the topic).
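As an aside on the calibration point: the simplest post-hoc method discussed in that paper is temperature scaling, which divides the logits by a scalar T fitted on a validation set before applying softmax. A minimal NumPy sketch (illustrative; the function name is mine):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Temperature-scaled softmax: T > 1 softens overconfident
    probabilities, T = 1 recovers the plain softmax."""
    z = np.asarray(logits) / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```

The scaling leaves the predicted class (argmax) unchanged; it only adjusts how confident the probabilities look.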

@gledsonmelotti
Copy link

gledsonmelotti commented May 16, 2020

Hello @sergeyprokudin, how are you? In fact, I would like to use a Gaussian layer in my classification model, one that could output a mean and variance. Is this possible?


@sergeyprokudin
Copy link
Author


Gaussian distribution is defined over continuous domain, while in classification you regularly want to model the parameters of some categorical distribution. What would be the implied interpretation of mean and variance in your case?

@gledsonmelotti
Copy link


Yes. I now understand your explanation. In this case, could I consider the average to be the softmax value?

Best regards.

@sergeyprokudin
Copy link
Author


The class with the maximum probability value is the mode of the corresponding categorical probability distribution, not its mean value, which is undefined in this case. Hope this helps!

@gledsonmelotti
Copy link


Okay, now I understand. My doubts were clarified. Thank you very much for the information.
Best Regards.

@aangius
Copy link

aangius commented Jun 4, 2021

Hi,
why do you use a sum in this piece of code: sigma_trace = -K.sum(logsigma, axis=1)?

@lingleong981130
Copy link

Hi, may I know how to solve this error?
"ValueError: Dimensions must be equal, but are 128 and 64 for '{{node gaussian_nll/sub}} = Sub[T=DT_FLOAT](Cast, gaussian_nll/strided_slice)' with input shapes: [?,128,128,3], [?,64,128,3]."
