Fisher vectors with sklearn
import numpy as np
import pdb

from sklearn.datasets import make_classification
from sklearn.mixture import GMM


def fisher_vector(xx, gmm):
    """Computes the Fisher vector on a set of descriptors.

    Parameters
    ----------
    xx: array_like, shape (N, D) or (D, )
        The set of descriptors

    gmm: instance of sklearn mixture.GMM object
        Gaussian mixture model of the descriptors.

    Returns
    -------
    fv: array_like, shape (K + 2 * D * K, )
        Fisher vector (derivatives with respect to the mixing weights, means
        and variances) of the given descriptors.

    Reference
    ---------
    J. Krapac, J. Verbeek, F. Jurie. Modeling Spatial Layout with Fisher
    Vectors for Image Categorization. In ICCV, 2011.
    http://hal.inria.fr/docs/00/61/94/03/PDF/final.r1.pdf

    """
    xx = np.atleast_2d(xx)
    N = xx.shape[0]

    # Compute posterior probabilities.
    Q = gmm.predict_proba(xx)  # NxK

    # Compute the sufficient statistics of descriptors.
    Q_sum = np.sum(Q, 0)[:, np.newaxis] / N
    Q_xx = np.dot(Q.T, xx) / N
    Q_xx_2 = np.dot(Q.T, xx ** 2) / N

    # Compute derivatives with respect to mixing weights, means and variances.
    d_pi = Q_sum.squeeze() - gmm.weights_
    d_mu = Q_xx - Q_sum * gmm.means_
    d_sigma = (
        - Q_xx_2
        - Q_sum * gmm.means_ ** 2
        + Q_sum * gmm.covars_
        + 2 * Q_xx * gmm.means_)

    # Merge derivatives into a vector.
    return np.hstack((d_pi, d_mu.flatten(), d_sigma.flatten()))


def main():
    # Short demo.
    K = 64
    N = 1000

    xx, _ = make_classification(n_samples=N)
    xx_tr, xx_te = xx[:-100], xx[-100:]

    gmm = GMM(n_components=K, covariance_type='diag')
    gmm.fit(xx_tr)

    fv = fisher_vector(xx_te, gmm)
    pdb.set_trace()


if __name__ == '__main__':
    main()
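Note: the gist uses the old sklearn.mixture.GMM API; in recent scikit-learn (0.20 and later) GMM was removed in favor of GaussianMixture, which exposes covariances_ instead of covars_. A minimal sketch of the updated setup:

from sklearn.datasets import make_classification
from sklearn.mixture import GaussianMixture  # replaces GMM in scikit-learn >= 0.20

xx, _ = make_classification(n_samples=1000)
gmm = GaussianMixture(n_components=64, covariance_type='diag')
gmm.fit(xx[:-100])
# fisher_vector above reads gmm.covars_; with GaussianMixture, read
# gmm.covariances_ instead (the diagonal covariances, shape (K, D)).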
iakash2604 commented May 31, 2017

Thanks for the code.
I just have one doubt: based on the derivation of a Fisher vector for an image (as given on this page here), a Fisher vector will have a dimension of only 2KD and not K + 2 * D * K.

Can you please look into it? I may be wrong.
Thanks again.

sitzikbs commented Jul 11, 2017

@iakash2604 - It is not wrong (at least not regarding what you mentioned). In your link, they only use the derivatives with respect to the means and sigmas. In the original paper they also use the derivatives with respect to the weights. In another paper they mention that the derivatives with respect to the weights don't contribute much, hence the difference.
Here is a link to the original paper.

What currently bothers me is the sign of the derivative with respect to sigma. I am not sure it's correct (it should be multiplied by -1), but I am still checking.

danoneata commented May 7, 2018

Sorry for the late reply – unfortunately, Gist doesn't trigger a notification when a comment is posted.

@iakash2604 wrote:

based on the derivation of a Fisher vector for an image [...] a Fisher vector will have a dimension of only 2KD and not K + 2DK

As @sitzikbs already explained, my implementation includes the derivatives with respect to the weights, hence the additional K values. My code is based on the paper of (Krapac et al., 2011), equations 15–17.
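As a quick sanity check of the output dimension, something along these lines can be run (a minimal sketch reusing the demo setup from the gist; fisher_vector is the function defined above and GMM is the old scikit-learn API):

from sklearn.datasets import make_classification
from sklearn.mixture import GMM  # GaussianMixture in recent scikit-learn

K, D = 64, 20
xx, _ = make_classification(n_samples=1000, n_features=D)
gmm = GMM(n_components=K, covariance_type='diag')
gmm.fit(xx[:-100])

fv = fisher_vector(xx[-100:], gmm)  # fisher_vector as defined in the gist above
# K derivatives w.r.t. the weights + K * D w.r.t. the means + K * D w.r.t. the variances.
assert fv.shape == (K + 2 * K * D,)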

@sitzikbs wrote:

What currently bothers me is the sign of the derivative with respect to sigma. I am not sure it's correct (it should be multiplied by -1), but I am still checking.

I believe the current implementation uses the correct sign (if you are not convinced, I can go into more detail). But I want to point out that, compared to the paper, the derivatives with respect to the variances miss a scaling by 0.5. From a practical point of view, this scaling doesn't matter, since the Fisher vectors are standardized (zero mean, unit variance across each dimension) in order to approximate the transformation with the inverse of the Fisher information matrix; see section 3.5 from (Krapac et al., 2011). This standardization step removes the scaling of each dimension. Also, do not forget to L2 and power normalize the Fisher vectors before feeding them into the classifier; it has been shown to visibly improve performance, see table 1 in (Sanchez et al., 2013). Here is a code example of how the Fisher vectors are prepared for classification by normalizing and building the corresponding kernel matrices.
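For completeness, the power and L2 normalization step usually looks something like the sketch below (normalize_fisher_vector is a hypothetical helper, not taken from the linked code example):

import numpy as np

def normalize_fisher_vector(fv, alpha=0.5, eps=1e-12):
    # Power normalization (signed square-root when alpha = 0.5).
    fv = np.sign(fv) * np.abs(fv) ** alpha
    # L2 normalization.
    return fv / (np.linalg.norm(fv) + eps)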

vanetoj commented Apr 27, 2020

What is Q_sum?

danoneata commented Apr 28, 2020

Hi @vanetoj

What is Q_sum?

The matrix Q corresponds to the posterior distribution of the latent variables (the clusters from 1 to K) given the data (denoted by xx), that is:

Q[n, k] = p(k | xx[n])

To obtain Q_sum we average over the first dimension of Q, namely the N data points. This step leaves us with K values, which can be interpreted as a new estimate of the prior probability on the K clusters (pi) or, similarly, as an M-step update given the data xx; see (Bishop, 2006), eqs. (9.26) and (9.27).
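To make this concrete, here is a small sketch (reusing the demo setup from the gist and the old GMM API) showing that Q_sum has K entries and sums to one, just like the mixing weights:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.mixture import GMM  # GaussianMixture in recent scikit-learn

xx, _ = make_classification(n_samples=1000)
gmm = GMM(n_components=64, covariance_type='diag')
gmm.fit(xx)

Q = gmm.predict_proba(xx)       # Q[n, k] = p(k | xx[n]), shape (N, K)
Q_sum = np.sum(Q, 0) / len(xx)  # average responsibility of each cluster, shape (K, )
# Each row of Q sums to one, so Q_sum sums to one as well and can be read as a
# re-estimate of the mixing weights pi.
assert np.allclose(np.sum(Q_sum), 1.0)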

Does this help?

vanetoj commented May 5, 2020

yes, thank you!

khizerali commented Jul 3, 2020

Hello, what are the equation numbers of d_mu, d_pi, d_sigma in the paper mentioned? Because I could not understand how the d_sigma equation is computed from the paper.

danoneata commented Jul 3, 2020

Hello, what are the equation numbers of d_mu, d_pi, d_sigma in the paper mentioned? Because I could not understand how the d_sigma equation is computed from the paper.

Hello @khizerali!

The code implements equations 15–17 from (Krapac et al., 2011). Specifically, d_sigma is based on equation 17:

q_nk . (Σ_k - x_nk ** 2) / 2

Since x_nk is defined as x_n - μ_k, we have

q_nk . (Σ_k - (x_n - μ_k) ** 2) / 2

and by further expanding the terms we get

(q_nk . Σ_k - q_nk . x_n ** 2 - q_nk . μ_k ** 2 + 2 * q_nk . x_n . μ_k) / 2

The code is a vectorized version of the above formula that also averages over the set of N descriptors (the dimension denoted by n). It might be worth keeping in mind the following correspondences between code and math notation:

  • Q_sum corresponds to averaging q_nk over n (that is, 1 / N . Σ_n q_nk)
  • Q_xx_2 corresponds to averaging q_nk . x_n ** 2 over n (that is, 1 / N . Σ_n q_nk . x_n ** 2)
  • gmm.means_ corresponds to the means μ_k
  • gmm.covars_ corresponds to the diagonal of the covariance matrix Σ_k.

I hope this is somewhat clear 🙂
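If it helps, the algebra above can also be checked numerically with random stand-ins for the GMM quantities (an illustrative sketch, not part of the gist; both sides drop the 1/2 factor, as the code does):

import numpy as np

N, K, D = 5, 3, 4
rng = np.random.RandomState(0)
xx = rng.randn(N, D)                    # stand-in for the descriptors
Q = rng.dirichlet(np.ones(K), size=N)   # stand-in for gmm.predict_proba(xx)
means = rng.randn(K, D)                 # stand-in for gmm.means_
covars = rng.rand(K, D) + 0.1           # stand-in for gmm.covars_ (diagonal)

# Vectorized d_sigma, exactly as in the gist.
Q_sum = np.sum(Q, 0)[:, np.newaxis] / N
Q_xx = np.dot(Q.T, xx) / N
Q_xx_2 = np.dot(Q.T, xx ** 2) / N
d_sigma = - Q_xx_2 - Q_sum * means ** 2 + Q_sum * covars + 2 * Q_xx * means

# Explicit loop over descriptors n and components k, following equation 17.
d_sigma_loop = np.zeros((K, D))
for n in range(N):
    for k in range(K):
        d_sigma_loop[k] += Q[n, k] * (covars[k] - (xx[n] - means[k]) ** 2)
d_sigma_loop /= N

assert np.allclose(d_sigma, d_sigma_loop)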

khizerali commented Jul 4, 2020

@danoneata, thanks for your explanation. Now I understand how this equation is formed. But why is inv(gmm.covars_) not used in the d_mu equation, according to equation 16?

danoneata commented Jul 4, 2020

But why is inv(gmm.covars_) not used in the d_mu equation, according to equation 16?

Yes, that's a valid observation, @khizerali! The reason is similar to what I've previously explained when motivating the missing 0.5 factor in d_sigma – since I'm standardizing the Fisher vectors, any linear transformation of a given dimension will be canceled out (and in this case, the inverse covariance represents a constant scaling applied to the d_mu component of all Fisher vectors). Practically, avoiding the multiplication with gmm.covars_ ** -1 doesn't change the results, but it saves some computation. In fact, now I notice that, for the same reason, I could have also avoided subtracting the GMM weights when computing d_pi.
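A toy illustration of that argument (not part of the gist): standardizing across a set of vectors cancels any fixed positive per-dimension scaling, such as the inverse covariances that would multiply the d_mu block.

import numpy as np

rng = np.random.RandomState(0)
fvs = rng.randn(100, 10)        # stand-in for a set of Fisher vectors
scales = rng.rand(10) + 0.5     # a fixed positive scaling per dimension

def standardize(X):
    # Zero mean, unit variance across each dimension.
    return (X - X.mean(axis=0)) / X.std(axis=0)

assert np.allclose(standardize(fvs), standardize(fvs * scales))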
