import numpy as np

from sklearn.datasets import make_classification
from sklearn.mixture import GaussianMixture as GMM


def fisher_vector(xx, gmm):
    """Computes the Fisher vector on a set of descriptors.

    Parameters
    ----------
    xx: array_like, shape (N, D) or (D, )
        The set of descriptors.

    gmm: fitted instance of sklearn.mixture.GaussianMixture
        Gaussian mixture model of the descriptors (diagonal covariances).

    Returns
    -------
    fv: array_like, shape (K + 2 * D * K, )
        Fisher vector (derivatives with respect to the mixing weights, means
        and variances) of the given descriptors.

    Reference
    ---------
    J. Krapac, J. Verbeek, F. Jurie. Modeling Spatial Layout with Fisher
    Vectors for Image Categorization. In ICCV, 2011.

    """
    xx = np.atleast_2d(xx)
    N = xx.shape[0]  # number of descriptors

    # Compute posterior probabilities.
    Q = gmm.predict_proba(xx)  # NxK

    # Compute the sufficient statistics of descriptors.
    Q_sum = np.sum(Q, 0)[:, np.newaxis] / N  # Kx1
    Q_xx = np.dot(Q.T, xx) / N               # KxD
    Q_xx_2 = np.dot(Q.T, xx ** 2) / N        # KxD

    # Compute derivatives with respect to mixing weights, means and variances.
    d_pi = Q_sum.squeeze() - gmm.weights_
    d_mu = Q_xx - Q_sum * gmm.means_
    d_sigma = (
        - Q_xx_2
        - Q_sum * gmm.means_ ** 2
        + Q_sum * gmm.covariances_
        + 2 * Q_xx * gmm.means_)

    # Merge derivatives into a vector.
    return np.hstack((d_pi, d_mu.flatten(), d_sigma.flatten()))


def main():
    # Short demo.
    K = 64
    N = 1000

    xx, _ = make_classification(n_samples=N)
    xx_tr, xx_te = xx[: -100], xx[-100:]

    # Fit the GMM on the training descriptors before encoding the test ones.
    gmm = GMM(n_components=K, covariance_type='diag')
    gmm.fit(xx_tr)

    fv = fisher_vector(xx_te, gmm)
    print(fv.shape)


if __name__ == '__main__':
    main()
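For readers following the algebra, the three derivative blocks can be written per mixture component $k$. This is a restatement of the code rather than a formula quoted from the paper: $q_{nk}$ denotes the posterior returned by `predict_proba`, squares are element-wise, and `Q_xx_2` is read as the first term of `d_sigma`, as in the variant posted later in this thread.

```latex
\begin{aligned}
\texttt{d\_pi}_k    &= \frac{1}{N}\sum_{n=1}^{N} q_{nk} - \pi_k, \\
\texttt{d\_mu}_k    &= \frac{1}{N}\sum_{n=1}^{N} q_{nk}\,(x_n - \mu_k), \\
\texttt{d\_sigma}_k &= -\frac{1}{N}\sum_{n=1}^{N} q_{nk}\,\bigl[(x_n - \mu_k)^2 - \sigma_k^2\bigr].
\end{aligned}
```

Expanding $(x_n - \mu_k)^2$ gives exactly the four terms built from `Q_xx_2`, `Q_xx` and `Q_sum` in the function.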
Nov 24, 2020
Thanks for the implementation.
I had a question about your implementation. I would be grateful if you could reply.
I just realized there is no Fisher information matrix in your implementation. However, in the paper "Fisher Kernels on Visual Vocabularies for Image Categorization", the authors mention:
To normalize the dynamic range of the different dimensions of the gradient vectors, we need to compute the diagonal of the Fisher information matrix F.
So doesn't leaving out the Fisher information matrix F degrade the performance of the FV?
Nov 24, 2020
Hello @sobhanhemati, you are right: my code doesn't include the normalization with the Fisher information matrix. In practice, we usually approximate this normalization by standardizing the Fisher vectors (scaling each dimension to zero mean and unit variance); the implementation will look something along these lines:
    from sklearn.preprocessing import StandardScaler

    fvs = np.vstack([fisher_vector(get_descs(img), gmm) for img in imgs])
    scaler = StandardScaler()
    fvs = scaler.fit(fvs).transform(fvs)
Standardizing the Fisher vectors corresponds to using a diagonal approximation of the sample covariance matrix of the Fisher vectors. For more information please check section 3.5 in (Krapac et al., 2011) and for an empirical evaluation of the performance see page 9 (approximate FIM vs. empirical FIM) in (Sanchez et al., 2013); the latter report the following accuracies for image classification (so the higher the values the better):
- 61.8% for analytical diagonal approximation;
- 60.6% for empirical diagonal approximation (the approach mentioned above);
- 59.8% for using the identity matrix as the Fisher information matrix (that is, performing no normalization).
Hope this helps!
P.S.: If the dimensionality of your data allows, you can also estimate the full sample covariance matrix (which is equivalent to whitening the Fisher vectors).
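To make the difference between the two empirical options concrete, here is a minimal sketch on random stand-in data (not real Fisher vectors). Using `PCA(whiten=True)` for the full-covariance case is my assumption about one convenient way to whiten; it whitens up to a rotation, and other whitening transforms would serve equally well.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in for a stack of Fisher vectors, one row per image (random,
# correlated dimensions; purely illustrative).
rng = np.random.default_rng(0)
fvs = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))

# Diagonal approximation: every dimension is centred and scaled on its own,
# so correlations between dimensions are left untouched.
fvs_std = StandardScaler().fit_transform(fvs)

# Full sample covariance: PCA with whiten=True also decorrelates the
# dimensions, so the sample covariance of the output is the identity.
fvs_white = PCA(whiten=True).fit_transform(fvs)

cov_std = np.cov(fvs_std, rowvar=False)
cov_white = np.cov(fvs_white, rowvar=False)
print(np.allclose(cov_white, np.eye(cov_white.shape[0])))
```

The full-covariance variant needs at least as many Fisher vectors as dimensions to be well conditioned, which is why it is only practical for low-dimensional encodings.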
Nov 25, 2020
Thank you for the clarification.
Do you have an implementation of the analytical diagonal approximation that I could add to the current implementation?
It seems that analytical diagonal approximation works about 1 percent better :))
Thank you in advance
Nov 25, 2020
@sobhanhemati Equations (16–18) from (Sanchez et al., 2013) provide the Fisher vectors that include the analytical approximation; hence, you can modify the computation of the derivatives (`d_pi`, `d_mu` and `d_sigma`) in the gist above as follows:
    # at line 43
    s = np.sqrt(gmm.weights_)[:, np.newaxis]
    d_pi = (Q_sum.squeeze() - gmm.weights_) / s.squeeze()
    d_mu = (Q_xx - Q_sum * gmm.means_) * np.sqrt(gmm.covariances_) ** -1 / s
    d_sigma = - (
        - Q_xx_2
        - Q_sum * gmm.means_ ** 2
        + Q_sum * gmm.covariances_
        + 2 * Q_xx * gmm.means_) / (s * np.sqrt(2))
Note that I haven't tested this implementation, so you might want to double-check it. I would also suggest trying both methods for estimating the diagonal Fisher information matrix and seeing which one works better for you; Sanchez et al. mention in their paper:
Note that we do not claim that this difference is significant nor that the closed-form approximation is superior to the empirical one in general.
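For comparison, below is a self-contained sketch of the closed-form normalization as I understand the equations cited above. It is an assumption-laden reading, not a verified port: the function name `fisher_vector_analytic` is hypothetical, the GMM is assumed to use `covariance_type='diag'`, the result is averaged over the N descriptors, and, unlike the untested snippet above, the variance term is divided by the variances.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fisher_vector_analytic(xx, gmm):
    """Fisher vector with closed-form diagonal FIM normalization (sketch)."""
    xx = np.atleast_2d(xx)
    N = xx.shape[0]
    Q = gmm.predict_proba(xx)             # N x K posteriors
    w = gmm.weights_[:, np.newaxis]       # K x 1 mixing weights
    mu = gmm.means_                       # K x D means
    var = gmm.covariances_                # K x D diagonal variances
    sigma = np.sqrt(var)

    # Sufficient statistics, averaged over the descriptors.
    Q_sum = Q.sum(axis=0)[:, np.newaxis] / N   # K x 1
    Q_xx = Q.T @ xx / N                        # K x D
    Q_xx_2 = Q.T @ xx ** 2 / N                 # K x D

    d_pi = ((Q_sum - w) / np.sqrt(w)).squeeze(axis=1)
    d_mu = (Q_xx - Q_sum * mu) / (sigma * np.sqrt(w))
    # Posterior-weighted mean of (x - mu)^2, expanded via the statistics.
    sq_dev = Q_xx_2 - 2 * Q_xx * mu + Q_sum * mu ** 2
    d_sigma = (sq_dev / var - Q_sum) / np.sqrt(2 * w)
    return np.hstack((d_pi, d_mu.ravel(), d_sigma.ravel()))


# Tiny demo on random descriptors (illustrative data only).
rng = np.random.default_rng(0)
xx = rng.normal(size=(500, 5))
gmm = GaussianMixture(n_components=4, covariance_type='diag',
                      random_state=0).fit(xx)
fv = fisher_vector_analytic(xx[:100], gmm)
print(fv.shape)  # (44,), i.e. K + 2 * D * K with K = 4, D = 5
```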
Finally, do not forget to power- and L2-normalise the Fisher vectors; these transformations yield much more substantial improvements (about 6-7 percentage points each) than the choice of the approximation for the Fisher information matrix (see Table 1 from Sanchez et al.).
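The two normalisation steps can be sketched as follows. The helper name `normalize_fv` is hypothetical, and the exponent `alpha=0.5` (a signed square root) is a commonly used value that I am assuming here; it may be worth tuning for your data.

```python
import numpy as np


def normalize_fv(fv, alpha=0.5):
    """Signed power normalisation followed by L2 normalisation."""
    # Power normalisation: shrink large magnitudes while keeping the sign.
    fv = np.sign(fv) * np.abs(fv) ** alpha
    # L2 normalisation: rescale to unit Euclidean norm (skip all-zero vectors).
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


fv = np.array([4.0, -9.0, 0.0, 1.0])
fv_norm = normalize_fv(fv)
print(np.linalg.norm(fv_norm))
```

When classifying, the same function would be applied to every Fisher vector, training and test alike, after any standardization.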
Nov 27, 2020
Thank you so much for the comprehensive answer.
I really appreciate that.
Jun 15, 2021
Hi, I'm a beginner, so I don't know how Fisher vector encoding works. Please help me understand it.
Yes, that's a valid observation, @khizerali! The reason is similar to what I've previously explained when motivating the missing 0.5 factor in `d_sigma`: since I'm standardizing the Fisher vectors, any linear transformation on a given dimension will be canceled out (and in this case, the inverse covariance represents a constant scaling applied to the `d_mu` component of all Fisher vectors). Practically, avoiding the multiplication with `gmm.covars_ ** -1` doesn't change the results, but saves some computation. In fact, I now notice that, for the same reason, I could have also avoided subtracting the GMM weights when computing