@lgalke

lgalke/all_but_the_top.py

Last active Nov 18, 2019
Word Embedding Postprocessing: All but the top
"""
All-but-the-Top: Simple and Effective Postprocessing for Word Representations
Paper: https://arxiv.org/abs/1702.01417
Last Updated: Fri 15 Nov 2019 11:47:00 AM CET
**A prior version had serious issues; please excuse any inconvenience.**
"""
import numpy as np
from sklearn.decomposition import PCA
def all_but_the_top(v, D):
    """
    Arguments:
        :v: word vectors of shape (n_words, n_dimensions)
        :D: number of principal components to subtract
    """
    # 1. Subtract the mean vector
    v_tilde = v - np.mean(v, axis=0)
    # 2. Compute the first `D` principal components
    #    on the centered embedding vectors
    u = PCA(n_components=D).fit(v_tilde).components_  # [D, emb_size]
    # 3. Subtract the projection onto the first `D` principal components
    # [vocab_size, emb_size] @ [emb_size, D] @ [D, emb_size] -> [vocab_size, emb_size]
    return v_tilde - (v @ u.T @ u)
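
For example, a minimal usage sketch on random stand-in vectors (the choice D = 3 follows the paper's rule of thumb D ≈ n_dimensions / 100):

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(10_000, 300))   # stand-in for real word vectors
processed = all_but_the_top(embeddings, D=3)  # 300 / 100 = 3
assert processed.shape == embeddings.shape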
@liaocs2008 commented Jul 19, 2018

I think line 24 is not as in the paper. It should be something like "u[:i].dot(v[w])*u[:,i].T" rather than an element-wise multiplication.
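
Roughly, in numpy terms (a toy sketch with made-up shapes):

import numpy as np

v = np.random.randn(5, 4)   # toy word vectors
u = np.random.randn(2, 4)   # toy principal components as rows
projection = (v @ u.T) @ u  # the paper's sum_i (u_i . v) u_i, given orthonormal rows of u
elementwise = v * u[0]      # element-wise product: a different operation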

@aerinkim commented Aug 12, 2018

Hey, this implementation is wrong. You shouldn't use fit_transform. You need U, the singular vectors themselves.

@gojomo commented Nov 14, 2019

Using this all-but-the-top transformation didn't result in the expected improvement on a word-vector evaluation. (Specifically, questions-words.txt analogies-correct on 'GoogleNews' top-200k words.) In fact, it drove accuracy down from 75.37% to 59.45%.

Using instead the all-but-the-top implementation at https://github.com/s1998/All-but-the-top/blob/e0c7d758b495ad55868d9a14ecd31df86b79e4d3/src/embeddings_processor.py#L4 slightly improved accuracy, as would be expected from the paper's claims, to 75.79%.

So: more evidence this implementation is off.
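
For reference, that kind of evaluation can be sketched with gensim 4.x (the file paths and the D = 3 setting here are assumptions):

from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True, limit=200_000
)
base, _ = kv.evaluate_word_analogies("questions-words.txt")

kv.vectors = all_but_the_top(kv.vectors, D=3)
kv.norms = None  # drop cached norms so similarity queries see the new vectors
post, _ = kv.evaluate_word_analogies("questions-words.txt")
print(f"analogy accuracy: {base:.2%} -> {post:.2%}")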

@lgalke (owner) commented Nov 14, 2019

@gojomo thanks for pointing that out! I will try to fix it as soon as possible. Please excuse the confusion.

@lgalke (owner) commented Nov 14, 2019

I adapted the code to match the linked implementation. The changes are also ported back into vec4ir.

@gojomo commented Nov 15, 2019

Thanks - trying this version (which ends with return v - (v_tilde @ u.T @ u)) gives the expected behavior in my evaluations!

@lgalke (owner) commented Nov 15, 2019

@gojomo I just double-checked against the paper itself, and it should be v_tilde - (v @ u.T @ u) instead of v - (v_tilde @ ...).

Applying PCA to the centered or the non-centered version should not make a difference. The important thing is to subtract the mean from the embeddings, to match the paper. v_tilde holds the centered version in this gist, so we now return the embeddings minus the mean minus the projection onto the first D principal components.
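
A quick sketch of that equivalence (sklearn's PCA centers its input internally, so both fits see the same data):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
v = rng.normal(loc=5.0, size=(1000, 50))  # deliberately non-centered
v_tilde = v - v.mean(axis=0)

u_raw = PCA(n_components=3).fit(v).components_
u_centered = PCA(n_components=3).fit(v_tilde).components_
assert np.allclose(u_raw, u_centered)  # identical components either way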

@s1998: I think this last point is also not considered in your implementation. In line 11 you do not subtract from the centered version. Am I missing something?

@s1998 commented Nov 18, 2019

@lgalke You are correct; it should be the mean-centered embeddings that I subtract from. I'll fix that, thanks.
