All-but-the-Top: Simple and Effective Postprocessing for Word Representations
Last Updated: Fri 15 Nov 2019 11:47:00 AM CET
**The prior version had serious issues, please excuse any inconvenience.**
```python
import numpy as np
from sklearn.decomposition import PCA


def all_but_the_top(v, D):
    """All-but-the-Top postprocessing (Mu & Viswanath, ICLR 2018).

    :v: word vectors of shape (n_words, n_dimensions)
    :D: number of principal components to subtract
    """
    # 1. Subtract the mean vector.
    v_tilde = v - np.mean(v, axis=0)
    # 2. Compute the first `D` principal components
    #    on the centered embedding vectors.
    u = PCA(n_components=D).fit(v_tilde).components_  # [D, emb_size]
    # 3. Subtract the projections onto the first `D` principal components.
    #    Following Algorithm 1 of the paper, the projection is computed
    #    from the original vectors `v`, not the centered `v_tilde`.
    # [vocab_size, emb_size] @ [emb_size, D] @ [D, emb_size] -> [vocab_size, emb_size]
    return v_tilde - (v @ u.T @ u)
```
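A minimal usage sketch (random vectors stand in for real embeddings; the shapes and `D=3` here are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=(10_000, 300))  # stand-in for (n_words, n_dimensions) embeddings
v_post = all_but_the_top(v, D=3)    # the paper suggests D around emb_size / 100
print(v_post.shape)                 # (10000, 300)
```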
Using this all-but-the-top transformation didn't result in the expected improvement on a word-vector evaluation. (Specifically, …)
Using instead the all-but-the-top implementation at https://github.com/s1998/All-but-the-top/blob/e0c7d758b495ad55868d9a14ecd31df86b79e4d3/src/embeddings_processor.py#L4 slightly improved accuracy to 75.79%, as would be expected from the paper's claims.
So: more evidence this implementation is off.
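For concreteness, the difference under discussion comes down to which vectors get projected onto the top principal components. Here is a sketch of the centered-projection variant (an assumption about what the linked code does; `all_but_the_top_centered` is a hypothetical name, not copied from that repo):

```python
import numpy as np
from sklearn.decomposition import PCA


def all_but_the_top_centered(v, D):
    # Identical to the gist version above, except the projection is
    # computed from the centered vectors `v_tilde` rather than `v`.
    # (Assumption: this is the variant the linked implementation uses.)
    v_tilde = v - np.mean(v, axis=0)
    u = PCA(n_components=D).fit(v_tilde).components_  # [D, emb_size]
    return v_tilde - (v_tilde @ u.T @ u)
```

The two versions differ exactly by the projection of the mean vector µ onto the top components, since uᵢᵀv(w) = uᵢᵀṽ(w) + uᵢᵀµ.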
@gojomo I just double checked with the paper itself and it should be `v_tilde - (v @ u.T @ u)`: Algorithm 1 computes the projection coefficients from the original vectors, i.e. v′(w) = ṽ(w) − Σᵢ (uᵢᵀ v(w)) uᵢ.
Applying PCA to centered / non-centered versions should not make a difference. The important thing is to subtract the mean from the embeddings to match the paper.
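That claim is straightforward to verify: scikit-learn's `PCA` centers the data internally during `fit`, so fitting on `v` or on `v_tilde` recovers the same principal directions (up to the sign of each component). A quick sanity check, comparing projection matrices so sign flips don't matter:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
v = rng.normal(size=(1_000, 50)) + 5.0  # deliberately non-centered data
v_tilde = v - v.mean(axis=0)

u1 = PCA(n_components=3).fit(v).components_
u2 = PCA(n_components=3).fit(v_tilde).components_

# The projection matrix u.T @ u is invariant to the arbitrary sign
# of each principal component, so it is the safe thing to compare.
print(np.allclose(u1.T @ u1, u2.T @ u2))  # True
```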