Skip to content

Instantly share code, notes, and snippets.

@patrickdrouin
patrickdrouin / gensim_word2vec_measure_semantic_shift_by_local_neighborhood.py This function measures the amount of semantic shift of a given word between two gensim word2vec models. It is a basic implementation of William Hamilton (@williamleif) et al's measure of semantic change proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821), which they call the "local neighborhood measure."
def measure_semantic_shift_by_neighborhood(model1,model2,word,k=25,verbose=False):
"""
Basic implementation of William Hamilton (@williamleif) et al's measure of semantic change
proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821),
which they call the "local neighborhood measure." They find this measure better suited to understand
the semantic change of nouns owing to "cultural shift," or changes in meaning "local" to that word,
rather than global changes in language ("linguistic drift") use that are better suited to a
Procrustes-alignment method (also described in the same paper.)
Arguments are:
@patrickdrouin
patrickdrouin / gensim_word2vec_make_semantic_network.py
Created March 19, 2019 20:40 — forked from quadrismegistus/gensim_word2vec_make_semantic_network.py
Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations) between a set of words in a gensim word2vec model.
"""
Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations)
between a set of words in a gensim word2vec model.
To use:
Set the filenames for the word2vec model.
Set `my_words` to be a list of your own choosing.
Set `num_top_dists` to be a number or a factor of the length of `my_words.`
Choose between the two methods below to produce distances, and comment-out the other one.
"""
@patrickdrouin
patrickdrouin / gensim_word2vec_procrustes_align.py
Created March 19, 2019 20:41 — forked from tangert/gensim_word2vec_procrustes_align.py
Code for aligning two or more word2vec models using Procrustes matrix alignment. Code originally ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>.
def align_gensim_models(models, words=None):
"""
Returns the aligned/intersected models from a list of gensim word2vec models.
Generalized from original two-way intersection as seen above.
Also updated to work with the most recent version of gensim
Requires reduce from functools
In order to run this, make sure you run 'model.init_sims()' for each model before you input them for alignment.