patrickdrouin

View GitHub Profile
@patrickdrouin
patrickdrouin / gensim_word2vec_measure_semantic_shift_by_local_neighborhood.py
This function measures the amount of semantic shift of a given word between two gensim word2vec models. It is a basic implementation of William Hamilton (@williamleif) et al.'s measure of semantic change proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821), which they call the "local neighborhood measure."
def measure_semantic_shift_by_neighborhood(model1, model2, word, k=25, verbose=False):
    """
    Basic implementation of William Hamilton (@williamleif) et al.'s measure of semantic change
    proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821),
    which they call the "local neighborhood measure." They find this measure better suited to understanding
    the semantic change of nouns owing to "cultural shift," or changes in meaning "local" to that word,
    rather than global changes in language use ("linguistic drift"), which are better suited to a
    Procrustes-alignment method (also described in the same paper).
    Arguments are:
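The preview above is cut off before the function body. As a rough illustration of the local neighborhood idea only, here is a minimal sketch that works on plain `{word: vector}` dictionaries instead of gensim models. The names `local_neighborhood_distance`, `_unit`, and `_nearest` are hypothetical and not part of the gist, and restricting neighbors to the shared vocabulary is an assumption made here so that every neighbor has a vector in both spaces.

```python
import numpy as np


def _unit(vector):
    """Return the vector scaled to unit length."""
    vector = np.asarray(vector, dtype=float)
    return vector / np.linalg.norm(vector)


def _nearest(vectors, word, k, allowed):
    """Return the k words in `allowed` most cosine-similar to `word`."""
    target = _unit(vectors[word])
    sims = {w: float(_unit(vectors[w]) @ target) for w in allowed if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:k]


def local_neighborhood_distance(vectors1, vectors2, word, k=25):
    """Cosine distance between the word's second-order similarity vectors.

    Hypothetical helper: `vectors1` and `vectors2` map words to embeddings.
    Assumes the word has at least one neighbor in the shared vocabulary.
    """
    shared = sorted(set(vectors1) & set(vectors2))
    if word not in shared:
        raise KeyError(f"{word!r} is not in both vocabularies")

    # Union of the word's k nearest neighbors in each space.
    neighbors = sorted(
        set(_nearest(vectors1, word, k, shared))
        | set(_nearest(vectors2, word, k, shared))
    )

    # Second-order vectors: similarity of the word to each neighbor, per space.
    s1 = np.array([_unit(vectors1[word]) @ _unit(vectors1[n]) for n in neighbors])
    s2 = np.array([_unit(vectors2[word]) @ _unit(vectors2[n]) for n in neighbors])

    cosine = float(s1 @ s2 / (np.linalg.norm(s1) * np.linalg.norm(s2)))
    return 1.0 - cosine
```

Because each word is only compared with neighbors inside its own space, this sketch does not need the two spaces to be aligned first. With gensim models, the dictionaries could presumably be filled from `model.wv[word]` lookups.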
patrickdrouin / gensim_word2vec_make_semantic_network.py
Created Mar 19, 2019 — forked from quadrismegistus/gensim_word2vec_make_semantic_network.py
Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations) between a set of words in a gensim word2vec model.
"""
Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations)
between a set of words in a gensim word2vec model.
To use:
Set the filenames for the word2vec model.
Set `my_words` to be a list of your own choosing.
Set `num_top_dists` to be a number or a factor of the length of `my_words`.
Choose between the two methods below to produce distances, and comment-out the other one.
"""
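The steps in the docstring can be sketched roughly as follows. This is an assumption-laden illustration, not the gist's code: `strongest_edges` is a hypothetical helper, and it reads vectors from a plain `{word: vector}` dictionary rather than a loaded word2vec model. `my_words` and `num_top_dists` keep the meaning given above, with `num_top_dists` taken as a plain count.

```python
from itertools import combinations

import numpy as np


def strongest_edges(vectors, my_words, num_top_dists):
    """Return the `num_top_dists` closest word pairs as (word_a, word_b, distance).

    Hypothetical helper: `vectors` maps each word to its embedding.
    """
    # Normalize once so that a dot product equals cosine similarity.
    unit = {
        w: np.asarray(vectors[w], dtype=float) / np.linalg.norm(vectors[w])
        for w in my_words
    }

    # Cosine distance for every unordered pair of words.
    dists = [
        (1.0 - float(unit[a] @ unit[b]), a, b)
        for a, b in combinations(my_words, 2)
    ]

    # Shortest distances first; these are the strongest associations.
    dists.sort()
    return [(a, b, d) for d, a, b in dists[:num_top_dists]]
```

The resulting triples form a weighted edge list, which a graph library could load as a network.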
patrickdrouin / gensim_word2vec_procrustes_align.py
Created Mar 19, 2019 — forked from tangert/gensim_word2vec_procrustes_align.py
Code for aligning two or more word2vec models using Procrustes matrix alignment. Code originally ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>.
def align_gensim_models(models, words=None):
    """
    Returns the aligned/intersected models from a list of gensim word2vec models.
    Generalized from original two-way intersection as seen above.
    Also updated to work with the most recent version of gensim.
    Requires `reduce` from `functools`.
    Before running this, make sure you call `model.init_sims()` on each model you pass in for alignment.
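The preview stops before the function body, so the following is only a minimal sketch of the two ideas the docstring names: intersecting several vocabularies with `reduce`, and an orthogonal Procrustes rotation computed with NumPy's SVD. `shared_vocab` and `procrustes_align` are hypothetical helpers that work on plain arrays, not gensim models. The sketch assumes that row `i` of both matrices holds the vector for the same shared word and that rows are already unit-normalized.

```python
from functools import reduce

import numpy as np


def shared_vocab(vocabs):
    """Return the sorted words common to every vocabulary in `vocabs`."""
    return sorted(reduce(lambda a, b: a & b, (set(v) for v in vocabs)))


def procrustes_align(base, other):
    """Rotate `other` onto `base` with an orthogonal Procrustes fit.

    Both arguments are (n_words, dim) arrays with matching row order.
    """
    # The SVD of other^T @ base gives the rotation that minimizes
    # the Frobenius norm of (other @ rotation - base).
    u, _, vt = np.linalg.svd(other.T @ base)
    rotation = u @ vt
    return other @ rotation
```

To align more than two spaces, one option is to pick one matrix as the base and rotate each of the others onto it in turn.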