zhicongchen / gensim_word2vec_procrustes_align.py
Last active June 3, 2024 01:54 — forked from quadrismegistus/gensim_word2vec_procrustes_align.py
Code for aligning two gensim word2vec models using Procrustes matrix alignment (updated for compatibility with the Gensim 4.0 API). The code is modified from https://gist.github.com/quadrismegistus/09a93e219a6ffc4f216fb85235535faf, which was originally ported from HistWords by William Hamilton: https://github.com/williamleif/histwords
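As a rough illustration of the Procrustes step itself, independent of gensim, the following minimal sketch finds the orthogonal matrix that best maps one set of row vectors onto another using an SVD. The function name `orthogonal_procrustes_rotation` and the synthetic data are assumptions made for this example and are not part of the gist; the sketch also assumes both matrices already share the same row order (the same vocabulary).

```python
import numpy as np


def orthogonal_procrustes_rotation(base_vecs, other_vecs):
    """Return the orthogonal matrix R minimizing ||other_vecs @ R - base_vecs||_F.

    Hypothetical helper for illustration; both inputs are (n_words, dim)
    arrays whose rows refer to the same words in the same order.
    """
    # Cross-covariance between the two embedding spaces.
    m = other_vecs.T @ base_vecs
    # SVD of the cross-covariance; np.linalg.svd returns V already transposed.
    u, _, vt = np.linalg.svd(m)
    # The optimal orthogonal map is U @ V^T.
    return u @ vt


# Synthetic check: build `other` as a rotated copy of `base`.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 8))
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
other = base @ q.T                            # so that other @ q == base

rotation = orthogonal_procrustes_rotation(base, other)
aligned = other @ rotation                    # should recover `base`
```

Because the solution is constrained to be orthogonal, the alignment preserves distances and cosine similarities within the rotated space; only its orientation relative to the base space changes.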
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
    """
    Original script: https://gist.github.com/quadrismegistus/09a93e219a6ffc4f216fb85235535faf
    Procrustes align two gensim word2vec models (to allow comparison of the same word across models).
    Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>.
    First, intersect the vocabularies (see `intersection_align_gensim` documentation).
    Then do the alignment on the other_embed model.
    Replace the other_embed model's word-vector matrix (`wv.vectors` in Gensim 4.0; `syn0` and `syn0norm` in earlier versions) with the aligned version.
    Return other_embed.