Ryan Heuser quadrismegistus

lltk-readme.ipynb
@quadrismegistus
quadrismegistus / pmap.py
Last active May 4, 2021
Easy parallel processing in Python with a progress bar
"""
Simple mofo'n parallelism with progress bar. Born of frustration with p_tqdm.
"""
def pmap_do(inp):
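The preview stops at the signature of `pmap_do`, so the gist's own implementation is not shown here. As a rough, hypothetical sketch of the same idea (not the author's code), the standard library's `concurrent.futures` can map a function over inputs in parallel while redrawing a one-line progress bar on stderr as results arrive. The names `pmap`, `use_threads` and `_progress` are illustrative assumptions.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def _progress(done, total, desc="", width=30):
    """Redraw a one-line text progress bar on stderr."""
    filled = int(width * done / total) if total else width
    bar = "#" * filled + "-" * (width - filled)
    sys.stderr.write(f"\r{desc} [{bar}] {done}/{total}")
    if done == total:
        sys.stderr.write("\n")  # finish the line once every item is done
    sys.stderr.flush()


def pmap(func, items, num_workers=None, use_threads=False, desc="", chunksize=1):
    """Apply `func` to every item in parallel and return results in input order."""
    items = list(items)
    pool_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    results = []
    with pool_cls(max_workers=num_workers) as pool:
        # Executor.map yields results in input order, so the bar advances
        # each time the next-in-line result becomes available.
        for result in pool.map(func, items, chunksize=chunksize):
            results.append(result)
            _progress(len(results), len(items), desc)
    return results
```

With worker processes (the default), the mapped function should be defined at module top level so it can be pickled, and `pmap` should be called under an `if __name__ == "__main__":` guard. `use_threads=True` suits I/O-bound work or functions that cannot be pickled.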
quadrismegistus / brute_txt.py
Last active Nov 2, 2020
Brute-force convert any file into a plain text string, as best as possible
# install
# pip install bs4 fulltext epub-conversion pymupdf requests xml_cleaner html2text kitchen -q
# imports
import os
from kitchen.text.converters import to_unicode
# constants
WORKING_EXTS={'txt','pdf','epub','html','xml','htm'}
CONTENT_TAGS={'xml':['p'],'html':['p'],'htm':['p'],'epub':['p']}
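Only the imports and constants are visible above. The gist relies on third-party converters (bs4, pymupdf, epub-conversion and others) that are not reproduced here. The following is a hypothetical, stdlib-only sketch of just the `CONTENT_TAGS` idea for HTML/XML files, using `html.parser.HTMLParser` in place of bs4 to pull text out of `<p>` tags. PDF and EPUB handling is deliberately left out, and `file_to_txt` is an illustrative name.

```python
import os
from html.parser import HTMLParser

# Adapted from the gist's constant; 'epub' (and PDF) are omitted because
# they need third-party parsers that this stdlib-only sketch does not use.
CONTENT_TAGS = {'xml': ['p'], 'html': ['p'], 'htm': ['p']}


class _TagTextCollector(HTMLParser):
    """Collect the text found inside any of the given tags."""

    def __init__(self, tags):
        super().__init__()  # convert_charrefs=True turns &amp; etc. into characters
        self.tags = set(tags)
        self.depth = 0      # how many content tags we are currently inside
        self.current = []   # text pieces of the block being read
        self.chunks = []    # finished, whitespace-normalized blocks

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth:
            self.depth -= 1
            if self.depth == 0:
                text = " ".join("".join(self.current).split())
                if text:
                    self.chunks.append(text)
                self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def file_to_txt(path):
    """Best-effort plain text from a .txt, .html, .htm or .xml file."""
    ext = os.path.splitext(path)[1].lower().lstrip('.')
    if ext != 'txt' and ext not in CONTENT_TAGS:
        raise ValueError(f"extension not handled by this sketch: {ext!r}")
    with open(path, encoding='utf-8', errors='replace') as f:
        raw = f.read()
    if ext == 'txt':
        return raw
    parser = _TagTextCollector(CONTENT_TAGS[ext])
    parser.feed(raw)
    parser.close()
    return "\n\n".join(parser.chunks)  # one paragraph per content tag
```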
opentext.ipynb
quadrismegistus / pyvis_for_networkx.py
Last active Jun 4, 2021
Draw Networkx Graph with Pyvis
def draw_graph3(networkx_graph,notebook=True,output_filename='graph.html',show_buttons=True,only_physics_buttons=False):
"""
This function accepts a networkx graph object,
converts it to a pyvis network object preserving its node and edge attributes,
and both returns and saves a dynamic network visualization.
Valid node attributes include:
"size", "value", "title", "x", "y", "label", "color".
(For more info: https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network.add_node)
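The preview ends inside the docstring. Here is a hedged sketch of the conversion it describes: the helper copies nodes and edges out of a networkx-style graph (anything offering `nodes(data=True)` and `edges(data=True)`), keeps only the node attributes the docstring lists as valid, and then feeds them to pyvis's `Network.add_node` / `Network.add_edge`. The function names are hypothetical, edge attributes are passed through unfiltered, and saving with `save_graph` instead of the gist's own display logic is an assumption.

```python
# Node attributes that the docstring above lists as valid for pyvis.
VALID_NODE_ATTRS = {"size", "value", "title", "x", "y", "label", "color"}


def _pyvis_id(node):
    """pyvis only accepts str or int node ids, so stringify anything else."""
    return node if isinstance(node, (int, str)) else str(node)


def graph_to_pyvis_items(graph):
    """Return (node, attrs) and (src, dst, attrs) lists ready for pyvis."""
    nodes = [
        (_pyvis_id(n), {k: v for k, v in data.items() if k in VALID_NODE_ATTRS})
        for n, data in graph.nodes(data=True)
    ]
    edges = [
        (_pyvis_id(u), _pyvis_id(v), dict(data))
        for u, v, data in graph.edges(data=True)
    ]
    return nodes, edges


def draw_with_pyvis(graph, notebook=False, output_filename="graph.html",
                    show_buttons=True, only_physics_buttons=False):
    """Build a pyvis Network from `graph` and save it as HTML (needs `pip install pyvis`)."""
    from pyvis.network import Network  # imported lazily; the helpers above need only stdlib

    net = Network(notebook=notebook)
    nodes, edges = graph_to_pyvis_items(graph)
    for node, attrs in nodes:
        net.add_node(node, **attrs)
    for src, dst, attrs in edges:
        net.add_edge(src, dst, **attrs)
    if show_buttons:
        net.show_buttons(filter_=["physics"] if only_physics_buttons else None)
    net.save_graph(output_filename)
    return net
```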
quadrismegistus / gensim_word2vec_measure_semantic_shift_by_local_neighborhood.py
Last active Feb 20, 2020
This function measures the amount of semantic shift of a given word between two gensim word2vec models. It is a basic implementation of William Hamilton (@williamleif) et al.'s measure of semantic change proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821), which they call the "local neighborhood measure."
def measure_semantic_shift_by_neighborhood(model1,model2,word,k=25,verbose=False):
"""
Basic implementation of William Hamilton (@williamleif) et al.'s measure of semantic change
proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821),
which they call the "local neighborhood measure." They find this measure better suited to understanding
the semantic change of nouns owing to "cultural shift," or changes in meaning "local" to that word,
rather than to global changes in language use ("linguistic drift"), which are better suited to a
Procrustes-alignment method (also described in the same paper).
Arguments are:
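The preview stops at the argument list. Following the paper's description of the local neighborhood measure, a minimal numpy sketch might take the `k` nearest neighbors of the word in each model, pool them, describe the word in each model by its cosine similarities to that pooled set (a "second-order" vector), and report the cosine distance between the two vectors. Plain vocabulary lists and numpy matrices stand in for gensim models here (with gensim 4.x, `model.wv.index_to_key` and `model.wv.vectors` are one way to obtain them). Restricting the pooled neighbors to words found in both vocabularies is an assumption of this sketch, not necessarily what the gist does.

```python
import numpy as np


def _unit(matrix):
    """Row-normalize so that dot products become cosine similarities."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.where(norms == 0, 1, norms)


def nearest_neighbors(vocab, vectors, word, k):
    """The k words most cosine-similar to `word`, excluding the word itself."""
    unit = _unit(vectors)
    sims = unit @ unit[vocab.index(word)]
    order = [i for i in np.argsort(-sims) if vocab[i] != word]
    return [vocab[i] for i in order[:k]]


def local_neighborhood_shift(vocab1, vectors1, vocab2, vectors2, word, k=25):
    """Cosine distance between the word's second-order similarity vectors in two models."""
    pooled = set(nearest_neighbors(vocab1, vectors1, word, k))
    pooled |= set(nearest_neighbors(vocab2, vectors2, word, k))
    # Assumption: keep only neighbors that exist in both vocabularies.
    shared = sorted(n for n in pooled if n in vocab1 and n in vocab2)

    def second_order(vocab, vectors):
        unit = _unit(vectors)
        target = unit[vocab.index(word)]
        return np.array([unit[vocab.index(n)] @ target for n in shared])

    s1 = second_order(vocab1, vectors1)
    s2 = second_order(vocab2, vectors2)
    cos = s1 @ s2 / (np.linalg.norm(s1) * np.linalg.norm(s2))
    return float(1 - cos)  # 0 means the word's neighborhood did not change
```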
quadrismegistus / gensim_word2vec_make_semantic_network.py
Last active Jun 7, 2020
Code to make a network out of the shortest N cosine distances (or, equivalently, the strongest N associations) between a set of words in a gensim word2vec model.
"""
Code to make a network out of the shortest N cosine distances (or, equivalently, the strongest N associations)
between a set of words in a gensim word2vec model.
To use:
Set the filenames for the word2vec model.
Set `my_words` to be a list of your own choosing.
Set `num_top_dists` to a number, or to a factor of the length of `my_words`.
Choose between the two methods below to produce distances, and comment out the other one.
"""
quadrismegistus / gensim_word2vec_procrustes_align.py
Last active Jun 15, 2021
Code for aligning two gensim word2vec models using Procrustes matrix alignment. Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>. [NOTE: This code is DEPRECATED for the latest versions of gensim. Please see instead this updated version of the code <https://gist.github.com/zhicongchen/9e23…
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
"""Procrustes-align two gensim word2vec models (to allow comparison of the same word across models).
Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>.
(With help from William. Thank you!)
First, intersect the vocabularies (see `intersection_align_gensim` documentation).
Then do the alignment on the other_embed model.
Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
Return other_embed.
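The docstring summarizes the HistWords approach: intersect the vocabularies, then rotate one model onto the other. Since the gist itself is marked deprecated for current gensim, here is a version-independent numpy sketch of the same two steps, operating on plain vocabulary lists and matrices rather than gensim objects. It uses the standard orthogonal Procrustes solution (SVD of `other.T @ base`, rotation `u @ vt`). Unit-normalizing rows first and the helper names are assumptions of this sketch.

```python
import numpy as np


def unit_rows(matrix):
    """Scale every row to unit length (zero rows are left as-is)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.where(norms == 0, 1, norms)


def intersect_vocab(vocab1, vectors1, vocab2, vectors2):
    """Restrict both models to their shared words, in one common row order."""
    index1 = {w: i for i, w in enumerate(vocab1)}
    index2 = {w: i for i, w in enumerate(vocab2)}
    common = [w for w in vocab1 if w in index2]
    rows1 = [index1[w] for w in common]
    rows2 = [index2[w] for w in common]
    return common, vectors1[rows1], vectors2[rows2]


def procrustes_align(base_vectors, other_vectors):
    """Rotate `other_vectors` onto `base_vectors`; rows must refer to the same words."""
    base = unit_rows(base_vectors)
    other = unit_rows(other_vectors)
    u, _, vt = np.linalg.svd(other.T @ base)
    rotation = u @ vt  # orthogonal matrix minimizing ||other @ R - base||_F
    return other @ rotation
```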