Dr. Ryan Heuser quadrismegistus

@quadrismegistus
quadrismegistus / logwatch.py
Created July 8, 2023 12:36
Quick "logwatch" (logger + stopwatch)
# !pip install humanfriendly loguru
from humanfriendly import format_timespan
from loguru import logger
import time
class Logwatch:
    def __init__(self, name='Task', level='DEBUG'):
        self.started = None
        self.ended = None
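The preview above stops after the constructor. As a rough illustration of the logger-plus-stopwatch idea, a context manager along the following lines would do the job. This is only a sketch: it uses the standard library (`logging`, `time`) in place of loguru and humanfriendly, and the class name `StopwatchLogger` and its `duration` property are hypothetical, not taken from the gist.

```python
import logging
import time


class StopwatchLogger:
    """Hypothetical stand-in: logs when a task starts and how long it took."""

    def __init__(self, name="Task", level=logging.DEBUG):
        self.name = name
        self.level = level
        self.started = None
        self.ended = None
        self.logger = logging.getLogger("stopwatch")

    @property
    def duration(self):
        # Elapsed seconds; uses the current time while the task is still running.
        if self.started is None:
            return 0.0
        end = self.ended if self.ended is not None else time.perf_counter()
        return end - self.started

    def __enter__(self):
        self.started = time.perf_counter()
        self.logger.log(self.level, "%s ...", self.name)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.ended = time.perf_counter()
        self.logger.log(self.level, "%s finished in %.3f s", self.name, self.duration)
        return False  # do not swallow exceptions raised inside the block
```

Used as `with StopwatchLogger("Loading corpus"): ...`, it emits one message on entry and one on exit with the elapsed time.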
quadrismegistus / nb2md.py
Created July 22, 2021 23:31
Convert jupyter to markdown (github compatible)
#!/usr/bin/env python3
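import sys,os,subprocess,bs4
def nb2py(fn):
    # Nothing to do if the notebook does not exist
    if not os.path.exists(fn): return
    # Pass arguments as a list so filenames with spaces or shell characters are safe
    subprocess.run(['jupyter','nbconvert','--to','markdown',fn])
    fn_md=os.path.splitext(fn)[0]+'.md'
    # Bail out if the conversion did not produce a markdown file
    if not os.path.exists(fn_md): return
    with open(fn_md) as f: txt=f.read()
    dom=bs4.BeautifulSoup(txt,'lxml')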
quadrismegistus / lltk-readme.ipynb
Last active June 17, 2022 08:13
lltk-readme.ipynb
quadrismegistus / pmap.py
Last active May 4, 2021 13:21
Easy parallel processing in python with progress bar
"""
Simple mofo'n parallelism with progress bar. Born of frustration with p_tqdm.
"""
def pmap_do(inp):
quadrismegistus / brute_txt.py
Last active November 2, 2020 02:08
Brute-force convert any file into a plain text string, as best as possible
# install
# pip install bs4 fulltext epub-conversion pymupdf requests xml_cleaner html2text kitchen -q
# imports
import os
from kitchen.text.converters import to_unicode
# constants
WORKING_EXTS={'txt','pdf','epub','html','xml','htm'}
CONTENT_TAGS={'xml':['p'],'html':['p'],'htm':['p'],'epub':['p']}
quadrismegistus / pyvis_for_networkx.py
Last active July 20, 2023 12:59
Draw Networkx Graph with Pyvis
def draw_graph3(networkx_graph,notebook=True,output_filename='graph.html',show_buttons=True,only_physics_buttons=False):
    """
    This function accepts a networkx graph object,
    converts it to a pyvis network object preserving its node and edge attributes,
    and both returns and saves a dynamic network visualization.
    Valid node attributes include:
        "size", "value", "title", "x", "y", "label", "color".
        (For more info: https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network.add_node)
quadrismegistus / gensim_word2vec_measure_semantic_shift_by_local_neighborhood.py
Last active January 24, 2023 14:52
This function measures the amount of semantic shift of a given word between two gensim word2vec models. It is a basic implementation of William Hamilton (@williamleif) et al's measure of semantic change proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821), which they call the "local neighborhood measure."
def measure_semantic_shift_by_neighborhood(model1,model2,word,k=25,verbose=False):
    """
    Basic implementation of William Hamilton (@williamleif) et al's measure of semantic change
    proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821),
    which they call the "local neighborhood measure." They find this measure better suited to understanding
    the semantic change of nouns owing to "cultural shift," or changes in meaning "local" to that word,
    rather than global changes in language use ("linguistic drift"), which are better suited to a
    Procrustes-alignment method (also described in the same paper).
    Arguments are:
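The preview is cut off before the function body. As a rough sketch of the local neighborhood measure described in this docstring: take the word's k nearest neighbors in each model, pool them, describe the word in each model by its similarities to the pooled neighbors, and compare those two similarity vectors by cosine distance. The code below assumes embeddings are plain dicts mapping words to lists of floats rather than gensim models, and it restricts both models to their shared vocabulary; the function names and that restriction are assumptions, not the gist's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def nearest(vectors, word, k):
    """The k words most similar to `word`, excluding the word itself."""
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return others[:k]


def local_neighborhood_shift(vecs1, vecs2, word, k=25):
    """Cosine distance between the word's second-order similarity vectors."""
    # Assumption: only words present in both models are considered.
    shared = set(vecs1) & set(vecs2)
    v1 = {w: vecs1[w] for w in shared}
    v2 = {w: vecs2[w] for w in shared}
    # Pool the word's neighbors from both models.
    neighbors = sorted(set(nearest(v1, word, k)) | set(nearest(v2, word, k)))
    # Similarity of the word to each pooled neighbor, within each model.
    s1 = [cosine(v1[word], v1[n]) for n in neighbors]
    s2 = [cosine(v2[word], v2[n]) for n in neighbors]
    return 1.0 - cosine(s1, s2)
```

A value near 0 means the word keeps the same neighborhood in both models; larger values indicate more shift.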
quadrismegistus / gensim_word2vec_make_semantic_network.py
Last active June 7, 2020 15:14
Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations) between a set of words in a gensim word2vec model.
"""
Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations)
between a set of words in a gensim word2vec model.
To use:
Set the filenames for the word2vec model.
Set `my_words` to be a list of your own choosing.
Set `num_top_dists` to be a number or a factor of the length of `my_words`.
Choose between the two methods below to produce distances, and comment-out the other one.
"""
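The idea in the docstring above can be sketched without gensim: compute the cosine distance for every pair in `my_words`, keep the `num_top_dists` smallest, and treat each surviving pair as a weighted edge. The version below assumes the vectors are available as a plain dict of lists and returns an edge list rather than a graph object; `cosine_distance` and `top_distance_edges` are hypothetical names, not the gist's.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def top_distance_edges(vectors, my_words, num_top_dists):
    """Return the num_top_dists closest word pairs as (word_a, word_b, distance)."""
    pairs = [
        (cosine_distance(vectors[a], vectors[b]), a, b)
        for a, b in combinations(my_words, 2)
    ]
    pairs.sort()  # shortest distance (strongest association) first
    return [(a, b, d) for d, a, b in pairs[:num_top_dists]]
```

The resulting triples can be passed to a graph library as weighted edges.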
quadrismegistus / gensim_word2vec_procrustes_align.py
Last active November 16, 2023 01:57
Code for aligning two gensim word2vec models using Procrustes matrix alignment. Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>. [NOTE: This code is DEPRECATED for latest versions of gensim. Please see instead this updated version of the code <https://gist.github.com/zhicongchen/9e23…
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
    """Procrustes align two gensim word2vec models (to allow for comparison between same word across models).
    Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton <wleif@stanford.edu>.
        (With help from William. Thank you!)
    First, intersect the vocabularies (see `intersection_align_gensim` documentation).
    Then do the alignment on the other_embed model.
    Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
    Return other_embed.
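For reference, the alignment step described in this docstring is an instance of the orthogonal Procrustes problem, whose standard solution comes from a singular value decomposition. The notation below (A, B, R, M, U, V) is introduced here for illustration and is not taken from the gist: A holds the base_embed vectors and B the other_embed vectors, one row per shared word, in the same word order.

```latex
% A: base_embed vectors, B: other_embed vectors (rows = shared words, same order).
% R is constrained to be orthogonal, so it rotates/reflects B without distorting it.
R^{*} = \arg\min_{R^{\top} R = I} \lVert B R - A \rVert_F ,
\qquad
M = B^{\top} A = U \Sigma V^{\top}
\;\Longrightarrow\;
R^{*} = U V^{\top},
\qquad
B_{\text{aligned}} = B R^{*}
```

Because R is orthogonal, distances and cosine similarities within the other_embed model are unchanged; only its orientation relative to base_embed changes.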