Abraham Hmiel abehmiel

@abehmiel
abehmiel / btm.py
Created March 5, 2018 22:16 — forked from amintos/btm.py
Bi-term Topic Model implementation in pure Python
"""
Bi-Term Topic Model (BTM) for very short texts.
Literature Reference:
Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng:
"A biterm topic model for short texts"
In Proceedings of WWW '13, Rio de Janeiro, Brazil, pp. 1445-1456.
ACM, DOI: https://doi.org/10.1145/2488388.2488514
This module requires pre-processing of textual data,
abehmiel / install_packages.R
Created January 4, 2018 18:37
Install useful R packages for data science
install.packages(
c(
"dplyr", # data manipulation
"tidyr", # data manipulation
"rmarkdown", # data presentation
"knitr", # data presentation
"RODBC", # database tools
"RMySQL", # database tools
"RPostgreSQL", # database tools
"RSQLite", # database tools
abehmiel / clarify_pos.py
Created December 19, 2017 18:26
Part-of-speech clarifier from nltk
from nltk import pos_tag
from nltk.tag import str2tuple
"""
Usage:
dictionary_df['Pos'] = dictionary_df['Word'].apply(pos_maker)
dictionary_df['Help Definition'] = dictionary_df['Pos'].apply(clarify_pos)
"""
def clarify_pos(pos):
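The preview stops at the function signature, so the gist's actual body is not shown here. As a rough illustration only, a clarifier of this kind could be a plain lookup from Penn Treebank tags (the tag set that nltk's `pos_tag` returns by default) to readable descriptions. The names `PENN_TAG_HELP` and `describe_pos` are hypothetical and are not taken from the gist; the table covers only a small subset of tags.

```python
# Hypothetical lookup table: a small subset of the Penn Treebank tag set.
PENN_TAG_HELP = {
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "JJ": "adjective",
    "RB": "adverb",
    "IN": "preposition or subordinating conjunction",
    "DT": "determiner",
}


def describe_pos(pos):
    """Return a readable description for a tag, or a fallback for unknown tags."""
    # dict.get avoids a KeyError for tags that are missing from the subset above.
    return PENN_TAG_HELP.get(pos, "unknown tag: " + str(pos))
```

A function like this could be applied to a column of tags the same way the usage note applies `clarify_pos`.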
abehmiel / gist:e5dd495ca6123fda20ee876d58a6cd8f
Created December 15, 2017 23:18 — forked from rohannog/gist:3861442
Decrypt pdf on command-line
qpdf --password=passwd --decrypt orig.pdf decrypted.pdf
# To be prompted for the password instead of typing it on the command line:
read -s -p "Password: " password && qpdf --password="$password" --decrypt orig.pdf decrypted.pdf
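The same call can be driven from Python when the decryption is part of a larger script. This is a minimal sketch, assuming `qpdf` is installed and on the PATH; the helper names `build_qpdf_decrypt_command` and `decrypt_pdf` are made up for illustration. Only the `--password=` and `--decrypt` options from the command above are used.

```python
import getpass
import subprocess


def build_qpdf_decrypt_command(password, src, dst):
    """Return the argument list for the qpdf decrypt call."""
    # Passing a list (no shell) means the password needs no quoting or escaping.
    return ["qpdf", f"--password={password}", "--decrypt", src, dst]


def decrypt_pdf(src, dst):
    """Prompt for the password without echoing it, then run qpdf."""
    password = getpass.getpass("Password: ")
    # check=True raises CalledProcessError on any non-zero exit status.
    subprocess.run(build_qpdf_decrypt_command(password, src, dst), check=True)
```

Calling `decrypt_pdf("orig.pdf", "decrypted.pdf")` mirrors the interactive shell line. Because no shell is involved, a password containing spaces or `$` is passed through unchanged.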
abehmiel / regex.md
Created December 11, 2017 21:36 — forked from magicznyleszek/regex.md
RegEx Cheatsheet
abehmiel / understanding-word-vectors.ipynb
Created November 19, 2017 03:07 — forked from aparrish/understanding-word-vectors.ipynb
Understanding word vectors: A tutorial for "Reading and Writing Electronic Text," a class I teach at ITP. (Python 2.7) Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
abehmiel / spacy_intro.ipynb
Created November 16, 2017 23:03 — forked from aparrish/spacy_intro.ipynb
NLP Concepts with spaCy. Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
abehmiel / fix_exhibit_b.py
Created November 1, 2017 21:12
Convert tabular pdf data to a csv and also read it as a python dataframe
# It's really stupid when the gov't releases PDFs of tabular data. So I made a quick, hacky script to
# fix their mistakes for them. (I'm referring to https://t.co/oOyhHNVvjS )
# requirements:
# pandas
# tabula-py
import pandas as pd
from tabula import read_pdf
abehmiel / figure_formatting.py
Created October 31, 2017 21:29 — forked from corbett/figure_formatting.py
Create beautiful square figures with big labels and the correct number of ticks
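def create_figure(size=3.6, nxticks=6):
    # Import pyplot explicitly: a bare "import matplotlib" does not load matplotlib.pyplot.
    import matplotlib.pyplot as plt
    from matplotlib.ticker import MaxNLocator
    # Square figure, size x size inches.
    figure = plt.figure(figsize=(size, size))
    ax = figure.add_subplot(1, 1, 1, position=[0.2, 0.15, 0.75, 0.75])
    # Limit the x axis to at most nxticks intervals between major ticks.
    ax.xaxis.set_major_locator(MaxNLocator(nxticks))
    return ax

def format_axes(ax, xf='%d', yf='%d', nxticks=6, nyticks=6, labelsize=10):
    import pylab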