Abraham Hmiel abehmiel

@abehmiel
abehmiel / btm.py
Created Mar 5, 2018 — forked from amintos/btm.py
Bi-term Topic Model implementation in pure Python
View btm.py
"""
Bi-Term Topic Model (BTM) for very short texts.
Literature Reference:
Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng:
"A biterm topic model for short texts"
In Proceedings of WWW '13, Rio de Janeiro, Brazil, pp. 1445-1456.
ACM, DOI: https://doi.org/10.1145/2488388.2488514
This module requires pre-processing of textual data,
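The preview above cuts off before any code is shown, so the following is only a minimal sketch of the central idea in the cited paper, not the gist's implementation. It assumes the usual definition of a biterm as an unordered pair of words that co-occur in the same short text; the function name `extract_biterms` is hypothetical.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short, tokenized text.

    Assumption: the text is short enough that the whole document counts as
    a single co-occurrence window.
    """
    # Sort each pair so that ("b", "a") and ("a", "b") map to the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


biterms = extract_biterms(["visit", "apple", "store"])
```

A document with n tokens yields n * (n - 1) / 2 biterms, which is why the approach is aimed at very short texts.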
abehmiel / install_packages.R
Created Jan 4, 2018
Install useful R packages for data science
View install_packages.R
install.packages(
c(
"dplyr", # data manipulation
"tidyr", # data manipulation
"rmarkdown", # data presentation
"knitr", # data presentation
"RODBC", # database tools
"RMySQL", # database tools
"RPostgreSQL", # database tools
"RSQLite", # database tools
abehmiel / clarify_pos.py
Created Dec 19, 2017
Part-of-speech clarifier from nltk
View clarify_pos.py
from nltk import pos_tag
from nltk.tag import str2tuple
"""
Usage:
dictionary_df['Pos'] = dictionary_df['Word'].apply(pos_maker)
dictionary_df['Help Definition'] = dictionary_df['Pos'].apply(clarify_pos)
"""
def clarify_pos(pos):
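The preview stops at the function signature, so the body of `clarify_pos` is not shown. As a hedged illustration of what a part-of-speech "clarifier" could look like, the sketch below maps a few Penn Treebank tags (the default tagset returned by `nltk.pos_tag`) to readable descriptions. The name `describe_pos_tag` and the lookup table are hypothetical and cover only a small subset of tags.

```python
# Hypothetical lookup table: a small subset of Penn Treebank tags.
PENN_TAG_DESCRIPTIONS = {
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "JJ": "adjective",
    "RB": "adverb",
    "IN": "preposition or subordinating conjunction",
    "DT": "determiner",
}


def describe_pos_tag(tag, default="unknown part of speech"):
    """Return a readable description for a Penn Treebank tag."""
    return PENN_TAG_DESCRIPTIONS.get(tag, default)
```

Applied to a column of tags, for example with `Series.apply(describe_pos_tag)`, this would produce the kind of "Help Definition" column that the usage note describes.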
View gist:e5dd495ca6123fda20ee876d58a6cd8f
qpdf --password=passwd --decrypt orig.pdf decrypted.pdf
# To be prompted for the password instead of typing it into the command:
read -r -s -p "Password: " password && qpdf --password="$password" --decrypt orig.pdf decrypted.pdf
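The same prompt-then-decrypt step can be sketched in Python. This is an illustrative wrapper, not part of the gist: the function names are hypothetical, and it assumes `qpdf` is installed and on the `PATH`. Passing the arguments as a list avoids shell quoting problems when the password contains spaces or special characters.

```python
import getpass
import subprocess


def build_qpdf_decrypt_command(password, source, target):
    """Return the argument list for the qpdf decrypt call shown above."""
    return ["qpdf", f"--password={password}", "--decrypt", source, target]


def decrypt_pdf(source, target):
    """Prompt for the password without echoing it, then run qpdf."""
    password = getpass.getpass("Password: ")
    # check=True raises CalledProcessError if qpdf exits with a non-zero status.
    subprocess.run(build_qpdf_decrypt_command(password, source, target), check=True)
```

Calling `decrypt_pdf("orig.pdf", "decrypted.pdf")` would mirror the second shell command.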
abehmiel / understanding-word-vectors.ipynb
Created Nov 19, 2017 — forked from aparrish/understanding-word-vectors.ipynb
Understanding word vectors: A tutorial for "Reading and Writing Electronic Text," a class I teach at ITP. (Python 2.7) Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
View understanding-word-vectors.ipynb
abehmiel / spacy_intro.ipynb
Created Nov 16, 2017 — forked from aparrish/spacy_intro.ipynb
NLP Concepts with spaCy. Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
View spacy_intro.ipynb
abehmiel / fix_exhibit_b.py
Created Nov 1, 2017
Convert tabular PDF data to a CSV and also read it as a pandas DataFrame
View fix_exhibit_b.py
# It's really stupid when the gov't releases PDFs of tabular data. So I made a quick, hacky script to
# fix their mistakes for them. (I'm referring to https://t.co/oOyhHNVvjS )
# requirements:
# pandas
# tabula-py
import pandas as pd
from tabula import read_pdf
abehmiel / figure_formatting.py
Created Oct 31, 2017 — forked from corbett/figure_formatting.py
Create beautiful square figures with big labels and the correct number of ticks
View figure_formatting.py
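def create_figure(size=3.6, nxticks=6):
    # Import pyplot explicitly: a bare "import matplotlib" does not load the pyplot submodule.
    import matplotlib.pyplot as plt
    from matplotlib.ticker import MaxNLocator
    # Square figure, with the axes inset to leave room for large labels.
    figure = plt.figure(figsize=(size, size))
    ax = figure.add_subplot(1, 1, 1, position=[0.2, 0.15, 0.75, 0.75])
    # Cap the number of major ticks on the x axis.
    ax.xaxis.set_major_locator(MaxNLocator(nxticks))
    return ax
def format_axes(ax, xf='%d', yf='%d', nxticks=6, nyticks=6, labelsize=10):
    import pylab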