Gavin Hackeling gavinmh

@gavinmh
gavinmh / viterbi.py
Created November 19, 2012 04:27
Viterbi Algorithm
# -*- coding: utf-8 -*-
"""
This is an example of a basic optical character recognition system.
Some components, such as the featurizer, are missing, and have been replaced
with data that I made up.
This system recognizes words produced from an alphabet of 2 letters: 'l' and 'o'.
Words that can be recognized include 'lol', 'lolol', and 'loooooll'.
We'll assume that this system is used to digitize hand-written notes by Redditors,
or something.
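The preview above is cut off before any decoding code appears. As a rough illustration of the Viterbi algorithm named in the gist title, here is a minimal sketch over the same two-letter alphabet. The observation labels ('tall', 'round') and all probabilities are invented for this example and are not taken from the gist.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, word) for the most likely hidden state sequence."""
    # best[t][s] = (probability of the best path ending in state s at step t,
    #               the state that preceded s on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return prob, "".join(path)


# Hypothetical model: every number below is made up for illustration.
STATES = ("l", "o")
START_P = {"l": 0.6, "o": 0.4}
TRANS_P = {"l": {"l": 0.3, "o": 0.7}, "o": {"l": 0.6, "o": 0.4}}
EMIT_P = {
    "l": {"tall": 0.9, "round": 0.1},
    "o": {"tall": 0.2, "round": 0.8},
}

prob, word = viterbi(["tall", "round", "tall"], STATES, START_P, TRANS_P, EMIT_P)
print(word)
```

With these made-up numbers, the observation sequence tall, round, tall decodes to 'lol'. A real system would replace the observation labels with featurizer output and would usually work in log space to avoid underflow on long words.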
@gavinmh
gavinmh / featurizer_sub.py
Created December 4, 2012 01:59
Lexical entailment featurizer for substitution edits
from __future__ import division
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic
from nltk.metrics import edit_distance
from nltk.corpus.reader.wordnet import WordNetError
import numpy as np
import logging, os
import Alignment_sub
@gavinmh
gavinmh / Alignment_sub.py
Created December 4, 2012 02:05
A substitution alignment
# -*- coding: utf-8 -*-
"""
Created on Fri Nov 23 11:25:40 2012
@author: gavin
"""
import logging
from nltk.corpus import wordnet as wn
class Alignment_sub:
@gavinmh
gavinmh / classifier_substition.py
Created December 4, 2012 02:07
Substitution classifier
try:
    import cPickle as pickle
except ImportError:
    import pickle
from sklearn.ensemble import RandomForestClassifier
import logging, os
class Lexent_classifier_sub:
@gavinmh
gavinmh / harness_substitution.py
Created December 4, 2012 02:08
substitution classifier harness
import logging
import numpy as np
import Alignment_sub
import lexent_featurizer_sub
try:
    import cPickle as pickle
except ImportError:
    import pickle
@gavinmh
gavinmh / 52-displaylink.conf
Created December 22, 2012 17:41
Dual head with DisplayLink for Linux
@gavinmh
gavinmh / naive_summarizer
Last active December 10, 2015 03:58
A naive, unsupervised text summarizer.
# -*- coding: utf-8 -*-
'''
The following is a naive, unsupervised text summarizer.
It extracts N of the text's most salient sentences.
Salience is defined as the average of the tf-idf weights of the words in a sentence.
'''
from nltk import sent_tokenize, word_tokenize
from collections import Counter
from math import log10
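The preview stops at the imports, so the scoring code itself is not shown. The following is a minimal sketch of the salience definition in the docstring, not the gist's actual implementation. It assumes each sentence counts as one document for idf, uses raw in-sentence counts for tf, and swaps in crude regex tokenizers (`split_sentences` and `tokenize`, both hypothetical helpers) for NLTK's `sent_tokenize` and `word_tokenize` so that it runs without NLTK data.

```python
import re
from collections import Counter
from math import log10


def split_sentences(text):
    # Crude stand-in for nltk.sent_tokenize: split after ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Crude stand-in for nltk.word_tokenize: lowercase alphanumeric tokens
    return re.findall(r"[a-z0-9']+", sentence.lower())


def summarize(text, n=1):
    """Return the n most salient sentences, in their original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    total = len(sentences)
    # Document frequency: number of sentences that contain each word.
    df = Counter(w for words in tokenized for w in set(words))

    def salience(words):
        if not words:
            return 0.0
        tf = Counter(words)
        # Average tf-idf weight over the distinct words of the sentence.
        weights = [tf[w] * log10(total / df[w]) for w in tf]
        return sum(weights) / len(weights)

    ranked = sorted(range(total), key=lambda i: salience(tokenized[i]),
                    reverse=True)[:n]
    return [sentences[i] for i in sorted(ranked)]


text = ("The cat sat. The cat ran. "
        "Quantum chromodynamics describes the strong interaction between quarks.")
print(summarize(text, n=1))
```

Words that appear in every sentence get an idf of zero under this assumption, so sentences made mostly of common words score low. Other tf or idf definitions would rank sentences differently.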
@gavinmh
gavinmh / ner.py
Last active December 5, 2018 18:59
Named Entity Extraction with NLTK in Python
# -*- coding: utf-8 -*-
'''
'''
from nltk import sent_tokenize, word_tokenize, pos_tag, ne_chunk
def extract_entities(text):
    entities = []
    for sentence in sent_tokenize(text):
@gavinmh
gavinmh / made-in-nyc-jobs.tsv
Last active September 13, 2016 17:42
.tsv of companies that are hiring in NYC and their job URLs, from http://wearemadeinny.com/find-a-job/
10Gen (The MongoDB Company) http://www.10gen.com/careers
1stdibs.com http://www.1stdibs.com/jobs/
20x200 http://www.20x200.com/jobs/
29th Street Publishing http://29.io
2tor Inc. http://2tor.com/careers/
303 Network, Inc.
33Across http://33across.com/careers.php#axzz1uqxl0v16
360i http://360i.com/careers
3degrees http://toprubyjobs.com/jobs/399-3degrees-cto-%252F-lead-rails-engineer
680 Partners LLC http://www.680partners.com