
Jack Schultz (jackschultz)

jackschultz / article-summarizer.clj
Created Aug 30, 2013
Clojure implementation of a semi-naive article summarizer. Takes the supplied URL and attempts to find the num-sentences most "valuable" sentences, ranked by how many words they share with the other sentences. To run, add the dependencies to a Leiningen project and download the OpenNLP binaries.
(ns classify.core
  (:use [boilerpipe-clj.core]
        [opennlp.nlp]
        [opennlp.treebank]
        [clojure.pprint :only [pprint]]
        [opennlp.tools.filters]
        [clojure.set]
        [stemmer.snowball])
  (:gen-class))
jackschultz / article-summarizer.py
Created Aug 30, 2013
Article summarizer written in Python.
import nltk
from nltk.stem.wordnet import WordNetLemmatizer
import string
class SentenceRank(object):
    def __init__(self, body, title):
        self.body = body
        self.sentence_list = nltk.tokenize.sent_tokenize(self.body)[:]
        self.title = title
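The preview above cuts off before the ranking logic. A minimal self-contained sketch of the technique the description names, scoring each sentence by word overlap with the others (using a crude regex sentence split in place of the gist's `nltk.tokenize.sent_tokenize`; names here are illustrative, not the gist's):

```python
import re

def summarize(text, num_sentences=2):
    """Semi-naive summarizer: score each sentence by how many words it
    shares with the other sentences, then keep the top num_sentences."""
    # Crude split on end punctuation; the gist uses sent_tokenize instead.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    word_sets = [set(re.findall(r'\w+', s.lower())) for s in sentences]
    # Each sentence's score is its total word overlap with every other sentence.
    scores = [(sum(len(ws & other) for j, other in enumerate(word_sets) if j != i), i)
              for i, ws in enumerate(word_sets)]
    top = sorted(scores, reverse=True)[:num_sentences]
    # Emit the winners in their original document order.
    return [sentences[i] for i in sorted(i for _, i in top)]
```

Sentences that share vocabulary with many others float to the top; off-topic sentences (sharing few words) score near zero and drop out.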
jackschultz / classify.clj
Created Sep 1, 2013
Simple classification algorithm that uses URLs and the text of the articles.
(ns gb-or-syria.core
  (:use [boilerpipe-clj.core]
        [opennlp.nlp]
        [opennlp.treebank]
        [clojure.pprint :only [pprint]]
        [opennlp.tools.filters]
        [clojure.set]
        [clojure.string :only [split-lines]]
        [stemmer.snowball])
  (:gen-class))
jackschultz / ner.py
Created Sep 19, 2013
Named Entity Recognition with Python
import nltk
import requests
FREEBASE_API_KEY = ''
class FindNames(object):
    def __init__(self, text, freebase_api_key):
        self.text = text
        self.key = freebase_api_key
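The gist body is truncated before the recognition step. As a sketch of the candidate-finding stage only, here is a heuristic stand-in (my own illustrative function, not the gist's): runs of capitalized words become candidate entities, which the real gist would then refine with nltk tagging and Freebase lookups:

```python
import re

def candidate_names(text):
    """Heuristic stand-in for FindNames: treat runs of two or more
    capitalized words as candidate named entities. The gist refines
    such candidates with nltk tagging and Freebase API lookups."""
    return re.findall(r'[A-Z][a-z]+(?:\s[A-Z][a-z]+)+', text)
```

This deliberately misses single-word names like "Berlin"; catching those without drowning in false positives (sentence-initial words) is exactly what the part-of-speech tagging in the full gist is for.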
jackschultz / ner_take2.py
Created Sep 23, 2013
Second revision of my NER system. More pluggable and less error-prone.
import nltk
import string
import pprint
import requests
import operator
import re
import logging
from collections import defaultdict
FREEBASE_API_KEY = ''
jackschultz / ner-take2.py
Created Sep 25, 2013
Even better NER. I should probably change from gists to a regular repo.
import nltk
import string
import requests
import operator
import re
import logging
from collections import defaultdict
pattern = '[A-Z][^A-Z]*'
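That `pattern` splits a string at every capital letter, which is useful for repairing CamelCase names that lost their spaces during scraping. A quick demonstration of what it yields:

```python
import re

# Each match starts at a capital letter and runs until the next one.
pattern = '[A-Z][^A-Z]*'
parts = re.findall(pattern, 'JohnFitzgeraldKennedy')
# parts is ['John', 'Fitzgerald', 'Kennedy']
```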
jackschultz / points_from_results.py
Last active Apr 13, 2019
This takes a DraftKings results CSV file and extracts the scores for the individual players, using the lineups and the lineup point totals.
import csv
import numpy as np
points_label = "Points"
lineup_label = "Lineup"
players = set()
num_lineups = 0
with open('outcome.csv', 'rb') as csvfile:
    rows = csv.reader(csvfile)
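The preview stops before the extraction. One way to recover per-player scores from lineup totals, consistent with the numpy import above, is to treat it as a linear system: each lineup is a 0/1 row over players and its point total is the dot product with the unknown player scores. A sketch with made-up toy numbers (not data from the gist):

```python
import numpy as np

# Hypothetical toy data: rows mark which of three players were in each
# lineup; totals are the corresponding lineup point totals.
lineups = np.array([[1, 1, 0],
                    [0, 1, 1],
                    [1, 0, 1]], dtype=float)
totals = np.array([30.0, 40.0, 50.0])

# Solve lineups @ points = totals for the per-player points.
points, *_ = np.linalg.lstsq(lineups, totals, rcond=None)
# points is approximately [20., 10., 30.]
```

With enough distinct lineups the system is overdetermined and least squares recovers each player's score exactly (real DFS totals are consistent, so the residual is zero).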
jackschultz / gist:38c8462d8c3b6d74f422
Created Jun 7, 2015
Analysis of Nobel Prize winners and their ages
from bs4 import BeautifulSoup
import unicodedata
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from scipy.stats import norm
class Prize:
    def __init__(self, name, age, year, prize_type, description):
        self.name = unicodedata.normalize('NFKD', name).encode('ascii', 'ignore')  # umlaut issues
        self.age = age
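What that normalize/encode line does, shown on a standalone example: NFKD decomposes "ö" into "o" plus a combining diaeresis, and encoding to ASCII with errors ignored drops the combining mark, leaving a plain-ASCII name:

```python
import unicodedata

# "Schrödinger" -> NFKD splits the umlaut off -> ASCII encode drops it.
name = unicodedata.normalize('NFKD', 'Erwin Schrödinger').encode('ascii', 'ignore')
# name is b'Erwin Schrodinger'
```

Note the result is bytes, not str, which is why the gist stores the encoded value directly.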
jackschultz / recombinator.py
Created Dec 17, 2015
Accepts a JSON string, reformats the data, and matches variable names with arrays of their values according to certain rules. Can be run from the command line with "python recombinator.py STRING", where STRING is valid JSON, or imported and used from other Python code.
import sys
import json
from collections import defaultdict
def parse_list_based_json(data):
    out = {}
    for index, value in enumerate(data[0]):
        out[value] = [row[index] for row in data[1:]]
    return out
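A self-contained usage example of that function (repeated here so the snippet runs on its own): the first row supplies the variable names, and each remaining row contributes one value to each name's array:

```python
def parse_list_based_json(data):
    # First row holds the variable names; remaining rows hold the values.
    out = {}
    for index, value in enumerate(data[0]):
        out[value] = [row[index] for row in data[1:]]
    return out

table = [["a", "b"], [1, 2], [3, 4]]
result = parse_list_based_json(table)
# result is {'a': [1, 3], 'b': [2, 4]}
```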
jackschultz / genius_song_scrape.py
Created Nov 23, 2016
For each artist name in the artist_names list, this uses Genius' API and website to download the song info and lyrics (as best can be done with HTML scraping) into named folders in the current directory. Requires a Genius API bearer token.
import requests
from bs4 import BeautifulSoup
import os, json
base_url = "http://api.genius.com"
headers = {'Authorization': 'Bearer GENIUS_API_BEARER_STRING'}
artist_names = ["Fleet Foxes"]
def artist_id_from_song_api_path(song_api_path, artist_name):
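The preview cuts off mid-function. As a small sketch of the first step the gist relies on, building a search request against the `base_url` and bearer-token header shown above (`search_url` is my own illustrative helper; `GENIUS_API_BEARER_STRING` stays a placeholder for a real token):

```python
from urllib.parse import urlencode

base_url = "http://api.genius.com"
# Placeholder token, as in the gist; substitute a real bearer token.
headers = {'Authorization': 'Bearer GENIUS_API_BEARER_STRING'}

def search_url(artist_name):
    """Build the Genius /search request URL the gist starts from;
    fetch it with requests.get(url, headers=headers) using a real token."""
    return base_url + "/search?" + urlencode({'q': artist_name})
```

From the search response, the gist walks the hits to find a song whose primary artist matches the requested name, which is what `artist_id_from_song_api_path` is extracting.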