Jack Schultz jackschultz

@jackschultz
jackschultz / gist:38c8462d8c3b6d74f422
Created June 7, 2015 19:10
Analysis of Nobel Prize winners and their ages
from bs4 import BeautifulSoup
import unicodedata
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from scipy.stats import norm
class Prize:
    def __init__(self, name, age, year, prize_type, description):
        self.name = unicodedata.normalize('NFKD', name).encode('ascii', 'ignore')  # umlaut issues
        self.age = age
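
The "umlaut issues" comment refers to stripping diacritics from winners' names. A minimal sketch of that step, assuming Python 3, where `encode` returns bytes and a `decode` is needed to get a string back; the helper name `to_ascii` is hypothetical and not part of the gist:

```python
import unicodedata

def to_ascii(name):
    # NFKD splits an accented character into its base letter plus a
    # combining mark; encoding to ASCII with 'ignore' then drops the mark.
    decomposed = unicodedata.normalize('NFKD', name)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('Erwin Schrödinger'))  # -> Erwin Schrodinger
```

Characters that have no decomposition, such as "ß", are dropped entirely rather than transliterated, so this approach is lossy for some names.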
jackschultz / recombinator.py
Created December 17, 2015 17:56
Accepts a JSON string, reformats the data, and maps variable names to arrays of their values according to certain rules. Run it from the command line with "python recombinator.py STRING", where STRING is valid JSON, or import it into other Python code.
import sys
import json
from collections import defaultdict
def parse_list_based_json(data):
    out = {}
    for index, value in enumerate(data[0]):
        out[value] = [row[index] for row in data[1:]]
    return out
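
As a rough illustration of what `parse_list_based_json` does, the sketch below repeats the function and feeds it a small list-of-rows input whose first row holds the variable names. The sample data is invented for the example and does not come from the gist:

```python
import json

def parse_list_based_json(data):
    # The first row holds the variable names; each later row holds one
    # value per variable, in the same column order.
    out = {}
    for index, value in enumerate(data[0]):
        out[value] = [row[index] for row in data[1:]]
    return out

# Hypothetical input, made up for illustration.
raw = '[["name", "age"], ["Ada", 36], ["Alan", 41]]'
columns = parse_list_based_json(json.loads(raw))
print(columns)  # -> {'name': ['Ada', 'Alan'], 'age': [36, 41]}
```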
jackschultz / article-summarizer.clj
Created August 30, 2013 22:45
Clojure implementation of a semi-naive article summarizer. Takes the supplied URL and attempts to find the num-sentences most "valuable" sentences, ranked by how many words each shares with the other sentences. To run, add it to a Leiningen project and download the OpenNLP binaries.
(ns classify.core
  (:use [boilerpipe-clj.core]
        [opennlp.nlp]
        [opennlp.treebank]
        [clojure.pprint :only [pprint]]
        [opennlp.tools.filters]
        [clojure.set]
        [stemmer.snowball])
  (:gen-class))
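
The ranking idea in the description (score each sentence by the words it shares with the other sentences, then keep the top few) can be sketched in a few lines. This is an illustration in Python rather than the gist's Clojure, and it makes several assumptions: a regex stands in for OpenNLP sentence detection and tokenization, no stemming or stop-word filtering is applied, and the names `summarize` and `num_sentences` are hypothetical.

```python
import re

def summarize(text, num_sentences=2):
    # Naive sentence split on ., ! or ? followed by whitespace
    # (a stand-in for a real sentence detector).
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    # One set of lowercased words per sentence.
    word_sets = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    # Score = number of words shared with each other sentence, summed.
    scores = [
        sum(len(words & other) for j, other in enumerate(word_sets) if j != i)
        for i, words in enumerate(word_sets)
    ]
    # Take the highest-scoring sentences, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]

text = "The cat sat on the mat. The cat ate the fish. Stocks fell sharply today."
print(summarize(text))
```

Because common words such as "the" count toward the overlap, a real implementation would likely filter stop words and stem tokens first, which is presumably why the gist pulls in a filter namespace and a Snowball stemmer.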
jackschultz / classify.clj
Created September 1, 2013 21:09
Simple classification algorithm that uses URLs and the text of the articles.
(ns gb-or-syria.core
  (:use [boilerpipe-clj.core]
        [opennlp.nlp]
        [opennlp.treebank]
        [clojure.pprint :only [pprint]]
        [opennlp.tools.filters]
        [clojure.set]
        [clojure.string :only [split-lines]]
        [stemmer.snowball])
  (:gen-class))
jackschultz / ner_take2.py
Created September 23, 2013 16:03
Second revision of my NER system. More pluggable and less error-prone.
import nltk
import string
import pprint
import requests
import operator
import re
import logging
from collections import defaultdict
FREEBASE_API_KEY = ''
jackschultz / ner-take2.py
Created September 25, 2013 15:38
Even better NER. I should probably change from gists to a regular repo.
import nltk
import string
import requests
import operator
import re
import logging
from collections import defaultdict
pattern = '[A-Z][^A-Z]*'
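
How the gist uses `pattern` is not visible in this preview. As a small illustration of what the regular expression itself matches (an uppercase letter followed by any run of non-uppercase characters), here it is applied with `re.findall` to made-up input:

```python
import re

pattern = '[A-Z][^A-Z]*'

# Each match starts at an uppercase letter and runs until the next one.
chunks = re.findall(pattern, 'NewYorkCity')
print(chunks)  # -> ['New', 'York', 'City']
```

Text before the first uppercase letter is skipped, and spaces or punctuation after a capital stay attached to that match.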
jackschultz / genius_song_scrape.py
Created November 23, 2016 17:16
For each artist name in the artist_names list, this uses the Genius API and website to download the info and lyrics (as best can be done with HTML scraping) into named folders in the current directory. Requires a Genius API bearer token.
import requests
from bs4 import BeautifulSoup
import os, json
base_url = "http://api.genius.com"
headers = {'Authorization': 'Bearer GENIUS_API_BEARER_STRING'}
artist_names = ["Fleet Foxes"]
def artist_id_from_song_api_path(song_api_path, artist_name):
jackschultz / ner.py
Created September 19, 2013 16:52
Named Entity Recognition with Python
import nltk
import requests
FREEBASE_API_KEY = ''
class FindNames(object):
    def __init__(self, text, freebase_api_key):
        self.text = text
        self.key = freebase_api_key
jackschultz / article-summarizer.py
Created August 30, 2013 23:21
Article summarizer written in Python.
import nltk
from nltk.stem.wordnet import WordNetLemmatizer
import string
class SentenceRank(object):
    def __init__(self, body, title):
        self.body = body
        self.sentence_list = nltk.tokenize.sent_tokenize(self.body)[:]
        self.title = title
jackschultz / points_from_results.py
Last active April 13, 2019 19:52
This takes a DraftKings results CSV file and extracts the scores for the individual players by using the lineups and lineup point totals.
import csv
import numpy as np
points_label = "Points"
lineup_label = "Lineup"
players = set()
num_lineups = 0
with open('outcome.csv', 'rb') as csvfile:
    rows = csv.reader(csvfile)
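
Recovering individual scores from lineup totals amounts to solving a linear system: each lineup contributes one equation stating that its players' points sum to the lineup total. The preview does not show how the gist solves it, so the sketch below is only one plausible approach, using a least-squares solve from NumPy (which the gist imports). The function name `player_points` and the sample lineups are hypothetical.

```python
import numpy as np

def player_points(lineups, totals):
    # One column per distinct player, one row per lineup.
    players = sorted({p for lineup in lineups for p in lineup})
    col = {p: i for i, p in enumerate(players)}
    A = np.zeros((len(lineups), len(players)))
    for r, lineup in enumerate(lineups):
        for p in lineup:
            A[r, col[p]] = 1.0
    # Least-squares solution of A @ x = totals; rank reports how many
    # players' scores are actually pinned down by the data.
    x, _, rank, _ = np.linalg.lstsq(A, np.asarray(totals, dtype=float), rcond=None)
    return dict(zip(players, x)), rank

# Made-up data: A+B=30, B+C=50, A+C=40 gives A=10, B=20, C=30.
points, rank = player_points([["A", "B"], ["B", "C"], ["A", "C"]], [30, 50, 40])
print(points, rank)
```

If the returned rank is lower than the number of players, some scores cannot be determined uniquely from the lineups, and the least-squares values for those players should be treated as one solution among many.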