


Josh Strupp (joshstrupp)

  • ISL
  • Washington, DC
joshstrupp / Subreddit similarity
Created Jul 22, 2020
Generate similarity scores between Subreddits using LDA modeling
import gensim
from gensim.corpora import Dictionary
from gensim.models import ldamodel
from gensim.matutils import hellinger
from gensim.matutils import kullback_leibler
import pandas as pd
import praw
import nltk
from pprint import pprint
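The `hellinger` and `kullback_leibler` imports suggest that subreddits are compared by the distance between their LDA topic distributions. As a hedged sketch of that comparison step only, the plain-Python functions below compute the same two quantities for dense topic vectors of equal length. The example distributions are made up; in practice they would come from a trained model (gensim's `LdaModel.get_document_topics`, for instance), and `hellinger_distance` / `kl_divergence` are illustrative names, not gensim functions.

```python
import math

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions: 0 = identical, 1 = disjoint."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def kl_divergence(p, q):
    """KL divergence D(p || q) in nats; asymmetric, infinite if q is 0 where p is not."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue  # 0 * log(0 / b) is taken as 0
        if b == 0:
            return math.inf
        total += a * math.log(a / b)
    return total

# Hypothetical 3-topic distributions for two subreddits
sub_a = [0.7, 0.2, 0.1]
sub_b = [0.1, 0.3, 0.6]

print(hellinger_distance(sub_a, sub_b))
print(kl_divergence(sub_a, sub_b), kl_divergence(sub_b, sub_a))
```

Hellinger distance is symmetric and bounded in [0, 1], which makes it the easier of the two to read as a similarity score (for example, 1 minus the distance); KL divergence changes when the arguments are swapped.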
joshstrupp / Subreddit Sentiment & Keyword Analysis
Last active Jul 22, 2020
Generating keyword and sentiment insights for select Subreddit(s)
import pandas as pd
import praw
import nltk
import random
from pprint import pprint
from textblob import TextBlob

# Enter your own client_id, client_secret, username and password, or follow this quick start guide:
reddit = praw.Reddit(user_agent='Comment Extraction (by /u/USERNAME)',
                     client_id='enter_here',
                     client_secret='enter_here',
                     username='enter_here',
                     password='enter_here')
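The preview ends before any analysis code. As a rough sketch of the keyword half, assuming comment bodies have already been collected into a list of strings (with PRAW, for example, from each submission's `comments.list()`), the counting below uses only the standard library. The sample comments and the small stopword set are hypothetical stand-ins for real data and a fuller list such as NLTK's English stopwords. For the sentiment half, TextBlob exposes `TextBlob(text).sentiment.polarity`, a score from -1 (negative) to 1 (positive).

```python
import re
from collections import Counter

# Hypothetical stopword set; a real run would use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "this", "i", "for", "in"}

def top_keywords(comments, n=5):
    """Count lowercase word tokens across all comments, skipping stopwords."""
    counts = Counter()
    for text in comments:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Hypothetical comment bodies standing in for data fetched with PRAW
comments = [
    "The new update is great, battery life is great",
    "Battery drains fast after the update",
    "I love the camera in this update",
]
print(top_keywords(comments, n=3))
```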
joshstrupp /
Last active Oct 25, 2019
ISL Experiments ReadMe

About Experiments

What are Experiments?

Experiments defined: internally produced proofs of concept, prototypes, and products.

Why do we do them?

  1. Learn new tricks: Highlight (and expand) ISL’s capabilities.
  2. Stay sharp: Exercise ISL’s collective idea muscle.
  3. Reinforce inventive spirit: Build morale around collaborative passion projects.
  4. Maintain talent appeal: Recruit the best of the best.
  5. Mega-boost biz dev: Create additional case studies and relevant work for pitches, leads, and general marketing.
Netflix Shuffler for Roku
from roku import Roku
import time
import random

with open("/Users/josh/Desktop/1-1000.txt") as f:
    wordlist = []
    for line in f:
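The preview stops at the loop that fills `wordlist`. A minimal sketch of that step plus a random pick, assuming the file holds one word per line (the filename suggests a list of the 1,000 most common words). How the chosen word is then sent to the Roku is not shown in the preview, so it is left out here; `load_wordlist` is a hypothetical helper name.

```python
import io
import random

def load_wordlist(f):
    """Read one word per line, stripping whitespace and skipping blank lines."""
    wordlist = []
    for line in f:
        word = line.strip()
        if word:
            wordlist.append(word)
    return wordlist

# io.StringIO stands in for the real open(...) file handle
sample = io.StringIO("the\nof\n\nand\nlove\n")
words = load_wordlist(sample)
choice = random.choice(words)
print(words, choice)
```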
joshstrupp /
Last active Jun 18, 2020
import pandas as pd
from sklearn.linear_model import LogisticRegression
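# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split lives in model_selection
from sklearn.model_selection import train_test_split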
from sklearn import metrics
from math import exp
import numpy as np
import matplotlib.pyplot as plt
nfl2000 = pd.read_csv('nfl2000stats.csv', sep=',') #13-3
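The `from math import exp` import hints that fitted coefficients are turned back into win probabilities by hand. For logistic regression, the probability is the logistic function of the linear score: p = 1 / (1 + e^(-z)), with z = b0 + Σ bᵢxᵢ. A small sketch with hypothetical coefficients, not values fitted to the NFL data:

```python
from math import exp

def win_probability(intercept, coefs, features):
    """Logistic function of the linear score z = intercept + sum(coef * feature)."""
    z = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + exp(-z))

# Hypothetical coefficients for two features (e.g. rushing yards, turnovers)
intercept = -0.5
coefs = [0.01, -0.4]
print(win_probability(intercept, coefs, [120, 1]))
```

With a fitted scikit-learn `LogisticRegression`, the same numbers are available as `intercept_` and `coef_`, and `predict_proba` returns these probabilities directly.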
joshstrupp / gist:6c8b4fad0719d4877d56
Created May 18, 2015
Question - Log. Regression Model May 17
nfl = pd.concat([nfl2000, nfl2001, nfl2002, nfl2003, nfl2004, nfl2005, nfl2006, nfl2007, nfl2008, nfl2009, nfl2010, nfl2011, nfl2012, nfl2013], axis=0)
# Label: 1 if the team outscored its opponent, else 0 (ties count as 0)
nfl['WinLoss'] = np.where(nfl.ScoreOff > nfl.ScoreDef, 1, 0)
# Keep WinLoss (the label) and ScoreOff/ScoreDef (which determine it) out of X;
# including them leaks the answer into the features.
# Text columns (Date, Opponent, Site, TeamName) must be encoded as numbers before fitting.
feature_cols = ['Date', 'FirstDownDef', 'FirstDownOff', 'FumblesDef', 'FumblesOff', 'Line', 'Opponent', 'PassAttDef', 'PassAttOff', 'PassCompDef', 'PassCompOff', 'PassIntDef', 'PassIntOff', 'PassYdsDef', 'PassYdsOff', 'PenYdsDef', 'PenYdsOff', 'PuntAvgOff', 'RushAttDef', 'RushAttOff', 'RushYdsDef', 'RushYdsOff', 'SackNumDef', 'SackNumOff', 'SackYdsDef', 'SackYdsOff', 'Site', 'TeamName', 'ThirdDownPctDef', 'ThirdDownPctOff', 'TimePossDef', 'TimePossOff', 'TotalLine', 'Totalline', 'Totalline ']
X = nfl[feature_cols]
y = nfl['WinLoss']
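`np.where(nfl.ScoreOff > nfl.ScoreDef, 1, 0)` labels a game 1 only when the team outscored its opponent, so ties count as losses. Once a model has predicted labels for held-out games, `metrics.accuracy_score(y_true, y_pred)` is the fraction of predictions that match. The plain-Python sketch below mirrors both steps on made-up scores and predictions; `win_loss` and `accuracy` are illustrative helper names.

```python
def win_loss(score_off, score_def):
    """1 if the offense outscored the defense, else 0 (ties count as 0), like the np.where line."""
    return [1 if o > d else 0 for o, d in zip(score_off, score_def)]

def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

# Hypothetical scores for four games and a hypothetical model's predictions
y_true = win_loss([24, 10, 17, 31], [17, 20, 17, 3])
y_pred = [1, 0, 1, 1]
print(y_true, accuracy(y_true, y_pred))
```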
Pandas Homework
Part 1
Load the data (
into a DataFrame. Try looking at the "head" of the file on the command line
to see how the file is delimited and how to load it.
Note: You do not need to turn in any command line code you may use.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
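The homework suggests checking the head of the file on the command line (for example `head -n 5` followed by the filename) to see how it is delimited. The same check can be sketched in Python with the standard library's `csv.Sniffer`. The sample text is hypothetical, and `detect_delimiter` is an illustrative helper; the delimiter it finds is what you would pass to `pd.read_csv(..., sep=...)`.

```python
import csv
import io

def detect_delimiter(sample_text):
    """Guess the field delimiter from the first few lines of a file."""
    dialect = csv.Sniffer().sniff(sample_text, delimiters=",\t|;")
    return dialect.delimiter

# Hypothetical first lines of a pipe-delimited file
sample = "id|name|score\n1|ann|90\n2|bob|85\n"
sep = detect_delimiter(sample)
rows = list(csv.reader(io.StringIO(sample), delimiter=sep))
print(sep, rows[0])
```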