Greg Reda gjreda

@gjreda
gjreda / espn-cbb.py
Created October 26, 2013 22:24
Grabs college basketball play-by-play data for a given date range. Example usage: python espn-cbb.py 2013-01-01 2013-01-07
from bs4 import BeautifulSoup
from urllib2 import urlopen
from datetime import datetime, timedelta
from time import sleep
import sys
import csv
# CONSTANTS
ESPN_URL = "http://scores.espn.go.com"
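The preview stops before any date handling is shown. As a rough sketch of how the two command-line dates from the example usage could be expanded into one date per day, using the same `datetime` and `timedelta` imports: the helper name `date_range` and the choice to include the end date are assumptions, not taken from the gist.

```python
from datetime import datetime, timedelta


def date_range(start, end):
    """Yield each date from start to end (assumed inclusive), given YYYY-MM-DD strings."""
    current = datetime.strptime(start, "%Y-%m-%d").date()
    last = datetime.strptime(end, "%Y-%m-%d").date()
    while current <= last:
        yield current
        current += timedelta(days=1)


# Same arguments as the example usage in the gist description.
days = list(date_range("2013-01-01", "2013-01-07"))
for day in days:
    print(day.strftime("%Y%m%d"))
```

Each yielded date could then be formatted into whatever query string the scoreboard URL expects; that URL format is not shown in the preview, so it is left out here.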
gjreda / gist:8611946
Created January 25, 2014 04:37
Weird numpy/pandas groupby behavior when using min() on a np.datetime64 field.
# OSX 10.7.5
# python 2.7.5
# pandas 0.13.0
# numpy 1.8.0
import pandas as pd
import numpy as np
from StringIO import StringIO
d = """row1,'2013-10-01'
gjreda / gist:7433f5f70299610d9b6b
Last active April 11, 2023 16:23
pandas' read_csv parse_dates vs explicit date conversion
# When you're sure of the format, it's much quicker to explicitly convert your dates than use `parse_dates`
# Makes sense; was just surprised by the time difference.
import pandas as pd
from datetime import datetime
to_datetime = lambda d: datetime.strptime(d, '%m/%d/%Y %H:%M')
%time trips = pd.read_csv('data/divvy/Divvy_Trips_2013.csv', parse_dates=['starttime', 'stoptime'])
# CPU times: user 1min 29s, sys: 331 ms, total: 1min 29s
# Wall time: 1min 30s
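The preview cuts off before the explicit conversion is timed. A minimal sketch of the two approaches side by side, on a tiny in-memory CSV: the sample rows are made up for illustration, and `pd.to_datetime` with a `format` argument is swapped in here for the gist's `strptime` lambda, so this may not be exactly how the author did it.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample rows; only the column names come from the gist.
CSV_TEXT = """trip_id,starttime,stoptime
1,06/27/2013 12:11,06/27/2013 12:16
2,06/27/2013 14:44,06/27/2013 14:45
3,12/31/2013 23:59,01/01/2014 00:10
"""
DATE_COLUMNS = ["starttime", "stoptime"]
DATE_FORMAT = "%m/%d/%Y %H:%M"

# Approach 1: let read_csv work out the date format itself.
inferred = pd.read_csv(StringIO(CSV_TEXT), parse_dates=DATE_COLUMNS)

# Approach 2: read the columns as plain strings, then convert with a known format.
explicit = pd.read_csv(StringIO(CSV_TEXT))
for column in DATE_COLUMNS:
    explicit[column] = pd.to_datetime(explicit[column], format=DATE_FORMAT)

print(explicit.dtypes)
```

Both frames end up with the same timestamps; any speed difference depends on the pandas version and the size of the file, so it is worth timing on your own data.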
gjreda / coding-music.md
Last active August 18, 2017 15:33
Music to code by
gjreda / pandas-groupby-cumsum.py
Last active October 19, 2018 03:45
add grouped cumulative sum column to pandas dataframe
"""
add grouped cumulative sum column to pandas dataframe
Add a new column to a pandas dataframe which holds the cumulative sum for a given grouped window
Desired output:
user_id,day,session_minutes,cumulative_minutes
516530,0,NaN,0
516530,1,0,0
516532,0,5,5
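The docstring preview ends before the solution. One way to get the desired output shown above, sketched on just the three visible rows: treating the `NaN` as zero minutes is an assumption inferred from the first output row, and the gist's own approach may differ.

```python
import numpy as np
import pandas as pd

# The three rows visible in the gist's desired output, minus the target column.
df = pd.DataFrame({
    "user_id": [516530, 516530, 516532],
    "day": [0, 1, 0],
    "session_minutes": [np.nan, 0, 5],
})

# Treat missing sessions as 0 minutes (assumption), then take a running
# total that restarts for each user_id.
df["cumulative_minutes"] = (
    df["session_minutes"].fillna(0).groupby(df["user_id"]).cumsum()
)

print(df)
```

This assumes the rows are already sorted by `day` within each user; if not, sort by `user_id` and `day` first.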
gjreda / pandas-groupby-cumulative-count-with-reset.py
Last active August 2, 2016 14:51
days since last login -- pandas groupby cumulative count with reset
# for creating a column like "days since last login"
df = pd.read_clipboard(index_col=['customer_id', 'days'])
(df
.groupby(level='customer_id')
.did_login
.cumsum()
.to_frame()
.groupby(level='customer_id')
.apply(lambda g: g.groupby('did_login').cumcount())
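The chained expression above is truncated, so here is a self-contained sketch of the same idea using plain columns instead of an index: a running sum of `did_login` labels each stretch of days that starts with a login, and `cumcount` within each stretch gives the days since that login. The sample data and the `login_block` and `days_since_login` column names are made up for illustration.

```python
import pandas as pd

# Hypothetical login history: one row per customer per day.
df = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 1, 2, 2, 2],
    "days":        [0, 1, 2, 3, 4, 0, 1, 2],
    "did_login":   [1, 0, 0, 1, 0, 0, 1, 0],
})

# The running sum increases by one on every login, so each login starts a new block.
df["login_block"] = df.groupby("customer_id")["did_login"].cumsum()

# Counting rows within each (customer, block) resets the counter at every login.
df["days_since_login"] = df.groupby(["customer_id", "login_block"]).cumcount()

print(df)
```

Days before a customer's first login fall into block 0, so the counter there measures days since the start of the data rather than days since a login.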
gjreda / concurrent_futures_example.py
Created September 25, 2016 21:14
example of using Python3's concurrent.futures module
from concurrent.futures import ProcessPoolExecutor
import concurrent.futures
from halas.parsers import boxscore
GAMES = [ ... ]
results = []
with ProcessPoolExecutor(max_workers=4) as executor:
future_results = {executor.submit(boxscore, game):
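The preview is cut off in the middle of the dict comprehension. A sketch of the usual submit and `as_completed` pattern it appears to be building: `parse_game` is a hypothetical stand-in for `halas.parsers.boxscore`, the game IDs are invented, and `ThreadPoolExecutor` is swapped in as the default so the sketch runs without pickling concerns.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_game(game_id):
    """Hypothetical stand-in for halas.parsers.boxscore."""
    return {"game": game_id, "id_length": len(game_id)}


def parse_all(games, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Submit one task per game and collect results as they finish."""
    results = []
    with executor_cls(max_workers=max_workers) as executor:
        # Map each future back to its input so failures can be reported by game.
        future_to_game = {executor.submit(parse_game, game): game for game in games}
        for future in as_completed(future_to_game):
            game = future_to_game[future]
            try:
                results.append(future.result())
            except Exception as exc:
                print(f"{game} failed: {exc!r}")
    return results


GAMES = ["game-a", "game-b", "game-c"]  # placeholder IDs
results = parse_all(GAMES)
print(results)
```

Passing `executor_cls=ProcessPoolExecutor` gives the process-based version from the gist, provided the worker function is importable and the call sits under an `if __name__ == "__main__":` guard. Results arrive in completion order, not submission order.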
gjreda / useful-one-liners.sh
Last active April 11, 2017 03:45
Random bash one-liners that are useful but I always forget
# Installing/upgrading old requirements.txt from python2 to python3
sed s/\=/\ /g requirements.txt | awk '{print $1}' | xargs -n1 pip3 install --upgrade
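The one-liner splits on `=` only, so a line such as `numpy>=1.8.0` would yield `numpy>` as the package name. A small Python sketch of the name-extraction step that also copes with other specifiers and comments: the function name `package_names` and the sample file contents are illustrative, and the regex is a simplification rather than a full requirements parser.

```python
import re


def package_names(requirements_text):
    """Return bare package names from requirements.txt-style text."""
    names = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


# Hypothetical requirements file contents.
sample = """\
# pinned for python2
pandas==0.13.0
numpy>=1.8.0
beautifulsoup4
"""
names = package_names(sample)
print(" ".join(names))
```

The printed names can be piped to `xargs -n1 pip3 install --upgrade` just as in the original one-liner.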
[
{
"id": "805168201126518784",
"text": "@ryanisaac this is the weirdest quarter of football I’ve seen in a while"
},
{
"id": "804818096942968833",
"text": "@SportsTribution I don't follow."
},
{
id text
805168201126518784 @ryanisaac this is the weirdest quarter of football I’ve seen in a while
804818096942968833 @SportsTribution I don't follow.
804816669281546240 Looking for a weekend longread? @samhinkie's resignation letter is still one of the best things I've read in 2016 https://t.co/y7464DISgX
804759041318813696 Have used Postico to query our Redshift cluster for the last few months and it's been great. Similar to Sequel Pro. https://t.co/NN0DvdCpa6
804699067590840320 @jrmontag @tanehisicoates Agreed. Important book.
804690839221964801 The Year of the Looking Glass: Building Products https://t.co/0MVAbxeSze
804469380352446464 "So how do we build trust? The easy answer is by producing high quality work. The hard part is how you get there." https://t.co/M4MgJYU2Wm
804015210621239297 Holywow this looks awesome. Continuously impressed by the data products the @awscloud team keeps churning out: https://t.co/jmkLqFjyn7
803734870706896896 RT @jevnin: I'd recommend working with this guy. https://t.