Skipper Seabold jseabold

@jseabold
jseabold / Difference_in_Differences_Differences.ipynb
Last active March 14, 2021 18:13
Working out some differences between R and Python for Causal Inference: The Mixtape
"""
This took me a while to figure out, so I'm posting it for posterity.
plt.interactive(False) is important if you want the grid to show up only
when `display`ed. Since `sns.FacetGrid` can take a few seconds depending
on the size of your data, this displays a spinner in the notebook cell
until the new graph is ready to render.
"""
import matplotlib.pyplot as plt
@jseabold
jseabold / docker_manifest.py
Created September 9, 2017 17:34
Some functions for dealing with docker registry manifests
import urllib.parse  # a bare `import urllib` does not guarantee the `parse` submodule is loaded
import docker
def get_manifest_auth_token(repo):
    # https://docs.docker.com/registry/spec/auth/token/
    query = urllib.parse.urlencode({
        'service': 'registry.docker.io',
        'scope': 'repository:{repo}:pull'.format(repo=repo)
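A minimal sketch of how the token request URL for this flow might be assembled, assuming the public Docker Hub auth endpoint `https://auth.docker.io/token`. The name `build_token_url` is hypothetical and not taken from the gist, and no request is actually sent.

```python
import urllib.parse

# Assumption: Docker Hub's public token endpoint. A private registry
# advertises its own realm in the WWW-Authenticate response header.
AUTH_URL = "https://auth.docker.io/token"


def build_token_url(repo):
    """Return the URL that requests a pull-scoped token for `repo`."""
    query = urllib.parse.urlencode({
        "service": "registry.docker.io",
        "scope": "repository:{repo}:pull".format(repo=repo),
    })
    return "{}?{}".format(AUTH_URL, query)


# Hypothetical usage: official images live under the `library/` namespace.
url = build_token_url("library/ubuntu")
```

Note that `urlencode` percent-encodes the `:` and `/` characters in the scope; the auth server decodes them on receipt.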
@jseabold
jseabold / binary_rng.py
Created September 27, 2016 20:13
Create correlated binary variables. Based on Leisch, Weingessel, and Hornik (1998).
"""
Heavily inspired by the R package bindata.
"""
import numpy as np
from scipy import interpolate
from scipy import stats
def corr_to_joint(corr, marginals):
    """
@jseabold
jseabold / git_find_big.py
Created September 27, 2015 16:49
git filter-branch magic using Python
#! /usr/bin/env python
import glob
import os
import shutil
import re
from collections import namedtuple
import subprocess
from subprocess import PIPE
@jseabold
jseabold / spot_pricing.py
Created August 11, 2015 14:33
Plot EC2 spot pricing with boto3 and pandas
import pandas as pd
from boto3 import client
# Bind the EC2 client to its own name instead of shadowing the imported `client` factory
ec2 = client(service_name='ec2')
prices = ec2.describe_spot_price_history(InstanceTypes=["m3.medium"],
                                         AvailabilityZone="us-east-1a")
df = pd.DataFrame(prices['SpotPriceHistory'])
df.set_index("Timestamp", inplace=True)
df["SpotPrice"] = df.SpotPrice.astype(float)
@jseabold
jseabold / cat_transformers.py
Created January 28, 2015 20:46
sklearn transformers that can account for categorical variables
import numpy as np
from sklearn.base import TransformerMixin, BaseEstimator
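class StandardTransformer(BaseEstimator, TransformerMixin):
    # Tuples as defaults avoid the shared mutable default argument pitfall
    def __init__(self, variables=(), ignore=()):
        self.variables = variables
        self.ignore = ignore
        # Boolean mask: True for each variable that should be transformed
        self.transform_idx = np.asarray([i not in self.ignore
                                         for i in self.variables], dtype=bool)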
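The boolean mask idea can be illustrated without scikit-learn: standardize only the columns that are not in the ignore list and pass the rest (for example dummy-coded categoricals) through unchanged. This is a minimal sketch under stated assumptions; `standardize_columns` and the column names are hypothetical, and using the population standard deviation is an arbitrary choice made here.

```python
from statistics import mean, pstdev


def standardize_columns(rows, variables, ignore=()):
    """Z-score every column of `rows` whose name is not in `ignore`."""
    # True for columns to transform, mirroring the transformer's mask
    mask = [name not in ignore for name in variables]
    columns = list(zip(*rows))
    # (mean, std) per transformed column; None marks pass-through columns.
    # Assumption: transformed columns are not constant (std > 0).
    stats = [(mean(col), pstdev(col)) if keep else None
             for col, keep in zip(columns, mask)]
    return [
        [(x - s[0]) / s[1] if s is not None else x
         for x, s in zip(row, stats)]
        for row in rows
    ]


# Hypothetical data: one numeric column and one 0/1 categorical indicator
rows = [[1.0, 0], [2.0, 1], [3.0, 0]]
scaled = standardize_columns(rows, ["age", "is_member"], ignore=["is_member"])
```

The indicator column comes back untouched, while the numeric column is centered on zero.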
@jseabold
jseabold / tufte.py
Last active February 19, 2019 12:44
Recreation of a Tufte graphic in Python, based on an Rstats blog post (http://asbcllc.com/blog/2015/January/gotham_2014_weather/) and gist (https://gist.github.com/abresler/46c36c1a88c849b94b07)
import os
import calendar
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedLocator, FixedFormatter
import pandas as pd
import seaborn as sns
def to_colors(x):
    # Scale 0-255 RGB values to the 0-1 range matplotlib expects
    return x / 255.
@jseabold
jseabold / pymc_spatial_surival_debug.ipynb
Created June 25, 2014 16:47
Replicates Table 4 columns 3 and 5. Fails on Table 3 column 3 of Darmofal's "Bayesian Spatial Survival Models for Political Event Processes"
@jseabold
jseabold / cox_model.py
Last active June 11, 2018 22:22
Try to replicate BUGS code with PyMC for Table 3, Column 1 of "Bayesian Spatial Survival Models for Political Event Processes."
from pymc import Gamma, Poisson, Normal, MCMC, deterministic
import numpy as np
dta = dict(T=73, Nsubj=430, eps=0.0,
           t=[1, 21, 85, 128, 129, 148, 178, 204,
              206, 210, 211, 212, 225, 238, 241,
              248, 259, 273, 275, 281, 286, 289,
              301, 302, 303, 304, 313, 317, 323,
              344, 345, 349, 350, 351, 355, 356,
              359, 364, 385, 386, 389, 390, 391,
              392, 394, 395, 396, 397, 398, 399,