Chris Kenwright thekensta
thekensta /
Last active August 26, 2015 17:07
IPython Rpy2 passing parameters
# Quick summary of how to access R objects from IPython
# Useful link, but the summary is somewhat buried
import numpy as np
%load_ext rpy2.ipython
# %R [-i INPUT] [-o OUTPUT] [-n] [-w WIDTH] [-h HEIGHT] [-p POINTSIZE]
#    [-b BG] [--noisolation] [-u {px,in,cm,mm}] [-r RES] [code [code ...]]
thekensta / grouping_sets_and_rollup.sql
Last active August 27, 2015 16:33
Grouping and Rollup SQL aggregation
-- Reference
-- TODO: add more detail, this is syntax reference for me
Select fname, food, sum(total)
From lateral(
('Bob', 'Pies', 3),
('Charlie', 'Pies', 1),
thekensta /
Last active August 29, 2015 14:22
Extract date components from Date column in pandas dataframe
# Extracting date components from a Date column in Pandas using IPython
# Converting to DatetimeIndex is 100x faster than using DataFrame.apply()
import pandas as pd
dates = pd.DataFrame({"Date": pd.date_range(start="1970-01-01", end="2037-12-31")})
# Date
# 0 1970-01-01
# 1 1970-01-02
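A minimal sketch of the two approaches the comment above compares, assuming a short ten-day range so the result is easy to inspect. The column names `Year`, `Month`, `Day` and `YearSlow` are illustrative and not taken from the gist, and no timing is measured here.

```python
import pandas as pd

# Short illustrative range (the gist uses 1970-01-01 to 2037-12-31)
dates = pd.DataFrame({"Date": pd.date_range(start="1970-01-01", end="1970-01-10")})

# Vectorised extraction: wrap the column in a DatetimeIndex once,
# then read whole-column attributes from it
idx = pd.DatetimeIndex(dates["Date"])
dates["Year"] = idx.year
dates["Month"] = idx.month
dates["Day"] = idx.day

# Row-by-row equivalent with apply, shown only for comparison
dates["YearSlow"] = dates["Date"].apply(lambda d: d.year)

print(dates.head())
```

Both routes give the same values; the vectorised one avoids calling a Python function once per row.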
thekensta / AutoArima.R
Last active August 29, 2015 14:25
Auto Arima from data.frame embedding forecast in actuals
## Wrap forecast auto.arima(..) and forecast(..) into a data.frame
## Embeds the forecast into the data.frame
## Allow passing an EndDate so that the forecast can start mid-actuals
## (helps with visualization and explanation)
## Usage:
## Forecast.df <- AutoArimaForecast(Monthly.df, # DataFrame with
## H = 6, # Predict 6 months forward
thekensta /
Created September 21, 2015 12:07
SVD Image Compression
# IPython code using SVD to extract components of an image
%matplotlib inline
import matplotlib.pyplot as plt
import as cmap
import numpy as np
from scipy import ndimage
# Any image file here; this one is colour, so convert to greyscale
DOG_IMAGE_FILE = "dog2.jpg"
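The rest of this gist is not shown in the preview. As a rough sketch of the general idea only, truncated SVD keeps the `k` largest singular values of a greyscale array and rebuilds an approximation from them. The helper name `low_rank_approx` and the random stand-in array are assumptions for illustration; they are not the gist's code and no image file is read.

```python
import numpy as np


def low_rank_approx(image, k):
    """Return the rank-k approximation of a 2-D array via truncated SVD."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then project back with the first k right singular vectors
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Random stand-in for a greyscale image (hypothetical data)
rng = np.random.default_rng(0)
image = rng.random((8, 6))

# Frobenius-norm reconstruction error for each rank from 1 to 6
errors = [np.linalg.norm(image - low_rank_approx(image, k)) for k in range(1, 7)]
print(errors)
```

The error shrinks as `k` grows and reaches numerical zero once `k` equals the smaller dimension of the array.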
thekensta /
Last active September 23, 2015 14:50
Numpy Basic Operations Cheat Sheet
# Storing basic operations here, as I tend to forget them!
# Fill as required (or fill as forgotten?? :-)
# Repeat and Tile
# Repeat copies each element and flattens (unless an axis is given)
# Tile copies sequences and preserves shape
a = np.array([1, 2, 3])
print(np.tile(a, 2))
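A small sketch contrasting the two functions on 1-D and 2-D input. The 2-D array `m` and the result names are illustrative additions, not part of the gist.

```python
import numpy as np

a = np.array([1, 2, 3])

tiled = np.tile(a, 2)        # whole sequence copied: [1 2 3 1 2 3]
repeated = np.repeat(a, 2)   # each element copied:   [1 1 2 2 3 3]

m = np.array([[1, 2], [3, 4]])

repeat_flat = np.repeat(m, 2)          # no axis: input is flattened first
repeat_rows = np.repeat(m, 2, axis=0)  # axis=0: each row is duplicated in place
tile_rows = np.tile(m, (2, 1))         # the whole block is stacked twice

print(tiled, repeated, repeat_flat, repeat_rows, tile_rows, sep="\n")
```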
thekensta /
Created October 27, 2015 16:04
Spark Shell Script
# submit with spark-submit
# Spark 1.5.1
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("showMeTheSchema").setMaster("local")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
thekensta / ab_mc.ipynb
Last active November 5, 2015 16:15
Interactive AB split test via Monte Carlo Integration
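The notebook itself is not available here, so the following is only a generic sketch of a Monte Carlo estimate for an A/B test, not the notebook's method. It assumes uniform Beta(1, 1) priors on each conversion rate and estimates the probability that variant B beats variant A by sampling from the two posteriors. The function name, parameters and counts are all hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each rate is Beta(1 + successes, 1 + failures)
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts: 100/1000 conversions for A, 130/1000 for B
p_better = prob_b_beats_a(100, 1000, 130, 1000)
# Identical counts should give a probability close to one half
p_equal = prob_b_beats_a(50, 500, 50, 500)
print(p_better, p_equal)
```

The fraction of draws in which B's sampled rate exceeds A's approximates the integral of the joint posterior over the region where B is better.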
thekensta /
Created November 13, 2015 10:30
Call power.prop.test via Python
# Need to change the values extracted, serves as aide-memoire for R <-> Python
def parse_robj(obj):
    """Extract n, p1 and p2 from R List from power.prop.test."""
    return (obj[obj.names.index("n")][0],
def call_power_prop_test(p1, n):
    P__ = %R power.prop.test(p1=$p1, n=$n, power=0.8)
    return parse_robj(P__)
thekensta /
Last active November 16, 2015 00:07
Summary of least squares in Python
# Quick reminder of least squares calculations in Python
import numpy as np
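def least_sq_numpy(x, y):
    """Calculate y = mx + c from x, y returning m, c using numpy."""
    # Design matrix with a column of x values and a column of ones for the intercept
    A = np.vstack([x, np.ones(x.size)]).T
    # rcond=None avoids the FutureWarning raised by older numpy versions
    fit = np.linalg.lstsq(A, y, rcond=None)
    return fit[0]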
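A short usage sketch for the function above, assuming noise-free points on the line y = 2x + 1 so the expected slope and intercept are known. The cross-check with `np.polyfit` is an addition for illustration and is not part of the gist.

```python
import numpy as np


def least_sq_numpy(x, y):
    """Calculate y = mx + c from x, y returning m, c using numpy."""
    A = np.vstack([x, np.ones(x.size)]).T
    fit = np.linalg.lstsq(A, y, rcond=None)
    return fit[0]


# Illustrative data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

m, c = least_sq_numpy(x, y)

# Degree-1 polyfit returns the coefficients in the same (slope, intercept) order
m_check, c_check = np.polyfit(x, y, 1)
print(m, c, m_check, c_check)
```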