Tim Hoolihan thoolihan

thoolihan / get_start_time.py
Created July 13, 2018 15:30
python consistent start time
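from datetime import datetime

_start_time = None  # defined first so get_start_time() can read it on the first call

def get_curr_time():
    return datetime.now().strftime("%Y.%m.%d.%H.%M.%S")

def get_start_time():
    # Reuse the timestamp cached at import time; fall back to the current time.
    return _start_time if _start_time else get_curr_time()

_start_time = get_start_time()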
thoolihan / nba.R
Last active May 23, 2018 15:08
Graphing Sports Odds
library(ggplot2)
library(dplyr)
library(R.utils)
teams <- data.frame(team = c('warriors', 'rockets',
                             'cavaliers', 'celtics'),
                    odds_nw = c(5,9,8,20),
                    odds_w = c(9,4,1,1))
# raw probabilities sum to more than 1 because of house take
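The comment above says the raw probabilities sum to more than 1. Here is a minimal Python sketch of that arithmetic. It assumes each pair reads as fractional odds of `odds_nw` to `odds_w`, so the implied win probability is `odds_w / (odds_nw + odds_w)`. That reading and the helper name `implied_probability` are assumptions, not taken from the gist.

```python
# Odds copied from the teams data frame above.
# Assumption: each pair means "odds_nw to odds_w" against winning.
teams = {
    "warriors": (5, 9),
    "rockets": (9, 4),
    "cavaliers": (8, 1),
    "celtics": (20, 1),
}


def implied_probability(odds_nw, odds_w):
    """Raw win probability implied by fractional odds of odds_nw to odds_w."""
    return odds_w / (odds_nw + odds_w)


raw = {team: implied_probability(nw, w) for team, (nw, w) in teams.items()}
overround = sum(raw.values())  # exceeds 1; the excess is the house take

# Dividing by the total rescales the probabilities so they sum to 1.
normalized = {team: p / overround for team, p in raw.items()}

for team in teams:
    print(f"{team}: raw={raw[team]:.3f} normalized={normalized[team]:.3f}")
```

Dividing by the total is only one simple way to remove the house take; it spreads the margin proportionally across all four teams.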
thoolihan / target_encode.py
Created November 30, 2017 15:08
TargetEncoder
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
# Adapted from https://www.kaggle.com/ogrellier/python-target-encoding-for-categorical-features
class TargetEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, columns, noise_level=0):
        self.columns = columns
        self.maps = {}
thoolihan / gist:06d2d93d2618fd6535ffedaa40f33bff
Created November 8, 2017 16:06
creating week from date
> df <- data.frame(date = c('2017-10-01', '2017-10-11'))
> df$date <- as.Date(df$date)
> df
        date
1 2017-10-01
2 2017-10-11
> sapply(df$date, function(d) {if (d < as.Date('2017-10-07')) 1 else 0})
[1] 1 0
> df$week1 <- sapply(df$date, function(d) {if (d < as.Date('2017-10-07')) 1 else 0})
> df
thoolihan / arrange_plot.R
Last active August 3, 2017 21:09
arrange and plot not working together
library(tidyverse)
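# arrange() sorts the rows, but ggplot2 orders a character x axis alphabetically,
# so the points are not plotted in hp order; x = reorder(car, hp) would sort the axis by hp
mtcars %>%
  mutate(car = rownames(.)) %>%
  arrange(hp) %>%
  ggplot(aes(x = car, y = hp)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))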
thoolihan / mc.R
Created June 30, 2017 17:42
Simulate 2 state Markov Chain from Wikipedia
# simulating https://en.wikipedia.org/wiki/Markov_chain#/media/File:Markovkate_01.svg
library(ggplot2)
means <- c()
ntimes <- 1000
for (t in 1:ntimes) {
  n <- 1000
  state <- c(1)
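The preview above stops before the simulation loop. The following Python sketch shows one way such a two-state simulation might look; it is not the gist's own code. The transition probabilities (A to A 0.6, A to E 0.4, E to A 0.7, E to E 0.3) are assumed from the linked Wikipedia diagram, and the names `TRANSITIONS` and `simulate` are hypothetical.

```python
import random

# Assumed transition probabilities for the two states "A" and "E".
TRANSITIONS = {
    "A": {"A": 0.6, "E": 0.4},
    "E": {"A": 0.7, "E": 0.3},
}


def simulate(n_steps, rng, start="A"):
    """Walk the chain for n_steps and return the fraction of steps spent in A."""
    state = start
    visits_a = 0
    for _ in range(n_steps):
        if state == "A":
            visits_a += 1
        # Move to A with the probability given by the current state's row.
        state = "A" if rng.random() < TRANSITIONS[state]["A"] else "E"
    return visits_a / n_steps


rng = random.Random(0)  # fixed seed so the run is repeatable
ntimes = 1000
n = 1000
means = [simulate(n, rng) for _ in range(ntimes)]
overall = sum(means) / len(means)
print(f"mean fraction of time in A: {overall:.3f}")
```

Under these assumed probabilities the stationary share of state A is 0.7 / (0.4 + 0.7) = 7/11, about 0.636, so the per-run means should cluster around that value.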
thoolihan / distance.R
Last active March 30, 2017 12:04
color distance
library(dplyr)
n <- 1000000
data <- data.frame(id = 1:n,
                   red = sample(0:255, size = n, replace = TRUE),
                   green = sample(0:255, size = n, replace = TRUE),
                   # sample(255, ...) draws from 1:255; use 0:255 to match red and green
                   blue = sample(0:255, size = n, replace = TRUE))
query <- list(red = 80, green = 90, blue = 255)
thoolihan / monty_hall.R
Last active March 22, 2017 18:33
Simulation of the famous Monty Hall problem in R
library(dplyr)
library(ggplot2)
doors <- 1:3
sample_doors <- function() { return(sample(doors, size = 1000, replace = TRUE))}
games <- data.frame(prize = sample_doors(), pick = sample_doors())
games$strategy <- factor(ifelse(games$prize == games$pick, 'stay', 'switch'))
monte_show <- function(prize, pick) {
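The preview above is cut off at the function that chooses which door the host shows. As a rough sketch of the whole game in Python, under the usual rules (the host always opens a door that hides no prize and is not the contestant's pick), one might write the following. The function name `play` and the sample size of 10,000 games are assumptions for illustration; this is not the gist's code.

```python
import random


def play(rng):
    """Play one game and return (stay_wins, switch_wins)."""
    doors = [1, 2, 3]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    shown = rng.choice([d for d in doors if d != pick and d != prize])
    # Switching means taking the one door that is neither picked nor shown.
    switched = next(d for d in doors if d != pick and d != shown)
    return pick == prize, switched == prize


rng = random.Random(0)  # fixed seed so the run is repeatable
n_games = 10_000
results = [play(rng) for _ in range(n_games)]
stay_rate = sum(stay for stay, _ in results) / n_games
switch_rate = sum(switch for _, switch in results) / n_games
print(f"stay wins: {stay_rate:.3f}, switch wins: {switch_rate:.3f}")
```

Exactly one of the two strategies wins each game, which matches the `strategy` column in the R code: 'stay' wins when `prize == pick`, otherwise 'switch' wins.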
thoolihan / clustering.R
Last active May 27, 2019 22:33
A simple clustering example in R with kmeans and ggplot2
library(ggplot2)
cars <- mtcars
cars$cyl <- factor(cars$cyl, labels =
                     c('Four cylinder', 'Six cylinder', 'Eight cylinder'))
features <- c('wt', 'qsec')
n_clusters <- 3
car_clusters <- kmeans(cars[, features], n_clusters, nstart = 30)
thoolihan / resize.py
Created February 21, 2017 18:30
Resize Example
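import numpy as np
a = np.arange(1000)
# reshape returns a view; copy it so the array owns its data and can be resized
a = a.reshape(2, 500).copy()
# ndarray.resize works in place and returns None, so its result is not assigned;
# the 200 new trailing elements are zero-filled
a.resize((2, 600), refcheck = False)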