@ramhiser
ramhiser / keybase.md
Created March 10, 2019 16:54

Keybase proof

I hereby claim:

  • I am ramhiser on github.
  • I am ramhiser (https://keybase.io/ramhiser) on keybase.
  • I have a public key ASBotXs-LlQCC_m4Y3nVJlvF-fOMjq9idZtoXkYd-jekzQo

To claim this, I am signing this object:

@ramhiser
ramhiser / changepoint-linear-regression.r
Last active May 21, 2019 02:57
Bayesian changepoint detection in linear regression with R and Stan
# Based on this blog post: http://nowave.it/pages/bayesian-changepoint-detection-with-r-and-stan.html
library(rstan)
rstan_options(auto_write = TRUE)
set.seed(42)
beta0 <- 3
beta1 <- 9
beta2 <- 15
set.seed(42)
@ramhiser
ramhiser / export_scikit_pipeline.py
Created May 31, 2017 21:15
JSON Export of a scikit-learn Pipeline object
import json
def fullname(o):
    return o.__module__ + "." + o.__class__.__name__
def export_pipeline(scikit_pipeline):
    """JSON export of a scikit-learn pipeline.
    Especially useful when paired with GridSearchCV, TPOT, etc.
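The preview stops inside the docstring, so the body of `export_pipeline` is not shown here. As a rough sketch of one way such an export could work (not necessarily the author's approach), the code below assumes the pipeline exposes `steps` as a list of `(name, estimator)` pairs and that each estimator has `get_params()`, as scikit-learn estimators do. The names `export_pipeline_sketch`, `FakeScaler`, and `FakePipeline` are hypothetical; the two stand-in classes exist only so the example runs without scikit-learn installed.

```python
import json


def fullname(o):
    """Return the dotted module path and class name of an object."""
    return o.__module__ + "." + o.__class__.__name__


def export_pipeline_sketch(pipeline):
    """Serialize each (name, estimator) step as class name plus parameters.

    Assumes `pipeline.steps` is a list of (name, estimator) pairs and that
    each estimator has `get_params()`, as scikit-learn estimators do.
    """
    steps = []
    for name, estimator in pipeline.steps:
        steps.append({
            "name": name,
            "class": fullname(estimator),
            "params": estimator.get_params(deep=False),
        })
    # default=str stringifies values that json cannot encode natively.
    return json.dumps(steps, default=str, sort_keys=True)


# Hypothetical stand-ins so the sketch runs without scikit-learn installed.
class FakeScaler:
    def get_params(self, deep=True):
        return {"with_mean": True}


class FakePipeline:
    steps = [("scale", FakeScaler())]


exported = export_pipeline_sketch(FakePipeline())
print(exported)
```

Passing `default=str` is a deliberate shortcut: nested estimators or callables in the parameters are written as their string form rather than raising an error, which keeps the export readable but not round-trippable.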
@ramhiser
ramhiser / confidence_interval.py
Last active May 31, 2017 13:34
Confidence Interval for the mean of a Normal distribution using scipy and Python
from scipy import stats
import numpy as np
def mean_confidence_interval(x, alpha=0.05):
    """Computes two-sided confidence interval for a Normal mean
    Assumes population variance is unknown.
    x is assumed to be a list or a 1-d Numpy array
    """
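The function body is cut off in this preview. For a Normal mean with unknown population variance, the usual two-sided interval is the Student t interval, which is presumably what the function computes. The symbols below ($n$ for the sample size, $\bar{x}$ for the sample mean, $s$ for the sample standard deviation) are notation introduced here, not taken from the gist.

```latex
\bar{x} \;\pm\; t_{1-\alpha/2,\; n-1}\,\frac{s}{\sqrt{n}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
```

Here $t_{1-\alpha/2,\; n-1}$ is the $1-\alpha/2$ quantile of the t distribution with $n-1$ degrees of freedom, so the default `alpha=0.05` corresponds to a 95% interval.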
@ramhiser
ramhiser / brms-nonlinear.r
Last active February 26, 2023 18:14
Adding fixed effects and random effects to a nonlinear Stan model via brms
# The data set and model are described in the *brms* vignette
library(brms)
url <- paste0("https://raw.githubusercontent.com/mages/diesunddas/master/Data/ClarkTriangle.csv")
loss <- read.csv(url)
set.seed(42)
# Generate a random continuous feature
loss$ramey <- runif(nrow(loss))
@ramhiser
ramhiser / illustrate-overfit.r
Last active July 24, 2018 15:06
An illustration of underfitting and overfitting on an unknown curve compared with a random forest
library(tidyverse)
library(randomForest)
library(rpart)
set.seed(42)
num_points <- 20
x <- sort(runif(num_points, min=-5, max=6))
y <- x^2/5 + sin(3*x) # + rnorm(num_points, sd=0.1)
df <- tibble(x=x, y=y)  # tibble() replaces the deprecated data_frame()
@ramhiser
ramhiser / stan-dogs.r
Created March 3, 2017 17:32
Stan Implementation of a Log-linear Model for the Dogs Data Set
# The Dogs data set was analyzed by D.V. Lindley using a loglinear model for binary data
# For details about the Dogs data set and model, see: http://www.openbugs.net/Examples/Dogs.html
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
num_dogs <- 30
num_trials <- 25
Y <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
@ramhiser
ramhiser / character2factor.r
Created February 10, 2017 19:13
Convert all character columns to factors using dplyr in R
library(dplyr)
iris_char <- iris %>%
  mutate(Species=as.character(Species),
         char_column=sample(letters[1:5], nrow(iris), replace=TRUE))
sum(sapply(iris_char, is.character)) # 2
iris_factor <- iris_char %>%
  mutate_if(is.character, as.factor)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species char_column
# "numeric" "numeric" "numeric" "numeric" "character" "character"
@ramhiser
ramhiser / stratify.r
Last active April 1, 2021 14:22
Stratified Sampling in R with dplyr
library(dplyr)
# Uses a subset of the Iris data set with different proportions of the Species factor
set.seed(42)
iris_subset <- iris[c(1:50, 51:80, 101:120), ]
stratified_sample <- iris_subset %>%
  group_by(Species) %>%
  mutate(num_rows=n()) %>%
  sample_frac(0.4, weight=num_rows) %>%
  ungroup()
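The same idea, drawing an equal fraction from every group so the sample keeps the original class proportions, can be sketched in plain Python. This is an illustrative analogue rather than a translation of the dplyr pipeline: the function name, the `key` and `frac` parameters, and the made-up rows (which only mirror the 50/30/20 split of the subset above) are all assumptions. Since `num_rows` is constant within each group, the weight in the R code presumably does not change which rows are drawn, so the sketch samples uniformly within each group.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, frac, seed=42):
    """Sample the same fraction of rows from every group defined by `key`."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sample = []
    for members in groups.values():
        # Per-group sample size, rounded to the nearest whole row.
        k = round(frac * len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up labels mirroring the 50/30/20 split of the three species above.
rows = (
    [{"species": "setosa"}] * 50
    + [{"species": "versicolor"}] * 30
    + [{"species": "virginica"}] * 20
)
picked = stratified_sample(rows, key=lambda r: r["species"], frac=0.4)

counts = {}
for r in picked:
    counts[r["species"]] = counts.get(r["species"], 0) + 1
print(counts)
```

With a fraction of 0.4, the groups of 50, 30, and 20 rows contribute 20, 12, and 8 rows, so the 5:3:2 ratio of the input is preserved in the sample.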
@ramhiser
ramhiser / symmetric-MAPE.r
Created January 12, 2017 21:26
Symmetric MAPE following visualization in Figure 3c of the MAAPE Paper
library(dplyr)
library(ggplot2)
A <- seq(0, 10, length=100)
F <- seq(0, 10, length=100)
symmetric_mape <- function(A, F) {
  abs(F - A) / (abs(A) + abs(F))
}
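To make the behaviour of this formula concrete, here is a small Python sketch of the same expression evaluated over a grid like the one defined by `A` and `F` above. The explicit check for a zero denominator is an addition for illustration: the R expression would produce `NaN` there, since it evaluates `0 / 0` when both values are zero.

```python
def symmetric_mape(actual, forecast):
    """abs(F - A) / (abs(A) + abs(F)); undefined when both are zero."""
    denom = abs(actual) + abs(forecast)
    if denom == 0:
        return float("nan")
    return abs(forecast - actual) / denom


# 100 evenly spaced points on [0, 10], like seq(0, 10, length=100) in R.
grid = [10 * i / 99 for i in range(100)]

# Evaluate every (actual, forecast) pair except the undefined (0, 0) corner.
values = [symmetric_mape(a, f) for a in grid for f in grid if a or f]
print(min(values), max(values))
```

On non-negative inputs the value stays between 0 and 1: it is 0 when the forecast equals the actual value and reaches 1 when exactly one of the two is zero.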