@ramhiser
ramhiser / stan-dogs.r
Created March 3, 2017 17:32
Stan Implementation of a Log-linear Model for the Dogs Data Set
# The Dogs data set was analyzed by D.V. Lindley using a loglinear model for binary data
# For details about the Dogs data set and model, see: http://www.openbugs.net/Examples/Dogs.html
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
num_dogs <- 30
num_trials <- 25
Y <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
@ramhiser
ramhiser / brms-nonlinear.r
Last active February 26, 2023 18:14
Adding fixed effects and random effects to a nonlinear Stan model via brms
# The data set and model are described in the *brms* vignette
library(brms)
url <- paste0("https://raw.githubusercontent.com/mages/diesunddas/master/Data/ClarkTriangle.csv")
loss <- read.csv(url)
set.seed(42)
# Generate a random continuous feature
loss$ramey <- runif(nrow(loss))
@ramhiser
ramhiser / random-forest.r
Created October 22, 2014 21:57
Plots Variable Importance from Random Forest in R
library(randomForest)
library(dplyr)
library(ggplot2)
set.seed(42)
rf_out <- randomForest(Species ~ ., data=iris)
# Extracts variable importance (Mean Decrease in Gini Index)
# Sorts by variable importance and relevels factors to match ordering
@ramhiser
ramhiser / latlong2fips.r
Created May 6, 2014 03:35
Latitude/Longitude to FIPS Codes via the FCC's API
# FCC's Census Block Conversions API
# http://www.fcc.gov/developers/census-block-conversions-api
latlong2fips <- function(latitude, longitude) {
  url <- "http://data.fcc.gov/api/block/find?format=json&latitude=%f&longitude=%f"
  url <- sprintf(url, latitude, longitude)
  json <- RCurl::getURL(url)
  json <- RJSONIO::fromJSON(json)
  as.character(json$County['FIPS'])
}
@ramhiser
ramhiser / character2factor.r
Created February 10, 2017 19:13
Convert all character columns to factors using dplyr in R
library(dplyr)
iris_char <- iris %>%
  mutate(Species=as.character(Species),
         char_column=sample(letters[1:5], nrow(iris), replace=TRUE))
sum(sapply(iris_char, is.character)) # 2
iris_factor <- iris_char %>%
  mutate_if(sapply(iris_char, is.character), as.factor)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species char_column
# "numeric" "numeric" "numeric" "numeric" "character" "character"
@ramhiser
ramhiser / try_backoff.r
Last active August 2, 2022 15:11
Try/catch in R with exponential backoff
#' Try/catch with exponential backoff
#'
#' Attempts the expression in \code{expr} up to the number of tries specified in
#' \code{max_attempts}. Each time a failure results, the function sleeps for a
#' random amount of time before re-attempting the expression. The upper bound of
#' the backoff increases exponentially after each failure.
#'
#' For details on exponential backoff, see:
#' \url{http://en.wikipedia.org/wiki/Exponential_backoff}
#'
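The backoff behavior described above can be sketched in Python. This is a minimal illustration, not the gist's R implementation: the name `try_backoff_sketch` and the parameters `base`, `cap`, and `sleep` are assumptions introduced here, and the choice to double the upper bound after each failure is one common way to make it grow exponentially.

```python
import random
import time


def try_backoff_sketch(func, max_attempts=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is exhausted.

    Hypothetical parameters (not from the original gist):
      base  -- scale of the backoff window in seconds
      cap   -- largest allowed upper bound for a single sleep
      sleep -- injectable sleep function, handy for testing
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == max_attempts:
                raise
            # The upper bound doubles after each failure, up to the cap.
            upper = min(cap, base * 2 ** attempt)
            # Sleep for a random amount of time within [0, upper].
            sleep(random.uniform(0, upper))
```

Passing a stub such as `sleep=lambda seconds: None` lets the retry logic be exercised without real delays.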
@ramhiser
ramhiser / export_scikit_pipeline.py
Created May 31, 2017 21:15
JSON Export of a scikit-learn Pipeline object
import json
def fullname(o):
    return o.__module__ + "." + o.__class__.__name__
def export_pipeline(scikit_pipeline):
    """JSON export of a scikit-learn pipeline.
    Especially useful when paired with GridSearchCV, TPOT, etc.
@ramhiser
ramhiser / date-range.py
Last active July 5, 2022 10:55
Python generator to construct range of dates
from datetime import datetime, timedelta
def date_range(start, end, step=7, date_format="%m-%d-%Y"):
    """
    Creates generator with a range of dates.
    The dates occur every 7th day (default).
    :param start: the start date of the date range
    :param end: the end date of the date range
    :param step: the step size of the dates
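Since the preview cuts off before the function body, here is one way a generator with this signature could work. It is a sketch under stated assumptions, not the gist's actual code: it assumes `start` and `end` are strings in `date_format`, that `step` is a number of days, that the end date is inclusive, and that dates are yielded as formatted strings. The name `date_range_sketch` is hypothetical.

```python
from datetime import datetime, timedelta


def date_range_sketch(start, end, step=7, date_format="%m-%d-%Y"):
    """Yield dates from start to end (inclusive) every `step` days.

    Assumptions: start and end are strings in date_format, and the
    yielded values are strings in the same format.
    """
    if step <= 0:
        # A non-positive step would never reach the end date.
        raise ValueError("step must be a positive number of days")
    current = datetime.strptime(start, date_format)
    stop = datetime.strptime(end, date_format)
    while current <= stop:
        yield current.strftime(date_format)
        current += timedelta(days=step)
```

Under these assumptions, `list(date_range_sketch("01-01-2017", "01-22-2017"))` gives the four weekly dates from January 1 through January 22, 2017.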
@ramhiser
ramhiser / jaccard.py
Last active November 4, 2021 08:41
Jaccard cluster similarity in Python
import itertools
def jaccard(labels1, labels2):
    """
    Computes the Jaccard similarity between two sets of clustering labels.
    The value returned is between 0 and 1, inclusive. A value of 1 indicates
    perfect agreement between two clustering algorithms, whereas a value of 0
    indicates no agreement. For details on the Jaccard index, see:
    http://en.wikipedia.org/wiki/Jaccard_index
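The function body is truncated in this preview, so the following is a hedged sketch of one standard way to compute this quantity: the pair-counting form of the Jaccard index. It may differ from the gist's code. The name `jaccard_sketch` is hypothetical, and returning 1.0 when no pair is co-clustered in either labeling is a convention assumed here.

```python
import itertools


def jaccard_sketch(labels1, labels2):
    """Pair-counting Jaccard similarity between two clusterings.

    n11: pairs placed in the same cluster by both labelings
    n10: pairs co-clustered only in labels1
    n01: pairs co-clustered only in labels2
    The index is n11 / (n11 + n10 + n01).
    """
    if len(labels1) != len(labels2):
        raise ValueError("label sequences must have the same length")
    n11 = n10 = n01 = 0
    for i, j in itertools.combinations(range(len(labels1)), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        if same1 and same2:
            n11 += 1
        elif same1:
            n10 += 1
        elif same2:
            n01 += 1
    denominator = n11 + n10 + n01
    # Assumed convention: if no pair is co-clustered in either labeling,
    # the two clusterings are identical (all singletons), so return 1.0.
    return n11 / denominator if denominator else 1.0
```

Because only co-membership of pairs is compared, the result does not depend on how clusters are named: `[0, 0, 1, 1]` and `[1, 1, 0, 0]` agree perfectly.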
@ramhiser
ramhiser / huber.py
Created January 21, 2015 17:39
Robust Estimation of Mean and Standard Deviation in Python via the Huber Estimator
import numpy as np
from statsmodels.robust.scale import huber
# Mean and standard deviation to generate normal random variates
mean, std_dev = 0, 2
sample_size = 25
np.random.seed(42)
x = np.random.normal(mean, std_dev, sample_size)
# Appends a couple of outliers