Skip to content

Instantly share code, notes, and snippets.

View mrecos's full-sized avatar

Matt Harris mrecos

View GitHub Profile
@mrecos
mrecos / statebins test.r
Last active August 29, 2015 14:15
Qucik setup to produce statebins choropleth-like map
@mrecos
mrecos / stratifiedCV.r
Last active August 14, 2020 17:50
Stratified K-folds Cross-Validation with Caret
require(caret)
#load some data
data(USArrests)
### Prepare Data (postive observations)
# add a column to be the strata. In this case it is states, it can be sites, or other locations
# the original data has 50 rows, so this adds a state label to 10 consecutive observations
USArrests$state <- c(rep(c("PA","MD","DE","NY","NJ"), each = 5))
# this replaces the existing rownames (states) with a simple numerical index
@mrecos
mrecos / parking_violation.R
Last active August 29, 2015 14:22
Data preparation for Philadelphia Parking Violations data. Part of Philly Transportation Hackathon http://phillyinnovates.com/2015/06/04/planes-trains-and-civic-hacking/
# I started to play with the data and used this to get it a bit cleaned up and prepared for analysis
# I am sure my code is not the most efficient or prettiest, but if there are errors, please let me know
# Perhaps you will find this usefull in saving a few minutes getting the data ready for the fun part; analysis!
# DATA : https://www.opendataphilly.org/dataset/parking-violation
library(data.table)
library(lubridate)
setwd(<<YOUR DIRECTORY>>) # Set WD if desired
data <- fread("Parking_Violations.csv")
dat <- data.frame(data) # convert to a data.frame from data table
@mrecos
mrecos / Temp_graphic.Rd
Last active August 29, 2015 14:25
A ggplot2 implementation of Tufte and Olson's city temperature charts
# Here is my attempt to use ggplot2 to recreate Randy Olson's (@randal_olson), recreation of the famous Tufte New York City Weather chart.
# The original Tufte chart is here:
# http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=00014g
# Randy Olson recreated this chart for a post on 538 (http://fivethirtyeight.com/) linked here:
# http://fivethirtyeight.com/features/what-12-months-of-record-setting-temperatures-looks-like-across-the-u-s/?ex_cid=538twitter
# and Randy posted his data and Python code here:
# https://github.com/mrecos/data/tree/master/us-weather-history
# This is a ggplot implementation of Randy's basic design, but without the top and right side axis and legend
# I think the legend in Randy's plot may be added in Illustrator of something
# The only real hard-code (aside from .csv location) here is the y-axis labels that show the data is from July '14 to June '15,
@mrecos
mrecos / density_plot.r
Created October 13, 2015 02:18
Attempt at model plot
# The original plot in question appears to be the results of a model
# based on scores and age, a distribution is predicted for score on
# each age class. The range of predicted scores is visualized as a
# normal'ish density over each age class. There is also a point which is the
# mean observation of the raw score at each age class. This does not need to
# be at the mean of the distribution depending on fit of the model.
# There is also a line that is the predicted response of model over age classes.
#
# I did not have the data or model from this example, so I improvised.
# I first created sythetic model data based on the example, and from the
@mrecos
mrecos / loader.r
Created October 20, 2015 14:55
R Package Loader
load.fun <- function(x) {
x <- as.character(x)
if(isTRUE(x %in% .packages(all.available=TRUE))) {
eval(parse(text=paste("require(", x, ")", sep="")))
print(paste(c(x, " : already installed; requiring"), collapse=''))
} else {
if(x == "twitteR"){
print(paste(c(x, " : not installed; installing"), collapse=''))
install_github("twitteR", username="geoffjentry")
print(paste(c(x, " : installed and requiring"), collapse=''))
@mrecos
mrecos / Multi point drive times and compare.r
Last active November 9, 2015 15:41
Function and code for comparing drive times from multiple origins to multiple destinations; with charting
# Driving distance from multuple starts to multilpe destinations
# modified from the code of gmapsdistance package by Rodrigo Azuero Melo
# https://cran.r-project.org/web/packages/gmapsdistance
gdist2 <- function (origin, destination, mode, key)
{
typeKEY = class(key)
if (typeKEY != "character") {
stop("Key should be string")
}
@mrecos
mrecos / DF string split and concatenate.r
Created February 8, 2016 21:53
Two R functions; 1) recode values in a vector based on a look up table including replacement for NA and no matching codes; 2) take a vector of concatenated codes, split based on a character, recode, and concatenate back together.
lut_match <- function(LUT, code_field, desc_col = 2, is_NA = "NA", no_code = "N/A"){
require(stringr)
require(splitstackshape)
# this loops through all unique codes in code_field. Potentially slow
code_field[which(code_field == "")] <- NA
unique_code <- unique(code_field)
coded <- code_field
for(i in seq_along(unique_code)){
if(is.na(unique_code[i])){
coded[is.na(coded)] <- is_NA # set this to whatever works
@mrecos
mrecos / Test_vs_Train_error.R
Created February 17, 2016 03:02
Code from my blog post on model overfitting. Code generates simulated data, has an inner loop that fits that data given a range of polynomials in lm(), and nests that in a loop that does it over a bunch of simulated data sets. There are two plots, 1) an animated gif of the fit line for each polynomial on the train and test set and 2) a overall c…
# function to generate samples of data
get_data <- function(n, exp_i, intercept, beta1, sigma, plot = TRUE){
library(lattice)
x <- sample(seq(1,100,0.1), n, replace = TRUE)
gen_model <- intercept + beta1 * x^(exp_i)
y <- rnorm(n, gen_model, sigma)
dat <- data.frame(x, y)
if(isTRUE(plot)){
print(xyplot(y~x, type=c("smooth", "p"),col.line = "darkorange", lwd = 2))
}
@mrecos
mrecos / Game of Zombies.R
Last active April 12, 2020 02:23
John Ramey's (@ramhiser) implementation for Conway's Game of Life, but updated for gganimate package and adding the undead cells
library('foreach')
library('ggplot2')
library('gganimate')
# library('animation') # require for original code
library('reshape2')
library('doParallel') # for multicore support
## Guts and working concept from post below.
## http://johnramey.net/blog/2011/06/05/conways-game-of-life-in-r-with-ggplot2-and-animation/#comments
## I updated ggplot, melt, varnames, gganimate, and faded cells