Matt Harris mrecos

mrecos / sf point_in_poly.r
Last active December 4, 2019 15:35
A reproducible example that gets data, aggregates points into polygons over a list with purrr::map, and then animates the result with ggplot
library(corrplot)
library(viridis)
library(stargazer)
library(tidyverse)
library(dplyr)
library(sf)
library(tigris)
library(ggplot2)
library(rgdal)
library(maptools)
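The point-in-polygon aggregation step can be sketched with sf's spatial predicates. This is a minimal sketch on hypothetical toy geometries, not the gist's own (truncated) code; it assumes sf and purrr are installed and that every point layer shares the polygons' CRS.

```r
library(sf)
library(purrr)

# Hypothetical toy polygons: two unit squares side by side
sq1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
polys <- st_sf(id = c("a", "b"), geometry = st_sfc(sq1, sq2))

# Hypothetical toy points: three inside square "a", two inside square "b"
pts <- st_sf(geometry = st_sfc(
  st_point(c(0.2, 0.3)), st_point(c(0.5, 0.5)), st_point(c(0.8, 0.1)),
  st_point(c(1.5, 0.5)), st_point(c(1.7, 0.9))
))

# st_intersects() lists, for each polygon, the indices of the points it contains;
# lengths() turns that sparse list into a per-polygon count
polys$n_points <- lengths(st_intersects(polys, pts))

# The same count over a list of point layers (e.g. one per animation frame)
pts_list <- list(t1 = pts, t2 = pts[1:2, ])
counts <- map(pts_list, ~ lengths(st_intersects(polys, .x)))
```

Each element of `counts` can then be bound back onto `polys` and stacked into one long data frame, with the list name as the frame variable for an animation.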
mrecos / purrr_example_iris.r
Created December 3, 2019 23:52
Quick example of a tidyr::nest analysis
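library(tidyverse)
g <- glimpse # short alias for dplyr::glimpse()
g(iris)
# one row per Species; all other columns are nested into a list-column named `data`
dat <- iris %>%
  nest(data = c(-Species))
dat$data[[1]] # the nested tibble for the first species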
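The usual next step after nest() is to map a function over the list-column. A minimal sketch, assuming tidyr >= 1.0 and purrr; the model formula Petal.Length ~ Sepal.Length is an arbitrary illustration, not something taken from the gist.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per species; `data` holds each species' measurements
dat <- iris %>%
  nest(data = -Species)

# Fit one linear model per nested tibble and extract a per-species slope
dat <- dat %>%
  mutate(
    model = map(data, ~ lm(Petal.Length ~ Sepal.Length, data = .x)),
    slope = map_dbl(model, ~ coef(.x)[["Sepal.Length"]]),
    n     = map_int(data, nrow)
  )

select(dat, Species, n, slope)
```

Keeping the data, the fitted models and the extracted summaries in one data frame means a per-group analysis stays a single pipeline rather than a loop over split() output.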
mrecos / multiclass_confusion_matrix.R
Last active March 6, 2019 05:03
Reproducible example of the design and approach for making a multiclass confusion matrix in ggplot
### Example of ggplot code for multiclass confusion matrix with caret::confusionMatrix and ggplot
### `Example_plot1` is the result of applying `caret::confusionMatrix()` to the outcome ...
### of a model that included a reference class and a predicted class; both as factors
### calling `as.data.frame(Example_plot1$table)` casts the predicted class frequency table from the ...
### `caret::confusionMatrix()` object into a nice long format table of columns `Reference`, `Prediction`, and `Freq`.
### Do this for a bunch of models, and then use `cowplot::plot_grid()` to arrange them.
library(tidyverse)
library(cowplot)
library(caret)
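The long `Reference` / `Prediction` / `Freq` table described in the comments above can also be produced with base R's table() and then drawn as tiles. A minimal sketch with made-up class labels, assuming ggplot2 is installed; per the comments above, casting caret::confusionMatrix()$table with as.data.frame() should yield the same three columns.

```r
library(ggplot2)

# Hypothetical reference and predicted classes, both factors with the same levels
lev <- c("a", "b", "c")
reference  <- factor(c("a", "a", "b", "b", "c", "c", "c"), levels = lev)
prediction <- factor(c("a", "b", "b", "b", "c", "a", "c"), levels = lev)

# Cross-tabulate, then cast to long format: one row per (Prediction, Reference) cell
cm_long <- as.data.frame(table(Prediction = prediction, Reference = reference))

# Tile plot: fill and label each cell with its count
p <- ggplot(cm_long, aes(x = Reference, y = Prediction, fill = Freq)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = Freq)) +
  scale_fill_gradient(low = "white", high = "steelblue") +
  coord_equal()
```

Building one such plot per model and passing the list to cowplot::plot_grid() gives the side-by-side comparison mentioned above.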
mrecos / precip_deviation_by_year.R
Created December 17, 2018 00:58
Code for downloading and plotting the deviation in average precipitation for a given weather station, using R and ggplot.
library("rnoaa")
library("tidyverse")
library("lubridate")
library("ggrepel")
token <- "GET YOUR API KEY at: http://www.ncdc.noaa.gov/cdo-web/token"
# list up to 800 city locations, sorted by name in descending order
locs <- ncdc_locs(locationcategoryid = "CITY", sortfield = "name", sortorder = "desc",
                  token = token, limit = 800)
loc_data <- locs$data
# keep only locations whose name contains ", PA" (Pennsylvania)
dplyr::filter(loc_data, grepl(", PA", name))
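One way to compute a yearly deviation is to total each year's precipitation and subtract the mean of those totals. The sketch below does that in base R on simulated daily values; the simulated data, the column names date and prcp, and the totals-minus-mean definition are all assumptions rather than the gist's method, and the rnoaa download step is skipped.

```r
# Hypothetical daily precipitation (mm) for one station over three years
set.seed(1)
days <- seq(as.Date("2015-01-01"), as.Date("2017-12-31"), by = "day")
precip <- data.frame(date = days,
                     prcp = rgamma(length(days), shape = 0.5, scale = 5))

# Total precipitation per calendar year
precip$year <- as.integer(format(precip$date, "%Y"))
annual <- aggregate(prcp ~ year, data = precip, FUN = sum)

# Deviation of each year's total from the multi-year mean
annual$deviation <- annual$prcp - mean(annual$prcp)
annual
```

Plotting `deviation` against `year` as bars above and below zero is one natural ggplot encoding for this kind of summary.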
mrecos / Beach not Beach NN Loop.r
Last active January 11, 2018 18:09
R code for building a neural network (NN) and looping over hidden-layer node density to produce animated GIF output. NN code attributed to David Selby: http://selbydavid.com/2018/01/09/neural-network/
########################################################################
### Bespoke Neural Network R code attributed to: David Selby
### From blog post: http://selbydavid.com/2018/01/09/neural-network/
### Adapted here for making animated GIF of node density
### output gifs compiled at gifmaker.me for final output
### output tweeted here:
### https://twitter.com/Md_Harris/status/951257342418608128
########################################################################
two_spirals <- function(N = 200,
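Looping over hidden-layer size can be sketched with a tiny base-R network. This is not David Selby's code (see the linked post for that); it is an independent, minimal one-hidden-layer network with sigmoid units trained by full-batch gradient descent on cross-entropy. The function name train_nn, the learning rate, and the "quadrant" toy data standing in for the two-spirals data are all hypothetical.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical helper: one hidden layer, full-batch gradient descent on
# binary cross-entropy; returns the loss history and a predict() closure
train_nn <- function(X, y, hidden, lr = 0.5, epochs = 2000) {
  p  <- ncol(X)
  W1 <- matrix(rnorm(p * hidden, sd = 0.5), p, hidden)
  b1 <- rep(0, hidden)
  W2 <- matrix(rnorm(hidden, sd = 0.5), hidden, 1)
  b2 <- 0
  loss <- numeric(epochs)
  for (e in seq_len(epochs)) {
    # forward pass
    H    <- sigmoid(sweep(X %*% W1, 2, b1, "+"))
    yhat <- sigmoid(H %*% W2 + b2)
    loss[e] <- -mean(y * log(yhat) + (1 - y) * log(1 - yhat))
    # backward pass: sigmoid output + cross-entropy gives (yhat - y)
    d2 <- (yhat - y) / nrow(X)
    dH <- (d2 %*% t(W2)) * H * (1 - H)
    W2 <- W2 - lr * t(H) %*% d2
    b2 <- b2 - lr * sum(d2)
    W1 <- W1 - lr * t(X) %*% dH
    b1 <- b1 - lr * colSums(dH)
  }
  list(loss = loss, predict = function(Xnew) {
    sigmoid(sigmoid(sweep(Xnew %*% W1, 2, b1, "+")) %*% W2 + b2)
  })
}

# Hypothetical toy data: class 1 when x1 and x2 share a sign
set.seed(42)
X <- matrix(runif(400, -1, 1), ncol = 2)
y <- as.numeric(X[, 1] * X[, 2] > 0)

# One fit per hidden-layer size, e.g. one GIF frame each
sizes <- c(1, 2, 4, 8)
fits <- lapply(sizes, function(h) train_nn(X, y, hidden = h))
final_loss <- sapply(fits, function(f) tail(f$loss, 1))
names(final_loss) <- sizes
```

Evaluating each fit's predict() over a regular grid of (x1, x2) values yields the decision surfaces that would become the animation frames.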
mrecos / Purrr Grid Search Parallel.R
Last active December 24, 2017 20:46
A bit of code for conducting parallelized random grid-search of randomForest hyperparameters using purrr::map() and futures (for multicore/multisession). This is a bit of a proof-of-concept as there are plenty of ways to iterate over a grid and do CV. Also, especially with randomForest, this is very memory inefficient. However, the approach may …
### ------- Load Packages ---------- ###
library("purrr")
library("future")
library("dplyr")
library("randomForest")
library("rsample")
library("ggplot2")
library("viridis")
### ------- Helper Functions for map() ---------- ###
# breaks CV splits into train (analysis) and test (assessment) sets
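The random grid search described above can be sketched as: sample rows from a full hyperparameter grid, wrap each cross-validated evaluation in a future, then collect the values. A minimal sketch assuming future, purrr and randomForest are installed; it uses hand-rolled folds in place of rsample, iris in place of the gist's data, an arbitrary grid, and a hypothetical helper named cv_accuracy.

```r
library(future)
library(purrr)

# plan(sequential) evaluates futures in this session;
# switch to plan(multisession) to run them in parallel background R sessions
plan(sequential)

set.seed(2017)
dat <- iris
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

# Random grid: a sample of rows from the full hyperparameter grid
full_grid <- expand.grid(mtry = 1:4, nodesize = c(1, 5, 10), ntree = c(100, 300))
grid <- full_grid[sample(nrow(full_grid), 6), ]

# Hypothetical helper: mean k-fold CV accuracy for one combination
cv_accuracy <- function(mtry, nodesize, ntree) {
  acc <- map_dbl(seq_len(k), function(f) {
    train <- dat[fold_id != f, ]  # analysis set
    test  <- dat[fold_id == f, ]  # assessment set
    fit <- randomForest::randomForest(Species ~ ., data = train, mtry = mtry,
                                      nodesize = nodesize, ntree = ntree)
    mean(predict(fit, test) == test$Species)
  })
  mean(acc)
}

# One future per grid row; seed = TRUE gives each future a parallel-safe RNG stream
futures <- pmap(grid, function(mtry, nodesize, ntree) {
  future(cv_accuracy(mtry, nodesize, ntree), seed = TRUE)
})
grid$cv_accuracy <- map_dbl(futures, value)
grid[order(-grid$cv_accuracy), ]
```

Because each future returns only a single accuracy value, the fitted forests are discarded inside the worker, which avoids keeping every model in memory at once.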
mrecos / ggraph_transformed_space.R
Created April 12, 2017 20:33
A bit of code for simulating site areas and environmental gradients to be plotted as two networks in both geographic and feature space. Supporting a graphic in this tweet: https://twitter.com/Md_Harris/status/851983574249209856
library("ggraph")
library("igraph")
library("ggplot2")
# create some simulated sites that contain various numbers of measurements on a regular grid
site_dat <- data.frame(size = rep(c("A","B","C","D"), times = c(3,4,8,5)),
x = c(25,26,26,
40,41,41,40,
15,16,17,17,16,15,15,16,
40,41,42,42,41),
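The idea of two networks, one in geographic space and one in feature space, amounts to connecting the same sites by proximity measured in two different ways. A base-R sketch, assuming a k-nearest-neighbour rule (k = 2) for edges, hypothetical site coordinates, and a single simulated "elev" gradient; none of these choices come from the gist, and knn_edges is a hypothetical helper.

```r
# Hypothetical sites: grid coordinates plus one environmental gradient value
set.seed(7)
sites <- data.frame(
  id   = paste0("s", 1:8),
  x    = c(1, 2, 2, 8, 9, 9, 5, 5),
  y    = c(1, 1, 2, 8, 8, 9, 1, 9),
  elev = runif(8, 0, 100)
)

# Hypothetical helper: k-nearest-neighbour edge list from a distance object
knn_edges <- function(d, ids, k = 2) {
  d <- as.matrix(d)
  diag(d) <- Inf  # never link a site to itself
  do.call(rbind, lapply(seq_len(nrow(d)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    data.frame(from = ids[i], to = ids[nn])
  }))
}

# Same sites, two spaces: geographic (x, y) vs. feature (scaled elevation)
geo_edges  <- knn_edges(dist(sites[, c("x", "y")]), sites$id)
feat_edges <- knn_edges(dist(scale(sites$elev)), sites$id)
```

Either edge list can be converted with igraph::graph_from_data_frame(geo_edges, vertices = sites) and drawn with ggraph; sites that are neighbours in one space need not be neighbours in the other.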
mrecos / Logistic_Kernel_Ridge_Regression.stan
Last active January 31, 2017 02:55
DRAFT! Logistic Kernel Ridge Regression Stan model. Parameters: alpha_hat = fitted coefficients, yhat2 = estimated train response; Arguments: N = number of training samples (bags), P = dimensions of Gram matrix or kernel (usually same as N), K = Gram or kernel matrix, y = response of training data (0,1), lambda = regularization coefficient in KR…
data {
int<lower=0> N;
int<lower=0> P;
matrix[P, P] K;
vector[N] y;
//real lambda;
}
transformed data{
// this block results verified with KRR_logit() analytical solution
vector[N] q;
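The Stan model above is truncated. For readers without Stan, a penalized kernel logistic model of the same general kind can be fitted by direct optimization in R. This is a minimal sketch under stated assumptions: the standard kernel logistic regression objective with penalty (lambda / 2) alpha' K alpha, an RBF kernel of the form exp(-(x - x')^2 / (2 sigma^2)), and hypothetical toy data; neither the objective nor the kernel form is confirmed to match the gist's model or its KRR_logit() helper.

```r
# Gaussian RBF kernel (the exp(-d^2 / (2 sigma^2)) form is an assumption)
rbf_kernel <- function(a, b, sigma) {
  exp(-outer(a, b, "-")^2 / (2 * sigma^2))
}

# Hypothetical 1-D training data with a 0/1 response
x <- c(1, 2, 3, 4, 7, 8, 9, 10)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)
K <- rbf_kernel(x, x, sigma = 2)
lambda <- 0.1

# Penalized negative log-likelihood with latent f = K %*% alpha
nll <- function(alpha) {
  f <- drop(K %*% alpha)
  sum(log1p(exp(f)) - y * f) + lambda / 2 * drop(t(alpha) %*% K %*% alpha)
}

# Analytic gradient: K (p - y) + lambda K alpha, using the symmetry of K
grad <- function(alpha) {
  p <- plogis(drop(K %*% alpha))
  drop(K %*% (p - y + lambda * alpha))
}

fit <- optim(rep(0, length(y)), nll, grad, method = "BFGS")
alpha_hat <- fit$par                   # fitted coefficients
yhat <- plogis(drop(K %*% alpha_hat))  # fitted training probabilities
```

Supplying the analytic gradient lets BFGS avoid finite differences; the same alpha_hat is what a prediction function would combine with a test-by-train kernel.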
mrecos / Kernel_Ridge_Regression.R
Created January 24, 2017 23:59
Analytical solution to Kernel Ridge Regression. Process: 1) simulate N data points, 2) define an N x N kernel as desired (RBF here), 3) perform KRR by regularizing the kernel by lambda and solving (K + lambda * I) %*% \alpha = y for \alpha, 4) estimate the response as y_hat = K %*% \alpha
### Simulate some one-dimensional data
# Constants
a <- 50
b <- 50
c <- 80
N <- 10 # a small N keeps the N x N kernel matrix easy to visualize
# Limits
x_upper <- 100
x_lower <- 0.01
spacing <- (x_upper - x_lower) / (N - 1)
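Steps 2 through 4 of the process above can be sketched in a few lines of base R. The x grid reuses x_lower, x_upper and N from the code above; the response curve, sigma and lambda are hypothetical choices, and the RBF form exp(-d^2 / (2 sigma^2)) is an assumption.

```r
# Simulated 1-D data on the same grid as above; the response curve is hypothetical
set.seed(1)
N <- 10
x <- seq(0.01, 100, length.out = N)
y <- sin(x / 15) + rnorm(N, sd = 0.1)

sigma  <- 15
lambda <- 0.1

# Step 2: N x N Gaussian RBF kernel
K <- exp(-outer(x, x, "-")^2 / (2 * sigma^2))

# Step 3: regularize the kernel and solve (K + lambda I) alpha = y
alpha <- solve(K + lambda * diag(N), y)

# Step 4: fitted response
y_hat <- drop(K %*% alpha)
```

Using solve(A, y) rather than solve(A) %*% y avoids forming the explicit inverse. The residuals also satisfy y - y_hat = lambda * alpha, which is a handy check on the solution.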
mrecos / Logistic_Kernel_Ridge_Regression_Prediction.R
Created January 24, 2017 23:00
Function for predicting Logistic Kernel Ridge Regression. Arguments: test_data = testing data set, train_data = data used to train model, alphas_pred = alpha parameters resulting from KRR_logit_optim() or KRR_logit(), sigma = sigma parameter of Gaussian RBF kernel, dist_method = distance method from `proxy` package, progress = T/F progress bar
KRR_logit_predict <- function(test_data, train_data, alphas_pred, sigma, dist_method = "Euclidean", progress = TRUE){
# example: KRR_logit_predict(test_dat, train_dat, theSol, sigma)
pred_yhat <- matrix(nrow = length(test_data), ncol = length(train_data))
if(isTRUE(progress)){
total_iter <- length(test_data) * length(train_data)
pb <- txtProgressBar(min = 0, max = total_iter, style = 3)
}
iter <- 0
for(j in seq_along(test_data)){
for(i in seq_along(train_data)){
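The double loop above appears to fill the test-by-train matrix one entry at a time. A vectorized sketch of the same idea builds the whole test-by-train RBF kernel with outer() and applies the logistic link. It assumes 1-D numeric inputs, Euclidean distance, and the exp(-d^2 / (2 sigma^2)) kernel form; the names rbf_cross and krr_logit_predict_vec are hypothetical, and this is not a drop-in replacement for KRR_logit_predict(), whose full body is not shown.

```r
# RBF kernel between two sets of 1-D points (kernel form is an assumption)
rbf_cross <- function(a, b, sigma) exp(-outer(a, b, "-")^2 / (2 * sigma^2))

# Hypothetical vectorized predictor: one kernel matrix, one matrix-vector product
krr_logit_predict_vec <- function(test_data, train_data, alphas_pred, sigma) {
  K_test <- rbf_cross(test_data, train_data, sigma)  # length(test) x length(train)
  drop(plogis(K_test %*% alphas_pred))               # predicted P(y = 1)
}

# Usage with made-up inputs
p <- krr_logit_predict_vec(test_data = c(1.5, 2.5), train_data = c(1, 2, 3),
                           alphas_pred = c(-1, 0, 1), sigma = 1)
```

Replacing the nested loop with outer() removes the need for a progress bar on small and medium data, at the cost of holding the full test-by-train kernel in memory.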