@brshallo
brshallo / piecewise-spearman-corelation.md
Last active Jun 7, 2021
Mockup piecewise weighted Spearman Correlation based on derivatives of GAM model... per conversations with Shaina and Ricky
View piecewise-spearman-corelation.md
library(tidyverse)
library(mgcv)
library(gratia)

n_obs <- 10000

set.seed(12345)
new_dat <- tibble(x = rnorm(n_obs, sd = 2*pi),
                  y = sin(x) + rnorm(n_obs, sd = 0.5))
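A quick check of why a piecewise measure might be wanted for data like this, offered as a sketch and not as the method this gist goes on to build: because `sin(x)` rises and falls repeatedly, a single global Spearman correlation is close to zero, while the correlation inside one monotone stretch is clearly positive. The window `[-pi/2, pi/2]` and the names `rho_global`, `in_window`, and `rho_window` are illustrative assumptions, and base R vectors stand in for the tibble.

```r
# Simulate the same kind of data with base R vectors
n_obs <- 10000
set.seed(12345)
x <- rnorm(n_obs, sd = 2 * pi)
y <- sin(x) + rnorm(n_obs, sd = 0.5)

# Global rank correlation: the rising and falling stretches cancel out
rho_global <- cor(x, y, method = "spearman")

# Rank correlation restricted to one stretch where sin(x) is increasing
# (window chosen by hand here, purely for illustration)
in_window <- abs(x) <= pi / 2
rho_window <- cor(x[in_window], y[in_window], method = "spearman")

round(c(global = rho_global, window = rho_window), 3)
```

Choosing the windows from the sign of an estimated derivative, as the gist description suggests, would replace the hand-picked window above.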
@brshallo
brshallo / continuous-engagements.md
Created Jun 3, 2021
Identify longest continuous engagement by customer
View continuous-engagements.md

library(tidyverse)
library(lubridate)

T <- tibble(
  customer = c("c1", "c2", "c1", "c2", "c2", "c2"),
  start_date = ymd(c(
    20210107, 20210109, 20210201, 20210225, 20210314, 20210401
View sumrows-concat-cols.md
library(tidyverse)

data <- tibble(
  a = c(0, 0, 1),
  b = c(0, 1, 0),
  c = c(1, 1, 1)
)

data_sums <- data %>% 
@brshallo
brshallo / bootstrap-mean-rsample.R
Created Apr 27, 2021
Initial approach to bootstrapping a mean value with rsample; it is much slower than just using base::sample(), so I used that instead...
View bootstrap-mean-rsample.R
library(tidymodels)
### MUCH faster computationally to use base R `sample()` for this step... so did not use this approach
resamples <- rsample::bootstraps(preds, 5000)
avg_diff_sample <- function(split){
  analysis(split) %>%
    summarise(diff = mean(diff_abs_resids)) %>%
    pull(diff)
}
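For comparison, the base R route that the description says was faster might look roughly like the sketch below. The `preds` data is not shown in this preview, so `diff_abs_resids` here is a simulated stand-in, and `boot_means` is a hypothetical name.

```r
# Simulated stand-in for the `diff_abs_resids` column (assumption: the real
# `preds` data frame is not available in this preview)
set.seed(123)
diff_abs_resids <- rnorm(200, mean = 0.3, sd = 1)

# Resample with replacement and take the mean, 5000 times
boot_means <- replicate(
  5000,
  mean(sample(diff_abs_resids, replace = TRUE))
)

# Percentile interval for the mean
quantile(boot_means, probs = c(0.025, 0.975))
```

This works on a plain numeric vector and skips building 5000 split objects, which is one plausible reason for the speed difference.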
View ts-prep-ml-exampl.md
library(tidyverse)
library(lubridate)

date <- ymd(20200101) + months(1:7)
company <- c("a", "b")

sim_rw <- function(start = 0, n = 7, mean = 1){
  arima.sim(model = list(order = c(0, 1, 0)), n = n - 1, mean = mean) %>% 
    as.numeric() %>% 
@brshallo
brshallo / undo-yeo-johnson.md
Created Apr 13, 2021
Useful for undoing the transformation applied to predictions. See tidymodels/recipes#264 (https://github.com/tidymodels/recipes/issues/264) for discussion, though it was closed without a solution.
View undo-yeo-johnson.md
library(tidymodels)

rec_prep <- recipe(cty ~ ., data = mpg) %>% 
  step_YeoJohnson(cty) %>% 
  prep(data = mpg)

yj_estimate <- rec_prep %>% 
  tidy(1) %>% 
  pluck("value", 1)
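Once a lambda estimate such as `yj_estimate` is in hand, the transformation can be inverted with the standard closed-form inverse of the Yeo-Johnson formula. The sketch below is base R only. `yj_forward()` and `yj_inverse()` are hypothetical helper names written for this illustration, not functions from recipes, and the rest of the gist (not shown in this preview) may do this differently.

```r
# Forward Yeo-Johnson transformation (included only to check the round trip)
yj_forward <- function(x, lambda, eps = 1e-8) {
  out <- numeric(length(x))
  pos <- x >= 0
  if (abs(lambda) < eps) {
    out[pos] <- log1p(x[pos])
  } else {
    out[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  }
  if (abs(lambda - 2) < eps) {
    out[!pos] <- -log1p(-x[!pos])
  } else {
    out[!pos] <- -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  }
  out
}

# Inverse: the transform preserves sign, so branch on the sign of y
yj_inverse <- function(y, lambda, eps = 1e-8) {
  out <- numeric(length(y))
  pos <- y >= 0
  if (abs(lambda) < eps) {
    out[pos] <- expm1(y[pos])
  } else {
    out[pos] <- (lambda * y[pos] + 1)^(1 / lambda) - 1
  }
  if (abs(lambda - 2) < eps) {
    out[!pos] <- 1 - exp(-y[!pos])
  } else {
    out[!pos] <- 1 - (1 - (2 - lambda) * y[!pos])^(1 / (2 - lambda))
  }
  out
}

# Round trip on a few values with an arbitrary lambda
x <- c(-3, -0.5, 0, 0.5, 4)
all.equal(yj_inverse(yj_forward(x, lambda = 0.5), lambda = 0.5), x)
```

Predictions that fall outside the range the forward transform can produce for a given lambda will come back as `NaN`, so they may need separate handling.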
@brshallo
brshallo / predict-interval-boot-only.R
Last active Jul 26, 2021
Prep interval and then produce prediction interval on a new data set. See thread: https://community.rstudio.com/t/prediction-intervals-with-tidymodels-best-practices/82594/15 also see prior set-up: https://gist.github.com/brshallo/3db2cd25172899f91b196a90d5980690 . The approach at this gist is similar but uses the bootstrapped residuals to produ…
View predict-interval-boot-only.R
library(tidyverse)
library(tidymodels)
# Control function used as part of `prep_interval()`
ctrl_fit_recipe <- function(x){
  output <- list(fit = workflows::pull_workflow_fit(x),
                 recipe = workflows::pull_workflow_prepped_recipe(x))
  c(output, list(resids =
                   bind_cols(
@brshallo
brshallo / source_rmd.R
Last active Apr 1, 2021 — forked from noamross/source_rmd.R
Source an RMD file
View source_rmd.R
#' Source the R code from a knitr file, optionally skipping plots
#'
#' @param file the knitr file to source
#' @param skip_plots whether to skip making plots. If TRUE (default) sets a null graphics device
#'
#' @return This function is called for its side effects
#' @export
source_rmd = function(file, skip_plots = TRUE) {
  temp = tempfile(fileext=".R")
  knitr::purl(file, output=temp)
@brshallo
brshallo / source-rmd-chunks.r
Last active May 19, 2021
Function for sourcing individual or multiple chunks from an RMD document
View source-rmd-chunks.r
library(magrittr)
library(stringr)
library(readr)
library(purrr)
library(glue)
library(knitr)
source_rmd_chunks <- function(file, chunk_labels, skip_plots = TRUE, output_temp = FALSE){
  temp <- tempfile(fileext=".R")
View slice_sample-vs-sample_n_of.md
library(tidyverse)

sample_n_of <- function(data, size, ...) {
  dots <- quos(...)
  
  group_ids <- data %>% 
    group_by(!!! dots) %>% 
    group_indices()