Instantly share code, notes, and snippets.

🏠
Working from home
• Sort options
Last active Mar 1, 2020
Showing that the number of deaths by coronavirus is somewhat robust to CFR restricted above 1% (South Korea lower bound at 0.4%).
View coronavirus.R
 set.seed(1) N <- 1e7 # sims quantiles_of_interest <- function(x) { quantile(x, c(0.0001, 0.25, 0.5, 0.75, 0.9999)) } death_rate <- function() { pmin(pmax(rlnorm(N, log(0.02), 0.35), 0.001), 0.06) } quantiles_of_interest(death_rate()) # very robust / ignorant belief of range of case 2.4k to 60M cases. susceptible_cases <- function() { runif(N, 0.0024, 150) }
Created Mar 1, 2020
Coronavirus estimates
View coronavirus.R
 set.seed(0) N <- 1e7 # sims quantiles_of_interest <- function(x) { quantile(x, c(0.0001, 0.25, 0.5, 0.75, 0.9999)) } death_rate <- function() { pmin(pmax(rlnorm(N, log(0.02), 0.35), 0.001), 0.06) } # Statwonk's belief of final US death rate quantiles_of_interest(death_rate()) # 0.01% 25% 50% 75% 99.99%
Last active Feb 22, 2020
The essence of bootstrapping: resampling with replacement to estimate a sampling distribution and its statistics of interest.
View bootstrapping.R
 library(tidyverse) library(gganimate) # sample of 100 data points tibble(x = rexp(1e2)) -> d # 20 bootstrap samples, you likely want several orders of magnitude more. tibble(boot_samples = seq_len(2e1)) %>% mutate(data = map(boot_samples, ~ d %>% sample_frac(1, replace = TRUE))) %>% unnest(cols = c(data)) %>%
Created Dec 30, 2019
Measuring retweets, for fun!
View retweets.R
 library(fitdistrplus) library(countreg) library(tidyverse) library(rtweet) get_timeline("statwonk", n = 1e4) %>% filter(!is_retweet) %>% pull(retweet_count) -> my_retweets # I know a priori it's negbin,
Created Sep 7, 2019
The relationship between intra-arrival times and count models.
View durations_to_counts.R
 library(tidyverse) # https://www.kellogg.northwestern.edu/faculty/weber/decs-430/Notes%20on%20the%20Poisson%20and%20exponential%20distributions.pdf # http://www.mscs.mu.edu/~jsta/issues/11(3)/JSTA11(3)p2.pdf # Case 1: memoryless intra-arrival times. They're not correlated. tibble::tibble( durations_between = rexp(3e4, rate = 20) ) %>% mutate(time = cumsum(durations_between)) %>% mutate(time_interval = cut(time, breaks = seq(0, 1e4, 1), labels = FALSE)) %>%
Last active Sep 4, 2019
A simulation of a minimal multilevel model w/ two levels: inning level, opposing pitcher level.
View simple_multilevel.R
 library(tidyverse) library(brms) library(tidybayes) library(ggthemes) # Simulation settings innings <- 9 games <- 162 number_of_opposing_pitchers <- 20 # purposly set low to illustrate variation opposing_pitchers <- rnorm(number_of_opposing_pitchers, sd = 0.5) # a set of opposing pitcher effects
Last active Sep 28, 2019
A gist that shows how a rootogram helps find that the zero-inflated negative binomial was the data generating mechanism.
View rootograms.R
 # install.packages("countreg", repos="http://R-Forge.R-project.org") # https://www.fromthebottomoftheheap.net/2016/06/07/rootograms/ # https://channel9.msdn.com/Events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference/countreg-Tools-for-count-data-regression library(countreg) rzinbinom(3e3, size = 4, mu = 20, pi = 0.05) -> x table(x) hist(x, col = "orange") rootogram(glm(x ~ 1, family = "poisson")) # zeros under fit
Last active Aug 11, 2019