Instantly share code, notes, and snippets.

🏠
Working from home

🏠
Working from home
• Sort options
Created Dec 30, 2019
Measuring retweets, for fun!
View retweets.R
 library(fitdistrplus) library(countreg) library(tidyverse) library(rtweet) get_timeline("statwonk", n = 1e4) %>% filter(!is_retweet) %>% pull(retweet_count) -> my_retweets # I know a priori it's negbin,
Created Sep 7, 2019
The relationship between intra-arrival times and count models.
View durations_to_counts.R
 library(tidyverse) # https://www.kellogg.northwestern.edu/faculty/weber/decs-430/Notes%20on%20the%20Poisson%20and%20exponential%20distributions.pdf # http://www.mscs.mu.edu/~jsta/issues/11(3)/JSTA11(3)p2.pdf # Case 1: memoryless intra-arrival times. They're not correlated. tibble::tibble( durations_between = rexp(3e4, rate = 20) ) %>% mutate(time = cumsum(durations_between)) %>% mutate(time_interval = cut(time, breaks = seq(0, 1e4, 1), labels = FALSE)) %>%
Last active Sep 4, 2019
A simulation of a minimal multilevel model w/ two levels: inning level, opposing pitcher level.
View simple_multilevel.R
 library(tidyverse) library(brms) library(tidybayes) library(ggthemes) # Simulation settings innings <- 9 games <- 162 number_of_opposing_pitchers <- 20 # purposly set low to illustrate variation opposing_pitchers <- rnorm(number_of_opposing_pitchers, sd = 0.5) # a set of opposing pitcher effects
Last active Sep 28, 2019
A gist that shows how a rootogram helps find that the zero-inflated negative binomial was the data generating mechanism.
View rootograms.R
 # install.packages("countreg", repos="http://R-Forge.R-project.org") # https://www.fromthebottomoftheheap.net/2016/06/07/rootograms/ # https://channel9.msdn.com/Events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference/countreg-Tools-for-count-data-regression library(countreg) rzinbinom(3e3, size = 4, mu = 20, pi = 0.05) -> x table(x) hist(x, col = "orange") rootogram(glm(x ~ 1, family = "poisson")) # zeros under fit
Last active Aug 11, 2019
 library(tidyverse) expand.grid( risk = seq(0.001, 0.02, 0.001), units_of_exposure = seq_len(24*7) ) %>% as_tibble() %>% mutate(total_risk = map2_dbl(risk, units_of_exposure, ~ 1 - (1 - .x)^(.y))) %>% ggplot(aes(x = risk, y = units_of_exposure)) + geom_raster(aes(fill = total_risk), alpha = 0.90) + scale_fill_gradient2(name = "Total chance", low = "white", mid = "white", high = "purple4", midpoint = 0.5,
Created Jul 12, 2019
Assuming outages are totally unrelated and occur at a rate of every other month, what's the chance of seeing more than four in a given week? About one in seven million.
View outages.R
 # https://news.ycombinator.com/item?id=20356610 # chance of the cluster size being >5 when we can cluster events within a one-week period # and the rate of events is one every 2 months (debatable, but feels right to me)? # My re-write: # probability of more than four outages in a given week given a rate of every other month # _assuming_: totally unrelated infra (none share AWS, etc.) ppois(4, lambda = 1/2/4.5, lower.tail = FALSE) # ~ 1/7M chance. # Given that we've observed five in one-week,
Created Jul 11, 2019
Time to 10% ending / converting / failure.
View quantile_analysis.R
 library(tidyverse) library(survival) rweibull_cens <- function(n, shape, scale) { # http://statwonk.com/weibull.html a_random_death_time <- rweibull(n, shape = shape, scale = scale) a_random_censor_time <- rweibull(n, shape = shape, scale = scale) observed_time <- pmin(a_random_censor_time, a_random_death_time) censor <- observed_time == a_random_death_time tibble(time = observed_time, censor = censor)
Created Jul 6, 2019
A minimal Bayesian multi-level model. 12 groups.
View minimal_mlm.R
 library(brms) library(tidybayes) N <- 1e3 number_of_groups <- 12 group_means <- rnorm(number_of_groups) tibble(obs = seq_len(N)) %>% mutate(group = sample(seq_len(number_of_groups), n(), replace = TRUE), contribution = group_means[group],
Created Jun 25, 2019
An example of checking the statistical coverage properties of an algorithm.
View prediction_intervals.R
 library(tidyverse) tibble(boot = seq_len(1e2)) %>% mutate(prediction_intervals = map(boot, function(i) { rpois(100, 10) -> x (sum(x[6:100]) -> total_oracle) # Given the first five, what can we say about the sum of 95 more? x[1:5] -> first_five
Created Jun 19, 2019
Part of the Normal family of distributions.
View normals.R
 library(tidyverse) plot.new() par(bg = "black") plot(density(rnorm(1e6)), col = "deeppink", lwd = 1, xlab = "", ylab = "", main = "Normal distribution", axes = FALSE) seq(1, 10, 0.05) %>% map(function(x){ rnorm(1e5, 0, x)}) %>% map(function(x) { density(x) %>% lines(col = colors() %>% Filter(function(.x) grepl("pink", .x), .) %>% sample(1), lwd = sample(seq(0.1, 0.1, 0.01), 1),
You can’t perform that action at this time.