Skip to content

Instantly share code, notes, and snippets.

🏠
Working from home

Christopher Peters statwonk

🏠
Working from home
Block or report user

Report or block statwonk

Hide content and notifications from this user.

Learn more about blocking users

Contact Support about this user’s behavior.

Learn more about reporting abuse

Report abuse
View GitHub Profile
@statwonk
statwonk / rootograms.R
Last active Sep 28, 2019
A gist that shows how a rootogram helps find that the zero-inflated negative binomial was the data generating mechanism.
View rootograms.R
# install.packages("countreg", repos="http://R-Forge.R-project.org")
# https://www.fromthebottomoftheheap.net/2016/06/07/rootograms/
# https://channel9.msdn.com/Events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference/countreg-Tools-for-count-data-regression
library(countreg)
rzinbinom(3e3, size = 4, mu = 20, pi = 0.05) -> x
table(x)
hist(x, col = "orange")
rootogram(glm(x ~ 1, family = "poisson")) # zeros under fit
@statwonk
statwonk / durations_to_counts.R
Created Sep 7, 2019
The relationship between intra-arrival times and count models.
View durations_to_counts.R
library(tidyverse)
# https://www.kellogg.northwestern.edu/faculty/weber/decs-430/Notes%20on%20the%20Poisson%20and%20exponential%20distributions.pdf
# http://www.mscs.mu.edu/~jsta/issues/11(3)/JSTA11(3)p2.pdf
# Case 1: memoryless intra-arrival times. They're not correlated.
tibble::tibble(
durations_between = rexp(3e4, rate = 20)
) %>%
mutate(time = cumsum(durations_between)) %>%
mutate(time_interval = cut(time, breaks = seq(0, 1e4, 1), labels = FALSE)) %>%
@statwonk
statwonk / simple_multilevel.R
Last active Sep 4, 2019
A simulation of a minimal multilevel model w/ two levels: inning level, opposing pitcher level.
View simple_multilevel.R
library(tidyverse)
library(brms)
library(tidybayes)
library(ggthemes)
# Simulation settings
innings <- 9
games <- 162
number_of_opposing_pitchers <- 20 # purposly set low to illustrate variation
opposing_pitchers <- rnorm(number_of_opposing_pitchers, sd = 0.5) # a set of opposing pitcher effects
@statwonk
statwonk / beta-distribution.R
Created Jun 2, 2019
A visualization of the beta distribution.
View beta-distribution.R
library(tidyverse)
plot.new()
par(bg = "black")
plot(density(rbeta(1e6, 50, 50)), col = "pink3", lwd = 1, ylim = c(0, 8),
xlab = "", ylab = "", main = "Beta distribution", axes = FALSE,
col.main = 'white')
map2(c(seq(1, 49, 1), 50),
c(seq(1, 49, 1), 50),
function(x, y){ message(x,y);rbeta(1e6, x + 1, y + 1)}) %>%
map(function(x) { density(x) %>%
@statwonk
statwonk / risk_adds_up.R
Last active Aug 11, 2019
Risk adds up. This code piece answers, "how quickly?" https://twitter.com/statwonk/status/1160542394544267265
View risk_adds_up.R
library(tidyverse)
expand.grid(
risk = seq(0.001, 0.02, 0.001),
units_of_exposure = seq_len(24*7)
) %>% as_tibble() %>%
mutate(total_risk = map2_dbl(risk, units_of_exposure, ~ 1 - (1 - .x)^(.y))) %>%
ggplot(aes(x = risk, y = units_of_exposure)) +
geom_raster(aes(fill = total_risk), alpha = 0.90) +
scale_fill_gradient2(name = "Total chance",
low = "white", mid = "white", high = "purple4", midpoint = 0.5,
@statwonk
statwonk / outages.R
Created Jul 12, 2019
Assuming outages are totally unrelated and occur at a rate of every other month, what's the chance of seeing more than four in a given week? About one in seven million.
View outages.R
# https://news.ycombinator.com/item?id=20356610
# chance of the cluster size being >5 when we can cluster events within a one-week period
# and the rate of events is one every 2 months (debatable, but feels right to me)?
# My re-write:
# probability of more than four outages in a given week given a rate of every other month
# _assuming_: totally unrelated infra (none share AWS, etc.)
ppois(4, lambda = 1/2/4.5, lower.tail = FALSE) # ~ 1/7M chance.
# Given that we've observed five in one-week,
@statwonk
statwonk / quantile_analysis.R
Created Jul 11, 2019
Time to 10% ending / converting / failure.
View quantile_analysis.R
library(tidyverse)
library(survival)
rweibull_cens <- function(n, shape, scale) {
# http://statwonk.com/weibull.html
a_random_death_time <- rweibull(n, shape = shape, scale = scale)
a_random_censor_time <- rweibull(n, shape = shape, scale = scale)
observed_time <- pmin(a_random_censor_time, a_random_death_time)
censor <- observed_time == a_random_death_time
tibble(time = observed_time, censor = censor)
@statwonk
statwonk / minimal_mlm.R
Created Jul 6, 2019
A minimal Bayesian multi-level model. 12 groups.
View minimal_mlm.R
library(brms)
library(tidybayes)
N <- 1e3
number_of_groups <- 12
group_means <- rnorm(number_of_groups)
tibble(obs = seq_len(N)) %>%
mutate(group = sample(seq_len(number_of_groups), n(), replace = TRUE),
contribution = group_means[group],
@statwonk
statwonk / prediction_intervals.R
Created Jun 25, 2019
An example of checking the statistical coverage properties of an algorithm.
View prediction_intervals.R
library(tidyverse)
tibble(boot = seq_len(1e2)) %>%
mutate(prediction_intervals = map(boot, function(i) {
rpois(100, 10) -> x
(sum(x[6:100]) -> total_oracle)
# Given the first five, what can we say about the sum of 95 more?
x[1:5] -> first_five
@statwonk
statwonk / normals.R
Created Jun 19, 2019
Part of the Normal family of distributions.
View normals.R
library(tidyverse)
plot.new()
par(bg = "black")
plot(density(rnorm(1e6)), col = "deeppink", lwd = 1,
xlab = "", ylab = "", main = "Normal distribution", axes = FALSE)
seq(1, 10, 0.05) %>%
map(function(x){ rnorm(1e5, 0, x)}) %>%
map(function(x) { density(x) %>%
lines(col = colors() %>% Filter(function(.x) grepl("pink", .x), .) %>% sample(1),
lwd = sample(seq(0.1, 0.1, 0.01), 1),
You can’t perform that action at this time.