Skip to content

Instantly share code, notes, and snippets.

View statwonk's full-sized avatar
🏠
Working from home

Christopher Peters statwonk

🏠
Working from home
View GitHub Profile
@statwonk
statwonk / inezs_weibulls.R
Created June 1, 2019 22:06
;O{) # o-----\. # /.;/?}
library(tidyverse)
plot.new()
par(bg = "black")
plot(density(rweibull(1e6, 2, 0.1)), col = "pink3", lwd = 1, ylim = c(0, 20),
xlab = "", ylab = "", main = "Weibull distribution", axes = FALSE)
seq(1, 5, 0.25) %>%
map(function(x){ rweibull(1e6, x, 0.1)}) %>%
map(function(x) { density(x) %>%
lines(col = sample(colors(), 1),
lwd = sample(seq(0.1, 3, 0.1), 1),
@statwonk
statwonk / beta-distribution.R
Created June 2, 2019 14:35
A visualization of the beta distribution.
library(tidyverse)
plot.new()
par(bg = "black")
plot(density(rbeta(1e6, 50, 50)), col = "pink3", lwd = 1, ylim = c(0, 8),
xlab = "", ylab = "", main = "Beta distribution", axes = FALSE,
col.main = 'white')
map2(c(seq(1, 49, 1), 50),
c(seq(1, 49, 1), 50),
function(x, y){ message(x,y);rbeta(1e6, x + 1, y + 1)}) %>%
map(function(x) { density(x) %>%
@statwonk
statwonk / normals.R
Created June 19, 2019 04:27
Part of the Normal family of distributions.
library(tidyverse)
plot.new()
par(bg = "black")
plot(density(rnorm(1e6)), col = "deeppink", lwd = 1,
xlab = "", ylab = "", main = "Normal distribution", axes = FALSE)
seq(1, 10, 0.05) %>%
map(function(x){ rnorm(1e5, 0, x)}) %>%
map(function(x) { density(x) %>%
lines(col = colors() %>% Filter(function(.x) grepl("pink", .x), .) %>% sample(1),
lwd = sample(seq(0.1, 0.1, 0.01), 1),
@statwonk
statwonk / prediction_intervals.R
Created June 25, 2019 22:49
An example of checking the statistical coverage properties of an algorithm.
library(tidyverse)
tibble(boot = seq_len(1e2)) %>%
mutate(prediction_intervals = map(boot, function(i) {
rpois(100, 10) -> x
(sum(x[6:100]) -> total_oracle)
# Given the first five, what can we say about the sum of 95 more?
x[1:5] -> first_five
@statwonk
statwonk / minimal_mlm.R
Created July 6, 2019 19:25
A minimal Bayesian multi-level model. 12 groups.
library(brms)
library(tidybayes)
N <- 1e3
number_of_groups <- 12
group_means <- rnorm(number_of_groups)
tibble(obs = seq_len(N)) %>%
mutate(group = sample(seq_len(number_of_groups), n(), replace = TRUE),
contribution = group_means[group],
@statwonk
statwonk / quantile_analysis.R
Created July 11, 2019 15:43
Time to 10% ending / converting / failure.
library(tidyverse)
library(survival)
rweibull_cens <- function(n, shape, scale) {
# http://statwonk.com/weibull.html
a_random_death_time <- rweibull(n, shape = shape, scale = scale)
a_random_censor_time <- rweibull(n, shape = shape, scale = scale)
observed_time <- pmin(a_random_censor_time, a_random_death_time)
censor <- observed_time == a_random_death_time
tibble(time = observed_time, censor = censor)
@statwonk
statwonk / outages.R
Created July 12, 2019 13:52
Assuming outages are totally unrelated and occur at a rate of every other month, what's the chance of seeing more than four in a given week? About one in seven million.
# https://news.ycombinator.com/item?id=20356610
# chance of the cluster size being >5 when we can cluster events within a one-week period
# and the rate of events is one every 2 months (debatable, but feels right to me)?
# My re-write:
# probability of more than four outages in a given week given a rate of every other month
# _assuming_: totally unrelated infra (none share AWS, etc.)
ppois(4, lambda = 1/2/4.5, lower.tail = FALSE) # ~ 1/7M chance.
# Given that we've observed five in one-week,
@statwonk
statwonk / risk_adds_up.R
Last active August 11, 2019 13:27
Risk adds up. This code piece answers, "how quickly?" https://twitter.com/statwonk/status/1160542394544267265
library(tidyverse)
expand.grid(
risk = seq(0.001, 0.02, 0.001),
units_of_exposure = seq_len(24*7)
) %>% as_tibble() %>%
mutate(total_risk = map2_dbl(risk, units_of_exposure, ~ 1 - (1 - .x)^(.y))) %>%
ggplot(aes(x = risk, y = units_of_exposure)) +
geom_raster(aes(fill = total_risk), alpha = 0.90) +
scale_fill_gradient2(name = "Total chance",
low = "white", mid = "white", high = "purple4", midpoint = 0.5,
@statwonk
statwonk / rootograms.R
Last active September 28, 2019 07:11
A gist that shows how a rootogram helps find that the zero-inflated negative binomial was the data generating mechanism.
# install.packages("countreg", repos="http://R-Forge.R-project.org")
# https://www.fromthebottomoftheheap.net/2016/06/07/rootograms/
# https://channel9.msdn.com/Events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference/countreg-Tools-for-count-data-regression
library(countreg)
rzinbinom(3e3, size = 4, mu = 20, pi = 0.05) -> x
table(x)
hist(x, col = "orange")
rootogram(glm(x ~ 1, family = "poisson")) # zeros under fit
@statwonk
statwonk / simple_multilevel.R
Last active September 4, 2019 13:16
A simulation of a minimal multilevel model w/ two levels: inning level, opposing pitcher level.
library(tidyverse)
library(brms)
library(tidybayes)
library(ggthemes)
# Simulation settings
innings <- 9
games <- 162
number_of_opposing_pitchers <- 20 # purposly set low to illustrate variation
opposing_pitchers <- rnorm(number_of_opposing_pitchers, sd = 0.5) # a set of opposing pitcher effects