Skip to content

Instantly share code, notes, and snippets.

🏠
Working from home

Christopher Peters statwonk

🏠
Working from home
Block or report user

Report or block statwonk

Hide content and notifications from this user.

Learn more about blocking users

Contact Support about this user’s behavior.

Learn more about reporting abuse

Report abuse
View GitHub Profile
@statwonk
statwonk / risk_adds_up.R
Last active Aug 11, 2019
Risk adds up. This code piece answers, "how quickly?" https://twitter.com/statwonk/status/1160542394544267265
View risk_adds_up.R
library(tidyverse)
expand.grid(
risk = seq(0.001, 0.02, 0.001),
units_of_exposure = seq_len(24*7)
) %>% as_tibble() %>%
mutate(total_risk = map2_dbl(risk, units_of_exposure, ~ 1 - (1 - .x)^(.y))) %>%
ggplot(aes(x = risk, y = units_of_exposure)) +
geom_raster(aes(fill = total_risk), alpha = 0.90) +
scale_fill_gradient2(name = "Total chance",
low = "white", mid = "white", high = "purple4", midpoint = 0.5,
@statwonk
statwonk / outages.R
Created Jul 12, 2019
Assuming outages are totally unrelated and occur at a rate of every other month, what's the chance of seeing more than four in a given week? About one in seven million.
View outages.R
# https://news.ycombinator.com/item?id=20356610
# chance of the cluster size being >5 when we can cluster events within a one-week period
# and the rate of events is one every 2 months (debatable, but feels right to me)?
# My re-write:
# probability of more than four outages in a given week given a rate of every other month
# _assuming_: totally unrelated infra (none share AWS, etc.)
ppois(4, lambda = 1/2/4.5, lower.tail = FALSE) # ~ 1/7M chance.
# Given that we've observed five in one-week,
@statwonk
statwonk / quantile_analysis.R
Created Jul 11, 2019
Time to 10% ending / converting / failure.
View quantile_analysis.R
library(tidyverse)
library(survival)
rweibull_cens <- function(n, shape, scale) {
# http://statwonk.com/weibull.html
a_random_death_time <- rweibull(n, shape = shape, scale = scale)
a_random_censor_time <- rweibull(n, shape = shape, scale = scale)
observed_time <- pmin(a_random_censor_time, a_random_death_time)
censor <- observed_time == a_random_death_time
tibble(time = observed_time, censor = censor)
@statwonk
statwonk / minimal_mlm.R
Created Jul 6, 2019
A minimal Bayesian multi-level model. 12 groups.
View minimal_mlm.R
library(brms)
library(tidybayes)
N <- 1e3
number_of_groups <- 12
group_means <- rnorm(number_of_groups)
tibble(obs = seq_len(N)) %>%
mutate(group = sample(seq_len(number_of_groups), n(), replace = TRUE),
contribution = group_means[group],
@statwonk
statwonk / prediction_intervals.R
Created Jun 25, 2019
An example of checking the statistical coverage properties of an algorithm.
View prediction_intervals.R
library(tidyverse)
tibble(boot = seq_len(1e2)) %>%
mutate(prediction_intervals = map(boot, function(i) {
rpois(100, 10) -> x
(sum(x[6:100]) -> total_oracle)
# Given the first five, what can we say about the sum of 95 more?
x[1:5] -> first_five
@statwonk
statwonk / normals.R
Created Jun 19, 2019
Part of the Normal family of distributions.
View normals.R
library(tidyverse)
plot.new()
par(bg = "black")
plot(density(rnorm(1e6)), col = "deeppink", lwd = 1,
xlab = "", ylab = "", main = "Normal distribution", axes = FALSE)
seq(1, 10, 0.05) %>%
map(function(x){ rnorm(1e5, 0, x)}) %>%
map(function(x) { density(x) %>%
lines(col = colors() %>% Filter(function(.x) grepl("pink", .x), .) %>% sample(1),
lwd = sample(seq(0.1, 0.1, 0.01), 1),
@statwonk
statwonk / beta-distribution.R
Created Jun 2, 2019
A visualization of the beta distribution.
View beta-distribution.R
library(tidyverse)
plot.new()
par(bg = "black")
plot(density(rbeta(1e6, 50, 50)), col = "pink3", lwd = 1, ylim = c(0, 8),
xlab = "", ylab = "", main = "Beta distribution", axes = FALSE,
col.main = 'white')
map2(c(seq(1, 49, 1), 50),
c(seq(1, 49, 1), 50),
function(x, y){ message(x,y);rbeta(1e6, x + 1, y + 1)}) %>%
map(function(x) { density(x) %>%
@statwonk
statwonk / inezs_weibulls.R
Created Jun 1, 2019
;O{) # o-----\. # /.;/?}
View inezs_weibulls.R
library(tidyverse)
plot.new()
par(bg = "black")
plot(density(rweibull(1e6, 2, 0.1)), col = "pink3", lwd = 1, ylim = c(0, 20),
xlab = "", ylab = "", main = "Weibull distribution", axes = FALSE)
seq(1, 5, 0.25) %>%
map(function(x){ rweibull(1e6, x, 0.1)}) %>%
map(function(x) { density(x) %>%
lines(col = sample(colors(), 1),
lwd = sample(seq(0.1, 3, 0.1), 1),
@statwonk
statwonk / control-vs-interaction.R
Created May 17, 2019
A comparison of std.errors of the treatment effect from an additive DGP experiment where a pre-treatment variable is interacted maximally, only additive no interaction, or unadjusted for.
View control-vs-interaction.R
library(tidyverse)
library(lme4)
library(broom)
N <- 1e4
generate_data <- function() {
tibble(
intercept = 1,
pretreatment_var = case_when(rbinom(N, 1, p = 0.5) == 1 ~ 0.5, TRUE ~ -0.5),
View cdfs.R
library(tidyverse)
# Cumulative density functions
?hist
# built in datasets
str(mtcars)
head(mtcars)
You can’t perform that action at this time.