Skip to content

Instantly share code, notes, and snippets.

Avatar
🏠
Working from home

Christopher Peters statwonk

🏠
Working from home
View GitHub Profile
@statwonk
statwonk / risk_adds_up2.R
Last active Sep 2, 2020
In the context of coronavirus, let's revisit this post about risk accumulation. https://twitter.com/statwonk/status/1160542394544267265?s=20
View risk_adds_up2.R
library(tidyverse)
library(ggthemes)
expand.grid(
risk = seq(0.1/5e3, 1/5e3, 1e-05), # average daily risk e.g. - 1,000 infected per day in Alabama / 5,000,000 AL population
units_of_exposure = seq_len(31) # days of exposure (up to 31 days)
) %>% as_tibble() %>%
mutate(total_risk = map2_dbl(risk, units_of_exposure, ~ 1 - (1 - .x)^(.y)),
total_odds = 1/total_risk,
risk_threshold = case_when(total_odds <= 5e2 ~ "Worse than 1 in 500",
total_odds <= 1e3 ~ "Worse than 1 in 1k chance",
@statwonk
statwonk / generate_multilevel_logistic_data.R
Created Aug 30, 2020
A tool to simulate multilevel logistic data.
View generate_multilevel_logistic_data.R
library(tidyverse)
library(brms)
library(tidybayes)
1e7 -> N # obs
1 -> J # groups of members
10 -> K # members
0.5 -> base_p # base rate, this is logistic regression
# sample member coefficients
@statwonk
statwonk / out_of_core.py
Created Aug 30, 2020
scikit-learn partial_fit out-of-core learning.
View out_of_core.py
import sklearn
from sklearn import naive_bayes
import pandas as pd
import numpy as np
d = pd.read_csv("data.csv")
y = d.iloc[:, 1]
X = d.iloc[:,list(range(2, d.shape[1]))]
@statwonk
statwonk / variational_logistic.R
Created Aug 15, 2020
A simulation to learn about variational inference and compare it to MCMC.
View variational_logistic.R
library(tidyverse)
library(brms)
library(tidybayes)
3e4 -> N
40 -> K
rnorm(K) -> group_coefs
tibble(K = factor(rep(paste0("group_", seq_len(K)), length.out = N))) %>%
mutate(coef = rep(group_coefs, N/40)) %>%
@statwonk
statwonk / massive_logistic.R
Last active Aug 11, 2020
A simulation showing how cases can be discarded in logistic regression while preserving an unbiased estimator. https://twitter.com/statwonk/status/1291712092479860737?s=20
View massive_logistic.R
library(tidyverse)
1e4 -> N
0.03 -> p
# author: twitter.com/statwonk
# showing how cases can be discarded in logistic regression while preserving an unbiased estimator
seq_len(1e3) %>%
map_dbl(function(x) {
rbinom(N, 1, p) -> y
tibble(
all_data = tibble(y = y) %>% glm(y ~ 1, "binomial", .) %>% coef() %>% plogis(),
@statwonk
statwonk / coronavirus.R
Last active Mar 1, 2020
Showing that the number of deaths by coronavirus is somewhat robust to CFR restricted above 1% (South Korea lower bound at 0.4%).
View coronavirus.R
set.seed(1)
N <- 1e7 # sims
quantiles_of_interest <- function(x) { quantile(x, c(0.0001, 0.25, 0.5, 0.75, 0.9999)) }
death_rate <- function() { pmin(pmax(rlnorm(N, log(0.02), 0.35), 0.001), 0.06) }
quantiles_of_interest(death_rate())
# very robust / ignorant belief of range of case 2.4k to 60M cases.
susceptible_cases <- function() { runif(N, 0.0024, 150) }
@statwonk
statwonk / coronavirus.R
Created Mar 1, 2020
Coronavirus estimates
View coronavirus.R
set.seed(0)
N <- 1e7 # sims
quantiles_of_interest <- function(x) { quantile(x, c(0.0001, 0.25, 0.5, 0.75, 0.9999)) }
death_rate <- function() { pmin(pmax(rlnorm(N, log(0.02), 0.35), 0.001), 0.06) }
# Statwonk's belief of final US death rate
quantiles_of_interest(death_rate())
# 0.01% 25% 50% 75% 99.99%
@statwonk
statwonk / bootstrapping.R
Last active Feb 22, 2020
The essence of bootstrapping: resampling with replacement to estimate a sampling distribution and its statistics of interest.
View bootstrapping.R
library(tidyverse)
library(gganimate)
# sample of 100 data points
tibble(x = rexp(1e2)) -> d
# 20 bootstrap samples, you likely want several orders of magnitude more.
tibble(boot_samples = seq_len(2e1)) %>%
mutate(data = map(boot_samples, ~ d %>% sample_frac(1, replace = TRUE))) %>%
unnest(cols = c(data)) %>%
@statwonk
statwonk / retweets.R
Created Dec 30, 2019
Measuring retweets, for fun!
View retweets.R
library(fitdistrplus)
library(countreg)
library(tidyverse)
library(rtweet)
get_timeline("statwonk", n = 1e4) %>%
filter(!is_retweet) %>%
pull(retweet_count) -> my_retweets
# I know a priori it's negbin,
@statwonk
statwonk / rootograms.R
Last active Sep 28, 2019
A gist that shows how a rootogram helps find that the zero-inflated negative binomial was the data generating mechanism.
View rootograms.R
# install.packages("countreg", repos="http://R-Forge.R-project.org")
# https://www.fromthebottomoftheheap.net/2016/06/07/rootograms/
# https://channel9.msdn.com/Events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference/countreg-Tools-for-count-data-regression
library(countreg)
rzinbinom(3e3, size = 4, mu = 20, pi = 0.05) -> x
table(x)
hist(x, col = "orange")
rootogram(glm(x ~ 1, family = "poisson")) # zeros under fit
You can’t perform that action at this time.