{{ message }}

Instantly share code, notes, and snippets.

🏠
Working from home

🏠
Working from home
Last active Sep 2, 2020
In the context of coronavirus, let's revisit this post about risk accumulation. https://twitter.com/statwonk/status/1160542394544267265?s=20
 library(tidyverse) library(ggthemes) expand.grid( risk = seq(0.1/5e3, 1/5e3, 1e-05), # average daily risk e.g. - 1,000 infected per day in Alabama / 5,000,000 AL population units_of_exposure = seq_len(31) # days of exposure (up to 31 days) ) %>% as_tibble() %>% mutate(total_risk = map2_dbl(risk, units_of_exposure, ~ 1 - (1 - .x)^(.y)), total_odds = 1/total_risk, risk_threshold = case_when(total_odds <= 5e2 ~ "Worse than 1 in 500", total_odds <= 1e3 ~ "Worse than 1 in 1k chance",
Created Aug 30, 2020
A tool to simulate multilevel logistic data.
View generate_multilevel_logistic_data.R
 library(tidyverse) library(brms) library(tidybayes) 1e7 -> N # obs 1 -> J # groups of members 10 -> K # members 0.5 -> base_p # base rate, this is logistic regression # sample member coefficients
Created Aug 30, 2020
scikit-learn partial_fit out-of-core learning.
View out_of_core.py
 import sklearn from sklearn import naive_bayes import pandas as pd import numpy as np d = pd.read_csv("data.csv") y = d.iloc[:, 1] X = d.iloc[:,list(range(2, d.shape[1]))]
Created Aug 15, 2020
A simulation to learn about variational inference and compare it to MCMC.
View variational_logistic.R
 library(tidyverse) library(brms) library(tidybayes) 3e4 -> N 40 -> K rnorm(K) -> group_coefs tibble(K = factor(rep(paste0("group_", seq_len(K)), length.out = N))) %>% mutate(coef = rep(group_coefs, N/40)) %>%
Last active Aug 11, 2020
A simulation showing how cases can be discarded in logistic regression while preserving an unbiased estimator. https://twitter.com/statwonk/status/1291712092479860737?s=20
View massive_logistic.R
 library(tidyverse) 1e4 -> N 0.03 -> p # author: twitter.com/statwonk # showing how cases can be discarded in logistic regression while preserving an unbiased estimator seq_len(1e3) %>% map_dbl(function(x) { rbinom(N, 1, p) -> y tibble( all_data = tibble(y = y) %>% glm(y ~ 1, "binomial", .) %>% coef() %>% plogis(),
Last active Mar 1, 2020
Showing that the number of deaths by coronavirus is somewhat robust to CFR restricted above 1% (South Korea lower bound at 0.4%).
View coronavirus.R
 set.seed(1) N <- 1e7 # sims quantiles_of_interest <- function(x) { quantile(x, c(0.0001, 0.25, 0.5, 0.75, 0.9999)) } death_rate <- function() { pmin(pmax(rlnorm(N, log(0.02), 0.35), 0.001), 0.06) } quantiles_of_interest(death_rate()) # very robust / ignorant belief of range of case 2.4k to 60M cases. susceptible_cases <- function() { runif(N, 0.0024, 150) }
Created Mar 1, 2020
Coronavirus estimates
View coronavirus.R
 set.seed(0) N <- 1e7 # sims quantiles_of_interest <- function(x) { quantile(x, c(0.0001, 0.25, 0.5, 0.75, 0.9999)) } death_rate <- function() { pmin(pmax(rlnorm(N, log(0.02), 0.35), 0.001), 0.06) } # Statwonk's belief of final US death rate quantiles_of_interest(death_rate()) # 0.01% 25% 50% 75% 99.99%
Last active Feb 22, 2020
The essence of bootstrapping: resampling with replacement to estimate a sampling distribution and its statistics of interest.
View bootstrapping.R
 library(tidyverse) library(gganimate) # sample of 100 data points tibble(x = rexp(1e2)) -> d # 20 bootstrap samples, you likely want several orders of magnitude more. tibble(boot_samples = seq_len(2e1)) %>% mutate(data = map(boot_samples, ~ d %>% sample_frac(1, replace = TRUE))) %>% unnest(cols = c(data)) %>%
Created Dec 30, 2019
Measuring retweets, for fun!
View retweets.R
 library(fitdistrplus) library(countreg) library(tidyverse) library(rtweet) get_timeline("statwonk", n = 1e4) %>% filter(!is_retweet) %>% pull(retweet_count) -> my_retweets # I know a priori it's negbin,
Last active Sep 28, 2019
A gist that shows how a rootogram helps find that the zero-inflated negative binomial was the data generating mechanism.
View rootograms.R
 # install.packages("countreg", repos="http://R-Forge.R-project.org") # https://www.fromthebottomoftheheap.net/2016/06/07/rootograms/ # https://channel9.msdn.com/Events/useR-international-R-User-conferences/useR-International-R-User-2017-Conference/countreg-Tools-for-count-data-regression library(countreg) rzinbinom(3e3, size = 4, mu = 20, pi = 0.05) -> x table(x) hist(x, col = "orange") rootogram(glm(x ~ 1, family = "poisson")) # zeros under fit
You can’t perform that action at this time.