Christopher Peters statwonk

@statwonk
statwonk / beta_interval_data.R
Last active Dec 31, 2020
Using beta-distributed interval-censored data to produce an estimate for the median share of US adults having taken the vaccine by end of Q2 2021.
View beta_interval_data.R
library(tidyverse)
library(fitdistrplus)
dplyr::select -> select
.Machine$double.eps -> eps
1975 -> N # number of responses
tibble(lower = c(0, 0.25, 0.5, 0.75), # lower bins
       upper = c(0.25 + eps, 0.5 + eps, 0.75 + eps, 1), # upper bins
       pct = c(0.32, 0.51, 0.15, 1 - sum(0.32, 0.51, 0.15)), # response shares
       n = floor(pct * N)) %>% # implied responses + eps
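The fitting step this gist describes can be sketched in base R. This is a minimal sketch, not the gist's own code (the preview above is truncated and loads fitdistrplus): it maximizes the interval-censored beta likelihood directly with `optim()`, reusing the bins and response shares shown above. The helper name `neg_loglik` and the log-scale parameterization are assumptions made for illustration.

```r
# Bins and response shares copied from the preview above.
N <- 1975
lower <- c(0, 0.25, 0.5, 0.75)
upper <- c(0.25, 0.5, 0.75, 1)
pct <- c(0.32, 0.51, 0.15, 1 - sum(0.32, 0.51, 0.15))
n <- floor(pct * N)  # implied number of responses per bin

# Negative log-likelihood of interval-censored beta data.
# Parameters are optimized on the log scale so both shapes stay positive.
neg_loglik <- function(log_par) {
  shape1 <- exp(log_par[1])
  shape2 <- exp(log_par[2])
  bin_prob <- pbeta(upper, shape1, shape2) - pbeta(lower, shape1, shape2)
  -sum(n * log(bin_prob))
}

fit <- optim(c(0, 0), neg_loglik)  # Nelder-Mead, starting from Beta(1, 1)
shapes <- exp(fit$par)
median_share <- qbeta(0.5, shapes[1], shapes[2])
median_share
```

Since 32% of responses fall below 0.25 and 83% fall below 0.5, a reasonable fit should place the median between those two bin edges.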
@statwonk
statwonk / mktcap.R
Last active Dec 25, 2020
Some rough beliefs about market capitalization for a new entrant from a variety of sectors, sub-sectors and states.
View mktcap.R
library(tidyverse)
library(rvest)
library(gamlss)
library(brms)
library(tidybayes)
select <- dplyr::select
####################################################################################
# Model the market capitalizations of members of the S&P 500.
####################################################################################
@statwonk
statwonk / gist:6283c5b01e5896b94c46edfdd9ff490a
Created Dec 25, 2020
A model of the market capitalizations of S&P 500 members.
View gist:6283c5b01e5896b94c46edfdd9ff490a
library(tidyverse)
library(rvest)
library(gamlss)
library(brms)
library(tidybayes)
select <- dplyr::select
####################################################################################
# Model the market capitalizations of members of the S&P 500.
####################################################################################
View fattails.R
library(tidyverse)
library(quantmod)
library(gamlss)
select <- dplyr::select
posix <- function(x) { as.POSIXct(x, origin = "1970-01-01") }
## "Fat tails"
## Here we compare the residuals from the normal and t distributional models.
## Notice standardized error is worse in the normal model. This happens
## because returns are leptokurtic (large surprises should be expected, there's risk in stock returns),
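The claim in the comment above can be checked on simulated data. The sketch below does not use the gist's gamlss models or real returns; it draws hypothetical "returns" from a t distribution with 3 degrees of freedom (an assumed value), standardizes them as a normal model would, and compares the share of observations beyond 4 standard deviations with the share a normal distribution predicts.

```r
set.seed(1)
n <- 1e5
returns <- 0.01 * rt(n, df = 3)  # simulated fat-tailed "returns" (assumption)

# Standardized residuals under a normal model fitted by mean and sd.
z <- (returns - mean(returns)) / sd(returns)

observed_tail <- mean(abs(z) > 4)  # share of observations beyond 4 sd
normal_tail <- 2 * pnorm(-4)       # share a normal model expects

c(observed = observed_tail, normal = normal_tail)
```

A normal model treats these large residuals as near-impossible, which is why its standardized errors look worse on leptokurtic data.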
@statwonk
statwonk / risk_adds_up2.R
Last active Sep 2, 2020
In the context of coronavirus, let's revisit this post about risk accumulation. https://twitter.com/statwonk/status/1160542394544267265?s=20
View risk_adds_up2.R
library(tidyverse)
library(ggthemes)
expand.grid(
  risk = seq(0.1/5e3, 1/5e3, 1e-05), # average daily risk, e.g. 1,000 infected per day in Alabama / 5,000,000 AL population
  units_of_exposure = seq_len(31) # days of exposure (up to 31 days)
) %>% as_tibble() %>%
  mutate(total_risk = map2_dbl(risk, units_of_exposure, ~ 1 - (1 - .x)^(.y)),
         total_odds = 1/total_risk,
         risk_threshold = case_when(total_odds <= 5e2 ~ "Worse than 1 in 500",
                                    total_odds <= 1e3 ~ "Worse than 1 in 1k chance",
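As a worked instance of the accumulation formula used above, 1 - (1 - risk)^days, the sketch below takes the comment's example daily risk (1,000 infections per day in a population of 5,000,000) over the full 31 days. It assumes each day's exposure is independent, and the variable names are chosen for illustration.

```r
daily_risk <- 1000 / 5e6  # 1 in 5,000 per day, from the comment above
days <- 31

# Probability of at least one infection across independent daily exposures.
total_risk <- 1 - (1 - daily_risk)^days
one_in <- 1 / total_risk  # the same risk expressed as "1 in X"

c(total_risk = total_risk, one_in = one_in)
```

The accumulated risk comes out a little under `days * daily_risk`, because the plain sum double-counts the chance of being infected on more than one day.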
@statwonk
statwonk / generate_multilevel_logistic_data.R
Created Aug 30, 2020
A tool to simulate multilevel logistic data.
View generate_multilevel_logistic_data.R
library(tidyverse)
library(brms)
library(tidybayes)
1e7 -> N # obs
1 -> J # groups of members
10 -> K # members
0.5 -> base_p # base rate, this is logistic regression
# sample member coefficients
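A minimal base R sketch of the kind of simulation described here, under stated assumptions: member effects are drawn from a standard normal on the log-odds scale and added to the base rate, and N is cut to 1e4 so it runs quickly. The preview is truncated, so this is not the gist's actual generating code; `member_coefs` and `member` are hypothetical names.

```r
set.seed(1)
N <- 1e4       # observations (the gist uses 1e7)
K <- 10        # members
base_p <- 0.5  # base rate

# Sample one coefficient per member on the log-odds scale (assumed N(0, 1)).
member_coefs <- rnorm(K)

# Assign each observation to a member and draw a Bernoulli outcome.
member <- sample(seq_len(K), N, replace = TRUE)
p <- plogis(qlogis(base_p) + member_coefs[member])
y <- rbinom(N, 1, p)

# Observed success rate per member, next to the rate implied by its coefficient.
rates <- cbind(observed = tapply(y, member, mean),
               implied = plogis(qlogis(base_p) + member_coefs))
rates
```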
@statwonk
statwonk / out_of_core.py
Created Aug 30, 2020
scikit-learn partial_fit out-of-core learning.
View out_of_core.py
import sklearn
from sklearn import naive_bayes
import pandas as pd
import numpy as np
d = pd.read_csv("data.csv")
y = d.iloc[:, 1]   # target: the second column
X = d.iloc[:, 2:]  # features: every column from the third onward
@statwonk
statwonk / variational_logistic.R
Created Aug 15, 2020
A simulation to learn about variational inference and compare it to MCMC.
View variational_logistic.R
library(tidyverse)
library(brms)
library(tidybayes)
3e4 -> N
40 -> K
rnorm(K) -> group_coefs
tibble(K = factor(rep(paste0("group_", seq_len(K)), length.out = N))) %>%
  mutate(coef = rep(group_coefs, N/40)) %>%
@statwonk
statwonk / massive_logistic.R
Last active Aug 11, 2020
A simulation showing how cases can be discarded in logistic regression while preserving an unbiased estimator. https://twitter.com/statwonk/status/1291712092479860737?s=20
View massive_logistic.R
library(tidyverse)
1e4 -> N
0.03 -> p
# author: twitter.com/statwonk
# showing how cases can be discarded in logistic regression while preserving an unbiased estimator
seq_len(1e3) %>%
  map_dbl(function(x) {
    rbinom(N, 1, p) -> y
    tibble(
      all_data = tibble(y = y) %>% glm(y ~ 1, "binomial", .) %>% coef() %>% plogis(),
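One standard way to discard cases without biasing the estimate is to subsample the majority class (here y = 0) and then correct the intercept by the log of the sampling fraction. Whether the truncated gist does exactly this is not visible in the preview, so treat the sketch below as an illustration of that general technique; `keep_frac` and the other new names are hypothetical.

```r
set.seed(1)
N <- 1e4
p <- 0.03
y <- rbinom(N, 1, p)

# Keep every y = 1 case but only a fraction of the y = 0 cases.
keep_frac <- 0.1
zeros <- which(y == 0)
kept_zeros <- sample(zeros, round(keep_frac * length(zeros)))
y_sub <- y[c(which(y == 1), kept_zeros)]

# Intercept-only logistic regressions on the full and reduced data.
full_fit <- glm(y ~ 1, family = binomial, data = data.frame(y = y))
sub_fit <- glm(y ~ 1, family = binomial, data = data.frame(y = y_sub))

# Dropping zeros inflates the odds by 1 / keep_frac; adding log(keep_frac) undoes it.
all_data <- unname(plogis(coef(full_fit)))
corrected <- unname(plogis(coef(sub_fit) + log(keep_frac)))
uncorrected <- unname(plogis(coef(sub_fit)))

c(all_data = all_data, corrected = corrected, uncorrected = uncorrected)
```

The corrected estimate should sit close to the full-data estimate while using roughly a tenth of the zeros; the uncorrected one is biased upward.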
@statwonk
statwonk / coronavirus.R
Last active Mar 1, 2020
Showing that the simulated number of coronavirus deaths is fairly robust to restricting the case fatality rate (CFR) to above 1% (South Korea gives a lower bound of 0.4%).
View coronavirus.R
set.seed(1)
N <- 1e7 # sims
quantiles_of_interest <- function(x) { quantile(x, c(0.0001, 0.25, 0.5, 0.75, 0.9999)) }
death_rate <- function() { pmin(pmax(rlnorm(N, log(0.02), 0.35), 0.001), 0.06) }
quantiles_of_interest(death_rate())
# deliberately wide (ignorant) belief about the number of cases, in millions: 2.4k to 150M per the bounds below.
susceptible_cases <- function() { runif(N, 0.0024, 150) }
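The two sampling functions above can be combined into a simple Monte Carlo estimate of deaths. The preview stops before that step, so the product below (deaths = fatality rate × cases) is an assumption about how they are meant to be used. The simulation count is reduced to 1e5 to keep it light, and the functions take `n` as an argument here, unlike the originals.

```r
set.seed(1)
n_sims <- 1e5  # the gist uses 1e7

quantiles_of_interest <- function(x) {
  quantile(x, c(0.0001, 0.25, 0.5, 0.75, 0.9999))
}
# Lognormal fatality rate centred on 2%, clamped to [0.1%, 6%] as in the gist.
death_rate <- function(n) pmin(pmax(rlnorm(n, log(0.02), 0.35), 0.001), 0.06)
# Uniform belief over the number of cases, same bounds as the gist.
susceptible_cases <- function(n) runif(n, 0.0024, 150)

# Assumed combination: deaths = fatality rate * cases (same units as cases).
deaths <- death_rate(n_sims) * susceptible_cases(n_sims)
death_quantiles <- quantiles_of_interest(deaths)
death_quantiles
```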