Christopher Peters statwonk
statwonk / clustering.R
Created Mar 4, 2021
Let's explore Dr. Wooldridge's clustering comment on Twitter. https://twitter.com/jmwooldridge/status/1366515323923488768?s=20
View clustering.R
library(tidyverse)
library(lmtest)
library(sandwich)
5e2 -> students
20 -> schools
tibble(student_id = 1:students) %>%
mutate(school_id = rep(1:schools, max(student_id) / schools)) %>%
left_join(tibble(school_id = 1:schools, school_effect = rnorm(schools)),
statwonk / beta_interval_data.R
Last active Dec 31, 2020
Using beta-distributed interval-censored data to produce an estimate for the median share of US adults having taken the vaccine by end of Q2 2021.
View beta_interval_data.R
library(tidyverse)
library(fitdistrplus)
dplyr::select -> select
.Machine$double.eps -> eps
1975 -> N # number of responses
tibble(lower = c(0, 0.25, 0.5, 0.75), # lower bins
upper = c(0.25 + eps, 0.5 + eps, 0.75 + eps, 1), # upper bins
pct = c(0.32, 0.51, 0.15, 1 - sum(0.32, 0.51, 0.15)), # response shares
n = floor(pct * N)) %>% # implied responses + eps
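The preview stops before the fitting step, so the gist's own call is not visible here. As a rough base-R sketch of the general idea only (not the gist's fitdistrplus code), an interval-censored beta fit can be written as a likelihood in which each response contributes the probability mass of its bin. The names `nll`, `shapes` and `median_share` are hypothetical, and the `eps` padding on the upper bin edges is left out for simplicity.

```r
# Bin edges and response shares copied from the preview above.
lower <- c(0, 0.25, 0.5, 0.75)
upper <- c(0.25, 0.5, 0.75, 1)
pct   <- c(0.32, 0.51, 0.15, 1 - sum(0.32, 0.51, 0.15))
n     <- floor(pct * 1975)  # implied number of responses per bin

# Negative log-likelihood of a beta distribution observed only through bins.
# Shapes are optimised on the log scale so they stay positive.
nll <- function(log_par) {
  shape <- exp(log_par)
  mass <- pbeta(upper, shape[1], shape[2]) - pbeta(lower, shape[1], shape[2])
  -sum(n * log(pmax(mass, 1e-300)))  # guard against log(0)
}

fit <- optim(c(0, 0), nll)  # Nelder-Mead, starting at shape1 = shape2 = 1
shapes <- exp(fit$par)
median_share <- qbeta(0.5, shapes[1], shapes[2])
```

With four bins and two parameters the fit is coarse, so `median_share` is best read as a rough summary of the binned responses rather than a precise estimate.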
statwonk / mktcap.R
Last active Dec 25, 2020
Some rough beliefs about market capitalization for a new entrant from a variety of sectors, sub-sectors and states.
View mktcap.R
library(tidyverse)
library(rvest)
library(gamlss)
library(brms)
library(tidybayes)
select <- dplyr::select
####################################################################################
# Model the market capitalizations of members of the S&P 500.
####################################################################################
statwonk / gist:6283c5b01e5896b94c46edfdd9ff490a
Created Dec 25, 2020
A model of the market capitalizations of S&P 500 members.
View gist:6283c5b01e5896b94c46edfdd9ff490a
library(tidyverse)
library(rvest)
library(gamlss)
library(brms)
library(tidybayes)
select <- dplyr::select
####################################################################################
# Model the market capitalizations of members of the S&P 500.
####################################################################################
View fattails.R
library(tidyverse)
library(quantmod)
library(gamlss)
select <- dplyr::select
posix <- function(x) { as.POSIXct(x, origin = "1970-01-01") }
## "Fat tails"
## Here we compare the residuals from the normal and t distributional models.
## Notice standardized error is worse in the normal model. This happens
## because returns are leptokurtic (large surprises should be expected, there's risk in stock returns),
statwonk / risk_adds_up2.R
Last active Sep 2, 2020
In the context of coronavirus, let's revisit this post about risk accumulation. https://twitter.com/statwonk/status/1160542394544267265?s=20
View risk_adds_up2.R
library(tidyverse)
library(ggthemes)
expand.grid(
risk = seq(0.1/5e3, 1/5e3, 1e-05), # average daily risk e.g. - 1,000 infected per day in Alabama / 5,000,000 AL population
units_of_exposure = seq_len(31) # days of exposure (up to 31 days)
) %>% as_tibble() %>%
mutate(total_risk = map2_dbl(risk, units_of_exposure, ~ 1 - (1 - .x)^(.y)),
total_odds = 1/total_risk,
risk_threshold = case_when(total_odds <= 5e2 ~ "Worse than 1 in 500",
total_odds <= 1e3 ~ "Worse than 1 in 1k chance",
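The accumulation formula in the preview, `1 - (1 - risk)^days`, can be checked by hand for a single case. The sketch below is base R only and assumes each day's risk is independent and constant. The helper name `cumulative_risk` is hypothetical and does not appear in the gist.

```r
# Hypothetical helper: chance of at least one event over `days` independent days.
cumulative_risk <- function(daily_risk, days) 1 - (1 - daily_risk)^days

# The preview's example: 1,000 infections per day among 5,000,000 people.
daily_risk <- 1000 / 5e6

total_risk <- cumulative_risk(daily_risk, 31)  # risk after 31 days of exposure
one_in_n <- 1 / total_risk                     # what the preview stores as total_odds
```

For this case the 31-day risk comes to roughly 0.6%, or about 1 in 160, which falls in the preview's "Worse than 1 in 500" band.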
statwonk / sk_learn_logistic.Rmd
Last active Feb 7, 2021
Putting sklearn's SGD algo through its paces, now with J groups.
View sk_learn_logistic.Rmd
---
title: "Testing sklearn's Stochastic Gradient Descent Algo"
author: "Statwonk"
date: "2/07/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(reticulate)
statwonk / generate_multilevel_logistic_data.R
Created Aug 30, 2020
A tool to simulate multilevel logistic data.
View generate_multilevel_logistic_data.R
library(tidyverse)
library(brms)
library(tidybayes)
1e7 -> N # obs
1 -> J # groups of members
10 -> K # members
0.5 -> base_p # base rate, this is logistic regression
# sample member coefficients
statwonk / out_of_core.py
Created Aug 30, 2020
scikit-learn partial_fit out-of-core learning.
View out_of_core.py
import sklearn
from sklearn import naive_bayes
import pandas as pd
import numpy as np
d = pd.read_csv("data.csv")
y = d.iloc[:, 1]
X = d.iloc[:,list(range(2, d.shape[1]))]
statwonk / variational_logistic.R
Created Aug 15, 2020
A simulation to learn about variational inference and compare it to MCMC.
View variational_logistic.R
library(tidyverse)
library(brms)
library(tidybayes)
3e4 -> N
40 -> K
rnorm(K) -> group_coefs
tibble(K = factor(rep(paste0("group_", seq_len(K)), length.out = N))) %>%
mutate(coef = rep(group_coefs, N/40)) %>%