Christopher Peters statwonk

@statwonk
statwonk / ai_data_engineer_copilot.py
Created March 24, 2024 15:07
A tool to make data engineering easier, written by Cora and Alan https://www.loom.com/share/b5c46b716f23476ba13d22fca1dd1d72
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
class AIDataEngineer:
    def __init__(self):
        self.data = None
        self.model = None
        self.X_train = None
@statwonk
statwonk / ivermectin.R
Created March 13, 2024 01:45
An analysis of ivermectin efficacy
library(tidyverse)
library(brms)
library(tidybayes)
# https://docs.google.com/spreadsheets/d/1vG0WdjZaYlS4_7_if-OaE3uMv3aX6zfvQMx5JvDREF0/edit?usp=sharing
# Chi.sq
prop.test(c(103, 135), c(5947, 5609),
          alternative = "less",
          conf.level = 0.99)
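For readers following along in Python, the comparison above can be sketched as a one-sided two-proportion z-test using only the standard library. This is an approximation rather than a port: R's `prop.test` applies a continuity correction by default, so its p-value will differ somewhat. The function name `two_proportion_z_test` is hypothetical; the counts are taken from the call above.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided pooled z-test of H1: p1 < p2 (no continuity correction)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF via the error function: P(Z <= z).
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value


# Counts from the prop.test call above: 103 of 5947 vs. 135 of 5609.
z, p_value = two_proportion_z_test(103, 5947, 135, 5609)
print(f"z = {z:.3f}, one-sided p = {p_value:.4f}")
```

A negative `z` with a small p-value points in the same direction as `alternative = "less"` in the R call.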
@statwonk
statwonk / religiousity_by_firearm_deaths.R
Created April 16, 2023 16:34
Analysis of how religiosity contributes to firearm deaths.
library(tidyverse)
read_csv("~/Downloads/Religion by Firearm Deaths Data - Firearm Deaths by State.csv",
         skip = 4) %>%
  janitor::clean_names() -> firearm_deaths
read_csv("~/Downloads/Religion by Firearm Deaths Data - Religiousity by State.csv",
         skip = 3) %>%
  janitor::clean_names() %>%
  rename(state = region) %>%
  mutate(religious = 1 - irreligion_percent/100) %>%
@statwonk
statwonk / inflation.R
Created August 17, 2022 23:04
Studying inflation dynamics using an ARDL model
library(tidyverse)
# CPI
# https://data.bls.gov/timeseries/CUSR0000SA0&output_view=pct_1mth
readxl::read_xlsx("~/Downloads/SeriesReport-20220817182003_bb0f80.xlsx", skip = 10) %>%
mutate(date = seq.POSIXt(as.POSIXct("1957-02-01"), by = "month", length.out = n())) %>%
select(date, cpi = Value) %>%
inner_join(
# Short-term Interest Rates - https://fred.stlouisfed.org/series/DGS1MO
read_csv("~/Downloads/DGS1MO (1).csv", col_types = cols(
@statwonk
statwonk / gist:55a29e2a8792b6f2d6833d376d4754da
Last active August 7, 2022 16:29
Examining US CPI persistence using the fractional-differencing technique of time series analysis. In this analysis, I roll the differencing procedure over a 120-month window to expose changes in the underlying process of inflation persistence.
library(tidyverse)
# CPI
# https://data.bls.gov/timeseries/CUSR0000SA0&output_view=pct_1mth
readxl::read_xlsx("~/Downloads/SeriesReport-20220807120638_733b06.xlsx", skip = 10) %>%
mutate(date = seq.POSIXt(as.POSIXct("1947-02-01"), by = "month", length.out = n())) %>%
select(date, cpi = Value) -> cpi
cpi %>%
filter(date >= as.POSIXct("1949-01-01")) %>%
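The preview is cut off before the differencing step, so what follows is only a sketch of the general technique in Python, not the gist's code. Fractional differencing applies the filter (1 - L)^d, whose coefficients follow the recursion w_0 = 1, w_k = -w_{k-1} (d - k + 1) / k. The function names `frac_diff_weights` and `frac_diff` and the truncation length are assumptions made for illustration.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients of (1 - L)^d, truncated to n_weights terms."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series, d, n_weights):
    """Apply the truncated filter; the first n_weights - 1 points are dropped."""
    w = frac_diff_weights(d, n_weights)
    return [
        sum(w[k] * series[t - k] for k in range(n_weights))
        for t in range(n_weights - 1, len(series))
    ]


print(frac_diff_weights(0.5, 4))
# With d = 1 and two weights the filter reduces to ordinary first differences.
print(frac_diff([1.0, 2.0, 4.0, 7.0], 1.0, 2))
```

A rolling analysis like the one described would estimate d separately inside each 120-month window; that estimation step is not shown here.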
@statwonk
statwonk / inflation.R
Last active January 30, 2022 17:43
Using a time series model to inspect the seasonally adjusted US CPI inflation measure (% change)
library(tidyverse)
library(forecast)
library(urca)
library(dynlm)
library(lmtest)
library(sandwich)
# https://data.bls.gov/timeseries/CUSR0000SA0&output_view=pct_1mth
read_csv("~/Downloads/inflation.csv", skip = 11) %>%
janitor::clean_names() %>%
@statwonk
statwonk / stock_prices.R
Created January 9, 2022 18:43
Analysis of stock prices as a function of the risk free rate and market risk.
library(tidyverse)
library(gamlss); select <- dplyr::select
library(fmpapi)
library(Quandl)
fmp_daily_prices("ASAN") -> d
fmp_daily_prices("TEAM") -> team
Quandl("USTREASURY/YIELD") -> yc
yc %>%
@statwonk
statwonk / clustering.R
Created March 4, 2021 12:58
Let's explore Dr. Wooldridge's clustering comment on Twitter. https://twitter.com/jmwooldridge/status/1366515323923488768?s=20
library(tidyverse)
library(lmtest)
library(sandwich)
5e2 -> students
20 -> schools
tibble(student_id = 1:students) %>%
mutate(school_id = rep(1:schools, max(student_id) / schools)) %>%
left_join(tibble(school_id = 1:schools, school_effect = rnorm(schools)),
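The preview stops before any model is fit, so this is a hedged Python sketch of the general issue such a simulation can illustrate, not a reconstruction of the gist or of the linked comment. It assumes 500 students in 20 equally sized schools, a standard-normal school effect, and standard-normal individual noise, then compares a naive standard error of the mean with a cluster-robust one (no small-sample correction). Unlike the R snippet, students are grouped in contiguous blocks per school.

```python
import math
import random

random.seed(1)
n_schools, per_school = 20, 25  # 500 students in total, as in the setup above

# Outcome = shared school effect + individual noise (both assumed standard normal).
clusters = []
for _ in range(n_schools):
    school_effect = random.gauss(0, 1)
    clusters.append([school_effect + random.gauss(0, 1) for _ in range(per_school)])

values = [y for school in clusters for y in school]
n = len(values)
mean = sum(values) / n

# Naive SE treats all 500 observations as independent.
naive_se = math.sqrt(sum((y - mean) ** 2 for y in values) / (n - 1) / n)

# Cluster-robust SE sums residuals within each school before squaring, so
# correlated errors inside a school are not counted as independent information.
cluster_se = math.sqrt(
    sum(sum(y - mean for y in school) ** 2 for school in clusters)
) / n

print(f"naive SE = {naive_se:.3f}, cluster-robust SE = {cluster_se:.3f}")
```

When the school effect is a meaningful share of the variance, the cluster-robust figure should come out noticeably larger than the naive one.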
@statwonk
statwonk / massive_logistic.R
Last active February 10, 2021 21:39
A simulation showing how cases can be discarded in logistic regression while preserving an unbiased estimator. https://twitter.com/statwonk/status/1291712092479860737?s=20
library(tidyverse)
1e4 -> N
0.03 -> p
# author: twitter.com/statwonk
# showing how cases can be discarded in logistic regression while preserving an unbiased estimator
seq_len(1e3) %>%
map_dbl(function(x) {
rbinom(N, 1, p) -> y
tibble(
all_data = tibble(y = y) %>% glm(y ~ 1, "binomial", .) %>% coef() %>% plogis(),
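The preview is truncated, so the exact scheme the gist uses is not visible. One common way to discard data in a rare-event logistic regression is to keep every case, keep only a random fraction of the controls, and add the log of that fraction back to the intercept. The Python sketch below illustrates that approach for an intercept-only model; `keep_rate` and `corrected_estimate` are hypothetical names, while `N` and `p` reuse the values above.

```python
import math
import random

random.seed(42)
N, p = 10_000, 0.03  # values from the snippet above
keep_rate = 0.1      # hypothetical share of controls (y = 0) retained


def corrected_estimate():
    y = [1 if random.random() < p else 0 for _ in range(N)]
    cases = sum(y)
    # Keep every case, but only a random fraction of the controls.
    controls_kept = sum(1 for v in y if v == 0 and random.random() < keep_rate)
    # The intercept-only logistic MLE on the subsample is the log odds
    # cases / controls_kept.
    intercept = math.log(cases / controls_kept)
    # Adding log(keep_rate) undoes the inflation of the odds caused by
    # dropping controls.
    corrected = intercept + math.log(keep_rate)
    return 1 / (1 + math.exp(-corrected))


estimates = [corrected_estimate() for _ in range(200)]
mean_estimate = sum(estimates) / len(estimates)
print(f"mean corrected estimate = {mean_estimate:.4f} (true p = {p})")
```

Across repetitions the corrected estimate should center close to the true `p`, even though roughly 90% of the controls were thrown away in each run.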
@statwonk
statwonk / sk_learn_logistic.Rmd
Last active February 7, 2021 17:35
Putting sklearn's SGD algo through its paces, now with J groups.
---
title: "Testing sklearn's Stochastic Gradient Descent Algo"
author: "Statwonk"
date: "2/07/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(reticulate)