Jared Knowles jknowles

// Additional tools for machine learning and predictive analytics in Stata
/*
Author: Jared Knowles
Date: 09/12/2018
Purpose: Survey of some additional code helpful in conducting and explaining
or demonstrating predictive analytics to stakeholders.
You do not need to run all of this code - this is a survey of commands that
tackle different techniques. Pick and choose what might be most useful to you.
*/
@jknowles
jknowles / 0_reuse_code.js
Created April 27, 2014 02:40
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
jknowles / datausa_census_api.rmd
Created May 30, 2018 22:46 — forked from lecy/datausa_census_api.md
Building Census Dataset in R Using datausa.io API
# Using the dataUSA.io API for Census Data in R
This gist contains some notes on constructing a query for census and economic data from the [DataUSA.io](http://datausa.io/) site. This is a quick-start guide to their API; for in-depth documentation check out their [API documentation](https://github.com/DataUSA/datausa-api/wiki/Overview).
A great way to learn how to structure a query is to visit a specific datausa.io page and click on the "Options" button on top of any graph, then select "API" to see the query syntax that created the graph.
## Example Use
jknowles / robust_predict.lm.R
Created March 6, 2018 20:40
Robust Prediction Intervals for LM
predict.robust <- function(model, data, robust_vcov = NULL, level = 0.95,
interval = "prediction"){
# adapted from
# https://stackoverflow.com/questions/38109501/how-does-predict-lm-compute-confidence-interval-and-prediction-interval
# model is an lm object from R
# data is the dataset to predict from
# robust_vcov must be a robust vcov matrix created by V <- sandwich::vcovHC(model, ...)
# level = the confidence level of the interval, default is 0.95
# interval = either "prediction" or "confidence"; a prediction interval adds the residual variance of a new observation to the uncertainty in the fitted mean
if(is.null(robust_vcov)){
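The preview is cut off before the function body. As a rough sketch of the idea the comments describe, and not the author's actual implementation, the base R code below builds intervals from a supplied covariance matrix. The names `robust_interval` and `hc0_vcov` are hypothetical. `hc0_vcov` is a hand-rolled HC0 estimator standing in for `sandwich::vcovHC(model, type = "HC0")`, and the residual variance added for prediction intervals is the ordinary homoskedastic estimate, which is itself an assumption.

```r
# Hypothetical helper: intervals for an lm fit from a user-supplied vcov matrix
robust_interval <- function(model, newdata, robust_vcov, level = 0.95,
                            interval = c("prediction", "confidence")) {
  interval <- match.arg(interval)
  # Design matrix for the new rows, built from the model's own terms
  tt <- delete.response(terms(model))
  X <- model.matrix(tt, data = newdata)
  fit <- drop(X %*% coef(model))
  # Variance of the fitted mean: the diagonal of X V X'
  var_fit <- rowSums((X %*% robust_vcov) * X)
  # Assumption: residual variance is the usual homoskedastic estimate
  sigma2 <- sum(residuals(model)^2) / df.residual(model)
  se <- if (interval == "prediction") sqrt(var_fit + sigma2) else sqrt(var_fit)
  crit <- qt(1 - (1 - level) / 2, df = df.residual(model))
  data.frame(fit = fit, lwr = fit - crit * se, upr = fit + crit * se)
}

# Hypothetical helper: HC0 sandwich estimator, (X'X)^-1 X' diag(e^2) X (X'X)^-1
hc0_vcov <- function(model) {
  X <- model.matrix(model)
  e <- residuals(model)
  bread <- solve(crossprod(X))
  meat <- crossprod(X * e)  # multiplies row i of X by e_i before X'X
  bread %*% meat %*% bread
}

mod <- lm(mpg ~ wt + hp, data = mtcars)
new_rows <- mtcars[1:5, ]
pi_robust <- robust_interval(mod, new_rows, hc0_vcov(mod), interval = "prediction")
ci_robust <- robust_interval(mod, new_rows, hc0_vcov(mod), interval = "confidence")
pi_classic <- robust_interval(mod, new_rows, vcov(mod), interval = "prediction")
```

Passing the classical `vcov(mod)` instead of a robust matrix should reproduce what `predict(mod, new_rows, interval = "prediction")` returns, which is a convenient sanity check on the arithmetic.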
jknowles / helper_funcs.R
Last active September 10, 2017 04:30
R Helper functions for the Philadelphia SDP Cohort 8 Predictive Analytics Workshop
# Calculate the AUC of a GLM model easily
# Jared Knowles
# model = a fitted glm in R
# newdata = an optional data.frame of new observations to predict on
auc.glm <- function(model, newdata = NULL){
if(is.null(newdata)){
resp <- model$y
# if(class(resp) == "numeric"){
# resp <- factor(resp)
# }
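The rest of `auc.glm` is not shown in this preview. One common way to get an AUC without extra packages is the rank-sum (Mann-Whitney) identity, sketched below. `auc_rank` is a hypothetical name, and this is not necessarily how the original helper computes it.

```r
# Hypothetical helper: AUC from 0/1 outcomes y and predicted scores p
auc_rank <- function(y, p) {
  stopifnot(length(y) == length(p), all(y %in% c(0, 1)))
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  r <- rank(p)  # midranks, so tied predictions count as half a win
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# In-sample AUC for a fitted logistic regression
model <- glm(am ~ wt, data = mtcars, family = binomial)
auc_in_sample <- auc_rank(model$y, fitted(model))
```

The result is the share of (positive, negative) pairs in which the positive case receives the higher predicted probability.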
jknowles / TEXandRStudioEnv
Created January 27, 2012 17:19
Set R system environment variables for LaTeX and BibTeX in RStudio
#Set Environment Variables
TEXINPUTS="C:\\" #Path to tex file in Windows
Sys.setenv(TEXINPUTS="C:\\~", BIBINPUTS=TEXINPUTS,BSTINPUTS=TEXINPUTS)
#Path to texfiles in Windows, set BIB files and BST files the same
#Run before clicking "Compile PDF"
jknowles / server.R
Created January 8, 2013 16:00
Simulating flipping a coin and receiving payoffs. A shiny app.
library(shiny)
library(scales)
shinyServer(function(input,output){
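  # Simulate `reps` experiments of `sims` flips each; `bias` shifts P(heads) away from 0.5
  trialInput <- reactive({
    bias <- input$coin
    sims <- input$obs
    reps <- input$reps
    # Number of heads in each experiment
    rbinom(reps, sims, 0.5 + bias)
  })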
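Outside of Shiny, the same simulation can be run directly. The sketch below fixes the three inputs to illustrative values and applies a hypothetical payoff rule (win 1 per head, lose 1 per tail); the payoff rule the app actually uses is not visible in this preview.

```r
set.seed(42)
bias <- 0.05   # P(heads) = 0.5 + bias
sims <- 100    # flips per experiment
reps <- 1000   # number of experiments

# Number of heads in each experiment
trials <- rbinom(reps, sims, 0.5 + bias)

# Hypothetical payoff rule: +1 for each head, -1 for each tail
payoff <- trials - (sims - trials)
mean_payoff <- mean(payoff)

# Expected payoff per experiment under this rule: sims * (2p - 1) = sims * 2 * bias
expected_payoff <- sims * 2 * bias
```

With enough repetitions, `mean_payoff` should settle near `expected_payoff`, which is what makes even a small bias visible in the long run.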
jknowles / server.R
Created January 8, 2013 15:57
Demonstrating bi-variate correlations using simulation.
# Script to demonstrate distributions
library(eeptools)
library(shiny)
library(ggplot2)
rnormcor <- function(x,rho) rnorm(1,rho*x,sqrt(1-rho^2))
shinyServer(function(input,output){
output$distPlot<-renderPlot({
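The `rnormcor` helper draws one value of `y` conditional on `x`, so that a standard normal `x` paired with its draw has correlation `rho`. A vectorized sketch of the same construction is below; the sample size and the value of `rho` are illustrative choices, not taken from the app.

```r
set.seed(1)
rho <- 0.6
n <- 10000

x <- rnorm(n)
# y | x ~ Normal(rho * x, sqrt(1 - rho^2)), so y is standard normal marginally
y <- rnorm(n, mean = rho * x, sd = sqrt(1 - rho^2))

sample_cor <- cor(x, y)
sample_sd_y <- sd(y)
```

Because `rnorm` is vectorized over `mean`, this avoids calling the one-draw helper in a loop while producing the same joint distribution.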
jknowles / server.R
Created January 8, 2013 15:54
Show how different moments of a distribution can shift it away from a normal distribution.
# Script to demonstrate distributions
library(VGAM)
library(eeptools)
library(shiny)
library(ggplot2)
shinyServer(function(input,output){
jknowles / server.R
Created January 8, 2013 15:52
Draw a normal distribution with a given number of observations. Demonstrate what sample sizes and approximations can mean.
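library(shiny)
library(ggplot2)   # qplot(), geom_vline(), coord_cartesian()
library(eeptools)  # theme_dpi()
shinyServer(function(input,output){
  output$distPlot <- renderPlot({
    # Draw input$obs values from a standard normal distribution
    dist <- rnorm(input$obs)
    # Histogram with the sample mean (black line) and median (red line)
    p <- qplot(dist, binwidth = 0.1) + geom_vline(xintercept = mean(dist)) + theme_dpi()
    p <- p + coord_cartesian(xlim = c(-4, 4)) + geom_vline(xintercept = median(dist), color = I("red"))
    print(p)
  })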
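To see the sample-size effect without running the app, the sketch below repeats the draw at a few illustrative sample sizes and records how far the sample mean and median sit from the true value of 0. The sizes and the name `summary_by_n` are assumptions for illustration.

```r
set.seed(123)
sample_sizes <- c(10, 100, 1000, 10000)

# One row per sample size: sample mean, median, and standard error of the mean
summary_by_n <- do.call(rbind, lapply(sample_sizes, function(n) {
  draws <- rnorm(n)
  data.frame(n = n,
             mean = mean(draws),
             median = median(draws),
             se_mean = sd(draws) / sqrt(n))
}))
```

The standard error shrinks roughly with the square root of the sample size, which is why small samples can look noticeably non-normal in the histogram even though they come from a normal distribution.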