@andrewheiss
Created October 21, 2014 18:25
Fancy logit in Stata
*------------------------------------------------
* Logistic regression done well
*
* Andrew Heiss (andrew.heiss@duke.edu)
* October 21, 2014
*------------------------------------------------
* Load data
use "http://www.ats.ucla.edu/stat/data/hsbdemo", clear
* Create basic model
logit honors read i.female i.prog
predict phat // Save predicted values
* Create more complex model
logit honors read i.female i.prog i.ses
predict phat1 // Save predicted values
*------------------
* Check model fit
*------------------
* R2
*---
* Pseudo R2 values are not comparable to an OLS R2 and are hard to interpret, so don't rely on them
* Contingency tables
*-------------------
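* Classify any observation with a predicted probability above 0.5 as a predicted success
* Use a table to compare whether the predicted outcomes line up with the actual outcomes
* Stata treats missing values as larger than any number, so exclude them explicitly
gen likely = (phat > 0.5) if !missing(phat)
tab honors likely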
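* A built-in alternative to the manual table above, offered as a sketch:
* estat classification reports the same kind of 2x2 table plus sensitivity,
* specificity, and the share correctly classified. It uses the most recently
* fitted model (at this point the one that includes ses), so its counts
* correspond to phat1 rather than phat. cutoff(0.5) is the default threshold.
estat classification, cutoff(0.5)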
* Receiver Operating Characteristic (ROC) Curves
*-----------------------------------------------
* x-axis = false positive rate, or 1 - specificity; # of false positives / sum(false positives + true negatives), or all the negatives incorrectly identified as positives / actual negatives
* y-axis = true positive rate, or sensitivity; # of true positives / sum(true positives + false negatives), or all the correctly identified positives / actual positives
* Diagonal line = 50% coin toss line
* AUC = between 0 and 1; 0.5 = coin toss; higher = better
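* A rough sketch of one point on the ROC curve, computed by hand at a 0.5 cutoff.
* It assumes phat and honors exist as created above; the scalar names are
* made up for illustration.
quietly count if phat > 0.5 & !missing(phat) & honors == 1
scalar tp = r(N)  // true positives
quietly count if !missing(phat) & honors == 1
scalar pos = r(N) // actual positives
quietly count if phat > 0.5 & !missing(phat) & honors == 0
scalar fp = r(N)  // false positives
quietly count if !missing(phat) & honors == 0
scalar neg = r(N) // actual negatives
display "True positive rate (sensitivity) = " tp/pos
display "False positive rate (1 - specificity) = " fp/neg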
* Run lroc after a logit command to see an individual plot
* or run roccomp on saved predicted values
logit honors read i.female i.prog
lroc
roccomp honors phat phat1, graph summary // Compare the AUCs of both models
* For comparing models, the number that matters is the AUC.
* The shape of the ROC curve shows the sensitivity/specificity trade-off across cutoffs; it is not a summary of model fit.
* Separation plots
*-----------------
* These only exist as an R package for now, but they're really intuitive
* See http://mdwardlab.com/biblio/separation-plot-new-visual-method-evaluating-fit-binary-models
* and http://cran.r-project.org/web/packages/separationplot/separationplot.pdf
* After generating all the predicted values you want, export your data as a csv
* MAKE SURE that you check "Output numeric values (not labels) of labeled variables" so that you get 0s and 1s instead of text. The current version of separationplot cannot handle text.
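* A possible command-line equivalent of the export dialog, as a sketch.
* The path matches the one used in the R code and is only an example;
* the nolabel option writes numeric values instead of value labels.
export delimited honors phat phat1 using "~/Desktop/test.csv", nolabel replace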
* Open R and type the following commands
*
* install.packages("separationplot") # If it's not already installed
* library(separationplot) # Load the library
* df <- read.csv("~/Desktop/test.csv") # Use the full path to the csv file
* separationplot(pred=df$phat, actual=df$honors, type="rect", show.expected=TRUE)
* # Note: the expected number of events marked on the plot is sum(phat)
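* A quick check of the note above from within Stata, assuming phat and honors
* are still in memory. The expected number of events is the sum of the
* predicted probabilities; for a logit with a constant this should equal the
* observed number of events in the estimation sample, up to rounding.
quietly summarize phat
display "Expected events = " r(sum)
count if honors == 1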
*-----------------------------------
* Check effects of model variables
*-----------------------------------
* Run the second model again
logit honors read i.female i.prog i.ses
* Log odds are hard to interpret; odds ratios make a little more sense
* Calculate the odds ratio manually by using e^beta, or just add the or option to the logit command
logit , or // Replays the previous model with odds ratios; no need to refit
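* A small sketch of the manual e^beta calculation mentioned above, assuming the
* logit model is still the active estimation result. _b[read] holds the log-odds
* coefficient on read; exponentiating it should match the odds ratio table.
display "Odds ratio for read = " exp(_b[read])
* Factor-variable coefficients are referenced by level (this assumes female is coded 0/1)
display "Odds ratio for female = " exp(_b[1.female])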
* Play with different variables
*------------------------------
* Check predicted probabilities for factors/categories
margins prog, atmeans
marginsplot, recast(scatter)
* Check predicted probabilities for numeric variables
* read=(28(2)76) evaluates the model at read = 28, 30, ..., 76
margins , at(read=(28(2)76)) vsquish
marginsplot, recast(line) recastci(rarea)
* Play with multiple variables at the same time
margins female, at(read=(28(2)76)) vsquish
marginsplot, recast(line) recastci(rarea)