@dmarx
Created August 9, 2017 12:11
Demonstration of how to construct a bespoke regression to determine the optimal cutoff value for constructing a categorical variable for a logistic regression
# Finding the best cut-off for constructing a categorical variable
# in a logistic regression
data(iris)
x0 = iris[iris$Species != 'setosa',]
plot(x0, col=x0$Species)
# Keep things simple for this demo
form = "is_virginica ~ Petal.Length + Petal.Width"
x = subset(x0, select=Petal.Length)
x$is_virginica = x0$Species=='virginica'
# Find best cut-off for Petal.Width
v = x0$Petal.Width
c0 = quantile(v, 0.5) # initialize at the median (1.6)
names(c0) = "cutoff"
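# Cost of a candidate cutoff c_i: negative log-likelihood of the logistic
# regression fitted with Petal.Width replaced by the indicator (width < c_i)
cost_fn = function(c_i){
  x_i = x
  x_i$Petal.Width = v < c_i # dichotomize at the candidate cutoff
  mod = glm(form, data=x_i, family=binomial)
  as.numeric(-logLik(mod)) # plain scalar for optim to minimize
}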
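# The cost is a step function of the cutoff, so a derivative-free method is
# needed; optim defaults to Nelder-Mead and warns that it is unreliable in 1-D
cutoff = optim(c0, cost_fn)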
cutoff$par # 1.68
plot(x0$Petal.Length, x0$Petal.Width, col=x0$Species)
abline(h=c0, lty=2) # initialization
abline(h=cutoff$par) # best cutoff
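# A possible cross-check, not part of the original demo. The cost only changes
# when the cutoff crosses an observed Petal.Width, so an exhaustive search over
# the midpoints between adjacent unique values visits every distinct split.
# This sketch rebuilds the data so it runs on its own; the names d, w, below,
# grid_cost, candidates, costs and best_grid are introduced here for
# illustration only.
data(iris)
d = iris[iris$Species != 'setosa',]
d$is_virginica = d$Species == 'virginica'
w = d$Petal.Width
grid_cost = function(c_i){
  d$below = w < c_i # same indicator as in cost_fn
  mod = glm(is_virginica ~ Petal.Length + below, data=d, family=binomial)
  as.numeric(-logLik(mod))
}
u = sort(unique(w))
candidates = (head(u, -1) + tail(u, -1)) / 2 # midpoints between neighbours
# glm may warn about fitted probabilities of 0 or 1 for extreme cutoffs
costs = sapply(candidates, grid_cost)
best_grid = candidates[which.min(costs)]
best_grid # compare with cutoff$par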