@mmparker
Created July 31, 2012 19:32
Regression with a group variable and a group-trait variable
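# Fix the random seed so the simulated data are reproducible
set.seed(1)
# Make a dataset for Channel A (true purchase probability 0.1)
dat.a <- data.frame(purchase = rbinom(n = 100, size = 1, prob = .1),
                    channel = "a")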
# One for Channel B (true purchase probability 0.2)
dat.b <- data.frame(purchase = rbinom(n = 100, size = 1, prob = .2),
                    channel = "b")
# Add each channel's observed purchase rate (a group-level trait,
# constant for every row within a channel)
dat.a$rate <- mean(dat.a$purchase)
dat.b$rate <- mean(dat.b$purchase)
# Merge into one dataset
dat <- rbind(dat.a, dat.b)
# Model purchase as a function of channel, of rate, and of both.
# Note: glm() defaults to family = gaussian, so these are linear regressions
mchannel <- glm(purchase ~ channel, data = dat)
mrate <- glm(purchase ~ rate, data = dat)
mboth <- glm(purchase ~ channel + rate, data = dat)
# Predictions from the three models are identical, but for mboth
# R may warn that "prediction from a rank-deficient fit may be misleading"
data.frame(mchannel = predict(mchannel, dat),
           mrate = predict(mrate, dat),
           mboth = predict(mboth, dat))
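# The fit is rank-deficient because rate is constant within each channel,
# so it is an exact linear function of the channel indicator.
# With both variables in the model, rate carries no new information:
# R drops it and reports its coefficient as NA
# ("not defined because of singularities")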
summary(mboth)
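# A minimal sketch (not part of the original gist) of one way to confirm
# the collinearity directly. The names toy, mtoy, and X are illustrative
# only, and the data are a small hand-made example rather than real results.
toy <- data.frame(purchase = c(0, 0, 0, 1, 0, 1, 1, 0),
                  channel = rep(c("a", "b"), each = 4))
# Group-level rate: the mean of purchase within each channel (0.25 and 0.5)
toy$rate <- ave(toy$purchase, toy$channel)
# rate is an exact linear function of the channel indicator
all.equal(toy$rate, 0.25 + 0.25 * (toy$channel == "b"))
mtoy <- glm(purchase ~ channel + rate, data = toy)
# The design matrix has three columns (intercept, channelb, rate) ...
X <- model.matrix(mtoy)
ncol(X)
# ... but only two of them are linearly independent
qr(X)$rank
# The aliased coefficient is reported as NA
coef(mtoy)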