Skip to content

Instantly share code, notes, and snippets.

@MrFlick
Last active October 7, 2023 11:28
Show Gist options
  • Star 8 You must be signed in to star a gist
  • Fork 4 You must be signed in to fork a gist
  • Save MrFlick/ae299d8f3760f02de6bf to your computer and use it in GitHub Desktop.
Save MrFlick/ae299d8f3760f02de6bf to your computer and use it in GitHub Desktop.
makeglm.R: Creates a "fake" glm object with specific coefficients that you can use for predicting without fitting a model first
makeglm <- function(formula, ..., family, data=NULL) {
dots <- list(...)
out<-list()
tt <- terms(formula, data=data)
if(!is.null(data)) {
mf <- model.frame(tt, data)
vn <- sapply(attr(tt, "variables")[-1], deparse)
if((yvar <- attr(tt, "response"))>0)
vn <- vn[-yvar]
xlvl <- lapply(data[vn], function(x) if (is.factor(x))
levels(x)
else if (is.character(x))
levels(as.factor(x))
else
NULL)
attr(out, "xlevels") <- xlvl[!vapply(xlvl,is.null,NA)]
attr(tt, "dataClasses") <- sapply(data[vn], stats:::.MFclass)
}
out$terms <- tt
coef <- numeric(0)
stopifnot(length(dots)>1 & !is.null(names(dots)))
for(i in seq_along(dots)) {
if((n<-names(dots)[i]) != "") {
v <- dots[[i]]
if(!is.null(names(v))) {
coef[paste0(n, names(v))] <- v
} else {
stopifnot(length(v)==1)
coef[n] <- v
}
} else {
coef["(Intercept)"] <- dots[[i]]
}
}
out$coefficients <- coef
out$rank <- length(coef)
if (!missing(family)) {
out$family <- if (class(family) == "family") {
family
} else if (class(family) == "function") {
family()
} else if (class(family) == "character") {
get(family)()
} else {
stop(paste("invalid family class:", class(family)))
}
out$qr <- list(pivot=seq_len(out$rank))
out$deviance <- 1
out$null.deviance <- 1
out$aic <- 1
class(out) <- c("glm","lm")
} else {
class(out) <- "lm"
out$fitted.values <- predict(out, newdata=dd)
out$residuals <- out$mf[attr(tt, "response")] - out$fitted.values
out$df.residual <- nrow(data) - out$rank
out$model <- data
#QR doesn't work
}
out
}
set.seed(15)
dd <- data.frame(
X1=runif(50),
X2=factor(sample(letters[1:4], 50, replace=T)),
X3=rpois(50, 5),
Outcome = sample(0:1, 50, replace=T)
)
mymodel<-glm(Outcome~X1+X2+X3, data=dd, family=binomial)
predict(mymodel, type="response")
newmodel <- makeglm(Outcome~X1+X2+X3, family=binomial, data=dd,
-.5, X1=1, X2=c(b=1.5, c=1, d=1.5), X3=-.15)
predict(newmodel, newdata=dd, type="response")
@hlin117
Copy link

hlin117 commented Jun 26, 2014

Hi. In this script, I'm confused about your construction of the input dd (data) object. It seems that in test.make.glm.R, dd stores a sample size of n = 50 with 3 predictors, but isn't this suggesting we are trying to fit the glm model?

I would imagine that if the model has been already fit, it would suffice to only pass in the four coefficients, beta0, beta1, beta2, and beta3 (represented by -0.5, 1, c(1.5, 1, 1.5), and -0.15). What is the purpose of the data parameter?

@hlin117
Copy link

hlin117 commented Jun 26, 2014

Okay, after some trial and error, I understand what the input data is representing. Sorry for the inconvenience.

@dionisio-acosta-ucl
Copy link

Excellent work. If you were to use the function for a logistic regression with boolean variables, how would you specify the values in the call? For instance, for a boolean variable Cough, le logistic regression specifies a value of 2 for TRUE (output of summary CoughTRUE 2). Thanks ever so much.

@andreasmeid
Copy link

I don't know what went wrong, but I get the following error:

newmodel1 <- makeglm(y~x+z, binomial, data=ds.ipd[,c(2:4)], -.5, x=betas[1,1], z=betas[1,2])
Error in coef["(Intercept)"] <- dots[[i]] :
incompatible types (from closure to double) in subassignment type fix

Can you help me with this issue? Many thanks

Andreas

@blah-crusader
Copy link

Hey,

this could be very interesting for something I'm trying to do, namely, calculating the log-likelihood of any GLM on a separate test set. Using this function, I could say: make a glm with the coefficients retrieved from fitting on the training set and then create a new GLM object with these coefficients but data from an independent test set. I believe, however, that this does not work yet in your code.
For instance:

    x1 = rnorm(100, mean = 1000)
    y = rpois(n = 100, lambda = 5)
    logLik(glm(y~x1, family=poisson))
    logLik(makeglm(y~x1, family=poisson, -.5, x1=1))

always returns 1.5 for the logLik, no matter what i choose as coefficients for the intercept and x1. Do you have any tips how to adjust this such that also the logLik() function keeps recalculating properly?

Thanks a lot for any advice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment