-
-
Save MrFlick/ae299d8f3760f02de6bf to your computer and use it in GitHub Desktop.
makeglm <- function(formula, ..., family, data=NULL) { | |
dots <- list(...) | |
out<-list() | |
tt <- terms(formula, data=data) | |
if(!is.null(data)) { | |
mf <- model.frame(tt, data) | |
vn <- sapply(attr(tt, "variables")[-1], deparse) | |
if((yvar <- attr(tt, "response"))>0) | |
vn <- vn[-yvar] | |
xlvl <- lapply(data[vn], function(x) if (is.factor(x)) | |
levels(x) | |
else if (is.character(x)) | |
levels(as.factor(x)) | |
else | |
NULL) | |
attr(out, "xlevels") <- xlvl[!vapply(xlvl,is.null,NA)] | |
attr(tt, "dataClasses") <- sapply(data[vn], stats:::.MFclass) | |
} | |
out$terms <- tt | |
coef <- numeric(0) | |
stopifnot(length(dots)>1 & !is.null(names(dots))) | |
for(i in seq_along(dots)) { | |
if((n<-names(dots)[i]) != "") { | |
v <- dots[[i]] | |
if(!is.null(names(v))) { | |
coef[paste0(n, names(v))] <- v | |
} else { | |
stopifnot(length(v)==1) | |
coef[n] <- v | |
} | |
} else { | |
coef["(Intercept)"] <- dots[[i]] | |
} | |
} | |
out$coefficients <- coef | |
out$rank <- length(coef) | |
if (!missing(family)) { | |
out$family <- if (class(family) == "family") { | |
family | |
} else if (class(family) == "function") { | |
family() | |
} else if (class(family) == "character") { | |
get(family)() | |
} else { | |
stop(paste("invalid family class:", class(family))) | |
} | |
out$qr <- list(pivot=seq_len(out$rank)) | |
out$deviance <- 1 | |
out$null.deviance <- 1 | |
out$aic <- 1 | |
class(out) <- c("glm","lm") | |
} else { | |
class(out) <- "lm" | |
out$fitted.values <- predict(out, newdata=dd) | |
out$residuals <- out$mf[attr(tt, "response")] - out$fitted.values | |
out$df.residual <- nrow(data) - out$rank | |
out$model <- data | |
#QR doesn't work | |
} | |
out | |
} |
set.seed(15) | |
dd <- data.frame( | |
X1=runif(50), | |
X2=factor(sample(letters[1:4], 50, replace=T)), | |
X3=rpois(50, 5), | |
Outcome = sample(0:1, 50, replace=T) | |
) | |
mymodel<-glm(Outcome~X1+X2+X3, data=dd, family=binomial) | |
predict(mymodel, type="response") | |
newmodel <- makeglm(Outcome~X1+X2+X3, family=binomial, data=dd, | |
-.5, X1=1, X2=c(b=1.5, c=1, d=1.5), X3=-.15) | |
predict(newmodel, newdata=dd, type="response") |
Okay, after some trial and error, I understand what the input data is representing. Sorry for the inconvenience.
Excellent work. If you were to use the function for a logistic regression with boolean variables, how would you specify the values in the call? For instance, for a boolean variable Cough, le logistic regression specifies a value of 2 for TRUE (output of summary CoughTRUE 2). Thanks ever so much.
I don't know what went wrong, but I get the following error:
newmodel1 <- makeglm(y~x+z, binomial, data=ds.ipd[,c(2:4)], -.5, x=betas[1,1], z=betas[1,2])
Error in coef["(Intercept)"] <- dots[[i]] :
incompatible types (from closure to double) in subassignment type fix
Can you help me with this issue? Many thanks
Andreas
Hey,
this could be very interesting for something I'm trying to do, namely, calculating the log-likelihood of any GLM on a separate test set. Using this function, I could say: make a glm with the coefficients retrieved from fitting on the training set and then create a new GLM object with these coefficients but data from an independent test set. I believe, however, that this does not work yet in your code.
For instance:
x1 = rnorm(100, mean = 1000)
y = rpois(n = 100, lambda = 5)
logLik(glm(y~x1, family=poisson))
logLik(makeglm(y~x1, family=poisson, -.5, x1=1))
always returns 1.5 for the logLik, no matter what i choose as coefficients for the intercept and x1. Do you have any tips how to adjust this such that also the logLik() function keeps recalculating properly?
Thanks a lot for any advice!
Hi. In this script, I'm confused about your construction of the input
dd
(data) object. It seems that in test.make.glm.R,dd
stores a sample size of n = 50 with 3 predictors, but isn't this suggesting we are trying to fit the glm model?I would imagine that if the model has been already fit, it would suffice to only pass in the four coefficients, beta0, beta1, beta2, and beta3 (represented by -0.5, 1, c(1.5, 1, 1.5), and -0.15). What is the purpose of the
data
parameter?