from sklearn import linear_model
import numpy as np
import scipy.stats as stat

class LogisticReg:
    """
    Wrapper class for Logistic Regression which has the usual sklearn instance
    in an attribute self.model, and p-values, z-scores and estimated
    standard errors for each coefficient in

        self.p_values
        self.z_scores
        self.sigma_estimates

    as well as the negative Hessian of the log-likelihood (Fisher information)
    in self.F_ij.
    """

    def __init__(self, *args, **kwargs):
        self.model = linear_model.LogisticRegression(*args, **kwargs)

    def fit(self, X, y):
        self.model.fit(X, y)
        #### Get p-values for the fitted model ####
        denom = 2.0 * (1.0 + np.cosh(self.model.decision_function(X)))
        denom = np.tile(denom, (X.shape[1], 1)).T  # replicate across feature columns
        F_ij = np.dot((X / denom).T, X)  # Fisher information matrix
        Cramer_Rao = np.linalg.inv(F_ij)  # inverse information matrix
        sigma_estimates = np.sqrt(np.diagonal(Cramer_Rao))
        z_scores = self.model.coef_[0] / sigma_estimates  # z-score for each model coefficient
        p_values = [stat.norm.sf(abs(x)) * 2 for x in z_scores]  # two-tailed test for p-values

        self.z_scores = z_scores
        self.p_values = p_values
        self.sigma_estimates = sigma_estimates
        self.F_ij = F_ij
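As a quick end-to-end check, the same computation can be run inline on a synthetic dataset (a standalone sketch, not part of the original gist; it uses sklearn's `make_classification` and `fit_intercept=False` so that no intercept column is needed):

```python
import numpy as np
import scipy.stats as stat
from sklearn import linear_model
from sklearn.datasets import make_classification

# Synthetic binary classification data: 200 samples, 4 informative features
X, y = make_classification(n_samples=200, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)

model = linear_model.LogisticRegression(fit_intercept=False)
model.fit(X, y)

# Diagonal weights: 1 / (2 * (1 + cosh(f))) for each sample's decision value f
denom = 2.0 * (1.0 + np.cosh(model.decision_function(X)))
F_ij = np.dot((X / denom[:, np.newaxis]).T, X)   # Fisher information, X^T D X
Cramer_Rao = np.linalg.inv(F_ij)                 # inverse information matrix
sigma = np.sqrt(np.diagonal(Cramer_Rao))         # estimated standard errors
z = model.coef_[0] / sigma                       # z-score per coefficient
p_values = 2 * stat.norm.sf(np.abs(z))           # two-tailed p-values
```

Note that this uses broadcasting (`denom[:, np.newaxis]`) instead of `np.tile`; the result is the same.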
Thanks for posting this! I'm wondering if, for the Fisher information matrix calculation, denom should be tiled first, because I'm running into division errors on arrays of different shapes. I was thinking something like this:

I believe sigma_estimates can be condensed to:

Thanks again for this!
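The shape mismatch described above can be demonstrated and fixed in two equivalent ways (a standalone sketch with made-up toy arrays): tiling `denom` across the feature columns, or broadcasting over a new axis.

```python
import numpy as np

X = np.arange(12.0).reshape(4, 3)        # toy data: (n_samples, n_features)
denom = np.array([1.0, 2.0, 4.0, 8.0])   # one value per sample, shape (4,)

# X / denom raises a broadcasting error: shapes (4, 3) and (4,) do not align.

# Tiling approach: replicate denom across the feature columns, then divide
tiled = np.tile(denom, (X.shape[1], 1)).T   # shape (4, 3)
a = X / tiled

# Equivalent, without materializing the tiled array: broadcast over a new axis
b = X / denom[:, np.newaxis]
```

Both produce the same array; broadcasting just avoids building the intermediate copy.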
Hey @rizzomichaelg, thanks so much for the comments. I've put the changes in above. @MiloVentimiglia, you'll see that the cosh term just comes from the Hessian of the binomial likelihood for logistic regression. (A little tricky, but all generalized linear models have a Fisher information matrix of the form X^T·D·X, where X is the data matrix and D is an intermediate weight matrix, normally diagonal; in this case its entries come from our cosh function.)
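To see where the cosh comes from: for logistic regression the diagonal weights of D are p(1 - p), the Bernoulli variance at each sample, and 1 / (2(1 + cosh(f))) is algebraically the same quantity. A quick numerical check (standalone sketch):

```python
import numpy as np

f = np.linspace(-5.0, 5.0, 101)      # arbitrary decision-function values
p = 1.0 / (1.0 + np.exp(-f))         # sigmoid: predicted probability

d_cosh = 1.0 / (2.0 * (1.0 + np.cosh(f)))   # entries of D as written in the code
d_var = p * (1.0 - p)                       # Bernoulli variance p(1 - p)
```

The two arrays agree, since 2(1 + cosh(f)) = e^f + 2 + e^(-f) = e^(-f)(1 + e^f)^2.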
I have tried to use your code, but I get errors; my full code and the error are shown below.

    # Now fitting it with p-values
    from sklearn.linear_model import LogisticRegression

    class LogisticRegression_with_p_values:  # this is a new class of reg

    reg = LogisticRegression_with_p_values()

    LinAlgError                               Traceback (most recent call last)
        in fit(self, X, y)
    ~\Anaconda3\lib\site-packages\numpy\linalg\linalg.py in inv(a)
    ~\Anaconda3\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_singular(err, flag)
    LinAlgError: Singular matrix
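The `Singular matrix` error above usually means F_ij is not invertible, typically because X contains duplicated or collinear columns (e.g. from one-hot encoding every category). A hedged workaround, sketched on a toy matrix, is `np.linalg.pinv` (the Moore-Penrose pseudo-inverse), though dropping the redundant columns is the sounder fix:

```python
import numpy as np

# Toy data matrix whose third column duplicates the second, so X^T X is singular
X = np.array([[1.0, 2.0, 2.0],
              [2.0, 1.0, 1.0],
              [3.0, 3.0, 3.0]])
F_ij = X.T @ X   # stand-in for the Fisher information matrix

# np.linalg.inv raises LinAlgError on this matrix
try:
    np.linalg.inv(F_ij)
    singular = False
except np.linalg.LinAlgError:
    singular = True

# The pseudo-inverse always exists and satisfies F @ pinv(F) @ F == F
Cramer_Rao = np.linalg.pinv(F_ij)
```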
When fit_intercept = True, shouldn't a column of ones be added to X?
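For the intercept to get a p-value too, the design matrix would need a leading column of ones, matched against a parameter vector that stacks `intercept_` in front of `coef_`. A minimal sketch of the augmentation (standalone, with random toy data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))   # toy data: 10 samples, 3 features

# Prepend a column of ones so X_aug lines up with [intercept_, coef_[0]]
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])

# After fitting, the matching parameter vector would be
# np.concatenate([model.intercept_, model.coef_[0]])
```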
It's quite possible I am doing something wrong, but I can't make predictions with the wrapper because it doesn't expose a predict() method. I tried to get around this by using inheritance, but failed miserably. Does anybody have an idea what I am doing wrong?
If I use "self.model.fit(X,y)", I get the error
"This LogisticRegressionExtended instance is not fitted yet".
If I try to invoke the fit method in the base class by using "super().fit(X, y)", I get
`C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\envs\py36GPU\lib\site-packages\sklearn\linear_model\logistic.py in fit(self, X, y, sample_weight)
AttributeError: 'LogisticRegressionExtended' object has no attribute 'C'`
When trying "super().model.fit(X, y)", I get
"'super' object has no attribute 'model'".
Thanks for any help :)
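A sketch of the inheritance approach described above (the class name `LogisticRegressionExtended` is taken from the error messages; the `AttributeError: ... no attribute 'C'` typically means the subclass overrode `__init__` without calling `super().__init__()`, so sklearn's hyperparameters like `C` were never set). If `__init__` is not overridden at all, `super().fit(X, y)` works and `predict()` comes for free:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

class LogisticRegressionExtended(LogisticRegression):
    """Subclass instead of wrap, so predict(), score(), etc. are inherited."""

    # No __init__ override: LogisticRegression's hyperparameters (C, penalty,
    # ...) are set by the inherited constructor.

    def fit(self, X, y, **fit_kwargs):
        super().fit(X, y, **fit_kwargs)   # sets coef_, intercept_, ...
        # p-value computation from the wrapper could go here,
        # using self.decision_function(X) and self.coef_
        return self

# Toy usage on synthetic data
X, y = make_classification(n_samples=100, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
clf = LogisticRegressionExtended().fit(X, y)
preds = clf.predict(X)   # works: inherited from LogisticRegression
```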