from sklearn import linear_model
import numpy as np
import scipy.stats as stat

class LogisticReg:
    """
    Wrapper class for Logistic Regression which keeps the usual sklearn instance
    in an attribute self.model, and exposes p-values, z-scores and estimated
    standard errors for each coefficient in

        self.p_values
        self.z_scores
        self.sigma_estimates

    as well as the negative Hessian of the log likelihood (Fisher information) in

        self.F_ij
    """

    def __init__(self, *args, **kwargs):
        self.model = linear_model.LogisticRegression(*args, **kwargs)

    def fit(self, X, y):
        self.model.fit(X, y)
        #### Get p-values for the fitted model ####
        denom = 2.0 * (1.0 + np.cosh(self.model.decision_function(X)))
        denom = np.tile(denom, (X.shape[1], 1)).T  # broadcast to X's shape before dividing
        F_ij = np.dot((X / denom).T, X)  ## Fisher information matrix
        Cramer_Rao = np.linalg.inv(F_ij)  ## inverse information matrix
        sigma_estimates = np.sqrt(np.diagonal(Cramer_Rao))
        z_scores = self.model.coef_[0] / sigma_estimates  # z-score for each model coefficient
        p_values = [stat.norm.sf(abs(x)) * 2 for x in z_scores]  ### two-tailed test for p-values

        self.z_scores = z_scores
        self.p_values = p_values
        self.sigma_estimates = sigma_estimates
        self.F_ij = F_ij
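For anyone wanting to try the p-value computation end to end, here is a minimal sketch on synthetic data (the data and regularization settings are made up for illustration). It fits a plain sklearn LogisticRegression and then runs the same Fisher-information steps as the gist's fit():

```python
import numpy as np
import scipy.stats as stat
from sklearn import linear_model

# Synthetic data: feature 0 drives the label, feature 1 is pure noise.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + 0.5 * rng.randn(200) > 0).astype(int)

model = linear_model.LogisticRegression()
model.fit(X, y)

# Same p-value computation as the wrapper's fit():
denom = np.tile(2.0 * (1.0 + np.cosh(model.decision_function(X))), (X.shape[1], 1)).T
F_ij = np.dot((X / denom).T, X)              # Fisher information matrix
sigma = np.sqrt(np.diagonal(np.linalg.inv(F_ij)))
z = model.coef_[0] / sigma
p = 2 * stat.norm.sf(np.abs(z))              # two-tailed p-values

print(p)  # p[0] should be tiny (strong effect); p[1] should be unremarkable
```

Note that sklearn's default L2 penalty biases the coefficients, so these Wald-style p-values are only approximate unless you weaken the regularization.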
Thanks for posting this! I'm wondering if, for the Fisher information matrix calculation, denom should be tiled first; I'm running into shape-mismatch errors when dividing arrays of different shapes. I was thinking something like this:
I believe sigma_estimates can be condensed to:
Thanks again for this!
Hey @rizzomichaelg, thanks so much for the comments. I've put the changes in above. @MiloVentimiglia, you'll see that the cosh term just comes from the Hessian of the binomial likelihood for logistic regression. (A little tricky, but all generalized linear models have a Fisher information matrix of the form X^T.D.X, where X is the data matrix and D is some intermediary -- normally diagonal, and in this case it's our cosh function.)
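To make the cosh connection concrete (this check is mine, not from the gist): the diagonal of D for logistic regression is the Bernoulli variance p(1-p), and writing p as the sigmoid of the log-odds z shows it equals 1/(2(1+cosh(z))):

```python
import numpy as np

z = np.linspace(-5.0, 5.0, 101)     # log-odds (decision_function values)
p = 1.0 / (1.0 + np.exp(-z))        # sigmoid

# Hessian weight of the binomial log likelihood: p*(1-p) == 1/(2*(1+cosh(z)))
assert np.allclose(p * (1 - p), 1.0 / (2.0 * (1.0 + np.cosh(z))))
```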
I have tried to use your code, but I get errors. The whole code and the error are shown below.

Now fitting it with p-values:

from sklearn.linear_model import LogisticRegression
class LogisticRegression_with_p_values:  # this is a new class of reg
reg = LogisticRegression_with_p_values()

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
   in fit(self, X, y)
~\Anaconda3\lib\site-packages\numpy\linalg\linalg.py in inv(a)
~\Anaconda3\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_singular(err, flag)
LinAlgError: Singular matrix
When fit_intercept = True, shouldn't a column of ones be added to X?
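i.e. something like this (my own sketch; if you do this, the first diagonal entry of the inverse Fisher matrix would then correspond to the intercept):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 3)

# Prepend a column of ones so the intercept gets a standard error too.
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])

print(X_aug.shape)  # (100, 4)
```

You'd also need to pair the augmented matrix with np.concatenate([model.intercept_, model.coef_[0]]) when forming the z-scores, since decision_function already includes the intercept.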
It's quite possible I am doing something wrong, but I can't make predictions with the wrapper class because it has no predict() method. I tried to get around this problem by inheriting from LogisticRegression, but failed miserably. Does anybody have an idea what I am doing wrong?
If I use "self.model.fit(X,y)", I get the error
"This LogisticRegressionExtended instance is not fitted yet".
If I try to invoke the fit method in the base class by using "super().fit(X, y)", I get
`C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\envs\py36GPU\lib\site-packages\sklearn\linear_model\logistic.py in fit(self, X, y, sample_weight)
AttributeError: 'LogisticRegressionExtended' object has no attribute 'C'`
When trying "super().model.fit(X, y)", I get
"'super' object has no attribute 'model'".
Thanks for any help :)
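For what it's worth, the root issue is that the wrapper only defines fit(); predict() lives on the inner sklearn object, and inheriting breaks because LogisticRegression's fit expects hyperparameter attributes (like C) on self. One workaround (a sketch, not from the gist: the class name is mine, and the p-value computation is elided) is to delegate unknown attributes to self.model:

```python
import numpy as np
from sklearn import linear_model

class LogisticRegWithDelegation:
    def __init__(self, *args, **kwargs):
        self.model = linear_model.LogisticRegression(*args, **kwargs)

    def fit(self, X, y):
        self.model.fit(X, y)
        # ... p-value computation as in the gist ...
        return self

    def __getattr__(self, name):
        # Called only when normal lookup fails, so fit() above still wins;
        # everything else (predict, predict_proba, score, ...) falls
        # through to the wrapped sklearn estimator.
        return getattr(self.model, name)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegWithDelegation().fit(X, y)
print(clf.predict(X))          # delegated to self.model.predict
print(clf.predict_proba(X).shape)
```

Calling self.model.fit(X, y) directly is also fine; the "not fitted yet" error usually means predict was later called on the outer object without delegation in place.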
Thank you so much for your effort. I am running exactly the same code, but I get an UnboundLocalError: local variable 'F_ij' referenced before assignment. Could you please suggest a solution? Thanks for this.…
On Mon 27 Apr 2020, 9:58 AM, Rob Speare commented on this gist:

Hey @Akanksha594, @wkangong and @Mikeal001: looks like the X matrix passed in has some correlated features. One way to fix this is with regularization, adding a tiny amount to the diagonal of the matrix, e.g.

eps = 1e-4
F_ij = np.dot((X / denom).T, X) + np.eye(X.shape[1]) * eps  ## Fisher information matrix
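To see why this helps (a toy demonstration of my own; the eps value is illustrative): with perfectly correlated columns the Fisher matrix is singular and np.linalg.inv raises, while the tiny ridge on the diagonal makes it invertible:

```python
import numpy as np

# Two perfectly correlated columns -> singular X^T X.
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
F_ij = X.T @ X

try:
    np.linalg.inv(F_ij)
    singular = False
except np.linalg.LinAlgError:
    singular = True  # the unregularized matrix cannot be inverted

eps = 1e-4
F_reg = F_ij + np.eye(F_ij.shape[0]) * eps  # tiny ridge on the diagonal
F_inv = np.linalg.inv(F_reg)                # now well-defined

print(singular)  # True
```

The resulting standard errors for the correlated features will still be large, which is the honest answer: the data can't separate their effects.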
Hi, Rspeare! Thank you so much for your code. I used it in research in comparative genomics and am now in the process of writing the paper for publication. I want to know if you're OK with me citing this code directly with you as the author, or if you'd prefer me to cite the original method on logistic regression and the binomial likelihood.