Instantly share code, notes, and snippets.

# johnlees/firth_regression.py

Last active Jun 28, 2022
Firth regression in python
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
 #!/usr/bin/env python '''Python implementation of Firth regression by John Lees See https://www.ncbi.nlm.nih.gov/pubmed/12758140''' def firth_likelihood(beta, logit): return -(logit.loglike(beta) + 0.5*np.log(np.linalg.det(-logit.hessian(beta)))) # Do firth regression # Note information = -hessian, for some reason available but not implemented in statsmodels def fit_firth(y, X, start_vec=None, step_limit=1000, convergence_limit=0.0001): logit_model = smf.Logit(y, X) if start_vec is None: start_vec = np.zeros(X.shape) beta_iterations = [] beta_iterations.append(start_vec) for i in range(0, step_limit): pi = logit_model.predict(beta_iterations[i]) W = np.diagflat(np.multiply(pi, 1-pi)) var_covar_mat = np.linalg.pinv(-logit_model.hessian(beta_iterations[i])) # build hat matrix rootW = np.sqrt(W) H = np.dot(np.transpose(X), np.transpose(rootW)) H = np.matmul(var_covar_mat, H) H = np.matmul(np.dot(rootW, X), H) # penalised score U = np.matmul(np.transpose(X), y - pi + np.multiply(np.diagonal(H), 0.5 - pi)) new_beta = beta_iterations[i] + np.matmul(var_covar_mat, U) # step halving j = 0 while firth_likelihood(new_beta, logit_model) > firth_likelihood(beta_iterations[i], logit_model): new_beta = beta_iterations[i] + 0.5*(new_beta - beta_iterations[i]) j = j + 1 if (j > step_limit): sys.stderr.write('Firth regression failed\n') return None beta_iterations.append(new_beta) if i > 0 and (np.linalg.norm(beta_iterations[i] - beta_iterations[i-1]) < convergence_limit): break return_fit = None if np.linalg.norm(beta_iterations[i] - beta_iterations[i-1]) >= convergence_limit: sys.stderr.write('Firth regression failed\n') else: # Calculate stats fitll = -firth_likelihood(beta_iterations[-1], logit_model) intercept = beta_iterations[-1] beta = beta_iterations[-1][1:].tolist() bse = np.sqrt(np.diagonal(np.linalg.pinv(-logit_model.hessian(beta_iterations[-1])))) return_fit = intercept, beta, bse, fitll return return_fit if __name__ == "__main__": import sys import warnings import math import statsmodels import numpy as np from scipy import stats import statsmodels.api as smf # create X and y here. Make sure X has an intercept term (column of ones) # ... # How to call and calculate p-values (intercept, beta, bse, fitll) = fit_firth(y, X) beta = [intercept] + beta # Wald test waldp = [] for beta_val, bse_val in zip(beta, bse): waldp.append(2 * (1 - stats.norm.cdf(abs(beta_val/bse_val)))) # LRT lrtp = [] for beta_idx, (beta_val, bse_val) in enumerate(zip(beta, bse)): null_X = np.delete(X, beta_idx, axis=1) (null_intercept, null_beta, null_bse, null_fitll) = fit_firth(y, null_X) lrstat = -2*(null_fitll - fitll) lrt_pvalue = 1 if lrstat > 0: # non-convergence lrt_pvalue = stats.chi2.sf(lrstat, 1) lrtp.append(lrt_pvalue)

### piotor87 commented May 24, 2018

Hi,
I'm using your script but i get an error on this line:
U = np.matmul(np.transpose(X), p.values - pi + np.multiply(np.diagonal(H), 0.5 - pi))
since no "p" is defined anywhere before.

### swisnieski commented Jan 12, 2019

Hi,
I'm using your script but i get an error on this line:
U = np.matmul(np.transpose(X), p.values - pi + np.multiply(np.diagonal(H), 0.5 - pi))
since no "p" is defined anywhere before.

Agreed, I have the same issue, and it's not clear what 'p' represents or where/how it ought to be defined.

### johnlees commented Feb 8, 2019 • edited

This was a typo and should simply have been `y` - I have fixed this now

### josef-pkt commented Apr 10, 2019

If you only need the diagonal of the hat matrix, then it can be computed without creating the full hat matrix. That saves a lot of memory.

### ldiangelis commented Nov 13, 2019

Hey All,

I am trying to use this code, but am running into a bit of an issue. Can anyone help me figure out how to get around the following error message? ValueError: shapes (54,3) and (54,) not aligned: 3 (dim 1) != 54 (dim 0)

I believe this is related to the following (where the code asks you to input variables):
# create X and y here. Make sure X has an intercept term (column of ones)
# ...
X = data[['RwG', 'vpdmax (hPa)']]
y = data['Occurred']
start_vec = data['SV']
sv = np.squeeze(np.asarray(start_vec))

I get the error message when running this line:
(intercept, beta, bse, fitll) = fit_firth(y, X, start_vec)

Can anyone advise how to get around this error message so I can run the code?

Thanks!

### johnlees commented Nov 18, 2019

Hey All,

I am trying to use this code, but am running into a bit of an issue. Can anyone help me figure out how to get around the following error message? ValueError: shapes (54,3) and (54,) not aligned: 3 (dim 1) != 54 (dim 0)

I believe this is related to the following (where the code asks you to input variables):

# ...

X = data[['RwG', 'vpdmax (hPa)']]
y = data['Occurred']
start_vec = data['SV']
sv = np.squeeze(np.asarray(start_vec))

I get the error message when running this line:
(intercept, beta, bse, fitll) = fit_firth(y, X, start_vec)

Can anyone advise how to get around this error message so I can run the code?

Thanks!

@ldiangelis which line is giving the error? What are the dimensions of X and y (`X.shape` and `y.shape`)?

### jds485 commented Dec 29, 2020

Note that now the Logit function is in statsmodels.api, not statsmodels.formula.api

Also, standard errors are not computed correctly. The equation should be the square root of the diagonal of the variance-covariance matrix. With this edit, standard errors match what the R logistf package reports:
bse = np.sqrt(np.diagonal(np.linalg.pinv(-logit_model.hessian(beta_iterations[-1]))))

Also, here is an edit to get p-values for the parameters using the Wald method:
from scipy import stats
from scipy.stats import chi2
rint = np.array([intercept])
rbeta = np.array(beta)
params = np.concatenate([rint,rbeta])
#Get Wald p vals for parameters
pvalues = 1. - chi2.cdf(x=(res.params/res.bse)**2, df=1)

### johnlees commented Dec 30, 2020

@jds485 I have added your corrections in - thanks for noting these

However, I think the Wald p-values were already included in the main function? I have added loops to calculate these (and the likelihood ratio test p-values) for all predictors

### jds485 commented Dec 30, 2020 • edited

Wald p-values should be computed from the chi-squared distribution, with (beta_val/bse_val)**2 as the test statistic. The p-value computed using the normal distribution is not accurate, at least from what I tested. You can also include the intercept in the Wald test.

1 - chi2.cdf(x=(beta_val/bse_val)**2, df=1)

### johnlees commented Dec 31, 2020 • edited

When df = 1 the chi-squared distribution is sqrt(N(0,1)):

``````1 - pchisq(W, df = 1) = 2 * (1 - pnorm(W ^ 0.5))
``````

which would make your calculation and the one in the gist equivalent

I have added the intercept in properly though, thanks for pointing that out!

### jds485 commented Dec 31, 2020

Thanks, that's right. I now realize that the p-value was different between these two calculations because the previous implementation used the intercept se to compute the test statistic for the first beta parameter, which you now fixed.

### extrospective commented Apr 30, 2021

1. With my version of python (3.7.4) I needed to default start_vec to None in the function declaration.
2. Since my data set has 448,92 observations, it ran out of memory allocating array (448926,448926). I see a note above about having the code more memory efficient and hope that there might be a ready-to-hand solution for that.

### johnlees commented May 5, 2021

@extrospective I haven't worked through the linear algebra to see if this is possible, but I would note that as well as H, I think var_covar_mat would be (448926,448926) too, so I'm not sure whether it's possible to eliminate that. You could try memory mapping the array, though whether the whole thing gets read in `matmul` negating any savings I'm not sure.
I have added a default arg for state_vec

@josef-pkt Are you able to suggest any code which would implement this?

### LukeLB commented May 21, 2021

Hi John,
Big thanks for making this available! I was using sci-kit learn for a project I've just finished and it was useful for me to rewrite your code so that it can interface with the sci-kit learn API. If you or anyone else finds this useful you may find it on my GitHub (https://github.com/LukeLB/LogisticRegression_Firth).

### IRambags commented Oct 26, 2021

Since I had some issues with (quasi) perfect separations in a small sample while performing simple logistic regression, I wanted to try out this implementation and compare outcomes. However, I do not know how to resolve the "ValueError: Shape of passed values is (70, 1), indices imply (70, 2)" at code line "null_X = np.delete(X, beta_idx, axis=1)" while in fact X.shape = (70, 2) due to adding a constant, so I don't know what happens that causes the passed values to have the shape of (70, 1). Did anyone have issues with this or know how to resolve it?

Minor comment on line "waldp.append(2 * (1 - stats.norm.cdf(abs(beta_val/bse_val)))" as it needed an extra closing round bracket in my case

### johnlees commented Oct 26, 2021

Since I had some issues with (quasi) perfect separations in a small sample while performing simple logistic regression, I wanted to try out this implementation and compare outcomes. However, I do not know how to resolve the "ValueError: Shape of passed values is (70, 1), indices imply (70, 2)" at code line "null_X = np.delete(X, beta_idx, axis=1)" while in fact X.shape = (70, 2) due to adding a constant, so I don't know what happens that causes the passed values to have the shape of (70, 1). Did anyone have issues with this or know how to resolve it?

I think this was an error I introduced when adding the intercept to the beta vector, hopefully fixed now

Minor comment on line "waldp.append(2 * (1 - stats.norm.cdf(abs(beta_val/bse_val)))" as it needed an extra closing round bracket in my case

Thanks, I've fixed this

### johnlees commented Oct 26, 2021

@IRambags actually sorry, I'm not sure what's causing the error with fitting the null. What length are `beta` and `bse` in the loop?

### IRambags commented Oct 26, 2021

both len(beta) and len(bse) equal 2

### johnlees commented Oct 26, 2021

Does running `np.delete(X, 0, axis=1)` or `np.delete(X, 1, axis=1)` work?

### IRambags commented Oct 26, 2021

Unfortunately not, both give the same ValueError as before. Does it matter that I work in jupyter i.e. iPython?

### johnlees commented Oct 26, 2021

Does it matter that I work in jupyter i.e. iPython?

I wouldn't have thought so.

Double check the shape of X being passed. You can also remove the necessary column using other indexing techniques if this isn't working for you https://numpy.org/doc/stable/reference/arrays.indexing.html

### IRambags commented Oct 26, 2021

I indeed now just went for null_X = X.drop('const', 1) and this seems to work. Very strange functioning of the np.delete() though. Thanks for your quick replies and support!

### jzluo commented Jun 27, 2022

Hi all, I'm working on an sklearn-compatible implementation based on logistf here: https://github.com/jzluo/firthlogist
I would greatly appreciate any feedback on it!
There are a few implementation differences compared to this gist. Most notably, the full hat matrix is not calculated which gives significant memory savings, and penalized likelihood ratio tests are used as in logistf for p-values (some more detail here).

### johnlees commented Jun 28, 2022

Hi all, I'm working on an sklearn-compatible implementation based on logistf here: https://github.com/jzluo/firthlogist I would greatly appreciate any feedback on it! There are a few implementation differences compared to this gist. Most notably, the full hat matrix is not calculated which gives significant memory savings, and penalized likelihood ratio tests are used as in logistf for p-values (some more detail here).

@jzluo Amazing! Looks like a nice improvement over this code 🙂