@joaopcnogueira
Last active October 9, 2023 12:26
Feature selection by Backward Elimination using both the p-value and the adjusted r-squared
import numpy as np
import statsmodels.api as sm  # the OLS class is exposed by statsmodels.api


def backward_elimination2(X, y, sl):
    """
    X: the data matrix with the independent variables (predictors)
    y: the matrix of the dependent variable (target)
    sl: significance level; 0.05 (5%) is the usual choice
    """
    # Prepend a column of ones so the model includes an intercept.
    X = np.append(arr=np.ones((len(X), 1)).astype(int), values=X, axis=1)
    has_intercept = True  # column 0 is the intercept until it gets eliminated
    while True:
        regressor_OLS = sm.OLS(y, X).fit()
        # Candidate for removal: the term with the largest p-value.
        ind = int(np.argmax(regressor_OLS.pvalues))
        max_pvalue = regressor_OLS.pvalues[ind]
        if max_pvalue <= sl:
            break  # every remaining term is significant at level sl
        # Refit without the candidate and compare adjusted R-squared.
        X_temp = np.delete(X, ind, axis=1)
        next_regressor_OLS = sm.OLS(y, X_temp).fit()
        if regressor_OLS.rsquared_adj > next_regressor_OLS.rsquared_adj:
            break  # removal would lower adjusted R-squared, so keep the term
        if has_intercept and ind == 0:
            has_intercept = False  # the intercept itself was eliminated
        X = X_temp
    print(regressor_OLS.summary())
    # Drop the intercept column only if it is still present, so that a real
    # predictor is never discarded by mistake.
    return X[:, 1:] if has_intercept else X
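
The function stops as soon as removing a term would lower the adjusted R-squared. The sketch below shows why that check can disagree with plain R-squared, using the standard formula for a model with an intercept. The helper name `adjusted_r_squared` and the numbers in the example are hypothetical and chosen only for illustration; they are not part of the gist and do not come from a real dataset.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a model with an intercept.

    n_predictors counts the predictors only, not the intercept.
    """
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical numbers: dropping one weak predictor lowers R-squared slightly,
# from 0.900 to 0.899, on 50 observations.
full = adjusted_r_squared(0.900, n_obs=50, n_predictors=5)
reduced = adjusted_r_squared(0.899, n_obs=50, n_predictors=4)

print(round(full, 4))     # 0.8886
print(round(reduced, 4))  # 0.89
print(reduced > full)     # True
```

With these numbers the smaller model has the higher adjusted R-squared even though its plain R-squared is lower, which is the case where backward_elimination2 goes ahead and removes the predictor.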