@ant358
Created February 24, 2019 10:06
import pandas as pd
from scipy.stats import linregress


def sig_num_columns(X_train, y_train, p_thres=0.05):
    """Find the numerical columns of X_train that are significantly
    correlated with the target y_train.

    Runs a simple linear regression of the target on each column and keeps
    the columns whose p-value is at or below p_thres (default 0.05, i.e. a
    95% confidence level; pass a new p_thres to change it). Returns a pandas
    Series indexed by column name, holding the p-values in ascending order.
    Only the significant columns are returned. Pass numerical columns only!
    Other column types will raise an error.
    """
    sig_num = {}
    for col in X_train:
        # linregress returns slope, intercept, rvalue, pvalue and stderr;
        # only the p-value is needed here.
        pvalue = linregress(X_train[col], y_train).pvalue
        if pvalue <= p_thres:
            sig_num[col] = pvalue
    # Building a Series directly also covers the case where no column passes
    # the threshold, which would fail when indexing an empty DataFrame.
    return pd.Series(sig_num, dtype=float).sort_values(ascending=True)
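
A minimal sketch of the same per-column test on synthetic data, to show what kind of input the function expects and what comes back. The column names `signal`, `noise` and `label`, the sample size and the random seed are all made up for illustration. Filtering with `select_dtypes` is one possible way to meet the "numerical columns only" requirement, not something the original gist does.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

rng = np.random.default_rng(0)  # fixed seed so the toy data is reproducible
n = 200
y_train = pd.Series(rng.normal(size=n), name="target")

# Hypothetical training frame: one informative column, one pure-noise
# column, and one non-numerical column that has to be filtered out first.
X_train = pd.DataFrame({
    "signal": 2.0 * y_train + rng.normal(scale=0.5, size=n),
    "noise": rng.normal(size=n),
    "label": ["a", "b"] * (n // 2),
})

# Keep numerical columns only, as the docstring asks.
X_num = X_train.select_dtypes(include="number")

# Same per-column test as the function: regress the target on each column
# and keep the p-values that fall at or below the threshold.
p_thres = 0.05
pvalues = pd.Series(
    {col: linregress(X_num[col], y_train).pvalue for col in X_num},
    dtype=float,
)
significant = pvalues[pvalues <= p_thres].sort_values()
print(significant)
```

Calling `sig_num_columns(X_num, y_train)` on the same data is expected to report the same columns. Note that with many columns, some pure-noise features will pass a 0.05 threshold by chance, so the result is a screening aid rather than proof of a relationship.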