@seahrh
Created October 16, 2019 05:58
Remove features that are highly correlated with another feature (absolute correlation of at least 0.98), keeping whichever feature in each pair correlates better with the target.
import pandas as pd

# Assumes X is a pandas DataFrame of numeric features, y is a pandas Series (the target),
# and corr_matrix = X.corr() has already been computed.
to_drop = list()
# Iterate over rows starting from the second one, because position [0, 0] is the self-correlation, which is 1
for i in range(1, len(corr_matrix)):
    # Iterate over the columns of this row, staying strictly below the diagonal
    for j in range(i):
        # Check whether the absolute correlation between the two features reaches the threshold
        # (abs() so that strongly negative correlations are caught too)
        if abs(corr_matrix.iloc[i, j]) >= 0.98:
            # Keep whichever of the two features correlates better with the target; drop the other
            if abs(pd.concat([X[corr_matrix.index[i]], y], axis=1).corr().iloc[0, 1]) > abs(pd.concat([X[corr_matrix.columns[j]], y], axis=1).corr().iloc[0, 1]):
                to_drop.append(corr_matrix.columns[j])
            else:
                to_drop.append(corr_matrix.index[i])
# A feature can be flagged by several pairs, so deduplicate
to_drop = list(set(to_drop))
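For a version that can be run as-is, the same loop can be wrapped in a function. This is a minimal sketch under some assumptions: the function name `drop_correlated_features`, its `threshold` parameter and the toy data are hypothetical, and the correlation of each feature with the target is computed once with `DataFrame.corrwith` instead of `pd.concat(...).corr()` per pair (both compute Pearson correlation).

```python
import numpy as np
import pandas as pd


def drop_correlated_features(X: pd.DataFrame, y: pd.Series, threshold: float = 0.98) -> list:
    """Return the names of features to drop (hypothetical helper).

    For every pair of features whose absolute Pearson correlation is at least
    `threshold`, the feature with the weaker absolute correlation to `y` is dropped.
    """
    corr_matrix = X.corr().abs()
    # Absolute correlation of every feature with the target, computed once
    target_corr = X.corrwith(y).abs()
    cols = corr_matrix.columns
    to_drop = set()
    for i in range(1, len(cols)):
        for j in range(i):  # strictly below the diagonal
            if corr_matrix.iloc[i, j] >= threshold:
                a, b = cols[i], cols[j]
                # Keep whichever feature correlates better with the target;
                # on an exact tie the later column (a) is dropped
                to_drop.add(b if target_corr[a] > target_corr[b] else a)
    return sorted(to_drop)


# Small deterministic toy example: x_noisy and x_flipped are near-copies of x_clean
t = np.linspace(0.0, 1.0, 50)
alt = np.where(np.arange(50) % 2 == 0, 1.0, -1.0)  # +1, -1, +1, ...
X = pd.DataFrame({
    "x_clean": 2 * t + 1,
    "x_noisy": t + 0.01 * alt,
    "x_flipped": -t + 0.02 * alt,  # strongly negatively correlated with the others
    "x_unrelated": alt,
})
y = pd.Series(t, name="target")
print(drop_correlated_features(X, y))  # ['x_flipped', 'x_noisy']
```

As in the original loop, every qualifying pair is evaluated on its own, even when one of its features has already been flagged, and the weaker member of each pair is dropped. Taking the absolute value of the correlation matrix is what lets `x_flipped` be caught despite its negative correlation.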