@AyishaR
Created January 16, 2021 16:56
# Many column pairs are highly correlated and therefore carry near-duplicate information.
# For each pair with |correlation| > 0.98, keep the first column and drop the second.
# `corr` is assumed to be the correlation matrix computed earlier, e.g. corr = df.corr().
final_columns = list(df.columns)  # working list of columns to keep
for i in range(corr.shape[0]):
    for j in range(i + 1, corr.shape[0]):
        # print(abs(corr.iloc[i, j]), corr.columns[i], corr.columns[j])
        if abs(corr.iloc[i, j]) > 0.98:  # very high correlation
            # Look names up in corr.columns so positions match the matrix
            if corr.columns[j] in final_columns:  # not already removed
                final_columns.remove(corr.columns[j])
df = df[final_columns]  # keep only the selected columns
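# The same pruning rule can also be written without the nested loops. The sketch
# below is one possible vectorized variant, not the gist author's method: the
# helper name drop_highly_correlated, its threshold parameter, and the demo
# DataFrame are all hypothetical, and it assumes only numeric columns should be
# compared. It masks the correlation matrix to its upper triangle so that each
# pair is inspected once and the later column of the pair is the one dropped.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df, threshold=0.98):
    """Return df without the later column of each highly correlated pair.

    Hypothetical helper for illustration; not part of the original snippet.
    """
    # Absolute correlations between numeric columns only
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep entries strictly above the diagonal; everything else becomes NaN
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # A column is dropped if any earlier column correlates with it above threshold
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Made-up demo data: "b" and "d" are linear functions of "a", "c" is not
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
    "d": [-1.0, -2.0, -3.0, -4.0, -5.0],
})
reduced = drop_highly_correlated(demo)
print(list(reduced.columns))
```

# Because the helper returns a new DataFrame, the original frame is left
# unchanged; assign the result back (df = drop_highly_correlated(df)) to mirror
# the in-place reassignment used above.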