Detect features that have duplicate index
# load packages
import pandas as pd
from fast_ml.utilities import display_all
from fast_ml.feature_selection import get_duplicate_features
# load dataset
df = pd.read_csv('/kaggle/input/dataset-1/dataset_1.csv')
# detect duplicate features; the result is a DataFrame of feature pairs
duplicate_features = get_duplicate_features(df)
# preview the first 10 pairs (renders in a notebook cell)
duplicate_features.head(10)
# collect every feature flagged as a 'Duplicate Index' of another feature
duplicate_index_features_list = duplicate_features.query("Desc=='Duplicate Index'")['feature2'].to_list()
print(duplicate_index_features_list)
# drop these duplicate features from dataset
print('Shape of Dataset before dropping the duplicate index features: ', df.shape)
df.drop(columns = duplicate_index_features_list, inplace=True)
print('Shape of Dataset after dropping the duplicate index features: ', df.shape)
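
For intuition, here is a minimal pandas-only sketch of one way a "duplicate index" check could work. It assumes that two features count as a duplicate index when their values differ but their label encodings (the codes from `pd.factorize`) are identical. That definition is an assumption made for illustration, not a confirmed description of what `get_duplicate_features` does internally. The helper name `find_duplicate_index_pairs` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def find_duplicate_index_pairs(frame):
    """Return (feature1, feature2) pairs whose pd.factorize codes are identical.

    Hypothetical helper: illustrates one possible meaning of 'duplicate index',
    not the fast_ml implementation.
    """
    # Encode every column once as integer codes in order of first appearance
    codes = {col: pd.factorize(frame[col])[0] for col in frame.columns}
    cols = list(frame.columns)
    pairs = []
    # Compare each unordered pair of columns exactly once
    for i, first in enumerate(cols):
        for second in cols[i + 1:]:
            if np.array_equal(codes[first], codes[second]):
                pairs.append((first, second))
    return pairs


# Toy data (made up): 'city_code' is a one-to-one relabelling of 'city'
toy = pd.DataFrame({
    "city": ["NY", "LA", "NY", "SF"],
    "city_code": [10, 20, 10, 30],
    "sales": [5, 5, 7, 9],
})

pairs = find_duplicate_index_pairs(toy)
print(pairs)
```

In this toy frame, `city` and `city_code` carry the same information under different labels, so keeping both adds nothing for most models. The pairwise comparison is quadratic in the number of columns, which is fine for a sketch but may be slow on very wide datasets.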