Detect features with duplicate values
```python
# load packages
import pandas as pd
from fast_ml.feature_selection import get_duplicate_features

# load dataset
df = pd.read_csv('/kaggle/input/dataset-1/dataset_1.csv')

# detect duplicate features; the result is a DataFrame describing each duplicate found
duplicate_features = get_duplicate_features(df)
print(duplicate_features.head(10))

# names of the redundant columns (feature2) whose values repeat another column
duplicate_features_list = duplicate_features.query("Desc == 'Duplicate Values'")['feature2'].to_list()
print(duplicate_features_list)

# drop these duplicate features from the dataset
print('Shape of dataset before dropping duplicate-value features:', df.shape)
df.drop(columns=duplicate_features_list, inplace=True)
print('Shape of dataset after dropping duplicate-value features:', df.shape)
```
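The same check can be sketched in plain pandas when `fast_ml` is not available. This is a minimal sketch, not `fast_ml`'s implementation: the helper `find_duplicate_columns` is hypothetical, it assumes unique column labels, and it compares columns pairwise with `Series.equals`, which treats NaNs in matching positions as equal but also requires matching dtypes. Each redundant column is mapped to the first column it repeats, so dropping the keys keeps one copy of every group.

```python
import pandas as pd


def find_duplicate_columns(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each redundant column to the first column it repeats.

    Assumes unique column labels. Uses Series.equals, so NaNs in the same
    positions count as equal, but columns with different dtypes never match.
    """
    duplicates = {}
    columns = list(df.columns)
    for i, first in enumerate(columns):
        # a column already marked as a copy is never treated as an "original"
        if first in duplicates:
            continue
        for other in columns[i + 1:]:
            if other not in duplicates and df[first].equals(df[other]):
                duplicates[other] = first
    return duplicates


df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [1, 2, 3],        # same values as "a"
    "c": [4, 5, 6],
    "d": [1.0, 2.0, 3.0],  # same numbers as "a", but float64, so not flagged
    "e": [4, 5, 6],        # same values as "c"
})
dups = find_duplicate_columns(df)
print(dups)  # {'b': 'a', 'e': 'c'}

# keep the first column of each group and drop the redundant copies
df_reduced = df.drop(columns=list(dups))
print(df_reduced.columns.tolist())  # ['a', 'c', 'd']
```

The pairwise loop performs O(n²) column comparisons, which is fine for a few hundred features. A shorter alternative is `df.T.duplicated()`, which flags repeated columns as duplicated rows of the transposed frame; transposing a mixed-dtype frame upcasts it to `object`, though, so it may treat `1` and `1.0` as equal where `Series.equals` does not.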