data_cleaning_202001
# Drop duplicates based on a subset of variables.
key = ['timestamp', 'full_sq', 'life_sq', 'floor', 'build_year', 'num_room', 'price_doc']
df_dedupped2 = df.drop_duplicates(subset=key)
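# Compare shapes before and after; the difference in row counts is the number of duplicates dropped.
print(df.shape)
print(df_dedupped2.shape)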
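The snippet above assumes `df` is already loaded. As a minimal, self-contained sketch of the same idea, the example below builds a small made-up DataFrame (the values, the `id` column, and the names `toy`, `toy_key`, `dup_mask`, and `toy_dedupped` are all hypothetical, chosen only for illustration) and uses `DataFrame.duplicated` to count key-level duplicates before dropping them.

```python
import pandas as pd

# Hypothetical toy data; only some column names mirror the key used above.
toy = pd.DataFrame({
    "timestamp": ["2014-12-09", "2014-12-09", "2015-01-05"],
    "full_sq": [40, 40, 55],
    "price_doc": [4_600_000, 4_600_000, 7_100_000],
    "id": [1, 2, 3],  # unique per row, so a full-row comparison finds no duplicates
})
toy_key = ["timestamp", "full_sq", "price_doc"]

# Flag rows that repeat an earlier row on the key columns only.
dup_mask = toy.duplicated(subset=toy_key)
n_dupes = int(dup_mask.sum())

# keep="first" is the default: the first row of each duplicate group survives.
toy_dedupped = toy.drop_duplicates(subset=toy_key)

print(n_dupes)
print(toy_dedupped.shape)
```

Checking `duplicated` first makes it possible to inspect the flagged rows (`toy[dup_mask]`) before deciding that they really are the same record and not just coincidentally identical on the key.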