@liannewriting
Created January 21, 2020 19:24
data_cleaning_202001
# drop duplicates based on a subset of variables.
key = ['timestamp', 'full_sq', 'life_sq', 'floor', 'build_year', 'num_room', 'price_doc']
df_dedupped2 = df.drop_duplicates(subset=key)
# compare the shapes before and after to see how many rows were dropped.
print(df.shape)
print(df_dedupped2.shape)
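
The snippet above assumes `df` is already loaded. As a minimal, self-contained sketch of the same idea, the example below builds a small made-up DataFrame (the `toy` frame, its `id` column, and its values are hypothetical, not taken from the original dataset) and shows that `drop_duplicates(subset=...)` only compares the listed key columns and keeps the first row of each duplicate group by default.

```python
import pandas as pd

# Hypothetical toy data; only the column names echo the gist's key list.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "timestamp": ["2014-12-09", "2014-12-09", "2013-08-30", "2013-08-30"],
    "full_sq": [40, 40, 59, 59],
    "price_doc": [4607265, 4607265, 6552000, 7000000],
})

# "id" is deliberately left out of the key, so it is ignored when comparing rows.
key = ["timestamp", "full_sq", "price_doc"]

# Rows with id 1 and 2 agree on every key column, so only the first is kept.
# Rows with id 3 and 4 differ in price_doc, so both survive.
deduped = toy.drop_duplicates(subset=key)

# duplicated() flags exactly the rows that drop_duplicates would remove.
n_dropped = int(toy.duplicated(subset=key).sum())

print(toy.shape)
print(deduped.shape)
print(n_dropped)
```

Counting with `duplicated(subset=key)` before dropping is a cheap way to check that the chosen key is not removing more rows than expected.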