@HanaanY
Last active August 12, 2018 08:16
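def remove_repeats(df):
    # Flag giraffe rows; False sorts before True, so giraffe rows end up last.
    df = df.assign(is_giraffe=lambda x: x.Species == 'giraffe')
    df = df.sort_values('is_giraffe')
    # Keep one row per image URL; keep='last' prefers the giraffe label when one exists.
    df = df.drop_duplicates(['URL_Info'], keep='last')
    return df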
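clean_df = remove_repeats(test)
# Compare the number of resized images on disk (IPython shell escape) with the deduplicated row count.
!ls -1 '/home/naan/SnapshotSerengeti/data/resized/' | wc -l
print(clean_df.shape[0])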
'''
149138
149138
'''
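
To see why sorting on the flag before dropping duplicates keeps the giraffe label, here is a minimal sketch on made-up data. The toy frame and its file names are hypothetical; only the column names `URL_Info` and `Species` come from the snippet above, and the function is repeated so the block runs on its own.

```python
import pandas as pd

# Hypothetical toy data: two images carry conflicting labels.
toy = pd.DataFrame({
    'URL_Info': ['a.jpg', 'a.jpg', 'b.jpg', 'c.jpg', 'c.jpg'],
    'Species': ['giraffe', 'zebra', 'zebra', 'lion', 'giraffe'],
})

def remove_repeats(df):
    # Same logic as above: giraffe rows sort last, so keep='last' retains them.
    df = df.assign(is_giraffe=lambda x: x.Species == 'giraffe')
    df = df.sort_values('is_giraffe')
    df = df.drop_duplicates(['URL_Info'], keep='last')
    return df

toy_clean = remove_repeats(toy)

# Map each image to the single label that survived deduplication.
labels = toy_clean.set_index('URL_Info')['Species'].to_dict()
print(labels)
```

Each image ends up with exactly one row, and any image that had a giraffe label keeps it. For images without a giraffe label, which duplicate survives depends on the sort order, so this approach only guarantees the giraffe preference.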