@Ailuropoda1864
Created September 6, 2017 17:52
A function that prints out information on duplicate rows in a pandas DataFrame.
import pandas as pd


def find_duplicated(dataframe, show=True, sort=False):
    """
    prints out information on duplicate rows
    :param dataframe: a pandas DataFrame
    :param show: boolean; if True, the duplicated rows (if any) are shown
    :param sort: boolean; if True, the duplicated rows are sorted by each column
        of the dataframe
    """
    n_duplicates = dataframe.duplicated().sum()
    print('Number of duplicated rows: {}'.format(n_duplicates))
    if show and n_duplicates > 0:
        print()
        duplicated_df = dataframe[dataframe.duplicated(keep=False)]
        if sort:
            print(duplicated_df.sort_values(list(duplicated_df.columns)))
        else:
            print(duplicated_df)
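
The function counts duplicates with the default `duplicated()` but displays rows selected with `duplicated(keep=False)`, so the reported count can be smaller than the number of rows shown. The sketch below illustrates that difference on a small made-up DataFrame; the column names and values are hypothetical and not part of the original gist.

```python
import pandas as pd

# Hypothetical sample data: rows 0, 2, and 3 are identical.
df = pd.DataFrame({
    "name": ["a", "b", "a", "a"],
    "value": [1, 2, 1, 1],
})

# Default keep='first': the first occurrence of each group is not flagged,
# so only the later copies are counted.
n_duplicates = int(df.duplicated().sum())

# keep=False flags every member of a duplicated group, including the first,
# which is the selection the function uses for display.
all_copies = df[df.duplicated(keep=False)]

print('Number of duplicated rows: {}'.format(n_duplicates))
print(all_copies)
```

Here the count covers only the repeated copies, while the displayed frame also includes the first occurrence, which makes it easier to see each duplicated group in full.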