[Find and Isolate Duplicate Rows Across Multiple Columns] #duplicate #group
# Search for duplicate rows of records by grouping all columns, counting, and filtering.
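library(dplyr)

Duplicates <- Data %>%
  group_by(Column1, Column2, Column3, `Column 4`, `Column 5`) %>%
  mutate(dupe = n() > 1) %>%  # TRUE for every row whose column combination occurs more than once
  ungroup() %>%
  filter(dupe)  # keep only the flagged rows; the dupe column remains in the result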
# Credit to: https://stackoverflow.com/questions/6986657/find-duplicated-rows-based-on-2-columns-in-data-frame-in-r
# Note: This method works with any number of grouping columns, including just one or two. It isn't necessary to group by all columns if a smaller set is enough to identify duplicates in your data. Some use cases may also look only for duplicates of specific fields.
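# A minimal sketch of grouping by fewer columns, using made-up data. The `orders`
# data frame and its columns (OrderID, Vendor, Amount) are hypothetical and only
# illustrate the idea; OrderID alone is assumed to identify a record here.
library(dplyr)

orders <- tibble(
  OrderID = c(1, 2, 2, 3),
  Vendor  = c("A", "B", "B", "C"),
  Amount  = c(10, 20, 25, 30)
)

# Rows sharing an OrderID are kept even though their Amount values differ.
order_dupes <- orders %>%
  group_by(OrderID) %>%
  filter(n() > 1) %>%
  ungroup()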
# The same grouping approach could also be adapted to group together and isolate related transactions that are not exact duplicates, such as those for a specific vendor and food item.
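# One possible adaptation, sketched with hypothetical data: the `purchases` data
# frame and its Vendor, Item, and Amount columns are assumptions for illustration.
# Grouping by Vendor and Item (but not Amount) isolates transactions that share a
# vendor and food item, and n_related records how many are in each group.
library(dplyr)

purchases <- tibble(
  Vendor = c("Acme", "Acme", "Bravo", "Acme"),
  Item   = c("Rice", "Rice", "Rice", "Beans"),
  Amount = c(12.5, 14, 11, 8)
)

related <- purchases %>%
  group_by(Vendor, Item) %>%
  mutate(n_related = n()) %>%
  ungroup() %>%
  filter(n_related > 1)  # drop vendor/item pairs that occur only once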