@prakashrd
Last active June 12, 2020 04:58
# Find duplicate values in a column and print every row that contains them
awk 'BEGIN { FS="," } { c[$2]++; l[$2,c[$2]]=$0 } END { for (i in c) { if (c[i] > 1) for (j = 1; j <= c[i]; j++) print l[i,j] } }' file.csv
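# Replace $2 with whichever column you want to check for duplicates.
# Rows are grouped by value; the order of the groups is not guaranteed,
# because awk's "for (i in c)" iterates in an unspecified order.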
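# A quick way to try the one-liner, as a minimal sketch. The sample rows and
# the temporary file are made up for illustration and are not part of the
# original snippet. Only one value ("alice") repeats in column 2, so the
# output does not depend on awk's array iteration order.

```shell
# Hypothetical sample data: id,name,city
tmp=$(mktemp)
printf '%s\n' '1,alice,NY' '2,bob,LA' '3,alice,SF' '4,carol,NY' > "$tmp"

# The same one-liner, keyed on column 2
out=$(awk 'BEGIN { FS="," } { c[$2]++; l[$2,c[$2]]=$0 } END { for (i in c) { if (c[i] > 1) for (j = 1; j <= c[i]; j++) print l[i,j] } }' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

# With this sample, the two "alice" rows (ids 1 and 3) are printed.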
# A streaming variant of the code above, with more comments.
# It prints duplicate rows as they are found, in input order.
BEGIN { FS = "," }
{
    # Count occurrences of each value in the second column
    count[$2]++;
    # Save the line the first time we encounter a value
    if (count[$2] == 1)
        first[$2] = $0;
    # When we encounter the value for the second time, print the
    # previously saved line
    if (count[$2] == 2)
        print first[$2];
    # From the second time onward, always print because the value is
    # duplicated
    if (count[$2] > 1)
        print
}
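# Instead of editing $2 by hand, the column number can be passed in with
# awk's -v option. A minimal sketch, assuming comma-separated input; the
# function name dups_by_col, the variable col, and the sample rows are
# hypothetical and chosen only for this example.

```shell
# Print every row whose value in the given column appears more than once.
# $1 = column number, $2 = input file
dups_by_col() {
  awk -F, -v col="$1" '
    { count[$col]++ }                          # tally the value in the chosen column
    count[$col] == 1 { first[$col] = $0 }      # remember the first row for this value
    count[$col] == 2 { print first[$col] }     # on the second hit, emit the saved row
    count[$col] >  1 { print }                 # every repeat is printed as it arrives
  ' "$2"
}

# Hypothetical sample data: id,name,city
tmp=$(mktemp)
printf '%s\n' '1,alice,NY' '2,bob,LA' '3,alice,SF' '4,carol,NY' > "$tmp"
out_city=$(dups_by_col 3 "$tmp")   # look for duplicates in column 3
printf '%s\n' "$out_city"
rm -f "$tmp"
```

# With this sample, the two "NY" rows (ids 1 and 4) are printed in input order.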