@sudar
Last active January 30, 2016 11:59
Awk command to remove duplicate lines based on a field, keeping the first line seen for each value of that field. Explanation at http://sudarmuthu.com/blog/remove-duplicate-lines-based-on-a-field
# Before starting (x is keyed by field 2; each value counts how often that field has been seen)
x = {}
# After line 1
x = {
CTO => 1
}
# After line 2
x = {
CTO => 1
Manager => 1
}
# After line 3
x = {
CTO => 1
Manager => 1
CEO => 1
}
# After line 4
x = {
CTO => 1
Manager => 2
CEO => 1
}
# After line 5
x = {
CTO => 1
Manager => 2
CEO => 1
CFO => 1
}
awk '!x[$2]++' filename
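{
    # x[$2] counts how many times this value of field 2 has been seen so far
    if (x[$2] == 0)
        print    # first occurrence of this value: print the whole line
    x[$2]++      # record this occurrence, whether or not the line was printed
}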
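The per-line bookkeeping can be made visible by printing the counter as awk reads each record. This is a minimal sketch, not part of the original gist: the function name `trace_dedupe` and the output layout (line number, field value, count, decision) are assumptions chosen for illustration.

```shell
# trace_dedupe: hypothetical helper name, not from the original gist.
# Reads records on stdin and reports, for each line, the count stored in
# x[$2] and whether the '!x[$2]++' test would print or skip that line.
trace_dedupe() {
  awk '{
    verdict = (x[$2] == 0) ? "print" : "skip"   # same test as the expanded form
    x[$2]++                                     # count this occurrence of field 2
    printf "%d %s=%d %s\n", NR, $2, x[$2], verdict
  }'
}

# Sample input from the gist, fed on stdin.
printf 'Tom CTO 32\nHarry Manager 45\nKrish CEO 50\nBob Manager 49\nPatrick CFO 20\n' |
  trace_dedupe
```

Only the fourth record should be reported as skipped, because `Manager` already has a count of 1 when it is read.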
Tom CTO 32
Harry Manager 45 -> first line with Manager in field 2 (kept)
Krish CEO 50
Bob Manager 49 -> Manager in field 2 is a duplicate (removed)
Patrick CFO 20
Tom CTO 32
Harry Manager 45
Krish CEO 50
Patrick CFO 20
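The same idea works for any column if the field number is passed in as an awk variable. The sketch below assumes a wrapper named `dedupe_by_field` (a hypothetical name, not from the gist) that reads from stdin and takes the field number as its first argument.

```shell
# dedupe_by_field: hypothetical wrapper, not part of the original gist.
# $1 is the 1-based number of the field to deduplicate on.
dedupe_by_field() {
  # seen[$col] is 0 (false) the first time a value appears, so the negated
  # test is true and awk prints the line; the ++ then marks the value as seen.
  awk -v col="$1" '!seen[$col]++'
}

# Deduplicate the sample input on the second field.
printf 'Tom CTO 32\nHarry Manager 45\nKrish CEO 50\nBob Manager 49\nPatrick CFO 20\n' |
  dedupe_by_field 2
```

With field 2 this reproduces the four-line result shown above; with field 1 all five lines would be kept, since every name in the sample is unique.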