Skip to content

Instantly share code, notes, and snippets.

@ryenus
Last active December 9, 2017 06:15
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ryenus/5866268 to your computer and use it in GitHub Desktop.
Save ryenus/5866268 to your computer and use it in GitHub Desktop.
remove duplicate lines while keeping original order
# this awk script only print lines that appear for the first time
awk '!_[$0]++' FILE
#
# Explanation
#
# 1. `_[]` is an associative array, or map, the name can be any valid variable name in awk.
# 2. for each line, `$0` is the content text, which is used as the key in `_[]`
# 3. for each line, `_[$0]` results in a counter for its number of appearance
#
# 4. when a line appears for the first time,
# - its counter `_[$0]` is nil, or `0`
# - and `_[$0]++` is also `0` when it's evaluated,
# - therefore `!_[$0]++` (i.e. `!0`) evaluates to true,
# - then the default action `{print $0}` is performed, to print current line `$0`
# - after evaluation `_[$0]` is incremented by 1 (i.e. `_[$0] == 1`) as the side effect of `_[$0]++`
#
# 5. when a line appears for the next time,
# - now its counter `_[$0]` is at least `1` , so as `_[$0]++`
# - thus `!_[$0]++` evalutes to false, and the line won't be printed
#
# 6. as the end result, for each unique line, it's printed only when it appears for the first time.
@ryenus
Copy link
Author

ryenus commented Jun 26, 2013

deduplicate lines without uniq, order preserved:

$ awk '!_[$0]++' <<EOF
B
A
A
EOF

# =>
B
A

@ryenus
Copy link
Author

ryenus commented Dec 9, 2017

print each unique line with number of occurrences:

$ awk '{!_[$0]++} END {for(s in _) print _[s], s}' <<EOF
B
A
A
EOF

# =>
2 A
1 B

Note: the order is not reserved, unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment