Last active December 9, 2017 06:15
remove duplicate lines while keeping original order
# this awk script prints only the first occurrence of each line
awk '!_[$0]++' FILE
#
# Explanation
#
# 1. `_[]` is an associative array (a map); the name can be any valid awk variable name.
# 2. for each line, `$0` holds the line's text, which is used as the key into `_[]`
# 3. for each line, `_[$0]` acts as a counter of how many times that line has appeared
#
# 4. when a line appears for the first time,
#    - its counter `_[$0]` is uninitialized, which awk treats as `0` in a numeric context
#    - so the post-increment `_[$0]++` evaluates to `0` (the value before incrementing),
#    - therefore `!_[$0]++` (i.e. `!0`) evaluates to true,
#    - then the default action `{print $0}` is performed, printing the current line `$0`
#    - as a side effect of `_[$0]++`, the counter is incremented afterwards (i.e. `_[$0] == 1`)
#
# 5. when the same line appears again,
#    - its counter `_[$0]` is now at least `1`, and so is the value of `_[$0]++`
#    - thus `!_[$0]++` evaluates to false, and the line is not printed
#
# 6. as a result, each unique line is printed exactly once, at its first appearance.
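The steps in the explanation can be checked with a small sketch. The function names `dedupe` and `dedupe_verbose` and the array name `seen` are illustrative assumptions, not part of the original script; the second function is just one way to spell out the same logic with an explicit `if`.

```shell
# Hypothetical helper names; only the awk programs matter.

# The one-liner, with the array renamed from `_` to `seen` for readability.
dedupe() {
    awk '!seen[$0]++' "$@"
}

# The same logic written out step by step.
dedupe_verbose() {
    awk '{
        if (seen[$0] == 0) {   # first appearance: the counter is still unset (0)
            print $0
        }
        seen[$0]++             # count this appearance
    }' "$@"
}

# Both read stdin when no file is given and print B, A, C (one per line).
printf 'B\nA\nA\nB\nC\n' | dedupe
printf 'B\nA\nA\nB\nC\n' | dedupe_verbose
```

Either function also accepts file names as arguments, e.g. `dedupe FILE`.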
Print each unique line with its number of occurrences:
$ awk '{_[$0]++} END {for (s in _) print _[s], s}' <<EOF
B
A
A
EOF
# =>
2 A
1 B
Note: unfortunately, the original order is not preserved, because awk does not specify the iteration order of `for (s in _)`.
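If first-appearance order matters for the counts as well, one possible approach is to record each line in a second, integer-indexed array the first time it is seen, and walk that array in `END`. This is a sketch, not part of the original gist; the names `count_in_order`, `count`, `order`, and `n` are assumptions chosen for illustration.

```shell
# Hypothetical helper: count occurrences, reporting lines in first-appearance order.
count_in_order() {
    awk '
        # The pattern is true only on a first appearance; the counter is
        # incremented for every line as a side effect of evaluating it.
        !count[$0]++ { order[++n] = $0 }
        END {
            # Walk the recorded lines by integer index instead of "for (s in count)".
            for (i = 1; i <= n; i++)
                print count[order[i]], order[i]
        }
    ' "$@"
}

printf 'B\nA\nA\n' | count_in_order
# prints:
# 1 B
# 2 A
```

The trade-off is that every distinct line is stored twice, once as a key of `count` and once as a value in `order`.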
deduplicate lines without `uniq`, order preserved: