Created
November 24, 2016 22:29
McIlroy's 6-line shell script to print the k most frequently occurring words in a file
tr -cs A-Za-z '\n' | # replace each run of non-alphabetic chars (-c complements the set) with a single newline (-s squeezes repeats)
tr A-Z a-z |         # convert upper case characters to lower case
sort |               # sort alphabetically so that identical words become adjacent
uniq -c |            # collapse identical adjacent lines and prefix each with its occurrence count (-c)
sort -rn |           # sort numerically (-n) in reverse (-r), most frequent first
sed "${1}q"          # print the first $1 lines (k is the first argument), then quit (q)
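A minimal usage sketch, assuming the pipeline is wrapped in a shell function and fed text on standard input. The function name top_words and the sample sentence are illustrative only and are not part of the original gist.

```shell
#!/bin/sh
# Hypothetical wrapper around the pipeline above; top_words is an assumed name.
top_words() {
    # $1 is k, the number of words to print; text is read from stdin
    tr -cs A-Za-z '\n' |
    tr A-Z a-z |
    sort |
    uniq -c |
    sort -rn |
    sed "${1}q"
}

# Sample input: "the" occurs 3 times, "and" twice, every other word once.
# Prints the two most frequent words, each preceded by its count.
printf 'The cat and the dog and the bird\n' | top_words 2
```

The width of the padding that uniq -c puts in front of each count differs between implementations, so a script that consumes this output is safer splitting on whitespace than relying on fixed columns.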