Created November 30, 2018 06:08
Count word occurrences in a text file
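-- Load each line of the input as a single chararray; $file is a Pig parameter.
data = load '$file' using TextLoader();
-- Split every line into tokens, one token per record.
tokens = foreach data generate FLATTEN(TOKENIZE($0));
-- Keep purely alphabetic tokens; MATCHES must match the whole token.
words = filter tokens by $0 MATCHES '[A-Za-z]+';
-- Lower-case so that counting is case-insensitive.
lowers = foreach words generate LOWER($0);
groups = group lowers by $0;
counts = foreach groups generate group, COUNT(lowers.$0);
-- Store the counts twice: most frequent first, then alphabetically by word.
by_count = order counts by $1 DESC;
store by_count into '$file-by-count';
by_word = order counts by $0 ASC;
store by_word into '$file-by-word';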
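
To sanity-check the script's output on a small input without a Pig installation, the same pipeline can be sketched in plain Python. This is a rough approximation and not part of the gist. It assumes that TOKENIZE, called without a delimiter argument, splits on space, double quote, comma, parentheses, and asterisk, and that MATCHES has to match the whole token. The names `count_words`, `by_count`, and `by_word` are hypothetical helpers introduced only for this illustration.

```python
import re
from collections import Counter

# Assumption: default TOKENIZE delimiters are space, double quote, comma,
# parentheses, and asterisk.
DELIMITERS = re.compile(r'[ ",()*]+')
WORD = re.compile(r'[A-Za-z]+')


def count_words(lines):
    """Count lower-cased, purely alphabetic tokens across all lines."""
    counts = Counter()
    for line in lines:
        for token in DELIMITERS.split(line):
            # MATCHES tests the whole string, so fullmatch is the analogue.
            # A token such as "dog." is therefore dropped, not trimmed.
            if WORD.fullmatch(token):
                counts[token.lower()] += 1
    return counts


def by_count(counts):
    # Highest count first. Ties are broken alphabetically here to make the
    # result deterministic; the Pig script specifies no order for ties.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def by_word(counts):
    # Alphabetical by word, like "order counts by $0 ASC".
    return sorted(counts.items())
```

For example, `by_count(count_words(["The cat, the dog.", "A (cat) sat"]))` gives `[('cat', 2), ('the', 2), ('a', 1), ('sat', 1)]`. The token `dog.` is discarded because of its trailing period, which is also what the `MATCHES '[A-Za-z]+'` filter would do under the assumptions above.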