Knuth/McIlroy word count task using Haskell
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import qualified System | |
import qualified Data.List as List | |
import qualified Data.Char as Char | |
import qualified Data.HashMap.Strict as HashMap | |
main :: IO () | |
main = do | |
[arg] <- System.getArgs | |
text <- getContents | |
let n = read arg | |
putStr $ createReport n text | |
createReport :: Int -> String -> String | |
createReport n text = | |
unlines $ | |
map (\(w, count) -> show count ++ " " ++ w) $ | |
take n $ | |
List.sortBy (flip compare) $ | |
map (\(w, count) -> (count, w)) $ | |
HashMap.toList $ | |
HashMap.fromListWith (+) $ | |
map (\w -> (w, 1)) $ | |
words $ | |
map Char.toLower $ | |
map (\c -> if Char.isLetter c then c else '\n') $ | |
text |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The task: read a file of text, determine the n most frequently used | |
words, and print out a sorted list of those words along with their | |
frequencies. | |
Import the standard libraries we use. | |
Note that I prefer to import modules as qualified, to avoid namespace | |
pollution and confusion. I would rather see "Char.isLetter" in code | |
rather than "isLetter". | |
> import qualified System | |
> import qualified Data.List as List | |
> import qualified Data.Char as Char | |
> import qualified Data.HashMap.Strict as HashMap | |
The entry point for the command line program involves the IO monad. | |
> main :: IO () | |
> main = do | |
Get the command line arguments. For simplicity in this non-production | |
code, we do no error checking for the correct number of arguments. If | |
we do not give one argument, the program will exit with a pattern | |
match failure. | |
> [arg] <- System.getArgs | |
Bind "text" to the entire contents of stdin as a String. | |
> text <- getContents | |
No error checking here in converting the String "arg" to an Int. | |
> let n = read arg | |
Do all the real work, then print out the report. | |
> putStr $ createReport n text | |
Generate a report for the n most frequently occurring words in the text. | |
Read the code backwards in order to see the data flow. | |
> createReport :: Int -> String -> String | |
> createReport n text = | |
Convert the list of Strings into a single String with a newline | |
terminator for each String. | |
> unlines $ | |
For each pair, generate a String that is the count followed by a space | |
followed by the word. | |
> map (\(count, w) -> show count ++ " " ++ w) $ | |
Take only the first n pairs of the list. | |
> take n $ | |
Sort the list of pairs descending by the count for each word (and | |
secondarily, descending by the word). | |
> List.sortBy (flip compare) $ | |
Flip the pairs in the list. | |
> map (\(w, count) -> (count, w)) $ | |
Extract a key/value pair list from the hash map. | |
> HashMap.toList $ | |
Create a hash map from the list of pairs, with a combining function | |
for each unique word that is just the addition operator on the | |
value. The final result therefore is a hash map in which each word is | |
mapped to its total count. | |
> HashMap.fromListWith (+) $ | |
For each word, create a pair of it with the initial count 1. | |
> map (\w -> (w, 1)) $ | |
Split up the text, on white space, into a list of words. | |
> words $ | |
Make all the text lowercase. | |
> map Char.toLower $ | |
Replace all non-letters with a newline. | |
> map (\c -> if Char.isLetter c then c else '\n') $ | |
> text |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The function for flipping pairs is called swap.