The task: read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.
Import the standard libraries we use.
Note that I prefer to import modules as qualified, to avoid namespace pollution and confusion. I would rather see "Char.isLetter" in code rather than "isLetter".
import qualified System import qualified Data.List as List import qualified Data.Char as Char import qualified Data.HashMap.Strict as HashMap
The entry point for the command line program involves the IO monad.
main :: IO () main = do
Get the command line arguments. For simplicity in this non-production code, we do no error checking for the correct number of arguments. If we do not give one argument, the program will exit with a pattern match failure.
[arg] <- System.getArgs
Bind "text" to the entire contents of stdin as a String.
text <- getContents
No error checking here in converting the String "arg" to an Int.
let n = read arg
Do all the real work, then print out the report.
putStr $ createReport n text
Generate a report for the n most frequently occurring words in the text.
Read the code backwards in order to see the data flow.
createReport :: Int -> String -> String createReport n text =
Convert the list of Strings into a single String with a newline terminator for each String.
For each pair, generate a String that is the count followed by a space followed by the word.
map (\(count, w) -> show count ++ " " ++ w) $
Take only the first n pairs of the list.
take n $
Sort the list of pairs descending by the count for each word (and secondarily, descending by the word).
List.sortBy (flip compare) $
Flip the pairs in the list.
map (\(w, count) -> (count, w)) $
Extract a key/value pair list from the hash map.
Create a hash map from the list of pairs, with a combining function for each unique word that is just the addition operator on the value. The final result therefore is a hash map in which each word is mapped to its total count.
HashMap.fromListWith (+) $
For each word, create a pair of it with the initial count 1.
map (\w -> (w, 1)) $
Split up the text, on white space, into a list of words.
Make all the text lowercase.
map Char.toLower $
Replace all non-letters with a newline.
map (\c -> if Char.isLetter c then c else '\n') $ text