Create a gist now

Instantly share code, notes, and snippets.

Knuth/McIlroy word count task using Haskell
import qualified System
import qualified Data.List as List
import qualified Data.Char as Char
import qualified Data.HashMap.Strict as HashMap
main :: IO ()
main = do
[arg] <- System.getArgs
text <- getContents
let n = read arg
putStr $ createReport n text
createReport :: Int -> String -> String
createReport n text =
unlines $
map (\(w, count) -> show count ++ " " ++ w) $
take n $
List.sortBy (flip compare) $
map (\(w, count) -> (count, w)) $
HashMap.toList $
HashMap.fromListWith (+) $
map (\w -> (w, 1)) $
words $
map Char.toLower $
map (\c -> if Char.isLetter c then c else '\n') $
text
The task: read a file of text, determine the n most frequently used
words, and print out a sorted list of those words along with their
frequencies.
Import the standard libraries we use.
Note that I prefer to import modules as qualified, to avoid namespace
pollution and confusion. I would rather see "Char.isLetter" in code
rather than "isLetter".
> import qualified System
> import qualified Data.List as List
> import qualified Data.Char as Char
> import qualified Data.HashMap.Strict as HashMap
The entry point for the command line program involves the IO monad.
> main :: IO ()
> main = do
Get the command line arguments. For simplicity in this non-production
code, we do no error checking for the correct number of arguments. If
we do not give one argument, the program will exit with a pattern
match failure.
> [arg] <- System.getArgs
Bind "text" to the entire contents of stdin as a String.
> text <- getContents
No error checking here in converting the String "arg" to an Int.
> let n = read arg
Do all the real work, then print out the report.
> putStr $ createReport n text
Generate a report for the n most frequently occurring words in the text.
Read the code backwards in order to see the data flow.
> createReport :: Int -> String -> String
> createReport n text =
Convert the list of Strings into a single String with a newline
terminator for each String.
> unlines $
For each pair, generate a String that is the count followed by a space
followed by the word.
> map (\(count, w) -> show count ++ " " ++ w) $
Take only the first n pairs of the list.
> take n $
Sort the list of pairs descending by the count for each word (and
secondarily, descending by the word).
> List.sortBy (flip compare) $
Flip the pairs in the list.
> map (\(w, count) -> (count, w)) $
Extract a key/value pair list from the hash map.
> HashMap.toList $
Create a hash map from the list of pairs, with a combining function
for each unique word that is just the addition operator on the
value. The final result therefore is a hash map in which each word is
mapped to its total count.
> HashMap.fromListWith (+) $
For each word, create a pair of it with the initial count 1.
> map (\w -> (w, 1)) $
Split up the text, on white space, into a list of words.
> words $
Make all the text lowercase.
> map Char.toLower $
Replace all non-letters with a newline.
> map (\c -> if Char.isLetter c then c else '\n') $
> text
@gurgeh

The function for flipping pairs is called swap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment