Skip to content

Instantly share code, notes, and snippets.

@FranklinChen
Created December 8, 2011 21:16
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save FranklinChen/1448622 to your computer and use it in GitHub Desktop.
Knuth/McIlroy word count task using Haskell
import qualified System
import qualified Data.List as List
import qualified Data.Char as Char
import qualified Data.HashMap.Strict as HashMap
main :: IO ()
main = do
[arg] <- System.getArgs
text <- getContents
let n = read arg
putStr $ createReport n text
createReport :: Int -> String -> String
createReport n text =
unlines $
map (\(w, count) -> show count ++ " " ++ w) $
take n $
List.sortBy (flip compare) $
map (\(w, count) -> (count, w)) $
HashMap.toList $
HashMap.fromListWith (+) $
map (\w -> (w, 1)) $
words $
map Char.toLower $
map (\c -> if Char.isLetter c then c else '\n') $
text
The task: read a file of text, determine the n most frequently used
words, and print out a sorted list of those words along with their
frequencies.
Import the standard libraries we use.
Note that I prefer to import modules as qualified, to avoid namespace
pollution and confusion. I would rather see "Char.isLetter" in code
rather than "isLetter".
> import qualified System
> import qualified Data.List as List
> import qualified Data.Char as Char
> import qualified Data.HashMap.Strict as HashMap
The entry point for the command line program involves the IO monad.
> main :: IO ()
> main = do
Get the command line arguments. For simplicity in this non-production
code, we do no error checking for the correct number of arguments. If
we do not give one argument, the program will exit with a pattern
match failure.
> [arg] <- System.getArgs
Bind "text" to the entire contents of stdin as a String.
> text <- getContents
No error checking here in converting the String "arg" to an Int.
> let n = read arg
Do all the real work, then print out the report.
> putStr $ createReport n text
Generate a report for the n most frequently occurring words in the text.
Read the code backwards in order to see the data flow.
> createReport :: Int -> String -> String
> createReport n text =
Convert the list of Strings into a single String with a newline
terminator for each String.
> unlines $
For each pair, generate a String that is the count followed by a space
followed by the word.
> map (\(count, w) -> show count ++ " " ++ w) $
Take only the first n pairs of the list.
> take n $
Sort the list of pairs descending by the count for each word (and
secondarily, descending by the word).
> List.sortBy (flip compare) $
Flip the pairs in the list.
> map (\(w, count) -> (count, w)) $
Extract a key/value pair list from the hash map.
> HashMap.toList $
Create a hash map from the list of pairs, with a combining function
for each unique word that is just the addition operator on the
value. The final result therefore is a hash map in which each word is
mapped to its total count.
> HashMap.fromListWith (+) $
For each word, create a pair of it with the initial count 1.
> map (\w -> (w, 1)) $
Split up the text, on white space, into a list of words.
> words $
Make all the text lowercase.
> map Char.toLower $
Replace all non-letters with a newline.
> map (\c -> if Char.isLetter c then c else '\n') $
> text
@gurgeh
Copy link

gurgeh commented Feb 18, 2013

The function for flipping pairs is called swap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment