Last active
June 9, 2018 08:03
-
-
Save regiskuckaertz/3c085c1977a0e09bbf217ec6cf63f660 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
> module Anagrams where | |
Here is our specification: | |
Describe how you would go about designing `anagrams :: Int -> [Word] -> String` so that anagrams n | |
takes a dictionary (a sorted list of English words), extracts the n-letter ones and prints a string | |
that gives a list of the anagram entries for the n-letter words. Assume `type Word = String`. | |
Because `Word` is a type in the Prelude, we need to hide it from scope. That's how it's done: | |
> import Prelude hiding (Word) | |
As per spec, | |
> type Word = String | |
I also want to associate a word with its anagrams, let's define an alias for that too: | |
> type Anagrams = (Word, [Word]) | |
Alright now the fun begins. Reading the specs, one would probably come up with a plan of attack. | |
Just writing that plan down usually gets you 50% there. So here is mine: | |
1. Extract n-letters long words from the dictionary | |
2. Group each word of the resulting dictionary with its anagrams | |
3. Format into a string where each line is made of the word along with its anagrams | |
👆 That doesn't seem that difficult, does it? Let's write this down in code: | |
> anagrams :: Int -> [Word] -> String | |
> anagrams n = format . group . nletter n | |
A straightforward translation. This is a classic in fp: your first derivation should not be | |
more complicated than how you would express it in words. The compiler will let you know if | |
you made a mistake. If that happens, it just means you did not completely understand the | |
problem: go back to your plan and find out where it fails. You don't need to write code to | |
reason about a program. | |
Now we can drill down into each sub-expression. We need to throw away from the dictionary | |
all the words that do not have the right length; in other words, filter! | |
> nletter :: Int -> [Word] -> [Word] | |
> nletter n = filter ((== n) . length) | |
Again a fair translation, almost word to word. Let's turn to the next step, which is to | |
group each word with its anagrams: | |
> group :: [Word] -> [Anagrams] | |
It looks like we will need a way to check if two words are anagrams, so let's focus on that | |
first. Two words are anagrams if they contain the same letters in different orders. We can | |
just take out every letter of one word from the other: if there doesn't remain any after that, | |
we know both words are anagrams! Strings are just lists of characters, so we can use the | |
list difference operator for that: | |
> anagram :: Word -> Word -> Bool | |
> anagram w1 w2 = null (w1 \\ w2) | |
Let's bring it into scope: | |
> import Data.List ((\\)) | |
(The reasoning above would break if `(\\)` didn't have a particular property; can you guess | |
what that property is?) | |
Back to our grouping function. We can reason by induction on finite lists. There is no anagram | |
in an empty dictionary, so let's prove that: | |
> group [] = [] | |
Easy-peasy. In the inductive case, we reason: | |
> group (w:ws) = xs:xss | |
> where xs = _ | |
> xss = _ | |
Note that the `_` above are _type holes_, they're very useful for reasoning about a program. | |
If you try to compile that, the compiler will give you the expected type of each expression. | |
Here, we reason that if `xs` is `w` along with its anagrams, then it must be that `xss` is | |
the list of words that are not anagrams of `w` along with their respective anagrams! In other | |
words, we need to split `ws` in two: anagrams of `w` and the rest. We make progress: | |
> group (w:ws) = xs:xss | |
> where xs = (w, ws') | |
> xss = group ws'' | |
> (ws', ws'') = _ | |
At this point, we are stuck, but we can deduct that the remaining expression returns a tuple | |
of lists of words. And it cannot create that tuple out of thin air, so it must take a list of | |
words as an argument. Then according to our spec, that function breaks a list according | |
to a predicate. Let's ask Hoogle what Haskell has in store for such a function, i.e. | |
type this in the search box: `(Word -> Bool) -> [Word] -> ([Word],[Word])`. | |
It looks like `partition` is what we need. Putting all the pieces together, we derive: | |
> group (w:ws) = xs:xss | |
> where xs = (w, ws') | |
> xss = group ws'' | |
> (ws', ws'') = partition (anagram w) ws | |
Almost there! Now we need to format the result. Let's assume each line is made of a word | |
with its anagrams: | |
> line :: Anagrams -> String | |
> line (w,as) = w ++ ": " ++ unwords as | |
I'll let you Hoogle `unwords`. The last piece of the puzzle is to turn all these lines of | |
strings into a string of lines! Translate this to code, we get | |
> format :: [Anagrams] -> String | |
> format = unlines . map line | |
*And we're done*. The program above has been derived from pure reasoning, there is no magic | |
involved. The next step is to see if we can make it more efficient by reducing its | |
space/time complexity. Well, it is, but in this case it is trivial. We will see later how | |
we can turn a program with exponential complexity into a linear order with a more | |
interesting example 😃 | |
🤘 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment