Created
April 28, 2011 18:15
-
-
Save hakank/946920 to your computer and use it in GitHub Desktop.
Anagrams in a word list in K (Kona)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/ | |
/ Here we explain a K function that shows all words in a word list | |
/ that are anagrams. | |
/ | |
/ | |
/ The K function used was found here: | |
/ http://lambda-the-ultimate.org/node/1323 | |
/ Define the function | |
anagram: {x g@&1<#:'g:={x@<x}'x} | |
/ Read the word list "word.list" and assign to the variable a | |
/ (The ";" means that nothing is printed.) | |
a: 0: "word.lst" ; | |
/ show all anagrams, i.e. partitions which contains | |
/ of 2 or more words | |
anagram a | |
/ a lot of output.... | |
/ | |
/ Let's see how it works with a smaller example. | |
/ | |
/ define the word list | |
b: ("algorithm";"parse";"astrid";"sunday";"algorithm";"morning";"pears") | |
/ There are two anagram partitions: | |
anagram b | |
(("algorithm" | |
"algorithm") | |
("parse" | |
"pears")) | |
/ | |
/ Here is how it works. | |
/ | |
/ the anagram function again (for reference) | |
{x g@&1<#:'g:={x@<x}'x} | |
/ Sort each word, and assign to g | |
g:{x@<x}' b | |
("aghilmort" | |
"aeprs" | |
"adirst" | |
"adnsuy" | |
"aghilmort" | |
"gimnnor" | |
"aeprs") | |
/ group the sorted words into partitions (equivalence classes) | |
/ according to the its sorted version | |
/ = is the group function | |
/ Here we see that there are two partitions with 2 words | |
/ (0 4) and (1 6) | |
={x@<x}' b | |
(0 4 | |
1 6 | |
,2 | |
,3 | |
,5) | |
/ count (#) the number of (sorted) words in each partition | |
#:'g:={x@<x}'b | |
2 2 1 1 1 | |
/ we just want those partitions with more than one word | |
/ (1 is boolean true) | |
1<#:'g:={x@<x}'b | |
1 1 0 0 0 | |
/ get the index of these partitions | |
/ (i.e. the first two partitions) | |
&1<#:'g:={x@<x}'b | |
0 1 | |
/ now get the words as index in the word list for these partitions | |
g@&1<#:'g:={x@<x}' b | |
(0 4 | |
1 6) | |
/ and last, lookup these word in the original word list | |
b g@&1<#:'g:={x@<x}' b | |
(("algorithm" | |
"algorithm") | |
("parse" | |
"pears")) | |
/ Now we can make this a function by replacing the argument | |
/ "b" (the word list) with "x" (the first argument of | |
/ a function), and place { } around the expression to make | |
/ it a function. | |
{x g@&1<#:'g:={x@<x}'x} | |
A note about the speed. | |
----------------------- | |
I'm quite impressed by the speed of this in Kona. | |
* Kona is fast reading a file into memory: | |
a) Reading a word list with 173528 words (1.7M bytes) takes | |
about 30 milliseconds. | |
b) Reading a larger file of 1114610 (Swedish) words (12.6M bytes) | |
takes about 110ms. | |
* both reading and checking for anagrams and output the number | |
of partitions is also very fast. | |
a) Smallest word list: 0.46s | |
b) Larger word list: 2.99s | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment