This is the jumble solver challenge as levied by a company. The goal is to accept a string of letters and return all possible words that can be made from any combination of those letters, including smaller words. Thus; gdo
can be dog
, god
, do
, and go
.
To use the scan method:
python scan.py wordlist.txt
To use the tree method:
python tree.py wordlist.txt
Had this been a simpler jumble solver, where in you must use the entirety of provided letters, this would have been easier; load the wordlist into a dict wherein the key is the sorted letters of the word, the value the word(s) itself (plural for existing anagram words), then sort the jumbled letters and compare against a pre-loaded dict of jumbled words for an O(1) search time and O(n) memory complexity. Searching the combinations of possible smaller words, however, requires a bit more effort.
To this end I have provided two solutions of varying speed and use. Each is interactive, and will prompt the user for input of new words until they exit the program.
For each, I have utilized this wordlist, but will accept any new line delimited text file.
The first, scan.py
, will perform an O(n) comparison of the word against existing dictionary words as it streams through the target dictionary. It does this by generating a base count of letters within the target word and then, while reading the dictionary, will discover whether the target word has enough of the correct letters to spell the current dictionary word. If it does, it is a combinatorial match; otherwise it is not.
The second, tree.py
, first loads the entire word list into memory in the form of a tree. Each node represents a possible letter of the word, with a blank space as the root node. Then, when comparing a new jumble of letters we wish to discover words for, we explore the tree recursively finding leaf and terminal nodes. Terminal nodes are leaf nodes and nodes that mark the demarcation of a word within its branch; this is necessary as small words can exist within large ones. ie dog
is within dogged
. Here we perform a breadth-first search, which would typically be O(n), but because of combinatorial branching and multiple tries, we suffer with our complexity being at O(n!). However the n in this case is the size of our target search word versus the total search space (ie the length of the word list) like our previous algorithm. Thus there exists a set of letters for which this algorithm is faster than the first, and by extension, a set of words whose lengths are prohibitively slow for this approach. Depending on the application (such as scrabble solver, wherein you are limited to just 7 tiles) this is sufficient.
Thus you can choose one or the other per your liking; a more performant method would be combining them and switching at a set word length. What is that length of word? Somewhere in the mid teens of letters, or more accurately when a!<b
where a
is the number of letters in the search space and b
is the total wordlist length. For example, measuring with the word supercalifragilisticexpialidocious
, but expanding the word in our search one letter at a time amongst its 34 letters, we see the following datapoints for our tree method:
Letters | Time |
---|---|
1 | 0.00004 seconds |
2 | 0.00006 seconds |
3 | 0.00015 seconds |
4 | 0.00023 seconds |
5 | 0.00072 seconds |
6 | 0.00124 seconds |
7 | 0.00324 seconds |
8 | 0.00723 seconds |
9 | 0.02566 seconds |
10 | 0.03621 seconds |
11 | 0.05425 seconds |
12 | 0.10192 seconds |
13 | 0.14558 seconds |
14 | 0.27253 seconds |
15 | 0.37936 seconds |
16 | 0.60944 seconds |
17 | 0.95953 seconds |
18 | 2.08020 seconds |
19 | 3.25990 seconds |
20 | 5.21902 seconds |
21 | 9.40229 seconds |
22 | 10.6053 seconds |
23 | 17.1072 seconds |
24 | 26.1821 seconds |
25 | 45.5737 seconds |
26 | 72.0461 seconds |
... | ... |
34 | 2444.35706 seconds |
Yes, 40 minutes for the entire word. In contrast, our straight word list scan method is relatively flat with a slow linear increase as we'd expect:
Letters | Time |
---|---|
1 | 0.30149 seconds |
2 | 0.28906 seconds |
3 | 0.30319 seconds |
4 | 0.30238 seconds |
5 | 0.32480 seconds |
6 | 0.33583 seconds |
7 | 0.34534 seconds |
8 | 0.35774 seconds |
9 | 0.36771 seconds |
10 | 0.39921 seconds |
11 | 0.39273 seconds |
12 | 0.38280 seconds |
13 | 0.39572 seconds |
14 | 0.40934 seconds |
15 | 0.40445 seconds |
16 | 0.40673 seconds |
17 | 0.44207 seconds |
18 | 0.42891 seconds |
19 | 0.42212 seconds |
20 | 0.42697 seconds |
21 | 0.43239 seconds |
22 | 0.43151 seconds |
23 | 0.42970 seconds |
24 | 0.44323 seconds |
25 | 0.43483 seconds |
26 | 0.43742 seconds |
27 | 0.44024 seconds |
28 | 0.47338 seconds |
29 | 0.49609 seconds |
30 | 0.49711 seconds |
31 | 0.49912 seconds |
32 | 0.52220 seconds |
33 | 0.49713 seconds |
34 | 0.50754 seconds |