Skip to content

Instantly share code, notes, and snippets.

@hlfshell
Last active October 12, 2023 22:41
Show Gist options
  • Save hlfshell/9f6046ed91cdf70c0a14ce936dbe2147 to your computer and use it in GitHub Desktop.
Save hlfshell/9f6046ed91cdf70c0a14ce936dbe2147 to your computer and use it in GitHub Desktop.
Jumble Solver

Jumble Solver

This is the jumble solver challenge as levied by a company. The goal is to accept a string of letters and return all possible words that can be made from any combination of those letters, including smaller words. Thus; gdo can be dog, god, do, and go.

Using the Solvers

To use the scan method:

python scan.py wordlist.txt

To use the tree method:

python tree.py wordlist.txt

Solution

Had this been a simpler jumble solver, where in you must use the entirety of provided letters, this would have been easier; load the wordlist into a dict wherein the key is the sorted letters of the word, the value the word(s) itself (plural for existing anagram words), then sort the jumbled letters and compare against a pre-loaded dict of jumbled words for an O(1) search time and O(n) memory complexity. Searching the combinations of possible smaller words, however, requires a bit more effort.

To this end I have provided two solutions of varying speed and use. Each is interactive, and will prompt the user for input of new words until they exit the program.

For each, I have utilized this wordlist, but will accept any new line delimited text file.

The first, scan.py, will perform an O(n) comparison of the word against existing dictionary words as it streams through the target dictionary. It does this by generating a base count of letters within the target word and then, while reading the dictionary, will discover whether the target word has enough of the correct letters to spell the current dictionary word. If it does, it is a combinatorial match; otherwise it is not.

The second, tree.py, first loads the entire word list into memory in the form of a tree. Each node represents a possible letter of the word, with a blank space as the root node. Then, when comparing a new jumble of letters we wish to discover words for, we explore the tree recursively finding leaf and terminal nodes. Terminal nodes are leaf nodes and nodes that mark the demarcation of a word within its branch; this is necessary as small words can exist within large ones. ie dog is within dogged. Here we perform a breadth-first search, which would typically be O(n), but because of combinatorial branching and multiple tries, we suffer with our complexity being at O(n!). However the n in this case is the size of our target search word versus the total search space (ie the length of the word list) like our previous algorithm. Thus there exists a set of letters for which this algorithm is faster than the first, and by extension, a set of words whose lengths are prohibitively slow for this approach. Depending on the application (such as scrabble solver, wherein you are limited to just 7 tiles) this is sufficient.

Thus you can choose one or the other per your liking; a more performant method would be combining them and switching at a set word length. What is that length of word? Somewhere in the mid teens of letters, or more accurately when a!<b where a is the number of letters in the search space and b is the total wordlist length. For example, measuring with the word supercalifragilisticexpialidocious, but expanding the word in our search one letter at a time amongst its 34 letters, we see the following datapoints for our tree method:

Letters Time
1 0.00004 seconds
2 0.00006 seconds
3 0.00015 seconds
4 0.00023 seconds
5 0.00072 seconds
6 0.00124 seconds
7 0.00324 seconds
8 0.00723 seconds
9 0.02566 seconds
10 0.03621 seconds
11 0.05425 seconds
12 0.10192 seconds
13 0.14558 seconds
14 0.27253 seconds
15 0.37936 seconds
16 0.60944 seconds
17 0.95953 seconds
18 2.08020 seconds
19 3.25990 seconds
20 5.21902 seconds
21 9.40229 seconds
22 10.6053 seconds
23 17.1072 seconds
24 26.1821 seconds
25 45.5737 seconds
26 72.0461 seconds
... ...
34 2444.35706 seconds

Yes, 40 minutes for the entire word. In contrast, our straight word list scan method is relatively flat with a slow linear increase as we'd expect:

Letters Time
1 0.30149 seconds
2 0.28906 seconds
3 0.30319 seconds
4 0.30238 seconds
5 0.32480 seconds
6 0.33583 seconds
7 0.34534 seconds
8 0.35774 seconds
9 0.36771 seconds
10 0.39921 seconds
11 0.39273 seconds
12 0.38280 seconds
13 0.39572 seconds
14 0.40934 seconds
15 0.40445 seconds
16 0.40673 seconds
17 0.44207 seconds
18 0.42891 seconds
19 0.42212 seconds
20 0.42697 seconds
21 0.43239 seconds
22 0.43151 seconds
23 0.42970 seconds
24 0.44323 seconds
25 0.43483 seconds
26 0.43742 seconds
27 0.44024 seconds
28 0.47338 seconds
29 0.49609 seconds
30 0.49711 seconds
31 0.49912 seconds
32 0.52220 seconds
33 0.49713 seconds
34 0.50754 seconds
import csv
import sys
from typing import List
def build_base_letter_count(word: str) -> dict[str, int]:
"""
build_base_letter_count builds a dictionary of letters
and their counts from a given word.
"""
letter_count: dict[str, int] = {}
for letter in word:
if letter in letter_count:
letter_count[letter] += 1
else:
letter_count[letter] = 1
return letter_count
def word_composition_check(base_letters: dict[str, int], word: str) -> bool:
"""
word_composition_check confirms that a given word can
be formed from letters within the base word, without
letter re-use. Smaller words is allowed.
This is accomplished by moving through the word list and
comparing it to the base letters count dict. If the letter
from a word X is not in the dict, then the word cannot
exist; return False. If the letter is in the dict,
decrement the count. If the count is less than 0, we have
used that letter too often and thus the word cannot exist;
return False. If we make it through this process, then the
word must be possible with our available letters, and we
return True.
"""
for letter in word:
if letter not in base_letters:
return False
else:
base_letters[letter] -= 1
if base_letters[letter] < 0:
return False
return True
def jumble_solve(filename: str, base_word: str) -> List[str]:
"""
jumble_solve scans through the given wordlist and
determines what words are possible out of it. It
solves the jumble while loading the word list as
opposed to my other solution, which creates a
reusable graph approach. For single use or longer
jumbles, this is faster.
"""
# Get base letter count
base_letters = build_base_letter_count(base_word)
# Load our word list
words: List[str] = []
with open(filename, "r") as file:
reader = csv.reader(file)
for row in reader:
word = row[0]
# Note - dict({}) is a performant way to copy a
# shallow dict, since we don't want to actually
# modify the count in our scoped base_letters
if word_composition_check(dict(base_letters), word):
words.append(word)
return words
if __name__ == "__main__":
arguments = sys.argv
if len(arguments) < 2:
print("Please provide a word list")
exit(1)
wordlist = arguments[1]
while True:
# Prompt the user to enter a word
word = input("Enter a word: ")
# Set the word to lower case, strip spaces
word = word.lower().strip().replace(" ", "")
# Search for all possible words and print them
# sorted by length, largest first
words = jumble_solve(wordlist, word)
words.sort(key=len, reverse=True)
print(f"Possible words: ({len(words)})")
print(words)
from __future__ import annotations
import csv
import sys
from typing import List, Optional, Union
class WordList:
def __init__(self, wordlist: Optional[List] = None):
# Our word graph is initiated with a blank letter
# so we can just branch off of that when searching
self.word_graph = LetterNode("")
if wordlist is not None:
for word in wordlist:
self.add_word(word)
def add(self, word: str):
"""
add_word appends a word to our internal tree graph
of possible words.
"""
letters: List[str] = list(word)
current_node = self.word_graph
for letter in letters:
# If we have a node for this letter
# already, utilize it
node = current_node.get(letter)
# If not, create a fresh node
if node is None:
node = LetterNode(letter)
current_node.add(node)
# ...and continue onto the next letter
current_node = node
# Mark the leaf node as terminal, in case
# another word builds off of it. ie; do->dog
current_node.terminal = True
def search(self, word: str) -> List[str]:
"""
search accepts a word and performs a query across the
tree starting with root node. It returns a list of all
possible words
"""
return self.word_graph.search(list(word))
class LetterNode:
def __init__(self, letter: str, terminal: bool = False) -> None:
self.letter = letter
self.terminal = terminal
self.nodes: List[LetterNode] = []
def word(self) -> bool:
"""
word returns if a node is demarcated as a word;
nodes can be markes as such if its a leaf node
*or* if its a marked "terminal", aka the end of
a smaller word ie: sad->sadder - sad is a word
despite not being a leaf node, as many words
build off of those three letters.
"""
return len(self.nodes) == 0 or self.terminal
def add(self, node: LetterNode):
"""
add appends a node to our current node
"""
self.nodes.append(node)
def get(self, letter: Union[LetterNode, str]) -> Optional[LetterNode]:
"""
get returns the requested node (be it a node or a string
letter) if it exists, or None otherwise.
"""
for node in self.nodes:
if isinstance(letter, LetterNode):
if node == letter:
return node
elif isinstance(letter, str):
if node.letter == letter:
return node
else:
raise TypeError("node must be of type LetterNode or str")
return None
def search(self, letters: List[str]) -> List[str]:
"""
search accepts a set of letters and then performs a query
down our node's branches for words that utilize those
letters. To do this, we traverse each possible letter
combination recursively. Every time we meet a terminal
(leaf or demarcated terminal) node, we have found a word.
We then return up the recursive chain to form the words
discovered throughout our burrowing.
"""
words: List[str] = []
# If our current node is marked as a word, we can return
# that portion of the word from just this node
if self.word():
words.append(self.letter)
# We iterate through each letter available as our
# next target levels through the tree
for index, letter in enumerate(letters):
current_node = self.get(letter)
# If the current node does not exist, we must
# skip it as there is nothing more to search
if current_node is None:
continue
# Ignoring our current letter, isolate all remaining
# letters
remaining_letters = letters[:index] + letters[index + 1 :]
# Search the current target node with the remaining
# letters
found_words = current_node.search(remaining_letters)
# Combine words if any were returned with the current
# node's letter.
for word in found_words:
word = self.letter + word
if word not in words:
words.append(word)
return words
def __str__(self) -> str:
nodestrings = ""
for node in self.nodes:
nodestrings += f"({node})"
return f"({self.letter})->[{nodestrings}]"
def __eq__(self, other: LetterNode) -> bool:
return self.letter == other.letter
def load_word_list(filename: str) -> WordList:
with open(filename, "r") as file:
wordlist = WordList()
reader = csv.reader(file)
for row in reader:
word = row[0]
# Clean the word just in case; limit to lowercase
# and eliminate spaces.
word.strip().lower().replace(" ", "")
wordlist.add(word)
return wordlist
if __name__ == "__main__":
arguments = sys.argv
if len(arguments) < 2:
print("Please provide a word list")
exit(1)
else:
wordlist = load_word_list(arguments[1])
print("Wordlist loaded")
while True:
# Prompt the user to enter a word
word = input("Enter a word: ")
# Set the word to lower case, strip spaces
word = word.lower().strip().replace(" ", "")
# Search for all possible words and print them
# sorted by length, largest first
words = wordlist.search(word)
words.sort(key=len, reverse=True)
print(f"Possible words: ({len(words)})")
print(words)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment