hlfshell/README.md

## README.md

      
    Raw
  

              README.md
            
          
    Jumble Solver

This is the jumble solver challenge as levied by a company. The goal is to accept a string of letters and return all possible words that can be made from any combination of those letters, including smaller words. Thus; gdo can be dog, god, do, and go.
Using the Solvers

To use the scan method:
python scan.py wordlist.txt

To use the tree method:
python tree.py wordlist.txt

Solution

Had this been a simpler jumble solver, where in you must use the entirety of provided letters, this would have been easier; load the wordlist into a dict wherein the key is the sorted letters of the word, the value the word(s) itself (plural for existing anagram words), then sort the jumbled letters and compare against a pre-loaded dict of jumbled words for an O(1) search time and O(n) memory complexity. Searching the combinations of possible smaller words, however, requires a bit more effort.
To this end I have provided two solutions of varying speed and use. Each is interactive, and will prompt the user for input of new words until they exit the program.
For each, I have utilized this wordlist, but will accept any new line delimited text file.
The first, scan.py, will perform an O(n) comparison of the word against existing dictionary words as it streams through the target dictionary. It does this by generating a base count of letters within the target word and then, while reading the dictionary, will discover whether the target word has enough of the correct letters to spell the current dictionary word. If it does, it is a combinatorial match; otherwise it is not.
The second, tree.py, first loads the entire word list into memory in the form of a tree. Each node represents a possible letter of the word, with a blank space as the root node. Then, when comparing a new jumble of letters we wish to discover words for, we explore the tree recursively finding leaf and terminal nodes. Terminal nodes are leaf nodes and nodes that mark the demarcation of a word within its branch; this is necessary as small words can exist within large ones. ie dog is within dogged. Here we perform a breadth-first search, which would typically be O(n), but because of combinatorial branching and multiple tries, we suffer with our complexity being at O(n!). However the n in this case is the size of our target search word versus the total search space (ie the length of the word list) like our previous algorithm. Thus there exists a set of letters for which this algorithm is faster than the first, and by extension, a set of words whose lengths are prohibitively slow for this approach. Depending on the application (such as scrabble solver, wherein you are limited to just 7 tiles) this is sufficient.
Thus you can choose one or the other per your liking; a more performant method would be combining them and switching at a set word length. What is that length of word? Somewhere in the mid teens of letters, or more accurately when a!<b where a is the number of letters in the search space and b is the total wordlist length. For example, measuring with the word supercalifragilisticexpialidocious, but expanding the word in our search one letter at a time amongst its 34 letters, we see the following datapoints for our tree method:


Letters
Time


1
0.00004 seconds


2
0.00006 seconds


3
0.00015 seconds


4
0.00023 seconds


5
0.00072 seconds


6
0.00124 seconds


7
0.00324 seconds


8
0.00723 seconds


9
0.02566 seconds


10
0.03621 seconds


11
0.05425 seconds


12
0.10192 seconds


13
0.14558 seconds


14
0.27253 seconds


15
0.37936 seconds


16
0.60944 seconds


17
0.95953 seconds


18
2.08020 seconds


19
3.25990 seconds


20
5.21902 seconds


21
9.40229 seconds


22
10.6053 seconds


23
17.1072 seconds


24
26.1821 seconds


25
45.5737 seconds


26
72.0461 seconds


...
...


34
2444.35706 seconds


Yes, 40 minutes for the entire word. In contrast, our straight word list scan method is relatively flat with a slow linear increase as we'd expect:


Letters
Time


1
0.30149 seconds


2
0.28906 seconds


3
0.30319 seconds


4
0.30238 seconds


5
0.32480 seconds


6
0.33583 seconds


7
0.34534 seconds


8
0.35774 seconds


9
0.36771 seconds


10
0.39921 seconds


11
0.39273 seconds


12
0.38280 seconds


13
0.39572 seconds


14
0.40934 seconds


15
0.40445 seconds


16
0.40673 seconds


17
0.44207 seconds


18
0.42891 seconds


19
0.42212 seconds


20
0.42697 seconds


21
0.43239 seconds


22
0.43151 seconds


23
0.42970 seconds


24
0.44323 seconds


25
0.43483 seconds


26
0.43742 seconds


27
0.44024 seconds


28
0.47338 seconds


29
0.49609 seconds


30
0.49711 seconds


31
0.49912 seconds


32
0.52220 seconds


33
0.49713 seconds


34
0.50754 seconds


## scan.py
import csv
import sys
from typing import List


def build_base_letter_count(word: str) -> dict[str, int]:
    """
    build_base_letter_count builds a dictionary of letters
    and their counts from a given word.
    """
    letter_count: dict[str, int] = {}

    for letter in word:
        if letter in letter_count:
            letter_count[letter] += 1
        else:
            letter_count[letter] = 1

    return letter_count


def word_composition_check(base_letters: dict[str, int], word: str) -> bool:
    """
    word_composition_check confirms that a given word can
    be formed from letters within the base word, without
    letter re-use. Smaller words is allowed.

    This is accomplished by moving through the word list and
    comparing it to the base letters count dict. If the letter
    from a word X is not in the dict, then the word cannot
    exist; return False. If the letter is in the dict,
    decrement the count. If the count is less than 0, we have
    used that letter too often and thus the word cannot exist;
    return False. If we make it through this process, then the
    word must be possible with our available letters, and we
    return True.
    """
    for letter in word:
        if letter not in base_letters:
            return False
        else:
            base_letters[letter] -= 1

            if base_letters[letter] < 0:
                return False

    return True


def jumble_solve(filename: str, base_word: str) -> List[str]:
    """
    jumble_solve scans through the given wordlist and
    determines what words are possible out of it. It
    solves the jumble while loading the word list as
    opposed to my other solution, which creates a
    reusable graph approach. For single use or longer
    jumbles, this is faster.
    """

    # Get base letter count
    base_letters = build_base_letter_count(base_word)

    # Load our word list
    words: List[str] = []
    with open(filename, "r") as file:
        reader = csv.reader(file)

        for row in reader:
            word = row[0]

            # Note - dict({}) is a performant way to copy a
            # shallow dict, since we don't want to actually
            # modify the count in our scoped base_letters
            if word_composition_check(dict(base_letters), word):
                words.append(word)

    return words


if __name__ == "__main__":
    arguments = sys.argv
    if len(arguments) < 2:
        print("Please provide a word list")
        exit(1)

    wordlist = arguments[1]

    while True:
        # Prompt the user to enter a word
        word = input("Enter a word: ")
        # Set the word to lower case, strip spaces
        word = word.lower().strip().replace(" ", "")

        # Search for all possible words and print them
        # sorted by length, largest first
        words = jumble_solve(wordlist, word)
        words.sort(key=len, reverse=True)
        print(f"Possible words: ({len(words)})")
        print(words)

## tree.py
from __future__ import annotations

import csv
import sys
from typing import List, Optional, Union


class WordList:
    def __init__(self, wordlist: Optional[List] = None):
        # Our word graph is initiated with a blank letter
        # so we can just branch off of that when searching
        self.word_graph = LetterNode("")

        if wordlist is not None:
            for word in wordlist:
                self.add_word(word)

    def add(self, word: str):
        """
        add_word appends a word to our internal tree graph
        of possible words.
        """
        letters: List[str] = list(word)

        current_node = self.word_graph

        for letter in letters:
            # If we have a node for this letter
            # already, utilize it
            node = current_node.get(letter)

            # If not, create a fresh node
            if node is None:
                node = LetterNode(letter)
                current_node.add(node)

            # ...and continue onto the next letter
            current_node = node

        # Mark the leaf node as terminal, in case
        # another word builds off of it. ie; do->dog
        current_node.terminal = True

    def search(self, word: str) -> List[str]:
        """
        search accepts a word and performs a query across the
        tree starting with root node. It returns a list of all
        possible words
        """
        return self.word_graph.search(list(word))


class LetterNode:
    def __init__(self, letter: str, terminal: bool = False) -> None:
        self.letter = letter
        self.terminal = terminal
        self.nodes: List[LetterNode] = []

    def word(self) -> bool:
        """
        word returns if a node is demarcated as a word;
        nodes can be markes as such if its a leaf node
        *or* if its a marked "terminal", aka the end of
        a smaller word ie: sad->sadder - sad is a word
        despite not being a leaf node, as many words
        build off of those three letters.
        """
        return len(self.nodes) == 0 or self.terminal

    def add(self, node: LetterNode):
        """
        add appends a node to our current node
        """
        self.nodes.append(node)

    def get(self, letter: Union[LetterNode, str]) -> Optional[LetterNode]:
        """
        get returns the requested node (be it a node or a string
        letter) if it exists, or None otherwise.
        """
        for node in self.nodes:
            if isinstance(letter, LetterNode):
                if node == letter:
                    return node
            elif isinstance(letter, str):
                if node.letter == letter:
                    return node
            else:
                raise TypeError("node must be of type LetterNode or str")

        return None

    def search(self, letters: List[str]) -> List[str]:
        """
        search accepts a set of letters and then performs a query
        down our node's branches for words that utilize those
        letters. To do this, we traverse each possible letter
        combination recursively. Every time we meet a terminal
        (leaf or demarcated terminal) node, we have found a word.
        We then return up the recursive chain to form the words
        discovered throughout our burrowing.
        """
        words: List[str] = []
        # If our current node is marked as a word, we can return
        # that portion of the word from just this node
        if self.word():
            words.append(self.letter)

        # We iterate through each letter available as our
        # next target levels through  the tree
        for index, letter in enumerate(letters):
            current_node = self.get(letter)
            # If the current node does not exist, we must
            # skip it as there is nothing more to search
            if current_node is None:
                continue

            # Ignoring our current letter, isolate all remaining
            # letters
            remaining_letters = letters[:index] + letters[index + 1 :]

            # Search the current target node with the remaining
            # letters
            found_words = current_node.search(remaining_letters)

            # Combine words if any were returned with the current
            # node's letter.
            for word in found_words:
                word = self.letter + word
                if word not in words:
                    words.append(word)

        return words

    def __str__(self) -> str:
        nodestrings = ""
        for node in self.nodes:
            nodestrings += f"({node})"
        return f"({self.letter})->[{nodestrings}]"

    def __eq__(self, other: LetterNode) -> bool:
        return self.letter == other.letter


def load_word_list(filename: str) -> WordList:
    with open(filename, "r") as file:
        wordlist = WordList()
        reader = csv.reader(file)

        for row in reader:
            word = row[0]

            # Clean the word just in case; limit to lowercase
            # and eliminate spaces.
            word.strip().lower().replace(" ", "")

            wordlist.add(word)

        return wordlist


if __name__ == "__main__":
    arguments = sys.argv
    if len(arguments) < 2:
        print("Please provide a word list")
        exit(1)
    else:
        wordlist = load_word_list(arguments[1])
        print("Wordlist loaded")

    while True:
        # Prompt the user to enter a word
        word = input("Enter a word: ")
        # Set the word to lower case, strip spaces
        word = word.lower().strip().replace(" ", "")

        # Search for all possible words and print them
        # sorted by length, largest first
        words = wordlist.search(word)
        words.sort(key=len, reverse=True)
        print(f"Possible words: ({len(words)})")
        print(words)
Letters	Time
1	0.00004 seconds
2	0.00006 seconds
3	0.00015 seconds
4	0.00023 seconds
5	0.00072 seconds
6	0.00124 seconds
7	0.00324 seconds
8	0.00723 seconds
9	0.02566 seconds
10	0.03621 seconds
11	0.05425 seconds
12	0.10192 seconds
13	0.14558 seconds
14	0.27253 seconds
15	0.37936 seconds
16	0.60944 seconds
17	0.95953 seconds
18	2.08020 seconds
19	3.25990 seconds
20	5.21902 seconds
21	9.40229 seconds
22	10.6053 seconds
23	17.1072 seconds
24	26.1821 seconds
25	45.5737 seconds
26	72.0461 seconds
...	...
34	2444.35706 seconds
Letters	Time
1	0.30149 seconds
2	0.28906 seconds
3	0.30319 seconds
4	0.30238 seconds
5	0.32480 seconds
6	0.33583 seconds
7	0.34534 seconds
8	0.35774 seconds
9	0.36771 seconds
10	0.39921 seconds
11	0.39273 seconds
12	0.38280 seconds
13	0.39572 seconds
14	0.40934 seconds
15	0.40445 seconds
16	0.40673 seconds
17	0.44207 seconds
18	0.42891 seconds
19	0.42212 seconds
20	0.42697 seconds
21	0.43239 seconds
22	0.43151 seconds
23	0.42970 seconds
24	0.44323 seconds
25	0.43483 seconds
26	0.43742 seconds
27	0.44024 seconds
28	0.47338 seconds
29	0.49609 seconds
30	0.49711 seconds
31	0.49912 seconds
32	0.52220 seconds
33	0.49713 seconds
34	0.50754 seconds
	import csv
	import sys
	from typing import List


	def build_base_letter_count(word: str) -> dict[str, int]:
	"""
	build_base_letter_count builds a dictionary of letters
	and their counts from a given word.
	"""
	letter_count: dict[str, int] = {}

	for letter in word:
	if letter in letter_count:
	letter_count[letter] += 1
	else:
	letter_count[letter] = 1

	return letter_count


	def word_composition_check(base_letters: dict[str, int], word: str) -> bool:
	"""
	word_composition_check confirms that a given word can
	be formed from letters within the base word, without
	letter re-use. Smaller words is allowed.

	This is accomplished by moving through the word list and
	comparing it to the base letters count dict. If the letter
	from a word X is not in the dict, then the word cannot
	exist; return False. If the letter is in the dict,
	decrement the count. If the count is less than 0, we have
	used that letter too often and thus the word cannot exist;
	return False. If we make it through this process, then the
	word must be possible with our available letters, and we
	return True.
	"""
	for letter in word:
	if letter not in base_letters:
	return False
	else:
	base_letters[letter] -= 1

	if base_letters[letter] < 0:
	return False

	return True


	def jumble_solve(filename: str, base_word: str) -> List[str]:
	"""
	jumble_solve scans through the given wordlist and
	determines what words are possible out of it. It
	solves the jumble while loading the word list as
	opposed to my other solution, which creates a
	reusable graph approach. For single use or longer
	jumbles, this is faster.
	"""

	# Get base letter count
	base_letters = build_base_letter_count(base_word)

	# Load our word list
	words: List[str] = []
	with open(filename, "r") as file:
	reader = csv.reader(file)

	for row in reader:
	word = row[0]

	# Note - dict({}) is a performant way to copy a
	# shallow dict, since we don't want to actually
	# modify the count in our scoped base_letters
	if word_composition_check(dict(base_letters), word):
	words.append(word)

	return words


	if __name__ == "__main__":
	arguments = sys.argv
	if len(arguments) < 2:
	print("Please provide a word list")
	exit(1)

	wordlist = arguments[1]

	while True:
	# Prompt the user to enter a word
	word = input("Enter a word: ")
	# Set the word to lower case, strip spaces
	word = word.lower().strip().replace(" ", "")

	# Search for all possible words and print them
	# sorted by length, largest first
	words = jumble_solve(wordlist, word)
	words.sort(key=len, reverse=True)
	print(f"Possible words: ({len(words)})")
	print(words)
	from __future__ import annotations

	import csv
	import sys
	from typing import List, Optional, Union


	class WordList:
	def __init__(self, wordlist: Optional[List] = None):
	# Our word graph is initiated with a blank letter
	# so we can just branch off of that when searching
	self.word_graph = LetterNode("")

	if wordlist is not None:
	for word in wordlist:
	self.add_word(word)

	def add(self, word: str):
	"""
	add_word appends a word to our internal tree graph
	of possible words.
	"""
	letters: List[str] = list(word)

	current_node = self.word_graph

	for letter in letters:
	# If we have a node for this letter
	# already, utilize it
	node = current_node.get(letter)

	# If not, create a fresh node
	if node is None:
	node = LetterNode(letter)
	current_node.add(node)

	# ...and continue onto the next letter
	current_node = node

	# Mark the leaf node as terminal, in case
	# another word builds off of it. ie; do->dog
	current_node.terminal = True

	def search(self, word: str) -> List[str]:
	"""
	search accepts a word and performs a query across the
	tree starting with root node. It returns a list of all
	possible words
	"""
	return self.word_graph.search(list(word))


	class LetterNode:
	def __init__(self, letter: str, terminal: bool = False) -> None:
	self.letter = letter
	self.terminal = terminal
	self.nodes: List[LetterNode] = []

	def word(self) -> bool:
	"""
	word returns if a node is demarcated as a word;
	nodes can be markes as such if its a leaf node
	or if its a marked "terminal", aka the end of
	a smaller word ie: sad->sadder - sad is a word
	despite not being a leaf node, as many words
	build off of those three letters.
	"""
	return len(self.nodes) == 0 or self.terminal

	def add(self, node: LetterNode):
	"""
	add appends a node to our current node
	"""
	self.nodes.append(node)

	def get(self, letter: Union[LetterNode, str]) -> Optional[LetterNode]:
	"""
	get returns the requested node (be it a node or a string
	letter) if it exists, or None otherwise.
	"""
	for node in self.nodes:
	if isinstance(letter, LetterNode):
	if node == letter:
	return node
	elif isinstance(letter, str):
	if node.letter == letter:
	return node
	else:
	raise TypeError("node must be of type LetterNode or str")

	return None

	def search(self, letters: List[str]) -> List[str]:
	"""
	search accepts a set of letters and then performs a query
	down our node's branches for words that utilize those
	letters. To do this, we traverse each possible letter
	combination recursively. Every time we meet a terminal
	(leaf or demarcated terminal) node, we have found a word.
	We then return up the recursive chain to form the words
	discovered throughout our burrowing.
	"""
	words: List[str] = []
	# If our current node is marked as a word, we can return
	# that portion of the word from just this node
	if self.word():
	words.append(self.letter)

	# We iterate through each letter available as our
	# next target levels through the tree
	for index, letter in enumerate(letters):
	current_node = self.get(letter)
	# If the current node does not exist, we must
	# skip it as there is nothing more to search
	if current_node is None:
	continue

	# Ignoring our current letter, isolate all remaining
	# letters
	remaining_letters = letters[:index] + letters[index + 1 :]

	# Search the current target node with the remaining
	# letters
	found_words = current_node.search(remaining_letters)

	# Combine words if any were returned with the current
	# node's letter.
	for word in found_words:
	word = self.letter + word
	if word not in words:
	words.append(word)

	return words

	def __str__(self) -> str:
	nodestrings = ""
	for node in self.nodes:
	nodestrings += f"({node})"
	return f"({self.letter})->[{nodestrings}]"

	def __eq__(self, other: LetterNode) -> bool:
	return self.letter == other.letter


	def load_word_list(filename: str) -> WordList:
	with open(filename, "r") as file:
	wordlist = WordList()
	reader = csv.reader(file)

	for row in reader:
	word = row[0]

	# Clean the word just in case; limit to lowercase
	# and eliminate spaces.
	word.strip().lower().replace(" ", "")

	wordlist.add(word)

	return wordlist


	if __name__ == "__main__":
	arguments = sys.argv
	if len(arguments) < 2:
	print("Please provide a word list")
	exit(1)
	else:
	wordlist = load_word_list(arguments[1])
	print("Wordlist loaded")

	while True:
	# Prompt the user to enter a word
	word = input("Enter a word: ")
	# Set the word to lower case, strip spaces
	word = word.lower().strip().replace(" ", "")

	# Search for all possible words and print them
	# sorted by length, largest first
	words = wordlist.search(word)
	words.sort(key=len, reverse=True)
	print(f"Possible words: ({len(words)})")
	print(words)