Created October 22, 2016 20:01 · Gist: nickstanisha/c419b70f2201378ade98f92aaca9d299
A notebook that finds the longest possible Scrabble word, built one letter at a time from smaller valid Scrabble words.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Scrabble Mini Riddler\n",
"[Link to original](http://fivethirtyeight.com/features/this-challenge-will-boggle-your-mind/)\n",
">What is the longest word you can build in a game of Scrabble one letter at a time? That is, starting with a valid two-letter word, how long a word can you build by playing one letter at a time on either side to form a valid three-letter word, then a valid four-letter word, and so on? (For example, HE could become THE, then THEM, then THEME, then THEMES, for a six-letter result.)\n",
"\n",
"We're going to approach this as a graph search problem, where each of our nodes is a unique word connected via directed edges to words reachable from it (e.g. he -> the). First, we'll need a list of english words, which I've downloaded from the TWL06 dictionary [here](https://www.wordgamedictionary.com/twl06/download/twl06.txt)" | |
]
},
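{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't already have the word list saved locally, one way to fetch it (assuming the download URL above is still live) is:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Download the TWL06 word list to the filename used in the next cell\n",
"import urllib.request\n",
"urllib.request.urlretrieve('https://www.wordgamedictionary.com/twl06/download/twl06.txt',\n",
"                           'scrabble_word_list.txt')"
]
},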
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import networkx as nx"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 178691 words in our vocabulary.\n"
]
}
],
"source": [
"words, starters = set(), set()\n",
"with open('scrabble_word_list.txt', 'r') as infile:\n",
"    for line in infile:\n",
"        word = line.strip()\n",
"        if not word.isalpha():\n",
"            continue\n",
"\n",
"        if len(word) == 2:\n",
"            starters.add(word)\n",
"        words.add(word)\n",
"print(\"There are {} words in our vocabulary.\".format(len(words)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using a directed graph frees us from the burden of keeping around a list of \"visited\" nodes commonly seen in BFS and DFS implementations. This is because our word search graph is a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph). This graph construction completes quickly (~900 ms), even with a vocabulary size greater than 300,000." | |
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"alphabet = 'abcdefghijklmnopqrstuvwyz'\n", | |
"g = nx.DiGraph()\n", | |
"\n", | |
"stack = [(None, s) for s in starters] # Parent, child pairs\n", | |
"while stack:\n", | |
" parent, child = stack.pop()\n", | |
" if parent is None:\n", | |
" g.add_node(child)\n", | |
" else:\n", | |
" g.add_edge(parent, child)\n", | |
" neighbors = [(child, c + child) for c in alphabet if c + child in words]\n", | |
" neighbors += [(child, child + c) for c in alphabet if child + c in words]\n", | |
" stack.extend(neighbors)" | |
] | |
}, | |
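{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since every edge connects a word to a word exactly one letter longer, the graph can never contain a cycle; networkx can confirm this directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Sanity check: the word graph should be a DAG (every edge increases word length by one)\n",
"nx.is_directed_acyclic_graph(g)"
]
},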
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Out of 178691 words, 8331 are reachable beginning from a valid two-letter word.\n"
]
}
],
"source": [
"print(\"Out of {} words, {} are reachable beginning from a valid two-letter word.\".format(len(words), len(g)))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def find_path_to(word):\n",
"    \"\"\" DFS terminating at `word`. Returns a single path to `word` \"\"\"\n",
"    if word not in g:\n",
"        return None\n",
"    stack = [(s, []) for s in starters]\n",
"    while stack:\n",
"        cur, path = stack.pop()\n",
"        if cur == word:\n",
"            return path + [cur]\n",
"        new_path = path + [cur]\n",
"        neighbors = [(c + cur, new_path) for c in alphabet if c + cur in words]\n",
"        neighbors += [(cur + c, new_path) for c in alphabet if cur + c in words]\n",
"        stack.extend(neighbors)\n",
"    return None\n",
"\n",
"def longest_words_from(word):\n",
"    \"\"\" DFS starting from `word`, returns a list of the longest reachable words \"\"\"\n",
"    if word not in g:\n",
"        return None\n",
"    longest_words = list()\n",
"    stack = [word]\n",
"    while stack:\n",
"        cur = stack.pop()\n",
"        neighbors = [c + cur for c in alphabet if c + cur in words]\n",
"        neighbors += [cur + c for c in alphabet if cur + c in words]\n",
"        stack.extend(neighbors)\n",
"        if not neighbors:\n",
"            if not longest_words or len(cur) > len(longest_words[0]):\n",
"                longest_words = [cur]\n",
"            elif len(cur) == len(longest_words[0]) and cur not in longest_words:\n",
"                longest_words.append(cur)\n",
"    return longest_words"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['scrapings']"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"longest_words_from('in')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['relapsers', 'glassiest', 'sheathers', 'classists', 'scrapings', 'upraisers', 'classisms']\n"
]
}
],
"source": [
"longest_words = ['a']\n",
"for word in g:\n",
"    if len(word) > len(longest_words[0]):\n",
"        longest_words = [word]\n",
"    elif len(word) == len(longest_words[0]):\n",
"        longest_words.append(word)\n",
"print(longest_words)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[['la', 'lap', 'laps', 'lapse', 'elapse', 'relapse', 'relapser', 'relapsers'], ['la', 'las', 'lass', 'lassi', 'lassie', 'lassies', 'glassies', 'glassiest'], ['at', 'eat', 'eath', 'heath', 'sheath', 'sheathe', 'sheather', 'sheathers'], ['la', 'las', 'lass', 'lassi', 'lassis', 'classis', 'classist', 'classists'], ['in', 'pin', 'ping', 'aping', 'raping', 'craping', 'scraping', 'scrapings'], ['is', 'ais', 'rais', 'raise', 'raiser', 'raisers', 'praisers', 'upraisers'], ['la', 'las', 'lass', 'lassi', 'lassis', 'classis', 'classism', 'classisms']]\n"
]
}
],
"source": [
"print([find_path_to(w) for w in longest_words])"
]
},
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"find_path_to('pastorales')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"#nx.write_graphml(g, 'word_search.graphml')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"adj_list = dict()\n", | |
"for node in g:\n", | |
" adj_list[node] = sorted(g.neighbors(node))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import json\n", | |
"with open('word_graph.json', 'w') as outfile:\n", | |
" json.dump(adj_list, outfile)" | |
] | |
} | |
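,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The exported adjacency list can be loaded back into a graph later. A minimal sketch (assuming `word_graph.json` was written by the cell above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Rebuild the word graph from the saved adjacency list\n",
"with open('word_graph.json', 'r') as infile:\n",
"    adj_list = json.load(infile)\n",
"\n",
"g2 = nx.DiGraph()\n",
"g2.add_nodes_from(adj_list)  # keeps words that have no outgoing edges\n",
"for word, neighbors in adj_list.items():\n",
"    g2.add_edges_from((word, n) for n in neighbors)"
]
}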
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}