@nickstanisha
Created October 22, 2016 20:01
A notebook that finds the longest possible Scrabble word, built one letter at a time from smaller valid Scrabble words.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Scrabble Mini Riddler\n",
"[Link to original](http://fivethirtyeight.com/features/this-challenge-will-boggle-your-mind/)\n",
">What is the longest word you can build in a game of Scrabble one letter at a time? That is, starting with a valid two-letter word, how long a word can you build by playing one letter at a time on either side to form a valid three-letter word, then a valid four-letter word, and so on? (For example, HE could become THE, then THEM, then THEME, then THEMES, for a six-letter result.)\n",
"\n",
"We're going to approach this as a graph-search problem: each node is a unique word, connected by directed edges to the words reachable from it (e.g. he -> the). First, we'll need a list of English words, which I've downloaded from the TWL06 Scrabble dictionary [here](https://www.wordgamedictionary.com/twl06/download/twl06.txt)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import networkx as nx"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 178691 words in our vocabulary.\n"
]
}
],
"source": [
"words, starters = set(), set()\n",
"with open('scrabble_word_list.txt', 'r') as infile:\n",
" for line in infile:\n",
" word = line.strip()\n",
" if not word.isalpha():\n",
" continue\n",
" \n",
" if len(word) == 2:\n",
" starters.add(word)\n",
" words.add(word)\n",
"print(\"There are {} words in our vocabulary.\".format(len(words)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because every edge goes from a shorter word to a longer one, our word-search graph is a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph): no cycle can ever send the search back to a word it has already expanded, which frees us from keeping the \"visited\" set commonly seen in BFS and DFS implementations. The graph construction completes quickly (~900 ms), even with a vocabulary of nearly 180,000 words."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"alphabet = 'abcdefghijklmnopqrstuvwxyz'\n",
"g = nx.DiGraph()\n",
"\n",
"stack = [(None, s) for s in starters] # Parent, child pairs\n",
"while stack:\n",
" parent, child = stack.pop()\n",
" if parent is None:\n",
" g.add_node(child)\n",
" else:\n",
" g.add_edge(parent, child)\n",
" neighbors = [(child, c + child) for c in alphabet if c + child in words]\n",
" neighbors += [(child, child + c) for c in alphabet if child + c in words]\n",
" stack.extend(neighbors)"
]
},
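{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (added for illustration; not part of the original run), `networkx` can confirm the acyclicity claim directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Every edge goes from a shorter word to a longer word, so no cycles are possible\n",
"nx.is_directed_acyclic_graph(g)"
]
},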
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Out of 178691 words, 8331 are reachable beginning from a valid two-letter word.\n"
]
}
],
"source": [
"print(\"Out of {} words, {} are reachable beginning from a valid two-letter word.\".format(len(words), len(g)))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def find_path_to(word):\n",
" \"\"\" DFS terminating at `word`. Returns a single path to `word` \"\"\"\n",
" if word not in g:\n",
" return None\n",
" stack = [(s, []) for s in starters]\n",
" while stack:\n",
" cur, path = stack.pop()\n",
" if cur == word:\n",
" return path + [cur]\n",
" new_path = path + [cur]\n",
" neighbors = [(c + cur, new_path) for c in alphabet if c + cur in words]\n",
" neighbors += [(cur + c, new_path) for c in alphabet if cur + c in words]\n",
" stack.extend(neighbors)\n",
" return None\n",
"\n",
"def longest_words_from(word):\n",
" \"\"\" DFS starting from `word`, returns a list of the longest reachable words \"\"\"\n",
" if word not in g:\n",
" return None\n",
" longest_words = list()\n",
" stack = [word]\n",
" while stack:\n",
" cur = stack.pop()\n",
" neighbors = [c + cur for c in alphabet if c + cur in words]\n",
" neighbors += [cur + c for c in alphabet if cur + c in words]\n",
" stack.extend(neighbors)\n",
" if not neighbors:\n",
" if not longest_words or len(cur) > len(longest_words[0]):\n",
" longest_words = [cur]\n",
" elif len(cur) == len(longest_words[0]) and cur not in longest_words:\n",
" longest_words.append(cur)\n",
" return longest_words"
]
},
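{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both helpers return `None` for a word that is not in the reachable graph. A small check, added here for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# 'zzz' is not reachable from any two-letter word, so both lookups fail gracefully\n",
"print(find_path_to('zzz'), longest_words_from('zzz'))"
]
},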
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['scrapings']"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"longest_words_from('in')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['relapsers', 'glassiest', 'sheathers', 'classists', 'scrapings', 'upraisers', 'classisms']\n"
]
}
],
"source": [
"longest_words = ['a']  # seed with a dummy one-letter word so the length comparisons below work\n",
"for word in g:\n",
" if len(word) > len(longest_words[0]):\n",
" longest_words = [word]\n",
" elif len(word) == len(longest_words[0]):\n",
" longest_words.append(word)\n",
"print(longest_words)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[['la', 'lap', 'laps', 'lapse', 'elapse', 'relapse', 'relapser', 'relapsers'], ['la', 'las', 'lass', 'lassi', 'lassie', 'lassies', 'glassies', 'glassiest'], ['at', 'eat', 'eath', 'heath', 'sheath', 'sheathe', 'sheather', 'sheathers'], ['la', 'las', 'lass', 'lassi', 'lassis', 'classis', 'classist', 'classists'], ['in', 'pin', 'ping', 'aping', 'raping', 'craping', 'scraping', 'scrapings'], ['is', 'ais', 'rais', 'raise', 'raiser', 'raisers', 'praisers', 'upraisers'], ['la', 'las', 'lass', 'lassi', 'lassis', 'classis', 'classism', 'classisms']]\n"
]
}
],
"source": [
"print([find_path_to(w) for w in longest_words])"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"find_path_to('pastorales')"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#nx.write_graphml(g, 'word_search.graphml')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"adj_list = dict()\n",
"for node in g:\n",
" adj_list[node] = sorted(g.neighbors(node))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import json\n",
"with open('word_graph.json', 'w') as outfile:\n",
" json.dump(adj_list, outfile)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}