Skip to content

Instantly share code, notes, and snippets.

@markbyrne
Last active February 21, 2022 00:17
Show Gist options
  • Save markbyrne/9b128f824dca4738c5665c86513d321e to your computer and use it in GitHub Desktop.
Save markbyrne/9b128f824dca4738c5665c86513d321e to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "aea5486d",
"metadata": {},
"source": [
"##### Current as of: 20 February 2022\n",
"# WORDLE Dictionary Analysis\n",
"## Install Dependencies"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ae65bdc9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: requests in /opt/anaconda3/lib/python3.9/site-packages (2.26.0)\n",
"Requirement already satisfied: bs4 in /opt/anaconda3/lib/python3.9/site-packages (0.0.1)\n",
"Requirement already satisfied: pandas in /opt/anaconda3/lib/python3.9/site-packages (1.3.4)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/anaconda3/lib/python3.9/site-packages (from requests) (1.26.7)\n",
"Requirement already satisfied: idna<4,>=2.5 in /opt/anaconda3/lib/python3.9/site-packages (from requests) (3.2)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /opt/anaconda3/lib/python3.9/site-packages (from requests) (2.0.4)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /opt/anaconda3/lib/python3.9/site-packages (from requests) (2021.10.8)\n",
"Requirement already satisfied: beautifulsoup4 in /opt/anaconda3/lib/python3.9/site-packages (from bs4) (4.10.0)\n",
"Requirement already satisfied: python-dateutil>=2.7.3 in /opt/anaconda3/lib/python3.9/site-packages (from pandas) (2.8.2)\n",
"Requirement already satisfied: pytz>=2017.3 in /opt/anaconda3/lib/python3.9/site-packages (from pandas) (2021.3)\n",
"Requirement already satisfied: numpy>=1.17.3 in /opt/anaconda3/lib/python3.9/site-packages (from pandas) (1.20.3)\n",
"Requirement already satisfied: six>=1.5 in /opt/anaconda3/lib/python3.9/site-packages (from python-dateutil>=2.7.3->pandas) (1.16.0)\n",
"Requirement already satisfied: soupsieve>1.2 in /opt/anaconda3/lib/python3.9/site-packages (from beautifulsoup4->bs4) (2.2.1)\n"
]
}
],
"source": [
"!pip3 install requests bs4 pandas"
]
},
{
"cell_type": "markdown",
"id": "acf4a20e",
"metadata": {},
"source": [
"## Get WORDLE Dictionary Data\n",
"### Overview\n",
"WORDLE uses 2 dictionaries. A \"solutions\" dictionary of more commonly known words, and an \"other words\" dictionary, containing valid guesses, but will not be the solution.\n",
"\n",
"To grab these arrays, we parse https://www.nytimes.com/games/wordle/index.html, and scrape for all Javascript src files.\n",
"\n",
"We want the main.js file."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b5a1d0f8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All script files found:\n",
"['https://www.nytimes.com/games/wordle/main.4d41d2be.js', 'https://www.nytimes.com/games-assets/gdpr/cookie-notice-v2.1.2.min.js']\n",
"\n",
"Current main.js file is: https://www.nytimes.com/games/wordle/main.4d41d2be.js\n"
]
}
],
"source": [
"import requests\n",
"from bs4 import BeautifulSoup as bs\n",
"from urllib.parse import urljoin\n",
"\n",
"# URL of the web page you want to extract\n",
"url = \"https://www.nytimes.com/games/wordle/index.html\"\n",
"\n",
"# initialize a session\n",
"session = requests.Session()\n",
"# set the User-agent as a regular browser\n",
"session.headers[\"User-Agent\"] = \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36\"\n",
"\n",
"# get the HTML content\n",
"html = session.get(url).content\n",
"\n",
"# parse HTML using beautiful soup\n",
"soup = bs(html, \"html.parser\")\n",
"\n",
"# get the JavaScript files\n",
"script_files = []\n",
"main_script = None\n",
"\n",
"for script in soup.find_all(\"script\"):\n",
" if script.attrs.get(\"src\"):\n",
" # if the tag has the attribute 'src'\n",
" script_url = urljoin(url, script.attrs.get(\"src\"))\n",
" script_files.append(script_url)\n",
" if \"main\" in script_url:\n",
" main_script = script_url\n",
" \n",
"main = session.get(main_script).content.decode()\n",
"\n",
"print(f\"All script files found:\\n{script_files}\")\n",
"print()\n",
"print(f\"Current main.js file is: {main_script}\")"
]
},
{
"cell_type": "markdown",
"id": "1c6f4309",
"metadata": {},
"source": [
"### Parse the .js file\n",
"Now that we have found the .js file, we need to parse through to grab the dictionary data.\n",
"\n",
"The 'solutions' dictionary is set to:\n",
"```javascript \n",
"var Ma=[]\n",
"```\n",
"and the 'other words' dictionary is set to:\n",
"```javascript \n",
"var Oa=[]\n",
"```\n",
"We will use `re.search()` with the regex `var Ma=\\[(.*?)\\]` to grab the solutions dictionary, and `Oa=\\[(.*?)\\]`to grab the other words."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1c6be020",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Solutions: 2309\n",
"Other valid words: 10638\n",
"Full dictionary words: 12947\n"
]
}
],
"source": [
"import re\n",
"\n",
"# get solutions\n",
"match = re.search(\"var Ma=\\[(.*?)\\]\",main)\n",
"solutions = match.groups()[0].replace(\"\\\"\",\"\").split(\",\")\n",
"print(f\"Solutions: {len(solutions)}\")\n",
"\n",
"# get other valid words\n",
"match = re.search(\"Oa=\\[(.*?)\\]\",main)\n",
"other_words = match.groups()[0].replace(\"\\\"\",\"\").split(\",\")\n",
"print(f\"Other valid words: {len(other_words)}\")\n",
"\n",
"# build full dictionary for later use\n",
"dictionary = other_words + solutions\n",
"print(f\"Full dictionary words: {len(dictionary)}\")"
]
},
{
"cell_type": "markdown",
"id": "fda0473f",
"metadata": {},
"source": [
"NOTE: Solutions dictionary is only 2,309 words, where as the entire search space of valid words is 12,947 words. \n",
"\n",
"For the purposes of this analysis, we will only use the solution words.\n",
"\n",
"## Build the DataFrames\n",
"Next, we will place the data into a pandas DataFrame and start our analysis."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "52b0e845",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" </tr>\n",
" <tr>\n",
" <th>Words</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>cigar</th>\n",
" </tr>\n",
" <tr>\n",
" <th>rebut</th>\n",
" </tr>\n",
" <tr>\n",
" <th>sissy</th>\n",
" </tr>\n",
" <tr>\n",
" <th>humph</th>\n",
" </tr>\n",
" <tr>\n",
" <th>awake</th>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: []\n",
"Index: [cigar, rebut, sissy, humph, awake]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"solutions_df = pd.DataFrame(data=solutions, columns=[\"Words\"])\n",
"solutions_df.set_index(\"Words\", inplace=True)\n",
"solutions_df.head()"
]
},
{
"cell_type": "markdown",
"id": "d19a5eb4",
"metadata": {},
"source": [
"### Determine Letter Frequency\n",
"To determine which letters appear most frequent, we will combine all our words and count the letters.\n",
"\n",
"We give each letter a score of 26-1, from most frequent to least frequent."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6545ae13",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Frequency</th>\n",
" <th>Score</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Letters</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>e</th>\n",
" <td>1230</td>\n",
" <td>26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>a</th>\n",
" <td>975</td>\n",
" <td>25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>r</th>\n",
" <td>897</td>\n",
" <td>24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>o</th>\n",
" <td>753</td>\n",
" <td>23</td>\n",
" </tr>\n",
" <tr>\n",
" <th>t</th>\n",
" <td>729</td>\n",
" <td>22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>l</th>\n",
" <td>716</td>\n",
" <td>21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>i</th>\n",
" <td>670</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s</th>\n",
" <td>668</td>\n",
" <td>19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>n</th>\n",
" <td>573</td>\n",
" <td>18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>c</th>\n",
" <td>475</td>\n",
" <td>17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>u</th>\n",
" <td>466</td>\n",
" <td>16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>y</th>\n",
" <td>424</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>d</th>\n",
" <td>393</td>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>h</th>\n",
" <td>387</td>\n",
" <td>13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>p</th>\n",
" <td>365</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>m</th>\n",
" <td>316</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>g</th>\n",
" <td>310</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>b</th>\n",
" <td>280</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>f</th>\n",
" <td>229</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>k</th>\n",
" <td>210</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>w</th>\n",
" <td>194</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v</th>\n",
" <td>152</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>z</th>\n",
" <td>40</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>x</th>\n",
" <td>37</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>q</th>\n",
" <td>29</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>j</th>\n",
" <td>27</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Frequency Score\n",
"Letters \n",
"e 1230 26\n",
"a 975 25\n",
"r 897 24\n",
"o 753 23\n",
"t 729 22\n",
"l 716 21\n",
"i 670 20\n",
"s 668 19\n",
"n 573 18\n",
"c 475 17\n",
"u 466 16\n",
"y 424 15\n",
"d 393 14\n",
"h 387 13\n",
"p 365 12\n",
"m 316 11\n",
"g 310 10\n",
"b 280 9\n",
"f 229 8\n",
"k 210 7\n",
"w 194 6\n",
"v 152 5\n",
"z 40 4\n",
"x 37 3\n",
"q 29 2\n",
"j 27 1"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from collections import Counter\n",
"letter_counts = Counter(\"\".join(solutions_df.index))\n",
"letter_frequency_df = pd.DataFrame(data=letter_counts.most_common(26), columns=[\"Letters\",\"Frequency\"])\n",
"letter_frequency_df[\"Score\"] = range(26,0,-1)\n",
"letter_frequency_df.set_index(\"Letters\", inplace=True)\n",
"letter_frequency_df"
]
},
{
"cell_type": "markdown",
"id": "5b08c37c",
"metadata": {},
"source": [
"## Score the Words\n",
"Now, we iterate through each word, and count up the score of each distinct letter. We are trying to find the words that will give us the best opportunity of scoring yellow tiles on our first guess, so it is more beneficial to favor words without repeat letters. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ac6a6fbf",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Score</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Words</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alert</th>\n",
" <td>118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>alter</th>\n",
" <td>118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>later</th>\n",
" <td>118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>arose</th>\n",
" <td>117</td>\n",
" </tr>\n",
" <tr>\n",
" <th>irate</th>\n",
" <td>117</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stare</th>\n",
" <td>116</td>\n",
" </tr>\n",
" <tr>\n",
" <th>arise</th>\n",
" <td>114</td>\n",
" </tr>\n",
" <tr>\n",
" <th>atone</th>\n",
" <td>114</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cater</th>\n",
" <td>114</td>\n",
" </tr>\n",
" <tr>\n",
" <th>crate</th>\n",
" <td>114</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Score\n",
"Words \n",
"alert 118\n",
"alter 118\n",
"later 118\n",
"arose 117\n",
"irate 117\n",
"stare 116\n",
"arise 114\n",
"atone 114\n",
"cater 114\n",
"crate 114"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Find best words by letter frequency\n",
"solutions_df[\"Score\"] = 0\n",
"for word in solutions_df.index:\n",
" for letter in list(set(word)):\n",
" solutions_df.at[word,'Score'] += letter_frequency_df.loc[letter,'Score']\n",
"solutions_df.sort_values(by=['Score','Words'],inplace=True, ascending=[False,True])\n",
"\n",
"solutions_df.head(10)"
]
},
{
"cell_type": "markdown",
"id": "0b0f7e0a",
"metadata": {},
"source": [
"We find that `alert`, `alter`, and `later` tie for best word by letter frequency. That is pretty good, but the truly optimal guess would take letter position into account as well, to help boost our chances of hitting a coveted green tile. Let's see what we can do about that.\n",
"## Build Out Letter Frequency by Position\n",
"Next, let's re-count our letters, but this time we will keep track of what position they were in the word."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d76506ae",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Frequency</th>\n",
" <th>Score</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Letters</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>e</th>\n",
" <td>1230</td>\n",
" <td>26</td>\n",
" <td>111</td>\n",
" <td>320</td>\n",
" <td>208</td>\n",
" <td>244</td>\n",
" <td>347</td>\n",
" </tr>\n",
" <tr>\n",
" <th>a</th>\n",
" <td>975</td>\n",
" <td>25</td>\n",
" <td>171</td>\n",
" <td>336</td>\n",
" <td>294</td>\n",
" <td>136</td>\n",
" <td>38</td>\n",
" </tr>\n",
" <tr>\n",
" <th>r</th>\n",
" <td>897</td>\n",
" <td>24</td>\n",
" <td>130</td>\n",
" <td>283</td>\n",
" <td>178</td>\n",
" <td>132</td>\n",
" <td>174</td>\n",
" </tr>\n",
" <tr>\n",
" <th>o</th>\n",
" <td>753</td>\n",
" <td>23</td>\n",
" <td>46</td>\n",
" <td>321</td>\n",
" <td>248</td>\n",
" <td>92</td>\n",
" <td>46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>t</th>\n",
" <td>729</td>\n",
" <td>22</td>\n",
" <td>183</td>\n",
" <td>87</td>\n",
" <td>119</td>\n",
" <td>111</td>\n",
" <td>229</td>\n",
" </tr>\n",
" <tr>\n",
" <th>l</th>\n",
" <td>716</td>\n",
" <td>21</td>\n",
" <td>98</td>\n",
" <td>206</td>\n",
" <td>134</td>\n",
" <td>156</td>\n",
" <td>122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>i</th>\n",
" <td>670</td>\n",
" <td>20</td>\n",
" <td>40</td>\n",
" <td>216</td>\n",
" <td>263</td>\n",
" <td>143</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s</th>\n",
" <td>668</td>\n",
" <td>19</td>\n",
" <td>381</td>\n",
" <td>20</td>\n",
" <td>83</td>\n",
" <td>170</td>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>n</th>\n",
" <td>573</td>\n",
" <td>18</td>\n",
" <td>44</td>\n",
" <td>96</td>\n",
" <td>137</td>\n",
" <td>172</td>\n",
" <td>124</td>\n",
" </tr>\n",
" <tr>\n",
" <th>c</th>\n",
" <td>475</td>\n",
" <td>17</td>\n",
" <td>225</td>\n",
" <td>41</td>\n",
" <td>51</td>\n",
" <td>132</td>\n",
" <td>26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>u</th>\n",
" <td>466</td>\n",
" <td>16</td>\n",
" <td>37</td>\n",
" <td>191</td>\n",
" <td>163</td>\n",
" <td>74</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>y</th>\n",
" <td>424</td>\n",
" <td>15</td>\n",
" <td>6</td>\n",
" <td>24</td>\n",
" <td>35</td>\n",
" <td>3</td>\n",
" <td>356</td>\n",
" </tr>\n",
" <tr>\n",
" <th>d</th>\n",
" <td>393</td>\n",
" <td>14</td>\n",
" <td>121</td>\n",
" <td>23</td>\n",
" <td>79</td>\n",
" <td>56</td>\n",
" <td>114</td>\n",
" </tr>\n",
" <tr>\n",
" <th>h</th>\n",
" <td>387</td>\n",
" <td>13</td>\n",
" <td>76</td>\n",
" <td>147</td>\n",
" <td>9</td>\n",
" <td>28</td>\n",
" <td>127</td>\n",
" </tr>\n",
" <tr>\n",
" <th>p</th>\n",
" <td>365</td>\n",
" <td>12</td>\n",
" <td>152</td>\n",
" <td>64</td>\n",
" <td>54</td>\n",
" <td>41</td>\n",
" <td>54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>m</th>\n",
" <td>316</td>\n",
" <td>11</td>\n",
" <td>120</td>\n",
" <td>38</td>\n",
" <td>61</td>\n",
" <td>59</td>\n",
" <td>38</td>\n",
" </tr>\n",
" <tr>\n",
" <th>g</th>\n",
" <td>310</td>\n",
" <td>10</td>\n",
" <td>119</td>\n",
" <td>12</td>\n",
" <td>73</td>\n",
" <td>67</td>\n",
" <td>39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>b</th>\n",
" <td>280</td>\n",
" <td>9</td>\n",
" <td>179</td>\n",
" <td>18</td>\n",
" <td>58</td>\n",
" <td>16</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>f</th>\n",
" <td>229</td>\n",
" <td>8</td>\n",
" <td>139</td>\n",
" <td>11</td>\n",
" <td>24</td>\n",
" <td>42</td>\n",
" <td>13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>k</th>\n",
" <td>210</td>\n",
" <td>7</td>\n",
" <td>26</td>\n",
" <td>12</td>\n",
" <td>12</td>\n",
" <td>53</td>\n",
" <td>107</td>\n",
" </tr>\n",
" <tr>\n",
" <th>w</th>\n",
" <td>194</td>\n",
" <td>6</td>\n",
" <td>83</td>\n",
" <td>44</td>\n",
" <td>26</td>\n",
" <td>25</td>\n",
" <td>16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v</th>\n",
" <td>152</td>\n",
" <td>5</td>\n",
" <td>46</td>\n",
" <td>15</td>\n",
" <td>49</td>\n",
" <td>42</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>z</th>\n",
" <td>40</td>\n",
" <td>4</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>16</td>\n",
" <td>15</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>x</th>\n",
" <td>37</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>14</td>\n",
" <td>12</td>\n",
" <td>3</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>q</th>\n",
" <td>29</td>\n",
" <td>2</td>\n",
" <td>23</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>j</th>\n",
" <td>27</td>\n",
" <td>1</td>\n",
" <td>20</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Frequency Score 1 2 3 4 5\n",
"Letters \n",
"e 1230 26 111 320 208 244 347\n",
"a 975 25 171 336 294 136 38\n",
"r 897 24 130 283 178 132 174\n",
"o 753 23 46 321 248 92 46\n",
"t 729 22 183 87 119 111 229\n",
"l 716 21 98 206 134 156 122\n",
"i 670 20 40 216 263 143 8\n",
"s 668 19 381 20 83 170 14\n",
"n 573 18 44 96 137 172 124\n",
"c 475 17 225 41 51 132 26\n",
"u 466 16 37 191 163 74 1\n",
"y 424 15 6 24 35 3 356\n",
"d 393 14 121 23 79 56 114\n",
"h 387 13 76 147 9 28 127\n",
"p 365 12 152 64 54 41 54\n",
"m 316 11 120 38 61 59 38\n",
"g 310 10 119 12 73 67 39\n",
"b 280 9 179 18 58 16 9\n",
"f 229 8 139 11 24 42 13\n",
"k 210 7 26 12 12 53 107\n",
"w 194 6 83 44 26 25 16\n",
"v 152 5 46 15 49 42 0\n",
"z 40 4 3 2 16 15 4\n",
"x 37 3 0 14 12 3 8\n",
"q 29 2 23 5 1 0 0\n",
"j 27 1 20 2 3 2 0"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# copy letter_frequency_df\n",
"letter_position_df = letter_frequency_df.copy()\n",
"\n",
"# add 5 new columns, set data to 0\n",
"letter_position_df[[1,2,3,4,5]] = 0\n",
"\n",
"# count each letter and note position\n",
"for word in solutions_df.index:\n",
" for letter in word:\n",
" letter_position_df.at[letter,word.index(letter)+1] += 1\n",
"\n",
"letter_position_df"
]
},
{
"cell_type": "markdown",
"id": "dc8d7bdf",
"metadata": {},
"source": [
"## Normalize and Create a New Score Matrix\n",
"Next, we will normalize the letter counts we just found by dividing each position by the overall frequency of that letter. Following the normalization, we multiply each position by the overall letter score. Now, we have a matrix that favors more frequent words in more frequent positions when we rescore our words."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "308ace95",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Frequency</th>\n",
" <th>Score</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Letters</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>e</th>\n",
" <td>1230</td>\n",
" <td>26</td>\n",
" <td>2.346341</td>\n",
" <td>6.764228</td>\n",
" <td>4.396748</td>\n",
" <td>5.157724</td>\n",
" <td>7.334959</td>\n",
" </tr>\n",
" <tr>\n",
" <th>a</th>\n",
" <td>975</td>\n",
" <td>25</td>\n",
" <td>4.384615</td>\n",
" <td>8.615385</td>\n",
" <td>7.538462</td>\n",
" <td>3.487179</td>\n",
" <td>0.974359</td>\n",
" </tr>\n",
" <tr>\n",
" <th>r</th>\n",
" <td>897</td>\n",
" <td>24</td>\n",
" <td>3.478261</td>\n",
" <td>7.571906</td>\n",
" <td>4.762542</td>\n",
" <td>3.531773</td>\n",
" <td>4.655518</td>\n",
" </tr>\n",
" <tr>\n",
" <th>o</th>\n",
" <td>753</td>\n",
" <td>23</td>\n",
" <td>1.405046</td>\n",
" <td>9.804781</td>\n",
" <td>7.575033</td>\n",
" <td>2.810093</td>\n",
" <td>1.405046</td>\n",
" </tr>\n",
" <tr>\n",
" <th>t</th>\n",
" <td>729</td>\n",
" <td>22</td>\n",
" <td>5.522634</td>\n",
" <td>2.625514</td>\n",
" <td>3.591221</td>\n",
" <td>3.349794</td>\n",
" <td>6.910837</td>\n",
" </tr>\n",
" <tr>\n",
" <th>l</th>\n",
" <td>716</td>\n",
" <td>21</td>\n",
" <td>2.874302</td>\n",
" <td>6.041899</td>\n",
" <td>3.930168</td>\n",
" <td>4.575419</td>\n",
" <td>3.578212</td>\n",
" </tr>\n",
" <tr>\n",
" <th>i</th>\n",
" <td>670</td>\n",
" <td>20</td>\n",
" <td>1.194030</td>\n",
" <td>6.447761</td>\n",
" <td>7.850746</td>\n",
" <td>4.268657</td>\n",
" <td>0.238806</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s</th>\n",
" <td>668</td>\n",
" <td>19</td>\n",
" <td>10.836826</td>\n",
" <td>0.568862</td>\n",
" <td>2.360778</td>\n",
" <td>4.835329</td>\n",
" <td>0.398204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>n</th>\n",
" <td>573</td>\n",
" <td>18</td>\n",
" <td>1.382199</td>\n",
" <td>3.015707</td>\n",
" <td>4.303665</td>\n",
" <td>5.403141</td>\n",
" <td>3.895288</td>\n",
" </tr>\n",
" <tr>\n",
" <th>c</th>\n",
" <td>475</td>\n",
" <td>17</td>\n",
" <td>8.052632</td>\n",
" <td>1.467368</td>\n",
" <td>1.825263</td>\n",
" <td>4.724211</td>\n",
" <td>0.930526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>u</th>\n",
" <td>466</td>\n",
" <td>16</td>\n",
" <td>1.270386</td>\n",
" <td>6.557940</td>\n",
" <td>5.596567</td>\n",
" <td>2.540773</td>\n",
" <td>0.034335</td>\n",
" </tr>\n",
" <tr>\n",
" <th>y</th>\n",
" <td>424</td>\n",
" <td>15</td>\n",
" <td>0.212264</td>\n",
" <td>0.849057</td>\n",
" <td>1.238208</td>\n",
" <td>0.106132</td>\n",
" <td>12.594340</td>\n",
" </tr>\n",
" <tr>\n",
" <th>d</th>\n",
" <td>393</td>\n",
" <td>14</td>\n",
" <td>4.310433</td>\n",
" <td>0.819338</td>\n",
" <td>2.814249</td>\n",
" <td>1.994911</td>\n",
" <td>4.061069</td>\n",
" </tr>\n",
" <tr>\n",
" <th>h</th>\n",
" <td>387</td>\n",
" <td>13</td>\n",
" <td>2.552972</td>\n",
" <td>4.937984</td>\n",
" <td>0.302326</td>\n",
" <td>0.940568</td>\n",
" <td>4.266150</td>\n",
" </tr>\n",
" <tr>\n",
" <th>p</th>\n",
" <td>365</td>\n",
" <td>12</td>\n",
" <td>4.997260</td>\n",
" <td>2.104110</td>\n",
" <td>1.775342</td>\n",
" <td>1.347945</td>\n",
" <td>1.775342</td>\n",
" </tr>\n",
" <tr>\n",
" <th>m</th>\n",
" <td>316</td>\n",
" <td>11</td>\n",
" <td>4.177215</td>\n",
" <td>1.322785</td>\n",
" <td>2.123418</td>\n",
" <td>2.053797</td>\n",
" <td>1.322785</td>\n",
" </tr>\n",
" <tr>\n",
" <th>g</th>\n",
" <td>310</td>\n",
" <td>10</td>\n",
" <td>3.838710</td>\n",
" <td>0.387097</td>\n",
" <td>2.354839</td>\n",
" <td>2.161290</td>\n",
" <td>1.258065</td>\n",
" </tr>\n",
" <tr>\n",
" <th>b</th>\n",
" <td>280</td>\n",
" <td>9</td>\n",
" <td>5.753571</td>\n",
" <td>0.578571</td>\n",
" <td>1.864286</td>\n",
" <td>0.514286</td>\n",
" <td>0.289286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>f</th>\n",
" <td>229</td>\n",
" <td>8</td>\n",
" <td>4.855895</td>\n",
" <td>0.384279</td>\n",
" <td>0.838428</td>\n",
" <td>1.467249</td>\n",
" <td>0.454148</td>\n",
" </tr>\n",
" <tr>\n",
" <th>k</th>\n",
" <td>210</td>\n",
" <td>7</td>\n",
" <td>0.866667</td>\n",
" <td>0.400000</td>\n",
" <td>0.400000</td>\n",
" <td>1.766667</td>\n",
" <td>3.566667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>w</th>\n",
" <td>194</td>\n",
" <td>6</td>\n",
" <td>2.567010</td>\n",
" <td>1.360825</td>\n",
" <td>0.804124</td>\n",
" <td>0.773196</td>\n",
" <td>0.494845</td>\n",
" </tr>\n",
" <tr>\n",
" <th>v</th>\n",
" <td>152</td>\n",
" <td>5</td>\n",
" <td>1.513158</td>\n",
" <td>0.493421</td>\n",
" <td>1.611842</td>\n",
" <td>1.381579</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>z</th>\n",
" <td>40</td>\n",
" <td>4</td>\n",
" <td>0.300000</td>\n",
" <td>0.200000</td>\n",
" <td>1.600000</td>\n",
" <td>1.500000</td>\n",
" <td>0.400000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>x</th>\n",
" <td>37</td>\n",
" <td>3</td>\n",
" <td>0.000000</td>\n",
" <td>1.135135</td>\n",
" <td>0.972973</td>\n",
" <td>0.243243</td>\n",
" <td>0.648649</td>\n",
" </tr>\n",
" <tr>\n",
" <th>q</th>\n",
" <td>29</td>\n",
" <td>2</td>\n",
" <td>1.586207</td>\n",
" <td>0.344828</td>\n",
" <td>0.068966</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>j</th>\n",
" <td>27</td>\n",
" <td>1</td>\n",
" <td>0.740741</td>\n",
" <td>0.074074</td>\n",
" <td>0.111111</td>\n",
" <td>0.074074</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Frequency Score 1 2 3 4 5\n",
"Letters \n",
"e 1230 26 2.346341 6.764228 4.396748 5.157724 7.334959\n",
"a 975 25 4.384615 8.615385 7.538462 3.487179 0.974359\n",
"r 897 24 3.478261 7.571906 4.762542 3.531773 4.655518\n",
"o 753 23 1.405046 9.804781 7.575033 2.810093 1.405046\n",
"t 729 22 5.522634 2.625514 3.591221 3.349794 6.910837\n",
"l 716 21 2.874302 6.041899 3.930168 4.575419 3.578212\n",
"i 670 20 1.194030 6.447761 7.850746 4.268657 0.238806\n",
"s 668 19 10.836826 0.568862 2.360778 4.835329 0.398204\n",
"n 573 18 1.382199 3.015707 4.303665 5.403141 3.895288\n",
"c 475 17 8.052632 1.467368 1.825263 4.724211 0.930526\n",
"u 466 16 1.270386 6.557940 5.596567 2.540773 0.034335\n",
"y 424 15 0.212264 0.849057 1.238208 0.106132 12.594340\n",
"d 393 14 4.310433 0.819338 2.814249 1.994911 4.061069\n",
"h 387 13 2.552972 4.937984 0.302326 0.940568 4.266150\n",
"p 365 12 4.997260 2.104110 1.775342 1.347945 1.775342\n",
"m 316 11 4.177215 1.322785 2.123418 2.053797 1.322785\n",
"g 310 10 3.838710 0.387097 2.354839 2.161290 1.258065\n",
"b 280 9 5.753571 0.578571 1.864286 0.514286 0.289286\n",
"f 229 8 4.855895 0.384279 0.838428 1.467249 0.454148\n",
"k 210 7 0.866667 0.400000 0.400000 1.766667 3.566667\n",
"w 194 6 2.567010 1.360825 0.804124 0.773196 0.494845\n",
"v 152 5 1.513158 0.493421 1.611842 1.381579 0.000000\n",
"z 40 4 0.300000 0.200000 1.600000 1.500000 0.400000\n",
"x 37 3 0.000000 1.135135 0.972973 0.243243 0.648649\n",
"q 29 2 1.586207 0.344828 0.068966 0.000000 0.000000\n",
"j 27 1 0.740741 0.074074 0.111111 0.074074 0.000000"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# normalize the data\n",
"letter_position_df[[1,2,3,4,5]] = letter_position_df[[1,2,3,4,5]].div(letter_position_df[\"Frequency\"], axis=0)\n",
"# multiply by the letter score (favoring the more frequent letters)\n",
"letter_position_df[[1,2,3,4,5]] = letter_position_df[[1,2,3,4,5]].multiply(letter_position_df[\"Score\"], axis=0)\n",
"letter_position_df"
]
},
{
"cell_type": "markdown",
"id": "0761a6d1",
"metadata": {},
"source": [
"## Re-Score Words\n",
"We will now be re-scoring our words by adding up our letter values from our new letter frequency by position matrix."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0b495a8b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Best Starting Word By Letter Location\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Score</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Words</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>sassy</th>\n",
" <td>53.720203</td>\n",
" </tr>\n",
" <tr>\n",
" <th>sissy</th>\n",
" <td>51.552580</td>\n",
" </tr>\n",
" <tr>\n",
" <th>sooty</th>\n",
" <td>46.390522</td>\n",
" </tr>\n",
" <tr>\n",
" <th>booby</th>\n",
" <td>43.711044</td>\n",
" </tr>\n",
" <tr>\n",
" <th>salsa</th>\n",
" <td>42.834590</td>\n",
" </tr>\n",
" <tr>\n",
" <th>sorry</th>\n",
" <td>42.761030</td>\n",
" </tr>\n",
" <tr>\n",
" <th>saucy</th>\n",
" <td>42.367328</td>\n",
" </tr>\n",
" <tr>\n",
" <th>soapy</th>\n",
" <td>42.122354</td>\n",
" </tr>\n",
" <tr>\n",
" <th>shiny</th>\n",
" <td>41.623038</td>\n",
" </tr>\n",
" <tr>\n",
" <th>booty</th>\n",
" <td>41.307267</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Score\n",
"Words \n",
"sassy 53.720203\n",
"sissy 51.552580\n",
"sooty 46.390522\n",
"booby 43.711044\n",
"salsa 42.834590\n",
"sorry 42.761030\n",
"saucy 42.367328\n",
"soapy 42.122354\n",
"shiny 41.623038\n",
"booty 41.307267"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Score words\n",
"# make a copy of the solutions df\n",
"solution_df2 = solutions_df.copy()\n",
"solution_df2[\"Score\"] = 0.0\n",
"\n",
"for word in solution_df2.index:\n",
" for letter in word:\n",
" solution_df2.at[word, \"Score\"] += letter_position_df.at[letter,word.index(letter)+1]\n",
"\n",
"solution_df2.sort_values(by=['Score','Words'],inplace=True, ascending=[False,True])\n",
"print(\"Best Starting Word By Letter Location\")\n",
"solution_df2.head(10)"
]
},
{
"cell_type": "markdown",
"id": "fd9d042e",
"metadata": {},
"source": [
"Now we have a list of the words most likely to get us a green tile on our first guess. These aren't really great guesses though, because they tend to repeat letters (i.e. sassy has strong odds of yielding a green tile, but likely not much else). \n",
"\n",
"Let's combine our results and see if we can't get a more optimal solution.\n",
"## Optimal Solution"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "cbe5f1d9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimal Starting Words\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Score</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Words</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>arose</th>\n",
" <td>148.701844</td>\n",
" </tr>\n",
" <tr>\n",
" <th>slate</th>\n",
" <td>148.101941</td>\n",
" </tr>\n",
" <tr>\n",
" <th>teary</th>\n",
" <td>147.951435</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stare</th>\n",
" <td>147.867534</td>\n",
" </tr>\n",
" <tr>\n",
" <th>crate</th>\n",
" <td>147.847753</td>\n",
" </tr>\n",
" <tr>\n",
" <th>trace</th>\n",
" <td>146.692172</td>\n",
" </tr>\n",
" <tr>\n",
" <th>raise</th>\n",
" <td>146.114680</td>\n",
" </tr>\n",
" <tr>\n",
" <th>arise</th>\n",
" <td>145.977557</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stale</th>\n",
" <td>145.911181</td>\n",
" </tr>\n",
" <tr>\n",
" <th>store</th>\n",
" <td>145.904106</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Score\n",
"Words \n",
"arose 148.701844\n",
"slate 148.101941\n",
"teary 147.951435\n",
"stare 147.867534\n",
"crate 147.847753\n",
"trace 146.692172\n",
"raise 146.114680\n",
"arise 145.977557\n",
"stale 145.911181\n",
"store 145.904106"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"final_df = solutions_df + solution_df2\n",
"\n",
"final_df.sort_values(by=['Score','Words'],inplace=True, ascending=[False,True])\n",
"print(\"Optimal Starting Words\")\n",
"final_df.head(10)"
]
},
{
"cell_type": "markdown",
"id": "2afd33b7",
"metadata": {},
"source": [
"## Clear Up Ambiguity from Letter Frequency Alone Results\n",
"Let's find `alert`, `alter`, and `later` from our first 'letter frequency alone' strategy and see if we can't clear up the 118 score tie."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "80c6931a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Score</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Words</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>alert</th>\n",
" <td>143.265872</td>\n",
" </tr>\n",
" <tr>\n",
" <th>alter</th>\n",
" <td>141.830978</td>\n",
" </tr>\n",
" <tr>\n",
" <th>later</th>\n",
" <td>142.894149</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Score\n",
"Words \n",
"alert 143.265872\n",
"alter 141.830978\n",
"later 142.894149"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"final_df.loc[['alert','alter','later']]"
]
},
{
"cell_type": "markdown",
"id": "7596c612",
"metadata": {},
"source": [
"There you have it, if you prefer a letter frequency strategy, `alert` gets the bump.\n",
"\n",
"Let's wrap this thing up.\n",
"# Conclusion\n",
"### Optimal Start Word\n",
"Arose\n",
"### Best Start Word by Letter Frequency Only\n",
"Alert\n",
"### Best Start Word by Letter Frequency in a Specific Position Only\n",
"Sassy"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment