markbyrne/wordle_analysis.ipynb

## wordle_analysis.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "aea5486d",
   "metadata": {},
   "source": [
    "##### Current as of: 20 February 2022\n",
    "# WORDLE Dictionary Analysis\n",
    "## Install Dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ae65bdc9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: requests in /opt/anaconda3/lib/python3.9/site-packages (2.26.0)\n",
      "Requirement already satisfied: bs4 in /opt/anaconda3/lib/python3.9/site-packages (0.0.1)\n",
      "Requirement already satisfied: pandas in /opt/anaconda3/lib/python3.9/site-packages (1.3.4)\n",
      "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/anaconda3/lib/python3.9/site-packages (from requests) (1.26.7)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /opt/anaconda3/lib/python3.9/site-packages (from requests) (3.2)\n",
      "Requirement already satisfied: charset-normalizer~=2.0.0 in /opt/anaconda3/lib/python3.9/site-packages (from requests) (2.0.4)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /opt/anaconda3/lib/python3.9/site-packages (from requests) (2021.10.8)\n",
      "Requirement already satisfied: beautifulsoup4 in /opt/anaconda3/lib/python3.9/site-packages (from bs4) (4.10.0)\n",
      "Requirement already satisfied: python-dateutil>=2.7.3 in /opt/anaconda3/lib/python3.9/site-packages (from pandas) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2017.3 in /opt/anaconda3/lib/python3.9/site-packages (from pandas) (2021.3)\n",
      "Requirement already satisfied: numpy>=1.17.3 in /opt/anaconda3/lib/python3.9/site-packages (from pandas) (1.20.3)\n",
      "Requirement already satisfied: six>=1.5 in /opt/anaconda3/lib/python3.9/site-packages (from python-dateutil>=2.7.3->pandas) (1.16.0)\n",
      "Requirement already satisfied: soupsieve>1.2 in /opt/anaconda3/lib/python3.9/site-packages (from beautifulsoup4->bs4) (2.2.1)\n"
     ]
    }
   ],
   "source": [
    "!pip3 install requests bs4 pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acf4a20e",
   "metadata": {},
   "source": [
    "## Get WORDLE Dictionary Data\n",
    "### Overview\n",
    "WORDLE uses 2 dictionaries. A \"solutions\" dictionary of more commonly known words, and an \"other words\" dictionary, containing valid guesses, but will not be the solution.\n",
    "\n",
    "To grab these arrays, we parse https://www.nytimes.com/games/wordle/index.html, and scrape for all Javascript src files.\n",
    "\n",
    "We want the main.js file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "b5a1d0f8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All script files found:\n",
      "['https://www.nytimes.com/games/wordle/main.4d41d2be.js', 'https://www.nytimes.com/games-assets/gdpr/cookie-notice-v2.1.2.min.js']\n",
      "\n",
      "Current main.js file is: https://www.nytimes.com/games/wordle/main.4d41d2be.js\n"
     ]
    }
   ],
   "source": [
    "import requests\n",
    "from bs4 import BeautifulSoup as bs\n",
    "from urllib.parse import urljoin\n",
    "\n",
    "# URL of the web page you want to extract\n",
    "url = \"https://www.nytimes.com/games/wordle/index.html\"\n",
    "\n",
    "# initialize a session\n",
    "session = requests.Session()\n",
    "# set the User-agent as a regular browser\n",
    "session.headers[\"User-Agent\"] = \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36\"\n",
    "\n",
    "# get the HTML content\n",
    "html = session.get(url).content\n",
    "\n",
    "# parse HTML using beautiful soup\n",
    "soup = bs(html, \"html.parser\")\n",
    "\n",
    "# get the JavaScript files\n",
    "script_files = []\n",
    "main_script = None\n",
    "\n",
    "for script in soup.find_all(\"script\"):\n",
    "    if script.attrs.get(\"src\"):\n",
    "        # if the tag has the attribute 'src'\n",
    "        script_url = urljoin(url, script.attrs.get(\"src\"))\n",
    "        script_files.append(script_url)\n",
    "        if \"main\" in script_url:\n",
    "            main_script = script_url\n",
    "        \n",
    "main = session.get(main_script).content.decode()\n",
    "\n",
    "print(f\"All script files found:\\n{script_files}\")\n",
    "print()\n",
    "print(f\"Current main.js file is: {main_script}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c6f4309",
   "metadata": {},
   "source": [
    "### Parse the .js file\n",
    "Now that we have found the .js file, we need to parse through to grab the dictionary data.\n",
    "\n",
    "The 'solutions' dictionary is set to:\n",
    "```javascript \n",
    "var Ma=[]\n",
    "```\n",
    "and the 'other words' dictionary is set to:\n",
    "```javascript \n",
    "var Oa=[]\n",
    "```\n",
    "We will use `re.search()` with the regex `var Ma=\\[(.*?)\\]` to grab the solutions dictionary, and `Oa=\\[(.*?)\\]`to grab the other words."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "1c6be020",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Solutions: 2309\n",
      "Other valid words: 10638\n",
      "Full dictionary words: 12947\n"
     ]
    }
   ],
   "source": [
    "import re\n",
    "\n",
    "# get solutions\n",
    "match = re.search(\"var Ma=\\[(.*?)\\]\",main)\n",
    "solutions = match.groups()[0].replace(\"\\\"\",\"\").split(\",\")\n",
    "print(f\"Solutions: {len(solutions)}\")\n",
    "\n",
    "# get other valid words\n",
    "match = re.search(\"Oa=\\[(.*?)\\]\",main)\n",
    "other_words = match.groups()[0].replace(\"\\\"\",\"\").split(\",\")\n",
    "print(f\"Other valid words: {len(other_words)}\")\n",
    "\n",
    "# build full dictionary for later use\n",
    "dictionary = other_words + solutions\n",
    "print(f\"Full dictionary words: {len(dictionary)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fda0473f",
   "metadata": {},
   "source": [
    "NOTE: Solutions dictionary is only 2,309 words, where as the entire search space of valid words is 12,947 words. \n",
    "\n",
    "For the purposes of this analysis, we will only use the solution words.\n",
    "\n",
    "## Build the DataFrames\n",
    "Next, we will place the data into a pandas DataFrame and start our analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "52b0e845",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Words</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>cigar</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>rebut</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sissy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>humph</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>awake</th>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: []\n",
       "Index: [cigar, rebut, sissy, humph, awake]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "solutions_df = pd.DataFrame(data=solutions, columns=[\"Words\"])\n",
    "solutions_df.set_index(\"Words\", inplace=True)\n",
    "solutions_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d19a5eb4",
   "metadata": {},
   "source": [
    "### Determine Letter Frequency\n",
    "To determine which letters appear most frequent, we will combine all our words and count the letters.\n",
    "\n",
    "We give each letter a score of 26-1, from most frequent to least frequent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "6545ae13",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Frequency</th>\n",
       "      <th>Score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Letters</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>e</th>\n",
       "      <td>1230</td>\n",
       "      <td>26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>975</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>r</th>\n",
       "      <td>897</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>o</th>\n",
       "      <td>753</td>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>t</th>\n",
       "      <td>729</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>l</th>\n",
       "      <td>716</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>i</th>\n",
       "      <td>670</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>s</th>\n",
       "      <td>668</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>n</th>\n",
       "      <td>573</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>475</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u</th>\n",
       "      <td>466</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>y</th>\n",
       "      <td>424</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>393</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>h</th>\n",
       "      <td>387</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p</th>\n",
       "      <td>365</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>m</th>\n",
       "      <td>316</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>g</th>\n",
       "      <td>310</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>280</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>f</th>\n",
       "      <td>229</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>k</th>\n",
       "      <td>210</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>w</th>\n",
       "      <td>194</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>v</th>\n",
       "      <td>152</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>z</th>\n",
       "      <td>40</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x</th>\n",
       "      <td>37</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>q</th>\n",
       "      <td>29</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>j</th>\n",
       "      <td>27</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Frequency  Score\n",
       "Letters                  \n",
       "e             1230     26\n",
       "a              975     25\n",
       "r              897     24\n",
       "o              753     23\n",
       "t              729     22\n",
       "l              716     21\n",
       "i              670     20\n",
       "s              668     19\n",
       "n              573     18\n",
       "c              475     17\n",
       "u              466     16\n",
       "y              424     15\n",
       "d              393     14\n",
       "h              387     13\n",
       "p              365     12\n",
       "m              316     11\n",
       "g              310     10\n",
       "b              280      9\n",
       "f              229      8\n",
       "k              210      7\n",
       "w              194      6\n",
       "v              152      5\n",
       "z               40      4\n",
       "x               37      3\n",
       "q               29      2\n",
       "j               27      1"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from collections import Counter\n",
    "letter_counts = Counter(\"\".join(solutions_df.index))\n",
    "letter_frequency_df = pd.DataFrame(data=letter_counts.most_common(26), columns=[\"Letters\",\"Frequency\"])\n",
    "letter_frequency_df[\"Score\"] = range(26,0,-1)\n",
    "letter_frequency_df.set_index(\"Letters\", inplace=True)\n",
    "letter_frequency_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b08c37c",
   "metadata": {},
   "source": [
    "## Score the Words\n",
    "Now, we iterate through each word, and count up the score of each distinct letter. We are trying to find the words that will give us the best opportunity of scoring yellow tiles on our first guess, so it is more beneficial to favor words without repeat letters. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "ac6a6fbf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Words</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>alert</th>\n",
       "      <td>118</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>alter</th>\n",
       "      <td>118</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>later</th>\n",
       "      <td>118</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>arose</th>\n",
       "      <td>117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>irate</th>\n",
       "      <td>117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>stare</th>\n",
       "      <td>116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>arise</th>\n",
       "      <td>114</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>atone</th>\n",
       "      <td>114</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cater</th>\n",
       "      <td>114</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>crate</th>\n",
       "      <td>114</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Score\n",
       "Words       \n",
       "alert    118\n",
       "alter    118\n",
       "later    118\n",
       "arose    117\n",
       "irate    117\n",
       "stare    116\n",
       "arise    114\n",
       "atone    114\n",
       "cater    114\n",
       "crate    114"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find best words by letter frequency\n",
    "solutions_df[\"Score\"] = 0\n",
    "for word in solutions_df.index:\n",
    "    for letter in list(set(word)):\n",
    "        solutions_df.at[word,'Score'] += letter_frequency_df.loc[letter,'Score']\n",
    "solutions_df.sort_values(by=['Score','Words'],inplace=True, ascending=[False,True])\n",
    "\n",
    "solutions_df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b0f7e0a",
   "metadata": {},
   "source": [
    "We find that `alert`, `alter`, and `later` tie for best word by letter frequency. That is pretty good, but the truly optimal guess would take letter position into account as well, to help boost our chances of hitting a coveted green tile. Let's see what we can do about that.\n",
    "## Build Out Letter Frequency by Position\n",
    "Next, let's re-count our letters, but this time we will keep track of what position they were in the word."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "d76506ae",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Frequency</th>\n",
       "      <th>Score</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Letters</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>e</th>\n",
       "      <td>1230</td>\n",
       "      <td>26</td>\n",
       "      <td>111</td>\n",
       "      <td>320</td>\n",
       "      <td>208</td>\n",
       "      <td>244</td>\n",
       "      <td>347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>975</td>\n",
       "      <td>25</td>\n",
       "      <td>171</td>\n",
       "      <td>336</td>\n",
       "      <td>294</td>\n",
       "      <td>136</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>r</th>\n",
       "      <td>897</td>\n",
       "      <td>24</td>\n",
       "      <td>130</td>\n",
       "      <td>283</td>\n",
       "      <td>178</td>\n",
       "      <td>132</td>\n",
       "      <td>174</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>o</th>\n",
       "      <td>753</td>\n",
       "      <td>23</td>\n",
       "      <td>46</td>\n",
       "      <td>321</td>\n",
       "      <td>248</td>\n",
       "      <td>92</td>\n",
       "      <td>46</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>t</th>\n",
       "      <td>729</td>\n",
       "      <td>22</td>\n",
       "      <td>183</td>\n",
       "      <td>87</td>\n",
       "      <td>119</td>\n",
       "      <td>111</td>\n",
       "      <td>229</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>l</th>\n",
       "      <td>716</td>\n",
       "      <td>21</td>\n",
       "      <td>98</td>\n",
       "      <td>206</td>\n",
       "      <td>134</td>\n",
       "      <td>156</td>\n",
       "      <td>122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>i</th>\n",
       "      <td>670</td>\n",
       "      <td>20</td>\n",
       "      <td>40</td>\n",
       "      <td>216</td>\n",
       "      <td>263</td>\n",
       "      <td>143</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>s</th>\n",
       "      <td>668</td>\n",
       "      <td>19</td>\n",
       "      <td>381</td>\n",
       "      <td>20</td>\n",
       "      <td>83</td>\n",
       "      <td>170</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>n</th>\n",
       "      <td>573</td>\n",
       "      <td>18</td>\n",
       "      <td>44</td>\n",
       "      <td>96</td>\n",
       "      <td>137</td>\n",
       "      <td>172</td>\n",
       "      <td>124</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>475</td>\n",
       "      <td>17</td>\n",
       "      <td>225</td>\n",
       "      <td>41</td>\n",
       "      <td>51</td>\n",
       "      <td>132</td>\n",
       "      <td>26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u</th>\n",
       "      <td>466</td>\n",
       "      <td>16</td>\n",
       "      <td>37</td>\n",
       "      <td>191</td>\n",
       "      <td>163</td>\n",
       "      <td>74</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>y</th>\n",
       "      <td>424</td>\n",
       "      <td>15</td>\n",
       "      <td>6</td>\n",
       "      <td>24</td>\n",
       "      <td>35</td>\n",
       "      <td>3</td>\n",
       "      <td>356</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>393</td>\n",
       "      <td>14</td>\n",
       "      <td>121</td>\n",
       "      <td>23</td>\n",
       "      <td>79</td>\n",
       "      <td>56</td>\n",
       "      <td>114</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>h</th>\n",
       "      <td>387</td>\n",
       "      <td>13</td>\n",
       "      <td>76</td>\n",
       "      <td>147</td>\n",
       "      <td>9</td>\n",
       "      <td>28</td>\n",
       "      <td>127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p</th>\n",
       "      <td>365</td>\n",
       "      <td>12</td>\n",
       "      <td>152</td>\n",
       "      <td>64</td>\n",
       "      <td>54</td>\n",
       "      <td>41</td>\n",
       "      <td>54</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>m</th>\n",
       "      <td>316</td>\n",
       "      <td>11</td>\n",
       "      <td>120</td>\n",
       "      <td>38</td>\n",
       "      <td>61</td>\n",
       "      <td>59</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>g</th>\n",
       "      <td>310</td>\n",
       "      <td>10</td>\n",
       "      <td>119</td>\n",
       "      <td>12</td>\n",
       "      <td>73</td>\n",
       "      <td>67</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>280</td>\n",
       "      <td>9</td>\n",
       "      <td>179</td>\n",
       "      <td>18</td>\n",
       "      <td>58</td>\n",
       "      <td>16</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>f</th>\n",
       "      <td>229</td>\n",
       "      <td>8</td>\n",
       "      <td>139</td>\n",
       "      <td>11</td>\n",
       "      <td>24</td>\n",
       "      <td>42</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>k</th>\n",
       "      <td>210</td>\n",
       "      <td>7</td>\n",
       "      <td>26</td>\n",
       "      <td>12</td>\n",
       "      <td>12</td>\n",
       "      <td>53</td>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>w</th>\n",
       "      <td>194</td>\n",
       "      <td>6</td>\n",
       "      <td>83</td>\n",
       "      <td>44</td>\n",
       "      <td>26</td>\n",
       "      <td>25</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>v</th>\n",
       "      <td>152</td>\n",
       "      <td>5</td>\n",
       "      <td>46</td>\n",
       "      <td>15</td>\n",
       "      <td>49</td>\n",
       "      <td>42</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>z</th>\n",
       "      <td>40</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>16</td>\n",
       "      <td>15</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x</th>\n",
       "      <td>37</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>14</td>\n",
       "      <td>12</td>\n",
       "      <td>3</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>q</th>\n",
       "      <td>29</td>\n",
       "      <td>2</td>\n",
       "      <td>23</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>j</th>\n",
       "      <td>27</td>\n",
       "      <td>1</td>\n",
       "      <td>20</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Frequency  Score    1    2    3    4    5\n",
       "Letters                                           \n",
       "e             1230     26  111  320  208  244  347\n",
       "a              975     25  171  336  294  136   38\n",
       "r              897     24  130  283  178  132  174\n",
       "o              753     23   46  321  248   92   46\n",
       "t              729     22  183   87  119  111  229\n",
       "l              716     21   98  206  134  156  122\n",
       "i              670     20   40  216  263  143    8\n",
       "s              668     19  381   20   83  170   14\n",
       "n              573     18   44   96  137  172  124\n",
       "c              475     17  225   41   51  132   26\n",
       "u              466     16   37  191  163   74    1\n",
       "y              424     15    6   24   35    3  356\n",
       "d              393     14  121   23   79   56  114\n",
       "h              387     13   76  147    9   28  127\n",
       "p              365     12  152   64   54   41   54\n",
       "m              316     11  120   38   61   59   38\n",
       "g              310     10  119   12   73   67   39\n",
       "b              280      9  179   18   58   16    9\n",
       "f              229      8  139   11   24   42   13\n",
       "k              210      7   26   12   12   53  107\n",
       "w              194      6   83   44   26   25   16\n",
       "v              152      5   46   15   49   42    0\n",
       "z               40      4    3    2   16   15    4\n",
       "x               37      3    0   14   12    3    8\n",
       "q               29      2   23    5    1    0    0\n",
       "j               27      1   20    2    3    2    0"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# copy letter_frequency_df\n",
    "letter_position_df = letter_frequency_df.copy()\n",
    "\n",
    "# add 5 new columns, set data to 0\n",
    "letter_position_df[[1,2,3,4,5]] = 0\n",
    "\n",
    "# count each letter and note position\n",
    "for word in solutions_df.index:\n",
    "    for letter in word:\n",
    "        letter_position_df.at[letter,word.index(letter)+1] += 1\n",
    "\n",
    "letter_position_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc8d7bdf",
   "metadata": {},
   "source": [
    "## Normalize and Create a New Score Matrix\n",
    "Next, we will normalize the letter counts we just found by dividing each position by the overall frequency of that letter. Following the normalization, we multiply each position by the overall letter score. Now, we have a matrix that favors more frequent words in more frequent positions when we rescore our words."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "308ace95",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Frequency</th>\n",
       "      <th>Score</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Letters</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>e</th>\n",
       "      <td>1230</td>\n",
       "      <td>26</td>\n",
       "      <td>2.346341</td>\n",
       "      <td>6.764228</td>\n",
       "      <td>4.396748</td>\n",
       "      <td>5.157724</td>\n",
       "      <td>7.334959</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>975</td>\n",
       "      <td>25</td>\n",
       "      <td>4.384615</td>\n",
       "      <td>8.615385</td>\n",
       "      <td>7.538462</td>\n",
       "      <td>3.487179</td>\n",
       "      <td>0.974359</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>r</th>\n",
       "      <td>897</td>\n",
       "      <td>24</td>\n",
       "      <td>3.478261</td>\n",
       "      <td>7.571906</td>\n",
       "      <td>4.762542</td>\n",
       "      <td>3.531773</td>\n",
       "      <td>4.655518</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>o</th>\n",
       "      <td>753</td>\n",
       "      <td>23</td>\n",
       "      <td>1.405046</td>\n",
       "      <td>9.804781</td>\n",
       "      <td>7.575033</td>\n",
       "      <td>2.810093</td>\n",
       "      <td>1.405046</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>t</th>\n",
       "      <td>729</td>\n",
       "      <td>22</td>\n",
       "      <td>5.522634</td>\n",
       "      <td>2.625514</td>\n",
       "      <td>3.591221</td>\n",
       "      <td>3.349794</td>\n",
       "      <td>6.910837</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>l</th>\n",
       "      <td>716</td>\n",
       "      <td>21</td>\n",
       "      <td>2.874302</td>\n",
       "      <td>6.041899</td>\n",
       "      <td>3.930168</td>\n",
       "      <td>4.575419</td>\n",
       "      <td>3.578212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>i</th>\n",
       "      <td>670</td>\n",
       "      <td>20</td>\n",
       "      <td>1.194030</td>\n",
       "      <td>6.447761</td>\n",
       "      <td>7.850746</td>\n",
       "      <td>4.268657</td>\n",
       "      <td>0.238806</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>s</th>\n",
       "      <td>668</td>\n",
       "      <td>19</td>\n",
       "      <td>10.836826</td>\n",
       "      <td>0.568862</td>\n",
       "      <td>2.360778</td>\n",
       "      <td>4.835329</td>\n",
       "      <td>0.398204</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>n</th>\n",
       "      <td>573</td>\n",
       "      <td>18</td>\n",
       "      <td>1.382199</td>\n",
       "      <td>3.015707</td>\n",
       "      <td>4.303665</td>\n",
       "      <td>5.403141</td>\n",
       "      <td>3.895288</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>475</td>\n",
       "      <td>17</td>\n",
       "      <td>8.052632</td>\n",
       "      <td>1.467368</td>\n",
       "      <td>1.825263</td>\n",
       "      <td>4.724211</td>\n",
       "      <td>0.930526</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>u</th>\n",
       "      <td>466</td>\n",
       "      <td>16</td>\n",
       "      <td>1.270386</td>\n",
       "      <td>6.557940</td>\n",
       "      <td>5.596567</td>\n",
       "      <td>2.540773</td>\n",
       "      <td>0.034335</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>y</th>\n",
       "      <td>424</td>\n",
       "      <td>15</td>\n",
       "      <td>0.212264</td>\n",
       "      <td>0.849057</td>\n",
       "      <td>1.238208</td>\n",
       "      <td>0.106132</td>\n",
       "      <td>12.594340</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>393</td>\n",
       "      <td>14</td>\n",
       "      <td>4.310433</td>\n",
       "      <td>0.819338</td>\n",
       "      <td>2.814249</td>\n",
       "      <td>1.994911</td>\n",
       "      <td>4.061069</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>h</th>\n",
       "      <td>387</td>\n",
       "      <td>13</td>\n",
       "      <td>2.552972</td>\n",
       "      <td>4.937984</td>\n",
       "      <td>0.302326</td>\n",
       "      <td>0.940568</td>\n",
       "      <td>4.266150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>p</th>\n",
       "      <td>365</td>\n",
       "      <td>12</td>\n",
       "      <td>4.997260</td>\n",
       "      <td>2.104110</td>\n",
       "      <td>1.775342</td>\n",
       "      <td>1.347945</td>\n",
       "      <td>1.775342</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>m</th>\n",
       "      <td>316</td>\n",
       "      <td>11</td>\n",
       "      <td>4.177215</td>\n",
       "      <td>1.322785</td>\n",
       "      <td>2.123418</td>\n",
       "      <td>2.053797</td>\n",
       "      <td>1.322785</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>g</th>\n",
       "      <td>310</td>\n",
       "      <td>10</td>\n",
       "      <td>3.838710</td>\n",
       "      <td>0.387097</td>\n",
       "      <td>2.354839</td>\n",
       "      <td>2.161290</td>\n",
       "      <td>1.258065</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>280</td>\n",
       "      <td>9</td>\n",
       "      <td>5.753571</td>\n",
       "      <td>0.578571</td>\n",
       "      <td>1.864286</td>\n",
       "      <td>0.514286</td>\n",
       "      <td>0.289286</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>f</th>\n",
       "      <td>229</td>\n",
       "      <td>8</td>\n",
       "      <td>4.855895</td>\n",
       "      <td>0.384279</td>\n",
       "      <td>0.838428</td>\n",
       "      <td>1.467249</td>\n",
       "      <td>0.454148</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>k</th>\n",
       "      <td>210</td>\n",
       "      <td>7</td>\n",
       "      <td>0.866667</td>\n",
       "      <td>0.400000</td>\n",
       "      <td>0.400000</td>\n",
       "      <td>1.766667</td>\n",
       "      <td>3.566667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>w</th>\n",
       "      <td>194</td>\n",
       "      <td>6</td>\n",
       "      <td>2.567010</td>\n",
       "      <td>1.360825</td>\n",
       "      <td>0.804124</td>\n",
       "      <td>0.773196</td>\n",
       "      <td>0.494845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>v</th>\n",
       "      <td>152</td>\n",
       "      <td>5</td>\n",
       "      <td>1.513158</td>\n",
       "      <td>0.493421</td>\n",
       "      <td>1.611842</td>\n",
       "      <td>1.381579</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>z</th>\n",
       "      <td>40</td>\n",
       "      <td>4</td>\n",
       "      <td>0.300000</td>\n",
       "      <td>0.200000</td>\n",
       "      <td>1.600000</td>\n",
       "      <td>1.500000</td>\n",
       "      <td>0.400000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x</th>\n",
       "      <td>37</td>\n",
       "      <td>3</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.135135</td>\n",
       "      <td>0.972973</td>\n",
       "      <td>0.243243</td>\n",
       "      <td>0.648649</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>q</th>\n",
       "      <td>29</td>\n",
       "      <td>2</td>\n",
       "      <td>1.586207</td>\n",
       "      <td>0.344828</td>\n",
       "      <td>0.068966</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>j</th>\n",
       "      <td>27</td>\n",
       "      <td>1</td>\n",
       "      <td>0.740741</td>\n",
       "      <td>0.074074</td>\n",
       "      <td>0.111111</td>\n",
       "      <td>0.074074</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Frequency  Score          1         2         3         4          5\n",
       "Letters                                                                      \n",
       "e             1230     26   2.346341  6.764228  4.396748  5.157724   7.334959\n",
       "a              975     25   4.384615  8.615385  7.538462  3.487179   0.974359\n",
       "r              897     24   3.478261  7.571906  4.762542  3.531773   4.655518\n",
       "o              753     23   1.405046  9.804781  7.575033  2.810093   1.405046\n",
       "t              729     22   5.522634  2.625514  3.591221  3.349794   6.910837\n",
       "l              716     21   2.874302  6.041899  3.930168  4.575419   3.578212\n",
       "i              670     20   1.194030  6.447761  7.850746  4.268657   0.238806\n",
       "s              668     19  10.836826  0.568862  2.360778  4.835329   0.398204\n",
       "n              573     18   1.382199  3.015707  4.303665  5.403141   3.895288\n",
       "c              475     17   8.052632  1.467368  1.825263  4.724211   0.930526\n",
       "u              466     16   1.270386  6.557940  5.596567  2.540773   0.034335\n",
       "y              424     15   0.212264  0.849057  1.238208  0.106132  12.594340\n",
       "d              393     14   4.310433  0.819338  2.814249  1.994911   4.061069\n",
       "h              387     13   2.552972  4.937984  0.302326  0.940568   4.266150\n",
       "p              365     12   4.997260  2.104110  1.775342  1.347945   1.775342\n",
       "m              316     11   4.177215  1.322785  2.123418  2.053797   1.322785\n",
       "g              310     10   3.838710  0.387097  2.354839  2.161290   1.258065\n",
       "b              280      9   5.753571  0.578571  1.864286  0.514286   0.289286\n",
       "f              229      8   4.855895  0.384279  0.838428  1.467249   0.454148\n",
       "k              210      7   0.866667  0.400000  0.400000  1.766667   3.566667\n",
       "w              194      6   2.567010  1.360825  0.804124  0.773196   0.494845\n",
       "v              152      5   1.513158  0.493421  1.611842  1.381579   0.000000\n",
       "z               40      4   0.300000  0.200000  1.600000  1.500000   0.400000\n",
       "x               37      3   0.000000  1.135135  0.972973  0.243243   0.648649\n",
       "q               29      2   1.586207  0.344828  0.068966  0.000000   0.000000\n",
       "j               27      1   0.740741  0.074074  0.111111  0.074074   0.000000"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# normalize the data\n",
    "letter_position_df[[1,2,3,4,5]] = letter_position_df[[1,2,3,4,5]].div(letter_position_df[\"Frequency\"], axis=0)\n",
    "# multiply by the letter score (favoring the more frequent letters)\n",
    "letter_position_df[[1,2,3,4,5]] = letter_position_df[[1,2,3,4,5]].multiply(letter_position_df[\"Score\"], axis=0)\n",
    "letter_position_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0761a6d1",
   "metadata": {},
   "source": [
    "## Re-Score Words\n",
    "We will now be re-scoring our words by adding up our letter values from our new letter frequency by position matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "0b495a8b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Best Starting Word By Letter Location\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Words</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>sassy</th>\n",
       "      <td>53.720203</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sissy</th>\n",
       "      <td>51.552580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sooty</th>\n",
       "      <td>46.390522</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>booby</th>\n",
       "      <td>43.711044</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>salsa</th>\n",
       "      <td>42.834590</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sorry</th>\n",
       "      <td>42.761030</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>saucy</th>\n",
       "      <td>42.367328</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>soapy</th>\n",
       "      <td>42.122354</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>shiny</th>\n",
       "      <td>41.623038</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>booty</th>\n",
       "      <td>41.307267</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Score\n",
       "Words           \n",
       "sassy  53.720203\n",
       "sissy  51.552580\n",
       "sooty  46.390522\n",
       "booby  43.711044\n",
       "salsa  42.834590\n",
       "sorry  42.761030\n",
       "saucy  42.367328\n",
       "soapy  42.122354\n",
       "shiny  41.623038\n",
       "booty  41.307267"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Score words\n",
    "# make a copy of the solutions df\n",
    "solution_df2 = solutions_df.copy()\n",
    "solution_df2[\"Score\"] = 0.0\n",
    "\n",
    "for word in solution_df2.index:\n",
    "    for letter in word:\n",
    "        solution_df2.at[word, \"Score\"] += letter_position_df.at[letter,word.index(letter)+1]\n",
    "\n",
    "solution_df2.sort_values(by=['Score','Words'],inplace=True, ascending=[False,True])\n",
    "print(\"Best Starting Word By Letter Location\")\n",
    "solution_df2.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd9d042e",
   "metadata": {},
   "source": [
    "Now we have a list of the words most likely to get us a green tile on our first guess. These aren't really great guesses though, because they tend to repeat letters (i.e. sassy has strong odds of yielding a green tile, but likely not much else). \n",
    "\n",
    "Let's combine our results and see if we can't get a more optimal solution.\n",
    "## Optimal Solution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "cbe5f1d9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimal Starting Words\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Words</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>arose</th>\n",
       "      <td>148.701844</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>slate</th>\n",
       "      <td>148.101941</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>teary</th>\n",
       "      <td>147.951435</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>stare</th>\n",
       "      <td>147.867534</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>crate</th>\n",
       "      <td>147.847753</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>trace</th>\n",
       "      <td>146.692172</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>raise</th>\n",
       "      <td>146.114680</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>arise</th>\n",
       "      <td>145.977557</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>stale</th>\n",
       "      <td>145.911181</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>store</th>\n",
       "      <td>145.904106</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Score\n",
       "Words            \n",
       "arose  148.701844\n",
       "slate  148.101941\n",
       "teary  147.951435\n",
       "stare  147.867534\n",
       "crate  147.847753\n",
       "trace  146.692172\n",
       "raise  146.114680\n",
       "arise  145.977557\n",
       "stale  145.911181\n",
       "store  145.904106"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "final_df = solutions_df + solution_df2\n",
    "\n",
    "final_df.sort_values(by=['Score','Words'],inplace=True, ascending=[False,True])\n",
    "print(\"Optimal Starting Words\")\n",
    "final_df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2afd33b7",
   "metadata": {},
   "source": [
    "## Clear Up Ambiguity from Letter Frequency Alone Results\n",
    "Let's find `alert`, `alter`, and `later` from our first 'letter frequency alone' strategy and see if we can't clear up the 118 score tie."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "80c6931a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Score</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Words</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>alert</th>\n",
       "      <td>143.265872</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>alter</th>\n",
       "      <td>141.830978</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>later</th>\n",
       "      <td>142.894149</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Score\n",
       "Words            \n",
       "alert  143.265872\n",
       "alter  141.830978\n",
       "later  142.894149"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "final_df.loc[['alert','alter','later']]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7596c612",
   "metadata": {},
   "source": [
    "There you have it, if you prefer a letter frequency strategy, `alert` gets the bump.\n",
    "\n",
    "Let's wrap this thing up.\n",
    "# Conclusion\n",
    "### Optimal Start Word\n",
    "Arose\n",
    "### Best Start Word by Letter Frequency Only\n",
    "Alert\n",
    "### Best Start Word by Letter Frequency in a Specific Position Only\n",
    "Sassy"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}