"outputs": [
"output_type": "display_data",
"data": {
"text/html": [
" <input type=\"file\" id=\"files-014046c1-ce74-4a65-bc7c-70c1a9929292\" name=\"files[]\" multiple disabled\n",
" style=\"border:none\" />\n",
" <output id=\"result-014046c1-ce74-4a65-bc7c-70c1a9929292\">\n",
" Upload widget is only available when the cell has been executed in the\n",
" current browser session. Please rerun this cell to enable.\n",
" </output>\n",
" <script src=\"/nbextensions/google.colab/files.js\"></script> "
"text/plain": [
"<IPython.core.display.HTML object>"
"output_type": "stream",
"text": [
"Saving key.json to key.json\n",
"name": "stdout"
"cell_type": "markdown",
"source": [
"I'll install the API"
"cell_type": "code",
"source": [
"!pip install openai\n",
"import openai, json, pandas as pd, numpy as np"
"execution_count": null,
"cell_type": "markdown",
"source": [
Load the key.json file that I uploaded. I do this so I don't need to worry about accidentally leaking creds if I share the Colab (I'm 99% sure the shared notebook is just a JSON file that won't expose them).
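A slightly more defensive version of that loading step might look like the sketch below. The helper name `load_api_key` and its error message are my own; the only things taken from the notebook are the file name key.json and the "key" field.

```python
import json

def load_api_key(path="key.json", field="key"):
    """Read an API key from a JSON file so it never appears in the notebook source."""
    with open(path, "r") as f:  # the context manager closes the file for us
        data = json.load(f)
    if field not in data:
        raise KeyError(f"{path} has no '{field}' entry")
    return data[field]
```

In the notebook this would be used as `openai.api_key = load_api_key()`.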
"cell_type": "code",
"source": [
openai.api_key = json.load(open("key.json", "r"))["key"]
"execution_count": null,
"outputs": []
"cell_type": "markdown",
"source": [
Default keyword arguments to pass to the API
"cell_type": "code",
"source": [
# arguments to send to the API
kwargs = {
"cell_type": "markdown",
"source": [
Quick wrapper to automatically save prompts and responses sent for later analysis if needed
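A wrapper along these lines could be sketched as follows. The names `logged_completion`, `history`, and `fake_complete` are hypothetical, and the completion function is passed in so the sketch runs without network access; in the notebook it would presumably be `openai.Completion.create`.

```python
import time

def logged_completion(complete_fn, prompt, log, **kwargs):
    """Call complete_fn and append the prompt, arguments, and response to log."""
    response = complete_fn(prompt=prompt, **kwargs)
    log.append({
        "time": time.time(),
        "prompt": prompt,
        "kwargs": dict(kwargs),  # copy so later edits to kwargs don't change the record
        "response": response,
    })
    return response

# Stand-in for the real API call so the sketch runs offline (assumption).
def fake_complete(prompt, **kwargs):
    return {"choices": [{"text": " Paris"}]}

history = []
r = logged_completion(fake_complete, "q: what is the capital of France", history, temperature=0)
```

Keeping the log as a plain list of dicts makes it easy to load into a DataFrame later with `pd.DataFrame(history)`.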
"cell_type": "code",
"source": [
prompt = """q: what is the capital of France
"cell_type": "code",
"source": [
"cell_type": "code",
"source": [
' Paris'
"cell_type": "code",
"source": [
kwargs["logprobs"] = 5
"cell_type": "code",
"source": [
r = openai.Completion.create(prompt=prompt, **kwargs)
"cell_type": "markdown",
"source": [
So here are all the logprobs for the subsequent tokens; it hit the stop (\n), generated a few more follow-ups, but still stopped.
"cell_type": "code",
"source": [
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Paris</td>\n",
" <td>-0.828964</td>\n",
" <td>{' par': -1.6102142, ' Par': -4.235214, ' PAR'...</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>\\n</td>\n",
" <td>-0.364414</td>\n",
" <td>{',': -3.1456642, '.': -2.6144142, '\n",
"': -0.364...</td>\n",
" <td>41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>q</td>\n",
" <td>-1.213570</td>\n",
" <td>{'\n",
"': -1.5885696, 'The': -4.2291946, 'b': -2.4...</td>\n",
" <td>41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>:</td>\n",
" <td>-0.004189</td>\n",
" <td>{' :': -7.0354385, '.': -7.0354385, '1': -8.53...</td>\n",
" <td>41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>what</td>\n",
" <td>-0.479179</td>\n",
" <td>{' What': -2.2916794, ' who': -3.4791794, ' wh...</td>\n",
" <td>41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>is</td>\n",
" <td>-0.297340</td>\n",
" <td>{' country': -4.4223404, ' color': -4.0473404,...</td>\n",
" <td>41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>the</td>\n",
" <td>-0.146500</td>\n",
" <td>{' a': -4.0527496, ' the': -0.14649963, ' 1': ...</td>\n",
" <td>41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>capital</td>\n",
" <td>-0.774006</td>\n",
" <td>{' name': -3.586506, ' color': -3.867756, ' ca...</td>\n",
" <td>41</td>\n",
" </tr>\n",
" </tbody>\n",
"cell_type": "markdown",
"source": [
We can look more closely at the possibilities it considered for Paris, converting the logprobs to % by taking e**logprob.
Paris wins with about 43%, although ' par' was the runner-up at about 20%.
"cell_type": "code",
"source": [
"scores = pd.DataFrame([r[\"choices\"][0][\"logprobs\"][\"top_logprobs\"][0]]).T\n",
"scores.columns = [\"logprob\"]\n",
"scores[\"%\"] = scores[\"logprob\"].apply(lambda x: 100*np.e**x)\n",
"cell_type": "markdown",
"source": [
If we increase the temperature, the model starts picking non-optimal tokens. However, it still tries to complete the task and eventually makes it back to Paris (although that's not guaranteed).
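To see why a higher temperature favors non-optimal tokens, here is a minimal sketch of the usual temperature-scaled softmax. I'm assuming the API applies temperature in this standard way, and the function name is hypothetical. It renormalises over only three tokens from the table, so the numbers are illustrative rather than what the API would report.

```python
import math

def softmax_with_temperature(logprobs, temperature):
    """Rescale token logprobs by 1/temperature and renormalise to probabilities."""
    scaled = {tok: lp / temperature for tok, lp in logprobs.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

top = {" Paris": -0.828964, " par": -1.6102142, " Par": -4.235214}
for t in (0.5, 1.0, 1.2):
    probs = softmax_with_temperature(top, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
```

A temperature of exactly 0 would divide by zero here; it is conventionally treated as always taking the top token.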
"cell_type": "code",
"source": [
"kwargs[\"temperature\"] = 1.2\n",
"r = openai.Completion.create(prompt=prompt, **kwargs)"
"cell_type": "code",
"source": [
"cell_type": "code",
"source": [
"cell_type": "code",
"source": [
"cell_type": "code",
"source": [
"prompt = \"\"\"These word rhyme:\n",
"kwargs[\"logprobs\"] = 10\n",
"kwargs[\"max_tokens\"] = 20\n",
"kwargs[\"temperature\"] = 0\n",
"r = openai.Completion.create(prompt=prompt, **kwargs)"
"cell_type": "code",
"source": [
"scores = pd.DataFrame([r[\"choices\"][0][\"logprobs\"][\"top_logprobs\"][0]]).T\n",
"scores.columns = [\"logprob\"]\n",
"scores[\"%\"] = scores[\"logprob\"].apply(lambda x: 100*np.e**x)\n",
"scores.sort_values(by=\"%\", ascending=False)"
"cell_type": "code",
"source": [
"prompt = \"\"\"These pairs of sentences rhyme:\n",
"My favorite color is red\n",
"ends with: \"red\"\n",
"\"red\" rhymes with \"bed\"\n",
"Rhyme: It's the color of my bed\n",
"I once had a dog\n",
"ends with: \"dog\"\n",
"\"dog\" rhymes with \"frog\"\n",
"Rhyme: That good boy ate a frog\n",
"I wish I was small\n",
"ends with: \"small\"\n",
"\"small\" rhymes with \"tall\"\n",
"Rhyme: Instead I'm so tall ='(\n",
"That's a cool train\n",
"ends with:\"\"\"\n",
"kwargs[\"logprobs\"] = 5\n",
"kwargs[\"max_tokens\"] = 40\n",
"kwargs[\"temperature\"] = 0\n",
"kwargs[\"stop\"] = \"-----\"\n",
"r = openai.Completion.create(prompt=prompt, **kwargs)"
"cell_type": "code",
"source": [
' "train"\n"train" rhymes with "rain"\nRhyme: I like to ride the rain\n'
"cell_type": "code",
"source": [
"scores = pd.DataFrame([r[\"choices\"][0][\"logprobs\"][\"top_logprobs\"][11]]).T\n",
"scores.columns = [\"logprob\"]\n",
"scores[\"%\"] = scores[\"logprob\"].apply(lambda x: 100*np.e**x)\n",
"scores.sort_values(by=\"%\", ascending=False)"
"cell_type": "code",
"source": [
"cell_type": "code",
"source": [
pd.DataFrame([r["choices"][0]["logprobs"]["top_logprobs"][23]]).apply(lambda x: 100*np.e**x)
"cell_type": "code",
"source": [
rhymed = pd.DataFrame(r["choices"][0]["logprobs"])[18:]
"cell_type": "markdown",
"source": [
What if we make this more creative by raising the temperature to .5?
"cell_type": "code",
"source": [
"prompt = \"\"\"These pairs of sentences rhyme:\n",
"My favorite color is red\n",
"ends with: \"red\"\n",
"\"red\" rhymes with \"bed\"\n",
"Rhyme: It's the color of my bed\n",
"I once had a dog\n",
"ends with: \"dog\"\n",
"\"dog\" rhymes with \"frog\"\n",
"Rhyme: That good boy ate a frog\n",
"I wish I was small\n",
"ends with: \"small\"\n",
"\"small\" rhymes with \"tall\"\n",
"Rhyme: Instead I'm so tall ='(\n",
"That's a cool train\n",
"ends with:\"\"\"\n",
"kwargs[\"logprobs\"] = 5\n",
"kwargs[\"max_tokens\"] = 40\n",
"kwargs[\"temperature\"] = .5\n",
"kwargs[\"stop\"] = \"-----\"\n",
"r = openai.Completion.create(prompt=prompt, **kwargs)"
"cell_type": "code",
"source": [
' "train"\n"train" rhymes with "plane"\nRhyme: It\'s not a plane\n'
"cell_type": "code",
"source": [
df = pd.DataFrame(r["choices"][0]["logprobs"])[18:]
"cell_type": "code",
"source": [
"def getTopValueFromDict(someDict):\n",
" myDict = dict(someDict)\n",
" vals = [(myDict[x], x) for x in myDict]\n",
" return max(vals)"
"cell_type": "code",
"source": [
df["actual_top_logprob"] = df.top_logprobs.apply(lambda x: getTopValueFromDict(x))
"cell_type": "code",
df
"cell_type": "code",
"source": [
rhyming_pt5 = df.copy()
"cell_type": "markdown",
"source": [
How do the logprobs for the bad (non-rhyming) completion compare to the logprobs for the good one?
"cell_type": "code",
"source": [
"cell_type": "code",
"source": [
"prompt = \"\"\"These pairs of sentences rhyme:\n",
"My favorite color is red\n",
"ends with: \"red\"\n",
"\"red\" rhymes with \"bed\"\n",
"Rhyme: It's the color of my bed\n",
"I once had a dog\n",
"ends with: \"dog\"\n",
"\"dog\" rhymes with \"frog\"\n",
"Rhyme: That good boy ate a frog\n",
"I wish I was small\n",
"ends with: \"small\"\n",
"\"small\" rhymes with \"tall\"\n",
"Rhyme: Instead I'm so tall ='(\n",
"That's a cool train\n",
"ends with:\"\"\"\n",
"kwargs[\"logprobs\"] = 5\n",
"kwargs[\"max_tokens\"] = 40\n",
"kwargs[\"temperature\"] = .5\n",
"kwargs[\"stop\"] = \"-----\"\n",
"r = openai.Completion.create(prompt=prompt, **kwargs)"
"cell_type": "code",
"source": [
' "train"\n"train" rhymes with "rain"\nRhyme: The rain is so cool\n'
"cell_type": "code",
"source": [
df = pd.DataFrame(r["choices"][0]["logprobs"])[18:]
"cell_type": "code",
"source": [
"def getTopValueFromDict(someDict):\n",
" myDict = dict(someDict)\n",
" vals = [(myDict[x], x) for x in myDict]\n",
" return max(vals)"
"cell_type": "code",
"source": [
df["actual_top_logprob"] = df.top_logprobs.apply(lambda x: getTopValueFromDict(x))
"cell_type": "code",
df
"cell_type": "code",
"source": [
bad_pt5 = df.copy()
"cell_type": "markdown",
"source": [
OK, real quick, how do the average logprobs compare? The completion with the highest average logprob is the one that rhymes! So this suggests that a high average logprob is a good indicator of a correct answer.
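The averaging itself can be sketched without pandas. The helper name `mean_logprob` is hypothetical and the token lists below are made-up numbers, only there to show the shape of the comparison: average the logprobs of the tokens that come before the first newline.

```python
def mean_logprob(tokens, token_logprobs, stop_token="\n"):
    """Average the logprobs of the tokens that come before the first stop_token."""
    end = tokens.index(stop_token) if stop_token in tokens else len(tokens)
    if end == 0:
        raise ValueError("no tokens before the stop token")
    return sum(token_logprobs[:end]) / end

# Made-up numbers, not taken from any real response.
candidates = {
    "completion A": ([" The", " rain", "\n"], [-0.9, -1.3, -0.2]),
    "completion B": ([" It", " is", " plain", "\n"], [-0.5, -0.4, -2.7, -0.1]),
}
for name, (toks, lps) in candidates.items():
    print(name, round(mean_logprob(toks, lps), 3))
```

Averaging rather than summing keeps longer completions from being penalised just for having more tokens.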
"cell_type": "code",
"source": [
-1.3439806428571428
"cell_type": "code",
"source": [
-1.4363706428571428
"cell_type": "code",
"source": [
-1.0922275
"cell_type": "markdown",
"source": [
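Cool! This is actually how best_of works: it generates several completions and returns the one with the highest log probability per token. For instance, let's get n=10 at temp=.5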
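A client-side sketch of that selection idea is below. It is not OpenAI's implementation, just the same ranking rule applied by hand; `pick_best` is a hypothetical name, and the `choices` list is a made-up fragment in the same shape as `r["choices"]`.

```python
def pick_best(choices):
    """Return the choice whose tokens have the highest mean logprob."""
    def score(choice):
        lps = choice["logprobs"]["token_logprobs"]
        return sum(lps) / len(lps)
    return max(choices, key=score)

# Hypothetical response fragment with invented logprobs.
choices = [
    {"text": " I like to ride the rain", "logprobs": {"token_logprobs": [-1.2, -0.8, -2.0]}},
    {"text": " It's not a plane", "logprobs": {"token_logprobs": [-0.6, -0.9, -1.5]}},
]
best = pick_best(choices)
print(best["text"])
```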
"cell_type": "code",
"source": [
"prompt = \"\"\"These pairs of sentences rhyme:\n",
"My favorite color is red\n",
"ends with: \"red\"\n",
"\"red\" rhymes with \"bed\"\n",
"Rhyme: It's the color of my bed\n",
"I once had a dog\n",
"ends with: \"dog\"\n",
"\"dog\" rhymes with \"frog\"\n",
"Rhyme: That good boy ate a frog\n",
"I wish I was small\n",
"ends with: \"small\"\n",
"\"small\" rhymes with \"tall\"\n",
"Rhyme: Instead I'm so tall ='(\n",
"That's a cool train\n",
"ends with:\"\"\"\n",
"kwargs[\"logprobs\"] = 5\n",
"kwargs[\"max_tokens\"] = 40\n",
"kwargs[\"temperature\"] = .5\n",
"kwargs[\"stop\"] = \"-----\"\n",
"kwargs[\"n\"] = 10\n",
"r = openai.Completion.create(prompt=prompt, **kwargs)"
"cell_type": "code",
"source": [
texts = [r["choices"][i]["text"].split("\n")[-2][7:] for i in range(10)]
"cell_type": "code",
"source": [
"logprobs = []\n",
"for i in range(10):\n",
" df = pd.DataFrame(r[\"choices\"][i][\"logprobs\"])[18:]\n",
" df[\"actual_top_logprob\"] = df.top_logprobs.apply(lambda x: getTopValueFromDict(x))\n",
" logprobs.append(df[:df.tokens.to_list().index(\"\\n\")].token_logprobs.mean())"
"cell_type": "code",
"source": [
"df = pd.DataFrame([texts]).T\n",
"df[\"logprob\"] = logprobs\n",
"df[\"%\"] = df.logprob.apply(lambda x: 100*np.e**x)"
"cell_type": "code",
"source": [
df.sort_values(by="logprob", ascending=False)
"cell_type": "markdown",
"source": [
So the problem now is that the completion with the highest average logprob isn't even the best one! We'll skin that cat later, but for now, what if we also get rid of the repetition? This might
"cell_type": "code",
"source": [
"kwargs[\"logprobs\"] = 1\n",
"kwargs[\"max_tokens\"] = 40\n",
"kwargs[\"temperature\"] = .5\n",
"kwargs[\"stop\"] = \"-----\"\n",
"kwargs[\"n\"] = 5\n",
"kwargs[\"frequency_penalty\"] = .1\n"
"cell_type": "code",
"source": [
"rslts = []\n",
"for frequency_penalty in range(-5, 6):\n",
" print(frequency_penalty)\n",
" kwargs[\"frequency_penalty\"] = np.round(.1 * frequency_penalty, 1)\n",
" r = openai.Completion.create(prompt=prompt, **kwargs)\n",
" texts = [r[\"choices\"][i][\"text\"].split(\"\\n\")[-2][7:] for i in range(5)]\n",
" logprobs = []\n",
" for i in range(5):\n",
" df = pd.DataFrame(r[\"choices\"][i][\"logprobs\"])[18:]\n",
" df[\"actual_top_logprob\"] = df.top_logprobs.apply(lambda x: getTopValueFromDict(x))\n",
" logprobs.append(df[:df.tokens.to_list().index(\"\\n\")].token_logprobs.mean())\n",
" df = pd.DataFrame([texts]).T\n",
" df.columns=[\"text\"]\n",
" df[\"logprob\"] = logprobs\n",
" df[\"%\"] = df.logprob.apply(lambda x: 100*np.e**x)\n",
" df.sort_values(by=\"logprob\", ascending=False)\n",
" df[\"frequency_penalty\"] = np.round(.1 * frequency_penalty, 1)\n",
" rslts.append(df.copy())"
"cell_type": "markdown",
"source": [
"Now, one of the big things we should realize is that changing the penalty likely influences the absolute value of the logprobs; \"that' a cool rain\" has basically the same logprob at .3 for some reason, but it drops off significantly at -.3 and .5."
"cell_type": "code",
"source": [
"pd.concat(rslts).sort_values(by=\"%\", ascending=False)"
