@brockmanmatt
Created August 10, 2020 04:13
GPT_ANLI_PASS2_E->N_20200809.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "GPT_ANLI_PASS2_E->N_20200809.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyOX1HG8jJZ1d2CVNWNI/ukq",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/brockmanmatt/4e499f64fed585eff05219a1343c7886/gpt_anli_pass2_e-n_20200809.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "J7wnsgT2kPut",
"colab_type": "code",
"colab": {
"resources": {
"http://localhost:8080/nbextensions/google.colab/files.js": {
"data": "Ly8gQ29weXJpZ2h0IDIwMTcgR29vZ2xlIExMQwovLwovLyBMaWNlbnNlZCB1bmRlciB0aGUgQXBhY2hlIExpY2Vuc2UsIFZlcnNpb24gMi4wICh0aGUgIkxpY2Vuc2UiKTsKLy8geW91IG1heSBub3QgdXNlIHRoaXMgZmlsZSBleGNlcHQgaW4gY29tcGxpYW5jZSB3aXRoIHRoZSBMaWNlbnNlLgovLyBZb3UgbWF5IG9idGFpbiBhIGNvcHkgb2YgdGhlIExpY2Vuc2UgYXQKLy8KLy8gICAgICBodHRwOi8vd3d3LmFwYWNoZS5vcmcvbGljZW5zZXMvTElDRU5TRS0yLjAKLy8KLy8gVW5sZXNzIHJlcXVpcmVkIGJ5IGFwcGxpY2FibGUgbGF3IG9yIGFncmVlZCB0byBpbiB3cml0aW5nLCBzb2Z0d2FyZQovLyBkaXN0cmlidXRlZCB1bmRlciB0aGUgTGljZW5zZSBpcyBkaXN0cmlidXRlZCBvbiBhbiAiQVMgSVMiIEJBU0lTLAovLyBXSVRIT1VUIFdBUlJBTlRJRVMgT1IgQ09ORElUSU9OUyBPRiBBTlkgS0lORCwgZWl0aGVyIGV4cHJlc3Mgb3IgaW1wbGllZC4KLy8gU2VlIHRoZSBMaWNlbnNlIGZvciB0aGUgc3BlY2lmaWMgbGFuZ3VhZ2UgZ292ZXJuaW5nIHBlcm1pc3Npb25zIGFuZAovLyBsaW1pdGF0aW9ucyB1bmRlciB0aGUgTGljZW5zZS4KCi8qKgogKiBAZmlsZW92ZXJ2aWV3IEhlbHBlcnMgZm9yIGdvb2dsZS5jb2xhYiBQeXRob24gbW9kdWxlLgogKi8KKGZ1bmN0aW9uKHNjb3BlKSB7CmZ1bmN0aW9uIHNwYW4odGV4dCwgc3R5bGVBdHRyaWJ1dGVzID0ge30pIHsKICBjb25zdCBlbGVtZW50ID0gZG9jdW1lbnQuY3JlYXRlRWxlbWVudCgnc3BhbicpOwogIGVsZW1lbnQudGV4dENvbnRlbnQgPSB0ZXh0OwogIGZvciAoY29uc3Qga2V5IG9mIE9iamVjdC5rZXlzKHN0eWxlQXR0cmlidXRlcykpIHsKICAgIGVsZW1lbnQuc3R5bGVba2V5XSA9IHN0eWxlQXR0cmlidXRlc1trZXldOwogIH0KICByZXR1cm4gZWxlbWVudDsKfQoKLy8gTWF4IG51bWJlciBvZiBieXRlcyB3aGljaCB3aWxsIGJlIHVwbG9hZGVkIGF0IGEgdGltZS4KY29uc3QgTUFYX1BBWUxPQURfU0laRSA9IDEwMCAqIDEwMjQ7CgpmdW5jdGlvbiBfdXBsb2FkRmlsZXMoaW5wdXRJZCwgb3V0cHV0SWQpIHsKICBjb25zdCBzdGVwcyA9IHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCk7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICAvLyBDYWNoZSBzdGVwcyBvbiB0aGUgb3V0cHV0RWxlbWVudCB0byBtYWtlIGl0IGF2YWlsYWJsZSBmb3IgdGhlIG5leHQgY2FsbAogIC8vIHRvIHVwbG9hZEZpbGVzQ29udGludWUgZnJvbSBQeXRob24uCiAgb3V0cHV0RWxlbWVudC5zdGVwcyA9IHN0ZXBzOwoKICByZXR1cm4gX3VwbG9hZEZpbGVzQ29udGludWUob3V0cHV0SWQpOwp9CgovLyBUaGlzIGlzIHJvdWdobHkgYW4gYXN5bmMgZ2VuZXJhdG9yIChub3Qgc3VwcG9ydGVkIGluIHRoZSBicm93c2VyIHlldCksCi8vIHdoZXJlIHRoZXJlIGFyZSBtdWx0aXBsZSBhc3luY2hyb25vdXMgc3RlcHMgYW5kIHRoZSB
QeXRob24gc2lkZSBpcyBnb2luZwovLyB0byBwb2xsIGZvciBjb21wbGV0aW9uIG9mIGVhY2ggc3RlcC4KLy8gVGhpcyB1c2VzIGEgUHJvbWlzZSB0byBibG9jayB0aGUgcHl0aG9uIHNpZGUgb24gY29tcGxldGlvbiBvZiBlYWNoIHN0ZXAsCi8vIHRoZW4gcGFzc2VzIHRoZSByZXN1bHQgb2YgdGhlIHByZXZpb3VzIHN0ZXAgYXMgdGhlIGlucHV0IHRvIHRoZSBuZXh0IHN0ZXAuCmZ1bmN0aW9uIF91cGxvYWRGaWxlc0NvbnRpbnVlKG91dHB1dElkKSB7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICBjb25zdCBzdGVwcyA9IG91dHB1dEVsZW1lbnQuc3RlcHM7CgogIGNvbnN0IG5leHQgPSBzdGVwcy5uZXh0KG91dHB1dEVsZW1lbnQubGFzdFByb21pc2VWYWx1ZSk7CiAgcmV0dXJuIFByb21pc2UucmVzb2x2ZShuZXh0LnZhbHVlLnByb21pc2UpLnRoZW4oKHZhbHVlKSA9PiB7CiAgICAvLyBDYWNoZSB0aGUgbGFzdCBwcm9taXNlIHZhbHVlIHRvIG1ha2UgaXQgYXZhaWxhYmxlIHRvIHRoZSBuZXh0CiAgICAvLyBzdGVwIG9mIHRoZSBnZW5lcmF0b3IuCiAgICBvdXRwdXRFbGVtZW50Lmxhc3RQcm9taXNlVmFsdWUgPSB2YWx1ZTsKICAgIHJldHVybiBuZXh0LnZhbHVlLnJlc3BvbnNlOwogIH0pOwp9CgovKioKICogR2VuZXJhdG9yIGZ1bmN0aW9uIHdoaWNoIGlzIGNhbGxlZCBiZXR3ZWVuIGVhY2ggYXN5bmMgc3RlcCBvZiB0aGUgdXBsb2FkCiAqIHByb2Nlc3MuCiAqIEBwYXJhbSB7c3RyaW5nfSBpbnB1dElkIEVsZW1lbnQgSUQgb2YgdGhlIGlucHV0IGZpbGUgcGlja2VyIGVsZW1lbnQuCiAqIEBwYXJhbSB7c3RyaW5nfSBvdXRwdXRJZCBFbGVtZW50IElEIG9mIHRoZSBvdXRwdXQgZGlzcGxheS4KICogQHJldHVybiB7IUl0ZXJhYmxlPCFPYmplY3Q+fSBJdGVyYWJsZSBvZiBuZXh0IHN0ZXBzLgogKi8KZnVuY3Rpb24qIHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCkgewogIGNvbnN0IGlucHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKGlucHV0SWQpOwogIGlucHV0RWxlbWVudC5kaXNhYmxlZCA9IGZhbHNlOwoKICBjb25zdCBvdXRwdXRFbGVtZW50ID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQob3V0cHV0SWQpOwogIG91dHB1dEVsZW1lbnQuaW5uZXJIVE1MID0gJyc7CgogIGNvbnN0IHBpY2tlZFByb21pc2UgPSBuZXcgUHJvbWlzZSgocmVzb2x2ZSkgPT4gewogICAgaW5wdXRFbGVtZW50LmFkZEV2ZW50TGlzdGVuZXIoJ2NoYW5nZScsIChlKSA9PiB7CiAgICAgIHJlc29sdmUoZS50YXJnZXQuZmlsZXMpOwogICAgfSk7CiAgfSk7CgogIGNvbnN0IGNhbmNlbCA9IGRvY3VtZW50LmNyZWF0ZUVsZW1lbnQoJ2J1dHRvbicpOwogIGlucHV0RWxlbWVudC5wYXJlbnRFbGVtZW50LmFwcGVuZENoaWxkKGNhbmNlbCk7CiAgY2FuY2VsLnRleHRDb250ZW50ID0gJ0NhbmNlbCB1cGxvYWQnOwogIGNvbnN0IGNhbmNlbFByb21pc2UgPSBuZXcgUHJvbWl
zZSgocmVzb2x2ZSkgPT4gewogICAgY2FuY2VsLm9uY2xpY2sgPSAoKSA9PiB7CiAgICAgIHJlc29sdmUobnVsbCk7CiAgICB9OwogIH0pOwoKICAvLyBXYWl0IGZvciB0aGUgdXNlciB0byBwaWNrIHRoZSBmaWxlcy4KICBjb25zdCBmaWxlcyA9IHlpZWxkIHsKICAgIHByb21pc2U6IFByb21pc2UucmFjZShbcGlja2VkUHJvbWlzZSwgY2FuY2VsUHJvbWlzZV0pLAogICAgcmVzcG9uc2U6IHsKICAgICAgYWN0aW9uOiAnc3RhcnRpbmcnLAogICAgfQogIH07CgogIGNhbmNlbC5yZW1vdmUoKTsKCiAgLy8gRGlzYWJsZSB0aGUgaW5wdXQgZWxlbWVudCBzaW5jZSBmdXJ0aGVyIHBpY2tzIGFyZSBub3QgYWxsb3dlZC4KICBpbnB1dEVsZW1lbnQuZGlzYWJsZWQgPSB0cnVlOwoKICBpZiAoIWZpbGVzKSB7CiAgICByZXR1cm4gewogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbXBsZXRlJywKICAgICAgfQogICAgfTsKICB9CgogIGZvciAoY29uc3QgZmlsZSBvZiBmaWxlcykgewogICAgY29uc3QgbGkgPSBkb2N1bWVudC5jcmVhdGVFbGVtZW50KCdsaScpOwogICAgbGkuYXBwZW5kKHNwYW4oZmlsZS5uYW1lLCB7Zm9udFdlaWdodDogJ2JvbGQnfSkpOwogICAgbGkuYXBwZW5kKHNwYW4oCiAgICAgICAgYCgke2ZpbGUudHlwZSB8fCAnbi9hJ30pIC0gJHtmaWxlLnNpemV9IGJ5dGVzLCBgICsKICAgICAgICBgbGFzdCBtb2RpZmllZDogJHsKICAgICAgICAgICAgZmlsZS5sYXN0TW9kaWZpZWREYXRlID8gZmlsZS5sYXN0TW9kaWZpZWREYXRlLnRvTG9jYWxlRGF0ZVN0cmluZygpIDoKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgJ24vYSd9IC0gYCkpOwogICAgY29uc3QgcGVyY2VudCA9IHNwYW4oJzAlIGRvbmUnKTsKICAgIGxpLmFwcGVuZENoaWxkKHBlcmNlbnQpOwoKICAgIG91dHB1dEVsZW1lbnQuYXBwZW5kQ2hpbGQobGkpOwoKICAgIGNvbnN0IGZpbGVEYXRhUHJvbWlzZSA9IG5ldyBQcm9taXNlKChyZXNvbHZlKSA9PiB7CiAgICAgIGNvbnN0IHJlYWRlciA9IG5ldyBGaWxlUmVhZGVyKCk7CiAgICAgIHJlYWRlci5vbmxvYWQgPSAoZSkgPT4gewogICAgICAgIHJlc29sdmUoZS50YXJnZXQucmVzdWx0KTsKICAgICAgfTsKICAgICAgcmVhZGVyLnJlYWRBc0FycmF5QnVmZmVyKGZpbGUpOwogICAgfSk7CiAgICAvLyBXYWl0IGZvciB0aGUgZGF0YSB0byBiZSByZWFkeS4KICAgIGxldCBmaWxlRGF0YSA9IHlpZWxkIHsKICAgICAgcHJvbWlzZTogZmlsZURhdGFQcm9taXNlLAogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbnRpbnVlJywKICAgICAgfQogICAgfTsKCiAgICAvLyBVc2UgYSBjaHVua2VkIHNlbmRpbmcgdG8gYXZvaWQgbWVzc2FnZSBzaXplIGxpbWl0cy4gU2VlIGIvNjIxMTU2NjAuCiAgICBsZXQgcG9zaXRpb24gPSAwOwogICAgd2hpbGUgKHBvc2l0aW9uIDwgZmlsZURhdGEuYnl0ZUxlbmd0aCkgewogICAgICBjb25zdCBsZW5ndGggPSBNYXRoLm1pbihmaWxlRGF0YS5
ieXRlTGVuZ3RoIC0gcG9zaXRpb24sIE1BWF9QQVlMT0FEX1NJWkUpOwogICAgICBjb25zdCBjaHVuayA9IG5ldyBVaW50OEFycmF5KGZpbGVEYXRhLCBwb3NpdGlvbiwgbGVuZ3RoKTsKICAgICAgcG9zaXRpb24gKz0gbGVuZ3RoOwoKICAgICAgY29uc3QgYmFzZTY0ID0gYnRvYShTdHJpbmcuZnJvbUNoYXJDb2RlLmFwcGx5KG51bGwsIGNodW5rKSk7CiAgICAgIHlpZWxkIHsKICAgICAgICByZXNwb25zZTogewogICAgICAgICAgYWN0aW9uOiAnYXBwZW5kJywKICAgICAgICAgIGZpbGU6IGZpbGUubmFtZSwKICAgICAgICAgIGRhdGE6IGJhc2U2NCwKICAgICAgICB9LAogICAgICB9OwogICAgICBwZXJjZW50LnRleHRDb250ZW50ID0KICAgICAgICAgIGAke01hdGgucm91bmQoKHBvc2l0aW9uIC8gZmlsZURhdGEuYnl0ZUxlbmd0aCkgKiAxMDApfSUgZG9uZWA7CiAgICB9CiAgfQoKICAvLyBBbGwgZG9uZS4KICB5aWVsZCB7CiAgICByZXNwb25zZTogewogICAgICBhY3Rpb246ICdjb21wbGV0ZScsCiAgICB9CiAgfTsKfQoKc2NvcGUuZ29vZ2xlID0gc2NvcGUuZ29vZ2xlIHx8IHt9OwpzY29wZS5nb29nbGUuY29sYWIgPSBzY29wZS5nb29nbGUuY29sYWIgfHwge307CnNjb3BlLmdvb2dsZS5jb2xhYi5fZmlsZXMgPSB7CiAgX3VwbG9hZEZpbGVzLAogIF91cGxvYWRGaWxlc0NvbnRpbnVlLAp9Owp9KShzZWxmKTsK",
"ok": true,
"headers": [
[
"content-type",
"application/javascript"
]
],
"status": 200,
"status_text": ""
}
},
"base_uri": "https://localhost:8080/",
"height": 89
},
"outputId": "7273226a-7c41-4c61-eae2-b14a9a2595cc"
},
"source": [
"from google.colab import files\n",
"uploaded = files.upload()\n",
"print(\"done\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <input type=\"file\" id=\"files-947f2640-807b-4bc0-9168-f058798b3504\" name=\"files[]\" multiple disabled\n",
" style=\"border:none\" />\n",
" <output id=\"result-947f2640-807b-4bc0-9168-f058798b3504\">\n",
" Upload widget is only available when the cell has been executed in the\n",
" current browser session. Please rerun this cell to enable.\n",
" </output>\n",
" <script src=\"/nbextensions/google.colab/files.js\"></script> "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {
"tags": []
}
},
{
"output_type": "stream",
"text": [
"Saving key.json to key.json\n",
"done\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WHPHrUnhpKnI",
"colab_type": "text"
},
"source": [
"I'll install the OpenAI API client"
]
},
{
"cell_type": "code",
"metadata": {
"id": "zq0ltp2xn4yt",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 292
},
"outputId": "7b5e589f-4cd5-4546-b9fa-c9f9409d8042"
},
"source": [
"!pip install openai\n",
"import openai, json, pandas as pd"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Collecting openai\n",
"\u001b[?25l Downloading https://files.pythonhosted.org/packages/a8/65/c7461f4c87984534683f480ea5742777bc39bbf5721123194c2d0347dc1f/openai-0.2.4.tar.gz (157kB)\n",
"\u001b[K |████████████████████████████████| 163kB 2.6MB/s \n",
"\u001b[?25hRequirement already satisfied: requests>=2.20 in /usr/local/lib/python3.6/dist-packages (from openai) (2.23.0)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests>=2.20->openai) (2.10)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests>=2.20->openai) (1.24.3)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests>=2.20->openai) (2020.6.20)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests>=2.20->openai) (3.0.4)\n",
"Building wheels for collected packages: openai\n",
" Building wheel for openai (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for openai: filename=openai-0.2.4-cp36-none-any.whl size=170709 sha256=06c62ee79d6d96abb438da6026e3abc07792154518fdf71f71c6328fa7a2c871\n",
" Stored in directory: /root/.cache/pip/wheels/74/96/c8/c6e170929c276b836613e1b9985343b501fe455e53d85e7d48\n",
"Successfully built openai\n",
"Installing collected packages: openai\n",
"Successfully installed openai-0.2.4\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Q2yE0jcnpMEV",
"colab_type": "text"
},
"source": [
"Loading in the key.json that I uploaded; I do this so I don't need to worry about accidentally leaking creds if I share the colab (I'm 99% sure the shared notebook is just a JSON file that won't expose them)"
]
},
{
"cell_type": "code",
"metadata": {
"id": "bwNXXwHen5x9",
"colab_type": "code",
"colab": {}
},
"source": [
"openai.api_key = json.load(open(\"key.json\", \"r\"))[\"key\"]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "k67w5H0fpTkT",
"colab_type": "text"
},
"source": [
"Default keyword arguments to pass to the API"
]
},
{
"cell_type": "code",
"metadata": {
"id": "e1EwpqqJkTYh",
"colab_type": "code",
"colab": {}
},
"source": [
"#arguments to send the API\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "zZubgPoOpWDH",
"colab_type": "text"
},
"source": [
"Quick wrapper around the API call. The intent was to save prompts and responses for later analysis if needed; this version just returns the completion text"
]
},
{
"cell_type": "code",
"metadata": {
"id": "sXTDJx0An9Bl",
"colab_type": "code",
"colab": {}
},
"source": [
"import datetime\n",
"def query(prompt, myKwargs=None):\n",
"    \"\"\"\n",
"    wrapper for the API: sends the prompt and returns the stripped completion text\n",
"    \"\"\"\n",
"    # defaults; anything passed in myKwargs overrides them\n",
"    kwargs = {\n",
"        \"engine\":\"davinci\",\n",
"        \"temperature\":0,\n",
"        \"max_tokens\":250,\n",
"        \"stop\":\"\\n\\n\",\n",
"    }\n",
"    kwargs.update(myKwargs or {})\n",
"\n",
"    r = openai.Completion.create(prompt=prompt, **kwargs)[\"choices\"][0][\"text\"].strip()\n",
"    return r"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdFXafcJpZ3Q",
"colab_type": "text"
},
"source": [
"Test to make sure my query works"
]
},
{
"cell_type": "code",
"metadata": {
"id": "4SlyKgjyopPn",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 52
},
"outputId": "f15e9b90-1394-4ddb-e6e2-f90d81aaf76e"
},
"source": [
"query(\"q: what is 1+1?\\na:\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'2\\nq: what is 2+2?\\na: 4\\nq: what is 3+3?\\na: 6\\nq: what is 4+4?\\na: 8\\nq: what is 5+5?\\na: 10\\nq: what is 6+6?\\na: 12\\nq: what is 7+7?\\na: 14\\nq: what is 8+8?\\na: 16\\nq: what is 9+9?\\na: 18\\nq: what is 10+10?\\na: 20'"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "Ybg_8ieJzWcW",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "773fd1b3-6acf-4690-f86c-23bbfe3504e5"
},
"source": [
"query(\"q: what is 1+1?\\na:\", myKwargs = {\"stop\":\"\\n\"})"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'2'"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "nrAGEs9CAbvc",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"outputId": "8b50b1a4-719e-4584-cbba-39759aaa77f1"
},
"source": [
"!wget https://dl.fbaipublicfiles.com/anli/anli_v1.0.zip"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"--2020-08-09 20:29:43-- https://dl.fbaipublicfiles.com/anli/anli_v1.0.zip\n",
"Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...\n",
"Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 18708061 (18M) [application/zip]\n",
"Saving to: ‘anli_v1.0.zip’\n",
"\n",
"anli_v1.0.zip 100%[===================>] 17.84M 12.6MB/s in 1.4s \n",
"\n",
"2020-08-09 20:29:45 (12.6 MB/s) - ‘anli_v1.0.zip’ saved [18708061/18708061]\n",
"\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "s-hW03pvAc-z",
"colab_type": "code",
"colab": {}
},
"source": [
"import zipfile\n",
"with zipfile.ZipFile(\"anli_v1.0.zip\",\"r\") as zip_ref:\n",
"    zip_ref.extractall(\".\")\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "RwTUbq44AgNp",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "63d67dde-31c8-4b4f-d9f5-548cc239ac79"
},
"source": [
"ls anli_v1.0/R1"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"dev.jsonl test.jsonl train.jsonl\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "C-Moa3AuA0it",
"colab_type": "code",
"colab": {}
},
"source": [
"train = pd.read_json(\"anli_v1.0/R3/train.jsonl\",lines=True)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "UcoUhiblq8ql",
"colab_type": "text"
},
"source": [
"# I just want to see if I can do CONTRADICTS vs. NOT CONTRADICTS"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3OZAx9FeC3wA",
"colab_type": "text"
},
"source": [
"Get some variance in the type of question"
]
},
{
"cell_type": "code",
"metadata": {
"id": "tyug_qYcFn_Z",
"colab_type": "code",
"colab": {}
},
"source": [
"dev = pd.read_json(\"anli_v1.0/R3/dev.jsonl\",lines=True)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "V56PJ4vZH0N-",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 122
},
"outputId": "02909965-0b8e-4dd7-9330-8db360297523"
},
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly\n",
"\n",
"Enter your authorization code:\n",
"··········\n",
"Mounted at /content/drive\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "5waJPsoxrccs",
"colab_type": "code",
"colab": {}
},
"source": [
"labeled = pd.read_pickle(\"/content/drive/My Drive/GPT3_Benchmarking/ANLI_EC_LABELED4225.pkl\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "LRVR4aRmriwl",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 182
},
"outputId": "8fb49a39-c4e6-4aa9-a11a-03a2e39a67d5"
},
"source": [
"labeled[:1]"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>uid</th>\n",
" <th>context</th>\n",
" <th>hypothesis</th>\n",
" <th>label</th>\n",
" <th>model_label</th>\n",
" <th>emturk</th>\n",
" <th>genre</th>\n",
" <th>reason</th>\n",
" <th>tag</th>\n",
" <th>r</th>\n",
" <th>r2</th>\n",
" <th>pred</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>50a5d4bc-bb8b-44f9-ba07-95dfaf5536ab</td>\n",
" <td>one of the orders issued by Ochola in April Lo...</td>\n",
" <td>The decision to move the photocopier business ...</td>\n",
" <td>e</td>\n",
" <td>n</td>\n",
" <td>False</td>\n",
" <td>news</td>\n",
" <td>I made an obvious inference from the text that...</td>\n",
" <td>r3_dev</td>\n",
" <td>'You generally got it!, The decision was made ...</td>\n",
" <td>understood</td>\n",
" <td>e</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" uid ... pred\n",
"0 50a5d4bc-bb8b-44f9-ba07-95dfaf5536ab ... e\n",
"\n",
"[1 rows x 12 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 138
}
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "IooruU5BrrEv",
"colab": {}
},
"source": [
"dev2 = labeled.copy()\n",
"dev2[\"pred\"] = dev2[\"r2\"].apply(lambda x: \"e\" if x == \"understood\" else \"c\")\n",
"\n",
"# stop after the first row so `row` holds one (index, Series) pair\n",
"for row in dev2.iterrows():\n",
"    break"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "MqJq4d0vrrEx",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "299a2b9e-7edc-4297-904f-77e0986fa46f"
},
"source": [
"len(labeled)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"1200"
]
},
"metadata": {
"tags": []
},
"execution_count": 140
}
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "gqBx9sLBrrE0",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "611e2fae-ed6a-481d-f849-9b8633841685"
},
"source": [
"(dev2[\"pred\"]==dev2[\"label\"]).sum()/len(dev2)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0.4225"
]
},
"metadata": {
"tags": []
},
"execution_count": 141
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "q8b7eSxYBw60",
"colab_type": "text"
},
"source": [
"# k, so that didn't work, what if I go with the ENTRY/UPDATE format"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ebBj9TLb_ABx",
"colab_type": "code",
"colab": {}
},
"source": [
"myPositives = [13, 503, 1000]\n",
"myExamples = []\n",
"\n",
"# take the same positions from each label subset: one n, one c, one e per position\n",
"for row in myPositives:\n",
"    myExamples.append(train[train.label==\"n\"].reset_index().loc[row])\n",
"    myExamples.append(train[train.label==\"c\"].reset_index().loc[row])\n",
"    myExamples.append(train[train.label==\"e\"].reset_index().loc[row])\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "VeCvoYEqCrTZ",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 802
},
"outputId": "7931ba6e-7aa5-4107-bcbe-3dc0f217b897"
},
"source": [
"for row in myExamples:\n",
"    print(\"Entry '{}'\".format(row[\"context\"]))\n",
"    print(\"Update '{}'\".format(row[\"hypothesis\"]))\n",
"    print(\"Topic''\")\n",
"    print(\"Jerry: {}\".format(row[\"label\"]))\n",
"    print()"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Entry 'The four-time world champion has established himself as one of the world’s greatest ever drivers since he broke onto the scene in 2007. But his outspoken character has often got him in trouble off the track, most recently with Max Verstappen. Hamilton’s former team-mate Jenson Button even said he was “weird” to work with. Rob Wilson – who has taught over half of the current F1 circuit how to drive faster – also believes there is something different about the Mercedes ace.'\n",
"Update 'many many people thing the driver is weird'\n",
"Topic''\n",
"Jerry: n\n",
"\n",
"Entry 'Burks (shoulder) is inactive for Sunday's game against the Vikings, VP of Communications for the Packers Jason Wahlers reports. Burks will continue waiting for his first regular season action, and he'll set his sights on a Week 3 matchup with the Redskins. In his place, a mix of Antonio Morrison and James Crawford will fill in at inside linebacker against a strong Vikings' running game.'\n",
"Update 'On Sunday the game against the Viking's will include Morrison and Crawford filling in for Burks as opposed to the previous game (week 3) against the Redskins'\n",
"Topic''\n",
"Jerry: c\n",
"\n",
"Entry 'Join Mix 106.5 this Sunday from 11am to 1pm at the Giant on Daybreak Circle in Clarksville and help a family in need this holiday season! Giant's \"Food for Families\" drive is in full swing – just drop off non-perishable items in our bins at the front of the store. From canned cranberry sauce and pumpkin to boxed stuffing – all items will be donated to local Feeding America Food banks, ensuring that local families have a happy Thanksgiving. The food drive ends on on November 23rd so be sure to donate early!'\n",
"Update 'Mix 106.5 want to do something to help poor families'\n",
"Topic''\n",
"Jerry: e\n",
"\n",
"Entry 'The Corn<br>Gary had a plot of land. He decided to plant some produce on it. He picked corn for the first crop. It grew tall and healthy. He harvested it and made some delicious food.'\n",
"Update 'Gary plants delicious fruits.'\n",
"Topic''\n",
"Jerry: n\n",
"\n",
"Entry 'It was all a mistake .<br>So Sir John told Grimes to go home , and promised him five shillings if he would bring the boy quietly up to him , without beating him , that he might be sure of the truth .<br>For he took for granted , and Grimes too , that Tom had made his way home .'\n",
"Update 'Sir John told Grimes to go to the corner market. '\n",
"Topic''\n",
"Jerry: c\n",
"\n",
"Entry 'Eric's Mistake<br>Eric was going on a road trip to Harper's Ferry. He set his GPS to the route, but the battery died part way through. Eric ended up making several wrong turns and getting very lost. Eric refused to ask for directions and ran out of gas. Eric had to walk 5 miles to a gas station to get more gas.'\n",
"Update 'eric made too many wrong turns'\n",
"Topic''\n",
"Jerry: e\n",
"\n",
"Entry 'How to operate a segway<br>Make sure your segway is fully charged by monitoring the battery indicator on the lcd panel. Ensure that the kickstand is fully functioning and holding the segway upright. Turn the segway on by quickly hitting the \" on/off \" button.'\n",
"Update 'The lcd panel holds the battery indicator and the on/off button.'\n",
"Topic''\n",
"Jerry: n\n",
"\n",
"Entry 'How to invite friends to like your page on facebook pages manager<br>Download and install the application. Facebook pages manager is available on both itunes app store and google play for android. Log in your personal facebook account.'\n",
"Update 'Facebook page manager is not available for apple devices. '\n",
"Topic''\n",
"Jerry: c\n",
"\n",
"Entry 'How to deal with your friend who is being jealous of your other friends<br>Invite your friend to join the group. This might feel hard if your friend has made occasions with your other friends uncomfortable in the past. However, stay positive.'\n",
"Update 'The agent suggest inviting the friends to hang out with your other friends'\n",
"Topic''\n",
"Jerry: e\n",
"\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wQML1_KHdkut",
"colab_type": "text"
},
"source": [
"## Try with just 1 of each"
]
},
{
"cell_type": "code",
"metadata": {
"id": "-WouyBopbseA",
"colab_type": "code",
"colab": {}
},
"source": [
"entryPrompt = \"\"\"Entry: 'The four-time world champion has established himself as one of the world’s greatest ever drivers since he broke onto the scene in 2007. But his outspoken character has often got him in trouble off the track, most recently with Max Verstappen. Hamilton’s former team-mate Jenson Button even said he was “weird” to work with. Rob Wilson – who has taught over half of the current F1 circuit how to drive faster – also believes there is something different about the Mercedes ace.'\n",
"Update: 'many many people thing the driver is weird'\n",
"Topic: Number of people who think the driver is weird\n",
"Entry Relevance: Former-team mate thinks he is weird to work with.\n",
"Comparison: Update covers how many people think driver is weird, entry only covers one person not others.\n",
"Alignment: Not covered\n",
"\n",
"Entry: 'Burks (shoulder) is inactive for Sunday's game against the Vikings, VP of Communications for the Packers Jason Wahlers reports. Burks will continue waiting for his first regular season action, and he'll set his sights on a Week 3 matchup with the Redskins. In his place, a mix of Antonio Morrison and James Crawford will fill in at inside linebacker against a strong Vikings' running game.'\n",
"Update: 'On Sunday the game against the Viking's will include Morrison and Crawford filling in for Burks as opposed to the previous game (week 3) against the Redskins'\n",
"Topic: Participation in Sunday game compared to previous week 3 game.\n",
"Entry Relevance: Burks is inactive, Morrison and Crawford wlll fill in, Burks will be back in week 3.\n",
"Comparison In entry, Burks is coming back for week 3. In update week 3 happened.\n",
"Alignment: Contadictory\n",
"\n",
"Entry: 'Join Mix 106.5 this Sunday from 11am to 1pm at the Giant on Daybreak Circle in Clarksville and help a family in need this holiday season! Giant's \"Food for Families\" drive is in full swing – just drop off non-perishable items in our bins at the front of the store. From canned cranberry sauce and pumpkin to boxed stuffing – all items will be donated to local Feeding America Food banks, ensuring that local families have a happy Thanksgiving. The food drive ends on on November 23rd so be sure to donate early!'\n",
"Update: 'Mix 106.5 want to do something to help poor families'\n",
"Topic: Mix 106.5 helping poor families\n",
"Entry Relevance: Mix 106.5 this sunday is helping families in need.\n",
"Comparison: In entry, Mix 106.5 is helping poor families. In update, Mix 106.5 is helping poor families.\n",
"Alignment: Agreement\n",
"\n",
"\"\"\""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "3AZDZuT7b3VY",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "XotOlx37hQqA",
"colab": {}
},
"source": [
"eval = train[train.label==\"e\"][1000:1010].copy()\n",
"for row in eval.iterrows():\n",
"    payload = \"Entry: '{}'\\n\".format(row[1][\"context\"])\n",
"    payload += \"Update: '{}'\\n\".format(row[1][\"hypothesis\"])\n",
"    payload += \"Topic:\"\n",
"    eval.at[row[0], \"r\"] = query(entryPrompt + payload)\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "R-sCvpHnhQqE",
"colab": {}
},
"source": [
"eval2 = train[train.label==\"c\"][1000:1010].copy()\n",
"for row in eval2.iterrows():\n",
"    payload = \"Entry: '{}'\\n\".format(row[1][\"context\"])\n",
"    payload += \"Update: '{}'\\n\".format(row[1][\"hypothesis\"])\n",
"    payload += \"Topic:\"\n",
"    eval2.at[row[0], \"r\"] = query(entryPrompt + payload)\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "3kzJd06PhQqG",
"colab": {}
},
"source": [
"eval3 = train[train.label==\"n\"][1000:1010].copy()\n",
"for row in eval3.iterrows():\n",
"    payload = \"Entry: '{}'\\n\".format(row[1][\"context\"])\n",
"    payload += \"Update: '{}'\\n\".format(row[1][\"hypothesis\"])\n",
"    payload += \"Topic:\"\n",
"    eval3.at[row[0], \"r\"] = query(entryPrompt + payload)\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "oPMGs14AhesI",
"colab_type": "code",
"colab": {}
},
"source": [
"eval[\"r2\"] = eval[\"r\"].apply(lambda x: x.split(\"\\n\")[-1].split()[-1])\n",
"eval2[\"r2\"] = eval2[\"r\"].apply(lambda x: x.split(\"\\n\")[-1].split()[-1])\n",
"eval3[\"r2\"] = eval3[\"r\"].apply(lambda x: x.split(\"\\n\")[-1].split()[-1])"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "LcUq7e57h3JX",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"outputId": "75a658a9-359d-40e5-c3ab-afc9bc1680ae"
},
"source": [
"eval3"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>uid</th>\n",
" <th>context</th>\n",
" <th>hypothesis</th>\n",
" <th>label</th>\n",
" <th>model_label</th>\n",
" <th>emturk</th>\n",
" <th>genre</th>\n",
" <th>reason</th>\n",
" <th>tag</th>\n",
" <th>r</th>\n",
" <th>r2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3337</th>\n",
" <td>cb642cfd-944c-4d72-8101-b552e3a7e0fd</td>\n",
" <td>How to operate a segway&lt;br&gt;Make sure your segw...</td>\n",
" <td>The lcd panel holds the battery indicator and ...</td>\n",
" <td>n</td>\n",
" <td>e</td>\n",
" <td>False</td>\n",
" <td>procedural/causal</td>\n",
" <td>We don't know where the on/off button is. I th...</td>\n",
" <td>r3_train</td>\n",
" <td>How to operate a segway\\nEntry Relevance: The ...</td>\n",
" <td>Agreement</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3338</th>\n",
" <td>2d19af0a-c490-446a-9735-99935e894111</td>\n",
" <td>How to make korean sweet potato cake&lt;br&gt;Fill a...</td>\n",
" <td>more or less than 1.5 pounds can be used</td>\n",
" <td>n</td>\n",
" <td>e</td>\n",
" <td>False</td>\n",
" <td>procedural/causal</td>\n",
" <td>not clear how many potatoes there needs to be</td>\n",
" <td>r3_train</td>\n",
" <td>Amount of korean sweet potato used\\nEntry Rele...</td>\n",
" <td>covered.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3339</th>\n",
" <td>ff6936d0-1719-4fc7-944e-5b967e956913</td>\n",
" <td>How to pebble a garden&lt;br&gt;Tour local gardens f...</td>\n",
" <td>no one wants pebbles do they</td>\n",
" <td>n</td>\n",
" <td>c</td>\n",
" <td>False</td>\n",
" <td>procedural/causal</td>\n",
" <td>I mean they may or may not</td>\n",
" <td>r3_train</td>\n",
" <td>Pebbling a garden\\nEntry Relevance: Pebbling a...</td>\n",
" <td>Contradictory.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3340</th>\n",
" <td>59cf42b3-5a60-4eda-a9ef-549b44546d9d</td>\n",
" <td>How to clean potatoes&lt;br&gt;Start with clean hand...</td>\n",
" <td>Antibacterial soap cleans bacteria off potatoe...</td>\n",
" <td>n</td>\n",
" <td>e</td>\n",
" <td>False</td>\n",
" <td>procedural/causal</td>\n",
" <td>I used the word MUST, but you don't have to ri...</td>\n",
" <td>r3_train</td>\n",
" <td>How to clean potatoes\\nEntry Relevance: How to...</td>\n",
" <td>Agreement</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3341</th>\n",
" <td>66707360-5358-4c51-958a-3d0f9374d7fb</td>\n",
" <td>How to write a creative marketing brief&lt;br&gt;Gat...</td>\n",
" <td>There is no time limit on the brief</td>\n",
" <td>n</td>\n",
" <td>c</td>\n",
" <td>False</td>\n",
" <td>procedural/causal</td>\n",
" <td>A time limit has not been set for making the m...</td>\n",
" <td>r3_train</td>\n",
" <td>Time limit on the brief\\nEntry Relevance: Ther...</td>\n",
" <td>Contradictory.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3342</th>\n",
" <td>59bcb576-ae44-4756-96f8-c9f2d6738c5a</td>\n",
" <td>How to backpack in europe on a budget&lt;br&gt;Plan ...</td>\n",
" <td>Eastern Europe is much cheaper than staying in...</td>\n",
" <td>n</td>\n",
" <td>e</td>\n",
" <td>False</td>\n",
" <td>procedural/causal</td>\n",
" <td>We know it's cheaper but we don't know if it's...</td>\n",
" <td>r3_train</td>\n",
" <td>How to backpack in Europe on a budget\\nEntry R...</td>\n",
" <td>Agreement</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3343</th>\n",
" <td>fe537aee-3364-47aa-8dd8-298a1d465476</td>\n",
" <td>How to use a water level&lt;br&gt;Use 50 to 100 feet...</td>\n",
" <td>A water level requires at least 7 different su...</td>\n",
" <td>n</td>\n",
" <td>c</td>\n",
" <td>False</td>\n",
" <td>procedural/causal</td>\n",
" <td>The statement specifies that a long tube and a...</td>\n",
" <td>r3_train</td>\n",
" <td>How to use a water level\\nEntry Relevance: How...</td>\n",
" <td>Agreement</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3344</th>\n",
" <td>6a4b73c3-6d9c-453b-acaf-4a7a6367e349</td>\n",
" <td>How to make a christmas gift for a teacher&lt;br&gt;...</td>\n",
" <td>The teacher wants a Christmas gift from you.</td>\n",
" <td>n</td>\n",
" <td>e</td>\n",
" <td>False</td>\n",
" <td>procedural/causal</td>\n",
" <td>Just because we make one doesn't mean the teac...</td>\n",
" <td>r3_train</td>\n",
" <td>Christmas gift for teacher\\nEntry Relevance: C...</td>\n",
" <td>Agreement</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3345</th>\n",
" <td>2e8aee8b-43ae-48ee-8456-1995c3244e3d</td>\n",
" <td>How to make the teacher think you are smart&lt;br...</td>\n",
" <td>You are not smart.</td>\n",
" <td>n</td>\n",
" <td>c</td>\n",
" <td>False</td>\n",
" <td>procedural/causal</td>\n",
" <td>The scenario makes no assumptions about actual...</td>\n",
" <td>r3_train</td>\n",
" <td>How to make the teacher think you are smart\\nE...</td>\n",
" <td>Contradictory.</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3346</th>\n",
" <td>31490328-5d07-44a3-a164-1e122d63ad46</td>\n",
" <td>How to take legal action for player to officia...</td>\n",
" <td>If you call the police, they will investigate ...</td>\n",
" <td>n</td>\n",
" <td>e</td>\n",
" <td>False</td>\n",
" <td>procedural/causal</td>\n",
" <td>Who knows what the police will do? Questioning...</td>\n",
" <td>r3_train</td>\n",
" <td>How to take legal action for player to officia...</td>\n",
" <td>Agreement</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" uid ... r2\n",
"3337 cb642cfd-944c-4d72-8101-b552e3a7e0fd ... Agreement\n",
"3338 2d19af0a-c490-446a-9735-99935e894111 ... covered.\n",
"3339 ff6936d0-1719-4fc7-944e-5b967e956913 ... Contradictory.\n",
"3340 59cf42b3-5a60-4eda-a9ef-549b44546d9d ... Agreement\n",
"3341 66707360-5358-4c51-958a-3d0f9374d7fb ... Contradictory.\n",
"3342 59bcb576-ae44-4756-96f8-c9f2d6738c5a ... Agreement\n",
"3343 fe537aee-3364-47aa-8dd8-298a1d465476 ... Agreement\n",
"3344 6a4b73c3-6d9c-453b-acaf-4a7a6367e349 ... Agreement\n",
"3345 2e8aee8b-43ae-48ee-8456-1995c3244e3d ... Contradictory.\n",
"3346 31490328-5d07-44a3-a164-1e122d63ad46 ... Agreement\n",
"\n",
"[10 rows x 11 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 59
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "TY30vVh-h77E",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "e_ZBzuUamMua",
"colab_type": "text"
},
"source": [
"# EVALUATING"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Y1SnGacgmNT_",
"colab_type": "code",
"colab": {}
},
"source": [
"labeled = pd.read_pickle(\"/content/drive/My Drive/GPT3_Benchmarking/ANLI_dev_EN1.pkl\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "qHYi5rCsmel8",
"colab_type": "code",
"colab": {}
},
"source": [
"dev2 = labeled.copy()\n",
"dev2[\"pred\"] = dev2[\"r2\"].apply(lambda x: \"e\" if x == \"understood\" else \"c\")\n",
"\n",
"for row in dev2.iterrows():\n",
" break"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "YxiU86cSmPi4",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "0b66ace4-56d9-4a99-9aad-f6f358b359d6"
},
"source": [
"len(labeled)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"1200"
]
},
"metadata": {
"tags": []
},
"execution_count": 149
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "XgS8vyeWmZ4r",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "31ca2fe4-2227-40c2-f456-1a9b2a87ab07"
},
"source": [
"(dev2[\"pred\"]==dev2[\"label\"]).sum()/len(dev2)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0.4225"
]
},
"metadata": {
"tags": []
},
"execution_count": 156
}
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "ESROt3KAlfJN",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 425
},
"outputId": "753e7daf-4b6b-43ad-e876-e9481b941bad"
},
"source": [
"for row in dev2.iterrows():\n",
" if row[0] % 50 == 0:\n",
" print(row[0])\n",
" labeled.to_pickle(\"/content/drive/My Drive/GPT3_Benchmarking/ANLI_dev_EN1.pkl\")\n",
" if row[1][\"pred\"] == \"c\":\n",
" dev2.at[row[0], \"pred2\"] = dev2.at[row[0], \"pred\"]\n",
" continue\n",
"\n",
" try:\n",
" a = row[1][\"r5\"]\n",
" if type(a) == str:\n",
" continue\n",
" except:\n",
" pass\n",
" \n",
" payload = \"Entry: '{}'\\n\".format(row[1][\"context\"])\n",
" payload += \"Update: '{}'\\n\".format(row[1][\"hypothesis\"])\n",
" payload += \"Topic:\"\n",
" r = query(entryPrompt + payload)\n",
" dev2.at[row[0], \"r5\"] = r\n",
" lastWord = r.split()[-1]\n",
" myLabel = \"e\"\n",
" if lastWord.lower().startswith(\"c\"):\n",
" myLabel = \"n\"\n",
" dev2.at[row[0], \"pred2\"] = myLabel\n",
"\n",
"\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"0\n",
"50\n",
"100\n",
"150\n",
"200\n",
"250\n",
"300\n",
"350\n",
"400\n",
"450\n",
"500\n",
"550\n",
"600\n",
"650\n",
"700\n",
"750\n",
"800\n",
"850\n",
"900\n",
"950\n",
"1000\n",
"1050\n",
"1100\n",
"1150\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "qh9AZhP5lfJU",
"colab": {}
},
"source": [
"labeled.to_pickle(\"/content/drive/My Drive/GPT3_Benchmarking/ANLI_dev_EN1.pkl\")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Q9psHW7SuAYI",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "9JR5sVx9t_Ux",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "a3710e11-8253-4cf7-8c7b-427bae97ef5d"
},
"source": [
"(dev2[\"pred\"]==dev2[\"label\"]).sum()/len(dev2)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0.4225"
]
},
"metadata": {
"tags": []
},
"execution_count": 162
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "vjx0XKJ7t_2a",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "ca3eb093-cf16-4653-efd3-a003b0665739"
},
"source": [
"(dev3[\"pred2\"]==dev3[\"label\"]).sum()/len(dev3)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0.4353628023352794"
]
},
"metadata": {
"tags": []
},
"execution_count": 165
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "8X4zIMT2uBgs",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "iM1Hu52BvSdo",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
}
]
}
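
The evaluation cells in the notebook apply a two-pass relabelling: a first-pass `c` is kept, and a first-pass `e` is demoted to `n` when the second completion ends in a word starting with "c". The following is a minimal sketch of that logic in plain Python, without pandas or the `query` call. The helper names `last_word`, `second_pass_label`, and `accuracy`, and the sample rows, are assumptions made for illustration and do not appear in the notebook.

```python
def last_word(completion: str) -> str:
    """Return the final whitespace-separated token, or '' for an empty completion."""
    tokens = completion.split()
    return tokens[-1] if tokens else ""


def second_pass_label(first_pred: str, completion: str) -> str:
    """Keep a first-pass 'c'; otherwise return 'n' if the completion's
    last word starts with 'c' (case-insensitive), else 'e'."""
    if first_pred == "c":
        return "c"
    return "n" if last_word(completion).lower().startswith("c") else "e"


def accuracy(preds, golds) -> float:
    """Fraction of positions where the prediction equals the gold label."""
    pairs = list(zip(preds, golds))
    return sum(p == g for p, g in pairs) / len(pairs)


# Made-up rows for illustration: (first-pass pred, second-pass completion, gold label)
rows = [
    ("e", "How to clean potatoes\nEntry Relevance: ... Agreement", "e"),
    ("e", "Pebbling a garden\nEntry Relevance: ... Contradictory.", "n"),
    ("c", "", "c"),
    ("e", "", "n"),
]

preds = [second_pass_label(pred, completion) for pred, completion, _ in rows]
golds = [gold for _, _, gold in rows]
print(preds, accuracy(preds, golds))
```

Because the rule only checks the first letter of the last word, a completion ending in "covered." is also mapped to `n`, just like one ending in "Contradictory.".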