@brockmanmatt
Created July 19, 2020 09:02
ExploreContextStuffing.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "ExploreContextStuffing.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyOQmg3TB23Ux+1+2C6QwqFl",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/brockmanmatt/d9b785d141755ac0c687404f2a3e6a78/explorecontextstuffing.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"metadata": {
"id": "J7wnsgT2kPut",
"colab_type": "code",
"colab": {
"resources": {
"http://localhost:8080/nbextensions/google.colab/files.js": {
"data": "Ly8gQ29weXJpZ2h0IDIwMTcgR29vZ2xlIExMQwovLwovLyBMaWNlbnNlZCB1bmRlciB0aGUgQXBhY2hlIExpY2Vuc2UsIFZlcnNpb24gMi4wICh0aGUgIkxpY2Vuc2UiKTsKLy8geW91IG1heSBub3QgdXNlIHRoaXMgZmlsZSBleGNlcHQgaW4gY29tcGxpYW5jZSB3aXRoIHRoZSBMaWNlbnNlLgovLyBZb3UgbWF5IG9idGFpbiBhIGNvcHkgb2YgdGhlIExpY2Vuc2UgYXQKLy8KLy8gICAgICBodHRwOi8vd3d3LmFwYWNoZS5vcmcvbGljZW5zZXMvTElDRU5TRS0yLjAKLy8KLy8gVW5sZXNzIHJlcXVpcmVkIGJ5IGFwcGxpY2FibGUgbGF3IG9yIGFncmVlZCB0byBpbiB3cml0aW5nLCBzb2Z0d2FyZQovLyBkaXN0cmlidXRlZCB1bmRlciB0aGUgTGljZW5zZSBpcyBkaXN0cmlidXRlZCBvbiBhbiAiQVMgSVMiIEJBU0lTLAovLyBXSVRIT1VUIFdBUlJBTlRJRVMgT1IgQ09ORElUSU9OUyBPRiBBTlkgS0lORCwgZWl0aGVyIGV4cHJlc3Mgb3IgaW1wbGllZC4KLy8gU2VlIHRoZSBMaWNlbnNlIGZvciB0aGUgc3BlY2lmaWMgbGFuZ3VhZ2UgZ292ZXJuaW5nIHBlcm1pc3Npb25zIGFuZAovLyBsaW1pdGF0aW9ucyB1bmRlciB0aGUgTGljZW5zZS4KCi8qKgogKiBAZmlsZW92ZXJ2aWV3IEhlbHBlcnMgZm9yIGdvb2dsZS5jb2xhYiBQeXRob24gbW9kdWxlLgogKi8KKGZ1bmN0aW9uKHNjb3BlKSB7CmZ1bmN0aW9uIHNwYW4odGV4dCwgc3R5bGVBdHRyaWJ1dGVzID0ge30pIHsKICBjb25zdCBlbGVtZW50ID0gZG9jdW1lbnQuY3JlYXRlRWxlbWVudCgnc3BhbicpOwogIGVsZW1lbnQudGV4dENvbnRlbnQgPSB0ZXh0OwogIGZvciAoY29uc3Qga2V5IG9mIE9iamVjdC5rZXlzKHN0eWxlQXR0cmlidXRlcykpIHsKICAgIGVsZW1lbnQuc3R5bGVba2V5XSA9IHN0eWxlQXR0cmlidXRlc1trZXldOwogIH0KICByZXR1cm4gZWxlbWVudDsKfQoKLy8gTWF4IG51bWJlciBvZiBieXRlcyB3aGljaCB3aWxsIGJlIHVwbG9hZGVkIGF0IGEgdGltZS4KY29uc3QgTUFYX1BBWUxPQURfU0laRSA9IDEwMCAqIDEwMjQ7CgpmdW5jdGlvbiBfdXBsb2FkRmlsZXMoaW5wdXRJZCwgb3V0cHV0SWQpIHsKICBjb25zdCBzdGVwcyA9IHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCk7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICAvLyBDYWNoZSBzdGVwcyBvbiB0aGUgb3V0cHV0RWxlbWVudCB0byBtYWtlIGl0IGF2YWlsYWJsZSBmb3IgdGhlIG5leHQgY2FsbAogIC8vIHRvIHVwbG9hZEZpbGVzQ29udGludWUgZnJvbSBQeXRob24uCiAgb3V0cHV0RWxlbWVudC5zdGVwcyA9IHN0ZXBzOwoKICByZXR1cm4gX3VwbG9hZEZpbGVzQ29udGludWUob3V0cHV0SWQpOwp9CgovLyBUaGlzIGlzIHJvdWdobHkgYW4gYXN5bmMgZ2VuZXJhdG9yIChub3Qgc3VwcG9ydGVkIGluIHRoZSBicm93c2VyIHlldCksCi8vIHdoZXJlIHRoZXJlIGFyZSBtdWx0aXBsZSBhc3luY2hyb25vdXMgc3RlcHMgYW5kIHRoZSB
QeXRob24gc2lkZSBpcyBnb2luZwovLyB0byBwb2xsIGZvciBjb21wbGV0aW9uIG9mIGVhY2ggc3RlcC4KLy8gVGhpcyB1c2VzIGEgUHJvbWlzZSB0byBibG9jayB0aGUgcHl0aG9uIHNpZGUgb24gY29tcGxldGlvbiBvZiBlYWNoIHN0ZXAsCi8vIHRoZW4gcGFzc2VzIHRoZSByZXN1bHQgb2YgdGhlIHByZXZpb3VzIHN0ZXAgYXMgdGhlIGlucHV0IHRvIHRoZSBuZXh0IHN0ZXAuCmZ1bmN0aW9uIF91cGxvYWRGaWxlc0NvbnRpbnVlKG91dHB1dElkKSB7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICBjb25zdCBzdGVwcyA9IG91dHB1dEVsZW1lbnQuc3RlcHM7CgogIGNvbnN0IG5leHQgPSBzdGVwcy5uZXh0KG91dHB1dEVsZW1lbnQubGFzdFByb21pc2VWYWx1ZSk7CiAgcmV0dXJuIFByb21pc2UucmVzb2x2ZShuZXh0LnZhbHVlLnByb21pc2UpLnRoZW4oKHZhbHVlKSA9PiB7CiAgICAvLyBDYWNoZSB0aGUgbGFzdCBwcm9taXNlIHZhbHVlIHRvIG1ha2UgaXQgYXZhaWxhYmxlIHRvIHRoZSBuZXh0CiAgICAvLyBzdGVwIG9mIHRoZSBnZW5lcmF0b3IuCiAgICBvdXRwdXRFbGVtZW50Lmxhc3RQcm9taXNlVmFsdWUgPSB2YWx1ZTsKICAgIHJldHVybiBuZXh0LnZhbHVlLnJlc3BvbnNlOwogIH0pOwp9CgovKioKICogR2VuZXJhdG9yIGZ1bmN0aW9uIHdoaWNoIGlzIGNhbGxlZCBiZXR3ZWVuIGVhY2ggYXN5bmMgc3RlcCBvZiB0aGUgdXBsb2FkCiAqIHByb2Nlc3MuCiAqIEBwYXJhbSB7c3RyaW5nfSBpbnB1dElkIEVsZW1lbnQgSUQgb2YgdGhlIGlucHV0IGZpbGUgcGlja2VyIGVsZW1lbnQuCiAqIEBwYXJhbSB7c3RyaW5nfSBvdXRwdXRJZCBFbGVtZW50IElEIG9mIHRoZSBvdXRwdXQgZGlzcGxheS4KICogQHJldHVybiB7IUl0ZXJhYmxlPCFPYmplY3Q+fSBJdGVyYWJsZSBvZiBuZXh0IHN0ZXBzLgogKi8KZnVuY3Rpb24qIHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCkgewogIGNvbnN0IGlucHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKGlucHV0SWQpOwogIGlucHV0RWxlbWVudC5kaXNhYmxlZCA9IGZhbHNlOwoKICBjb25zdCBvdXRwdXRFbGVtZW50ID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQob3V0cHV0SWQpOwogIG91dHB1dEVsZW1lbnQuaW5uZXJIVE1MID0gJyc7CgogIGNvbnN0IHBpY2tlZFByb21pc2UgPSBuZXcgUHJvbWlzZSgocmVzb2x2ZSkgPT4gewogICAgaW5wdXRFbGVtZW50LmFkZEV2ZW50TGlzdGVuZXIoJ2NoYW5nZScsIChlKSA9PiB7CiAgICAgIHJlc29sdmUoZS50YXJnZXQuZmlsZXMpOwogICAgfSk7CiAgfSk7CgogIGNvbnN0IGNhbmNlbCA9IGRvY3VtZW50LmNyZWF0ZUVsZW1lbnQoJ2J1dHRvbicpOwogIGlucHV0RWxlbWVudC5wYXJlbnRFbGVtZW50LmFwcGVuZENoaWxkKGNhbmNlbCk7CiAgY2FuY2VsLnRleHRDb250ZW50ID0gJ0NhbmNlbCB1cGxvYWQnOwogIGNvbnN0IGNhbmNlbFByb21pc2UgPSBuZXcgUHJvbWl
zZSgocmVzb2x2ZSkgPT4gewogICAgY2FuY2VsLm9uY2xpY2sgPSAoKSA9PiB7CiAgICAgIHJlc29sdmUobnVsbCk7CiAgICB9OwogIH0pOwoKICAvLyBXYWl0IGZvciB0aGUgdXNlciB0byBwaWNrIHRoZSBmaWxlcy4KICBjb25zdCBmaWxlcyA9IHlpZWxkIHsKICAgIHByb21pc2U6IFByb21pc2UucmFjZShbcGlja2VkUHJvbWlzZSwgY2FuY2VsUHJvbWlzZV0pLAogICAgcmVzcG9uc2U6IHsKICAgICAgYWN0aW9uOiAnc3RhcnRpbmcnLAogICAgfQogIH07CgogIGNhbmNlbC5yZW1vdmUoKTsKCiAgLy8gRGlzYWJsZSB0aGUgaW5wdXQgZWxlbWVudCBzaW5jZSBmdXJ0aGVyIHBpY2tzIGFyZSBub3QgYWxsb3dlZC4KICBpbnB1dEVsZW1lbnQuZGlzYWJsZWQgPSB0cnVlOwoKICBpZiAoIWZpbGVzKSB7CiAgICByZXR1cm4gewogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbXBsZXRlJywKICAgICAgfQogICAgfTsKICB9CgogIGZvciAoY29uc3QgZmlsZSBvZiBmaWxlcykgewogICAgY29uc3QgbGkgPSBkb2N1bWVudC5jcmVhdGVFbGVtZW50KCdsaScpOwogICAgbGkuYXBwZW5kKHNwYW4oZmlsZS5uYW1lLCB7Zm9udFdlaWdodDogJ2JvbGQnfSkpOwogICAgbGkuYXBwZW5kKHNwYW4oCiAgICAgICAgYCgke2ZpbGUudHlwZSB8fCAnbi9hJ30pIC0gJHtmaWxlLnNpemV9IGJ5dGVzLCBgICsKICAgICAgICBgbGFzdCBtb2RpZmllZDogJHsKICAgICAgICAgICAgZmlsZS5sYXN0TW9kaWZpZWREYXRlID8gZmlsZS5sYXN0TW9kaWZpZWREYXRlLnRvTG9jYWxlRGF0ZVN0cmluZygpIDoKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgJ24vYSd9IC0gYCkpOwogICAgY29uc3QgcGVyY2VudCA9IHNwYW4oJzAlIGRvbmUnKTsKICAgIGxpLmFwcGVuZENoaWxkKHBlcmNlbnQpOwoKICAgIG91dHB1dEVsZW1lbnQuYXBwZW5kQ2hpbGQobGkpOwoKICAgIGNvbnN0IGZpbGVEYXRhUHJvbWlzZSA9IG5ldyBQcm9taXNlKChyZXNvbHZlKSA9PiB7CiAgICAgIGNvbnN0IHJlYWRlciA9IG5ldyBGaWxlUmVhZGVyKCk7CiAgICAgIHJlYWRlci5vbmxvYWQgPSAoZSkgPT4gewogICAgICAgIHJlc29sdmUoZS50YXJnZXQucmVzdWx0KTsKICAgICAgfTsKICAgICAgcmVhZGVyLnJlYWRBc0FycmF5QnVmZmVyKGZpbGUpOwogICAgfSk7CiAgICAvLyBXYWl0IGZvciB0aGUgZGF0YSB0byBiZSByZWFkeS4KICAgIGxldCBmaWxlRGF0YSA9IHlpZWxkIHsKICAgICAgcHJvbWlzZTogZmlsZURhdGFQcm9taXNlLAogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbnRpbnVlJywKICAgICAgfQogICAgfTsKCiAgICAvLyBVc2UgYSBjaHVua2VkIHNlbmRpbmcgdG8gYXZvaWQgbWVzc2FnZSBzaXplIGxpbWl0cy4gU2VlIGIvNjIxMTU2NjAuCiAgICBsZXQgcG9zaXRpb24gPSAwOwogICAgd2hpbGUgKHBvc2l0aW9uIDwgZmlsZURhdGEuYnl0ZUxlbmd0aCkgewogICAgICBjb25zdCBsZW5ndGggPSBNYXRoLm1pbihmaWxlRGF0YS5
ieXRlTGVuZ3RoIC0gcG9zaXRpb24sIE1BWF9QQVlMT0FEX1NJWkUpOwogICAgICBjb25zdCBjaHVuayA9IG5ldyBVaW50OEFycmF5KGZpbGVEYXRhLCBwb3NpdGlvbiwgbGVuZ3RoKTsKICAgICAgcG9zaXRpb24gKz0gbGVuZ3RoOwoKICAgICAgY29uc3QgYmFzZTY0ID0gYnRvYShTdHJpbmcuZnJvbUNoYXJDb2RlLmFwcGx5KG51bGwsIGNodW5rKSk7CiAgICAgIHlpZWxkIHsKICAgICAgICByZXNwb25zZTogewogICAgICAgICAgYWN0aW9uOiAnYXBwZW5kJywKICAgICAgICAgIGZpbGU6IGZpbGUubmFtZSwKICAgICAgICAgIGRhdGE6IGJhc2U2NCwKICAgICAgICB9LAogICAgICB9OwogICAgICBwZXJjZW50LnRleHRDb250ZW50ID0KICAgICAgICAgIGAke01hdGgucm91bmQoKHBvc2l0aW9uIC8gZmlsZURhdGEuYnl0ZUxlbmd0aCkgKiAxMDApfSUgZG9uZWA7CiAgICB9CiAgfQoKICAvLyBBbGwgZG9uZS4KICB5aWVsZCB7CiAgICByZXNwb25zZTogewogICAgICBhY3Rpb246ICdjb21wbGV0ZScsCiAgICB9CiAgfTsKfQoKc2NvcGUuZ29vZ2xlID0gc2NvcGUuZ29vZ2xlIHx8IHt9OwpzY29wZS5nb29nbGUuY29sYWIgPSBzY29wZS5nb29nbGUuY29sYWIgfHwge307CnNjb3BlLmdvb2dsZS5jb2xhYi5fZmlsZXMgPSB7CiAgX3VwbG9hZEZpbGVzLAogIF91cGxvYWRGaWxlc0NvbnRpbnVlLAp9Owp9KShzZWxmKTsK",
"ok": true,
"headers": [
[
"content-type",
"application/javascript"
]
],
"status": 200,
"status_text": ""
}
},
"base_uri": "https://localhost:8080/",
"height": 89
},
"outputId": "086069d6-0f58-475c-8cba-45c85e5f31cd"
},
"source": [
"from google.colab import files\n",
"uploaded = files.upload()\n",
"print(\"done\")"
],
"execution_count": 25,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"\n",
" <input type=\"file\" id=\"files-8cf1452c-822d-43e1-ae2c-d0f36497424b\" name=\"files[]\" multiple disabled\n",
" style=\"border:none\" />\n",
" <output id=\"result-8cf1452c-822d-43e1-ae2c-d0f36497424b\">\n",
" Upload widget is only available when the cell has been executed in the\n",
" current browser session. Please rerun this cell to enable.\n",
" </output>\n",
" <script src=\"/nbextensions/google.colab/files.js\"></script> "
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {
"tags": []
}
},
{
"output_type": "stream",
"text": [
"Saving key.json to key (2).json\n",
"done\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WHPHrUnhpKnI",
"colab_type": "text"
},
"source": [
"I'll install the OpenAI Python client"
]
},
{
"cell_type": "code",
"metadata": {
"id": "zq0ltp2xn4yt",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 139
},
"outputId": "0e71c7c4-db81-41a8-dc66-fcc214c77a01"
},
"source": [
"!pip install openai\n",
"import openai, json, pandas as pd"
],
"execution_count": 26,
"outputs": [
{
"output_type": "stream",
"text": [
"Requirement already satisfied: openai in /usr/local/lib/python3.6/dist-packages (0.2.4)\n",
"Requirement already satisfied: requests>=2.20; python_version >= \"3.0\" in /usr/local/lib/python3.6/dist-packages (from openai) (2.23.0)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests>=2.20; python_version >= \"3.0\"->openai) (1.24.3)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests>=2.20; python_version >= \"3.0\"->openai) (2.10)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests>=2.20; python_version >= \"3.0\"->openai) (3.0.4)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests>=2.20; python_version >= \"3.0\"->openai) (2020.6.20)\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Q2yE0jcnpMEV",
"colab_type": "text"
},
"source": [
"Load the API key from the key.json I uploaded. I do this so I don't need to worry about accidentally leaking credentials if I share the Colab (I'm 99% sure the shared notebook is just a JSON file that won't expose them)."
]
},
{
"cell_type": "code",
"metadata": {
"id": "bwNXXwHen5x9",
"colab_type": "code",
"colab": {}
},
"source": [
"openai.api_key = json.load(open(\"key.json\", \"r\"))[\"key\"]"
],
"execution_count": 27,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "k67w5H0fpTkT",
"colab_type": "text"
},
"source": [
"Default keyword arguments to pass to the API"
]
},
{
"cell_type": "code",
"metadata": {
"id": "e1EwpqqJkTYh",
"colab_type": "code",
"colab": {}
},
"source": [
"#arguments to send the API\n",
"kwargs = {\n",
"\"engine\":\"davinci\",\n",
"\"temperature\":0,\n",
"\"max_tokens\":150,\n",
"\"stop\":\"\\n\\n\",\n",
"}"
],
"execution_count": 28,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "zZubgPoOpWDH",
"colab_type": "text"
},
"source": [
"Quick wrapper to automatically save prompts and responses sent for later analysis if needed"
]
},
{
"cell_type": "code",
"metadata": {
"id": "sXTDJx0An9Bl",
"colab_type": "code",
"colab": {}
},
"source": [
"import datetime\n",
"\n",
"def query(prompt, myKwargs=kwargs, full=False):\n",
"    \"\"\"\n",
"    Wrapper for the API that saves the prompt and the result to a JSON file.\n",
"    Returns the stripped completion text, or the full response if full=True.\n",
"    \"\"\"\n",
"    r = openai.Completion.create(prompt=prompt, **myKwargs)\n",
"    if not full:\n",
"        r = r[\"choices\"][0][\"text\"].strip()\n",
"    # timestamped filename; %H%M%S%f is portable, unlike the glibc-only %s\n",
"    with open(\"{}.json\".format(datetime.datetime.now().strftime(\"%Y%m%d%H%M%S%f\")), \"w\") as fh:\n",
"        json.dump({\"prompt\": prompt, \"response\": r}, fh, indent=4)\n",
"    return r"
],
"execution_count": 29,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdFXafcJpZ3Q",
"colab_type": "text"
},
"source": [
"Test to make sure my query works"
]
},
{
"cell_type": "code",
"metadata": {
"id": "4SlyKgjyopPn",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "0912c141-cdf3-4c7a-fb47-10511a5fc071"
},
"source": [
"newKwargs = kwargs.copy()\n",
"newKwargs[\"stop\"] = \"\\n\"\n",
"query(\"q: what is 1+1?\\na:\", newKwargs)"
],
"execution_count": 30,
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic": {
"type": "string"
},
"text/plain": [
"'2'"
]
},
"metadata": {
"tags": []
},
"execution_count": 30
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "wC5givsUoJKW",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 255
},
"outputId": "e699773d-22d5-4259-9011-70e46365a356"
},
"source": [
"newKwargs = kwargs.copy()\n",
"newKwargs[\"stop\"] = \"\\n\"\n",
"query(\"q: what is 1+1?\\na:\", newKwargs, full=True)"
],
"execution_count": 31,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<OpenAIObject text_completion id=cmpl-uZAJFTBScSG8s3Sf7H9nkmSq at 0x7ff0c4a2f2b0> JSON: {\n",
" \"choices\": [\n",
" {\n",
" \"finish_reason\": \"stop\",\n",
" \"index\": 0,\n",
" \"logprobs\": null,\n",
" \"text\": \" 2\"\n",
" }\n",
" ],\n",
" \"created\": 1595118124,\n",
" \"id\": \"cmpl-uZAJFTBScSG8s3Sf7H9nkmSq\",\n",
" \"model\": \"davinci:2020-05-03\",\n",
" \"object\": \"text_completion\"\n",
"}"
]
},
"metadata": {
"tags": []
},
"execution_count": 31
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "FOBqrcsV8gJg",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "c01e3a68-f39b-47fe-b117-921d53466ada"
},
"source": [
"prompt = \"\"\"Car\\nMachine\\n\\nDog\\nAnimal\\n\\nJaguar\\n\"\"\"\n",
"query(prompt)"
],
"execution_count": 39,
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic": {
"type": "string"
},
"text/plain": [
"'Animal'"
]
},
"metadata": {
"tags": []
},
"execution_count": 39
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "6hnwZ_ZYdbix",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "4bbd3e61-fca7-47f2-8d5c-daa0b7cb0a0d"
},
"source": [
"prompt = \"\"\"Car\\nMachine\\n\\nApple\\nComputer\\n\\nJaguar\\n\"\"\"\n",
"query(prompt)"
],
"execution_count": 40,
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic": {
"type": "string"
},
"text/plain": [
"'Car'"
]
},
"metadata": {
"tags": []
},
"execution_count": 40
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "taEOc04ld7aJ",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "df904c79-942f-4379-facc-2c166d0c4cae"
},
"source": [
"prompt = \"\"\"Dog\\nAnimal\\n\\nCar\\nMachine\\n\\nJaguar\\n\"\"\"\n",
"query(prompt)"
],
"execution_count": 43,
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic": {
"type": "string"
},
"text/plain": [
"'Animal'"
]
},
"metadata": {
"tags": []
},
"execution_count": 43
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6fH7CjGyKBwj",
"colab_type": "text"
},
"source": [
"Luckily, I've got a bunch of article headlines lying around"
]
},
{
"cell_type": "code",
"metadata": {
"id": "CtXYXf4yersD",
"colab_type": "code",
"colab": {}
},
"source": [
"df = pd.read_csv(\"https://raw.githubusercontent.com/brockmanmatt/CoverageTrends/master/archived_links/newyorktimes/202007/newyorktimes_20200718.csv\")"
],
"execution_count": 44,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "DsnGS8PwLGg-",
"colab_type": "text"
},
"source": [
"I'll keep the unique headlines"
]
},
{
"cell_type": "code",
"metadata": {
"id": "WzL5QItLKBOC",
"colab_type": "code",
"colab": {}
},
"source": [
"df = pd.DataFrame(df.text.unique(), columns=[\"text\"])"
],
"execution_count": 46,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "YPULuKUAK-hU",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "23257411-92ef-4cbd-b3f5-f4e08ccb7035"
},
"source": [
"df.shape #117 unique headlines"
],
"execution_count": 49,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(117, 1)"
]
},
"metadata": {
"tags": []
},
"execution_count": 49
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4pN65zx_MydD",
"colab_type": "text"
},
"source": [
"Grab 20 random headlines"
]
},
{
"cell_type": "code",
"metadata": {
"id": "uPEkwWS4LSl6",
"colab_type": "code",
"colab": {}
},
"source": [
"batch1 = df.sample(20)"
],
"execution_count": 51,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "jZPRu9EOLWRa",
"colab_type": "code",
"colab": {}
},
"source": [
"prompt = \"\"\"Label what each article headline is about\n",
"headline: Today it rained in Idaho\n",
"Label: Weather\n",
"\n",
"headline: {}\n",
"Label:\"\"\""
],
"execution_count": 57,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "-IKEcz78MX4g",
"colab_type": "code",
"colab": {}
},
"source": [
"batch1_labels = []\n",
"for row in batch1.iterrows():\n",
"    batch1_labels.append(query(prompt.format(row[1][\"text\"].strip())))"
],
"execution_count": 58,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "HAqaoZ8xMoy-",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 669
},
"outputId": "4d564ea2-68df-4b89-e2a5-95e3301f7170"
},
"source": [
"batch1[\"label\"] = batch1_labels\n",
"batch1"
],
"execution_count": 59,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>text</th>\n",
" <th>label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>38</th>\n",
" <td>U.S. Reports More Than 70,000 New Cases for Se...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>58</th>\n",
" <td>N.D.</td>\n",
" <td>News</td>\n",
" </tr>\n",
" <tr>\n",
" <th>97</th>\n",
" <td>Officer Who Pressed Knee on George Floyd’s Nec...</td>\n",
" <td>Police</td>\n",
" </tr>\n",
" <tr>\n",
" <th>109</th>\n",
" <td>The Horror Novel Lurking in Your Busy Online L...</td>\n",
" <td>Horror</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>David W. BlightThere’s a Chance to Tell a New ...</td>\n",
" <td>Opinion</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>Ginsburg Says Her Cancer Has Returned, but She...</td>\n",
" <td>Supreme Court</td>\n",
" </tr>\n",
" <tr>\n",
" <th>74</th>\n",
" <td>John Lewis was the last surviving speaker of t...</td>\n",
" <td>History</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>A City Investigates Its Traumatic HistoryNearl...</td>\n",
" <td>History</td>\n",
" </tr>\n",
" <tr>\n",
" <th>111</th>\n",
" <td>‘They Didn’t Just Love Him. They Knew Him.’ Yo...</td>\n",
" <td>Civil Rights</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>Jamaal Bowman Proves Alexandria Ocasio-Cortez ...</td>\n",
" <td>Politics</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>Mistrust of a Coronavirus Vaccine Could Imperi...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>U.S. Reports More Than 70,000 New Cases for Se...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>82</th>\n",
" <td>How to Kill a Vampire: Not With This Kit, Appa...</td>\n",
" <td>Vampires</td>\n",
" </tr>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>Older Children Spread Virus Just as Much as Ad...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Casablanca, Morocco\\n\\n\\n\\n\\n\\n\\n\\n\\n\\t\\t\\t\\t\\...</td>\n",
" <td>Weather</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>John Lewis, Towering Figure of Civil Rights Er...</td>\n",
" <td>Civil Rights</td>\n",
" </tr>\n",
" <tr>\n",
" <th>87</th>\n",
" <td>Bernard Lafayette Jr.The First Time John Lewis...</td>\n",
" <td>Civil Rights</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>New York City will enter a new phase of reopen...</td>\n",
" <td>Disaster</td>\n",
" </tr>\n",
" <tr>\n",
" <th>66</th>\n",
" <td>Battle in the HimalayasChina and India are loc...</td>\n",
" <td>War</td>\n",
" </tr>\n",
" <tr>\n",
" <th>68</th>\n",
" <td>Did you follow the headlines this week? Take o...</td>\n",
" <td>News</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" text label\n",
"38 U.S. Reports More Than 70,000 New Cases for Se... Health\n",
"58 N.D. News\n",
"97 Officer Who Pressed Knee on George Floyd’s Nec... Police\n",
"109 The Horror Novel Lurking in Your Busy Online L... Horror\n",
"24 David W. BlightThere’s a Chance to Tell a New ... Opinion\n",
"15 Ginsburg Says Her Cancer Has Returned, but She... Supreme Court\n",
"74 John Lewis was the last surviving speaker of t... History\n",
"5 A City Investigates Its Traumatic HistoryNearl... History\n",
"111 ‘They Didn’t Just Love Him. They Knew Him.’ Yo... Civil Rights\n",
"30 Jamaal Bowman Proves Alexandria Ocasio-Cortez ... Politics\n",
"102 Mistrust of a Coronavirus Vaccine Could Imperi... Health\n",
"49 U.S. Reports More Than 70,000 New Cases for Se... Health\n",
"82 How to Kill a Vampire: Not With This Kit, Appa... Vampires\n",
"101 Older Children Spread Virus Just as Much as Ad... Health\n",
"8 Casablanca, Morocco\\n\\n\\n\\n\\n\\n\\n\\n\\n\\t\\t\\t\\t\\... Weather\n",
"45 John Lewis, Towering Figure of Civil Rights Er... Civil Rights\n",
"87 Bernard Lafayette Jr.The First Time John Lewis... Civil Rights\n",
"37 New York City will enter a new phase of reopen... Disaster\n",
"66 Battle in the HimalayasChina and India are loc... War\n",
"68 Did you follow the headlines this week? Take o... News"
]
},
"metadata": {
"tags": []
},
"execution_count": 59
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Y36FOKz2oF2o",
"colab_type": "text"
},
"source": [
"# Use Search for Stuffing"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QSx9lIPqLOZh",
"colab_type": "text"
},
"source": [
"OK, use search to select the 3 most similar labeled headlines for each new headline"
]
},
{
"cell_type": "code",
"metadata": {
"id": "_4doIg6hXfF1",
"colab_type": "code",
"colab": {}
},
"source": [
"batch2 = df[~df.text.isin(batch1.text)].sample(20)"
],
"execution_count": 63,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "QzZG62QIYFyL",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "417725b7-985f-42bc-d221-686d6f6e1091"
},
"source": [
"sorted([4,1,2,3])"
],
"execution_count": 65,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[1, 2, 3, 4]"
]
},
"metadata": {
"tags": []
},
"execution_count": 65
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "gVVADp06LF3c",
"colab_type": "code",
"colab": {}
},
"source": [
"batch2_sims = []\n",
"labeled_doc_headlines = batch1.text.to_list()\n",
"\n",
"for row in batch2.iterrows():\n",
"    #get most similar\n",
"    scores = openai.Engine(\"davinci\").search(documents=[x for x in labeled_doc_headlines],query=row[1][\"text\"])\n",
"    batch2_sims.append([(scores[\"data\"][i][\"score\"], labeled_doc_headlines[i]) for i in range(len(labeled_doc_headlines))])"
],
"execution_count": 70,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "adwprxlVaOn-",
"colab_type": "text"
},
"source": [
"Great! Now I just need to create a prompt. Here I'll build it on the fly; wrapping it in a function would probably be better. I could also have combined these cells, but keeping them separate makes it easier to play with."
]
},
{
"cell_type": "code",
"metadata": {
"id": "NCfO9g9RdvsJ",
"colab_type": "code",
"colab": {}
},
"source": [
"batch2[\"sims\"] = batch2_sims"
],
"execution_count": 77,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Tu95g0hgfbf_",
"colab_type": "code",
"colab": {}
},
"source": [
"labeled = batch1.copy()\n",
"labeled.set_index(\"text\", drop=True, inplace=True) #set the index to the headlines"
],
"execution_count": 96,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "0NCffeXSahBv",
"colab_type": "code",
"colab": {}
},
"source": [
"batch2_labels = []\n",
"for row in batch2.iterrows():\n",
"    prompt = \"\"\n",
"    for label in sorted(row[1][\"sims\"])[-3:]: #add 3 most similar headlines with their labels to prompt\n",
"        prompt += \"\"\"Headline: {}\\nLabel: {}\\n\\n\"\"\".format(label[1], labeled.at[label[1], \"label\"])\n",
"    prompt += \"Headline: {}\\n\".format(row[1][\"text\"])\n",
"    prompt += \"Label:\"\n",
"    batch2_labels.append(query(prompt))"
],
"execution_count": 104,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "Vj5HFjYdXsnK",
"colab_type": "code",
"colab": {}
},
"source": [
"batch2[\"label\"] = batch2_labels"
],
"execution_count": 105,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "kET19sWYZ444",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 669
},
"outputId": "30af54de-86d7-47b0-9322-cdf57fbfc46e"
},
"source": [
"batch2"
],
"execution_count": 106,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>text</th>\n",
" <th>sims</th>\n",
" <th>label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>Several people involved in the hacking of prom...</td>\n",
" <td>[(-15.553, U.S. Reports More Than 70,000 New C...</td>\n",
" <td>Bitcoin</td>\n",
" </tr>\n",
" <tr>\n",
" <th>60</th>\n",
" <td>Idaho</td>\n",
" <td>[(133.493, U.S. Reports More Than 70,000 New C...</td>\n",
" <td>Idaho</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>U.S. Report Finds 18 States With ‘Red Zone’ Ou...</td>\n",
" <td>[(82.858, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>See the U.S. hot spots</td>\n",
" <td>[(94.649, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>U.S. Reports More Than 70,000 New Cases for Se...</td>\n",
" <td>[(123.148, U.S. Reports More Than 70,000 New C...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>John Lewis, Towering Figure of Civil Rights Er...</td>\n",
" <td>[(5.638, U.S. Reports More Than 70,000 New Cas...</td>\n",
" <td>Civil Rights</td>\n",
" </tr>\n",
" <tr>\n",
" <th>93</th>\n",
" <td>China’s Swimwear Capital Can’t Wait for You to...</td>\n",
" <td>[(26.309, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>67</th>\n",
" <td>11 of Our Best Weekend ReadsColin Powell. Less...</td>\n",
" <td>[(42.476, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Weekend Reading</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55</th>\n",
" <td>P.R.</td>\n",
" <td>[(-18.657, U.S. Reports More Than 70,000 New C...</td>\n",
" <td>News</td>\n",
" </tr>\n",
" <tr>\n",
" <th>113</th>\n",
" <td>Federal Officers in Portland Didn’t Have Prope...</td>\n",
" <td>[(45.114, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Police</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>‘What’s Happening’ To Dr. Anthony Fauci?Americ...</td>\n",
" <td>[(38.61, U.S. Reports More Than 70,000 New Cas...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>Are Your Kids Ready to Go Back to School?</td>\n",
" <td>[(24.238, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>Federal officers pulled a protester into a cou...</td>\n",
" <td>[(12.221, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Protest</td>\n",
" </tr>\n",
" <tr>\n",
" <th>114</th>\n",
" <td>‘On Tech With Shira Ovide’Just collect less da...</td>\n",
" <td>[(17.409, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Tech</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61</th>\n",
" <td>Mo.</td>\n",
" <td>[(-2.566, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>News</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>Cate Stetson and Ruth FriedmanThe Justice Depa...</td>\n",
" <td>[(-0.922, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Opinion</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>Rebecca MartinsonPlease Don’t Make Me Risk Get...</td>\n",
" <td>[(7.461, U.S. Reports More Than 70,000 New Cas...</td>\n",
" <td>Education</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>LIVESurges in 18 U.S. States Prompt ‘Red Zone’...</td>\n",
" <td>[(80.573, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86</th>\n",
" <td>Federal Officers In Portland Face Rising Oppos...</td>\n",
" <td>[(21.06, U.S. Reports More Than 70,000 New Cas...</td>\n",
" <td>Police</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>Praise for the civil rights icon poured in fro...</td>\n",
" <td>[(-1.349, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Civil Rights</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" text ... label\n",
"18 Several people involved in the hacking of prom... ... Bitcoin\n",
"60 Idaho ... Idaho\n",
"34 U.S. Report Finds 18 States With ‘Red Zone’ Ou... ... Health\n",
"9 See the U.S. hot spots ... Health\n",
"54 U.S. Reports More Than 70,000 New Cases for Se... ... Health\n",
"40 John Lewis, Towering Figure of Civil Rights Er... ... Civil Rights\n",
"93 China’s Swimwear Capital Can’t Wait for You to... ... Health\n",
"67 11 of Our Best Weekend ReadsColin Powell. Less... ... Weekend Reading\n",
"55 P.R. ... News\n",
"113 Federal Officers in Portland Didn’t Have Prope... ... Police\n",
"52 ‘What’s Happening’ To Dr. Anthony Fauci?Americ... ... Health\n",
"25 Are Your Kids Ready to Go Back to School? ... Health\n",
"31 Federal officers pulled a protester into a cou... ... Protest\n",
"114 ‘On Tech With Shira Ovide’Just collect less da... ... Tech\n",
"61 Mo. ... News\n",
"20 Cate Stetson and Ruth FriedmanThe Justice Depa... ... Opinion\n",
"88 Rebecca MartinsonPlease Don’t Make Me Risk Get... ... Education\n",
"96 LIVESurges in 18 U.S. States Prompt ‘Red Zone’... ... Health\n",
"86 Federal Officers In Portland Face Rising Oppos... ... Police\n",
"46 Praise for the civil rights icon poured in fro... ... Civil Rights\n",
"\n",
"[20 rows x 3 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 106
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5iTUaY-ooEAi",
"colab_type": "text"
},
"source": [
""
]
},
{
"cell_type": "code",
"metadata": {
"id": "93jV4_kHgN5B",
"colab_type": "code",
"colab": {}
},
"source": [
"batch3 = df[~df.text.isin(batch1.text.to_list()+batch2.text.to_list())].sample(20)"
],
"execution_count": 118,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "AI10V_gxzr1s",
"colab_type": "code",
"colab": {}
},
"source": [
"batch4 = df[~df.text.isin(batch1.text.to_list()+batch2.text.to_list()+batch3.text.to_list())].sample(20)"
],
"execution_count": 120,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "QEeiD8xc1QKV",
"colab_type": "text"
},
"source": [
"One-shot prompting doesn't work here, so I'm going with two-shot (and it looks like it works)."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "G3TBDahD0DtI",
"colab": {}
},
"source": [
"prompt = \"\"\"Label each article with the mentioned entities and a label for the general topic\n",
"\n",
"Headline: Today it rained in Idaho\n",
"Entities: Idaho\n",
"Label: Weather\n",
"\n",
"Headline: A fireman saved a cat from a tree in the bronx\n",
"Entities: Fireman, Cat, Bronx\n",
"Label: Local News\n",
"\n",
"Headline: {}\n",
"Entities:\"\"\""
],
"execution_count": 140,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "o6pVdA_m0DtN",
"colab": {}
},
"source": [
"batch3_labels = []\n",
"for row in batch3.iterrows():\n",
" batch3_labels.append(query(prompt.format(row[1][\"text\"].strip())))"
],
"execution_count": 143,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "ocLc40oS0DtP",
"colab": {}
},
"source": [
"ents = []\n",
"labels = []\n",
"\n",
"for label in batch3_labels:\n",
" ent, label = label.split(\"\\nLabel:\")\n",
" ents.append(ent.strip())\n",
" labels.append(label.strip())\n",
"\n",
"batch3[\"entities\"] = ents\n",
"batch3[\"label\"] = labels"
],
"execution_count": 153,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "2wGVs7jqw9AF",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 669
},
"outputId": "3b0b635f-7680-42ba-ca75-93f3b83eb9ae"
},
"source": [
"batch3"
],
"execution_count": 152,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>text</th>\n",
" <th>entities</th>\n",
" <th>label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>Carrboro, N.C.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\t\\t\\t\\t\\t\\t\\n...</td>\n",
" <td>Mexico, United States</td>\n",
" <td>Immigration</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>The Editorial BoardThe Radical Resistance of J...</td>\n",
" <td>John Lewis, Editorial Board, Radical Resistanc...</td>\n",
" <td>Opinion</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>The Editorial BoardThe Legal System Should Not...</td>\n",
" <td>Editorial Board, Legal System, Bullies</td>\n",
" <td>Opinion</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>President Trump weighed in on John Lewis’s pas...</td>\n",
" <td>Trump, John Lewis</td>\n",
" <td>Politics</td>\n",
" </tr>\n",
" <tr>\n",
" <th>106</th>\n",
" <td>John M. BarryThe Pandemic Could Get Much, Much...</td>\n",
" <td>John M. Barry, Pandemic, Must Act Now</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>Christopher Nolan Says ‘Tenet’ Will Come Out T...</td>\n",
" <td>Christopher Nolan, Tenet, Summer</td>\n",
" <td>Entertainment</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>David HughesI’m a Black Police Officer. Here’s...</td>\n",
" <td>David Hughes, Police Officer, Black, System</td>\n",
" <td>Social Issues</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Annik LaFarge10 Fingers, 2 Feet and 5,000 Pipe...</td>\n",
" <td>Annik LaFarge, Fingers, Feet, Pipes, Present</td>\n",
" <td>Music</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48</th>\n",
" <td>Manhattan\\n\\n\\n\\n\\n\\n\\n\\n\\n\\t\\t\\t\\t\\t\\t\\n\\t\\t\\...</td>\n",
" <td>North Miami Beach, Fla.</td>\n",
" <td>Local News</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>U.S. Report Finds 18 States With ‘Red Zone’ Ou...</td>\n",
" <td>U.S., States, California, Schools, Virtual, Le...</td>\n",
" <td>Education</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>Ky.</td>\n",
" <td>Kentucky</td>\n",
" <td>State</td>\n",
" </tr>\n",
" <tr>\n",
" <th>78</th>\n",
" <td>LIVEA Warning for 18 U.S. StatesWith case coun...</td>\n",
" <td>LIVEA, U.S. States, Case counts, Restrictions</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>104</th>\n",
" <td>Federal Officers Deployed in Portland Didn’t H...</td>\n",
" <td>Federal Officers, Portland, D.H.S.</td>\n",
" <td>Local News</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>A Father of 2 Boys Went to War. We Chronicled ...</td>\n",
" <td>Father, Boys, War</td>\n",
" <td>War</td>\n",
" </tr>\n",
" <tr>\n",
" <th>115</th>\n",
" <td>The Editorial BoardJohn Lewis Risked His Life ...</td>\n",
" <td>John Lewis, Editorial Board, Risked, Life, Jus...</td>\n",
" <td>Politics</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89</th>\n",
" <td>Inside Trump’s Failure: The Rush to Abandon Le...</td>\n",
" <td>Trump, Virus, White House</td>\n",
" <td>Politics</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>Manhattan\\n\\n\\n\\n\\n\\n\\n\\n\\n\\t\\t\\t\\t\\t\\t\\n\\t\\t\\...</td>\n",
" <td>North Miami Beach, Fla.</td>\n",
" <td>Local News</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>A Detailed Map of Who Is Wearing Masks in the ...</td>\n",
" <td>Mask, U.S.</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Portland Leaders Question Role of the Feds in ...</td>\n",
" <td>Portland, Mayor Ted Wheeler, Federal authoriti...</td>\n",
" <td>Politics</td>\n",
" </tr>\n",
" <tr>\n",
" <th>62</th>\n",
" <td>Colo.</td>\n",
" <td>Colorado</td>\n",
" <td>State</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" text ... label\n",
"100 Carrboro, N.C.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\t\\t\\t\\t\\t\\t\\n... ... Immigration\n",
"42 The Editorial BoardThe Radical Resistance of J... ... Opinion\n",
"21 The Editorial BoardThe Legal System Should Not... ... Opinion\n",
"95 President Trump weighed in on John Lewis’s pas... ... Politics\n",
"106 John M. BarryThe Pandemic Could Get Much, Much... ... Health\n",
"27 Christopher Nolan Says ‘Tenet’ Will Come Out T... ... Entertainment\n",
"22 David HughesI’m a Black Police Officer. Here’s... ... Social Issues\n",
"23 Annik LaFarge10 Fingers, 2 Feet and 5,000 Pipe... ... Music\n",
"48 Manhattan\\n\\n\\n\\n\\n\\n\\n\\n\\n\\t\\t\\t\\t\\t\\t\\n\\t\\t\\... ... Local News\n",
"6 U.S. Report Finds 18 States With ‘Red Zone’ Ou... ... Education\n",
"57 Ky. ... State\n",
"78 LIVEA Warning for 18 U.S. StatesWith case coun... ... Health\n",
"104 Federal Officers Deployed in Portland Didn’t H... ... Local News\n",
"26 A Father of 2 Boys Went to War. We Chronicled ... ... War\n",
"115 The Editorial BoardJohn Lewis Risked His Life ... ... Politics\n",
"89 Inside Trump’s Failure: The Rush to Abandon Le... ... Politics\n",
"50 Manhattan\\n\\n\\n\\n\\n\\n\\n\\n\\n\\t\\t\\t\\t\\t\\t\\n\\t\\t\\... ... Local News\n",
"14 A Detailed Map of Who Is Wearing Masks in the ... ... Health\n",
"3 Portland Leaders Question Role of the Feds in ... ... Politics\n",
"62 Colo. ... State\n",
"\n",
"[20 rows x 3 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 152
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "CMdxWOzg1tB8",
"colab_type": "code",
"colab": {}
},
"source": [
"batch4_labels = []\n",
"for row in batch4.iterrows():\n",
"  batch4_labels.append(query(prompt.format(row[1][\"text\"].strip())))\n",
"\n",
"ents4 = []\n",
"labels4 = []\n",
"\n",
"for raw in batch4_labels:\n",
"  ent, label = raw.split(\"\\nLabel:\")\n",
"  ents4.append(ent.strip())\n",
"  labels4.append(label.strip())\n",
"\n",
"#use the batch4 results here, not the batch3 lists (ents, labels)\n",
"batch4[\"entities\"] = ents4\n",
"batch4[\"label\"] = labels4"
],
"execution_count": 154,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "TzBPAuHPHcfz",
"colab_type": "code",
"colab": {}
},
"source": [
"newLabels = []\n",
"for row in batch4.iterrows():\n",
"  prompt = \"\"\n",
"  labelMatches = batch3[batch3.entities == row[1][\"entities\"]]\n",
"  if len(labelMatches) < 3: #pad with random batch3 rows if there are fewer than 3 matches\n",
"    randomExtra = batch3[~batch3.text.isin(labelMatches.text.to_list())].sample(3-len(labelMatches))\n",
"    labelMatches = pd.concat([labelMatches, randomExtra], axis=0)\n",
"  for label in labelMatches.sample(3).iterrows(): #add 3 selected headlines with their labels to prompt\n",
"    prompt += \"\"\"Headline: {}\\nEntities: {}\\nLabel: {}\\n\\n\"\"\".format(label[1][\"text\"], label[1][\"entities\"], label[1][\"label\"])\n",
"  prompt += \"Headline: {}\\n\".format(row[1][\"text\"])\n",
"  prompt += \"Entities:\"\n",
"  newLabels.append(query(prompt))"
],
"execution_count": 177,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "YAzKwoIjKioI",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 357
},
"outputId": "9f523a91-735f-40bc-847c-62f123b37794"
},
"source": [
"newLabels"
],
"execution_count": 180,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['Governor, Gavin Newsom, California, School, Online\\nLabel: Education',\n",
" 'Trump, Takeaways, Efforts, Responsibility\\nLabel: Analysis',\n",
" 'Federal Officers, Portland, Ore., Protests, Rising, Court\\nLabel: Politics',\n",
" 'Trump, Conservatives, Caricature\\nLabel: Politics',\n",
" 'Government assistance, Taxpayers, Study\\nLabel: Health',\n",
" 'Neto, Trump\\nLabel: Health',\n",
" 'States, 18, U.S., Warnings, Restrictions\\nLabel: Health',\n",
" 'Bowman, Engel, House, Party, Primary\\nLabel: Politics',\n",
" 'Rabbit Hole, Internet',\n",
" 'Person, Hot Dogs, Eat, Bear, Coyote, Wolf, Python, Study, Wolf, Burmese Python\\nLabel: Food',\n",
" 'New York City, Mayor Bill de Blasio, Restaurants\\nLabel: New York',\n",
" 'Portland, Ore., Federal authorities, President Trump, Law enforcement, Cities, Protests',\n",
" 'Johnson & Johnson, Coronavirus, Vaccine, Boston, Netherlands\\nLabel: Health',\n",
" 'Federal Agents, Portland, Legal\\nLabel: News',\n",
" 'United States, July 17, 14-day change, Trend, New cases, 70,790, 902\\nLabel: Health',\n",
" 'Larry, Ventilator, FaceTime, Heather Sten\\nLabel: Opinion',\n",
" 'Robin DiAngelo, White, Fragility, Everywhere, Antiracism, Training, Work\\nLabel: Education',\n",
" 'California, Capitol, Anti-vaccination, Rally, Steps\\nLabel: Health',\n",
" 'Georgia, Governor, Atlanta, Mayor, Mask\\nLabel: Politics',\n",
" 'Vermont, Aerial images, Transcendence, Tranquillity\\nLabel: Travel']"
]
},
"metadata": {
"tags": []
},
"execution_count": 180
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "khJdWn2RKiia",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "GcMdjOkrIEOK",
"colab_type": "code",
"colab": {}
},
"source": [
"ents4 = []\n",
"labels4 = []\n",
"\n",
"for raw in newLabels:\n",
"  try:\n",
"    ent, label = raw.split(\"\\nLabel:\")\n",
"  except ValueError: #no clean entities/label split in the completion\n",
"    ent = raw\n",
"    label = \"\"\n",
"  ents4.append(ent.strip())\n",
"  labels4.append(label.strip())\n",
"\n",
"#use the batch4 results here, not the batch3 lists (ents, labels)\n",
"batch4[\"entities\"] = ents4\n",
"batch4[\"label\"] = labels4"
],
"execution_count": 181,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "sRLyqYeUQz8Z",
"colab_type": "text"
},
"source": [
"Going back to batch 2 and batch 1, we can also select the examples by word similarity (or really any other measure) using traditional methods."
]
},
{
"cell_type": "code",
"metadata": {
"id": "6qr6GkNeJZaS",
"colab_type": "code",
"colab": {}
},
"source": [
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"from sklearn.metrics.pairwise import linear_kernel\n",
"vectorizer = TfidfVectorizer(max_features = 5000)\n",
"vectorizer.fit(batch1.text)\n",
"docs = vectorizer.transform(batch1.text)"
],
"execution_count": 184,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "-nLZWK6XRB5s",
"colab": {}
},
"source": [
"batch2_labels = []\n",
"\n",
"labeled = batch1.copy()\n",
"labeled.set_index(\"text\", drop=True, inplace=True) #set the index to the headlines\n",
"\n",
"for row in batch2.iterrows():\n",
"  #get most similar\n",
"  tmp = vectorizer.transform([row[1][\"text\"]])\n",
"  sims = linear_kernel(tmp, docs).flatten() #get similarities to corpus\n",
"  idxs = sims.argsort()[-3:] #positions of the 3 most similar headlines\n",
"  myExamples = batch1[batch1.index.isin(batch1.index[idxs])]\n",
"\n",
"  prompt = \"\"\n",
"  for label in myExamples.iterrows(): #add 3 most similar headlines with their labels to prompt\n",
"    prompt += \"\"\"Headline: {}\\nLabel: {}\\n\\n\"\"\".format(label[1][\"text\"], label[1][\"label\"])\n",
"  prompt += \"Headline: {}\\n\".format(row[1][\"text\"])\n",
"  prompt += \"Label:\"\n",
"  batch2_labels.append(query(prompt))"
],
"execution_count": 202,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "aPSXhgPiRB5x"
},
"source": [
"Great! Now I just need to create a prompt. For the purposes of this notebook, I'll build it on the fly rather than in a function, which would probably be the better way to do it. I could also have combined these steps, but keeping them separate makes them easier to play with."
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "cGaAVwG4RB53",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "HrKQKRXXRB57",
"colab": {}
},
"source": [
"batch2[\"label\"] = batch2_labels"
],
"execution_count": 203,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "6phGEGRXTtV3",
"colab_type": "text"
},
"source": [
"Now batch2 has my new labels. It still has the sims column from before."
]
},
{
"cell_type": "code",
"metadata": {
"id": "1X_X6mnHS9aq",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 669
},
"outputId": "def94b71-fbb7-471e-bfbc-ad2fb0489fa9"
},
"source": [
"batch2"
],
"execution_count": 204,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>text</th>\n",
" <th>sims</th>\n",
" <th>label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>Several people involved in the hacking of prom...</td>\n",
" <td>[(-15.553, U.S. Reports More Than 70,000 New C...</td>\n",
" <td>Technology</td>\n",
" </tr>\n",
" <tr>\n",
" <th>60</th>\n",
" <td>Idaho</td>\n",
" <td>[(133.493, U.S. Reports More Than 70,000 New C...</td>\n",
" <td>News</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>U.S. Report Finds 18 States With ‘Red Zone’ Ou...</td>\n",
" <td>[(82.858, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>See the U.S. hot spots</td>\n",
" <td>[(94.649, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Civil Rights</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>U.S. Reports More Than 70,000 New Cases for Se...</td>\n",
" <td>[(123.148, U.S. Reports More Than 70,000 New C...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>John Lewis, Towering Figure of Civil Rights Er...</td>\n",
" <td>[(5.638, U.S. Reports More Than 70,000 New Cas...</td>\n",
" <td>Civil Rights</td>\n",
" </tr>\n",
" <tr>\n",
" <th>93</th>\n",
" <td>China’s Swimwear Capital Can’t Wait for You to...</td>\n",
" <td>[(26.309, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>67</th>\n",
" <td>11 of Our Best Weekend ReadsColin Powell. Less...</td>\n",
" <td>[(42.476, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Weekend Reads</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55</th>\n",
" <td>P.R.</td>\n",
" <td>[(-18.657, U.S. Reports More Than 70,000 New C...</td>\n",
" <td>War</td>\n",
" </tr>\n",
" <tr>\n",
" <th>113</th>\n",
" <td>Federal Officers in Portland Didn’t Have Prope...</td>\n",
" <td>[(45.114, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Police</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>‘What’s Happening’ To Dr. Anthony Fauci?Americ...</td>\n",
" <td>[(38.61, U.S. Reports More Than 70,000 New Cas...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>Are Your Kids Ready to Go Back to School?</td>\n",
" <td>[(24.238, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Education</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>Federal officers pulled a protester into a cou...</td>\n",
" <td>[(12.221, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Protest</td>\n",
" </tr>\n",
" <tr>\n",
" <th>114</th>\n",
" <td>‘On Tech With Shira Ovide’Just collect less da...</td>\n",
" <td>[(17.409, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Privacy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61</th>\n",
" <td>Mo.</td>\n",
" <td>[(-2.566, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>News</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>Cate Stetson and Ruth FriedmanThe Justice Depa...</td>\n",
" <td>[(-0.922, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Death Penalty</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>Rebecca MartinsonPlease Don’t Make Me Risk Get...</td>\n",
" <td>[(7.461, U.S. Reports More Than 70,000 New Cas...</td>\n",
" <td>Opinion</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>LIVESurges in 18 U.S. States Prompt ‘Red Zone’...</td>\n",
" <td>[(80.573, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Health</td>\n",
" </tr>\n",
" <tr>\n",
" <th>86</th>\n",
" <td>Federal Officers In Portland Face Rising Oppos...</td>\n",
" <td>[(21.06, U.S. Reports More Than 70,000 New Cas...</td>\n",
" <td>Police</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>Praise for the civil rights icon poured in fro...</td>\n",
" <td>[(-1.349, U.S. Reports More Than 70,000 New Ca...</td>\n",
" <td>Civil Rights</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" text ... label\n",
"18 Several people involved in the hacking of prom... ... Technology\n",
"60 Idaho ... News\n",
"34 U.S. Report Finds 18 States With ‘Red Zone’ Ou... ... Health\n",
"9 See the U.S. hot spots ... Civil Rights\n",
"54 U.S. Reports More Than 70,000 New Cases for Se... ... Health\n",
"40 John Lewis, Towering Figure of Civil Rights Er... ... Civil Rights\n",
"93 China’s Swimwear Capital Can’t Wait for You to... ... Health\n",
"67 11 of Our Best Weekend ReadsColin Powell. Less... ... Weekend Reads\n",
"55 P.R. ... War\n",
"113 Federal Officers in Portland Didn’t Have Prope... ... Police\n",
"52 ‘What’s Happening’ To Dr. Anthony Fauci?Americ... ... Health\n",
"25 Are Your Kids Ready to Go Back to School? ... Education\n",
"31 Federal officers pulled a protester into a cou... ... Protest\n",
"114 ‘On Tech With Shira Ovide’Just collect less da... ... Privacy\n",
"61 Mo. ... News\n",
"20 Cate Stetson and Ruth FriedmanThe Justice Depa... ... Death Penalty\n",
"88 Rebecca MartinsonPlease Don’t Make Me Risk Get... ... Opinion\n",
"96 LIVESurges in 18 U.S. States Prompt ‘Red Zone’... ... Health\n",
"86 Federal Officers In Portland Face Rising Oppos... ... Police\n",
"46 Praise for the civil rights icon poured in fro... ... Civil Rights\n",
"\n",
"[20 rows x 3 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 204
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "wNWuIjrcTFDi",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
}
]
}