{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "iruud8QA-hma"
},
"source": [
"# Logging, Tracking, and Debugging Prompts using Comet\n",
"\n",
"In this section, we will demonstrate how to log, track, and debug prompt using the `comet-llm` library. `comet-llm` is an open-sourced repo managed by Comet. Please give the repo star if you have a chance and submit any feedback you have! https://github.com/comet-ml/comet-llm\n",
"\n",
"Let's first load all the necessary libraries:\n"
]
},
{
"cell_type": "code",
"source": [
"! pip install openai comet-llm --quiet"
],
"metadata": {
"id": "cLNconIi-rhZ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "WcRfRjy6-hmc"
},
"outputs": [],
"source": [
"import openai\n",
"import os\n",
"import IPython\n",
"import json\n",
"import pandas as pd\n",
"import numpy as np\n",
"import comet_llm\n",
"import urllib\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "o8yyTvrK-hmd"
},
"source": [
"The function below helps to generate the final results from the model after calling the OpenAI API:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EiKIWPuQ-hmd"
},
"outputs": [],
"source": [
"def get_completion(messages, model=\"gpt-3.5-turbo\", temperature=0, max_tokens=300):\n",
" response = openai.ChatCompletion.create(\n",
" model=model,\n",
" messages=messages,\n",
" temperature=temperature,\n",
" max_tokens=max_tokens,\n",
" )\n",
" return response.choices[0].message[\"content\"]"
]
},
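{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the cell below calls `get_completion` with a single user message. It is a minimal smoke test, guarded so it only does anything once `openai.api_key` has been configured in the next section:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# minimal smoke test; does nothing until openai.api_key is configured below\n",
"if openai.api_key:\n",
"    print(get_completion([{\"role\": \"user\", \"content\": \"Say hello in one word.\"}]))"
]
},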
{
"cell_type": "markdown",
"metadata": {
"id": "OHLJxr5B-hmd"
},
"source": [
"### Load the Data\n",
"\n",
"The code below loads both the few-shot demonstrations and the validation dataset used for testing the model."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "h1Q29ooc-hme"
},
"outputs": [],
"source": [
"# API configuration\n",
"openai.api_key = \"OPENAI_API_KEY\"\n",
"\n",
"# print markdown\n",
"def print_markdown(text):\n",
" \"\"\"Prints text as markdown\"\"\"\n",
" IPython.display.display(IPython.display.Markdown(text))\n",
"\n",
"# load validation data from GitHub\n",
"f = urllib.request.urlopen(\"https://raw.githubusercontent.com/comet-ml/comet-llmops/main/data/article-tags.json\")\n",
"val_data = json.load(f)\n",
"\n",
"# load few shot data from GitHub\n",
"f = urllib.request.urlopen(\"https://raw.githubusercontent.com/comet-ml/comet-llmops/main/data/few_shot.json\")\n",
"few_shot_data = json.load(f)"
]
},
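{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check, we can inspect the loaded data. Both files contain lists of records with an `abstract` and a `tags` field, which is the structure the rest of the notebook relies on:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# peek at the data to confirm its structure\n",
"print(len(val_data), \"validation examples;\", len(few_shot_data), \"few-shot examples\")\n",
"print(json.dumps(val_data[0], indent=2)[:500])"
]
},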
{
"cell_type": "markdown",
"metadata": {
"id": "kkjHAZW0-hme"
},
"source": [
"The following is a helper function to obtain the final predictions from the model given a prompt template (e.g., zero-shot or few-shot) and the provided input data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FOPd0RiP-hme"
},
"outputs": [],
"source": [
"def get_predictions(prompt_template, inputs):\n",
"\n",
" responses = []\n",
"\n",
" for i in range(len(inputs)):\n",
" messages = messages = [\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": prompt_template.format(input=inputs[i])\n",
" }\n",
" ]\n",
" response = get_completion(messages)\n",
" responses.append(response)\n",
"\n",
" return responses"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1Sbh8Rl1-hme"
},
"source": [
"### Few-Shot\n",
"\n",
"First, we define a few-shot template which will leverage the few-shot demonstration data loaded previously."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ur-R41tr-hme"
},
"outputs": [],
"source": [
"# function to define the few-shot template\n",
"def get_few_shot_template(few_shot_prefix, few_shot_suffix, few_shot_examples):\n",
" return few_shot_prefix + \"\\n\\n\" + \"\\n\".join([ \"Abstract: \"+ ex[\"abstract\"] + \"\\n\" + \"Tags: \" + str(ex[\"tags\"]) + \"\\n\" for ex in few_shot_examples]) + \"\\n\\n\" + few_shot_suffix\n",
"\n",
"# function to sample few shot data\n",
"def random_sample_data (data, n):\n",
" return np.random.choice(few_shot_data, n, replace=False)\n",
"\n",
"\n",
"# the few-shot prefix and suffix\n",
"few_shot_prefix = \"\"\"Your task is to extract model names from machine learning paper abstracts. Your response is an an array of the model names in the format [\\\"model_name\\\"]. If you don't find model names in the abstract or you are not sure, return [\\\"NA\\\"]\"\"\"\n",
"few_shot_suffix = \"\"\"Abstract: {input}\\nTags:\"\"\"\n",
"\n",
"# load 3 samples from few shot data\n",
"few_shot_template = get_few_shot_template(few_shot_prefix, few_shot_suffix, random_sample_data(few_shot_data, 3))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "z716KoK2-hmf",
"outputId": "8552777f-809c-4c1c-e880-3071ee584a5f"
},
"outputs": [
{
"data": {
"text/plain": [
"'Your task is to extract model names from machine learning paper abstracts. Your response is an an array of the model names in the format [\"model_name\"]. If you don\\'t find model names in the abstract or you are not sure, return [\"NA\"]\\n\\nAbstract: Large Language Models (LLMs), such as ChatGPT and GPT-4, have revolutionized natural language processing research and demonstrated potential in Artificial General Intelligence (AGI). However, the expensive training and deployment of LLMs present challenges to transparent and open academic research. To address these issues, this project open-sources the Chinese LLaMA and Alpaca large models, emphasizing instruction fine-tuning. We expand the original LLaMA\\'s Chinese vocabulary by adding 20K Chinese tokens, increasing encoding efficiency and enhancing basic semantic understanding. By incorporating secondary pre-training using Chinese data and fine-tuning with Chinese instruction data, we substantially improve the models\\' comprehension and execution of instructions. Our pilot study serves as a foundation for researchers adapting LLaMA and Alpaca models to other languages. Resources are made publicly available through GitHub, fostering open research in the Chinese NLP community and beyond. GitHub repository:\\nTags: [\\'ChatGPT\\', \\'GPT-4\\', \\'LLaMA\\', \\'Alpaca\\']\\n\\nAbstract: The rapid advancement of conversational and chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents and provide insight into their \"cognitive\" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of chat agents, providing a valuable resource for investigating conversational language models. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond. The GitHub repository of this project is made publicly available on:\\nTags: [\\'NA\\']\\n\\nAbstract: Prevalent semantic segmentation solutions are, in essence, a dense discriminative classifier of p(class|pixel feature). Though straightforward, this de facto paradigm neglects the underlying data distribution p(pixel feature|class), and struggles to identify out-of-distribution data. Going beyond this, we propose GMMSeg, a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature,class). For each class, GMMSeg builds Gaussian Mixture Models (GMMs) via Expectation-Maximization (EM), so as to capture class-conditional densities. Meanwhile, the deep dense representation is end-to-end trained in a discriminative manner, i.e., maximizing p(class|pixel feature). This endows GMMSeg with the strengths of both generative and discriminative models. 
With a variety of segmentation architectures and backbones, GMMSeg outperforms the discriminative counterparts on three closed-set datasets. More impressively, without any modification, GMMSeg even performs well on open-world datasets. We believe this work brings fundamental insights into the related fields.\\nTags: [\\'GMMSeg\\', \\'GMMs\\']\\n\\n\\nAbstract: {input}\\nTags:'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"few_shot_template"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Jvvfx8ae-hmf"
},
"source": [
"### Zero-Shot Template\n",
"\n",
"The code below defines the zero-shot template. Note that we use the same instruction from the few-shot prompt template. But in this case, we don't use the demonstrations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "wYvzzwYA-hmf"
},
"outputs": [],
"source": [
"zero_shot_template = \"\"\"\n",
"Your task is extract model names from machine learning paper abstracts. Your response is an an array of the model names in the format [\\\"model_name\\\"]. If you don't find model names in the abstract or you are not sure, return [\\\"NA\\\"]\n",
"\n",
"Abstract: {input}\n",
"Tags:\n",
"\"\"\""
]
},
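{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see exactly what the model receives, we can render the zero-shot template against the first validation abstract (mirroring the `few_shot_template` display above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# render the zero-shot prompt for one example\n",
"print(zero_shot_template.format(input=val_data[0][\"abstract\"]))"
]
},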
{
"cell_type": "markdown",
"metadata": {
"id": "qwvsFEof-hmf"
},
"source": [
"### Get Predictions\n",
"\n",
"We then generated all the predictions using the validation data as inputs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XGhppRv--hmg"
},
"outputs": [],
"source": [
"# get the predictions\n",
"\n",
"abstracts = [val_data[i][\"abstract\"] for i in range(len(val_data))]\n",
"few_shot_predictions = get_predictions(few_shot_template, abstracts)\n",
"zero_shot_predictions = get_predictions(zero_shot_template, abstracts)\n",
"expected_tags = [str(val_data[i][\"tags\"]) for i in range(len(val_data))]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5to6iSU2-hmg",
"outputId": "bcb9e84a-8215-4636-e85a-dad2b305a640"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Few shot predictions\n",
"[\"['LLM', 'ChatGPT', 'LLaMA', 'WizardLM', 'OpenAI ChatGPT']\", \"['FLAN-T5', 'AMR', 'UD', 'SRL', 'LoRA']\", '[\"NA\"]', \"['ChatGPT', 'GPT-4', 'LLaMA', 'Alpaca', 'GMMSeg', 'GMMs', 'PAXQA']\", \"['ChatGPT']\", \"['ViT', 'OpenCLIP']\", \"['SAM', 'IA', 'AIGC', 'Stable Diffusion']\", \"['Anything-3D', 'BLIP', 'Segment-Anything']\", \"['Chameleon', 'LLMs', 'GPT-4', 'ScienceQA', 'TabMWP', 'ChatGPT']\", \"['NA']\"]\n",
"\n",
"\n",
"Zero shot predictions\n",
"['[\"WizardLM\", \"Evol-Instruct\", \"LLaMA\", \"ChatGPT\"]', '[\"FLAN-T5\"]', '[\"large language models\", \"generative AI\", \"hypothesis machines\"]', '[\"PAXQA\"]', '[\"ChatGPT\"]', '[\"ViT\", \"OpenCLIP\"]', '[\"Segment-Anything Model (SAM)\", \"Inpaint Anything (IA)\", \"AIGC models\", \"Stable Diffusion\", \"Inpaint Anything (IA)\"]', '[\"Anything-3D\", \"BLIP\", \"Segment-Anything\", \"text-to-image diffusion model\"]', '[\"Chameleon\", \"GPT-4\", \"ScienceQA\", \"TabMWP\", \"ChatGPT\"]', '[\"NA\"]']\n",
"\n",
"\n",
"Expected tags\n",
"[\"['LLaMA', 'ChatGPT', 'WizardLM']\", \"['FLAN-T5', 'FLAN']\", \"['NA']\", \"['PAXQA']\", \"['ChatGPT']\", \"['OpenCLIP', 'ViT']\", \"['SAM', 'IA']\", \"['Anything-3D', 'BLIP', 'Segment-Anything']\", \"['Chameleon', 'GPT-4', 'ChatGPT']\", \"['NA']\"]\n"
]
}
],
"source": [
"print(\"Few shot predictions\")\n",
"print(few_shot_predictions)\n",
"print(\"\\n\\nZero shot predictions\")\n",
"print(zero_shot_predictions)\n",
"print(\"\\n\\nExpected tags\")\n",
"print(expected_tags)"
]
},
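{
"cell_type": "markdown",
"metadata": {},
"source": [
"One simple way to compare the two prompt styles is a rough set-overlap score. The cell below is a minimal sketch: it parses each stringified tag list with `ast.literal_eval` (both the Python-style and JSON-style strings above are valid Python literals) and reports the mean fraction of expected tags recovered:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import ast\n",
"\n",
"def parse_tags(s):\n",
"    # predictions may be Python-style (\"['A', 'B']\") or JSON-style ('[\"A\", \"B\"]') strings;\n",
"    # both are valid Python literals, so ast.literal_eval handles either\n",
"    try:\n",
"        return set(ast.literal_eval(s))\n",
"    except (ValueError, SyntaxError):\n",
"        return set()\n",
"\n",
"for name, preds in [(\"few-shot\", few_shot_predictions), (\"zero-shot\", zero_shot_predictions)]:\n",
"    scores = [\n",
"        len(parse_tags(p) & parse_tags(e)) / max(len(parse_tags(e)), 1)\n",
"        for p, e in zip(preds, expected_tags)\n",
"    ]\n",
"    print(f\"{name}: mean tag recall = {sum(scores) / len(scores):.2f}\")"
]
},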
{
"cell_type": "markdown",
"metadata": {
"id": "vq47EV18-hmg"
},
"source": [
"### Log Prompt Results\n",
"\n",
"Finally, we log the prompt + results to Comet. Note that we are logging both the few-shot and zero-shot results, together with all the metadata and tags."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iBhWMBZa-hmg",
"outputId": "c982ef14-e3e7-498f-9fb7-cfab6581fcf5"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[1;38;5;39mCOMET INFO:\u001b[0m Valid Comet API Key saved in /Users/elvissaravia/.comet.config (set COMET_CONFIG to change where it is saved).\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Prompt logged to https://www.comet.com/omarsar/ml-paper-tagger-prompts\n"
]
}
],
"source": [
"# log the predictions in Comet along with the ground truth for comparison\n",
"\n",
"# set up comet\n",
"COMET_API_KEY = \"COMET_API_KEY\"\n",
"COMET_WORKSPACE = \"COMET_WORKSPACE\"\n",
"\n",
"# initialize comet\n",
"comet_llm.init(COMET_API_KEY, COMET_WORKSPACE, project=\"ml-paper-tagger-prompts\")\n",
"\n",
"# log the predictions\n",
"for i in range(len(expected_tags)):\n",
" # log the few-shot predictions\n",
" comet_llm.log_prompt(\n",
" prompt=few_shot_template.format(input=abstracts[i]),\n",
" prompt_template=few_shot_template,\n",
" output=few_shot_predictions[i],\n",
" tags = [\"gpt-3.5-turbo\", \"few-shot\"],\n",
" metadata = {\n",
" \"expected_tags\": expected_tags[i],\n",
" \"abstract\": abstracts[i],\n",
" }\n",
" )\n",
"\n",
" # log the zero-shot predictions\n",
" comet_llm.log_prompt(\n",
" prompt=zero_shot_template.format(input=abstracts[i]),\n",
" prompt_template=zero_shot_template,\n",
" output=zero_shot_predictions[i],\n",
" tags = [\"gpt-3.5-turbo\", \"zero-shot\"],\n",
" metadata = {\n",
" \"expected_tags\": expected_tags[i],\n",
" \"abstract\": abstracts[i],\n",
" }\n",
" )"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "comet",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
},
"orig_nbformat": 4,
"colab": {
"provenance": [],
"include_colab_link": true
}
},
"nbformat": 4,
"nbformat_minor": 0
}