@virattt
Created November 8, 2023 13:18
openai-assistant-apple_10Q.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyMn2Bdwkhumbomjtw9H8i06",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/virattt/0dd58fb915151981863a231a35921fe9/openai-assistant-apple_10q.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"### Welcome! 👋\n",
"\n",
"This is a quick tutorial on how to use OpenAI's new Assistants API with Knowledge Retrieval.\n",
"\n",
"In this notebook, we:\n",
"1. Create an assistant\n",
"2. Upload Apple's most recent quarterly report (10-Q) to OpenAI\n",
"3. Attach the report to our assistant\n",
"4. Chat with the assistant about the quarterly report!\n",
"\n",
"If you have any questions or issues, please reach out to me [here](https://twitter.com/virattt) 🙂"
],
"metadata": {
"id": "ldnUnKchD2Fe"
}
},
{
"cell_type": "markdown",
"source": [
"### 1. Setup and Installation"
],
"metadata": {
"id": "CxJxmsto9upG"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GGcNvksN2PRY"
},
"outputs": [],
"source": [
"pip install openai # OpenAI's python library"
]
},
{
"cell_type": "code",
"source": [
"pip install bs4 # We use bs4 to parse raw HTML"
],
"metadata": {
"id": "NGRI3RndPo9g"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from getpass import getpass\n",
"\n",
"openai_api_key = getpass('Enter your OpenAI API key')"
],
"metadata": {
"id": "Fr8-nJyC9uw-"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from openai import OpenAI\n",
"\n",
"# Instantiate the OpenAI client\n",
"client = OpenAI(api_key=openai_api_key)"
],
"metadata": {
"id": "fE6S3p0o2Wda"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 2. Create the Assistant"
],
"metadata": {
"id": "X9UtWarZ9Tah"
}
},
{
"cell_type": "code",
"source": [
"# Create the Assistant\n",
"assistant = client.beta.assistants.create(\n",
" name=\"Financial assistant\",\n",
" instructions=\"You are a financial assistant. You help users analyze and understand businesses like Warren Buffett does.\",\n",
" tools=[{\"type\": \"retrieval\"}],\n",
" model=\"gpt-4-1106-preview\",\n",
")"
],
"metadata": {
"id": "wDJPtd_Y9RoV"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 3. Download and parse file\n",
"For our example, we use Apple's quarterly report (10-Q) from September 2023, as this is past GPT's new cutoff date of April 2023."
],
"metadata": {
"id": "o0dA9dD5NPRl"
}
},
{
"cell_type": "code",
"source": [
"import requests\n",
"from bs4 import BeautifulSoup\n",
"\n",
"# Apple's Q3 2023 quarterly report\n",
"file_url = \"https://www.sec.gov/Archives/edgar/data/320193/000032019323000106/aapl-20230930.htm\"\n",
"\n",
"# Download the report\n",
"response = requests.get(file_url, headers={'User-Agent': 'Mozilla/5.0'})\n",
"\n",
"# Parse the report, which is originally HTML\n",
"soup = BeautifulSoup(response.content, 'html.parser')\n",
"\n",
"# Extract the text from the parsed HTML\n",
"text = soup.get_text()"
],
"metadata": {
"id": "EMJ6kCKSNtmJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 4. Upload the file to OpenAI\n",
"The files API expects one of the following [supported files](https://platform.openai.com/docs/assistants/tools/supported-files). This means that we need to convert our parsed `text` into a file. For simplicity, we convert our parsed `text` into a `.txt` file, but you can also convert it into a different file format like `.pdf`."
],
"metadata": {
"id": "GIYcGdck-Zo5"
}
},
{
"cell_type": "code",
"source": [
"if response.status_code == 200:\n",
" # Save the quarterly report to a .txt file\n",
" with open('aapl_Q3-2023_10Q.txt', 'w', encoding='utf-8') as f:\n",
" f.write(text)\n",
"\n",
" # Upload the .txt file to OpenAI's files endpont\n",
" with open('aapl_Q3-2023_10Q.txt', 'rb') as f:\n",
" file_response = client.files.create(\n",
" file=f,\n",
" purpose=\"assistants\", # our file will be used by our assistant\n",
" )"
],
"metadata": {
"id": "sPsbfsHY9VVw"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 5. Attach file to Assistant\n",
"After uploading + creating the file in step #4 above, we now need to attach the file to our Assistant to create an [assistant file](https://platform.openai.com/docs/api-reference/assistants/file-object)."
],
"metadata": {
"id": "ZE_5esJj_M9x"
}
},
{
"cell_type": "code",
"source": [
"assistant_file = client.beta.assistants.files.create(\n",
" assistant_id=assistant.id, # our assistant\n",
" file_id=file_response.id, # the file we uploaded\n",
")"
],
"metadata": {
"id": "JXbs5iYNPGTU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 6. Create a Thread\n",
"A Thread represents a conversation. OpenAI recommends creating one Thread per user as soon as the user initiates the conversation. We can pass any user-specific context and files in this thread by creating Messages [(learn more)](https://platform.openai.com/docs/assistants/overview/step-2-create-a-thread)."
],
"metadata": {
"id": "5ydDX7RH_4wE"
}
},
{
"cell_type": "code",
"source": [
"thread = client.beta.threads.create(\n",
" messages=[\n",
" {\n",
" \"role\": \"user\",\n",
" \"content\": \"What was Apple's revenue, net income, and free cash flow in Q3 2023?\",\n",
" \"file_ids\": [assistant_file.id]\n",
" }\n",
" ]\n",
")"
],
"metadata": {
"id": "Ca8l4M3EPVVI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 7. Run the Assistant and Thread\n",
"For the Assistant to respond to the user message, we need to create a `Run`. This makes the `Assistant` read the `Thread` and decide whether to call tools or simply use the model to best answer the user query.\n",
"\n",
"We can optionally pass additional instructions to the Assistant while creating the `Run`."
],
"metadata": {
"id": "a55Q-bHlAaD2"
}
},
{
"cell_type": "code",
"source": [
"run = client.beta.threads.runs.create(\n",
" thread_id=thread.id,\n",
" assistant_id=assistant.id,\n",
" instructions=\"Please answer the user's query as Warren Buffett would.\"\n",
")"
],
"metadata": {
"id": "LlMlgcywUfMr"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 8. Retrieve the Run's status\n",
"By default, when we create a `Run`, its initial status will be `\"queued\"`. We can periodically retrieve the `Run` to check on its status and see if it has moved to `\"completed\"`."
],
"metadata": {
"id": "Y-4fFN_HBBQZ"
}
},
{
"cell_type": "code",
"source": [
"import time\n",
"\n",
"while run.status != \"completed\":\n",
" # Retrieve the Run\n",
" run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)\n",
" # Print the status of the run\n",
" print(f\"Run status: {run.status}\")\n",
" # Delay retrieval of status by 1 second\n",
" time.sleep(1)"
],
"metadata": {
"id": "_4dp-rA4Ux8x"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### 9. Print the output"
],
"metadata": {
"id": "AwUwxrRyA6ei"
}
},
{
"cell_type": "code",
"source": [
"# Get the messages from the Thread\n",
"thread_messages = client.beta.threads.messages.list(thread.id)\n",
"\n",
"# Loop through the messages in the Thread and print their content\n",
"for message in thread_messages:\n",
" for content in message.content:\n",
" print(f\"{content.text.value}\\n\\n\")"
],
"metadata": {
"id": "vrfYTpj3VznS"
},
"execution_count": null,
"outputs": []
}
]
}
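
The polling loop in step 8 has no upper bound on how long it waits. A minimal sketch of a bounded version is shown here, assuming the terminal Run statuses are `completed`, `failed`, `cancelled`, and `expired`. The names `wait_for_run`, `TERMINAL_STATUSES`, `poll_interval`, and `timeout` are hypothetical and introduced only for illustration; they are not part of the `openai` package. The only API call used is `client.beta.threads.runs.retrieve`, as in the notebook.

```python
import time

# Statuses after which a Run is assumed not to change any further.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def wait_for_run(client, thread_id, run_id, poll_interval=1.0, timeout=120.0):
    """Poll a Run until it reaches a terminal status or the timeout elapses.

    Hypothetical helper: returns the last retrieved Run object, or raises
    TimeoutError if the Run is still pending once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Same retrieval call as in step 8 of the notebook
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run {run_id} is still '{run.status}' after {timeout} seconds")
        # Wait before asking again
        time.sleep(poll_interval)
```

With the notebook's objects this could be called as `run = wait_for_run(client, thread.id, run.id)`. Checking `run.status == "completed"` afterwards tells you whether the Thread contains an answer worth printing.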