@virattt
Last active May 14, 2024 09:22
rag-reranking-gpt.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/virattt/491f52b19aeca90afc7d78ed165610fc/rag-reranking-gpt.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Install dependencies"
],
"metadata": {
"id": "S2mGQxA958dW"
}
},
{
"cell_type": "code",
"source": [
"!pip install openai"
],
"metadata": {
"id": "2bY0NapN_z98"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lEQQJHH9gufm"
},
"outputs": [],
"source": [
"!pip install chromadb"
]
},
{
"cell_type": "code",
"source": [
"!pip install langchain langchain-community"
],
"metadata": {
"id": "ygccK6lm54VT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!pip install tiktoken"
],
"metadata": {
"id": "K5KyVC5O7Elw"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!pip install pypdf"
],
"metadata": {
"id": "_o1MOUo07GBO"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import getpass\n",
"import os\n",
"\n",
"# Set your OpenAI API key\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass()"
],
"metadata": {
"id": "tavToGb_MJrc"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Download and prepare SEC filing"
],
"metadata": {
"id": "sz639zFf6JoK"
}
},
{
"cell_type": "code",
"source": [
"from langchain_community.document_loaders import PyPDFLoader\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"# Load $ABNB's financial report. This may take 1-2 minutes since the PDF is large\n",
"sec_filing_pdf = \"https://d18rn0p25nwr6d.cloudfront.net/CIK-0001559720/8a9ebed0-815a-469a-87eb-1767d21d8cec.pdf\"\n",
"\n",
"# Create your PDF loader\n",
"loader = PyPDFLoader(sec_filing_pdf)\n",
"\n",
"# Load the PDF document\n",
"documents = loader.load()\n",
"\n",
"# Chunk the financial report\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
],
"metadata": {
"id": "rIO5t-j7611h"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Load the SEC filing into a vector store"
],
"metadata": {
"id": "iaYSqxiMLUGb"
}
},
{
"cell_type": "code",
"source": [
"from langchain_community.vectorstores import Chroma\n",
"from langchain_community.embeddings import OpenAIEmbeddings\n",
"\n",
"# Load the document into Chroma\n",
"embedding_function = OpenAIEmbeddings()\n",
"db = Chroma.from_documents(docs, embedding_function)"
],
"metadata": {
"id": "QVZevdc-Md4N"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Query the vector store"
],
"metadata": {
"id": "m8HqBNyYrDHb"
}
},
{
"cell_type": "code",
"source": [
"query = \"What are the specific factors contributing to Airbnb's increased operational expenses in the last fiscal year?\"\n",
"docs = db.similarity_search(query)\n",
"docs"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3qZTrAtXLPl1",
"outputId": "135e4578-4734-45d2-f662-3428410f525f"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[Document(page_content='As we look forward, we recognize the potential impact of the challenging macroeconomic conditions, including inflation and rising interest rates, potential decreased consumer\\nspending, and the continued disruption of the COVID-19 pandemic on travel across the world.\\nWe believe we are well positioned for the road ahead due to our adaptability and relentless innovation. First, our business model is adaptable. We have nearly every type of space in\\nnearly every location, so however travel changes, we are able to adapt. Regardless of the economic environment, our guests come to Airbnb because they can find great value, and\\nour Hosts can earn extra income. Second, we’ve relentlessly innovated while also staying focused and disciplined. During the height of the pandemic, we made many difficult\\nchoices to reduce our spending, making us a leaner and more focused company, and we have kept this discipline ever since.\\nOur Long-Term Growth Strategy\\nOur strategy is to continue to invest in our key strengths:', metadata={'page': 5, 'source': 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001559720/8a9ebed0-815a-469a-87eb-1767d21d8cec.pdf'}),\n",
" Document(page_content='Table of Contents\\nWe need to continue to invest in the development of new offerings and initiatives that differentiate us from our competitors, such as Airbnb Experiences. Developing and delivering\\nthese new offerings and initiatives increase our expenses and our organizational complexity, and we may experience difficulties in developing and implementing these new offerings\\nand initiatives.\\nOur new offerings and initiatives have a high degree of risk, as they may involve unproven businesses with which we have limited or no prior development or operating experience.\\nThere can be no assurance that consumer demand for such offerings and initiatives will exist or be sustained at the levels that we anticipate, that we will be able to successfully\\nmanage the development and delivery of such offerings and initiatives, or that any of these offerings or initiatives will gain sufficient market acceptance to generate sufficient revenue', metadata={'page': 26, 'source': 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001559720/8a9ebed0-815a-469a-87eb-1767d21d8cec.pdf'}),\n",
" Document(page_content='Made Possible by Hosts, Strangers, AirCover, Categories, and OMG marketing campaigns and launches, a $67.9 million increase in our search engine marketing and advertising\\nspend, a $25.1 million increase in payroll-related expenses due to growth in headcount and increase in compensation costs, a $22.0 million increase in third-party service provider\\nexpenses, and a $11.1 million increase in coupon expense in line with increase in revenue and launch of AirCover for guests, partially offset by a decrease of $22.9 million related to\\nthe changes in the fair value of contingent consideration related to a 2019 acquisition.\\nGeneral and Administrative\\n2021 2022 % Change\\n(in millions, except percentages)\\nGeneral and administrative $ 836 $ 950 14 %\\nPercentage of revenue 14 % 11 %\\nGeneral and administrative expense increased $114.0 million, or 14%, in 2022 compared to 2021, primarily due to an increase in other business and operational taxes of $41.3', metadata={'page': 62, 'source': 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001559720/8a9ebed0-815a-469a-87eb-1767d21d8cec.pdf'}),\n",
" Document(page_content='•Invest in our brand. We intend to continue to invest in our brand to educate new Hosts and guests on the benefits of Airbnb and the uniqueness of our offerings. We will\\ncontinue to leverage our brand through a cohesive and integrated marketing strategy punctuated by our two product launches per year.\\n•Expand our global network. We plan to expand our global network and continue to partner with communities to update laws and regulations for short-term rentals to allow\\nmore Hosts to join our platform.\\n•Design new products and offerings. Our innovations are focused on improving our Host and guest experiences, making Airbnb more accessible and appealing for new Hosts\\nand guests and driving increased engagement and loyalty with our existing community. We have made over 340 upgrades to our platform over the past two years, making it\\neven easier to host and guests to book on Airbnb.\\nOur Platform\\nOur Platform for Hosts', metadata={'page': 5, 'source': 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001559720/8a9ebed0-815a-469a-87eb-1767d21d8cec.pdf'})]"
]
},
"metadata": {},
"execution_count": 20
}
]
},
{
"cell_type": "markdown",
"source": [
"# Re-rank the results using GPT-4"
],
"metadata": {
"id": "0UMU-ogKM6w8"
}
},
{
"cell_type": "code",
"source": [
"from openai import OpenAI\n",
"\n",
"client = OpenAI(api_key=os.environ[\"OPENAI_API_KEY\"])\n",
"response = client.chat.completions.create(\n",
" model='gpt-4-1106-preview',\n",
" response_format={\"type\": \"json_object\"},\n",
" temperature=0,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are an expert relevance ranker. Given a list of documents and a query, your job is to determine how relevant each document is for answering the query. Your output is JSON, which is a list of documents. Each document has two fields, content and relevance_score. relevance_score is from 0 to 100.0. Higher relevance means higher score. The list should be sorted by relevance_score, descending.\"},\n",
" {\"role\": \"user\", \"content\": f\"Query: {query} Docs: {docs}\"}\n",
" ]\n",
" )\n",
"\n"
],
"metadata": {
"id": "Z83h16UuMlMt"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import json\n",
"\n",
"scores = json.loads(response.choices[0].message.content)\n",
"print(json.dumps(scores, indent=2))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3fBaXTZuQBXs",
"outputId": "6b449a90-604b-4e6f-eef7-e1a6cebd5e40"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"{\n",
" \"documents\": [\n",
" {\n",
" \"content\": \"Made Possible by Hosts, Strangers, AirCover, Categories, and OMG marketing campaigns and launches, a $67.9 million increase in our search engine marketing and advertising spend, a $25.1 million increase in payroll-related expenses due to growth in headcount and increase in compensation costs, a $22.0 million increase in third-party service provider expenses, and a $11.1 million increase in coupon expense in line with increase in revenue and launch of AirCover for guests, partially offset by a decrease of $22.9 million related to the changes in the fair value of contingent consideration related to a 2019 acquisition. General and Administrative 2021 2022 % Change (in millions, except percentages) General and administrative $ 836 $ 950 14 % Percentage of revenue 14 % 11 % General and administrative expense increased $114.0 million, or 14%, in 2022 compared to 2021, primarily due to an increase in other business and operational taxes of $41.3\",\n",
" \"relevance_score\": 95.0\n",
" },\n",
" {\n",
" \"content\": \"We need to continue to invest in the development of new offerings and initiatives that differentiate us from our competitors, such as Airbnb Experiences. Developing and delivering these new offerings and initiatives increase our expenses and our organizational complexity, and we may experience difficulties in developing and implementing these new offerings and initiatives. Our new offerings and initiatives have a high degree of risk, as they may involve unproven businesses with which we have limited or no prior development or operating experience. There can be no assurance that consumer demand for such offerings and initiatives will exist or be sustained at the levels that we anticipate, that we will be able to successfully manage the development and delivery of such offerings and initiatives, or that any of these offerings or initiatives will gain sufficient market acceptance to generate sufficient revenue\",\n",
" \"relevance_score\": 90.0\n",
" },\n",
" {\n",
" \"content\": \"\\u2022Invest in our brand. We intend to continue to invest in our brand to educate new Hosts and guests on the benefits of Airbnb and the uniqueness of our offerings. We will continue to leverage our brand through a cohesive and integrated marketing strategy punctuated by our two product launches per year. \\u2022Expand our global network. We plan to expand our global network and continue to partner with communities to update laws and regulations for short-term rentals to allow more Hosts to join our platform. \\u2022Design new products and offerings. Our innovations are focused on improving our Host and guest experiences, making Airbnb more accessible and appealing for new Hosts and guests and driving increased engagement and loyalty with our existing community. We have made over 340 upgrades to our platform over the past two years, making it even easier to host and guests to book on Airbnb. Our Platform for Hosts\",\n",
" \"relevance_score\": 85.0\n",
" },\n",
" {\n",
" \"content\": \"As we look forward, we recognize the potential impact of the challenging macroeconomic conditions, including inflation and rising interest rates, potential decreased consumer spending, and the continued disruption of the COVID-19 pandemic on travel across the world. We believe we are well positioned for the road ahead due to our adaptability and relentless innovation. First, our business model is adaptable. We have nearly every type of space in nearly every location, so however travel changes, we are able to adapt. Regardless of the economic environment, our guests come to Airbnb because they can find great value, and our Hosts can earn extra income. Second, we\\u2019ve relentlessly innovated while also staying focused and disciplined. During the height of the pandemic, we made many difficult choices to reduce our spending, making us a leaner and more focused company, and we have kept this discipline ever since. Our Long-Term Growth Strategy Our strategy is to continue to invest in our key strengths:\",\n",
" \"relevance_score\": 80.0\n",
" }\n",
" ]\n",
"}\n"
]
}
]
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "u84ePIKtjrtg"
},
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4,
"colab": {
"provenance": [],
"include_colab_link": true
}
},
"nbformat": 4,
"nbformat_minor": 0
}
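
The notebook stops after printing the re-ranked JSON. As a possible next step, the sketch below parses a reply of the same shape (a top-level "documents" list whose items carry "content" and "relevance_score") and keeps the k best chunks. The helper name top_k_documents, its k and min_score parameters, and the sample reply are assumptions made for illustration; they are not part of the original notebook. The list is re-sorted locally because nothing guarantees the model returns it in descending order.

```python
import json
from typing import Any


def top_k_documents(raw_json: str, k: int = 2, min_score: float = 0.0) -> list[dict[str, Any]]:
    """Parse the ranker's JSON reply and return the k highest-scoring documents."""
    payload = json.loads(raw_json)
    if not isinstance(payload, dict):
        # The reply is expected to be an object with a "documents" key.
        return []
    documents = payload.get("documents", [])
    # Keep only well-formed entries; the model is not guaranteed to follow the schema.
    valid = [
        d for d in documents
        if isinstance(d, dict)
        and isinstance(d.get("content"), str)
        and isinstance(d.get("relevance_score"), (int, float))
        and d["relevance_score"] >= min_score
    ]
    # Sort locally instead of trusting the order the model returned.
    valid.sort(key=lambda d: d["relevance_score"], reverse=True)
    return valid[:k]


# Hypothetical reply in the same shape as the notebook's printed output.
sample_reply = json.dumps({
    "documents": [
        {"content": "chunk B", "relevance_score": 90.0},
        {"content": "chunk A", "relevance_score": 95.0},
        {"content": "chunk C", "relevance_score": 80.0},
    ]
})

top = top_k_documents(sample_reply, k=2)
print([d["content"] for d in top])
```

In the notebook, the input would be response.choices[0].message.content rather than the sample string. Filtering on min_score is optional; it is included because the scores in the recorded run are tightly clustered (80 to 95), so a fixed k may be the more useful cutoff.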