@Daethyra
Last active October 5, 2023 18:33
langchain-embeddings-retrieval-agent.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/Daethyra/0e3f515a41d78d89babbea00c057b8d2/langchain-embeddings-retrieval-agent.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TXC2wBpCU9f7"
},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/langchain-retrieval-agent.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/langchain-retrieval-agent.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bhWwrfbbVGOA"
},
"source": [
"#### [LangChain Handbook](https://pinecone.io/learn/langchain)\n",
"\n",
"# Retrieval Agents\n",
"\n",
"We've seen in previous chapters how powerful [retrieval augmentation](https://www.pinecone.io/learn/langchain-retrieval-augmentation/) and [conversational agents](https://www.pinecone.io/learn/langchain-agents/) can be. They become even more impressive when we begin using them together.\n",
"\n",
"Conversational agents can struggle with data freshness, knowledge about specific domains, or accessing internal documentation. By coupling agents with retrieval augmentation tools we no longer have these problems.\n",
"\n",
"One the other side, using \"naive\" retrieval augmentation without the use of an agent means we will retrieve contexts with *every* query. Again, this isn't always ideal as not every query requires access to external knowledge.\n",
"\n",
"Merging these methods gives us the best of both worlds. In this notebook we'll learn how to do this.\n",
"\n",
"[![Open full notebook](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/full-link.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/generation/langchain/handbook/08-langchain-retrieval-agent.ipynb)\n",
"\n",
"To begin, we must install the prerequisite libraries that we will be using in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pva9ehKXUpU2",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "21af5614-b078-415d-8aa3-9efd125b4757"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/72.0 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[91m╸\u001b[0m \u001b[32m71.7/72.0 kB\u001b[0m \u001b[31m2.5 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m72.0/72.0 kB\u001b[0m \u001b[31m1.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m177.2/177.2 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m770.9/770.9 kB\u001b[0m \u001b[31m9.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.7/1.7 MB\u001b[0m \u001b[31m12.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.5/62.5 kB\u001b[0m \u001b[31m5.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m300.4/300.4 kB\u001b[0m \u001b[31m16.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m20.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m24.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m12.3/12.3 MB\u001b[0m \u001b[31m30.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m16.4/16.4 MB\u001b[0m \u001b[31m20.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m34.9/34.9 MB\u001b[0m \u001b[31m28.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m90.0/90.0 kB\u001b[0m \u001b[31m11.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.4/49.4 kB\u001b[0m \u001b[31m6.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m224.5/224.5 kB\u001b[0m \u001b[31m25.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m223.6/223.6 kB\u001b[0m \u001b[31m25.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m223.0/223.0 kB\u001b[0m \u001b[31m23.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m218.0/218.0 kB\u001b[0m \u001b[31m24.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m218.0/218.0 kB\u001b[0m \u001b[31m24.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m211.7/211.7 kB\u001b[0m \u001b[31m23.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m341.8/341.8 kB\u001b[0m \u001b[31m30.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m73.4/73.4 kB\u001b[0m \u001b[31m9.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m11.1/11.1 MB\u001b[0m \u001b[31m42.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m143.4/143.4 kB\u001b[0m \u001b[31m17.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m121.4/121.4 kB\u001b[0m \u001b[31m15.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m120.3/120.3 kB\u001b[0m \u001b[31m15.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.6/115.6 kB\u001b[0m \u001b[31m14.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.5/115.5 kB\u001b[0m \u001b[31m12.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.3/115.3 kB\u001b[0m \u001b[31m13.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.1/115.1 kB\u001b[0m \u001b[31m14.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m114.6/114.6 kB\u001b[0m \u001b[31m13.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"google-cloud-bigquery 3.10.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-bigquery-connection 1.12.1 requires google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0, but you have google-api-core 2.8.2 which is incompatible.\n",
"google-cloud-bigquery-connection 1.12.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-bigquery-storage 2.22.0 requires google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0, but you have google-api-core 2.8.2 which is incompatible.\n",
"google-cloud-bigquery-storage 2.22.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-datastore 2.15.2 requires google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0, but you have google-api-core 2.8.2 which is incompatible.\n",
"google-cloud-datastore 2.15.2 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-firestore 2.11.1 requires google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0, but you have google-api-core 2.8.2 which is incompatible.\n",
"google-cloud-firestore 2.11.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-functions 1.13.3 requires google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0, but you have google-api-core 2.8.2 which is incompatible.\n",
"google-cloud-functions 1.13.3 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-language 2.9.1 requires google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0, but you have google-api-core 2.8.2 which is incompatible.\n",
"google-cloud-language 2.9.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-translate 3.11.3 requires google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.10.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3.0.0dev,>=1.34.0, but you have google-api-core 2.8.2 which is incompatible.\n",
"google-cloud-translate 3.11.3 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-colab 1.0.0 requires pandas==1.5.3, but you have pandas 2.1.1 which is incompatible.\n",
"grpc-google-iam-v1 0.12.6 requires protobuf!=3.20.0,!=3.20.1,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"pandas-gbq 0.17.9 requires pyarrow<10.0dev,>=3.0.0, but you have pyarrow 11.0.0 which is incompatible.\n",
"tensorboard 2.13.0 requires protobuf>=3.19.6, but you have protobuf 3.19.3 which is incompatible.\n",
"tensorflow 2.13.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3, but you have protobuf 3.19.3 which is incompatible.\n",
"tensorflow-datasets 4.9.3 requires protobuf>=3.20, but you have protobuf 3.19.3 which is incompatible.\n",
"tensorflow-hub 0.14.0 requires protobuf>=3.19.6, but you have protobuf 3.19.3 which is incompatible.\n",
"tensorflow-metadata 1.14.0 requires protobuf<4.21,>=3.20.3, but you have protobuf 3.19.3 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[0m"
]
}
],
"source": [
"!pip install -qU \\\n",
" openai==0.27.7 \\\n",
" \"pinecone-client[grpc]\"==2.2.1 \\\n",
" pinecone-datasets==0.5.1 \\\n",
" langchain==0.0.162 \\\n",
" tiktoken==0.4.0"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZTgrOQziXUto"
},
"source": [
"## Building the Knowledge Base"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qNyRsz0ZXXaq"
},
"source": [
"We will download a pre-embedded dataset from `pinecone-datasets`. Allowing us to skip the embedding and preprocessing steps, if you'd rather work through those steps you can find the [full notebook here](https://github.com/pinecone-io/examples/blob/master/generation/langchain/handbook/08-langchain-retrieval-agent.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "laSDMjqQXuj-",
"outputId": "dfd8f0b5-5043-4802-fd81-78ac245f16e6"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" id \\\n",
"0 5733be284776f41900661182 \n",
"1 5733bf84d058e614000b61be \n",
"2 5733bed24776f41900661188 \n",
"3 5733a6424776f41900660f51 \n",
"4 5733a70c4776f41900660f64 \n",
"\n",
" values sparse_values \\\n",
"0 [-0.010262451963272523, 0.02222637996192584, -... None \n",
"1 [-0.009786712423983223, -0.013988726438873078,... None \n",
"2 [0.013343917696606181, -0.0007001232846109822,... None \n",
"3 [-0.0085222901071539, 0.004399558219521822, -0... None \n",
"4 [-0.006695996885869355, -0.02067068565761649, ... None \n",
"\n",
" metadata blob \n",
"0 {'text': 'Architecturally, the school has a Ca... None \n",
"1 {'text': 'As at most other universities, Notre... None \n",
"2 {'text': 'The university is the major seat of ... None \n",
"3 {'text': 'The College of Engineering was estab... None \n",
"4 {'text': 'All of Notre Dame's undergraduate st... None "
],
"text/html": [
"\n",
" <div id=\"df-0fdba9f0-1dc5-4930-aa6a-45073c217038\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>values</th>\n",
" <th>sparse_values</th>\n",
" <th>metadata</th>\n",
" <th>blob</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5733be284776f41900661182</td>\n",
" <td>[-0.010262451963272523, 0.02222637996192584, -...</td>\n",
" <td>None</td>\n",
" <td>{'text': 'Architecturally, the school has a Ca...</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5733bf84d058e614000b61be</td>\n",
" <td>[-0.009786712423983223, -0.013988726438873078,...</td>\n",
" <td>None</td>\n",
" <td>{'text': 'As at most other universities, Notre...</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>5733bed24776f41900661188</td>\n",
" <td>[0.013343917696606181, -0.0007001232846109822,...</td>\n",
" <td>None</td>\n",
" <td>{'text': 'The university is the major seat of ...</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>5733a6424776f41900660f51</td>\n",
" <td>[-0.0085222901071539, 0.004399558219521822, -0...</td>\n",
" <td>None</td>\n",
" <td>{'text': 'The College of Engineering was estab...</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5733a70c4776f41900660f64</td>\n",
" <td>[-0.006695996885869355, -0.02067068565761649, ...</td>\n",
" <td>None</td>\n",
" <td>{'text': 'All of Notre Dame's undergraduate st...</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-0fdba9f0-1dc5-4930-aa6a-45073c217038')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-0fdba9f0-1dc5-4930-aa6a-45073c217038 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-0fdba9f0-1dc5-4930-aa6a-45073c217038');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-302387b5-69bc-4657-8569-e2dde97c1a10\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-302387b5-69bc-4657-8569-e2dde97c1a10')\"\n",
" title=\"Suggest charts.\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-302387b5-69bc-4657-8569-e2dde97c1a10 button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 2
}
],
"source": [
"from pinecone_datasets import load_dataset\n",
"\n",
"dataset = load_dataset(\"squad-text-embedding-ada-002\")\n",
"dataset.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "K5Q16wRH9SmO",
"outputId": "fdf2947f-a270-4417-a02b-057e62356dfe"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"18891"
]
},
"metadata": {},
"execution_count": 3
}
],
"source": [
"len(dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c3-Plec39SmO"
},
"source": [
"We'll format the dataset ready for upsert and reduce what we use to a subset of the full dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "4CW5mNi89SmO",
"outputId": "b7485e0d-1aa6-4f2b-840e-fdef19d58ffc"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" id \\\n",
"0 5733be284776f41900661182 \n",
"1 5733bf84d058e614000b61be \n",
"2 5733bed24776f41900661188 \n",
"3 5733a6424776f41900660f51 \n",
"4 5733a70c4776f41900660f64 \n",
"\n",
" values \\\n",
"0 [-0.010262451963272523, 0.02222637996192584, -... \n",
"1 [-0.009786712423983223, -0.013988726438873078,... \n",
"2 [0.013343917696606181, -0.0007001232846109822,... \n",
"3 [-0.0085222901071539, 0.004399558219521822, -0... \n",
"4 [-0.006695996885869355, -0.02067068565761649, ... \n",
"\n",
" metadata \n",
"0 {'text': 'Architecturally, the school has a Ca... \n",
"1 {'text': 'As at most other universities, Notre... \n",
"2 {'text': 'The university is the major seat of ... \n",
"3 {'text': 'The College of Engineering was estab... \n",
"4 {'text': 'All of Notre Dame's undergraduate st... "
],
"text/html": [
"\n",
" <div id=\"df-ae493f02-d341-44ae-b022-ef7c62f7ae4d\" class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>id</th>\n",
" <th>values</th>\n",
" <th>metadata</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5733be284776f41900661182</td>\n",
" <td>[-0.010262451963272523, 0.02222637996192584, -...</td>\n",
" <td>{'text': 'Architecturally, the school has a Ca...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5733bf84d058e614000b61be</td>\n",
" <td>[-0.009786712423983223, -0.013988726438873078,...</td>\n",
" <td>{'text': 'As at most other universities, Notre...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>5733bed24776f41900661188</td>\n",
" <td>[0.013343917696606181, -0.0007001232846109822,...</td>\n",
" <td>{'text': 'The university is the major seat of ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>5733a6424776f41900660f51</td>\n",
" <td>[-0.0085222901071539, 0.004399558219521822, -0...</td>\n",
" <td>{'text': 'The College of Engineering was estab...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5733a70c4776f41900660f64</td>\n",
" <td>[-0.006695996885869355, -0.02067068565761649, ...</td>\n",
" <td>{'text': 'All of Notre Dame's undergraduate st...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <div class=\"colab-df-buttons\">\n",
"\n",
" <div class=\"colab-df-container\">\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-ae493f02-d341-44ae-b022-ef7c62f7ae4d')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
" <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" .colab-df-buttons div {\n",
" margin-bottom: 4px;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-ae493f02-d341-44ae-b022-ef7c62f7ae4d button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-ae493f02-d341-44ae-b022-ef7c62f7ae4d');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
"\n",
"\n",
"<div id=\"df-2794b672-1cdf-41a1-8f1b-4d58ca5d2ecb\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-2794b672-1cdf-41a1-8f1b-4d58ca5d2ecb')\"\n",
" title=\"Suggest charts.\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" --bg-color: #E8F0FE;\n",
" --fill-color: #1967D2;\n",
" --hover-bg-color: #E2EBFA;\n",
" --hover-fill-color: #174EA6;\n",
" --disabled-fill-color: #AAA;\n",
" --disabled-bg-color: #DDD;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" --bg-color: #3B4455;\n",
" --fill-color: #D2E3FC;\n",
" --hover-bg-color: #434B5C;\n",
" --hover-fill-color: #FFFFFF;\n",
" --disabled-bg-color: #3B4455;\n",
" --disabled-fill-color: #666;\n",
" }\n",
"\n",
" .colab-df-quickchart {\n",
" background-color: var(--bg-color);\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: var(--fill-color);\n",
" height: 32px;\n",
" padding: 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: var(--hover-bg-color);\n",
" box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: var(--button-hover-fill-color);\n",
" }\n",
"\n",
" .colab-df-quickchart-complete:disabled,\n",
" .colab-df-quickchart-complete:disabled:hover {\n",
" background-color: var(--disabled-bg-color);\n",
" fill: var(--disabled-fill-color);\n",
" box-shadow: none;\n",
" }\n",
"\n",
" .colab-df-spinner {\n",
" border: 2px solid var(--fill-color);\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" animation:\n",
" spin 1s steps(1) infinite;\n",
" }\n",
"\n",
" @keyframes spin {\n",
" 0% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" border-left-color: var(--fill-color);\n",
" }\n",
" 20% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 30% {\n",
" border-color: transparent;\n",
" border-left-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 40% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-top-color: var(--fill-color);\n",
" }\n",
" 60% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" }\n",
" 80% {\n",
" border-color: transparent;\n",
" border-right-color: var(--fill-color);\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" 90% {\n",
" border-color: transparent;\n",
" border-bottom-color: var(--fill-color);\n",
" }\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const quickchartButtonEl =\n",
" document.querySelector('#' + key + ' button');\n",
" quickchartButtonEl.disabled = true; // To prevent multiple clicks.\n",
" quickchartButtonEl.classList.add('colab-df-spinner');\n",
" try {\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" } catch (error) {\n",
" console.error('Error during call to suggestCharts:', error);\n",
" }\n",
" quickchartButtonEl.classList.remove('colab-df-spinner');\n",
" quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
" }\n",
" (() => {\n",
" let quickchartButtonEl =\n",
" document.querySelector('#df-2794b672-1cdf-41a1-8f1b-4d58ca5d2ecb button');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
" })();\n",
" </script>\n",
"</div>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 4
}
],
"source": [
"# we drop sparse_values as they are not needed for this example\n",
"dataset.documents.drop(['sparse_values', 'blob'], axis=1, inplace=True)\n",
"\n",
"dataset.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "B2_Pt7N6Zg2X"
},
"source": [
"## Vector Database"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JQTfOTR6aBRS"
},
"source": [
"Next we initialize the vector database. For this we need a [free API key](https://app.pinecone.io/), then we create the index:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lgfywcQj9SmP"
},
"outputs": [],
"source": [
"index_name = 'langchain-retrieval-agent-fast'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "C3wrG-9yaJel"
},
"outputs": [],
"source": [
"import pinecone\n",
"import os\n",
"\n",
"# Load Pinecone API key\n",
"api_key = os.getenv('PINECONE_API_KEY') or 'api_key'\n",
"# Set Pinecone environment. Find next to API key in console\n",
"env = os.getenv('PINECONE_ENVIRONMENT') or \"us-central1-gcp\"\n",
"\n",
"pinecone.init(api_key=api_key, environment=env)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "D5WT4PAN9SmP"
},
"outputs": [],
"source": [
"import time\n",
"\n",
"if index_name in pinecone.list_indexes():\n",
" pinecone.delete_index(index_name)\n",
"\n",
"# we create a new index\n",
"pinecone.create_index(\n",
" name=index_name,\n",
" metric='dotproduct',\n",
" dimension=1536 # 1536 dim of text-embedding-ada-002\n",
")\n",
"\n",
"# wait for index to be initialized\n",
"while not pinecone.describe_index(index_name).status['ready']:\n",
" time.sleep(1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uiSWrAQ5aRco"
},
"source": [
"Then connect to the index:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "bfsfuFmqaS4G",
"outputId": "45f17443-b87a-4682-ab44-6cfd6efdc46c"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'dimension': 1536,\n",
" 'index_fullness': 0.0,\n",
" 'namespaces': {},\n",
" 'total_vector_count': 0}"
]
},
"metadata": {},
"execution_count": 12
}
],
"source": [
"index = pinecone.GRPCIndex(index_name)\n",
"index.describe_index_stats()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QbDTrvvm9SmP"
},
"source": [
"We should see that the new Pinecone index has a `total_vector_count` of `0`, as we haven't added any vectors yet.\n",
"\n",
"Now we upsert the data to Pinecone:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 98,
"referenced_widgets": [
"d7b2791e5f3d4c68b02da4123f715a72",
"e4e2a2e10c684ac7bbf102bad235464f",
"5290c01d786b4baf8b5e9adfe1a5befe",
"f073c54fdece48c0817f21f0970621d9",
"c593df22c7294a078a8d036d10e1c117",
"2a15af3253884c7cb97c6c6f3dd21e3f",
"bb3e80cb30214f6b80c4316606761d34",
"1a90f0bf4c8e4aabb8e346c3d9cfdff6",
"6e332262dd2944a68e3415bc827f1407",
"ded11ea7cc6b4c8aa6acbe8d03a6f742",
"bd0b82bd40b0418a8e326bec7e1cfe9e",
"2868c074bd55491a92000c7cd363ce6b",
"be04454d283147d79e353c1ab24b8573",
"89c1ae7c90004a6bae1e5aeedb19fa8c",
"b22dd946e0ac4a0abfb27efcf811c790",
"dd464906d8ab4900916b35dd3e779d46",
"4b2dd63f4b5e4a40ab5ec52826cc5bb3",
"87258691c1e041219045522dcf52bc52",
"9f125e0287ed46ebba34a0d26cdfb8cc",
"795dabbc25bb426c8756a36b5778f572",
"d9dd4543607840bfb4e813e801549c66",
"7e51d478a2e14ff09853d93c102ff40f"
]
},
"id": "AhDcbRGTaWPi",
"outputId": "14b0b058-fa02-4078-83b9-7c3067edf613"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"sending upsert requests: 0%| | 0/18891 [00:00<?, ?it/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "d7b2791e5f3d4c68b02da4123f715a72"
}
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"collecting async responses: 0%| | 0/148 [00:00<?, ?it/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "2868c074bd55491a92000c7cd363ce6b"
}
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"upserted_count: 18891"
]
},
"metadata": {},
"execution_count": 13
}
],
"source": [
"index.upsert_from_dataframe(dataset.documents, batch_size=128)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jDUnLdy1b7G1"
},
"source": [
"We've indexed everything, now we can check the number of vectors in our index like so:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "SiccGZKAb_Qo",
"outputId": "c0e1ad44-f0a2-48b3-b1b6-0a1f35c83102"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'dimension': 1536,\n",
" 'index_fullness': 0.1,\n",
" 'namespaces': {'': {'vector_count': 18891}},\n",
" 'total_vector_count': 18891}"
]
},
"metadata": {},
"execution_count": 14
}
],
"source": [
"index.describe_index_stats()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b-3oolT5cCR8"
},
"source": [
"## Creating a Vector Store and Querying"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-og9Vt_-9SmQ"
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"\n",
"openai_api_key = os.getenv('OPENAI_API_KEY') or 'sk-'\n",
"model_name = 'text-embedding-ada-002'\n",
"\n",
"embed = OpenAIEmbeddings(\n",
" model=model_name,\n",
" openai_api_key=openai_api_key\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DcZ12U06cCH5"
},
"source": [
"Now that we've build our index we can switch back over to LangChain. We start by initializing a vector store using the same index we just built. We do that like so:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0MBJ477-cFNw"
},
"outputs": [],
"source": [
"from langchain.vectorstores import Pinecone\n",
"\n",
"text_field = \"text\"\n",
"\n",
"# switch back to normal index for langchain\n",
"index = pinecone.Index(index_name)\n",
"\n",
"vectorstore = Pinecone(\n",
" index, embed.embed_query, text_field\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3K3xRthWcXzW"
},
"source": [
"As in previous examples, we can use the `similarity_search` method to do a pure semantic search (without the generation component)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "uITMZtzschJF",
"outputId": "e6fce934-0927-4710-b60e-33cb9b2018b1"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[Document(page_content='Episcopalians and Presbyterians, as well as other WASPs, tend to be considerably wealthier and better educated (having graduate and post-graduate degrees per capita) than most other religious groups in United States, and are disproportionately represented in the upper reaches of American business, law and politics, especially the Republican Party. Numbers of the most wealthy and affluent American families as the Vanderbilts and the Astors, Rockefeller, Du Pont, Roosevelt, Forbes, Whitneys, the Morgans and Harrimans are Mainline Protestant families.', metadata={'title': 'Protestantism'}),\n",
" Document(page_content='Yale has had many financial supporters, but some stand out by the magnitude or timeliness of their contributions. Among those who have made large donations commemorated at the university are: Elihu Yale; Jeremiah Dummer; the Harkness family (Edward, Anna, and William); the Beinecke family (Edwin, Frederick, and Walter); John William Sterling; Payne Whitney; Joseph E. Sheffield, Paul Mellon, Charles B. G. Murphy and William K. Lanman. The Yale Class of 1954, led by Richard Gilder, donated $70 million in commemoration of their 50th reunion. Charles B. Johnson, a 1954 graduate of Yale College, pledged a $250 million gift in 2013 to support of the construction of two new residential colleges.', metadata={'title': 'Yale_University'}),\n",
" Document(page_content='During a panel discussion at Harvard University\\'s reunion for African American alumni during the 2003–04 academic year, two prominent black professors at the institution—Lani Guinier and Henry Louis Gates—pointed out an unintended effect of affirmative action policies at Harvard. They stated that only about a third of black Harvard undergraduates were from families in which all four grandparents were born into the African American community. The majority of black students at Harvard were Caribbean and African immigrants or their children, with some others the mixed-race children of biracial couples. One Harvard student, born in the South Bronx to a black family whose ancestors have been in the United States for multiple generations, said that there were so few Harvard students from the historic African American community that they took to calling themselves \"the descendants\" (i.e., descendants of American slaves). The reasons for this underrepresentation of historic African Americans, and possible remedies, remain a subject of debate.', metadata={'title': 'Affirmative_action_in_the_United_States'}),\n",
" Document(page_content='During the Gilded Age, there was substantial growth in population in the United States and extravagant displays of wealth and excess of America\\'s upper-class during the post-Civil War and post-Reconstruction era, in the late 19th century. The wealth polarization derived primarily from industrial and population expansion. The businessmen of the Second Industrial Revolution created industrial towns and cities in the Northeast with new factories, and contributed to the creation of an ethnically diverse industrial working class which produced the wealth owned by rising super-rich industrialists and financiers called the \"robber barons\". An example is the company of John D. Rockefeller, who was an important figure in shaping the new oil industry. Using highly effective tactics and aggressive practices, later widely criticized, Standard Oil absorbed or destroyed most of its competition.', metadata={'title': 'Modern_history'}),\n",
" Document(page_content='In the United States, two of the wealthiest nonprofit organizations are the Bill and Melinda Gates Foundation, which has an endowment of US$38 billion, and the Howard Hughes Medical Institute originally funded by Hughes Aircraft prior to divestiture, which has an endowment of approximately $14.8 billion. Outside the United States, another large NPO is the British Wellcome Trust, which is a \"charity\" by British usage. See: List of wealthiest foundations. Note that this assessment excludes universities, at least a few of which have assets in the tens of billions of dollars. For example; List of U.S. colleges and universities by endowment.', metadata={'title': 'Nonprofit_organization'})]"
]
},
"metadata": {},
"execution_count": 22
}
],
"source": [
"query = \"What universities had the most intergenerational wealth?\"\n",
"\n",
"vectorstore.similarity_search(\n",
" query, # our search query\n",
" k=5 # return 3 most relevant docs\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-zGF6YsgczqT"
},
"source": [
"Looks like we're getting good results. Let's take a look at how we can begin integrating this into a conversational agent."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tFsIOm73dcOI"
},
"source": [
"## Initializing the Conversational Agent"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XMv6TXWkdfNR"
},
"source": [
"Our conversational agent needs a Chat LLM, conversational memory, and a `RetrievalQA` chain to initialize. We create these using:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zMRs9Klic5-Y"
},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains.conversation.memory import ConversationBufferWindowMemory\n",
"from langchain.chains import RetrievalQA\n",
"\n",
"# chat completion llm\n",
"llm = ChatOpenAI(\n",
" openai_api_key=openai_api_key,\n",
" model_name='gpt-3.5-turbo',\n",
" temperature=0.0\n",
")\n",
"# conversational memory\n",
"conversational_memory = ConversationBufferWindowMemory(\n",
" memory_key='chat_history',\n",
" k=5,\n",
" return_messages=True\n",
")\n",
"# retrieval qa chain\n",
"qa = RetrievalQA.from_chain_type(\n",
" llm=llm,\n",
" chain_type=\"stuff\",\n",
" retriever=vectorstore.as_retriever()\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-ySfWyZLdboX"
},
"source": [
"Using these we can generate an answer using the `run` method:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 105
},
"id": "LaYSq0V-dxHw",
"outputId": "1f7b2862-6dd4-450e-b2a5-3bb23a3a9530"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"'Based on the provided context, Yale University is mentioned as having received significant donations from wealthy individuals and families, such as Elihu Yale, the Harkness family, the Beinecke family, John William Sterling, Payne Whitney, Joseph E. Sheffield, Paul Mellon, Charles B. G. Murphy, William K. Lanman, and the Yale Class of 1954. These donations suggest a strong presence of intergenerational wealth at Yale University. However, it is important to note that this information does not provide a comprehensive ranking of universities based on intergenerational wealth.'"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
}
},
"metadata": {},
"execution_count": 24
}
],
"source": [
"qa.run(query)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DtSXR5RXdyU0"
},
"source": [
"But this isn't yet ready for our conversational agent. For that we need to convert this retrieval chain into a tool. We do that like so:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FwCYrS4duqBW"
},
"outputs": [],
"source": [
"from langchain.agents import Tool\n",
"\n",
"tools = [\n",
" Tool(\n",
" name='Knowledge Base',\n",
" func=qa.run,\n",
" description=(\n",
" 'use this tool when answering general knowledge queries to get '\n",
" 'more information about the topic'\n",
" )\n",
" )\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wXi_0ipTvM_l"
},
"source": [
"Now we can initialize the agent like so:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JaKTzPUEvOoy"
},
"outputs": [],
"source": [
"from langchain.agents import initialize_agent\n",
"\n",
"agent = initialize_agent(\n",
" agent='chat-conversational-react-description',\n",
" tools=tools,\n",
" llm=llm,\n",
" verbose=True,\n",
" max_iterations=3,\n",
" early_stopping_method='generate',\n",
" memory=conversational_memory\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WbXl-AzVvszB"
},
"source": [
"With that our retrieval augmented conversational agent is ready and we can begin using it."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IlxUBWKcvzeP"
},
"source": [
"### Using the Conversational Agent"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZZapCP4Pv2kz"
},
"source": [
"To make queries we simply call the `agent` directly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "RJoAhy76vzAB",
"outputId": "62d6f4b2-42c1-485f-d9fc-51866f2488b2"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Knowledge Base\",\n",
" \"action_input\": \"Universities with the most intergenerational wealth\"\n",
"}\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mI don't have specific information on universities with the most intergenerational wealth. However, some universities in the United States have significant endowments, which can contribute to their overall wealth. Examples of universities with large endowments include Harvard University, Stanford University, and Princeton University.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"Some universities in the United States with large endowments include Harvard University, Stanford University, and Princeton University.\"\n",
"}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'input': 'What universities had the most intergenerational wealth?',\n",
" 'chat_history': [],\n",
" 'output': 'Some universities in the United States with large endowments include Harvard University, Stanford University, and Princeton University.'}"
]
},
"metadata": {},
"execution_count": 27
}
],
"source": [
"agent(query)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YcMqa9Va2hU6"
},
"source": [
"Looks great, now what if we ask it a non-general knowledge question?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "85vipqC02deV",
"outputId": "96dfbebf-82ba-420b-ee4c-de1abea670c3"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"The product of 2 multiplied by 7 is 14.\"\n",
"}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'input': 'what is 2 * 7?',\n",
" 'chat_history': [HumanMessage(content='What universities had the most intergenerational wealth?', additional_kwargs={}, example=False),\n",
" AIMessage(content='Some universities in the United States with large endowments include Harvard University, Stanford University, and Princeton University.', additional_kwargs={}, example=False)],\n",
" 'output': 'The product of 2 multiplied by 7 is 14.'}"
]
},
"metadata": {},
"execution_count": 28
}
],
"source": [
"agent(\"what is 2 * 7?\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gR_b0IN32rQ9"
},
"source": [
"Perfect, the agent is able to recognize that it doesn't need to refer to it's general knowledge tool for that question. Let's try some more questions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "mQeicHTj2pmY",
"outputId": "af5b61d9-4d7d-45fe-c561-521256246acc"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Knowledge Base\",\n",
" \"action_input\": \"legacy admissions\"\n",
"}\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mLegacy admissions refer to the practice of giving preferential treatment to applicants who have family members who attended the university in question. This means that if a student's parent, grandparent, or sibling attended the university, they may have a higher chance of being admitted compared to other applicants with similar qualifications. Legacy admissions are one of the factors that can be taken into account in the holistic admissions process used by some universities in the United States.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"Legacy admissions refer to the practice of giving preferential treatment to applicants who have family members who attended the university in question. This means that if a student's parent, grandparent, or sibling attended the university, they may have a higher chance of being admitted compared to other applicants with similar qualifications. Legacy admissions are one of the factors that can be taken into account in the holistic admissions process used by some universities in the United States.\"\n",
"}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'input': 'can you tell me some facts about legacy admissions?',\n",
" 'chat_history': [HumanMessage(content='What universities had the most intergenerational wealth?', additional_kwargs={}, example=False),\n",
" AIMessage(content='Some universities in the United States with large endowments include Harvard University, Stanford University, and Princeton University.', additional_kwargs={}, example=False),\n",
" HumanMessage(content='what is 2 * 7?', additional_kwargs={}, example=False),\n",
" AIMessage(content='The product of 2 multiplied by 7 is 14.', additional_kwargs={}, example=False)],\n",
" 'output': \"Legacy admissions refer to the practice of giving preferential treatment to applicants who have family members who attended the university in question. This means that if a student's parent, grandparent, or sibling attended the university, they may have a higher chance of being admitted compared to other applicants with similar qualifications. Legacy admissions are one of the factors that can be taken into account in the holistic admissions process used by some universities in the United States.\"}"
]
},
"metadata": {},
"execution_count": 29
}
],
"source": [
"agent(\"can you tell me some facts about legacy admissions?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "G93vLXso3B5Z",
"outputId": "9a955254-4ec9-4c40-d65d-c822e963cdf7"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Knowledge Base\",\n",
" \"action_input\": \"Legacy admissions and their impact on the playing field in college admissions\"\n",
"}\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mLegacy admissions refer to the practice of giving preferential treatment to applicants who have family members, usually parents or grandparents, who attended the university in question. This practice is controversial because it can perpetuate social and economic advantages for certain groups of people, particularly those from wealthier backgrounds. \n",
"\n",
"Legacy admissions can have an impact on the playing field in college admissions by potentially disadvantaging applicants from underrepresented or disadvantaged backgrounds. By giving preference to legacy applicants, universities may be prioritizing the continuation of a privileged class rather than promoting diversity and equal opportunity. This can create a system where certain groups have a higher likelihood of gaining admission based on their family connections rather than their own merits.\n",
"\n",
"However, it is important to note that the impact of legacy admissions on the playing field is just one aspect of the broader debate on affirmative action and racial preferences in college admissions. The issue is complex and involves considerations of diversity, merit, and the mission of universities.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"Legacy admissions can have a negative impact on the playing field in college admissions by potentially disadvantaging applicants from underrepresented or disadvantaged backgrounds. By giving preferential treatment to legacy applicants, universities may prioritize the continuation of a privileged class rather than promoting diversity and equal opportunity.\"\n",
"}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'input': 'Teach a class of 7th graders how legacy admissions ruin the playing field.',\n",
" 'chat_history': [HumanMessage(content='What universities had the most intergenerational wealth?', additional_kwargs={}, example=False),\n",
" AIMessage(content='Some universities in the United States with large endowments include Harvard University, Stanford University, and Princeton University.', additional_kwargs={}, example=False),\n",
" HumanMessage(content='what is 2 * 7?', additional_kwargs={}, example=False),\n",
" AIMessage(content='The product of 2 multiplied by 7 is 14.', additional_kwargs={}, example=False),\n",
" HumanMessage(content='can you tell me some facts about legacy admissions?', additional_kwargs={}, example=False),\n",
" AIMessage(content=\"Legacy admissions refer to the practice of giving preferential treatment to applicants who have family members who attended the university in question. This means that if a student's parent, grandparent, or sibling attended the university, they may have a higher chance of being admitted compared to other applicants with similar qualifications. Legacy admissions are one of the factors that can be taken into account in the holistic admissions process used by some universities in the United States.\", additional_kwargs={}, example=False)],\n",
" 'output': 'Legacy admissions can have a negative impact on the playing field in college admissions by potentially disadvantaging applicants from underrepresented or disadvantaged backgrounds. By giving preferential treatment to legacy applicants, universities may prioritize the continuation of a privileged class rather than promoting diversity and equal opportunity.'}"
]
},
"metadata": {},
"execution_count": 30
}
],
"source": [
"agent(\"Teach a class of 7th graders how legacy admissions ruin the playing field.\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PWivmw9F3bCw"
},
"source": [
"Looks great! We're also able to ask questions that refer to previous interactions in the conversation and the agent is able to refer to the conversation history to as a source of information.\n",
"\n",
"That's all for this example of building a retrieval augmented conversational agent with OpenAI and Pinecone (the OP stack) and LangChain.\n",
"\n",
"Once finished, we delete the Pinecone index to save resources:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Pa1whr8V3Wfm"
},
"outputs": [],
"source": [
"pinecone.delete_index(index_name)"
]
},
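{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check (a minimal sketch, not part of the original walkthrough), we can list the remaining indexes with the same `pinecone` client initialized earlier and confirm that ours no longer appears:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check (sketch): the deleted index should no longer be listed.\n",
"pinecone.list_indexes()"
]
},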
{
"cell_type": "markdown",
"metadata": {
"id": "Ykg5TYA033yR"
},
"source": [
"---"
]
}
],
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4",
"collapsed_sections": [
"bhWwrfbbVGOA"
],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.16"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"d7b2791e5f3d4c68b02da4123f715a72": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_e4e2a2e10c684ac7bbf102bad235464f",
"IPY_MODEL_5290c01d786b4baf8b5e9adfe1a5befe",
"IPY_MODEL_f073c54fdece48c0817f21f0970621d9"
],
"layout": "IPY_MODEL_c593df22c7294a078a8d036d10e1c117"
}
},
"e4e2a2e10c684ac7bbf102bad235464f": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_2a15af3253884c7cb97c6c6f3dd21e3f",
"placeholder": "​",
"style": "IPY_MODEL_bb3e80cb30214f6b80c4316606761d34",
"value": "sending upsert requests: 100%"
}
},
"5290c01d786b4baf8b5e9adfe1a5befe": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_1a90f0bf4c8e4aabb8e346c3d9cfdff6",
"max": 18891,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_6e332262dd2944a68e3415bc827f1407",
"value": 18891
}
},
"f073c54fdece48c0817f21f0970621d9": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_ded11ea7cc6b4c8aa6acbe8d03a6f742",
"placeholder": "​",
"style": "IPY_MODEL_bd0b82bd40b0418a8e326bec7e1cfe9e",
"value": " 18891/18891 [00:05&lt;00:00, 4043.05it/s]"
}
},
"c593df22c7294a078a8d036d10e1c117": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"2a15af3253884c7cb97c6c6f3dd21e3f": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"bb3e80cb30214f6b80c4316606761d34": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"1a90f0bf4c8e4aabb8e346c3d9cfdff6": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"6e332262dd2944a68e3415bc827f1407": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"ded11ea7cc6b4c8aa6acbe8d03a6f742": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"bd0b82bd40b0418a8e326bec7e1cfe9e": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"2868c074bd55491a92000c7cd363ce6b": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_be04454d283147d79e353c1ab24b8573",
"IPY_MODEL_89c1ae7c90004a6bae1e5aeedb19fa8c",
"IPY_MODEL_b22dd946e0ac4a0abfb27efcf811c790"
],
"layout": "IPY_MODEL_dd464906d8ab4900916b35dd3e779d46"
}
},
"be04454d283147d79e353c1ab24b8573": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_4b2dd63f4b5e4a40ab5ec52826cc5bb3",
"placeholder": "​",
"style": "IPY_MODEL_87258691c1e041219045522dcf52bc52",
"value": "collecting async responses: 100%"
}
},
"89c1ae7c90004a6bae1e5aeedb19fa8c": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_9f125e0287ed46ebba34a0d26cdfb8cc",
"max": 148,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_795dabbc25bb426c8756a36b5778f572",
"value": 148
}
},
"b22dd946e0ac4a0abfb27efcf811c790": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_d9dd4543607840bfb4e813e801549c66",
"placeholder": "​",
"style": "IPY_MODEL_7e51d478a2e14ff09853d93c102ff40f",
"value": " 148/148 [00:00&lt;00:00, 1274.85it/s]"
}
},
"dd464906d8ab4900916b35dd3e779d46": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"4b2dd63f4b5e4a40ab5ec52826cc5bb3": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"87258691c1e041219045522dcf52bc52": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"9f125e0287ed46ebba34a0d26cdfb8cc": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"795dabbc25bb426c8756a36b5778f572": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"d9dd4543607840bfb4e813e801549c66": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"7e51d478a2e14ff09853d93c102ff40f": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
}
}
},
"accelerator": "GPU"
},
"nbformat": 4,
"nbformat_minor": 0
}