@ugiacoman
Created April 20, 2023 05:24
Talk to Kingfisher.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyPHX8Ch21VGAH7tTcwdYNT9",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/ugiacoman/a7e7fb250fe090bea92ed273716e930b/talk-to-kingfisher.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "i9lDj1aZ9dNT",
"outputId": "c17b88a2-ad13-45f8-fafe-f969bd3feae4"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting langchain\n",
" Downloading langchain-0.0.144-py3-none-any.whl (578 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m578.5/578.5 kB\u001b[0m \u001b[31m9.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting deeplake\n",
" Downloading deeplake-3.2.22.tar.gz (457 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m457.4/457.4 kB\u001b[0m \u001b[31m19.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
"Collecting openai\n",
" Downloading openai-0.27.4-py3-none-any.whl (70 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m70.3/70.3 kB\u001b[0m \u001b[31m4.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting tiktoken\n",
" Downloading tiktoken-0.3.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.7/1.7 MB\u001b[0m \u001b[31m40.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.9/dist-packages (from langchain) (8.2.2)\n",
"Collecting aiohttp<4.0.0,>=3.8.3\n",
" Downloading aiohttp-3.8.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m23.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: numpy<2,>=1 in /usr/local/lib/python3.9/dist-packages (from langchain) (1.22.4)\n",
"Requirement already satisfied: PyYAML>=5.4.1 in /usr/local/lib/python3.9/dist-packages (from langchain) (6.0)\n",
"Collecting async-timeout<5.0.0,>=4.0.0\n",
" Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)\n",
"Requirement already satisfied: numexpr<3.0.0,>=2.8.4 in /usr/local/lib/python3.9/dist-packages (from langchain) (2.8.4)\n",
"Collecting openapi-schema-pydantic<2.0,>=1.2\n",
" Downloading openapi_schema_pydantic-1.2.4-py3-none-any.whl (90 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m90.0/90.0 kB\u001b[0m \u001b[31m6.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting dataclasses-json<0.6.0,>=0.5.7\n",
" Downloading dataclasses_json-0.5.7-py3-none-any.whl (25 kB)\n",
"Requirement already satisfied: pydantic<2,>=1 in /usr/local/lib/python3.9/dist-packages (from langchain) (1.10.7)\n",
"Collecting SQLAlchemy<2,>=1\n",
" Downloading SQLAlchemy-1.4.47-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.6/1.6 MB\u001b[0m \u001b[31m19.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.9/dist-packages (from langchain) (2.27.1)\n",
"Requirement already satisfied: pillow in /usr/local/lib/python3.9/dist-packages (from deeplake) (8.4.0)\n",
"Collecting boto3\n",
" Downloading boto3-1.26.116-py3-none-any.whl (135 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m135.6/135.6 kB\u001b[0m \u001b[31m3.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: click in /usr/local/lib/python3.9/dist-packages (from deeplake) (8.1.3)\n",
"Collecting pathos\n",
" Downloading pathos-0.3.0-py3-none-any.whl (79 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.8/79.8 kB\u001b[0m \u001b[31m7.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting humbug>=0.3.1\n",
" Downloading humbug-0.3.1-py3-none-any.whl (15 kB)\n",
"Requirement already satisfied: tqdm in /usr/local/lib/python3.9/dist-packages (from deeplake) (4.65.0)\n",
"Collecting numcodecs\n",
" Downloading numcodecs-0.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.7 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.7/6.7 MB\u001b[0m \u001b[31m44.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting pyjwt\n",
" Downloading PyJWT-2.6.0-py3-none-any.whl (20 kB)\n",
"Collecting aioboto3==10.4.0\n",
" Downloading aioboto3-10.4.0-py3-none-any.whl (32 kB)\n",
"Requirement already satisfied: nest_asyncio in /usr/local/lib/python3.9/dist-packages (from deeplake) (1.5.6)\n",
"Collecting aiobotocore[boto3]==2.4.2\n",
" Downloading aiobotocore-2.4.2-py3-none-any.whl (66 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m66.8/66.8 kB\u001b[0m \u001b[31m5.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting botocore<1.27.60,>=1.27.59\n",
" Downloading botocore-1.27.59-py3-none-any.whl (9.1 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m9.1/9.1 MB\u001b[0m \u001b[31m52.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting aioitertools>=0.5.1\n",
" Downloading aioitertools-0.11.0-py3-none-any.whl (23 kB)\n",
"Requirement already satisfied: wrapt>=1.10.10 in /usr/local/lib/python3.9/dist-packages (from aiobotocore[boto3]==2.4.2->aioboto3==10.4.0->deeplake) (1.14.1)\n",
"Collecting boto3\n",
" Downloading boto3-1.24.59-py3-none-any.whl (132 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m132.5/132.5 kB\u001b[0m \u001b[31m7.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.9/dist-packages (from tiktoken) (2022.10.31)\n",
"Collecting multidict<7.0,>=4.5\n",
" Downloading multidict-6.0.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m114.2/114.2 kB\u001b[0m \u001b[31m3.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting yarl<2.0,>=1.0\n",
" Downloading yarl-1.8.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (264 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m264.6/264.6 kB\u001b[0m \u001b[31m12.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting frozenlist>=1.1.1\n",
" Downloading frozenlist-1.3.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (158 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m158.8/158.8 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.9/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (23.1.0)\n",
"Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.9/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (2.0.12)\n",
"Collecting aiosignal>=1.1.2\n",
" Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)\n",
"Collecting jmespath<2.0.0,>=0.7.1\n",
" Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)\n",
"Collecting s3transfer<0.7.0,>=0.6.0\n",
" Downloading s3transfer-0.6.0-py3-none-any.whl (79 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.6/79.6 kB\u001b[0m \u001b[31m6.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting typing-inspect>=0.4.0\n",
" Downloading typing_inspect-0.8.0-py3-none-any.whl (8.7 kB)\n",
"Collecting marshmallow<4.0.0,>=3.3.0\n",
" Downloading marshmallow-3.19.0-py3-none-any.whl (49 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.1/49.1 kB\u001b[0m \u001b[31m2.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting marshmallow-enum<2.0.0,>=1.5.1\n",
" Downloading marshmallow_enum-1.5.1-py2.py3-none-any.whl (4.2 kB)\n",
"Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.9/dist-packages (from pydantic<2,>=1->langchain) (4.5.0)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests<3,>=2->langchain) (2022.12.7)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests<3,>=2->langchain) (1.26.15)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests<3,>=2->langchain) (3.4)\n",
"Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.9/dist-packages (from SQLAlchemy<2,>=1->langchain) (2.0.2)\n",
"Requirement already satisfied: entrypoints in /usr/local/lib/python3.9/dist-packages (from numcodecs->deeplake) (0.4)\n",
"Collecting pox>=0.3.2\n",
" Downloading pox-0.3.2-py3-none-any.whl (29 kB)\n",
"Collecting dill>=0.3.6\n",
" Downloading dill-0.3.6-py3-none-any.whl (110 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m110.5/110.5 kB\u001b[0m \u001b[31m9.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting ppft>=1.7.6.6\n",
" Downloading ppft-1.7.6.6-py3-none-any.whl (52 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m52.8/52.8 kB\u001b[0m \u001b[31m4.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting multiprocess>=0.70.14\n",
" Downloading multiprocess-0.70.14-py39-none-any.whl (132 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m132.9/132.9 kB\u001b[0m \u001b[31m8.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.9/dist-packages (from botocore<1.27.60,>=1.27.59->aiobotocore[boto3]==2.4.2->aioboto3==10.4.0->deeplake) (2.8.2)\n",
"Requirement already satisfied: packaging>=17.0 in /usr/local/lib/python3.9/dist-packages (from marshmallow<4.0.0,>=3.3.0->dataclasses-json<0.6.0,>=0.5.7->langchain) (23.1)\n",
"Collecting mypy-extensions>=0.3.0\n",
" Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n",
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.9/dist-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.27.60,>=1.27.59->aiobotocore[boto3]==2.4.2->aioboto3==10.4.0->deeplake) (1.16.0)\n",
"Building wheels for collected packages: deeplake\n",
" Building wheel for deeplake (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for deeplake: filename=deeplake-3.2.22-py3-none-any.whl size=555349 sha256=0fdc4a42b59cfbc9bb478e05d434c48104559395e144b4ec34dccb407e68a0f5\n",
" Stored in directory: /root/.cache/pip/wheels/24/cb/09/2278354bd08b44c63bc98f9d6bb0d24d406f542357facf6d1c\n",
"Successfully built deeplake\n",
"Installing collected packages: SQLAlchemy, pyjwt, ppft, pox, numcodecs, mypy-extensions, multidict, marshmallow, jmespath, frozenlist, dill, async-timeout, aioitertools, yarl, typing-inspect, tiktoken, openapi-schema-pydantic, multiprocess, marshmallow-enum, humbug, botocore, aiosignal, s3transfer, pathos, dataclasses-json, aiohttp, openai, langchain, boto3, aiobotocore, aioboto3, deeplake\n",
" Attempting uninstall: SQLAlchemy\n",
" Found existing installation: SQLAlchemy 2.0.9\n",
" Uninstalling SQLAlchemy-2.0.9:\n",
" Successfully uninstalled SQLAlchemy-2.0.9\n",
"Successfully installed SQLAlchemy-1.4.47 aioboto3-10.4.0 aiobotocore-2.4.2 aiohttp-3.8.4 aioitertools-0.11.0 aiosignal-1.3.1 async-timeout-4.0.2 boto3-1.24.59 botocore-1.27.59 dataclasses-json-0.5.7 deeplake-3.2.22 dill-0.3.6 frozenlist-1.3.3 humbug-0.3.1 jmespath-1.0.1 langchain-0.0.144 marshmallow-3.19.0 marshmallow-enum-1.5.1 multidict-6.0.4 multiprocess-0.70.14 mypy-extensions-1.0.0 numcodecs-0.11.0 openai-0.27.4 openapi-schema-pydantic-1.2.4 pathos-0.3.0 pox-0.3.2 ppft-1.7.6.6 pyjwt-2.6.0 s3transfer-0.6.0 tiktoken-0.3.3 typing-inspect-0.8.0 yarl-1.8.2\n"
]
}
],
"source": [
"!python3 -m pip install --upgrade langchain deeplake openai tiktoken"
]
},
{
"cell_type": "code",
"source": [
"import os\n",
"import getpass\n",
"\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import DeepLake\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
"os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')\n",
"embeddings = OpenAIEmbeddings()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "h5JsxXh-97SZ",
"outputId": "af564f56-de50-4e6d-e31f-265268e3013b"
},
"execution_count": null,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI API Key:··········\n",
"Activeloop Token:··········\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"!git clone https://github.com/onevcat/Kingfisher"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZKfLJOr_-iMG",
"outputId": "7b57a088-0c33-411a-ccd0-47177d52bf4e"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Cloning into 'Kingfisher'...\n",
"remote: Enumerating objects: 16842, done.\u001b[K\n",
"remote: Counting objects: 100% (236/236), done.\u001b[K\n",
"remote: Compressing objects: 100% (142/142), done.\u001b[K\n",
"remote: Total 16842 (delta 116), reused 190 (delta 93), pack-reused 16606\u001b[K\n",
"Receiving objects: 100% (16842/16842), 4.75 MiB | 15.31 MiB/s, done.\n",
"Resolving deltas: 100% (11546/11546), done.\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import os\n",
"from langchain.document_loaders import TextLoader\n",
"\n",
"root_dir = './Kingfisher'\n",
"docs = []\n",
"for dirpath, dirnames, filenames in os.walk(root_dir):\n",
" for file in filenames:\n",
" try: \n",
" loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')\n",
" docs.extend(loader.load_and_split())\n",
" except Exception as e: \n",
" pass"
],
"metadata": {
"id": "xb_YMo6O-wrg"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(docs)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JLMrMB7K-7KU",
"outputId": "d75b5f8d-94b9-4862-92a8-22326cfb47cc"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:langchain.text_splitter:Created a chunk of size 1031, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1331, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1660, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1596, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1013, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1064, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1387, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2573, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1551, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1551, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1036, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1316, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1262, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1569, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1328, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1066, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1234, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1233, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1236, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1227, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1225, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1250, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1255, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1238, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1717, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1251, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1242, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1239, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1243, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1252, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1234, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1661, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1242, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1239, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1232, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1235, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1243, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1231, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1247, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1675, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1247, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1249, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1247, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1748, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1223, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1062, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1222, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1229, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1228, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1232, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1232, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1234, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1233, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1222, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1536, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1228, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1227, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1252, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1254, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1748, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1235, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1150, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1281, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1085, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1109, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1375, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1134, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1129, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1233, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2430, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1231, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1231, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1237, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1182, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1597, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1232, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1235, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1236, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1234, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1189, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1243, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1234, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2325, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1232, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1124, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1824, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1821, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1286, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1358, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1236, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1233, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1232, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1329, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 3913, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1427, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1651, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1232, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1227, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 3165, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1739, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1152, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2305, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1805, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1237, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2532, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2252, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1115, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1542, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1271, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1733, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1226, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1027, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1568, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1222, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1230, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1902, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1214, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1220, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1222, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1144, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1228, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 3011, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1227, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1375, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1223, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1388, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1225, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1814, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1227, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1226, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1500, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1221, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1740, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1229, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2635, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1216, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1927, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1232, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1227, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1936, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1227, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1229, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1119, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1015, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1273, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1049, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1078, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2768, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1105, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1432, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1048, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2052, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1233, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1393, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1420, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 3091, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1126, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1124, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2837, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1229, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1075, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1435, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 3079, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1229, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1261, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1018, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1340, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1161, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1022, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1222, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1216, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1813, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1018, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1228, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1225, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2409, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1358, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1726, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1191, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1043, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1493, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1238, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1231, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1182, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1219, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1220, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1253, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1133, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1231, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1169, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1876, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1209, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1235, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1652, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1131, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1684, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1249, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 3196, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1065, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1245, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1719, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1131, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1734, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1231, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2032, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1645, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1349, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1187, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1027, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1241, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1221, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1154, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1401, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1387, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1350, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1205, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1502, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1414, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1226, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1002, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1229, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1230, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1222, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1228, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1711, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1230, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1523, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1064, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1126, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1022, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2012, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1307, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1711, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1229, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1249, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1229, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1057, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1226, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1704, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1546, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2029, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1251, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1239, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1218, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1233, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1229, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1227, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1230, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1215, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1116, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1232, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1238, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2597, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1444, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1266, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1463, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1316, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1284, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1084, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1279, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1226, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1232, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1176, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1229, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1071, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1221, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1225, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1215, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1027, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1229, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 2785, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1228, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1020, which is longer than the specified 1000\n",
"WARNING:langchain.text_splitter:Created a chunk of size 1540, which is longer than the specified 1000\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"db = DeepLake.from_documents(texts, embeddings, dataset_path=\"hub://ulises_handshake/kingfisher\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "HEeoQpIb_BQ0",
"outputId": "23af6971-9e38-4380-fb98-2b25ffae81af"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/ulises_handshake/kingfisher\n",
"\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"-"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"hub://ulises_handshake/kingfisher loaded successfully.\n",
"\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"\r \r\r \rWARNING:langchain.vectorstores.deeplake:Deep Lake Dataset in hub://ulises_handshake/kingfisher already exists, loading from the storage\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Dataset(path='hub://ulises_handshake/kingfisher', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (0,) float32 None \n",
" ids text (0,) str None \n",
" metadata json (0,) str None \n",
" text text (0,) str None \n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"Evaluating ingest: 0%| | 0/2 [00:00<?WARNING:langchain.embeddings.openai:Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details..\n",
"Evaluating ingest: 100%|██████████| 2/2 [00:24<00:00\n",
"\\"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Dataset(path='hub://ulises_handshake/kingfisher', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (1534, 1536) float32 None \n",
" ids text (1534, 1) str None \n",
" metadata json (1534, 1) str None \n",
" text text (1534, 1) str None \n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"\r \r"
]
}
]
},
{
"cell_type": "code",
"source": [
"db = DeepLake(dataset_path=\"hub://ulises_handshake/kingfisher\", read_only=True, embedding_function=embeddings)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gE1uLNUqCGLm",
"outputId": "e1944421-b64d-49c0-f6f6-879e67456ae8"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"-"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/ulises_handshake/kingfisher\n",
"\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"|"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"hub://ulises_handshake/kingfisher loaded successfully.\n",
"\n"
]
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"\r \r\r \rWARNING:langchain.vectorstores.deeplake:Deep Lake Dataset in hub://ulises_handshake/kingfisher already exists, loading from the storage\n"
]
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"Dataset(path='hub://ulises_handshake/kingfisher', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (1534, 1536) float32 None \n",
" ids text (1534, 1) str None \n",
" metadata json (1534, 1) str None \n",
" text text (1534, 1) str None \n"
]
}
]
},
{
"cell_type": "code",
"source": [
"retriever = db.as_retriever()\n",
"retriever.search_kwargs['distance_metric'] = 'cos'\n",
"retriever.search_kwargs['fetch_k'] = 100\n",
"retriever.search_kwargs['maximal_marginal_relevance'] = True\n",
"retriever.search_kwargs['k'] = 20"
],
"metadata": {
"id": "sg0JHL_fCQv3"
},
"execution_count": null,
"outputs": []
},
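{
"cell_type": "markdown",
"source": [
"Optional sanity check (a sketch, not part of the original run): before wiring the retriever into a chain, you can query it directly with `get_relevant_documents`, the standard LangChain retriever call. The query string below is an arbitrary example question.\n"
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Sketch: query the retriever directly to confirm documents were indexed.\n",
"# The query string is an arbitrary example question.\n",
"docs = retriever.get_relevant_documents(\"How does Kingfisher cache images?\")\n",
"print(len(docs))  # at most k = 20, as configured above\n",
"print(docs[0].page_content[:200])  # preview the top match\n"
],
"metadata": {},
"execution_count": null,
"outputs": []
},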
{
"cell_type": "code",
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import ConversationalRetrievalChain\n",
"\n",
"model = ChatOpenAI(model='gpt-3.5-turbo') # 'gpt-3.5-turbo',\n",
"qa = ConversationalRetrievalChain.from_llm(model,retriever=retriever, max_tokens_limit=4097)"
],
"metadata": {
"id": "tHd2WBEYCTgG"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"questions = [\n",
" \"How do I fetch a remote image and set it on a SwiftUI View?\",\n",
" \"When fetching a remote image, does the library check disk or memory first?\",\n",
" \"How do I fetch multiple images in parallel?\",\n",
" \"Are the functions that set images thread safe?\",\n",
"] \n",
"chat_history = []\n",
"\n",
"for question in questions: \n",
" result = qa({\"question\": question, \"chat_history\": chat_history})\n",
" chat_history.append((question, result['answer']))\n",
" print(f\"-> **Question**: {question} \\n\")\n",
" print(f\"**Answer**: {result['answer']} \\n\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "UaXQ8fglCX7o",
"outputId": "b97b4698-f9fe-40e0-a12b-16df57b5ed91"
},
"execution_count": 24,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"-> **Question**: How do I fetch a remote image and set it on a SwiftUI View? \n",
"\n",
"**Answer**: You can use the `KFImage` view provided by Kingfisher to fetch and display remote images. Here's an example:\n",
"\n",
"```swift\n",
"import SwiftUI\n",
"import Kingfisher\n",
"\n",
"struct ContentView: View {\n",
" var body: some View {\n",
" KFImage(URL(string: \"https://example.com/image.png\"))\n",
" .placeholder {\n",
" ProgressView()\n",
" }\n",
" .resizable()\n",
" .aspectRatio(contentMode: .fit)\n",
" }\n",
"}\n",
"```\n",
"\n",
"In this example, the `KFImage` view takes a `URL` instance to fetch the image from the remote server. It then shows a `ProgressView` as a placeholder while the image is being downloaded. Once the image is downloaded and ready, the `KFImage` view will show it, proportionally scaled to fit the available space. \n",
"\n",
"-> **Question**: When fetching a remote image, does the library check disk or memory first? \n",
"\n",
"**Answer**: By default, Kingfisher will firstly check the memory cache when fetching an image. If it does not find the image in the memory cache, it will check the disk cache. If the image is not found in the memory or disk cache, it will download it from the web. \n",
"\n",
"-> **Question**: How do I fetch multiple images in parallel? \n",
"\n",
"**Answer**: Kingfisher provides an `ImagePrefetcher` class to fetch multiple images in parallel. You can use it like this:\n",
"\n",
"```swift\n",
"import Kingfisher\n",
"\n",
"let urls = [\n",
" URL(string: \"https://example.com/image1.png\")!,\n",
" URL(string: \"https://example.com/image2.png\")!,\n",
" URL(string: \"https://example.com/image3.png\")!,\n",
" // ...\n",
"]\n",
"\n",
"let prefetcher = ImagePrefetcher(urls: urls)\n",
"prefetcher.start()\n",
"```\n",
"\n",
"The `start()` method will begin downloading the images in parallel, and cache them for future use. If an image has already been downloaded and cached, it will be skipped.\n",
"\n",
"You can also provide a `completionHandler` closure to be notified when all images have been downloaded:\n",
"\n",
"```swift\n",
"prefetcher.start() { skippedResources, failedResources, completedResources in\n",
" // Do something with the results\n",
"}\n",
"```\n",
"\n",
"`skippedResources` will contain the URLs of any resources that were skipped, because they were already cached. `failedResources` will contain the URLs of any resources that failed to download. `completedResources` will contain the URLs of all resources that were successfully downloaded. \n",
"\n",
"-> **Question**: Are the functions that set images thread safe? \n",
"\n",
"**Answer**: Yes, all the functions in Kingfisher that set images are thread safe, including downloading and caching. Kingfisher uses NSURLSession and the latest technology of GCD to provide a strong and swift framework, and it provides easy APIs to use. Additionally, Kingfisher goes asynchronously, not only downloading, but also caching, so you never need to worry about blocking your UI thread. \n",
"\n"
]
}
]
}
]
}