Example of running the GPT4All local LLM via langchain in a Jupyter notebook (Python)
{
"cells": [
{
"cell_type": "markdown",
"id": "0f6a4a52",
"metadata": {},
"source": [
"# GPT4All Langchain Demo\n",
"\n",
"Example of locally running [`GPT4All`](https://github.com/nomic-ai/gpt4all), a 4GB, *llama.cpp* based large langage model (LLM) under `langchachain`](https://github.com/hwchase17/langchain), in a Jupyter notebook running a Python 3.10 kernel.\n",
"\n",
"*Tested on a mid-2015 16GB Macbook Pro, concurrently running Docker (a single container running a sepearate Jupyter server) and Chrome with approx. 40 open tabs).*"
]
},
{
"cell_type": "markdown",
"id": "81b77eca",
"metadata": {},
"source": [
"## Model preparation"
]
},
{
"cell_type": "markdown",
"id": "95d873d9",
"metadata": {},
"source": [
"- download `gpt4all` model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e612700",
"metadata": {},
"outputs": [],
"source": [
"#https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin"
]
},
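{
"cell_type": "markdown",
"id": "1a7f3b90",
"metadata": {},
"source": [
"A minimal sketch of fetching the weights from within the notebook, assuming `curl` is available; the output path matches the conversion step below. Commented out, like the other one-off setup commands:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2b8c4d01",
"metadata": {},
"outputs": [],
"source": [
"#!curl -L -o ./gpt4all-main/chat/gpt4all-lora-quantized.bin https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin"
]
},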
{
"cell_type": "markdown",
"id": "34170e49",
"metadata": {},
"source": [
"- download `llama.cpp` 7B model"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9b392a3f",
"metadata": {},
"outputs": [],
"source": [
"#%pip install pyllama\n",
"#!python3.10 -m llama.download --model_size 7B --folder llama/"
]
},
{
"cell_type": "markdown",
"id": "33f214b3",
"metadata": {},
"source": [
"- transform `gpt4all` model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc5c0340",
"metadata": {},
"outputs": [],
"source": [
"#%pip install pyllamacpp\n",
"#!pyllamacpp-convert-gpt4all ./gpt4all-main/chat/gpt4all-lora-quantized.bin llama/tokenizer.model ./gpt4all-main/chat/gpt4all-lora-q-converted.bin"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c83ff89a",
"metadata": {},
"outputs": [],
"source": [
"GPT4ALL_MODEL_PATH = \"./gpt4all-main/chat/gpt4all-lora-q-converted.bin\""
]
},
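{
"cell_type": "markdown",
"id": "3c9d5e12",
"metadata": {},
"source": [
"A quick sanity check (standard library only) that the converted model file is actually in place before trying to load it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4dae6f23",
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"# Fail early with a clear message if the converted model is missing\n",
"assert Path(GPT4ALL_MODEL_PATH).exists(), f\"Model not found at {GPT4ALL_MODEL_PATH}\""
]
},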
{
"cell_type": "markdown",
"id": "43c72555",
"metadata": {},
"source": [
"## `langchain` Demo\n",
"\n",
"Example of running a prompt using `langchain`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b03851fc",
"metadata": {},
"outputs": [],
"source": [
"#https://python.langchain.com/en/latest/ecosystem/llamacpp.html\n",
"#%pip uninstall -y langchain\n",
"#%pip install --upgrade git+https://github.com/hwchase17/langchain.git\n",
"\n",
"from langchain.llms import LlamaCpp\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "markdown",
"id": "766a2c6d",
"metadata": {},
"source": [
"- set up prompt template:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "66d6f183",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"\n",
"Question: {question}\n",
"Answer: Let's think step by step.\n",
"\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
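{
"cell_type": "markdown",
"id": "5ebf7034",
"metadata": {},
"source": [
"To sanity-check the template before wiring it into a chain, `PromptTemplate.format` renders it with a concrete question (the sample question here is just an illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6fc08145",
"metadata": {},
"outputs": [],
"source": [
"# Render the template to see the exact string that will be sent to the model\n",
"print(prompt.format(question=\"What is a large language model?\"))"
]
},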
{
"cell_type": "markdown",
"id": "6458a779",
"metadata": {},
"source": [
"- load model:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "801af875",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"llama_model_load: loading model from './gpt4all-main/chat/gpt4all-lora-q-converted.bin' - please wait ...\n",
"llama_model_load: n_vocab = 32001\n",
"llama_model_load: n_ctx = 512\n",
"llama_model_load: n_embd = 4096\n",
"llama_model_load: n_mult = 256\n",
"llama_model_load: n_head = 32\n",
"llama_model_load: n_layer = 32\n",
"llama_model_load: n_rot = 128\n",
"llama_model_load: f16 = 2\n",
"llama_model_load: n_ff = 11008\n",
"llama_model_load: n_parts = 1\n",
"llama_model_load: type = 1\n",
"llama_model_load: ggml map size = 4017.70 MB\n",
"llama_model_load: ggml ctx size = 81.25 KB\n",
"llama_model_load: mem required = 5809.78 MB (+ 2052.00 MB per state)\n",
"llama_model_load: loading tensors from './gpt4all-main/chat/gpt4all-lora-q-converted.bin'\n",
"llama_model_load: model size = 4017.27 MB / num tensors = 291\n",
"llama_init_from_file: kv self size = 512.00 MB\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 572 ms, sys: 711 ms, total: 1.28 s\n",
"Wall time: 1.42 s\n"
]
}
],
"source": [
"%%time\n",
"\n",
"llm = LlamaCpp(model_path=GPT4ALL_MODEL_PATH)"
]
},
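{
"cell_type": "markdown",
"id": "70d19256",
"metadata": {},
"source": [
"The call above relies on the wrapper's defaults. `LlamaCpp` also exposes generation parameters; the names sketched below (`n_ctx`, `temperature`, `n_threads`) come from the `llama-cpp-python` wrapper and may differ between versions, so check them against your installed release:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "81e2a367",
"metadata": {},
"outputs": [],
"source": [
"#llm = LlamaCpp(\n",
"#    model_path=GPT4ALL_MODEL_PATH,\n",
"#    n_ctx=512,        # context window; matches n_ctx in the load log above\n",
"#    temperature=0.8,  # sampling temperature\n",
"#    n_threads=4,      # CPU threads to use for inference\n",
"#)"
]
},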
{
"cell_type": "markdown",
"id": "074b912d",
"metadata": {},
"source": [
"- create language chain using prompt template and loaded model:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d40d2cc0",
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "markdown",
"id": "836b613b",
"metadata": {},
"source": [
"- run prompt:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6eba968c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 5min 2s, sys: 4.17 s, total: 5min 6s\n",
"Wall time: 43.7 s\n"
]
},
{
"data": {
"text/plain": [
"'1) The year Justin Bieber was born (2005):\\n2) Justin Bieber was born on March 1, 1994:\\n3) The Buffalo Bills won Super Bowl XXVIII over the Dallas Cowboys in 1994:\\nTherefore, the NFL team that won the Super Bowl in the year Justin Bieber was born is the Buffalo Bills.'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"question = \"What NFL team won the Super Bowl in the year Justin Bieber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
},
{
"cell_type": "markdown",
"id": "ddfadbb5",
"metadata": {},
"source": [
"Another example..."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "89abe837",
"metadata": {},
"outputs": [],
"source": [
"template2 = \"\"\"\n",
"Question: {question}\n",
"Answer: \n",
"\"\"\"\n",
"\n",
"prompt2 = PromptTemplate(template=template2, input_variables=[\"question\"])"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "60676f7a",
"metadata": {},
"outputs": [],
"source": [
"llm_chain2 = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "cb6a9962",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 14min 37s, sys: 5.56 s, total: 14min 42s\n",
"Wall time: 2min 4s\n"
]
},
{
"data": {
"text/plain": [
"\"A relational database is a type of database management system (DBMS) that stores data in tables where each row represents one entity or object (e.g., customer, order, or product), and each column represents a property or attribute of the entity (e.g., first name, last name, email address, or shipping address).\\n\\nACID stands for Atomicity, Consistency, Isolation, Durability:\\n\\nAtomicity: The transaction's effects are either all applied or none at all; it cannot be partially applied. For example, if a customer payment is made but not authorized by the bank, then the entire transaction should fail and no changes should be committed to the database.\\nConsistency: Once a transaction has been committed, its effects should be durable (i.e., not lost), and no two transactions can access data in an inconsistent state. For example, if one transaction is in progress while another transaction attempts to update the same data, both transactions should fail.\\nIsolation: Each transaction should execute without interference from other concurrently executing transactions, thereby ensuring its properties are applied atomically and consistently. For example, two transactions cannot affect each other's data\""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"question2 = \"What is a relational database and what is ACID in that context?\"\n",
"\n",
"llm_chain2.run(question2)"
]
},
{
"cell_type": "markdown",
"id": "dfb1fff1",
"metadata": {},
"source": [
"## Generating Embeddings\n",
"\n",
"We can also use the model to generate embddings."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "bfaabcf0",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"llama_model_load: loading model from './gpt4all-main/chat/gpt4all-lora-q-converted.bin' - please wait ...\n",
"llama_model_load: n_vocab = 32001\n",
"llama_model_load: n_ctx = 512\n",
"llama_model_load: n_embd = 4096\n",
"llama_model_load: n_mult = 256\n",
"llama_model_load: n_head = 32\n",
"llama_model_load: n_layer = 32\n",
"llama_model_load: n_rot = 128\n",
"llama_model_load: f16 = 2\n",
"llama_model_load: n_ff = 11008\n",
"llama_model_load: n_parts = 1\n",
"llama_model_load: type = 1\n",
"llama_model_load: ggml map size = 4017.70 MB\n",
"llama_model_load: ggml ctx size = 81.25 KB\n",
"llama_model_load: mem required = 5809.78 MB (+ 2052.00 MB per state)\n",
"llama_model_load: loading tensors from './gpt4all-main/chat/gpt4all-lora-q-converted.bin'\n",
"llama_model_load: model size = 4017.27 MB / num tensors = 291\n",
"llama_init_from_file: kv self size = 512.00 MB\n"
]
}
],
"source": [
"#https://abetlen.github.io/llama-cpp-python/\n",
"#%pip uninstall -y llama-cpp-python\n",
"#%pip install --upgrade llama-cpp-python\n",
"\n",
"from langchain.embeddings import LlamaCppEmbeddings\n",
"\n",
"llama = LlamaCppEmbeddings(model_path=GPT4ALL_MODEL_PATH)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "0c6c1603",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 12.9 s, sys: 1.57 s, total: 14.5 s\n",
"Wall time: 2.13 s\n"
]
}
],
"source": [
"%%time\n",
"text = \"This is a test document.\"\n",
"\n",
"query_result = llama.embed_query(text)"
]
},
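{
"cell_type": "markdown",
"id": "92f3b478",
"metadata": {},
"source": [
"The query embedding is a plain list of floats; its length should match the `n_embd = 4096` reported in the model load log:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a304c589",
"metadata": {},
"outputs": [],
"source": [
"# Embedding dimensionality; expected to equal n_embd (4096) from the load log\n",
"len(query_result)"
]
},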
{
"cell_type": "code",
"execution_count": 12,
"id": "d0a45169",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 10.4 s, sys: 59.7 ms, total: 10.4 s\n",
"Wall time: 1.47 s\n"
]
}
],
"source": [
"%%time\n",
"doc_result = llama.embed_documents([text])"
]
}
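,
{
"cell_type": "markdown",
"id": "b415d690",
"metadata": {},
"source": [
"As a quick check that the vectors are usable for retrieval-style comparisons, a minimal cosine-similarity sketch (assumes `numpy` is installed); embedding the same text as query and document should score close to 1.0:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c526e7a1",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def cosine_similarity(a, b):\n",
"    # Dot product of the vectors divided by the product of their norms\n",
"    a, b = np.asarray(a), np.asarray(b)\n",
"    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))\n",
"\n",
"# embed_query returns one vector; embed_documents returns a list of vectors\n",
"cosine_similarity(query_result, doc_result[0])"
]
}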
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}