LangChain-ConversationalRetrievalChain-with-Memory.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyPcRPDNjg7M+5Hx3Ed4Q4j1",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/oddrationale/311b6923b67cc703f6d7a73c117ac867/langchain-conversationalretrievalchain-with-memory.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# LangChain Tutorial: How to create a `ConversationalRetrievalChain` with Memory"
],
"metadata": {
"id": "-rfKDpyPJ9uq"
}
},
{
"cell_type": "markdown",
"source": [
"Motivation: The documentation shows [how to use `ConversationalRetrievalChain` to chat over documents with chat history](https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html). However, in the example, you have to manage the chat history manually by updating the `chat_history` list with each conversation turn and passing it to the next turn.\n",
"\n",
"This tutorial shows how to use Memory, such as `ConversationBufferMemory` to automatically manage the `chat_history`.\n",
"\n",
"Hopefully, this provides a simple and straight-forward implementation."
],
"metadata": {
"id": "kcNtiko_KhhZ"
}
},
{
"cell_type": "code",
"source": [
"# This was created in Google Colab and the environment variables are stored in a\n",
"# `.env` file.\n",
"from google.colab import drive\n",
"drive.mount('/content/drive')\n",
"\n",
"!pip install -Uqq python-dotenv\n",
"from dotenv import load_dotenv\n",
"load_dotenv('/content/drive/MyDrive/.env')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "SuwjOwt8MaYI",
"outputId": "9441e796-cdfe-4b5f-fe65-dcfaf96ddb2c"
},
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /content/drive\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"True"
]
},
"metadata": {},
"execution_count": 1
}
]
},
{
"cell_type": "code",
"source": [
"!pip install -Uqq langchain OpenAI tiktoken chromadb"
],
"metadata": {
"id": "1-uLQXmLMxQp"
},
"execution_count": 10,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import re\n",
"from langchain.chains import (\n",
" ConversationalRetrievalChain,\n",
" SequentialChain,\n",
" TransformChain,\n",
")\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.document_loaders import TextLoader\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain.vectorstores import Chroma"
],
"metadata": {
"id": "PnQBMSOyNDvU"
},
"execution_count": 30,
"outputs": []
},
{
"cell_type": "code",
"source": [
"text = TextLoader(\"/content/drive/MyDrive/Colab Notebooks/state_of_the_union_2023.txt\").load()\n",
"docs = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(text)"
],
"metadata": {
"id": "PmkaDWYRPiR2"
},
"execution_count": 31,
"outputs": []
},
{
"cell_type": "code",
"source": [
"vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "QJYmiOc7QPv5",
"outputId": "31404589-94a9-4acb-c784-fdd9be6ede34"
},
"execution_count": 32,
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"WARNING:chromadb:Using embedded DuckDB without persistence: data will be transient\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"This approach uses a `SequentialChain` to chain together a `TransformChain` and a `ConversationalRetrievalChain`:\n",
"\n",
" SequentialChain ->\n",
" TransformChain -> ConversationalRetrievalChain\n",
"\n",
"The `SequentialChain` holds the Memory and passes it to the inner chains as a `history` input with the format as:\n",
"\n",
" Human: <Human message>\n",
" AI: <AI message>\n",
"\n",
"The `TransformChain` uses regular expressions to parse the `history` into the `List` of `Tuple`s that the `ConversationBufferMemory` expects.\n",
"\n",
"Note: This specific transform function will only work with `ConversationBufferMemory` and `ConversationBufferWindowMemory`. Adjustments will need to be made if using something else like `ConversationSummaryMemory`."
],
"metadata": {
"id": "-0vJS9r2Uz5U"
}
},
{
"cell_type": "code",
"source": [
"# Create a TransformChain instance with a lambda function that processes\n",
"# a given chat history input and returns a dictionary containing the question and chat history.\n",
"#\n",
"# The lambda function does the following:\n",
"# 1. Compiles regular expression patterns for matching Human and AI lines.\n",
"# 2. Finds all matches for Human and AI lines in the given chat history.\n",
"# 3. Extracts the text from the matched lines and stores them in lists.\n",
"# 4. Zips the lists to create a list of tuples with corresponding Human and AI texts.\n",
"# 5. Returns a dictionary with the question and chat history (list of tuples).\n",
"#\n",
"\n",
"transform = TransformChain(\n",
" input_variables=[\"input\"],\n",
" output_variables=[\"question\", \"chat_history\"],\n",
" transform=lambda inputs: {\n",
" \"question\": inputs[\"input\"],\n",
" \"chat_history\": [\n",
" (human.group(1).strip(), ai.group(1).strip())\n",
" for human, ai in zip(\n",
" re.compile(r'Human: (.*(?:\\n(?!(Human|AI):).*)*)').finditer(inputs[\"history\"]),\n",
" re.compile(r'AI: (.*(?:\\n(?!(Human|AI):).*)*)').finditer(inputs[\"history\"]),\n",
" )\n",
" ],\n",
" },\n",
")"
],
"metadata": {
"id": "bqK4TArxRdKE"
},
"execution_count": 33,
"outputs": []
},
{
"cell_type": "code",
"source": [
"conversational_retrieval = ConversationalRetrievalChain.from_llm(\n",
" llm=ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0.7),\n",
" retriever=vectorstore.as_retriever(),\n",
")"
],
"metadata": {
"id": "BhxYgQVhR7Pj"
},
"execution_count": 34,
"outputs": []
},
{
"cell_type": "code",
"source": [
"chat = SequentialChain(\n",
" memory=ConversationBufferMemory(),\n",
" input_variables=[\"input\"],\n",
" output_variables=[\"answer\"],\n",
" chains=[transform, conversational_retrieval],\n",
")"
],
"metadata": {
"id": "UsgKW0k6SSou"
},
"execution_count": 35,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(chat.run(\"What did the president say about semiconductors?\"))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6xN5qB0BSdZv",
"outputId": "70d198dd-cf8d-4f77-a36b-b8f7e2476f2c"
},
"execution_count": 36,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The President mentioned that semiconductors, small computer chips that power everything from cellphones to automobiles, were invented in America and that the country used to make 40% of the world's chips, but now only produces 10%. He also talked about the consequences of chip factories shutting down overseas during the pandemic which affected the production of American automobiles and caused a rise in car prices and layoffs in various industries. The President also mentioned the bipartisan CHIPS and Science Act aimed at making more chips in America.\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"chat.memory.chat_memory.messages"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gZ9B9lPwT-m9",
"outputId": "59776419-d863-4412-eb1b-8cdb21ef8b36"
},
"execution_count": 37,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[HumanMessage(content='What did the president say about semiconductors?', additional_kwargs={}),\n",
" AIMessage(content=\"The President mentioned that semiconductors, small computer chips that power everything from cellphones to automobiles, were invented in America and that the country used to make 40% of the world's chips, but now only produces 10%. He also talked about the consequences of chip factories shutting down overseas during the pandemic which affected the production of American automobiles and caused a rise in car prices and layoffs in various industries. The President also mentioned the bipartisan CHIPS and Science Act aimed at making more chips in America.\", additional_kwargs={})]"
]
},
"metadata": {},
"execution_count": 37
}
]
},
{
"cell_type": "code",
"source": [
"print(chat.run(\"What did the Act entail?\"))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IiG3Vr5oSpey",
"outputId": "535a1d31-846d-4474-f7e1-ce9155ed2532"
},
"execution_count": 38,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The bipartisan CHIPS and Science Act is a law that aims to increase the production of semiconductors or computer chips in America. The law provides funding for research and development of new chip technologies, as well as incentives for companies to manufacture chips in the United States. The Act was passed to prevent the shortage of chips that happened during the COVID-19 pandemic from happening again. By creating more semiconductor factories in America, the law aims to create hundreds of thousands of new jobs across the country.\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"chat.memory.chat_memory.messages"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2Bvspt7sUHsA",
"outputId": "543891c5-0536-4e8b-a737-cf36017635e8"
},
"execution_count": 39,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[HumanMessage(content='What did the president say about semiconductors?', additional_kwargs={}),\n",
" AIMessage(content=\"The President mentioned that semiconductors, small computer chips that power everything from cellphones to automobiles, were invented in America and that the country used to make 40% of the world's chips, but now only produces 10%. He also talked about the consequences of chip factories shutting down overseas during the pandemic which affected the production of American automobiles and caused a rise in car prices and layoffs in various industries. The President also mentioned the bipartisan CHIPS and Science Act aimed at making more chips in America.\", additional_kwargs={}),\n",
" HumanMessage(content='What did the Act entail?', additional_kwargs={}),\n",
" AIMessage(content='The bipartisan CHIPS and Science Act is a law that aims to increase the production of semiconductors or computer chips in America. The law provides funding for research and development of new chip technologies, as well as incentives for companies to manufacture chips in the United States. The Act was passed to prevent the shortage of chips that happened during the COVID-19 pandemic from happening again. By creating more semiconductor factories in America, the law aims to create hundreds of thousands of new jobs across the country.', additional_kwargs={})]"
]
},
"metadata": {},
"execution_count": 39
}
]
}
]
}