@mobilestack
Forked from ninehills/chatpdf-zh.ipynb
Created March 25, 2023 11:14
ChatPDF-zh.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Chat with pdf file "
],
"metadata": {
"id": "4Sw_ysmQlk-8"
}
},
{
"cell_type": "code",
"source": [
"# 建议将 PDF 文件保存在 Google Drive 上,左侧 Connect to Google Drive\n",
"\n",
"from google.colab import drive\n",
"drive.mount('/content/drive')"
],
"metadata": {
"id": "WKhC2AZRjyok",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "3c50322c-5eef-43e1-f4a5-34084831e52e"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"WORK_DIR = \"/content/drive/MyDrive/ChatGPT/Notebooks/ChatPDF/\"\n",
"SRC_FILE = \"jianshang.pdf\"\n",
"INDEX_FILE = \"jianshang.index\""
],
"metadata": {
"id": "UXw1TWw_nj_F"
},
"execution_count": 1,
"outputs": []
},
{
"cell_type": "code",
"source": [
"%%capture\n",
"# update or install the necessary libraries\n",
"!pip install --upgrade llama_index\n",
"!pip install --upgrade langchain\n",
"!pip install --upgrade python-dotenv\n"
],
"metadata": {
"id": "Aqef8N2RlUpo"
},
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from llama_index import GPTSimpleVectorIndex, LLMPredictor, PromptHelper\n",
"from llama_index.response.notebook_utils import display_response\n",
"from llama_index.prompts.prompts import QuestionAnswerPrompt\n",
"from langchain.chat_models import ChatOpenAI\n",
"from IPython.display import Markdown, display"
],
"metadata": {
"id": "Vp6JcErhmt_w"
},
"execution_count": 5,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Load environment variables, Just create a .env file with your OPENAI_API_KEY then load it.\n",
"\n",
"import os \n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()\n",
"\n",
"# API configuration\n",
"OPENAI_API_KEY = os.getenv(\"OPENAI_API_KEY\")"
],
"metadata": {
"id": "WKoA2bzul7Gz"
},
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"准备 Index 文件,为了避免重复索引,增加缓存\n",
"\n",
"\n"
],
"metadata": {
"id": "SApFHwHCpEGJ"
}
},
{
"cell_type": "code",
"source": [
"# Load pdf to documents\n",
"\n",
"from pathlib import Path\n",
"from llama_index import download_loader\n",
"\n",
"CJKPDFReader = download_loader(\"CJKPDFReader\")\n",
"\n",
"loader = CJKPDFReader()\n",
"index_file = os.path.join(Path(WORK_DIR), Path(INDEX_FILE))\n",
"\n",
"if os.path.exists(index_file) == False:\n",
" documents = loader.load_data(file=os.path.join(Path(WORK_DIR), Path(SRC_FILE)))\n",
" index = GPTSimpleVectorIndex(documents)\n",
" index.save_to_disk(index_file)\n",
"else:\n",
" index = GPTSimpleVectorIndex.load_from_disk(index_file)\n"
],
"metadata": {
"id": "Cb98YMtrnTxU"
},
"execution_count": 7,
"outputs": []
},
{
"cell_type": "code",
"source": [
"llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.2, model_name=\"gpt-3.5-turbo\"))\n",
"\n",
"QUESTION_ANSWER_PROMPT_TMPL = (\n",
" \"Context information is below. \\n\"\n",
" \"---------------------\\n\"\n",
" \"{context_str}\"\n",
" \"\\n---------------------\\n\"\n",
" \"{query_str}\\n\"\n",
")\n",
"\n",
"QUESTION_ANSWER_PROMPT_TMPL_2 = \"\"\"\n",
"You are an AI assistant providing helpful advice. You are given the following extracted parts of a long document and a question. Provide a conversational answer based on the context provided.\n",
"If you can't find the answer in the context below, just say \"Hmm, I'm not sure.\" Don't try to make up an answer.\n",
"If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.\n",
"Context information is below.\n",
"=========\n",
"{context_str}\n",
"=========\n",
"{query_str}\n",
"\"\"\"\n",
"\n",
"QUESTION_ANSWER_PROMPT = QuestionAnswerPrompt(QUESTION_ANSWER_PROMPT_TMPL_2)\n",
"\n",
"def chat(query):\n",
" return index.query(\n",
" query,\n",
" llm_predictor=llm_predictor,\n",
" text_qa_template=QUESTION_ANSWER_PROMPT,\n",
" response_mode=\"tree_summarize\",\n",
" similarity_top_k=3,\n",
" )\n",
"\n",
"display_response(chat(\"这本书讲了什么?\"))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 427
},
"id": "6ddjxclno8tg",
"outputId": "254939dd-b48d-442f-aef1-8e13b5a13c99"
},
"execution_count": 8,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Final Response:`** 这本书叫做《周灭商与华夏新生》,主要讲述了商周之变的历史转折,以及商文化与周文化的不同之处。它是一部关于古代中国思想、信仰、伦理、心态、风俗,以及军事、政治、制度、规则的历史著作,讲述了商朝的祭祀与战争为何有如此紧密的联系,以及殷周之变是如何发生的。它还设立了一个出发点:凡对古典中国思想、信仰、伦理、心态、风俗,以及军事、政治、制度、规则有兴趣的研究者或普通读者,可以先从这本书开始你的探索。"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "---"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Source Node 1/3`**"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.8000785245735972<br>**Text:** 如《象传》和《彖传》可能是周公作品。其他篇章里常出现“子日”,孔子 \n自己肯定不会这样写,它们应当是孔门弟子编写的。《周易》经传的详细知识, \n可参考廖明春《周易经传十五讲》,北京大学出版社,2...<br>"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "---"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Source Node 2/3`**"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7965259185103453<br>**Text:** 的支持,其实是心理上的,让我意识到除了祭祀坑里的尸骨,这世界 上还有别的东西。 也许,人不应当凝视深渊;虽然深渊就在那里。 \f \f 始于一页,抵达世界 Humanities ■ Histor...<br>"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "---"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Source Node 3/3`**"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7947722920317227<br>**Text:** 书》则是“太姒梦见商之庭产棘”。此事应载于《逸周书•程寤》篇,但传 \n世本只存篇名,正文缺。参见黄怀信等《逸周书汇校集注》(修订本),上海 \n古籍出版社,2007年,第262、1141页;李学勤...<br>"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"source": [
"display_response(chat(\"牧野之战的具体过程是什么?\"))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 509
},
"id": "bS5LAJkuqR4U",
"outputId": "b0431e12-55d0-41ce-b84f-43fe88666c6e"
},
"execution_count": 9,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Final Response:`** 根据文本描述,牧野之战开始时,武王的军队面对着数量远远超过自己的商军阵列,而且他们没有内应,没有商人助战,所以处于两难的困境。武王派出他的岳父兼老师和战略阴谋家吕尚率步兵前往敌阵,自己则带着他的三百辆战车冲向商军阵列,吸引敌军注意力。商军阵列突然自行解体,变成了互相砍杀的人群。接着,西土联军全部投入了混战。最终,商军队伍溃散,武王取得了胜利。"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "---"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Source Node 1/3`**"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7968888490270644<br>**Text:** 王受命第十一年,5他再度起兵东征。有好几种文献记载武王此次伐\n商的行军日程,但年份和月份皆有所不同。总的来说,武王此次起兵\n是在隆冬季节,决战则是在冬末春初。\n\n总攻的前期工作在前一年底就开始了...<br>"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "---"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Source Node 2/3`**"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7918882416293495<br>**Text:** 并不是武王的私人属下。他们很在意这种身份区别。7\n\n天色渐明,雨势渐小,对面的商军阵列逐渐成形。周人史诗的描\n述是,敌军的戈矛像森林一样密集,所谓“殷商之旅,其会如林”。(《诗\n经•大雅•大明》...<br>"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "---"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Source Node 3/3`**"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.7853722213787273<br>**Text:** 作的“金花”。这都是王室才会有的财物,看来王室和奴隶们居住\n的地方相隔并不远。\n\n\f\n第十一章商人的思维与国家\n\n225\n\n需要注意的是,只有殷墟王宫区发现有大量集中存放的石头农具,\n其他任何商...<br>"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"source": [
"display_response(chat(\"人祭有几种情况?\"))"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 421
},
"id": "7c1Kwsj2waEB",
"outputId": "da748c8d-3a13-43cb-881a-efd347088018"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Final Response:`** 人祭有两种情况,一种是有蓄意虐杀的迹象,献祭者会尽量延缓人牲的死亡,任凭被剁去肢体的人牲尽量地挣扎、哀嚎或咒骂;另一种是例行的祭祀,随意性更大。"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "---"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Source Node 1/3`**"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.8310803169491916<br>**Text:** 祭坑留有蓄意虐杀的迹象,尤其当人牲数量不足,献祭者还会尽量延 \n缓人牲的死亡,任凭被剁去肢体的人牲尽量地挣扎、哀嚎或咒骂。这 \n种心态,跟观看古罗马的角斗士表演有相似之处。\n\n\f\n第二十一章殷都...<br>"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "---"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Source Node 2/3`**"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.8285896369748251<br>**Text:** :祭祀坑中的无头尸身,往往连带着下顎甚至上顎骨,说明\n每年例行的祭祀的随意性更大。\n\n殷商的王陵祭祀对男性人牲和殉人多用斩首,甚至肢解,而女性\n则多能保存全尸。这背后的宗教思维可能是:男性俘虏和...<br>"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "---"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**`Source Node 3/3`**"
},
"metadata": {}
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<IPython.core.display.Markdown object>"
],
"text/markdown": "**Document ID:** 1f43ae29-e41d-474b-b9c7-2afcb36d0d0b<br>**Similarity:** 0.8259583785233079<br>**Text:** 仪式上,首先奉献的是侯来、陈本等征伐周边斩获的首级,并 搭配现场屠宰的牲畜,“断牛六,断羊二”;然后向天(上帝)和后 稷献祭,用的是牛“五百有四”头;再向其他百神、水土之神献祭, 用猪、羊等牲畜...<br>"
},
"metadata": {}
}
]
}
]
}
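
The notebook relies on two small patterns that can be tried without an API key: caching the index on disk so it is built only once, and filling a prompt template that has `{context_str}` and `{query_str}` placeholders. The sketch below is only an illustration under stated assumptions. `build_index`, `load_or_build_index`, and `fill_prompt` are hypothetical helpers that use plain JSON and `str.format`; they are stand-ins, not the llama_index implementation.

```python
import json
import os
import tempfile


def build_index(chunks):
    # Hypothetical stand-in for the expensive step (parsing and embedding the PDF).
    return {"chunks": chunks, "size": len(chunks)}


def load_or_build_index(index_path, chunks):
    """Return (index, built). Load the cached file if present, else build and save."""
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            return json.load(f), False
    index = build_index(chunks)
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False)
    return index, True


# Same placeholders as the notebook's QUESTION_ANSWER_PROMPT_TMPL.
PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "{query_str}\n"
)


def fill_prompt(context_chunks, query):
    # Join the retrieved chunks and substitute both placeholders.
    return PROMPT_TMPL.format(context_str="\n".join(context_chunks), query_str=query)


with tempfile.TemporaryDirectory() as work_dir:
    path = os.path.join(work_dir, "demo.index")
    # First call builds and saves; second call finds the file and loads it.
    first, built_first = load_or_build_index(path, ["chunk one", "chunk two"])
    second, built_second = load_or_build_index(path, ["ignored on cache hit"])

prompt = fill_prompt(first["chunks"], "What is this book about?")
print(built_first, built_second)
print(prompt)
```

On the second call the chunks argument is ignored because the cached file already exists, which mirrors why the notebook only parses the PDF when the index file is missing. Deleting the index file is the way to force a rebuild after the source PDF changes.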