@sanketsudake
Last active January 5, 2024 05:31
PDF summarizer with langchain and Amazon Bedrock Titan

Usage

  • Install the Python dependencies from requirements.txt.
  • Source your AWS environment credentials.
    • Enable access to the amazon.titan-text-express-v1 model in Amazon Bedrock.
    • Set AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN (optional). Note that the script itself creates the Bedrock client with region_name="us-west-2".
  • Run the script:
    python pdf_summary.py
    Running on local URL:  http://127.0.0.1:7860
    
  • Open the printed local URL in a browser.
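
Before launching the app, it can help to confirm that the credential variables listed above are actually set. The following is a minimal sketch, not part of the gist: the helper name missing_aws_vars and the REQUIRED_VARS tuple are hypothetical, and the check assumes credentials come from environment variables as described in this README (boto3 can also pick them up from other sources, such as a shared credentials file, which this check does not cover).

```python
import os

# Variables the Usage section asks you to export. AWS_SESSION_TOKEN is
# optional there, so it is not treated as required here.
REQUIRED_VARS = ("AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` is any mapping; it defaults to the current process environment.
    (Hypothetical helper, introduced only for this illustration.)
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Missing AWS environment variables: " + ", ".join(missing))
    else:
        print("All required AWS environment variables are set.")
```

Passing a plain dict instead of os.environ makes the helper easy to try out without touching the real environment.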

pdf_summary.py

import boto3
import gradio
from langchain.document_loaders import PyPDFLoader
from langchain.llms.bedrock import Bedrock
from langchain.chains.summarize import load_summarize_chain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler


def get_bedrock_runtime_client():
    return boto3.client(
        service_name="bedrock-runtime",
        region_name="us-west-2",
    )


def summarize_pdf(pdf_path):
    loader = PyPDFLoader(pdf_path)
    documents = loader.load_and_split()
    llm = Bedrock(
        model_id="amazon.titan-text-express-v1",
        model_kwargs={
            "maxTokenCount": 1024,
            "stopSequences": [],
            "temperature": 0,
            "topP": 0.9,
        },
        client=get_bedrock_runtime_client(),
        streaming=True,
        callbacks=[StreamingStdOutCallbackHandler()],
    )
    chain = load_summarize_chain(llm)
    summary = chain(documents)
    # print(f'{summary} {dict(summary)}')
    return summary["output_text"]


def main():
    input_pdf_path = gradio.File(label="Upload PDF file", type="filepath")
    output_summary = gradio.Textbox(label="Summary")
    gradio.Interface(
        fn=summarize_pdf,
        inputs=input_pdf_path,
        outputs=output_summary,
        title="Summarizer",
        description="This app allows you to summarize your PDF file.",
    ).launch(share=False)


if __name__ == "__main__":
    main()
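
To make the model_kwargs above less opaque, here is a rough sketch of the JSON payload that a Titan text model receives and returns when called through the bedrock-runtime API. This is an illustration under assumptions, not code from the gist: the function names build_titan_request_body and extract_titan_output are hypothetical, and the field names (inputText, textGenerationConfig, results, outputText) reflect my understanding of the Titan text schema, so check them against the current Bedrock documentation before relying on them.

```python
import json

# Same generation settings as the model_kwargs passed to Bedrock(...) above.
MODEL_KWARGS = {
    "maxTokenCount": 1024,
    "stopSequences": [],
    "temperature": 0,
    "topP": 0.9,
}


def build_titan_request_body(prompt, model_kwargs=MODEL_KWARGS):
    """Serialize a prompt plus generation settings into a Titan-style body.

    Hypothetical helper: the settings are nested under "textGenerationConfig"
    and the prompt goes under "inputText" (assumed schema).
    """
    return json.dumps(
        {
            "inputText": prompt,
            "textGenerationConfig": dict(model_kwargs),
        }
    )


def extract_titan_output(response_body):
    """Pull the generated text out of a Titan-style JSON response string."""
    payload = json.loads(response_body)
    return payload["results"][0]["outputText"]


if __name__ == "__main__":
    # No AWS call is made here; this only shows the shape of the payload.
    print(build_titan_request_body("Summarize: LangChain wraps Bedrock."))
```

With a real client, a body like this would be passed as the body argument of invoke_model together with modelId="amazon.titan-text-express-v1"; the LangChain Bedrock wrapper in the script handles that step for you.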

requirements.txt

boto3==1.34.13
gradio==4.13.0
gradio_client==0.8.0
langchain==0.0.354
langchain-community==0.0.8
langchain-core==0.1.6
pypdf==3.17.4
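
Because the script imports from langchain module paths that were current for these pinned releases, it may be worth confirming that the installed versions match the pins. The sketch below is an optional helper, not part of the gist: parse_pins and find_mismatches are hypothetical names, and the parser only understands simple name==version lines like the ones above.

```python
from importlib import metadata

# The pins from requirements.txt, copied verbatim.
PINS_TEXT = """\
boto3==1.34.13
gradio==4.13.0
gradio_client==0.8.0
langchain==0.0.354
langchain-community==0.0.8
langchain-core==0.1.6
pypdf==3.17.4
"""


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Map each package whose installed version differs from its pin
    to a (wanted, installed) tuple; installed is None if it is absent."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(parse_pins(PINS_TEXT)).items():
        print(f"{name}: pinned {wanted}, installed {installed}")
```

An empty result means every pinned package is installed at exactly the pinned version.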