@dhruvilp
Created April 23, 2023 04:15
PDF Summarization
pip install pypdf2
pip install transformers
import PyPDF2
from transformers import pipeline

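# Load the summarization pipeline with the T5-base model.
# framework="tf" selects the TensorFlow backend, so TensorFlow must be installed.
summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")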
# Open the PDF file in read-binary mode
with open('Tutorial_EDIT.pdf', 'rb') as file:
  # Create a PDF reader object
  pdf = PyPDF2.PdfReader(file)
  # Select the seventh page (pages are zero-indexed) and extract its text
  page = pdf.pages[6]
  text = page.extract_text()
  print(text)
# Summarize the text
summary = summarizer(text, max_length=100, min_length=30, do_sample=False)
# Print the summary
print(summary[0]['summary_text'])
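The snippet above summarizes a single page. For longer text, one possible approach is to split it into chunks and summarize each one separately, since T5 models only accept a limited input length. The sketch below is an assumption-laden illustration and not part of the original gist: the helper names `chunk_words` and `summarize_long_text` are hypothetical, and the 300-word limit is only a rough stand-in for the model's token limit.

```python
def chunk_words(text, max_words=300):
    """Split text into consecutive chunks of at most max_words words.

    Words are whitespace-separated; 300 is an assumed, rough proxy for
    the model's input limit, which is really measured in tokens.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_long_text(text, summarize_fn, max_words=300):
    """Summarize each chunk with summarize_fn and join the partial summaries.

    summarize_fn is any callable that maps a string to a summary string.
    """
    return " ".join(summarize_fn(chunk)
                    for chunk in chunk_words(text, max_words))
```

With the `summarizer` defined earlier, `summarize_fn` could be something like `lambda chunk: summarizer(chunk, max_length=100, min_length=30, do_sample=False)[0]['summary_text']`. Passing the summarizer in as a callable keeps the chunking logic independent of any particular model.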