@keitazoumana
Created December 6, 2021 12:12
from PyPDF2 import PdfReader

# Open the PDF in binary mode; the with-block closes the file when done
with open('./data/obama-worlds-matter.pdf', 'rb') as pdfObject:
    # Create a PDF reader object (PdfReader is the PyPDF2 >= 2.0 name for PdfFileReader)
    pdfReader = PdfReader(pdfObject)
    # Extract and concatenate each page's content
    text = ''
    for pageObject in pdfReader.pages:
        # Extract the text of the current page and append it
        text += pageObject.extract_text()

print(text)
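The loop above appends each page's text directly, so the last word of one page can run into the first word of the next. A minimal sketch of an alternative collects the page strings in a list and joins them once with a separator. `join_page_texts` is a hypothetical helper, not part of PyPDF2; it assumes only that each page object has an `extract_text()` method, as pages from `PdfReader` do in PyPDF2 >= 2.0 and pypdf.

```python
from typing import Iterable


def join_page_texts(pages: Iterable, separator: str = "\n") -> str:
    """Return the text of every page, joined with `separator`.

    Hypothetical helper (not part of PyPDF2): `pages` is assumed to be any
    iterable of objects with an extract_text() method, such as the `pages`
    attribute of a PyPDF2 >= 2.0 or pypdf PdfReader.
    """
    parts = []
    for page in pages:
        # Fall back to an empty string if a page yields no text (e.g. a scanned image)
        parts.append(page.extract_text() or "")
    # Join once so page boundaries stay separated instead of being glued together
    return separator.join(parts)
```

For a real document, a call such as `join_page_texts(PdfReader(path).pages)` would return the same text as the loop, with a newline between pages.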