@suensummit
Created March 2, 2017 09:42
Python PDF Parser example code.
# Note: this is Python 2 code (urllib2, StringIO, print statement).
import csv
from urllib2 import Request, urlopen
from StringIO import StringIO
from PyPDF2 import PdfFileReader

# Read the MoneyDJ URLs from the CSV file into a list of rows.
with open('/money_url_list.csv', 'rb') as money_url_list:
    reader = csv.reader(money_url_list, delimiter=',')
    moneydj_list = list(reader)

# Download the PDF whose URL is in row 1, column 2, and open it with PyPDF2.
pdfFile = PdfFileReader(StringIO(urlopen(Request(moneydj_list[1][2])).read()))

# Print the text of page index 1 (the second page), dropping non-ASCII characters.
page = pdfFile.getPage(pageNumber=1)
print page.extractText().encode("ascii", "ignore")
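
For readers on Python 3, the same steps can be sketched roughly as follows. This is not part of the original gist: the helper names `read_url_column` and `extract_page_text` are hypothetical, the sample CSV layout (a header row with the URL in the third column) is an assumption based on the `moneydj_list[1][2]` lookup above, and the PDF step assumes the `pypdf` package is installed.

```python
import csv
import io
from urllib.request import Request, urlopen


def read_url_column(csv_file, column=2, skip_header=True):
    """Return the values found in one column of an open CSV file."""
    rows = list(csv.reader(csv_file, delimiter=','))
    if skip_header:
        # Assumption: the first row holds column names, not data.
        rows = rows[1:]
    # Skip rows that are too short to contain the requested column.
    return [row[column] for row in rows if len(row) > column]


def extract_page_text(url, page_index=1):
    """Download a PDF and return the text of one page (0-based index)."""
    # Imported here so the CSV helper works without pypdf installed.
    from pypdf import PdfReader
    data = urlopen(Request(url)).read()
    # PDF data is binary, so it is wrapped in BytesIO rather than StringIO.
    reader = PdfReader(io.BytesIO(data))
    return reader.pages[page_index].extract_text()


# Example with an in-memory CSV; the column layout is an assumption.
sample = io.StringIO("id,name,url\n1,fund_a,http://example.com/a.pdf\n")
urls = read_url_column(sample)
print(urls)
```

With a real file, `read_url_column(open(path, newline=''))` would supply the list, and `extract_page_text(urls[0])` would then fetch and parse the first PDF. Keeping the download in its own function means the CSV handling can be checked without network access.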