Cheryl Mariam Jacob CherylJacob

CherylJacob / pdfxtract.py
Created August 11, 2018 12:46 — forked from jmcarp/pdfxtract.py
Extract text from PDF document using PDFMiner
"""
Extract PDF text using PDFMiner. Adapted from
http://stackoverflow.com/questions/5725278/python-help-using-pdfminer-as-a-library
"""
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter  # process_pdf (older API) is not used here
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams