Extract TOC information from a PDF file using pdfminer
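#!/usr/bin/env python
# Python 2 script for the old pdfminer API, where PDFDocument is
# imported from pdfminer.pdfparser.
from pdfminer.pdfparser import PDFParser, PDFDocument


def parse(filename, maxlevel):
    fp = open(filename, 'rb')
    parser = PDFParser(fp)
    doc = PDFDocument()
    # Connect parser and document to each other; without this the
    # document has nothing to read the outlines from.
    parser.set_document(doc)
    doc.set_parser(parser)
    doc.initialize('')  # empty password for unencrypted files
    outlines = doc.get_outlines()
    for (level, title, dest, a, se) in outlines:
        if level <= maxlevel:
            print ' ' * level, title
    fp.close()


if __name__ == '__main__':
    import sys
    if len(sys.argv) != 3:
        print 'Usage: %s xxx.pdf level' % sys.argv[0]
        sys.exit(2)  # stop here instead of failing on a missing argument
    parse(sys.argv[1], int(sys.argv[2]))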

@tilusnet commented May 16, 2014

Hi sakti,

I adapted your gist to PDFMiner 20140328 here:

Feel free to merge back, cheers!
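
For readers on a newer PDFMiner, here is a rough sketch of what such an adaptation might look like. It is not the adaptation linked above. It assumes the newer API (as in pdfminer.six), where `PDFDocument` lives in `pdfminer.pdfdocument` and takes the parser in its constructor. The helper `format_outline` is a hypothetical name introduced here so the formatting can be checked without a PDF at hand.

```python
def format_outline(entries, maxlevel):
    """Return indented title lines for outline entries up to maxlevel.

    entries: iterable of tuples whose first two items are (level, title),
    the shape yielded by PDFDocument.get_outlines().
    (Hypothetical helper, not part of pdfminer.)
    """
    lines = []
    for entry in entries:
        level, title = entry[0], entry[1]
        if level <= maxlevel:
            lines.append(' ' * level + str(title))
    return lines


def parse(filename, maxlevel):
    # Imported lazily so format_outline() works without pdfminer installed.
    # Assumes the newer module layout (pdfminer.pdfdocument).
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument, PDFNoOutlines

    with open(filename, 'rb') as fp:
        parser = PDFParser(fp)
        doc = PDFDocument(parser)  # the constructor attaches the parser
        try:
            # Materialize the entries while the file is still open.
            entries = list(doc.get_outlines())
        except PDFNoOutlines:
            entries = []  # the PDF has no bookmarks
    for line in format_outline(entries, maxlevel):
        print(line)
```

Calling `parse('xxx.pdf', 2)` prints every outline title at level 2 or above, indented by one space per level. A PDF without bookmarks prints nothing instead of raising an exception.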