Script that extracts the titles of all PDF files in a directory (e.g. the papers from a conference) and writes them to a text file, which can then be used to generate a word cloud.
#!/usr/bin/env python
import glob

from pdfrw import PdfReader

allpdf = glob.glob('./*.pdf')  # assuming all PDF files are in the current dir

with open('alltitles.txt', 'w') as fobj:  # output file
    for fname in allpdf:
        ipdf = PdfReader(fname)
        # the Info dictionary or its /Title entry may be missing
        title = ipdf.Info.get('/Title') if ipdf.Info else None
        print('file = ' + fname)
        print('title = ' + str(title))
        if title:
            fobj.write(title[1:-1] + ' ')  # in my case the title is wrapped in "()"
        # import ipdb; ipdb.set_trace()
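As a rough sketch of the word-cloud step mentioned in the description, the collected titles can be reduced to word counts using only the standard library. The function name `word_frequencies` and the `min_length` cutoff are hypothetical, not part of the original script, and the sample string stands in for the contents of `alltitles.txt`.

```python
import re
from collections import Counter


def word_frequencies(text, min_length=3):
    """Count lowercase alphabetic words of at least min_length characters.

    Hypothetical helper: min_length is an assumed cutoff for dropping
    very short words such as "to" or "of".
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) >= min_length)


# Stand-in for the text written to alltitles.txt; to use the real file:
#     with open('alltitles.txt') as f:
#         text = f.read()
sample = "Deep Learning for Vision Learning to Rank"
counts = word_frequencies(sample)

# Show the most frequent words first
for word, count in counts.most_common(5):
    print(word, count)
```

The resulting counts could be handed to a word-cloud tool that accepts word frequencies, or the raw text file can be pasted into an online generator directly.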