@actsasflinn
Created December 16, 2016 02:09
PDF Split using PDFBox
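require 'java'
# PDFBox 1.8.x and its dependencies; the jars are expected in the working directory.
require './pdfbox-1.8.13.jar'
require './fontbox-1.8.13.jar'
require './commons-logging-1.2.jar'

input = java.io.File.new("example.pdf")
# loadNonSeq uses the non-sequential parser; nil means no scratch file.
inputDocument = org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(input, nil)
stripper = org.apache.pdfbox.util.PDFTextStripper.new
outputDocument = org.apache.pdfbox.pdmodel.PDDocument.new
# Compile the pattern once, outside the loop.
pattern = java.util.regex.Pattern.compile("Growing Faster Than PCs")
pages = inputDocument.getDocumentCatalog.getAllPages

# PDFTextStripper page numbers are 1-based.
(1..inputDocument.getNumberOfPages).each do |page|
  stripper.setStartPage(page)
  stripper.setEndPage(page)
  text = stripper.getText(inputDocument)
  # Copy the page into the output document if its text matches.
  if pattern.matcher(text).find
    outputDocument.importPage(pages.get(page - 1))  # getAllPages is 0-based
  end
end

# Save before closing the input: imported pages may still reference its data.
outputDocument.save(java.io.File.new("extract.pdf"))
outputDocument.close
inputDocument.close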
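
The page-selection step in the script does not depend on PDFBox itself, so it can be sketched and checked in plain Ruby without the jars. The helper below, `matching_pages`, is a hypothetical name that is not part of the gist or of PDFBox. It assumes the per-page text has already been extracted into an array, and it treats a String argument as literal text, whereas the script compiles its string as a regular expression.

```ruby
# Hypothetical helper (not part of the gist or PDFBox).
# page_texts: one extracted string per page, in page order (assumed input).
# pattern:    a Regexp, or a String that is matched literally.
# Returns the 1-based numbers of the pages whose text matches.
def matching_pages(page_texts, pattern)
  regex = pattern.is_a?(Regexp) ? pattern : Regexp.new(Regexp.escape(pattern))
  page_texts.each_with_index
            .select { |text, _index| text =~ regex }
            .map { |_text, index| index + 1 }
end

texts = ["Intro", "Tablets: Growing Faster Than PCs", "Appendix"]
matching_pages(texts, "Growing Faster Than PCs")  # => [2]
```

The returned numbers line up with `setStartPage`/`setEndPage`, which are 1-based; subtract 1 before indexing into the list returned by `getAllPages`.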