
@actsasflinn actsasflinn/Split.java
Last active Dec 16, 2016

PDF Split using PDFBox: copy the pages that contain a search phrase into a new PDF
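import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.PDFTextStripper;

// Copies every page of example.pdf whose text contains the phrase
// "Growing Faster Than PCs" into extract.pdf (PDFBox 1.8.x API).
class Split {
    public static void main(String[] args) throws IOException, COSVisitorException {
        File input = new File("example.pdf");
        // Compile the search pattern once rather than once per page
        Pattern pattern = Pattern.compile("Growing Faster Than PCs");

        PDDocument inputDocument = PDDocument.loadNonSeq(input, null);
        try {
            PDDocument outputDocument = new PDDocument();
            try {
                PDFTextStripper stripper = new PDFTextStripper();
                List<?> pages = inputDocument.getDocumentCatalog().getAllPages();
                // PDFTextStripper page numbers are 1-based
                for (int page = 1; page <= inputDocument.getNumberOfPages(); ++page) {
                    // Limit text extraction to this single page
                    stripper.setStartPage(page);
                    stripper.setEndPage(page);
                    String text = stripper.getText(inputDocument);
                    // The Matcher searches this page's extracted text for the pattern
                    Matcher m = pattern.matcher(text);
                    if (m.find()) {
                        // The getAllPages() list is 0-based, hence page - 1
                        PDPage pdPage = (PDPage) pages.get(page - 1);
                        // Append the matching page to the output document
                        outputDocument.importPage(pdPage);
                    }
                }
                // Save while the input is still open: imported pages can still
                // reference resources (fonts, images) owned by the source document
                outputDocument.save(new File("extract.pdf"));
            } finally {
                outputDocument.close();
            }
        } finally {
            inputDocument.close();
        }
    }
}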
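The pattern in `Split` only matches the phrase when its words are separated by single spaces. PDFTextStripper writes a line separator at the end of each text line, so a phrase that wraps onto a second line in the PDF would be missed. Below is a minimal sketch of one workaround, assuming plain Java with no PDFBox dependency. `PageMatcher`, `phrasePattern` and `matchingPages` are hypothetical names, not part of the gist or of PDFBox. The helper builds a literal pattern that accepts any run of whitespace between words, then returns the 1-based numbers of the matching pages from a list of per-page strings (for example, the strings the stripper returns for each page).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original gist or PDFBox.
class PageMatcher {

    // Builds a case-sensitive pattern for the phrase in which any run of
    // whitespace (spaces, tabs, line breaks) between words counts as a match.
    static Pattern phrasePattern(String phrase) {
        if (phrase.trim().isEmpty()) {
            throw new IllegalArgumentException("phrase must not be blank");
        }
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append("\\s+");
            }
            // Quote each word so characters such as '.' or '+' match literally
            regex.append(Pattern.quote(words[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the 1-based numbers of the pages whose text contains a match,
    // mirroring the page numbering that PDFTextStripper uses.
    static List<Integer> matchingPages(List<String> pageTexts, Pattern pattern) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            if (pattern.matcher(pageTexts.get(i)).find()) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern pattern = phrasePattern("Growing Faster Than PCs");
        List<String> pages = Arrays.asList(
                "Tablets: Growing Faster Than PCs",
                "Unrelated page",
                "Growing Faster\nThan PCs"); // phrase wrapped across two lines
        System.out.println(matchingPages(pages, pattern)); // prints [1, 3]
    }
}
```

To use it in `Split`, the `Pattern.compile(...)` call could be replaced with `PageMatcher.phrasePattern("Growing Faster Than PCs")`. As in the original, matching stays case-sensitive. If that is not wanted, `Pattern.CASE_INSENSITIVE` could be passed to `compile`.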