@actsasflinn
Last active December 16, 2016 02:10
PDF Split using PDFBox
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.regex.Pattern;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.util.PDFTextStripper;

// Copies every page of example.pdf whose text matches a pattern into extract.pdf.
// Uses the PDFBox 1.8.x API (loadNonSeq, getAllPages, COSVisitorException).
class Split {
    public static void main(String[] args) throws IOException, COSVisitorException {
        File input = new File("example.pdf");
        // Compile once; the pattern is the same for every page
        Pattern p = Pattern.compile("Growing Faster Than PCs");
        PDDocument inputDocument = PDDocument.loadNonSeq(input, null);
        PDDocument outputDocument = new PDDocument();
        try {
            PDFTextStripper stripper = new PDFTextStripper();
            List<?> pages = inputDocument.getDocumentCatalog().getAllPages();
            // PDFTextStripper page numbers are 1-based; the page list is 0-based
            for (int page = 1; page <= pages.size(); ++page) {
                // Restrict text extraction to the current page only
                stripper.setStartPage(page);
                stripper.setEndPage(page);
                String text = stripper.getText(inputDocument);
                if (p.matcher(text).find()) {
                    // Append the matching page to the output document
                    outputDocument.importPage((PDPage) pages.get(page - 1));
                }
            }
            // Save before closing the input: imported pages may still share resources with it
            outputDocument.save(new File("extract.pdf"));
        } finally {
            outputDocument.close();
            inputDocument.close();
        }
    }
}
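
The page-selection step can be tried without PDFBox or a PDF file. The sketch below is not part of the original gist: PageMatcher and matchingPages are hypothetical names, and it assumes the per-page text has already been extracted into a list of strings (for example with PDFTextStripper, one call per page). It returns the 1-based numbers of the pages whose text matches the pattern, which is the set of pages Split would import.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: isolates the "which pages match?" logic from PDF handling.
public class PageMatcher {

    // Returns the 1-based page numbers whose text contains a match for the pattern.
    static List<Integer> matchingPages(List<String> pageTexts, Pattern pattern) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 0; i < pageTexts.size(); i++) {
            // find() looks for the pattern anywhere in the page text
            if (pattern.matcher(pageTexts.get(i)).find()) {
                hits.add(i + 1); // convert 0-based list index to 1-based page number
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up page texts, for illustration only
        List<String> pages = Arrays.asList(
                "Intro",
                "Tablets: Growing Faster Than PCs",
                "Appendix");
        Pattern p = Pattern.compile("Growing Faster Than PCs");
        System.out.println(matchingPages(pages, p)); // prints [2]
    }
}
```

Because the search string here is a plain literal, String.contains would also work; Pattern is kept so the same code handles real regular expressions.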