@ppearcy
Created January 23, 2012 06:27
Tika PDFBox temporary file
public void parse(
        InputStream stream, ContentHandler handler,
        Metadata metadata, ParseContext context)
        throws IOException, SAXException, TikaException {
    // Hand PDFBox a temporary file to use as its scratch storage.
    File tmpFile = File.createTempFile("pdfbox-", ".tmp", null);
    RandomAccess scratchFile = null;
    PDDocument pdfDocument = null;
    try {
        // Open and load inside the try block so a failure here still
        // reaches the cleanup in finally.
        scratchFile = new RandomAccessFile(tmpFile, "rw");
        pdfDocument =
                PDDocument.load(new CloseShieldInputStream(stream), scratchFile, true);
        if (pdfDocument.isEncrypted()) {
            try {
                String password = metadata.get(PASSWORD);
                if (password == null) {
                    password = "";
                }
                pdfDocument.decrypt(password);
            } catch (Exception e) {
                // Ignore
            }
        }
        metadata.set(Metadata.CONTENT_TYPE, "application/pdf");
        extractMetadata(pdfDocument, metadata);
        PDF2XHTML.process(pdfDocument, handler, metadata, extractAnnotationText, enableAutoSpace);
    } finally {
        // Nested finally blocks: the temp file is deleted even if a close() throws.
        try {
            if (pdfDocument != null) {
                pdfDocument.close();
            }
        } finally {
            try {
                if (scratchFile != null) {
                    scratchFile.close();
                }
            } finally {
                tmpFile.delete();
            }
        }
    }
}
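
The temp-file handling in the method above can be tried without Tika or PDFBox on the classpath. What follows is a minimal JDK-only sketch of the same idea, assuming Java 8 or later: spool an input stream into a temporary file, hand the open file to a callback, and delete the file afterwards whether or not the callback succeeds. The names `ScratchFileSketch`, `ScratchConsumer`, and `withScratchFile` are hypothetical and belong to neither library; `java.io.RandomAccessFile` here is the JDK class, not PDFBox's class of the same name.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class ScratchFileSketch {

    /** Hypothetical callback that works on the spooled data. */
    public interface ScratchConsumer<T> {
        T apply(File tmpFile, RandomAccessFile scratch) throws IOException;
    }

    /**
     * Hypothetical helper: copies the stream into a temp file, runs the
     * callback on it, then closes and deletes the file in all cases.
     */
    public static <T> T withScratchFile(InputStream in, ScratchConsumer<T> body)
            throws IOException {
        File tmpFile = File.createTempFile("scratch-", ".tmp");
        // try-with-resources closes the file before the finally block runs,
        // so the delete below acts on an already-closed file.
        try (RandomAccessFile scratch = new RandomAccessFile(tmpFile, "rw")) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                scratch.write(buf, 0, n);
            }
            scratch.seek(0); // rewind so the callback reads from the start
            return body.apply(tmpFile, scratch);
        } finally {
            tmpFile.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "example bytes".getBytes(StandardCharsets.US_ASCII);
        Long length = withScratchFile(
                new ByteArrayInputStream(data),
                (file, scratch) -> scratch.length());
        System.out.println("spooled " + length + " bytes");
    }
}
```

Unlike the parser above, the sketch copies the stream itself and leaves the caller's stream open, which is the role `CloseShieldInputStream` plays in the original.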