@jchristopher
Last active July 6, 2020 12:41
Use Apache Tika to extract PDF content in SearchWP
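<?php
// Use Apache Tika to extract PDF content in SearchWP.
add_filter( 'searchwp\parser\pdf', function( $content, $args ) {
	// Bail out early if exec() is disabled on this host.
	if ( ! function_exists( 'exec' ) ) {
		return $content;
	}
	// Ensure this path is updated to match your Tika installation path!
	$path_to_tika = '/srv/bin/tika-app-1.18.jar';
	// Build the command, escaping both paths so file names containing
	// spaces or shell metacharacters are passed safely.
	$cmd = 'java -jar ' . escapeshellarg( $path_to_tika )
		. ' -t ' . escapeshellarg( $args['file'] );
	// Execute the command; exec() stores one array entry per output line.
	$output   = array();
	$exitCode = 0;
	exec( $cmd, $output, $exitCode );
	// If there was a problem, send it to the debug log and keep the existing content.
	if ( 0 !== $exitCode ) {
		do_action( 'searchwp\debug\log', 'Error running Tika, exit code: ' . $exitCode );
		return $content;
	}
	// Join the output lines into a single string of content.
	return implode( "\n", $output );
}, 20, 2 );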
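The filter does two things: it builds a `java -jar ... -t <file>` command (`-t` asks the Tika app for plain-text output) and turns exec()'s array of output lines into one string. A minimal sketch of those two steps as plain functions, so they can be checked without a WordPress install, follows. The function names `tika_build_command` and `tika_output_to_content` are hypothetical, not part of SearchWP or Tika, and the expected quoting assumes a Unix-like host, where `escapeshellarg()` wraps arguments in single quotes.

```php
<?php
// Hypothetical helpers (not part of SearchWP or Tika) isolating the
// command-building and output-handling steps of the filter.

/**
 * Build the shell command that asks the Tika app jar for plain text (-t).
 * Both paths are escaped so spaces or quotes in file names are safe.
 */
function tika_build_command( string $path_to_tika, string $file ): string {
	return 'java -jar ' . escapeshellarg( $path_to_tika )
		. ' -t ' . escapeshellarg( $file );
}

/**
 * Turn exec()'s array of output lines into a single string.
 * On a non-zero exit code, return the fallback content instead.
 */
function tika_output_to_content( array $output, int $exit_code, string $fallback ): string {
	if ( 0 !== $exit_code ) {
		return $fallback;
	}
	return implode( "\n", $output );
}

// On a Unix-like host this prints:
// java -jar '/srv/bin/tika-app-1.18.jar' -t '/tmp/My Report.pdf'
echo tika_build_command( '/srv/bin/tika-app-1.18.jar', '/tmp/My Report.pdf' ), "\n";
```

Keeping the fallback explicit means a failed Tika run leaves whatever content SearchWP already had, rather than replacing it with partial or empty output.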