@andrewmkhoury
Last active April 29, 2021 08:17
How to disable Text Extraction in Apache Jackrabbit Oak
  1. Find the location of the oak-lucene jar file:
    find crx-quickstart/launchpad/felix -name "bundle.info" -exec grep oak-lucene {} \; -print
    
    Example output:
    launchpad:resources/install.crx3/15/oak-lucene-1.2.2.jar
    crx-quickstart/launchpad/felix/bundle96/bundle.info
    
  2. Take the second line of the output and remove bundle.info from the path. In the example above, this gives:
    crx-quickstart/launchpad/felix/bundle96
    
  3. Change to the subdirectory of that directory whose name starts with "version":
    cd crx-quickstart/launchpad/felix/bundle96/version*
    
  4. Extract the tika-config.xml file from that jar file to verify that you have the correct jar:
    jar -xvf bundle.jar org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml
    
  5. Open the extracted file to check that the extraction worked:
    vi org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml
    
  6. Replace the contents of the file with the tika-config.xml file below.
  7. Back up the bundle.jar file, then update the jar with the new version of tika-config.xml:
    jar -uvf bundle.jar org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml
    
<properties>
  <detectors>
    <detector class="org.apache.tika.detect.EmptyDetector"/>
  </detectors>
  <parsers>
    <parser class="org.apache.tika.parser.EmptyParser"/>
  </parsers>
</properties>
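The lookup, backup, and file-replacement parts of the steps above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure for any particular installation: the function names and the `.bak` suffix are hypothetical, and it assumes the directory layout shown in the example output. The `jar -xvf` and `jar -uvf` commands from steps 4 and 7 are left to be run by hand.

```shell
#!/bin/sh
# Sketch only: function names and the .bak suffix are hypothetical.

# Steps 1-2: print the Felix bundle directory whose bundle.info mentions oak-lucene.
# $1: directory that contains crx-quickstart
find_oak_lucene_bundle_dir() {
    info=$(find "$1/crx-quickstart/launchpad/felix" -name bundle.info \
        -exec grep -l oak-lucene {} \; | head -n 1)
    # Fail if no bundle.info matched.
    [ -n "$info" ] || return 1
    dirname "$info"
}

# Step 7, first half: keep a copy of the original jar before modifying it.
# $1: the version* directory that contains bundle.jar
backup_bundle_jar() {
    cp -p "$1/bundle.jar" "$1/bundle.jar.bak"
}

# Step 6: overwrite the extracted tika-config.xml with the empty configuration.
# $1: directory into which the jar entry was extracted (the version* directory)
write_empty_tika_config() {
    target="$1/org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml"
    mkdir -p "$(dirname "$target")"
    cat > "$target" <<'EOF'
<properties>
  <detectors>
    <detector class="org.apache.tika.detect.EmptyDetector"/>
  </detectors>
  <parsers>
    <parser class="org.apache.tika.parser.EmptyParser"/>
  </parsers>
</properties>
EOF
}
```

For example, after sourcing the functions from the installation directory, `find_oak_lucene_bundle_dir .` prints the bundle directory from step 2. Running `backup_bundle_jar` and `write_empty_tika_config` on the version subdirectory then prepares it for the `jar -uvf` command in step 7.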