How to disable Text Extraction in Apache Jackrabbit Oak
  1. Find the location of the oak-lucene jar file:
    find crx-quickstart/launchpad/felix -name "bundle.info" -exec grep oak-lucene {} \; -print
    
    Example output:
    launchpad:resources/install.crx3/15/oak-lucene-1.2.2.jar
    crx-quickstart/launchpad/felix/bundle96/bundle.info
    
  2. Take the second line of the output and remove /bundle.info from the end of the path. Following the example above, this gives:
    crx-quickstart/launchpad/felix/bundle96
    
  3. cd into the subdirectory of that directory whose name starts with "version":
    cd crx-quickstart/launchpad/felix/bundle96/version*
    
  4. Extract the tika-config.xml file from that jar file to verify that you have the correct jar:
    jar -xvf bundle.jar org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml
    
  5. Check that the file was extracted:
    vi org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml
    
  6. Replace the contents of the extracted file with the tika-config.xml below, which configures Tika's EmptyDetector and EmptyParser so that no text is extracted.
  7. Back up the bundle.jar file, then update the jar with the new version of tika-config.xml:
    jar -uvf bundle.jar org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml
    
tika-config.xml:

<properties>
  <detectors>
    <detector class="org.apache.tika.detect.EmptyDetector"/>
  </detectors>
  <parsers>
    <parser class="org.apache.tika.parser.EmptyParser"/>
  </parsers>
</properties>
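The steps above can be scripted roughly as follows. This is a minimal bash sketch, not an official Oak or AEM tool. It assumes bash, the JDK `jar` tool on the PATH, a single oak-lucene bundle under launchpad/felix, and that keeping the backup as bundle.jar.bak next to the original jar is acceptable. The function names find_oak_lucene_bundle_dir, write_empty_tika_config and disable_text_extraction are hypothetical helpers invented for this sketch.

```shell
#!/usr/bin/env bash
# Minimal sketch of steps 1-7 above; assumes bash and the JDK `jar` tool.
# All function names are hypothetical helpers, not part of Oak or AEM.

# Path of the Tika config inside the oak-lucene jar (from steps 4 and 7).
TIKA_PATH="org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml"

# Steps 1-2: print the Felix bundle directory whose bundle.info mentions
# oak-lucene (i.e. the path with /bundle.info stripped). Returns 1 if none.
find_oak_lucene_bundle_dir() {
  local info
  info=$(find "$1/launchpad/felix" -name bundle.info \
           -exec grep -l oak-lucene {} + 2>/dev/null | head -n 1)
  [ -n "$info" ] || return 1
  dirname "$info"
}

# Step 6: write the empty detector/parser config to the given path.
write_empty_tika_config() {
  mkdir -p "$(dirname "$1")" || return 1
  cat > "$1" <<'EOF'
<properties>
  <detectors>
    <detector class="org.apache.tika.detect.EmptyDetector"/>
  </detectors>
  <parsers>
    <parser class="org.apache.tika.parser.EmptyParser"/>
  </parsers>
</properties>
EOF
}

# Steps 3-7 end to end. $1 is the crx-quickstart directory
# (default: ./crx-quickstart).
disable_text_extraction() {
  local bundle_dir version_dir
  bundle_dir=$(find_oak_lucene_bundle_dir "${1:-crx-quickstart}") || {
    echo "oak-lucene bundle not found" >&2
    return 1
  }
  # Step 3: the jar lives in the version* subdirectory of the bundle dir.
  version_dir=$(find "$bundle_dir" -maxdepth 1 -type d -name 'version*' | head -n 1)
  [ -n "$version_dir" ] || return 1
  # Subshell, so the caller's working directory is left unchanged.
  (
    cd "$version_dir" || exit 1
    # Step 7 (backup): keep only the first backup so a rerun cannot
    # overwrite it with an already-modified jar. Location is an assumption.
    [ -e bundle.jar.bak ] || cp -p bundle.jar bundle.jar.bak || exit 1
    # Step 4: extract the current config; if it is missing, this is the wrong jar.
    jar -xvf bundle.jar "$TIKA_PATH" || exit 1
    [ -f "$TIKA_PATH" ] || exit 1
    # Step 6: overwrite the extracted file with the empty config.
    write_empty_tika_config "$TIKA_PATH" || exit 1
    # Step 7 (update): write the modified file back into bundle.jar.
    jar -uvf bundle.jar "$TIKA_PATH"
  )
}
```

For example, after saving the functions to a file (the name disable-text-extraction.sh is hypothetical), run this from the directory that contains crx-quickstart:
    source disable-text-extraction.sh && disable_text_extraction crx-quickstart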