@GrazingScientist
Last active March 23, 2018 06:29
Register a Custom TokenizerFactory in Apache Lucene / Solr

When you have created a custom TokenizerFactory for Apache Lucene, you need to register it so that Lucene's SPI loader can find it.

Let's assume you have written a TokenizerFactory class like this:

package de.company.subdomain;

import org.apache.lucene.analysis.util.TokenizerFactory;

public class AwesomeTokenizerFactory extends TokenizerFactory {
  // Lot of awesome code
}

Lucene loads TokenizerFactory implementations through its AnalysisSPILoader, which looks a factory up by name. Hence, the code for calling your factory looks like this:

Tokenizer stream = tokenizerFactory("Awesome").create(newAttributeFactory());

And, before you ask: yes, Lucene looks the class up as "Awesome" and not as "AwesomeTokenizerFactory". The name is derived from the class name by dropping the "TokenizerFactory" suffix.
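To make the naming convention concrete, here is a minimal sketch of how such a lookup name can be derived. SpiNameSketch and spiName are hypothetical names made up for this illustration and are not part of Lucene. The lower-casing assumes that names are compared case-insensitively, so check the AnalysisSPILoader of your Lucene version if the exact behaviour matters to you.

```java
import java.util.Locale;

// Hypothetical helper, NOT part of Lucene: it only imitates the
// "strip the suffix from the class name" convention described above.
final class SpiNameSketch {

    /** Derives the lookup name from a factory's simple class name. */
    static String spiName(String simpleClassName, String suffix) {
        if (!simpleClassName.endsWith(suffix)
                || simpleClassName.length() == suffix.length()) {
            throw new IllegalArgumentException(
                "Class name must look like <Name>" + suffix + ": " + simpleClassName);
        }
        String name = simpleClassName.substring(
            0, simpleClassName.length() - suffix.length());
        // Assumption: lookups ignore case, so normalise to lower case.
        return name.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Prints the name under which the example factory would be found.
        System.out.println(spiName("AwesomeTokenizerFactory", "TokenizerFactory"));
    }
}
```

A class whose name does not end in "TokenizerFactory" has no derivable name under this convention, which is why the sketch rejects it.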

To register the class, you need to create the folder structure:

META-INF/services/

This folder must end up at the root of your jar. In a Maven project with default settings, place it under src/main/resources/, which is packaged into the jar automatically.

In the newly created "services" folder, create a file named "org.apache.lucene.analysis.util.TokenizerFactory".

Into this file, write the fully qualified name of your class (one name per line if you have multiple factories), which in the example above is "de.company.subdomain.AwesomeTokenizerFactory" (no quotation marks, only the text!).
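If you are unsure what the file may contain, the following sketch reads the content of such a registration file the way the ServiceLoader documentation describes the format: one class name per line, everything after a '#' is a comment, and blank lines are ignored. ServiceFileSketch and providerNames are hypothetical names for this illustration, and OtherTokenizerFactory is an invented second factory that only shows the multi-line case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it is not how Lucene itself is invoked.
final class ServiceFileSketch {

    /** Extracts the class names from the text of a META-INF/services file. */
    static List<String> providerNames(String fileContent) {
        List<String> names = new ArrayList<>();
        for (String line : fileContent.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop the comment part
            }
            line = line.trim();                 // surrounding whitespace is ignored
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String content =
              "# custom Lucene factories\n"
            + "de.company.subdomain.AwesomeTokenizerFactory\n"
            + "\n"
            + "de.company.subdomain.OtherTokenizerFactory  # invented second entry\n";
        System.out.println(providerNames(content));
    }
}
```

For the single factory from the example, the whole file is just the one line with the fully qualified class name.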

You are done! Build the jar and Lucene should find your factory.
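If the factory is still not found, it can help to check whether the registration file is visible on the classpath at all. The sketch below uses only the standard ClassLoader.getResources call. RegistrationCheckSketch and findServiceFiles are hypothetical names for this illustration. An empty result suggests that the META-INF/services folder did not make it into the jar.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper; uses only the JDK, no Lucene classes.
final class RegistrationCheckSketch {

    /** Lists every META-INF/services file for the given service type that the loader can see. */
    static List<URL> findServiceFiles(ClassLoader loader, String serviceType) {
        try {
            return Collections.list(
                loader.getResources("META-INF/services/" + serviceType));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<URL> files = findServiceFiles(
            ClassLoader.getSystemClassLoader(),
            "org.apache.lucene.analysis.util.TokenizerFactory");
        if (files.isEmpty()) {
            System.out.println("No registration file found on the classpath.");
        } else {
            // Lucene's own jars register factories too, so several hits are normal.
            files.forEach(System.out::println);
        }
    }
}
```

Run it with the same classpath as your application; your jar should appear among the printed locations.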

Reference

https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html
