Register a Custom TokenizerFactory in Apache Lucene / Solr

Once you have created a custom TokenizerFactory for Apache Lucene, you need to register it so that the SPI loader can find it.

Let's assume you have written a TokenizerFactory class like this:

package de.company.subdomain;

import org.apache.lucene.analysis.util.TokenizerFactory;

public class AwesomeTokenizerFactory extends TokenizerFactory {
  // Lot of awesome code
}

TokenizerFactories in Lucene are looked up through the AnalysisSPILoader, which finds a factory by its name. Hence, the code for calling your factory looks like this:

Tokenizer stream = tokenizerFactory("Awesome").create(newAttributeFactory());

And, before you ask: yes, Lucene knows the factory under the name "Awesome" and not "AwesomeTokenizerFactory". The "TokenizerFactory" suffix is not part of the lookup name.
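The naming convention can be illustrated with a small sketch. This is not Lucene's own code: `SpiNameSketch` and `spiName` are hypothetical names, and the lower-casing is an assumption that lookups are case-insensitive.

```java
import java.util.Locale;

final class SpiNameSketch {
    // Hypothetical helper: strip the factory suffix from a simple class name
    // and lower-case the remainder to get a case-insensitive lookup key.
    static String spiName(String simpleClassName, String suffix) {
        if (!simpleClassName.endsWith(suffix)) {
            throw new IllegalArgumentException(
                simpleClassName + " does not end with " + suffix);
        }
        String stem = simpleClassName.substring(
            0, simpleClassName.length() - suffix.length());
        return stem.toLowerCase(Locale.ROOT);
    }
}
```

Under these assumptions, `SpiNameSketch.spiName("AwesomeTokenizerFactory", "TokenizerFactory")` yields `awesome`, which is the key that a request for "Awesome" would be matched against.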

To register the class, you need to create the folder structure:

META-INF/services/

This folder must end up at the root of your jar. In Maven with default settings, create it under src/main/resources/ (that is, src/main/resources/META-INF/services/); everything in that folder is packaged automatically.

In the newly created "services" folder, create a file named "org.apache.lucene.analysis.util.TokenizerFactory". The file name is the fully qualified name of the base class your factory extends.

Into this file, write the fully qualified name of your class (one per line if you have multiple factories). For the example above, that is "de.company.subdomain.AwesomeTokenizerFactory" (no quotation marks, only the text!).
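As a sketch of the resulting layout, the following helper writes such a provider file. `ProviderFileSketch` and its methods are hypothetical names used only for illustration; in practice you would normally create the file by hand.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class ProviderFileSketch {
    // The file is named after the service base class and lives in META-INF/services/.
    static Path providerFile(Path resourcesRoot, String serviceClassName) {
        return resourcesRoot.resolve("META-INF").resolve("services").resolve(serviceClassName);
    }

    // Writes one fully qualified implementation class name per line.
    static Path write(Path resourcesRoot, String serviceClassName, List<String> implementations) {
        Path file = providerFile(resourcesRoot, serviceClassName);
        try {
            Files.createDirectories(file.getParent());
            Files.write(file, implementations, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }
}
```

Called with a resources root of src/main/resources, the service name "org.apache.lucene.analysis.util.TokenizerFactory" and a list containing "de.company.subdomain.AwesomeTokenizerFactory", `write` would produce a one-line file at src/main/resources/META-INF/services/org.apache.lucene.analysis.util.TokenizerFactory.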

You are done! Rebuild your jar and Lucene should be able to find the factory.
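If the factory is not found at runtime, it can help to check which provider files are actually visible on the classpath. The sketch below uses only the JDK. `ProviderFileLister` and its methods are hypothetical names, and the comment handling assumes the provider-configuration format described in the ServiceLoader documentation, where everything after a '#' on a line is ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

final class ProviderFileLister {
    // Drops a trailing '#' comment and surrounding whitespace from one line.
    static String parseLine(String line) {
        int hash = line.indexOf('#');
        return (hash >= 0 ? line.substring(0, hash) : line).trim();
    }

    // Collects every class name registered for the given service
    // in any META-INF/services file visible to the class loader.
    static List<String> registeredProviders(String serviceClassName, ClassLoader loader) {
        List<String> names = new ArrayList<>();
        try {
            Enumeration<URL> files =
                loader.getResources("META-INF/services/" + serviceClassName);
            while (files.hasMoreElements()) {
                try (BufferedReader in = new BufferedReader(new InputStreamReader(
                        files.nextElement().openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        String name = parseLine(line);
                        if (!name.isEmpty()) {
                            names.add(name);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

If registration worked, calling `registeredProviders` with "org.apache.lucene.analysis.util.TokenizerFactory" and the class loader that loads your jar should return a list that includes "de.company.subdomain.AwesomeTokenizerFactory". An empty list or a missing entry usually points to a misplaced or misnamed file.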

Reference

https://docs.oracle.com/javase/7/docs/api/java/util/ServiceLoader.html
