@karthikshiraly
Created December 19, 2011 04:54
Unit testing Solr 3.x tokenization
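import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.SnowballPorterFilterFactory;
import org.apache.solr.analysis.TokenFilterFactory;
import org.apache.solr.analysis.TokenizerFactory;
import org.apache.solr.analysis.WhitespaceTokenizerFactory;

// Class name is illustrative; the original snippet showed only main().
public class TokenizationTest {
    public static void main(String[] args) {
        try {
            StringReader inputText = new StringReader("RUNNING runnable");
            // luceneMatchVersion is passed to the factories that expect it in their init args.
            Map<String, String> tkargs = new HashMap<String, String>();
            tkargs.put("luceneMatchVersion", "LUCENE_33");
            // Tokenizer: split the input on whitespace.
            TokenizerFactory tkf = new WhitespaceTokenizerFactory();
            tkf.init(tkargs);
            Tokenizer tkz = tkf.create(inputText);
            // Filter 1: lowercase every token.
            LowerCaseFilterFactory lcf = new LowerCaseFilterFactory();
            lcf.init(tkargs);
            TokenStream lcts = lcf.create(tkz);
            // Filter 2: Snowball stemming with the English stemmer.
            TokenFilterFactory fcf = new SnowballPorterFilterFactory();
            Map<String, String> params = new HashMap<String, String>();
            params.put("language", "English");
            fcf.init(params);
            TokenStream ts = fcf.create(lcts);
            // addAttribute returns the typed attribute, so no cast is needed.
            CharTermAttribute termAttrib = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            // Print each term produced by the full chain.
            while (ts.incrementToken()) {
                System.out.println(termAttrib.toString());
            }
            ts.end();
            ts.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}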
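
The loop above only prints each term, so the output has to be checked by eye. A unit test would usually collect the terms into a list and compare them against expected values. The sketch below shows that pattern only: `TokenAssertSketch`, `analyze`, and `assertTokens` are hypothetical names, and `analyze` is a plain-JDK stand-in (whitespace split plus lowercasing, no stemming) rather than the Solr factory chain, so it runs without Solr on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TokenAssertSketch {
    // Hypothetical stand-in for the analysis chain: whitespace split + lowercase.
    // It does NOT call Solr or Lucene and performs no stemming.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Compare the produced tokens with the expectation and fail loudly on mismatch.
    public static void assertTokens(String text, List<String> expected) {
        List<String> actual = analyze(text);
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        assertTokens("RUNNING runnable", Arrays.asList("running", "runnable"));
        System.out.println("ok");
    }
}
```

In a real test, the body of `analyze` would be replaced by the factory chain, appending `termAttrib.toString()` to the list inside the `incrementToken()` loop, and the expected list would hold the stemmed terms.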