@karthikshiraly
Created December 19, 2011 04:51
Unit testing Solr 1.4 tokenization
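import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.solr.analysis.LowerCaseFilterFactory;
import org.apache.solr.analysis.SnowballPorterFilterFactory;
import org.apache.solr.analysis.TokenFilterFactory;
import org.apache.solr.analysis.TokenizerFactory;
import org.apache.solr.analysis.WhitespaceTokenizerFactory;

// Wrapper class added so the snippet compiles; the name is a placeholder.
public class TokenizationCheck {
    public static void main(String[] args) {
        try {
            // Text to analyze is taken from the first command-line argument.
            StringReader inputText = new StringReader(args[0]);
            // Stage 1: split the input on whitespace.
            TokenizerFactory tkf = new WhitespaceTokenizerFactory();
            Tokenizer tkz = tkf.create(inputText);
            // Stage 2: lowercase each token.
            LowerCaseFilterFactory lcf = new LowerCaseFilterFactory();
            TokenStream lcts = lcf.create(tkz);
            // Stage 3: Snowball stemming; the factory is configured through an init map.
            TokenFilterFactory fcf = new SnowballPorterFilterFactory();
            Map<String, String> params = new HashMap<String, String>();
            params.put("language", "English");
            fcf.init(params);
            TokenStream ts = fcf.create(lcts);
            // Print every term produced by the full chain.
            TermAttribute termAttrib = (TermAttribute) ts.getAttribute(TermAttribute.class);
            while (ts.incrementToken()) {
                System.out.println(termAttrib.term());
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1); // report failure instead of exiting with status 0
        }
        System.exit(0);
    }
}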
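The snippet above prints the analyzed terms, which leaves the checking to the reader. To turn it into an actual test, the terms can be collected into a list and compared with an expected list. The following is only a minimal sketch of that idea in plain Java, with no Solr or Lucene on the classpath: `AnalysisChainSketch`, `whitespaceTokens`, `lowerCase`, `analyze` and `assertTerms` are hypothetical names, and the two stages are simple stand-ins for `WhitespaceTokenizerFactory` and `LowerCaseFilterFactory`, not the Solr implementations. Stemming is left out on purpose because it needs the Snowball classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical, dependency-free stand-in for the analysis chain; not Solr code.
final class AnalysisChainSketch {

    // Stand-in for the whitespace tokenizer: split on runs of whitespace.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stand-in for the lowercase filter: lowercase every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> result = new ArrayList<String>();
        for (String token : tokens) {
            result.add(token.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    // Chain the stages in the same order as the snippet above (minus stemming).
    static List<String> analyze(String text) {
        return lowerCase(whitespaceTokens(text));
    }

    // Compare the produced terms with the expected ones and fail loudly on mismatch.
    static void assertTerms(String input, String... expected) {
        List<String> actual = analyze(input);
        if (!actual.equals(Arrays.asList(expected))) {
            throw new AssertionError("expected " + Arrays.asList(expected) + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        assertTerms("Running  FAST Cars", "running", "fast", "cars");
        System.out.println("all term checks passed");
    }
}
```

With the real factories, the same pattern would apply by adding each `termAttrib.term()` to a list inside the `incrementToken()` loop. The expected terms would then be stems, and they are best taken from observed output of the chain rather than guessed.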