@yauhen-info
Last active January 2, 2016 07:08
Sample usage of the Illinois POS tagger. Two libraries are required: LBJPOS.jar and LBJ2Library.jar. http://cogcomp.cs.illinois.edu/page/software_view/3 http://cogcomp.cs.illinois.edu/page/software_view/11 http://en.wikipedia.org/wiki/Learning_Based_Java
package test;

import LBJ2.nlp.SentenceSplitter;
import LBJ2.nlp.WordSplitter;
import LBJ2.nlp.seg.PlainToTokenParser;
import LBJ2.nlp.seg.Token;
import edu.illinois.cs.cogcomp.lbj.pos.POSTagger;

public class EntryPointIllinoisTaggerTest {
    public static void main(String[] args) {
        // Split the plain-text file "test.txt" into sentences, then into words.
        SentenceSplitter sentenceSplitter = new SentenceSplitter("test.txt");
        WordSplitter wordSplitter = new WordSplitter(sentenceSplitter);
        // Wrap the word stream in a parser that returns one Token per call.
        PlainToTokenParser parser = new PlainToTokenParser(wordSplitter);
        POSTagger tagger = new POSTagger();

        // The loop ends when parser.next() returns null (input exhausted).
        Token token = (Token) parser.next();
        while (token != null) {
            // discreteValue() returns the tag predicted for this token.
            String tag = tagger.discreteValue(token);
            System.out.print(token.form + "(" + tag + ") ");
            token = (Token) parser.next();
        }
    }
}
Output looks like:
if(IN) any(DT) ,(,) know(VB) more(JJR) about(IN) Java(NNP) performance(NN) tuning(VBG) than(IN) Java(NNP) Champion(NNP) Kirk(NNP) Pepperdine(NNP) ,(,) currently(RB) a(DT) principal(JJ) consultant(NN) at(IN) Kodewerk(NNP) ,(,) a(DT) company(NN) that(WDT) offers(VBZ) performancerelated(VBN) services(NNS) ,(,) training(NN) ,(,) and(CC) custom(NN) tooling(NN) .(.) Pepperdine(NNP) ,(,) a(DT) leading(VBG) consultant(NN) who(WP) has(VBZ) shared(VBN) his(PRP$) insight(NN) on(IN) performance(NN) tuning(VBG) at(IN) conferences(NNS) throughout(IN) the(DT) world(NN) ,(,) is(VBZ) highly(RB) regarded(VBN) for(IN) his(PRP$) workshops(NNS) and(CC) articles(NNS) .(.) In(IN) a(DT) world(NN) of(IN) rapid(JJ) changes(NNS) in(IN) both(DT) hardware(NN) and(CC) software(NN) ,(,) his(PRP$) voice(NN) is(VBZ) well(RB) worth(JJ) hearing(NN) .(.)
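The `form(TAG)` output above is easy to read back into structured data if it needs further processing. The following is a minimal sketch of one way to do that, not part of the Illinois tagger or LBJ: the class `TaggedOutputParser`, its nested `TaggedWord` type, and the methods `parse` and `countTags` are hypothetical names introduced here for illustration. It assumes tokens are separated by whitespace and that the tag sits between the last `(` and a trailing `)`, as in the sample output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of LBJ): reads "form(TAG)" chunks back into pairs.
public class TaggedOutputParser {

    /** One word form together with its part-of-speech tag. */
    public static final class TaggedWord {
        public final String form;
        public final String tag;

        TaggedWord(String form, String tag) {
            this.form = form;
            this.tag = tag;
        }
    }

    /** Splits a line such as "his(PRP$) voice(NN) .(.)" into tagged words. */
    public static List<TaggedWord> parse(String line) {
        List<TaggedWord> result = new ArrayList<>();
        for (String chunk : line.trim().split("\\s+")) {
            // Split at the LAST '(' so that forms like "," in ",(,)" stay intact.
            int open = chunk.lastIndexOf('(');
            if (open <= 0 || !chunk.endsWith(")")) {
                continue; // Assumption: silently skip chunks not in form(TAG) shape.
            }
            String form = chunk.substring(0, open);
            String tag = chunk.substring(open + 1, chunk.length() - 1);
            result.add(new TaggedWord(form, tag));
        }
        return result;
    }

    /** Counts how often each tag occurs, in order of first appearance. */
    public static Map<String, Integer> countTags(List<TaggedWord> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (TaggedWord word : words) {
            counts.merge(word.tag, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Last sentence fragment of the sample output above.
        String sample = "his(PRP$) voice(NN) is(VBZ) well(RB) worth(JJ) hearing(NN) .(.)";
        List<TaggedWord> words = parse(sample);
        System.out.println(countTags(words));
    }
}
```

Splitting at the last `(` rather than the first is a deliberate choice so that punctuation tokens such as `,(,)` and `.(.)` parse cleanly; a token whose tag itself contained `(` would not be handled by this sketch.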