@yu-tang
Created June 5, 2015 16:47
test for OmegaT tokenizers
org.omegat.tokenizer.LuceneEnglishTokenizer
--------------------
0. "fooled"
1. "by"
2. "a"
3. "smile"
org.omegat.tokenizer.LuceneJapaneseTokenizer
--------------------
0. "fooled"
1. " "
2. "by"
3. " "
4. "a"
5. " "
6. "smile"
org.omegat.tokenizer.TinySegmenterJapaneseTokenizer
--------------------
0. "fooled"
1. " "
2. "by"
3. " "
4. "a"
5. " "
6. "smile"
Script return value:
[Ljava.lang.String;@7cecd10e
def text = "fooled by a smile"

// Print the tokenizer's class name, then every token it extracts
// from the sample text, one per line.
def dump = { tokenizer ->
    console.println "\n${tokenizer.class.name}\n${'-' * 20}"
    def tokenList = tokenizer.tokenizeWordsForSpelling(text)
    def words = org.omegat.util.Token.getTextsFromString(tokenList, text)
    words.eachWithIndex { word, index ->
        console.println "$index. \"$word\""
    }
}

dump new org.omegat.tokenizer.LuceneEnglishTokenizer()
dump new org.omegat.tokenizer.LuceneJapaneseTokenizer()
dump new org.omegat.tokenizer.TinySegmenterJapaneseTokenizer()
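An OmegaT Token refers to a span of the original string rather than carrying the text itself, which is why the script needs Token.getTextsFromString to turn the token list back into printable words. The Java sketch below illustrates that offset-based scheme with a hypothetical SimpleToken record and a textsFromString stand-in; it is not OmegaT's actual implementation, and the hard-coded spans are assumptions matching the English output above.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenDemo {
    // Hypothetical stand-in for org.omegat.util.Token: a token is just
    // an (offset, length) span into the original string.
    record SimpleToken(int offset, int length) {}

    // Sketch of the resolution step: map each span back to its substring.
    static List<String> textsFromString(List<SimpleToken> tokens, String text) {
        List<String> words = new ArrayList<>();
        for (SimpleToken t : tokens) {
            words.add(text.substring(t.offset(), t.offset() + t.length()));
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "fooled by a smile";
        // Spans an English word tokenizer might produce (whitespace skipped).
        List<SimpleToken> tokens = List.of(
                new SimpleToken(0, 6),   // "fooled"
                new SimpleToken(7, 2),   // "by"
                new SimpleToken(10, 1),  // "a"
                new SimpleToken(12, 5)); // "smile"
        List<String> words = textsFromString(tokens, text);
        for (int i = 0; i < words.size(); i++) {
            System.out.println(i + ". \"" + words.get(i) + "\"");
        }
    }
}
```

Keeping only spans also explains the Japanese tokenizers' output above: they emit the whitespace runs as tokens too, so the same resolution step yields " " entries between words.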