@wannaphong
Last active March 21, 2018 13:37
Thai word segmentation code using ICU in Java. Works on Java 1.4 and later. Credit to the original: http://vuthi.blogspot.com.au/2004/08/java.html
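// Credit: original from http://vuthi.blogspot.com.au/2004/08/java.html
// The imports below assume the JDK's java.text.BreakIterator;
// with ICU4J, import com.ibm.icu.text.BreakIterator instead.
import java.text.BreakIterator;
import java.util.Locale;

// Place this method inside a class.
// Returns txt with "|" appended after every segment the Thai word iterator finds.
public String icu_word_segmentation(String txt) {
    Locale thaiLocale = new Locale("th");
    BreakIterator boundary = BreakIterator.getWordInstance(thaiLocale);
    boundary.setText(txt);
    // StringBuffer (not StringBuilder) keeps the code compatible with Java 1.4.
    StringBuffer strout = new StringBuffer();
    int start = boundary.first();
    // Walk consecutive boundary pairs [start, end) until the iterator is exhausted.
    for (int end = boundary.next();
         end != BreakIterator.DONE;
         start = end, end = boundary.next()) {
        strout.append(txt.substring(start, end)).append("|");
    }
    return strout.toString();
}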
@wannaphong
Author

ทดสอบระบบภาษาไทย
ทดสอบ|ระบบ|ภาษา|ไทย|
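
The output above ends with a trailing "|", and the method also emits whitespace runs as segments of their own. For callers who would rather work with a list of words, here is a minimal sketch of a variant. The class name `ThaiSegmenter` and the method `segment` are hypothetical and not part of the original gist. It assumes the JDK's `java.text.BreakIterator` and needs Java 8 or later (generics, `String.join`), unlike the Java 1.4-compatible original. Whether Thai text is actually split depends on the Thai locale data being available in the runtime.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ThaiSegmenter {
    // Hypothetical helper: collects the segments into a list and
    // skips segments that consist only of whitespace.
    static List<String> segment(String txt, Locale locale) {
        BreakIterator boundary = BreakIterator.getWordInstance(locale);
        boundary.setText(txt);
        List<String> words = new ArrayList<>();
        int start = boundary.first();
        for (int end = boundary.next();
             end != BreakIterator.DONE;
             start = end, end = boundary.next()) {
            String piece = txt.substring(start, end);
            if (!piece.trim().isEmpty()) {
                words.add(piece);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Same test sentence as in the comment above; joining the list
        // afterwards avoids the trailing separator.
        List<String> words = segment("ทดสอบระบบภาษาไทย", Locale.forLanguageTag("th"));
        System.out.println(String.join("|", words));
    }
}
```

Returning a list leaves the choice of separator to the caller and makes it easy to count or filter the words afterwards.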
