Created October 10, 2010 11:39
osima/619176
Morphological analysis
// g100pon #97 Morphological analysis
//
// Uses the Japanese morphological analysis service of the Yahoo Web API
// (http://developer.yahoo.co.jp/webapi/jlp/ma/v1/parse.html).
// An application ID must be obtained before this script can be run.
// (Application IDs are available free of charge from Yahoo's Japanese morphological analysis page.)
//
// Usage:
//   groovy -c UTF-8 ma ApplicationID
//
@Grab(group='jdom', module='jdom', version='1.1')
import org.jdom.*
import org.jdom.input.*

if( args.length<1 ){
    println 'Usage : groovy -c UTF-8 ma ApplicationID'
    System.exit(1)  // non-zero exit status: the required argument is missing
}
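// Sentence to analyze (roughly: "I am a cat. I don't have a name yet.")
text = '我が輩は猫である。名前はまだない。'
appid = args[0]

// Build the request URL; both the application ID and the sentence are URL-encoded
url = 'http://jlp.yahooapis.jp/MAService/V1/parse?' +
      'appid=' + URLEncoder.encode(appid, 'UTF-8') +
      '&sentence=' + URLEncoder.encode(text, 'UTF-8') +
      '&response=surface,reading,pos' +   // fields requested for each word
      '&filter=' +
      '&results=ma'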
// Parse the XML response and print each word as surface|reading|part of speech
ns = Namespace.getNamespace("urn:yahoo:jp:jlp")
def doc = new SAXBuilder().build(new URL(url))
def words = doc.rootElement.getChild("ma_result", ns).getChild("word_list", ns).getChildren("word", ns)
words.each {
    def eSurface = it.getChild('surface', ns)
    def eReading = it.getChild('reading', ns)
    def ePos = it.getChild('pos', ns)
    // Skip any word that lacks one of the three fields
    if( eSurface && eReading && ePos ){
        println "${eSurface.text}|${eReading.text}|${ePos.text}"
    }
}
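The script builds its request URL by string concatenation. A minimal sketch of the same request as a reusable function follows, assuming the same endpoint and parameters as the script. The helper name `buildParseUrl` is hypothetical. It URL-encodes every parameter value, not just the sentence, so the commas in `response` are sent as `%2C`, which a standard query-string decoder should read back as the same value.

```groovy
// Hypothetical helper: builds the same MAService V1 parse request as the script,
// URL-encoding every parameter value instead of only the sentence.
String buildParseUrl(String appid, String sentence) {
    // A Groovy map literal is a LinkedHashMap, so parameters keep this order
    def params = [
        appid   : appid,
        sentence: sentence,
        response: 'surface,reading,pos',   // fields requested for each word
        filter  : '',                      // left empty, as in the original script
        results : 'ma'
    ]
    def query = params.collect { key, value ->
        key + '=' + URLEncoder.encode(value, 'UTF-8')
    }.join('&')
    'http://jlp.yahooapis.jp/MAService/V1/parse?' + query
}

println buildParseUrl('YOUR_APP_ID', '我が輩は猫である。')
```

Passing `new URL(buildParseUrl(appid, text))` to `SAXBuilder.build` in place of the concatenated string should send an equivalent request.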
Output (surface|reading|part of speech):
我が輩|わがはい|名詞
は|は|助詞
猫|ねこ|名詞
で|で|助動詞
ある|ある|助動詞
。|。|特殊
名前|なまえ|名詞
は|は|助詞
まだ|まだ|副詞
ない|ない|形容詞
。|。|特殊
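Because the script needs a live application ID, it can help to check the output formatting offline. The sketch below parses a hand-written response with the JDK's built-in namespace-aware DOM parser (swapped in for JDOM so no `@Grab` is needed) and prints `surface|reading|pos` lines the same way the script does. The sample XML is hypothetical: its element names and the `urn:yahoo:jp:jlp` namespace are taken from what the script reads, not from a captured response, and the `ResultSet` root element name is an assumption (the script never checks the root's name). `childText` and `formatWords` are illustrative names.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical response: element names and namespace mirror what the script reads;
// the ResultSet root name is an assumption.
def sampleXml = '''<?xml version="1.0" encoding="UTF-8"?>
<ResultSet xmlns="urn:yahoo:jp:jlp">
  <ma_result>
    <word_list>
      <word><surface>猫</surface><reading>ねこ</reading><pos>名詞</pos></word>
      <word><surface>で</surface><reading>で</reading><pos>助動詞</pos></word>
    </word_list>
  </ma_result>
</ResultSet>'''

// Text of the first descendant of parent with the given namespaced name, or null if absent
String childText(Element parent, String ns, String name) {
    def nodes = parent.getElementsByTagNameNS(ns, name)
    nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null
}

// Returns one "surface|reading|pos" line per <word>, skipping words missing any field
List<String> formatWords(String xml) {
    final String ns = 'urn:yahoo:jp:jlp'
    def factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true)   // required for getElementsByTagNameNS to match
    def builder = factory.newDocumentBuilder()
    def doc = builder.parse(new ByteArrayInputStream(xml.getBytes('UTF-8')))
    def words = doc.getElementsByTagNameNS(ns, 'word')
    List<String> lines = []
    for (int i = 0; i < words.getLength(); i++) {
        Element word = (Element) words.item(i)
        String surface = childText(word, ns, 'surface')
        String reading = childText(word, ns, 'reading')
        String pos = childText(word, ns, 'pos')
        if (surface != null && reading != null && pos != null) {
            lines << "${surface}|${reading}|${pos}".toString()
        }
    }
    lines
}

formatWords(sampleXml).each { println it }
```

As with the original script, run it with `groovy -c UTF-8` so the Japanese literals are decoded correctly.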