@johtani
Last active September 11, 2018 13:10
Example using Kuromoji readingform and NGram
DELETE kuromoji_sample
PUT kuromoji_sample
{
  "settings": {
    "analysis": {
      "analyzer": {
        "hoge": {
          "type": "custom",
          "tokenizer": "kuromoji_tokenizer",
          "filter": ["reading", "ngram_sample"]
        }
      },
      "filter": {
        "ngram_sample": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 2
        },
        "reading": {
          "type": "kuromoji_readingform",
          "use_romaji": false
        }
      }
    }
  },
  "mappings": {
    "sample_type": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "hoge"
        }
      }
    }
  }
}
POST kuromoji_sample/_extended_analyze?field=text
{"黒文字のNGram"}
Example Response:
{
  "tokens": [
    {
      "token": "クロ",
      "start_offset": 2,
      "end_offset": 5,
      "type": "word",
      "position": 1
    },
    {
      "token": "ロモ",
      "start_offset": 2,
      "end_offset": 5,
      "type": "word",
      "position": 1
    },
    {
      "token": "モジ",
      "start_offset": 2,
      "end_offset": 5,
      "type": "word",
      "position": 1
    },
    {
      "token": "NG",
      "start_offset": 6,
      "end_offset": 11,
      "type": "word",
      "position": 3
    },
    {
      "token": "Gr",
      "start_offset": 6,
      "end_offset": 11,
      "type": "word",
      "position": 3
    },
    {
      "token": "ra",
      "start_offset": 6,
      "end_offset": 11,
      "type": "word",
      "position": 3
    },
    {
      "token": "am",
      "start_offset": 6,
      "end_offset": 11,
      "type": "word",
      "position": 3
    }
  ]
}
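
To see where these tokens and offsets could come from, here is a rough Python sketch that imitates the two filters outside Elasticsearch. It does not call Kuromoji: the readings list is hard-coded by hand as an assumption about what kuromoji_tokenizer plus kuromoji_readingform emit for this input, and char_ngrams is a hypothetical helper, not the Lucene implementation. It also assumes the request body was analyzed as raw text, braces and quotes included, which would explain why the first token starts at offset 2.

```python
def char_ngrams(term, min_gram=2, max_gram=2):
    """Return the character n-grams of term, walking start positions left to right."""
    grams = []
    for start in range(len(term)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(term):
                grams.append(term[start:start + size])
    return grams


# Assumed per-token output of the "reading" filter (hand-written, not computed):
# 黒文字 -> クロモジ, の -> ノ, NGram -> NGram (no reading, left unchanged).
readings = ["クロモジ", "ノ", "NGram"]

# With min_gram = max_gram = 2, the one-character reading "ノ" yields no grams,
# which would account for the missing position 2 in the response.
tokens = [gram for reading in readings for gram in char_ngrams(reading)]
print(tokens)

# Offsets point into the original text, not into the readings.
raw = '{"黒文字のNGram"}'
for surface in ("黒文字", "NGram"):
    start = raw.find(surface)
    print(surface, start, start + len(surface))
```

Under these assumptions the sketch prints the same seven bigrams as the response, with 黒文字 at offsets 2 to 5 and NGram at offsets 6 to 11. Every bigram keeps the offsets of the token it was cut from, which is why all grams of one word share the same start_offset and end_offset.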