Haystack + Elasticsearch + kuromoji: Getting Started

Updated:2013-09-28
Version:0.0.9
Author:@voluntas
URL:http://voluntas.github.io/

This is a supplement to the article "Django + Elasticsearch: Getting Started":

https://gist.github.com/voluntas/21759d5c45aacc0e6656/

TODO

Overview

Goals

  • Make Japanese full-text search easy to use from Haystack
  • Create a Kuromoji-enabled Elasticsearch backend for Haystack

Environment

Python:2.7.5
Elasticsearch:0.90.5
redis:2.6.16

Setup

This assumes Elasticsearch 0.90.5 is already installed.

Install kuromoji

github:https://github.com/elasticsearch/elasticsearch-analysis-kuromoji

Installation takes a single command.

$ cd elasticsearch-0.90.5
$ bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/1.5.0
-> Installing elasticsearch/elasticsearch-analysis-kuromoji/1.5.0...
Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-analysis-kuromoji/elasticsearch-analysis-kuromoji-1.5.0.zip...
Downloading .......................................DONE
Installed elasticsearch/elasticsearch-analysis-kuromoji/1.5.0 into /Users/nakai/src/other/elasticsearch-0.90.5/plugins/analysis-kuromoji

Edit elasticsearch-0.90.5/config/elasticsearch.yml so that kuromoji is used:

index.analysis.analyzer.default.type: custom
index.analysis.analyzer.default.tokenizer: kuromoji_tokenizer
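
One way to confirm that the plugin is loaded is to ask the analyze API to tokenize a Japanese string. The following is a minimal sketch, assuming Elasticsearch listens on http://127.0.0.1:9200 and that the 0.90-style `_analyze` endpoint accepts a `tokenizer` query parameter with the text in the request body. The helper names `analyze_url` and `analyze` are made up for this example.

```python
# -*- coding: utf-8 -*-
import json

try:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:  # Python 2.7
    from urllib import urlencode
    from urllib2 import urlopen


def analyze_url(base_url, tokenizer):
    """Build the _analyze URL for the given tokenizer (hypothetical helper)."""
    query = urlencode({"tokenizer": tokenizer})
    return "%s/_analyze?%s" % (base_url.rstrip("/"), query)


def analyze(text, base_url="http://127.0.0.1:9200",
            tokenizer="kuromoji_tokenizer"):
    """Send a unicode string to the analyze API and return the token strings.

    Requires a running Elasticsearch node with the kuromoji plugin installed.
    """
    response = urlopen(analyze_url(base_url, tokenizer),
                       data=text.encode("utf-8"))
    body = json.loads(response.read().decode("utf-8"))
    return [token["token"] for token in body["tokens"]]
```

Calling `analyze(u"関西国際空港")` against a live node should return the tokens produced by `kuromoji_tokenizer`. If the plugin is missing, Elasticsearch is expected to answer with an error instead.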

For the available settings, refer to the plugin source. The URL below is pinned to a commit hash.

url:https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/blob/fc23bfd8f2fc66b32bec0ab292c2cb9a50ef1783/src/test/java/org/elasticsearch/index/analysis/kuromoji_analysis.json
{
    "index":{
        "analysis":{
            "filter":{
                "kuromoji_rf":{
                    "type":"kuromoji_readingform",
                    "use_romaji" : "true"
                },
                "kuromoji_pos" : {
                    "type": "kuromoji_part_of_speech",
                    "enable_position_increment" : "false",
                    "stoptags" : ["# verb-main:", "動詞-自立"]
                },
                "kuromoji_ks" : {
                    "type": "kuromoji_stemmer",
                    "minimum_length" : 6
                }


            },

            "tokenizer" : {
                "kuromoji" : {
                   "type":"kuromoji_tokenizer"
                }

            },
            "analyzer" : {
                "kuromoji_analyzer" : {
                    "type" : "custom",
                    "tokenizer" : "kuromoji_tokenizer"
                }
            }

        }
    }
}
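
The JSON above defines three filters, but the `kuromoji_analyzer` it declares uses only the tokenizer. As a hedged illustration of how such filters could be attached to a custom analyzer, the sketch below builds an index settings body in Python. The helper name `kuromoji_settings` and the choice of which filters to enable are assumptions made for this example, and only two of the three filters are reproduced.

```python
def kuromoji_settings(filter_names=()):
    """Return index settings with a custom analyzer on kuromoji_tokenizer.

    filter_names must refer to filters defined in the "filter" section below;
    this helper is hypothetical and not part of the plugin.
    """
    filters = {
        "kuromoji_rf": {"type": "kuromoji_readingform", "use_romaji": "true"},
        "kuromoji_ks": {"type": "kuromoji_stemmer", "minimum_length": 6},
    }
    unknown = [name for name in filter_names if name not in filters]
    if unknown:
        raise ValueError("undefined filters: %s" % ", ".join(unknown))
    return {
        "index": {
            "analysis": {
                "filter": filters,
                "analyzer": {
                    "kuromoji_analyzer": {
                        "type": "custom",
                        "tokenizer": "kuromoji_tokenizer",
                        # Token filters run in the order listed here.
                        "filter": list(filter_names),
                    }
                },
            }
        }
    }
```

For example, `kuromoji_settings(["kuromoji_ks"])` yields a settings dictionary whose analyzer applies the stemmer filter after tokenization. It can be serialized with `json.dumps` when creating an index.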

KuromojiElasticBackend

Define a backend whose SETTINGS add the Kuromoji analyzer, tokenizer, and filters:

from haystack.backends.elasticsearch_backend import (
    ElasticsearchSearchBackend,
    ElasticsearchSearchEngine,
)

class KuromojiElasticBackend(ElasticsearchSearchBackend):

    def __init__(self, connection_alias, **connection_options):
        super(KuromojiElasticBackend, self).__init__(
                                connection_alias, **connection_options)
        SETTINGS = {
            'settings': {
                "analysis": {
                    "analyzer": {
                        "ngram_analyzer": {
                            "type": "custom",
                            "tokenizer": "lowercase",
                            "filter": ["haystack_ngram"]
                        },
                        "edgengram_analyzer": {
                            "type": "custom",
                            "tokenizer": "lowercase",
                            "filter": ["haystack_edgengram"]
                        },
                        "kuromoji_analyzer" : {
                            "type" : "custom",
                            "tokenizer" : "kuromoji_tokenizer"
                        },
                    },
                    "tokenizer": {
                        "haystack_ngram_tokenizer": {
                            "type": "nGram",
                            "min_gram": 3,
                            "max_gram": 15,
                        },
                        "haystack_edgengram_tokenizer": {
                            "type": "edgeNGram",
                            "min_gram": 2,
                            "max_gram": 15,
                            "side": "front"
                        },
                        "kuromoji" : {
                           "type":"kuromoji_tokenizer"
                        },
                    },
                    "filter": {
                        "haystack_ngram": {
                            "type": "nGram",
                            "min_gram": 3,
                            "max_gram": 15
                        },
                        "haystack_edgengram": {
                            "type": "edgeNGram",
                            "min_gram": 5,
                            "max_gram": 15
                        },
                        "kuromoji_rf":{
                            "type":"kuromoji_readingform",
                            "use_romaji" : "true"
                        },
                        "kuromoji_pos" : {
                            "type": "kuromoji_part_of_speech",
                            "enable_position_increment" : "false",
                            "stoptags" : ["# verb-main:", "動詞-自立"]
                        },
                        "kuromoji_ks" : {
                            "type": "kuromoji_stemmer",
                            "minimum_length" : 6
                        },
                    }
                }
            }
        }
        # Override the settings Haystack uses when it creates the index.
        self.DEFAULT_SETTINGS = SETTINGS


class KuromojiElasticSearchEngine(ElasticsearchSearchEngine):
    backend = KuromojiElasticBackend
ELASTICSEARCH_DEFAULT_ANALYZER = "snowball"
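
For Haystack to use this engine, the connection settings have to point at it. A minimal sketch of the Django settings follows, assuming the classes above live in a module importable as `myproject.backends` (a hypothetical path) and that Elasticsearch runs locally. The index name `haystack` is likewise only an example.

```python
# settings.py (sketch)
HAYSTACK_CONNECTIONS = {
    'default': {
        # Dotted path to the engine class defined above; the module name
        # "myproject.backends" is an assumption made for this example.
        'ENGINE': 'myproject.backends.KuromojiElasticSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
```

Analysis settings are applied when the index is created, so an existing index generally has to be recreated, for example with Haystack's `rebuild_index` management command.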

Bonus

Install elasticsearch-head

github:https://github.com/mobz/elasticsearch-head
url:http://mobz.github.io/elasticsearch-head/

A plugin for inspecting an Elasticsearch cluster from a web UI. It can be installed as an Elasticsearch plugin.

$ bin/plugin -install mobz/elasticsearch-head
$ open http://127.0.0.1:9200/_plugin/head/

References

elasticsearch/elasticsearch-py
https://github.com/elasticsearch/elasticsearch-py
Python Elasticsearch Client — Elasticsearch 0.4.1 documentation
http://elasticsearch-py.readthedocs.org/en/latest/
Stretching Haystack's ElasticSearch Backend — The Wellfire Blog
http://www.wellfireinteractive.com/blog/custom-haystack-elasticsearch-backend/
ElasticSearch で kuromoji を使う (ES 0.90.Beta1 + kuromoji 1.2.0篇) - Qiita [キータ]
http://qiita.com/hotchpotch/items/134b049a59fe396c9475
elasticsearch での Kuromoji の使い方 - akishin999の日記
http://d.hatena.ne.jp/akishin999/20130307/1362611100
elasticsearchとkuromojiプラグインで日本語の全文検索 - yuhei.kagaya
http://yuheikagaya.hatenablog.jp/entry/2013/08/06/012150
elasticsearchのGUI「elasticsearch-head」がとても便利 - yuhei.kagaya
http://yuheikagaya.hatenablog.jp/entry/2013/07/14/185752
elasticsearch - EdgeNgramField min and max letters in django haystack - Stack Overflow
http://stackoverflow.com/questions/18908131/edgengramfield-min-and-max-letters-in-django-haystack