@voluntas
Last active March 12, 2024 07:08
Getting Started with Haystack + Elasticsearch + kuromoji

Updated

2013-09-28

Version

0.0.9

Author

@voluntas

URL

http://voluntas.github.io/

This is a supplement to the article "Getting Started with Django + Elasticsearch":

https://gist.github.com/voluntas/21759d5c45aacc0e6656/

TODO

Overview

Goals

  • Make Japanese full-text search easy to use from Haystack
  • Create an Elasticsearch backend for Haystack that supports Kuromoji

Environment

Python

2.7.5

Elasticsearch

0.90.5

redis

2.6.16

Setup

This assumes Elasticsearch 0.90.5 is already installed.

Install kuromoji

github

https://github.com/elasticsearch/elasticsearch-analysis-kuromoji

Installation takes a single command.

$ cd elasticsearch-0.90.5
$ bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/1.5.0
-> Installing elasticsearch/elasticsearch-analysis-kuromoji/1.5.0...
Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-analysis-kuromoji/elasticsearch-analysis-kuromoji-1.5.0.zip...
Downloading .......................................DONE
Installed elasticsearch/elasticsearch-analysis-kuromoji/1.5.0 into /Users/nakai/src/other/elasticsearch-0.90.5/plugins/analysis-kuromoji
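
The install log above reports the plugin being unpacked into plugins/analysis-kuromoji under the Elasticsearch directory. A minimal sketch for checking that from Python, assuming that directory layout; the helper name kuromoji_plugin_installed is made up for this example and is not part of Elasticsearch.

```python
import os


def kuromoji_plugin_installed(es_home):
    """Return True if <es_home>/plugins/analysis-kuromoji is a directory.

    The directory name is taken from the install log above; other plugin
    versions may use a different layout.
    """
    plugin_dir = os.path.join(es_home, "plugins", "analysis-kuromoji")
    return os.path.isdir(plugin_dir)


# Usage (path as in the commands above):
#   kuromoji_plugin_installed("elasticsearch-0.90.5")
```

This only confirms that the files are on disk. Elasticsearch loads plugins at startup, so a node that was already running needs a restart before it can use the plugin.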

Edit elasticsearch-0.90.5/config/elasticsearch.yml so that kuromoji is used

index.analysis.analyzer.default.type: custom
index.analysis.analyzer.default.tokenizer: kuromoji_tokenizer
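
To confirm that the default analyzer really tokenizes Japanese, the _analyze API can be called against an index. The following is a rough sketch using only the standard library. It assumes a node listening on http://127.0.0.1:9200 and an existing index named "haystack" (both are assumptions, not something this article sets up), and that the response contains a "tokens" list whose entries have a "token" key.

```python
import json

try:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import Request, urlopen
except ImportError:  # Python 2.7
    from urllib import urlencode
    from urllib2 import Request, urlopen


def analyze_url(index, analyzer=None, base_url="http://127.0.0.1:9200"):
    """Build the _analyze URL for an index.

    With analyzer=None no analyzer parameter is sent, so the index's
    default analyzer should be used.
    """
    url = "%s/%s/_analyze" % (base_url.rstrip("/"), index)
    if analyzer:
        url += "?" + urlencode({"analyzer": analyzer})
    return url


def analyze(text, index, analyzer=None, base_url="http://127.0.0.1:9200"):
    """POST text to _analyze and return the token strings (needs a running node)."""
    request = Request(analyze_url(index, analyzer, base_url),
                      data=text.encode("utf-8"))
    response = urlopen(request)
    try:
        body = json.loads(response.read().decode("utf-8"))
    finally:
        response.close()
    return [token["token"] for token in body["tokens"]]


# Usage, with the assumed index name:
#   analyze(u"関西国際空港", "haystack")
```

If the tokens come back as Japanese words rather than single characters or one unsplit string, the kuromoji tokenizer is most likely in effect.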

For the available settings, refer to the plugin source. Here is a URL pinned to a commit hash.

url

https://github.com/elasticsearch/elasticsearch-analysis-kuromoji/blob/fc23bfd8f2fc66b32bec0ab292c2cb9a50ef1783/src/test/java/org/elasticsearch/index/analysis/kuromoji_analysis.json

{
    "index":{
        "analysis":{
            "filter":{
                "kuromoji_rf":{
                    "type":"kuromoji_readingform",
                    "use_romaji" : "true"
                },
                "kuromoji_pos" : {
                    "type": "kuromoji_part_of_speech",
                    "enable_position_increment" : "false",
                    "stoptags" : ["# verb-main:", "動詞-自立"]
                },
                "kuromoji_ks" : {
                    "type": "kuromoji_stemmer",
                    "minimum_length" : 6
                }


            },

            "tokenizer" : {
                "kuromoji" : {
                   "type":"kuromoji_tokenizer"
                }

            },
            "analyzer" : {
                "kuromoji_analyzer" : {
                    "type" : "custom",
                    "tokenizer" : "kuromoji_tokenizer"
                }
            }

        }
    }
}
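
Note that the analyzer in the JSON above references only the tokenizer; the three filters are defined but not attached to it. As a sketch of how they might be attached, the following builds an index-creation body in Python. The analyzer name kuromoji_filtered and the helper names are made up for this example, and the choice of filters is only an illustration.

```python
# -*- coding: utf-8 -*-
import json

ANALYSIS = {
    "filter": {
        "kuromoji_pos": {
            "type": "kuromoji_part_of_speech",
            "stoptags": ["動詞-自立"],
        },
        "kuromoji_ks": {"type": "kuromoji_stemmer", "minimum_length": 6},
    },
    "analyzer": {
        # Hypothetical analyzer name, not part of the plugin's test file.
        "kuromoji_filtered": {
            "type": "custom",
            "tokenizer": "kuromoji_tokenizer",
            "filter": ["kuromoji_pos", "kuromoji_ks"],
        },
    },
}


def missing_filters(analysis, builtins=()):
    """Return filter names used by analyzers but neither defined nor in builtins."""
    known = set(analysis.get("filter", {})) | set(builtins)
    missing = set()
    for analyzer in analysis.get("analyzer", {}).values():
        for name in analyzer.get("filter", []):
            if name not in known:
                missing.add(name)
    return sorted(missing)


def index_body(analysis):
    """Serialize the analysis section as a create-index request body."""
    return json.dumps({"settings": {"index": {"analysis": analysis}}})
```

Running missing_filters before creating the index catches a misspelled filter name early. Built-in filters such as "lowercase" have to be passed through the builtins argument, since this sketch has no knowledge of them.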

KuromojiElasticBackend

Define a SETTINGS dict that adds the Kuromoji definitions to Haystack's default settings

from haystack.backends.elasticsearch_backend import (
    ElasticsearchSearchBackend,
    ElasticsearchSearchEngine,
)

class KuromojiElasticBackend(ElasticsearchSearchBackend):

    def __init__(self, connection_alias, **connection_options):
        super(KuromojiElasticBackend, self).__init__(
                                connection_alias, **connection_options)
        SETTINGS = {
            'settings': {
                "analysis": {
                    "analyzer": {
                        "ngram_analyzer": {
                            "type": "custom",
                            "tokenizer": "lowercase",
                            "filter": ["haystack_ngram"]
                        },
                        "edgengram_analyzer": {
                            "type": "custom",
                            "tokenizer": "lowercase",
                            "filter": ["haystack_edgengram"]
                        },
                        "kuromoji_analyzer" : {
                            "type" : "custom",
                            "tokenizer" : "kuromoji_tokenizer"
                        },
                    },
                    "tokenizer": {
                        "haystack_ngram_tokenizer": {
                            "type": "nGram",
                            "min_gram": 3,
                            "max_gram": 15,
                        },
                        "haystack_edgengram_tokenizer": {
                            "type": "edgeNGram",
                            "min_gram": 2,
                            "max_gram": 15,
                            "side": "front"
                        },
                        "kuromoji" : {
                           "type":"kuromoji_tokenizer"
                        },
                    },
                    "filter": {
                        "haystack_ngram": {
                            "type": "nGram",
                            "min_gram": 3,
                            "max_gram": 15
                        },
                        "haystack_edgengram": {
                            "type": "edgeNGram",
                            "min_gram": 5,
                            "max_gram": 15
                        },
                        "kuromoji_rf":{
                            "type":"kuromoji_readingform",
                            "use_romaji" : "true"
                        },
                        "kuromoji_pos" : {
                            "type": "kuromoji_part_of_speech",
                            "enable_position_increment" : "false",
                            "stoptags" : ["# verb-main:", "動詞-自立"]
                        },
                        "kuromoji_ks" : {
                            "type": "kuromoji_stemmer",
                            "minimum_length" : 6
                        },
                    }
                }
            }
        }
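        # Shadow the class-level DEFAULT_SETTINGS on this instance so that
        # the index is created with the kuromoji definitions above.
        self.DEFAULT_SETTINGS = SETTINGS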


class KuromojiElasticSearchEngine(ElasticsearchSearchEngine):
    backend = KuromojiElasticBackend


ELASTICSEARCH_DEFAULT_ANALYZER = "snowball"
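
The backend above registers kuromoji_analyzer in the index settings, but nothing in it assigns that analyzer to a field. Below is a sketch of the two remaining pieces under some assumptions: the dotted ENGINE path is hypothetical, and the mapping shape (one dict per field, text fields carrying "analyzer": "snowball") is modeled on what Haystack's build_schema appears to return, so check it against your Haystack version.

```python
import copy

# settings.py: "myapp.backends" is a hypothetical module path; point it at
# wherever KuromojiElasticSearchEngine is actually defined.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "myapp.backends.KuromojiElasticSearchEngine",
        "URL": "http://127.0.0.1:9200/",
        "INDEX_NAME": "haystack",
    },
}


def use_analyzer(mapping, analyzer="kuromoji_analyzer", replace="snowball"):
    """Return a copy of mapping with every `replace` analyzer swapped for `analyzer`.

    Fields with another analyzer (for example the ngram ones) or with no
    analyzer at all are left untouched.
    """
    result = copy.deepcopy(mapping)
    for field_mapping in result.values():
        if field_mapping.get("analyzer") == replace:
            field_mapping["analyzer"] = analyzer
    return result


# Possible use inside KuromojiElasticBackend (not run here, needs Haystack):
#
#     def build_schema(self, fields):
#         content_field_name, mapping = super(
#             KuromojiElasticBackend, self).build_schema(fields)
#         return content_field_name, use_analyzer(mapping)
```

Swapping only the "snowball" entries keeps the ngram and edge-ngram fields on their own analyzers. Existing indexes keep their old mapping, so the index has to be rebuilt after a change like this.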

Extras

Install elasticsearch-head

github

https://github.com/mobz/elasticsearch-head

url

http://mobz.github.io/elasticsearch-head/

A plugin for inspecting an Elasticsearch cluster from a web UI. It can be installed as an Elasticsearch plugin.

$ bin/plugin -install mobz/elasticsearch-head
$ open http://127.0.0.1:9200/_plugin/head/

References

elasticsearch/elasticsearch-py

https://github.com/elasticsearch/elasticsearch-py

Python Elasticsearch Client — Elasticsearch 0.4.1 documentation

http://elasticsearch-py.readthedocs.org/en/latest/

Stretching Haystack's ElasticSearch Backend — The Wellfire Blog

http://www.wellfireinteractive.com/blog/custom-haystack-elasticsearch-backend/

Using kuromoji with ElasticSearch (ES 0.90.Beta1 + kuromoji 1.2.0 edition) - Qiita

http://qiita.com/hotchpotch/items/134b049a59fe396c9475

How to use Kuromoji with elasticsearch - akishin999's diary

http://d.hatena.ne.jp/akishin999/20130307/1362611100

Japanese full-text search with elasticsearch and the kuromoji plugin - yuhei.kagaya

http://yuheikagaya.hatenablog.jp/entry/2013/08/06/012150

elasticsearch-head, a very handy GUI for elasticsearch - yuhei.kagaya

http://yuheikagaya.hatenablog.jp/entry/2013/07/14/185752

elasticsearch - EdgeNgramField min and max letters in django haystack - Stack Overflow

http://stackoverflow.com/questions/18908131/edgengramfield-min-and-max-letters-in-django-haystack
