@voluntas
Last active June 30, 2023 08:28
Django + Haystack + Elasticsearch: Getting Started

Updated: 2013-09-22
Version: 0.0.2
Author: @voluntas
URL: http://voluntas.github.io/

Environment

  • Python 2.7.5
  • Elasticsearch 0.90.5
  • Redis 2.6.16

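Overview

  • Add full-text search to a Django application
  • Use Elasticsearch as the full-text search engine
  • Use Haystack as Django's search framework
  • Update the search index asynchronously through a Celery task queue
  • Use Redis as the queue (Celery broker)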
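To make the flow above concrete, here is a toy, in-process model of it in Python. It is only an illustration, not how celery_haystack is implemented: a dict stands in for the Elasticsearch index, a queue.Queue for the Redis broker, and a background thread for the Celery worker. The function names (on_post_save, on_post_delete, start_worker) are hypothetical.

```python
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2.7, the version used in this note

# Toy stand-ins (assumptions for illustration only):
#   search_index -> Elasticsearch, task_queue -> Redis broker,
#   worker thread -> Celery worker process.
search_index = {}
task_queue = queue.Queue()


def on_post_save(note_id, text):
    """Plays the signal processor's role: enqueue, don't index inline."""
    task_queue.put(("update", note_id, text))


def on_post_delete(note_id):
    task_queue.put(("delete", note_id, None))


def _worker():
    """Plays the Celery worker's role: apply queued updates one by one."""
    while True:
        action, note_id, text = task_queue.get()
        if action == "update":
            search_index[note_id] = text
        elif action == "delete":
            search_index.pop(note_id, None)
        task_queue.task_done()


def start_worker():
    thread = threading.Thread(target=_worker)
    thread.daemon = True  # don't keep the process alive on exit
    thread.start()
    return thread
```

For example, after start_worker(), on_post_save(1, "hello haystack"), on_post_save(2, "hello celery"), on_post_delete(1) and task_queue.join(), search_index holds only {2: "hello celery"}. The point is that a save only enqueues work; the index catches up once the worker has drained the queue, so search results can lag slightly behind the database.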

TODO

  • Check whether it can sustain an update load of roughly 10,000 updates per second
  • Indexing of foreign keys
  • Japanese full-text search

Elasticsearch

URL: http://www.elasticsearch.org/

Downloading it and getting it running is very easy:

$ curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.tar.gz
$ tar xvfz elasticsearch-0.90.5.tar.gz
$ cd elasticsearch-0.90.5
$ ./bin/elasticsearch -f
[2013-09-22 22:36:42,191][INFO ][node                     ] [The Night Man] version[0.90.5], pid[22901], build[c8714e8/2013-09-17T12:50:20Z]
[2013-09-22 22:36:42,192][INFO ][node                     ] [The Night Man] initializing ...
[2013-09-22 22:36:42,199][INFO ][plugins                  ] [The Night Man] loaded [], sites []
[2013-09-22 22:36:44,092][INFO ][node                     ] [The Night Man] initialized
[2013-09-22 22:36:44,092][INFO ][node                     ] [The Night Man] starting ...
[2013-09-22 22:36:44,181][INFO ][transport                ] [The Night Man] bound_address {inet[/0:0:0:0:0:0:0:0%0:9300]}, publish_address {inet[/192.0.2.1:9300]}
[2013-09-22 22:36:47,243][INFO ][cluster.service          ] [The Night Man] new_master [The Night Man][gjXA_KlvQ7aGmg2zoiRmPQ][inet[/192.0.2.1:9300]], reason: zen-disco-join (elected_as_master)
[2013-09-22 22:36:47,277][INFO ][discovery                ] [The Night Man] elasticsearch/gjXA_KlvQ7aGmg2zoiRmPQ
[2013-09-22 22:36:47,288][INFO ][http                     ] [The Night Man] bound_address {inet[/0:0:0:0:0:0:0:0%0:9200]}, publish_address {inet[/192.0.2.1:9200]}
[2013-09-22 22:36:47,289][INFO ][node                     ] [The Night Man] started
[2013-09-22 22:36:47,307][INFO ][gateway                  ] [The Night Man] recovered [0] indices into cluster_state
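Once the node reports started, you can confirm from Python that it answers on port 9200. This is a minimal sketch using only the standard library; it assumes the node runs at http://127.0.0.1:9200/ and that GET / returns JSON with a version.number field, as 0.90.x does. es_version and parse_es_version are hypothetical helper names.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7


def parse_es_version(body):
    """Extract version.number from the JSON returned by GET / (assumed layout)."""
    info = json.loads(body)
    return info["version"]["number"]


def es_version(url="http://127.0.0.1:9200/", timeout=3):
    """Ask the node at `url` for its version; raises if it is not reachable."""
    response = urlopen(url, timeout=timeout)
    try:
        return parse_es_version(response.read().decode("utf-8"))
    finally:
        response.close()
```

Against the node started above, es_version() should return '0.90.5'.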

Redis

URL: http://redis.io/

Downloading, building, testing and getting it running is also very easy:

$ curl -O http://download.redis.io/releases/redis-2.6.16.tar.gz
$ tar xvfz redis-2.6.16.tar.gz
$ cd redis-2.6.16
$ make
$ make test
$ ./src/redis-server
[22722] 22 Sep 22:25:10.242 # Warning: no config file specified, using the default config. In order to specify a config file use ./src/redis-server /path/to/redis.conf
[22722] 22 Sep 22:25:10.243 * Max number of open files set to 10032
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 2.6.16 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 22722
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

[22722] 22 Sep 22:25:10.244 # Server started, Redis version 2.6.16
[22722] 22 Sep 22:25:10.244 * The server is now ready to accept connections on port 6379
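To check that Redis accepts connections, the sketch below sends PING using the Redis wire protocol (RESP) over a plain socket, so it needs only the standard library. It assumes the default 127.0.0.1:6379 shown in the log above; encode_command and redis_ping are hypothetical helpers.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [("*%d\r\n" % len(args)).encode("ascii")]
    for arg in args:
        data = arg if isinstance(arg, bytes) else arg.encode("utf-8")
        parts.append(("$%d\r\n" % len(data)).encode("ascii") + data + b"\r\n")
    return b"".join(parts)


def redis_ping(host="127.0.0.1", port=6379, timeout=3):
    """Return True if the server answers PING with +PONG."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(encode_command("PING"))
        reply = sock.recv(64)  # "+PONG\r\n" is tiny, one recv is enough here
    finally:
        sock.close()
    return reply.startswith(b"+PONG")
```

With redis-py from requirements.txt installed, redis.StrictRedis(host='127.0.0.1', port=6379).ping() performs the same check.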

Django

URL: https://www.djangoproject.com/

requirements.txt:

Django==1.5.4
celery==3.0.23
django-celery==3.0.23
celery-haystack==0.7.2
django-haystack==2.1.0
redis==2.8.0
$ pip install -r requirements.txt

settings.py needs a few entries. First, the minimal configuration:

  • Redis and Elasticsearch are both running locally
INSTALLED_APPS = (
    # ... the existing Django apps ...

    'djcelery',
    'haystack',
    'celery_haystack',
    # ... your own apps, including the one that defines Note ...
)

# haystack

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}

HAYSTACK_SIGNAL_PROCESSOR = 'celery_haystack.signals.CelerySignalProcessor'

# celery

import djcelery
djcelery.setup_loader()

BROKER_URL = 'redis://127.0.0.1:6379/4'
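BROKER_URL packs the host, port and Redis database number into one URL; the trailing /4 selects logical database 4, which keeps Celery's queue keys apart from anything stored in the default database 0. The sketch below splits such a URL with the standard library to show the pieces. describe_redis_url is a hypothetical helper, and its fallbacks (port 6379, db 0) are assumptions matching Redis's defaults.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2.7


def describe_redis_url(url):
    """Split redis://host:port/db into its parts (defaults are assumptions)."""
    parts = urlsplit(url)
    db = parts.path.lstrip("/")
    return {
        "host": parts.hostname,
        "port": parts.port or 6379,  # Redis default port
        "db": int(db) if db else 0,  # Redis default database
    }
```

describe_redis_url('redis://127.0.0.1:6379/4') gives host '127.0.0.1', port 6379 and db 4.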

models.py

from django.db import models

class Note(models.Model):
    title = models.CharField(max_length=255, blank=False, null=False)
    author = models.CharField(max_length=255, blank=False, null=False)
    text = models.TextField(blank=False, null=False)

    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

search_indexes.py

When updating the index through Celery, inherit from celery_haystack's CelerySearchIndex instead of Haystack's SearchIndex.

import datetime

from haystack import indexes
from celery_haystack.indexes import CelerySearchIndex
from .models import Note

class NoteIndex(CelerySearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)

    # title and author are CharFields on the model, so index them as text
    title = indexes.CharField(model_attr='title')
    author = indexes.CharField(model_attr='author')

    created_at = indexes.DateTimeField(model_attr='created_at')
    updated_at = indexes.DateTimeField(model_attr='updated_at')

    def get_model(self):
        return Note

    def index_queryset(self, using=None):
        return self.get_model().objects.filter(updated_at__lte=datetime.datetime.now())
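use_template=True on the text field means Haystack renders that document field from a template, which this note does not show. By Haystack's convention the template lives at search/indexes/<app_label>/<model_name>_<field_name>.txt inside a template directory, e.g. search/indexes/notes/note_text.txt if the app were called notes (a hypothetical label). The sketch below writes such a file; the template body (title, author and text of the Note) is an assumption about what should be searchable, and the helper names are hypothetical.

```python
import os

# Assumed template body: the Note fields that should be searchable as free text.
NOTE_TEXT_TEMPLATE = (
    "{{ object.title }}\n"
    "{{ object.author }}\n"
    "{{ object.text }}\n"
)


def text_template_path(templates_dir, app_label, model_name="note", field_name="text"):
    """search/indexes/<app_label>/<model_name>_<field_name>.txt under templates_dir."""
    return os.path.join(templates_dir, "search", "indexes", app_label,
                        "%s_%s.txt" % (model_name.lower(), field_name))


def write_text_template(templates_dir, app_label):
    path = text_template_path(templates_dir, app_label)
    directory = os.path.dirname(path)
    if not os.path.isdir(directory):
        os.makedirs(directory)  # Python 2.7's makedirs has no exist_ok
    with open(path, "w") as f:
        f.write(NOTE_TEXT_TEMPLATE)
    return path
```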

Setup

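$ python manage.py syncdb                         # create the database tables, including Note
$ python manage.py rebuild_index                  # clear and rebuild the Elasticsearch index
$ python manage.py celery worker --loglevel=info  # keep running: applies the queued index updates
$ python manage.py runserver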
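After rebuild_index, and after saving a Note while the worker is running, you can query Elasticsearch directly to confirm the documents arrived. This sketch uses the URI search endpoint /<index>/_search?q=...; it assumes the URL and INDEX_NAME 'haystack' from settings.py, and that hits.total is a plain integer as in Elasticsearch 0.90.x (newer versions return an object). build_search_url and search_hits are hypothetical helpers.

```python
import json

try:
    from urllib.parse import urlencode  # Python 3
    from urllib.request import urlopen
except ImportError:
    from urllib import urlencode  # Python 2.7
    from urllib2 import urlopen


def build_search_url(base_url, index_name, query, size=10):
    """<base>/<index>/_search?q=<query>&size=<size> (URI search)."""
    params = urlencode([("q", query), ("size", size)])
    return "%s/%s/_search?%s" % (base_url.rstrip("/"), index_name, params)


def search_hits(base_url, index_name, query, timeout=3):
    """Return (total, [_source, ...]); assumes the 0.90.x response layout."""
    url = build_search_url(base_url, index_name, query)
    response = urlopen(url, timeout=timeout)
    try:
        body = json.loads(response.read().decode("utf-8"))
    finally:
        response.close()
    return body["hits"]["total"], [hit["_source"] for hit in body["hits"]["hits"]]
```

For example, total, docs = search_hits('http://127.0.0.1:9200/', 'haystack', 'django').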

References

Django

  • Django documentation: https://docs.djangoproject.com/en/1.5/
  • django/django: https://github.com/django/django

Elasticsearch

  • Elasticsearch (Open Source Distributed Real Time Search & Analytics): http://www.elasticsearch.org/
  • elasticsearch/elasticsearch: https://github.com/elasticsearch/elasticsearch

celery

  • Celery - Distributed Task Queue (Celery 3.0.23 documentation): http://docs.celeryproject.org/en/latest/index.html
  • celery/celery: https://github.com/celery/celery

django-haystack

  • Haystack - Search for Django: http://haystacksearch.org/
  • toastdriven/django-haystack: https://github.com/toastdriven/django-haystack

django-celery

  • Django (Celery 3.0.23 documentation): http://docs.celeryproject.org/en/latest/django/
  • celery/django-celery: https://github.com/celery/django-celery

celery-haystack

  • celery-haystack 0.7.2 documentation: http://celery-haystack.readthedocs.org/en/latest/

redis-py

  • andymccurdy/redis-py: https://github.com/andymccurdy/redis-py
