ElasticSearch.org Website Search: Field Notes

These are field notes gathered during the installation of the website search facility for the ElasticSearch website.

You may re-use them to put a similar system in place.

The following assumes:

  • You are on an Ubuntu Linux system, or a compatible/similar one
  • You have sudo permissions for the system

Update System

sudo apt-get update
sudo apt-get upgrade

Install Tools

sudo apt-get install build-essential curl vim nmap
sudo apt-get install ruby ruby-dev libopenssl-ruby

Install Git

We cannot install Git from packages: at the moment, Ubuntu ships 1.7.0.4, a year-old version. Unbelievable.

sudo apt-get install libz-dev tk
cd ~
wget http://kernel.org/pub/software/scm/git/git-1.7.4.4.tar.bz2
tar -xjf git-1.7.4.4.tar.bz2
cd git-1.7.4.4
./configure --prefix=/usr/local
make
sudo make install clean

Install Java

Install Sun Java.

Anecdotal evidence suggests that any “open Java” will break stuff. Any evidence to the contrary is sought and welcome.

sudo vim /etc/apt/sources.list
deb http://archive.canonical.com/ubuntu lucid partner
deb-src http://archive.canonical.com/ubuntu lucid partner

sudo apt-get update
sudo apt-get install sun-java6-jdk
java -version

Install ElasticSearch a.k.a. “Let's define easy”

cd /usr/local/lib

sudo curl -k -L -o elasticsearch-0.15.2.tar.gz http://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.15.2.tar.gz
sudo tar -zxvf elasticsearch-0.15.2.tar.gz
sudo rm elasticsearch-0.15.2.tar.gz
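
The version number appears in the download URL, in the archive name and in the paths used later, so it is easy for them to drift apart. A minimal sketch that derives the archive name and URL from a single variable; the helper names `es_tarball` and `es_url` are made up for illustration, and the commands are only printed so they can be reviewed first:

```shell
#!/bin/sh
# Hypothetical helpers: derive the archive name and download URL from one version string.
ES_VERSION="0.15.2"

es_tarball() {
  # $1 = version, e.g. 0.15.2
  printf 'elasticsearch-%s.tar.gz\n' "$1"
}

es_url() {
  # $1 = version; the URL prefix is the one used in these notes
  printf 'http://github.com/downloads/elasticsearch/elasticsearch/%s\n' "$(es_tarball "$1")"
}

# Print the commands instead of running them.
echo "sudo curl -k -L -o $(es_tarball "$ES_VERSION") $(es_url "$ES_VERSION")"
echo "sudo tar -zxvf $(es_tarball "$ES_VERSION")"
```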

Configure ElasticSearch

Add a user for ElasticSearch and other associated services:

sudo adduser --home /home/elasticsearch --disabled-password --system --group elasticsearch

Important! Increase the open files limit for the elasticsearch user:

sudo vim /etc/security/limits.conf
elasticsearch     -    nofile    32000
elasticsearch     -    memlock    unlimited

sudo vim /etc/pam.d/su
session    required   pam_limits.so
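
To double-check what was written to the limits file, a small lookup helper can be handy. This is only a sketch: `limit_for` is a hypothetical name, and it reads the plain `domain type item value` column layout used above without interpreting wildcards or groups.

```shell
#!/bin/sh
# Hypothetical helper: print the value configured for a user/item pair
# in a limits.conf-style file (columns: domain, type, item, value).
limit_for() {
  # $1 = file, $2 = user, $3 = item (e.g. nofile)
  awk -v user="$2" -v item="$3" '
    /^[[:space:]]*#/ { next }                 # skip comment lines
    $1 == user && $3 == item { value = $4 }   # remember the most recent match
    END { if (value != "") print value }
  ' "$1"
}

# Example (not run here):
# limit_for /etc/security/limits.conf elasticsearch nofile
```

The limit actually in effect can be checked with `ulimit -n` from a shell running as that user.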

Set the cluster name, the paths where you want to store logs and data, and other options for ElasticSearch:

cd /usr/local/lib/elasticsearch-0.15.2

sudo vim config/elasticsearch.yml
# Cluster Settings
cluster:
  name: elasticsearch_website

path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch

bootstrap:
  mlockall: true
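
A misspelled top-level key in `elasticsearch.yml` is easy to miss. A rough sketch that lists the top-level keys so they can be eyeballed; `top_level_keys` is a hypothetical helper and it assumes the simple nested layout shown above, not full YAML:

```shell
#!/bin/sh
# Hypothetical helper: print the keys that start in column one of a YAML file.
top_level_keys() {
  # $1 = path to the YAML file
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_.]*\):.*$/\1/p' "$1"
}

# Example (not run here):
# top_level_keys /usr/local/lib/elasticsearch-0.15.2/config/elasticsearch.yml
```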

Make sure proper permissions are set:

sudo mkdir -p /var/log/elasticsearch
sudo chown -R elasticsearch:admin /var/log/elasticsearch
sudo chmod -R ug+rw /var/log/elasticsearch/

sudo mkdir -p /var/data/elasticsearch
sudo chown -R elasticsearch:admin /var/data/elasticsearch
sudo chmod -R ug+rw /var/data/elasticsearch

sudo mkdir -p /var/run/elasticsearch
sudo chown -R elasticsearch:admin /var/run/elasticsearch
sudo chmod -R ug+rw /var/run/elasticsearch
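
The three blocks above repeat the same steps, so they can be folded into one helper. A minimal sketch, where `prepare_dir` and its "command prefix" argument are assumptions of this example: pass `sudo` to apply the changes, or `echo` for a dry run that only prints them.

```shell
#!/bin/sh
# Hypothetical helper: create a directory and hand it to the given owner.
prepare_dir() {
  # $1 = command prefix (sudo, or echo for a dry run)
  # $2 = owner:group, $3 = directory
  "$1" mkdir -p "$3" &&
  "$1" chown -R "$2" "$3" &&
  "$1" chmod -R ug+rw "$3"
}

# Dry run over the three directories used in these notes.
for dir in /var/log/elasticsearch /var/data/elasticsearch /var/run/elasticsearch; do
  prepare_dir echo elasticsearch:admin "$dir"
done
```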

Start ElasticSearch

sudo -H -u elasticsearch /usr/local/lib/elasticsearch-0.15.2/bin/elasticsearch -p /var/run/elasticsearch/elasticsearch1.pid
curl http://localhost:9200
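
The `curl` check may fail if it runs before the node has finished starting. A small retry loop avoids guessing how long to wait. This is a sketch; `wait_until` is a hypothetical helper, and the attempt count and delay in the example are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or the attempts run out.
wait_until() {
  # $1 = max attempts, $2 = seconds between attempts, rest = command to run
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (not run here): poll the HTTP port up to 30 times, one second apart.
# wait_until 30 1 curl -sf http://localhost:9200 && echo "ElasticSearch is up"
```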

Setup Website

cd /var/data
sudo git clone git://github.com/elasticsearch/elasticsearch.github.com.git elasticsearch_website

sudo chown -R elasticsearch:admin /var/data/elasticsearch_website
sudo chmod -R ug+rw /var/data/elasticsearch_website

sudo gem install jekyll

Setup Hide

Hide is a tiny application that imports the Jekyll website data into ElasticSearch and receives Github HTTP post-receive notifications.

sudo mkdir -p /var/applications
cd /var/applications/
sudo git clone git://github.com/karmi/hide.git

sudo chown -R elasticsearch:admin /var/applications
sudo chmod -R ug+rw /var/applications

cd /var/applications/hide
sudo cp config.example.rb config.rb
sudo vim config.rb
:path        => '/var/data/elasticsearch_website'
sudo chown -R elasticsearch:admin /var/applications/hide/config.rb

sudo gem install bundler
sudo -H -u elasticsearch bundle install --deployment

Import website data into ElasticSearch:

sudo -H -u elasticsearch rake index:destroy index:setup index:import

Start the post-receive hook server:

sudo -H -u elasticsearch /usr/bin/env BUNDLE_GEMFILE=/var/applications/hide/Gemfile /usr/bin/bundle exec thin --chdir /var/applications/hide --rackup /var/applications/hide/config.ru --port 5000 --log /var/applications/hide/log/thin.log --pid /var/applications/hide/tmp/thin.pid --environment production --tag hide --daemonize start

Test the post-receive hook via Github (https://github.com/elasticsearch/elasticsearch.github.com/admin/hooks#generic_minibucket). You can just click it.

Install Varnish

We will use Varnish to serve as a restricting proxy for ElasticSearch. (Of course, we could also use Nginx, Apache, etc. as a proxy.)

We will allow only GET requests to the _search endpoint. In the future, we may do more interesting tricks.

Install:

curl http://repo.varnish-cache.org/debian/GPG-key.txt | sudo apt-key add -
sudo vim /etc/apt/sources.list
deb http://repo.varnish-cache.org/ubuntu/ lucid varnish-2.1

sudo apt-get update
sudo apt-get install varnish

Configure:

sudo chown -R elasticsearch:admin /etc/varnish
sudo chmod -R ug+rw /etc/varnish

sudo chown -R elasticsearch:admin /var/lib/varnish/
sudo chmod -R ug+rw /var/lib/varnish/

sudo vim /etc/varnish/default.vcl

backend default {
    .host = "127.0.0.1";
    .port = "9200";
}

sub vcl_recv {
  if (req.request != "GET" || req.url !~ "/_search") {
    error 403;
  }
}

sub vcl_fetch {
    set beresp.grace = 30m;
}

sub vcl_error {
    set obj.http.Content-Type = "text/html; charset=utf-8";
    synthetic {"
<!DOCTYPE html>
<html>
  <head>
    <title>"} obj.status " " obj.response {"</title>
  </head>
  <body>
    <h1>Error "} obj.status " " obj.response {"</h1>
    <p>Use the <a href='/_search?pretty=true&q=*'>/<code>_search</code></a> API.</p>
    <hr>
    <p><a href='http://elasticsearch.org'>http://elasticsearch.org</a></p>
  </body>
</html>
"};
    return (deliver);
}
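
To reason about which requests the `vcl_recv` rule lets through, the same condition can be mirrored in a few lines of shell. This is only an illustration of the rule, not something Varnish runs; `proxy_allows` is a hypothetical name. As in the VCL, the URL match is unanchored, so any URL containing `/_search` passes.

```shell
#!/bin/sh
# Hypothetical helper mirroring the vcl_recv rule above:
# only GET requests whose URL contains /_search are passed on.
proxy_allows() {
  # $1 = HTTP method, $2 = request URL (path and query string)
  [ "$1" = "GET" ] || return 1
  case "$2" in
    */_search*) return 0 ;;
    *) return 1 ;;
  esac
}

# Walk through a few sample requests.
while read -r method url; do
  if proxy_allows "$method" "$url"; then
    echo "$method $url -> passed to backend"
  else
    echo "$method $url -> 403"
  fi
done <<'EOF'
GET /_search?q=*
POST /_search
GET /_cluster/health
EOF
```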

Start:

sudo mkdir -p /var/run/varnish/
sudo chown -R elasticsearch:admin /var/run/varnish
sudo chmod -R ug+rw /var/run/varnish

sudo su - elasticsearch -c "/usr/sbin/varnishd -f /etc/varnish/default.vcl -a 0.0.0.0:80 -P /var/run/varnish/varnishd.pid"

Setup Monit

We will put the system under surveillance with Monit.

Install and enable:

sudo apt-get install monit

sudo vim /etc/default/monit
# You must set this variable to for monit to start
startup=1

sudo /etc/init.d/monit start

Configure:

sudo vim /etc/monit/monitrc

# ###################
# Monit Configuration
# ###################

set daemon 120
  with start delay 240

set alert user@example.com
set mailserver localhost

set httpd port 2812 and
   use address localhost
   allow localhost

check system search.elasticsearch.org
  if loadavg (5min) > 10 then alert
  if memory usage > 80% then alert
  if cpu usage (user) > 90% then alert

check filesystem data with path /var
  if space usage > 80% for 5 times within 15 cycles then alert
  if inode usage > 90% then alert
  if space usage > 99% then stop
  if inode usage > 99% then stop
  group filesystem

check host elasticsearch with address 127.0.0.1
  if failed url http://127.0.0.1:9200/ with timeout 15 seconds then alert
  group elasticsearch

check process elasticsearch1 with pidfile /var/run/elasticsearch/elasticsearch1.pid
  start program = "/usr/bin/sudo -H -u elasticsearch /usr/local/lib/elasticsearch-0.15.2/bin/elasticsearch -p /var/run/elasticsearch/elasticsearch1.pid" with timeout 60 seconds
  stop program  = "/bin/sh -c '/bin/kill $(/bin/cat /var/run/elasticsearch/elasticsearch1.pid)'"
  if cpu > 90% for 5 cycles then restart
  if totalmem > 2 GB for 5 cycles then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if 3 restarts within 5 cycles then timeout
  group elasticsearch

check process varnishd with pidfile /var/run/varnish/varnishd.pid
  start program = "/usr/sbin/varnishd -f /etc/varnish/default.vcl -a 0.0.0.0:80 -P /var/run/varnish/varnishd.pid" with timeout 60 seconds
  stop program  = "/bin/sh -c '/bin/kill $(/bin/cat /var/run/varnish/varnishd.pid)'"
  if cpu > 90% for 5 cycles then restart
  if totalmem > 500 MB for 5 cycles then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if 3 restarts within 5 cycles then timeout
  group elasticsearch

check process post_receive_server with pidfile   /var/applications/hide/tmp/thin.pid
  start program = "/usr/bin/sudo -H -u elasticsearch /usr/bin/env BUNDLE_GEMFILE=/var/applications/hide/Gemfile /usr/bin/bundle exec thin --chdir /var/applications/hide --rackup /var/applications/hide/config.ru --port 5000 --log /var/applications/hide/log/thin.log --pid /var/applications/hide/tmp/thin.pid --environment production --tag hide --daemonize start" with timeout 60 seconds
  stop program  = "/bin/sh -c '/bin/kill $(/bin/cat /var/applications/hide/tmp/thin.pid)'"
  if cpu > 90% for 5 cycles then restart
  if totalmem > 2 GB for 5 cycles then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if 3 restarts within 5 cycles then timeout
  group git

Use an SSH tunnel to connect to the Monit GUI:

ssh elasticsearch -L 2812:localhost:2812
open http://localhost:2812

Otherwise, just check it on the CLI:

sudo monit status
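
The process checks in the Monit configuration are keyed on pidfiles, so a stale or missing pidfile is worth ruling out when a service shows up as not running. A minimal sketch for checking one by hand; `pid_alive` is a hypothetical helper, and `kill -0` only tests whether a signal could be sent, so run it as root or as the user that owns the process.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the pidfile exists and names a running process.
pid_alive() {
  # $1 = path to the pidfile
  [ -r "$1" ] || return 1
  pid=$(cat "$1")
  # kill -0 sends no signal; it only checks that the process can be signalled.
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Example (not run here):
# pid_alive /var/run/elasticsearch/elasticsearch1.pid && echo "running"
```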

To reload Monit configuration, use:

sudo monit reload

To start all services, use:

sudo monit start all

Wrap Up

Congratulations! You now have a “continuous indexing” system set up for searching your Jekyll website with ElasticSearch.


Author: Karel Minarik
