Field notes gathered during installing and configuring ElasticSearch for http://elasticsearch.org

ElasticSearch.org Website Search: Field Notes

These are field notes gathered while installing the website search facility for the ElasticSearch website.

You may re-use them to put a similar system in place.

The following assumes:

  • You are on an Ubuntu Linux system, or a compatible/similar one
  • You have sudo permissions on the system

Update System

sudo apt-get update
sudo apt-get upgrade

Install Tools

sudo apt-get install build-essential curl vim nmap
sudo apt-get install ruby ruby-dev libopenssl-ruby

Install Git

We cannot install Git from packages: at the moment, Ubuntu ships 1.7.0.4, a year-old version. Unbelievable.

sudo apt-get install libz-dev tk
cd ~
wget http://kernel.org/pub/software/scm/git/git-1.7.4.4.tar.bz2
tar -xjf git-1.7.4.4.tar.bz2
cd git-1.7.4.4
./configure --prefix=/usr/local
make
sudo make install clean
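
To confirm that the freshly built Git is the one on the PATH, and not the older packaged one, we can compare version numbers. This is only a sketch: `version_ge` and `git_version` are helper names made up for these notes, and the comparison assumes GNU `sort` with its `-V` (version sort) option.

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
# Assumes GNU sort, whose -V option sorts dotted version numbers.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from the output of `git --version`,
# which looks like "git version 1.7.4.4".
git_version() {
  git --version | awk '{ print $3 }'
}
```

For example, `version_ge "$(git_version)" 1.7.4.4 && echo "Git is recent enough"`.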

Install Java

Install Sun Java.

Anecdotal evidence suggests that any “open Java” will break stuff. Any evidence to the contrary is sought and welcome.

sudo vim /etc/apt/sources.list
deb http://archive.canonical.com/ubuntu lucid partner
deb-src http://archive.canonical.com/ubuntu lucid partner

sudo apt-get install sun-java6-jdk
java -version

Install ElasticSearch a.k.a. “Let's define easy”

cd /usr/local/lib

sudo curl -k -L -o elasticsearch-0.15.2.tar.gz http://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.15.2.tar.gz
sudo tar -zxvf elasticsearch-0.15.2.tar.gz
sudo rm elasticsearch-0.15.2.tar.gz
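
The download, unpack and cleanup commands all have to name the same version. One way to keep them in step is to hold the version in a single variable. This is a sketch only: `ES_VERSION`, `es_tarball` and `es_url` are names invented here, and the URL simply follows the pattern of the download command above.

```shell
# Keep the version in one place so the download, unpack and cleanup
# steps all refer to the same release.
ES_VERSION="0.15.2"

# Name of the release tarball for a given version.
es_tarball() {
  printf 'elasticsearch-%s.tar.gz\n' "$1"
}

# Download URL for a given version, following the pattern used above.
es_url() {
  printf 'http://github.com/downloads/elasticsearch/elasticsearch/%s\n' "$(es_tarball "$1")"
}
```

The download then becomes `sudo curl -k -L -o "$(es_tarball "$ES_VERSION")" "$(es_url "$ES_VERSION")"`, followed by `sudo tar -zxvf "$(es_tarball "$ES_VERSION")"`.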

Configure ElasticSearch

Add user for ElasticSearch and other associated services:

sudo adduser --home /home/elasticsearch --disabled-password --system --group elasticsearch

Important! Increase the open files limit for the elasticsearch user:

sudo vim /etc/security/limits.conf
elasticsearch     -    nofile    32000
elasticsearch     -    memlock    unlimited

sudo vim /etc/pam.d/su
session    required   pam_limits.so
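
It is worth checking that the new limit is really in effect, since it only applies to sessions that go through pam_limits. A minimal sketch, to be run in a shell opened as the elasticsearch user (`nofile_at_least` is a helper name made up for these notes):

```shell
# nofile_at_least MIN: succeeds when the current shell's open files limit
# (ulimit -n) is "unlimited" or at least MIN.
nofile_at_least() {
  current_limit="$(ulimit -n)"
  [ "$current_limit" = "unlimited" ] || [ "$current_limit" -ge "$1" ]
}
```

For example, `nofile_at_least 32000 && echo "open files limit OK"`.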

Set cluster name, paths where you want to store logs and data and other options for ElasticSearch:

cd /usr/local/lib/elasticsearch-0.15.2

sudo vim config/elasticsearch.yml
# Cluster Settings
cluster:
  name: elasticsearch_website

path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch

bootstrap:
  mlockall: true

Make sure proper permissions are set:

sudo mkdir -p /var/log/elasticsearch
sudo chown -R elasticsearch:admin /var/log/elasticsearch
sudo chmod -R ug+rw /var/log/elasticsearch/

sudo mkdir -p /var/data/elasticsearch
sudo chown -R elasticsearch:admin /var/data/elasticsearch
sudo chmod -R ug+rw /var/data/elasticsearch

sudo mkdir -p /var/run/elasticsearch
sudo chown -R elasticsearch:admin /var/run/elasticsearch
sudo chmod -R ug+rw /var/run/elasticsearch
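
The three blocks above repeat the same three steps for each directory. If you script the installation, a small loop does the same job. This is a sketch, with `prepare_dirs` being a name invented here; it has to run with enough privileges to change ownership, for example from a script started with sudo.

```shell
# prepare_dirs OWNER DIR...: create each directory, hand it to OWNER
# (user or user:group) and make it readable/writable for user and group.
prepare_dirs() {
  dir_owner="$1"
  shift
  for dir in "$@"; do
    mkdir -p "$dir" &&
      chown -R "$dir_owner" "$dir" &&
      chmod -R ug+rw "$dir" || return 1
  done
}
```

With the paths from these notes, the call would be `prepare_dirs elasticsearch:admin /var/log/elasticsearch /var/data/elasticsearch /var/run/elasticsearch`.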

Start ElasticSearch

sudo -H -u elasticsearch /usr/local/lib/elasticsearch-0.15.2/bin/elasticsearch -p /var/run/elasticsearch/elasticsearch1.pid
curl http://localhost:9200
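
ElasticSearch may need a few seconds before it answers on port 9200, so the curl check can fail if it is run right after the start command. A small retry helper covers that gap. This is a sketch; `retry` is a helper name made up for these notes, not part of ElasticSearch or curl.

```shell
# retry TRIES DELAY COMMAND...: run COMMAND until it succeeds, at most
# TRIES times, sleeping DELAY seconds between attempts.
retry() {
  tries="$1"
  delay="$2"
  shift 2
  attempt=1
  while ! "$@"; do
    # Give up once the allowed number of attempts is used up.
    [ "$attempt" -ge "$tries" ] && return 1
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

For example, `retry 10 3 curl -s -f -o /dev/null http://localhost:9200 && echo "ElasticSearch is up"`.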

Setup Website

cd /var/data
sudo git clone git://github.com/elasticsearch/elasticsearch.github.com.git elasticsearch_website

sudo chown -R elasticsearch:admin /var/data/elasticsearch_website
sudo chmod -R ug+rw /var/data/elasticsearch_website

sudo gem install jekyll

Setup Hide

Hide is a tiny application that imports the Jekyll website data into ElasticSearch and receives GitHub HTTP post-receive notifications.

sudo mkdir -p /var/applications
cd /var/applications/
sudo git clone git://github.com/karmi/hide.git

sudo chown -R elasticsearch:admin /var/applications
sudo chmod -R ug+rw /var/applications

cd /var/applications/hide
sudo cp config.example.rb config.rb
sudo vim config.rb
:path        => '/var/data/elasticsearch_website'
sudo chown -R elasticsearch:admin /var/applications/hide/config.rb

sudo gem install bundler -v 1.0.10
sudo -H -u elasticsearch bundle install

Import website data into ElasticSearch:

sudo -H -u elasticsearch bundle exec rake index:destroy index:setup index:import

Start the post-receive hook server:

sudo -H -u elasticsearch /usr/bin/env BUNDLE_GEMFILE=/var/applications/hide/Gemfile /usr/bin/bundle exec thin --chdir /var/applications/hide --rackup /var/applications/hide/config.ru --port 5000 --log /var/applications/hide/log/thin.log --pid /var/applications/hide/tmp/thin.pid --environment production --tag hide --daemonize start

Test the post-receive hook via Github (https://github.com/elasticsearch/elasticsearch.github.com/admin/hooks#generic_minibucket). You can just click it.

Install Varnish

We will use Varnish to serve as a restricting proxy for ElasticSearch. (Of course, we could also use Nginx, Apache, etc. as a proxy.)

We will allow only GET requests to the _search endpoint. In the future, we may do more interesting tricks.

Install:

curl http://repo.varnish-cache.org/debian/GPG-key.txt | sudo apt-key add -
sudo vim /etc/apt/sources.list
deb http://repo.varnish-cache.org/ubuntu/ lucid varnish-2.1

sudo apt-get update
sudo apt-get install varnish

Configure:

sudo chown -R elasticsearch:admin /etc/varnish
sudo chmod -R ug+rw /etc/varnish

sudo chown -R elasticsearch:admin /var/lib/varnish/
sudo chmod -R ug+rw /var/lib/varnish/

sudo vim /etc/varnish/default.vcl

backend default {
    .host = "127.0.0.1";
    .port = "9200";
}

sub vcl_recv {
  if (req.request != "GET" || req.url !~ "/_search") {
    error 403;
  }
}

sub vcl_fetch {
    set beresp.grace = 30m;
}

sub vcl_error {
    set obj.http.Content-Type = "text/html; charset=utf-8";
    synthetic {"
<!DOCTYPE html>
<html>
  <head>
    <title>"} obj.status " " obj.response {"</title>
  </head>
  <body>
    <h1>Error "} obj.status " " obj.response {"</h1>
    <p>Use the <a href='/_search?pretty=true&q=*'>/<code>_search</code></a> API.</p>
    <hr>
    <p><a href='http://elasticsearch.org'>http://elasticsearch.org</a></p>
  </body>
</html>
"};
    return (deliver);
}
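
To make the effect of the `vcl_recv` rule concrete, here is the same decision written as a shell function. It is only a model of the rule for reasoning about which requests get through, not something Varnish uses; `vcl_recv_verdict` is a name invented for these notes.

```shell
# vcl_recv_verdict METHOD URL: print "pass" when the vcl_recv rule would let
# the request through to ElasticSearch, "403" when it would reject it.
vcl_recv_verdict() {
  if [ "$1" = "GET" ]; then
    case "$2" in
      # Like the unanchored regex in the VCL, this matches "/_search"
      # anywhere in the URL.
      *"/_search"*) echo "pass"; return 0 ;;
    esac
  fi
  echo "403"
}
```

So `vcl_recv_verdict GET '/_search?q=*'` gives `pass`, while `vcl_recv_verdict POST /_search` and `vcl_recv_verdict GET /_cluster/health` give `403`. Note that, because the pattern is not anchored, any URL that merely contains `/_search` is let through. Once Varnish is running, the real thing can be checked with something like `curl -s -o /dev/null -w '%{http_code}\n' -X POST http://localhost/_search`, which should report 403.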

Start:

sudo mkdir -p /var/run/varnish/
sudo chown -R elasticsearch:admin /var/run/varnish
sudo chmod -R ug+rw /var/run/varnish

sudo su - elasticsearch -c "/usr/sbin/varnishd -f /etc/varnish/default.vcl -a 0.0.0.0:80 -P /var/run/varnish/varnishd.pid"

Setup Monit

We will put the system under surveillance with Monit.

Install and enable:

sudo apt-get install monit

sudo vim /etc/default/monit
# You must set this variable to for monit to start
startup=1

sudo /etc/init.d/monit start

Configure:

sudo vim /etc/monit/monitrc

# ###################
# Monit Configuration
# ###################

set daemon 120
  with start delay 240

set alert user@example.com
set mailserver localhost

set httpd port 2812 and
   use address localhost
   allow localhost

check system search.elasticsearch.org
  if loadavg (5min) > 10 then alert
  if memory usage > 80% then alert
  if cpu usage (user) > 90% then alert

check filesystem data with path /var
  if space usage > 80% for 5 times within 15 cycles then alert
  if inode usage > 90% then alert
  if space usage > 99% then stop
  if inode usage > 99% then stop
  group filesystem

check host elasticsearch with address 127.0.0.1
  if failed url http://127.0.0.1:9200/ with timeout 15 seconds then alert
  group elasticsearch

check process elasticsearch1 with pidfile /var/run/elasticsearch/elasticsearch1.pid
  start program = "/usr/bin/sudo -H -u elasticsearch /usr/local/lib/elasticsearch-0.15.2/bin/elasticsearch -p /var/run/elasticsearch/elasticsearch1.pid" with timeout 60 seconds
  stop program  = "/bin/sh -c '/bin/kill $(/bin/cat /var/run/elasticsearch/elasticsearch1.pid)'"
  if cpu > 90% for 5 cycles then restart
  if totalmem > 2 GB for 5 cycles then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if 3 restarts within 5 cycles then timeout
  group elasticsearch

check process varnishd with pidfile /var/run/varnish/varnishd.pid
  start program = "/usr/sbin/varnishd -f /etc/varnish/default.vcl -a 0.0.0.0:80 -P /var/run/varnish/varnishd.pid" with timeout 60 seconds
  stop program  = "/bin/sh -c '/bin/kill $(/bin/cat /var/run/varnish/varnishd.pid)'"
  if cpu > 90% for 5 cycles then restart
  if totalmem > 500 MB for 5 cycles then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if 3 restarts within 5 cycles then timeout
  group elasticsearch

check process post_receive_server with pidfile /var/applications/hide/tmp/thin.pid
  start program = "/usr/bin/sudo -H -u elasticsearch /usr/bin/env BUNDLE_GEMFILE=/var/applications/hide/Gemfile /usr/bin/bundle exec thin --chdir /var/applications/hide --rackup /var/applications/hide/config.ru --port 5000 --log /var/applications/hide/log/thin.log --pid /var/applications/hide/tmp/thin.pid --environment production --tag hide --daemonize start" with timeout 60 seconds
  stop program  = "/bin/sh -c '/bin/kill $(/bin/cat /var/applications/hide/tmp/thin.pid)'"
  if cpu > 90% for 5 cycles then restart
  if totalmem > 2 GB for 5 cycles then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  if 3 restarts within 5 cycles then timeout
  group git
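
The `check process ... with pidfile` entries depend on each service writing its PID file where Monit expects it. When a check misbehaves, a manual test of the PID file helps to narrow things down. This is a rough hand-made equivalent, not how Monit itself does the check; `pid_alive` is a name invented for these notes.

```shell
# pid_alive PIDFILE: succeeds when PIDFILE is readable and names a process
# that is currently running.
# kill -0 delivers no signal; it only tests whether the process could be
# signalled, so run it as root or as the owner of the process.
pid_alive() {
  [ -r "$1" ] || return 1
  kill -0 "$(cat "$1")" 2>/dev/null
}
```

For example, `pid_alive /var/run/elasticsearch/elasticsearch1.pid && echo "elasticsearch1 is running"`.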

Use an SSH tunnel to connect to the Monit GUI:

ssh elasticsearch -L 2812:localhost:2812
open http://localhost:2812

Otherwise, just check it on the CLI:

sudo monit status

To reload Monit configuration, use:

sudo monit reload

To start all services, use:

sudo monit start all

Wrap Up

Congratulations! You now have a “continuous indexing” system set up for searching your Jekyll website with ElasticSearch.


Author: Karel Minarik


emgiezet commented May 8, 2012

Nice work!


jboren commented Sep 25, 2013

Fantastic! Thanks for posting this, very helpful.
