There are two ways to send data to ElasticSearch from Logstash: the 'elasticsearch' output and the 'elasticsearch_http' output. In a nutshell, the 'elasticsearch' output is tightly coupled to your ElasticSearch cluster, and the 'elasticsearch_http' output isn't.
What does this mean? The 'elasticsearch' output always starts a local ElasticSearch node and tries to join it to your ElasticSearch cluster. The goal is to make Logstash cluster-aware - if a node goes down, Logstash can simply re-route the data to a functioning node. Setting 'embedded' => false in your Logstash config just marks the Logstash node as 'data = false' in the ElasticSearch configuration, so it joins the cluster without storing data.
The 'elasticsearch_http' output sends data over port 9200 using the ElasticSearch HTTP API, with events serialized as JSON. This makes it cross-version compatible - you can run ElasticSearch 0.90 even though the embedded ElasticSearch is only 0.20.5.
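As a sketch, the two outputs could be configured like this in a Logstash config (the cluster name and hostname are assumptions; you would use one output or the other, not both):

```
output {
  # Joins the ES cluster as a local node.
  elasticsearch {
    cluster => "my-cluster"   # assumed cluster name
    embedded => false         # join the cluster, but store no data locally
  }

  # Alternative: talk to the cluster over the HTTP API on port 9200.
  # elasticsearch_http {
  #   host => "es.example.com"  # assumed hostname
  # }
}
```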
- Events stored in ES take 2-3x the space of the raw text event, even with compression.
- This varies with how the data in the event is modified during the filter stage - as an example, the 'geoip' filter adds a number of fields, which obviously take more space.
- ES memory should be 50% of physical memory, up to 30GB.
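To illustrate how a filter grows the stored event, here is a sketch of a 'geoip' filter (the source field name is an assumption):

```
filter {
  geoip {
    source => "clientip"   # assumed field holding the client IP address
    # Adds fields such as country name, city name, and location
    # coordinates - each one increases the stored event size.
  }
}
```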
- It will always start up a local node, even if you have 'embedded' => false. This is simply how the plugin works.
- With 'embedded' => false, there just won't be any local data stored. You probably don't want to run the embedded node as a data node.
- The embedded ElasticSearch can be configured either by placing an elasticsearch.yml file in the same directory as your Logstash process or by passing -Des.config.directive=foo on the command line.
- Make sure to prefix each command-line setting with es.
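A sketch of the command-line approach, assuming Logstash is run from the monolithic jar (the jar name and setting values are assumptions):

```
# Each ElasticSearch setting is passed as a Java system property
# prefixed with 'es.':
java -Des.node.name=logstash-01 \
     -Des.cluster.name=my-cluster \
     -jar logstash-monolithic.jar agent -f logstash.conf
```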
- The number of open files needed to run ElasticSearch will exceed 1024.
- Make sure the user that ES is running under can open more than 1024 files by (most likely) editing /etc/security/limits.conf and modifying the following:
# Ensure ElasticSearch can open files and lock memory!
elasticsearch soft nofile 64000
elasticsearch hard nofile 64000
elasticsearch - memlock unlimited
Then make sure the startup script does 'ulimit -n 64000' prior to starting up ES.
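A small sketch of such a pre-flight check (the 64000 threshold matches the limits.conf entries above) that a startup script could run before launching ES:

```shell
#!/bin/sh
# Check the open-file limit before starting ElasticSearch.
REQUIRED=64000
CURRENT=$(ulimit -n)
if [ "$CURRENT" = "unlimited" ]; then
    echo "nofile limit OK (unlimited)"
elif [ "$CURRENT" -ge "$REQUIRED" ]; then
    echo "nofile limit OK ($CURRENT)"
else
    # Try to raise it; this only succeeds if the hard limit allows it.
    ulimit -n "$REQUIRED" 2>/dev/null
    echo "nofile limit was $CURRENT; now $(ulimit -n)"
fi
```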
- By default, ES is only given 1 GB of memory. This can be expanded to 30 GB - but a general recommendation is to use no more than 50% of your system memory up to 30 GB. There needs to be plenty of memory left over for the Linux filesystem cache.
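That rule of thumb can be sketched in shell; ES_HEAP_SIZE is the environment variable ElasticSearch's own startup scripts read, and the 50% / 30 GB figures come from the text (the 1 GB floor is an assumption for tiny machines):

```shell
#!/bin/sh
# Compute an ES heap of 50% of physical RAM, capped at 30 GB (Linux).
TOTAL_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
HALF_GB=$(( TOTAL_KB / 1024 / 1024 / 2 ))
[ "$HALF_GB" -lt 1 ] && HALF_GB=1     # assumed floor of 1 GB
[ "$HALF_GB" -gt 30 ] && HALF_GB=30   # never exceed 30 GB
export ES_HEAP_SIZE="${HALF_GB}g"
echo "ES_HEAP_SIZE=$ES_HEAP_SIZE"
```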
ElasticSearch isn't very secure by default - it has almost nothing in the way of built-in security: no users, groups, or authentication.
- There are a number of plugins available for ES that make managing it much easier. They're highly recommended to simplify ES management.
- ElasticSearch Head - An excellent plugin that shows basic cluster status and lets you build custom queries quickly with minimal API knowledge.
- BigDesk - Shows graphs of what your ES nodes are doing.
- Quite helpful to diagnose GC related issues.
- Can show the # of open files
- All data in this plugin comes from ElasticSearch
- Paramedic - Shows live graphs of cluster and node statistics.
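On the 0.x plugin system, these can be installed with the bin/plugin tool from the ES home directory (the GitHub user/repo names below are the commonly published ones):

```
bin/plugin -install mobz/elasticsearch-head
bin/plugin -install lukas-vlcek/bigdesk
bin/plugin -install karmi/elasticsearch-paramedic
```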
- ElasticSearch for Logging
- Using Elasticsearch's Mappings
- Untergeek's Elasticsearch posts
- Untergeek's Logstash blog posts
- The Logstash Book
- Python script to delete indices older than a set number of days or hours
- Silly Graphite Trick with ElasticSearch
- Section about mmapfs with more memory