@timconradinc
Last active December 17, 2015 18:39
ElasticSearch Quick Guide
ElasticSearch for Logstash Overview
There are two ways to send data to ElasticSearch from Logstash. The first is the 'elasticsearch' output and the other is
the 'elasticsearch_http' output. In a nutshell, the 'elasticsearch' output is tightly coupled with your elasticsearch
cluster, and the 'elasticsearch_http' output isn't.
What does this mean? The 'elasticsearch' output will *always* start up a local ElasticSearch node and try to join it to
your ElasticSearch cluster. The goal is to make Logstash aware of your cluster: if a node goes down, Logstash can simply
re-route the data to a functioning node. Setting 'embedded' => false in your Logstash config does not stop that local
node from starting; it only sets the node to 'data = false' in the ElasticSearch configuration, so it joins the cluster
without storing any data.
The 'elasticsearch_http' output sends data to port 9200 through the ElasticSearch HTTP API, with events encoded as JSON.
Because it talks to the cluster over HTTP instead of joining it, it is cross-version compatible: you can run
ElasticSearch 0.90 even though the ElasticSearch embedded in Logstash is only 0.20.5.
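
To make the HTTP path concrete, here is a minimal Python sketch of the kind of request that travels over port 9200: one event, serialized as JSON, posted to an index. It only builds the request and does not send it. The host, the `logstash-2013.05.26` index name, the `logs` type, and the event fields are illustrative assumptions, not values taken from the 'elasticsearch_http' plugin itself.

```python
import json
import urllib.request


def build_index_request(event, host="localhost", port=9200,
                        index="logstash-2013.05.26", doc_type="logs"):
    """Build (but do not send) an HTTP request that indexes one event.

    All default values are placeholders chosen for illustration.
    """
    url = "http://{}:{}/{}/{}".format(host, port, index, doc_type)
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical event; real Logstash events carry more fields.
request = build_index_request({"message": "hello world", "host": "web01"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it to a node listening on that address. Nothing here depends on the ElasticSearch version bundled with Logstash, which is the point of the HTTP output.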
Planning for ElasticSearch
* Events stored in ES will take 2-3x the space of the raw text event while compressed. This can vary based on how the
  data in the event is modified during the filter stage. For example, the 'geoip' filter adds a number of fields, which
  take more space.
* ES memory should be 50% of physical memory, up to 30GB.
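
The two planning rules above are simple arithmetic, so a rough sketch may help when sizing a machine. The daily volume (10 GB), the retention period (30 days), and the 64 GB of physical memory are made-up inputs, and the function names are hypothetical.

```python
def estimate_storage_gb(raw_gb_per_day, retention_days, low=2.0, high=3.0):
    """Return a (low, high) range of disk needed, in GB, using the 2-3x rule."""
    total_raw = raw_gb_per_day * retention_days
    return (total_raw * low, total_raw * high)


def recommended_heap_gb(physical_gb, cap_gb=30):
    """Half of physical memory, capped at cap_gb."""
    return min(physical_gb / 2, cap_gb)


# Hypothetical workload: 10 GB of raw logs per day, kept for 30 days.
low, high = estimate_storage_gb(raw_gb_per_day=10, retention_days=30)
print("disk: {:.0f}-{:.0f} GB".format(low, high))         # disk: 600-900 GB
print("heap: {:.0f} GB".format(recommended_heap_gb(64)))  # heap: 30 GB
```

Treat the result as a starting estimate only: filters that add fields will push the real number toward the high end or beyond it.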
Tips for running ElasticSearch Embedded
* It will always start up, even if you have 'embedded' => false. This is due to the nature of how the plugin works.
* When you set 'embedded' => false, there just won't be any local data stored.
* You probably don't want to run the embedded plugin as a data node.
* The embedded ElasticSearch can be configured either by having an elasticsearch.yml file in the same directory as your
  Logstash process or by passing -Des.config.directive=foo on the command line. Make sure to prefix the directive with
  'es.'.
Running ElasticSearch
* The number of open files needed to run ElasticSearch will exceed 1024. Make sure the user that ES is running under
  can open more than 1024 files by (most likely) editing /etc/security/limits.conf and modifying the following:
# Ensure ElasticSearch can open files and lock memory!
elasticsearch soft nofile 64000
elasticsearch hard nofile 64000
elasticsearch - memlock unlimited
Then make sure the startup script does 'ulimit -n 64000' prior to starting up ES.
* By default, ES is only given 1 GB of memory. This can be expanded to 30 GB, but a general recommendation is to use
  no more than 50% of your system memory, up to 30 GB. There needs to be plenty of memory left over for the Linux
  filesystem cache.
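
One way to confirm the open-file limit actually took effect is to ask the kernel what the current process is allowed to open. The sketch below uses Python's standard `resource` module (Unix only). It reports the limits of whoever runs it, so it is only meaningful when run as the same user, and from the same kind of session, that starts ES. The 64000 threshold mirrors the limits.conf values above; the helper name is hypothetical.

```python
import resource

REQUIRED_NOFILE = 64000  # same value as the limits.conf entries above


def limit_is_sufficient(limit, required=REQUIRED_NOFILE):
    """True if a soft or hard limit allows at least `required` open files."""
    return limit == resource.RLIM_INFINITY or limit >= required


# getrlimit returns the (soft, hard) limits for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft nofile:", soft, "ok" if limit_is_sufficient(soft) else "too low")
print("hard nofile:", hard, "ok" if limit_is_sufficient(hard) else "too low")
```

If the soft limit still reads 1024, the limits.conf change has probably not been picked up by that session, which is why the startup script should also call 'ulimit -n 64000'.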
Securing ElasticSearch
ElasticSearch isn't very secure by default. It doesn't have much in the way of built-in security: no users, groups, etc.
ElasticSearch Tools
* There are a number of plugins available for ES that make managing it much easier. Installing and using them is
  highly recommended.
* [ElasticSearch Head](http://mobz.github.io/elasticsearch-head/) - This is an excellent plugin that shows basic
  cluster status and lets you build custom queries quickly with minimal API knowledge.
* [BigDesk](https://github.com/lukas-vlcek/bigdesk) - Shows graphs of what your ES nodes are doing.
  * Quite helpful to diagnose GC related issues.
  * Can show the # of open files
  * All data in this plugin comes from ElasticSearch
* [Paramedic](https://github.com/karmi/elasticsearch-paramedic) - Shows graphs of
Helpful Things to Read
* [ElasticSearch for Logging](http://edgeofsanity.net/article/2012/12/26/elasticsearch-for-logging.html)
* [Using Elasticsearch's Mappings](http://untergeek.com/2012/10/12/using-elasticsearch-mappings-appropriately-to-map-as-type-ip-int-float-etc/)
Other things to add
* section about mmapfs with more memory