There are two ways to send data to ElasticSearch from Logstash: the 'elasticsearch' output and the 'elasticsearch_http' output. In a nutshell, the 'elasticsearch' output is tightly coupled to your ElasticSearch cluster, and the 'elasticsearch_http' output isn't.
What does this mean? The 'elasticsearch' output always starts a local ElasticSearch node and tries to join it to your ElasticSearch cluster. The goal is to make Logstash cluster-aware - if a node goes down, Logstash can simply re-route the data to a functioning node. Setting 'embedded' => false in your Logstash config just marks the Logstash node as 'data = false' in the ElasticSearch configuration, so it joins the cluster without storing data.
The 'elasticsearch_http' output sends data over port 9200 using the ElasticSearch HTTP API, with events serialized as JSON. This makes it cross-version compatible - you can run ElasticSearch 0.90 even though the embedded ElasticSearch is only 0.20.5.
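As a sketch, the two outputs could be configured like this in a Logstash config (the cluster name and hostname are assumptions; you would use one output or the other, not both):

```
output {
  # Joins the ES cluster as a local node.
  elasticsearch {
    cluster => "my-cluster"   # assumed cluster name
    embedded => false         # join the cluster, but store no data locally
  }

  # Alternative: talk to the cluster over the HTTP API on port 9200.
  # elasticsearch_http {
  #   host => "es.example.com"  # assumed hostname
  # }
}
```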
- Events stored in ES take 2-3x the space of the raw text event, even with compression.
- This varies with how the data in the event is modified during the filter stage - as an example, the 'geoip' filter adds a number of fields, which obviously take more space.
- ES memory should be 50% of physical memory, up to 30GB.
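To illustrate how a filter grows the stored event, here is a sketch of a 'geoip' filter (the source field name is an assumption):

```
filter {
  geoip {
    source => "clientip"   # assumed field holding the client IP address
    # Adds fields such as country name, city name, and location
    # coordinates - each one increases the stored event size.
  }
}
```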
- It will always start up a local node, even if you have 'embedded' => false. This is simply how the plugin works.
- With 'embedded' => false, there just won't be any local data stored. You probably don't want to run the embedded node as a data node.
- The embedded ElasticSearch can be configured either by placing an elasticsearch.yml file in the same directory as your Logstash process or by passing -Des.config.directive=foo on the command line.
- Make sure to prefix each command-line setting with es.
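A sketch of the command-line approach, assuming Logstash is run from the monolithic jar (the jar name and setting values are assumptions):

```
# Each ElasticSearch setting is passed as a Java system property
# prefixed with 'es.':
java -Des.node.name=logstash-01 \
     -Des.cluster.name=my-cluster \
     -jar logstash-monolithic.jar agent -f logstash.conf
```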
- The number of open files needed to run ElasticSearch will exceed 1024.
- Make sure the user that ES is running under can open more than 1024 files by (most likely) editing /etc/security/limits.conf and modifying the following:
# Ensure ElasticSearch can open files and lock memory!
elasticsearch soft nofile 64000
elasticsearch hard nofile 64000
elasticsearch - memlock unlimited
Then make sure the startup script does 'ulimit -n 64000' prior to starting up ES.
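A small sketch of such a pre-flight check (the 64000 threshold matches the limits.conf entries above) that a startup script could run before launching ES:

```shell
#!/bin/sh
# Check the open-file limit before starting ElasticSearch.
REQUIRED=64000
CURRENT=$(ulimit -n)
if [ "$CURRENT" = "unlimited" ]; then
    echo "nofile limit OK (unlimited)"
elif [ "$CURRENT" -ge "$REQUIRED" ]; then
    echo "nofile limit OK ($CURRENT)"
else
    # Try to raise it; this only succeeds if the hard limit allows it.
    ulimit -n "$REQUIRED" 2>/dev/null
    echo "nofile limit was $CURRENT; now $(ulimit -n)"
fi
```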
- By default, ES is only given 1 GB of memory. This can be expanded to 30 GB - but a general recommendation is to use no more than 50% of your system memory up to 30 GB. There needs to be plenty of memory left over for the Linux filesystem cache.
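That rule of thumb can be sketched in shell; ES_HEAP_SIZE is the environment variable ElasticSearch's own startup scripts read, and the 50% / 30 GB figures come from the text (the 1 GB floor is an assumption for tiny machines):

```shell
#!/bin/sh
# Compute an ES heap of 50% of physical RAM, capped at 30 GB (Linux).
TOTAL_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
HALF_GB=$(( TOTAL_KB / 1024 / 1024 / 2 ))
[ "$HALF_GB" -lt 1 ] && HALF_GB=1     # assumed floor of 1 GB
[ "$HALF_GB" -gt 30 ] && HALF_GB=30   # never exceed 30 GB
export ES_HEAP_SIZE="${HALF_GB}g"
echo "ES_HEAP_SIZE=$ES_HEAP_SIZE"
```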
ElasticSearch isn't very secure by default - it has almost nothing in the way of built-in security: no users, groups, or authentication.
- There are a number of plugins available for ES that make managing it much easier. They're highly recommended to simplify ES management.
- ElasticSearch Head - An excellent plugin that shows basic cluster status and lets you build custom queries quickly with minimal API knowledge.
- BigDesk - Shows graphs of what your ES nodes are doing.
- Quite helpful to diagnose GC related issues.
- Can show the # of open files
- All data in this plugin comes from ElasticSearch
- Paramedic - Shows live graphs of cluster and node statistics.
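On the 0.x plugin system, these can be installed with the bin/plugin tool from the ES home directory (the GitHub user/repo names below are the commonly published ones):

```
bin/plugin -install mobz/elasticsearch-head
bin/plugin -install lukas-vlcek/bigdesk
bin/plugin -install karmi/elasticsearch-paramedic
```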
- ElasticSearch for Logging
- Using Elasticsearch's Mappings
- Untergeek's Elasticsearch posts
- Untergeek's Logstash blog posts
- The Logstash Book
- Python script to delete indices older than a set number of days or hours
- Silly Graphite Trick with ElasticSearch
- Section about mmapfs with more memory