This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is a filter/rating class to look at log objects and decide if they're interesting (worthy of review). Error messages are rated higher, as are logs from production hosts.
So. I ran into a great deal of stress around ElasticSearch/Logstash performance lately. These are just a few lessons learned, documented so I have a chance of finding them again.
Logs
Both ElasticSearch and Logstash produce logs. On my RHEL install they're located in /var/log/elasticsearch and /var/log/logstash. These will give you some idea of problems then things go really wrong. For example, in my case, ElasticSearch got so slow that Logstash would time out sending it logs. These issues show up in the logs. Also, Elasticsearch would start logging problems when JVM Garbage collection took longer than 30 seconds, which is a good indicator of memory pressure on ElasticSearch.
Pending Tasks
ElasticSearch (and Logstash when it's joined to an ES Cluster) processes tasks in a queue, that you can peek into. Before realizing this I didn't have any way to understand what was happening in ElasticSearch besides the logs. You can look at the pending tasks queue with this command
The InfluxDB Docs give you a very brief overview of installing InfluxDB on a host. It boils down to 'here's the RPM, install it.' That's fine for looking at the software, but you'll probably want to adjust the configuration a bit for a production environment.
Remove Syslog line headers from multi-line logs in logstash
Overview
Rather than run a log shipper on hosts, we use Syslog when shipping logs out of monolog. This works great for single-line logs. It breaks when a log message gets split up by syslog. When syslog does this, it duplicates the line header, like so:
2015-06-09T05:39:31.457042-05:00 host.example.edu : This is a really really really
2015-06-09T05:39:31.475414-05:00 host.example.edu : really long message