ElasticSearch Tuning in Anger

So. I ran into a great deal of stress around ElasticSearch/Logstash performance lately. These are just a few lessons learned, documented so I have a chance of finding them again.

Logs

Both ElasticSearch and Logstash produce logs. On my RHEL install they're located in /var/log/elasticsearch and /var/log/logstash. These will give you some idea of the problems when things go really wrong. For example, in my case, ElasticSearch got so slow that Logstash would time out sending it logs, and those timeouts show up in the logs. ElasticSearch would also start logging problems when JVM garbage collection took longer than 30 seconds, which is a good indicator of memory pressure on ElasticSearch.

Pending Tasks

ElasticSearch (and Logstash when it's joined to an ES cluster) processes tasks in a queue that you can peek into. Before realizing this, I had no way to understand what was happening in ElasticSearch besides the logs. You can look at the pending tasks queue with this command
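
As a rough sketch of peeking at that queue from Node: Elasticsearch exposes it through the cluster pending tasks API (`GET /_cluster/pending_tasks`). The host, port, and both function names below are my own assumptions for illustration, not something taken from these notes.

```javascript
var http = require('http');

// Boil a /_cluster/pending_tasks response down to a few numbers:
// how many tasks are queued, the longest wait, and a count per priority.
function summarizePendingTasks(body) {
  var tasks = (body && body.tasks) || [];
  var summary = { count: tasks.length, maxWaitMillis: 0, byPriority: {} };
  tasks.forEach(function (task) {
    var waited = task.time_in_queue_millis || 0;
    summary.maxWaitMillis = Math.max(summary.maxWaitMillis, waited);
    summary.byPriority[task.priority] = (summary.byPriority[task.priority] || 0) + 1;
  });
  return summary;
}

// Fetch the queue from one node and hand the summary to callback(err, summary).
// host and port are assumptions; point them at your own cluster.
function fetchPendingTasks(host, port, callback) {
  var request = http.get(
    { host: host, port: port, path: '/_cluster/pending_tasks' },
    function (res) {
      var raw = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { raw += chunk; });
      res.on('end', function () {
        var parsed;
        try {
          parsed = JSON.parse(raw);
        } catch (err) {
          return callback(err);
        }
        callback(null, summarizePendingTasks(parsed));
      });
    }
  );
  request.on('error', callback);
}
```

Calling `fetchPendingTasks('localhost', 9200, function (err, summary) { console.log(err || summary); })` every few seconds gives a crude view of whether the queue is draining or backing up.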

NodeJS File parsing

Here's a skeleton for ripping files apart in NodeJS and processing each line.

var fs = require('fs');
var zlib = require('zlib');
var stream = require('stream');
var es = require('event-stream');
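
A minimal sketch of how such a line-by-line pass could look. To keep it dependency-free it swaps the `event-stream` module for a small hand-rolled line splitter, so it is not the skeleton above continued. `LineSplitter` and `processFile` are hypothetical names, and treating a `.gz` suffix as "gunzip this" is my assumption.

```javascript
var fs = require('fs');
var zlib = require('zlib');
var StringDecoder = require('string_decoder').StringDecoder;

// Collects chunks and returns only complete lines; a partial trailing line
// is held back until the next chunk (or flush) arrives.
function LineSplitter() {
  this.decoder = new StringDecoder('utf8');
  this.remainder = '';
}

LineSplitter.prototype.push = function (chunk) {
  var lines = (this.remainder + this.decoder.write(chunk)).split(/\r?\n/);
  this.remainder = lines.pop();
  return lines;
};

LineSplitter.prototype.flush = function () {
  var last = this.remainder + this.decoder.end();
  this.remainder = '';
  return last.length ? [last] : [];
};

// Stream a file (gunzipping when the name ends in .gz), call onLine(line)
// for every line, then onDone(err) exactly once.
function processFile(path, onLine, onDone) {
  var splitter = new LineSplitter();
  var input = fs.createReadStream(path);
  var source = /\.gz$/.test(path) ? input.pipe(zlib.createGunzip()) : input;
  var finished = false;

  function finish(err) {
    if (finished) { return; }
    finished = true;
    onDone(err || null);
  }

  function emit(lines) {
    lines.forEach(function (line) { onLine(line); });
  }

  input.on('error', finish);
  source.on('error', finish);
  source.on('data', function (chunk) { emit(splitter.push(chunk)); });
  source.on('end', function () { emit(splitter.flush()); finish(); });
}
```

Because the file is streamed, memory use stays flat even for multi-gigabyte logs; only the current chunk and any partial trailing line are held at once.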

Log Filtering

This is a filter/rating class to look at log objects and decide if they're interesting (worthy of review). Error messages are rated higher, as are logs from production hosts.

'use strict'

module.exports = function(options) {
  var my = {};
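
For illustration only, here is one way a rater along those lines might be shaped. The field names (`level`, `host`), the score weights, the threshold, and the `createLogRater` name are all assumptions of mine; the real class may score things quite differently.

```javascript
'use strict';

// Build a rater. options.productionPattern decides which hosts count as
// production, options.threshold is the score at which a log is "interesting".
function createLogRater(options) {
  options = options || {};
  var productionPattern = options.productionPattern || /prod/i;
  var threshold = options.threshold === undefined ? 5 : options.threshold;
  var levelScores = { error: 5, warn: 3, info: 1, debug: 0 };

  var my = {};

  // Score a single log object: severity first, plus a bump for production hosts.
  my.rate = function (log) {
    var level = String(log.level || '').toLowerCase();
    var score = Object.prototype.hasOwnProperty.call(levelScores, level)
      ? levelScores[level]
      : 0;
    if (productionPattern.test(String(log.host || ''))) {
      score += 3;
    }
    return score;
  };

  // A log is worth reviewing when its score reaches the threshold.
  my.isInteresting = function (log) {
    return my.rate(log) >= threshold;
  };

  return my;
}
```

Used as `var rater = createLogRater({ threshold: 6 });`, then `rater.isInteresting(log)` inside whatever loop is walking the parsed log objects. Keeping the score and the yes/no decision separate makes it easy to sort by score later instead of only filtering.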
@GaryRogers
GaryRogers / sysinfo.js
Created February 25, 2016 20:21
sysinfo.js, get system information in json format.
/**
* Get System Information in json format. Gets Run Queue, Memory and Swap Info.
*/
var os = require('os');
var fs = require('fs');
var sysinfo = {};
sysinfo.hostname = os.hostname();
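
A hedged sketch of how the run queue, memory, and swap figures could be gathered on Linux. Reading `/proc/loadavg` and `/proc/meminfo` is my assumption about where the numbers come from, and `parseLoadavg`, `parseMeminfo`, and `collectSysinfo` are hypothetical helper names.

```javascript
var os = require('os');
var fs = require('fs');

// Parse /proc/loadavg, e.g. "0.20 0.18 0.12 1/80 11206".
// The fourth field is "currently runnable tasks / total tasks".
function parseLoadavg(text) {
  var fields = text.trim().split(/\s+/);
  var queue = (fields[3] || '').split('/');
  return {
    load1: parseFloat(fields[0]),
    load5: parseFloat(fields[1]),
    load15: parseFloat(fields[2]),
    runnable: parseInt(queue[0], 10),
    totalTasks: parseInt(queue[1], 10)
  };
}

// Parse /proc/meminfo lines of the form "Key:   12345 kB" into numbers
// (kB for most keys; a few, like HugePages_Total, are plain counts).
function parseMeminfo(text) {
  var info = {};
  text.split('\n').forEach(function (line) {
    var match = /^(\w+):\s+(\d+)/.exec(line);
    if (match) {
      info[match[1]] = parseInt(match[2], 10);
    }
  });
  return info;
}

// Linux only: gather everything into one object ready for JSON.stringify.
function collectSysinfo() {
  var meminfo = parseMeminfo(fs.readFileSync('/proc/meminfo', 'utf8'));
  return {
    hostname: os.hostname(),
    runQueue: parseLoadavg(fs.readFileSync('/proc/loadavg', 'utf8')),
    memory: { totalKb: meminfo.MemTotal, freeKb: meminfo.MemFree },
    swap: { totalKb: meminfo.SwapTotal, freeKb: meminfo.SwapFree }
  };
}
```

On a Linux box, `console.log(JSON.stringify(collectSysinfo()))` prints the snapshot as a single JSON line, which is convenient for shipping to a log pipeline.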
GaryRogers / tc_jenkins_integration_notes.md
Last active April 29, 2019 17:43
TestComplete Jenkins Integration Notes

Agent Node Setup

  • Triple check your GPOs.
  • Run Resultant Set of Policy (RSoP) to make sure some upstream GPO isn't doing something you don't expect.
  • Shadow the RDP session to see what TestExecute is doing.
    • If you don't see TestExecute start in a session, double check your username variable in the pipeline.
  • Run Agent Node as a Windows service.
  • Let service interact with the desktop.

import dpath.util

def dpath_null(data: dict, path: str, default_return=None):
    '''Trap any KeyError from dpath and return an acceptable 'null' value when dpath can't find a path.
    Example 1
    ---------
    # Will return None if /some/path/to/an/attribute can not be found
    var = dpath_null(my_dictionary, '/some/path/to/an/attribute')
    '''
    # Assumed body (the original is cut off above): fall back to the default on KeyError.
    try:
        return dpath.util.get(data, path)
    except KeyError:
        return default_return