@rahulwa
Last active May 5, 2021
TODO in production for ELK stack

TODO in production -

Elasticsearch

  • select large memory instance
    • A machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB machines are also common. Less than 8 GB tends to be counterproductive (you end up needing many, many small machines), and greater than 64 GB brings problems of its own (the heap must stay under ~32 GB anyway; see the heap-sizing notes below).
    • In general, it is better to prefer medium-to-large boxes.
  • create swap using instance store disk, not EBS.
  • Disks should be SSDs; on AWS, use instance storage or EBS volumes with provisioned IOPS.
    • cfq, the default I/O scheduler on most *nix distributions, is inefficient for SSDs, since there are no spinning platters involved. Use deadline or noop instead: the deadline scheduler optimizes based on how long writes have been pending, while noop is a simple FIFO queue.
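As a sketch (assuming the data disk is /dev/sdb; substitute your own device), the scheduler can be inspected and switched at runtime:

```shell
# Show the available schedulers; the bracketed entry is the active one
cat /sys/block/sdb/queue/scheduler

# Switch to deadline for this boot (requires root)
echo deadline > /sys/block/sdb/queue/scheduler
```

To make the change survive reboots, set `elevator=deadline` on the kernel command line in your GRUB configuration.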
  • always run the most recent version of the Java Virtual Machine (JVM).
    • Java 8 is preferred over Java 7. Java 6 is no longer supported.
    • Please Do Not Tweak JVM Settings.
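To confirm which JVM a node is actually running (the localhost:9200 address assumes a local node with default settings):

```shell
# JVM on the local machine
java -version

# JVM details as seen by each Elasticsearch node
curl 'localhost:9200/_nodes/jvm?pretty'
```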
  • Use configuration management software for deployment.
  • Elasticsearch ships with very good defaults, especially when it comes to performance-related settings and options. When in doubt, just leave the settings alone.
cluster.name: elk_production
node.name: es_001_data
path.data: /es_data   #/path/to/data1,/path/to/data2
# Path to log files:
path.logs: /path/to/logs #will use defaults
# Path to where plugins are installed:
path.plugins: /path/to/plugins #will use defaults
discovery.zen.minimum_master_nodes: 2
  • This setting should always be configured to a quorum (majority) of your master-eligible nodes. A quorum is (number of master-eligible nodes / 2) + 1, using integer division. For example, with 3 master-eligible nodes, the quorum is (3 / 2) + 1 = 2.
  • It would be extremely irritating if you had to push new configurations to each node and restart your whole cluster just to change the setting.
  • For this reason, minimum_master_nodes (and other settings) can be configured via a dynamic API call. You can change the setting while your cluster is online:
PUT /_cluster/settings
{
    "persistent" : {
        "discovery.zen.minimum_master_nodes" : 2
    }
}
gateway.recover_after_nodes: 2   # start recovery once at least 2 nodes have joined
gateway.expected_nodes: 3        # number of nodes expected in the cluster
gateway.recover_after_time: 5m   # or wait at most 5m after recover_after_nodes is met
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]   # seed hosts for unicast discovery
  • Do not change the default garbage collector! The official recommendation is the Concurrent Mark-Sweep (CMS) collector.
  • The default threadpool settings in Elasticsearch are very sensible. The next time you are tempted to tweak a threadpool, please don't.
# Give (less than) half your memory to Lucene, and don't cross 32 GB!
export ES_HEAP_SIZE=8g   # i.e. memory/2
# Alternatively, set ES_HEAP_SIZE=8g in /etc/default/elasticsearch
  • Heap: Sizing and Swapping
# Lower swappiness; apply immediately with sysctl:
sysctl -w vm.swappiness=1
# and persist it in /etc/sysctl.conf:
vm.swappiness = 1
# A swappiness of 1 is better than 0, since on some kernel versions a swappiness of 0 can invoke the OOM-killer.
bootstrap.mlockall: true
#This allows the JVM to lock its memory and prevent it from being swapped by the OS.
  • File Descriptors and MMap
# You should increase your file descriptor count to something very large, such as 64,000.
# /etc/default/elasticsearch
MAX_OPEN_FILES=131070
# /etc/security/limits.conf
*    soft nofile 64000
*    hard nofile 64000
root soft nofile 64000
root hard nofile 64000
# /etc/pam.d/common-session
session required pam_limits.so
# /etc/pam.d/common-session-noninteractive
session required pam_limits.so
sysctl -w vm.max_map_count=262144
# Or you can set it permanently by modifying `vm.max_map_count` setting in your /etc/sysctl.conf.
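Once the limits are in place, it is worth verifying that Elasticsearch actually picked them up (assuming a node on localhost:9200):

```shell
# Check max_file_descriptors as seen by the running node
curl 'localhost:9200/_nodes/process?pretty'

# Confirm the kernel's mmap limit
sysctl vm.max_map_count
```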
  • eliminate the possibility of an accidental mass-deletion of indices
action.destructive_requires_name: true
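With this setting enabled, deletes that use wildcards or _all are rejected and indices must be named explicitly. A quick sketch (the index name is illustrative):

```shell
# These are rejected when destructive_requires_name is true:
curl -XDELETE 'localhost:9200/*'
curl -XDELETE 'localhost:9200/_all'

# Deletes must name the index explicitly:
curl -XDELETE 'localhost:9200/logstash-2014-09'
```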
  • use index aliases
curl -XPOST 'localhost:9200/_aliases?pretty' -d'
{
    "actions": [
        { "remove": { "index": "my_index_v1", "alias": "my_index" }},
        { "add":    { "index": "my_index_v2", "alias": "my_index" }}
    ]
}'
  • Perhaps you are using Elasticsearch to index millions of log files, and you would prefer to optimize for index speed rather than near real-time search. You can reduce the frequency of refreshes on a per-index basis by setting the refresh_interval:
PUT /my_logs
{
  "settings": {
    "refresh_interval": "30s"
  }
}
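For one-off bulk imports, you can go further and disable refresh entirely, then restore it afterwards (index name my_logs as above):

```shell
# Disable refresh for the duration of the bulk import
curl -XPUT 'localhost:9200/my_logs/_settings' -d '{ "refresh_interval": -1 }'

# ... run the bulk import ...

# Restore the interval and force a refresh so documents become searchable
curl -XPUT 'localhost:9200/my_logs/_settings' -d '{ "refresh_interval": "30s" }'
curl -XPOST 'localhost:9200/my_logs/_refresh'
```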
  • Shards are flushed automatically every 30 minutes, or when the translog becomes too big. That said, it is beneficial to flush your indices before restarting a node or closing an index. When Elasticsearch tries to recover or reopen an index, it has to replay all of the operations in the translog, so the shorter the log, the faster the recovery.
POST /blogs/_flush
POST /_flush?wait_for_ongoing  #Flush all indices and wait until all flushes have completed before returning.
  • The optimize API is best described as the forced merge API. It forces a shard to be merged down to the number of segments specified in the max_num_segments parameter. The intention is to reduce the number of segments (usually to one) in order to speed up search performance. The typical use case is for logging, where logs are stored in an index per day, week, or month. Older indices are essentially read-only; they are unlikely to change. In this case, it can be useful to optimize the shards of an old index down to a single segment each; it will use fewer resources and searches will be quicker:
POST /logstash-2014-10/_optimize?max_num_segments=1
# Be aware that merges triggered by the optimize API are not throttled at all. They can consume all of the I/O on your nodes, leaving nothing for search and potentially making your cluster unresponsive.
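Before and after optimizing, the segment count per shard can be checked with the segments API (index name as above):

```shell
# Inspect per-shard segment counts and sizes
curl 'localhost:9200/logstash-2014-10/_segments?pretty'
```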
rahulwa commented Sep 26, 2016
Default template for logstash indices -

{
  "template": "logstash*",
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": false
      },
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "index": "not_analyzed",
              "type": "string"
            }
          }
        }
      ]
    }
  }
}
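The template above can be installed via the index template API; the template name logstash_default and the local file name are assumptions:

```shell
# Install the template (saved locally as logstash-template.json)
curl -XPUT 'localhost:9200/_template/logstash_default' -d @logstash-template.json

# Verify it was stored
curl 'localhost:9200/_template/logstash_default?pretty'
```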
