Since I wasted a lot of time getting this to work, I'm posting this gist to help anyone who has trouble setting up the plugin.

Prerequisites:
- An AWS account
- A running cluster on EMR with Ganglia installed.
- A running Elasticsearch cluster.
- Create the file /etc/yum.repos.d/logstash.repo with the following content (this may change based on your Elastic Stack version):
[logstash-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
- Run the following commands:
sudo yum install logstash -y
sudo usermod -a -G logstash hadoop
sudo usermod -a -G logstash ec2-user
sudo /usr/share/logstash/bin/logstash-plugin install logstash-input-ganglia
- Export the following variables and persist them as Logstash environment variables:
export CLUSTER_ID=$(jq -r ".jobFlowId" /mnt/var/lib/info/job-flow.json)
export COMPLETE_HOST="$(hostname).eu-west-1.compute.internal"
echo "CLUSTER_ID=${CLUSTER_ID}" | sudo tee -a /etc/default/logstash
echo "COMPLETE_HOST=${COMPLETE_HOST}" | sudo tee -a /etc/default/logstash
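A side note on the `tee` pattern above: a shell redirection runs as the *calling* user, not as root, so `sudo echo "..." >> /etc/default/logstash` would fail with "Permission denied"; piping through `sudo tee -a` does the write as root. A minimal, safe-to-run sketch of the same pattern, using a temp file and a made-up cluster id:

```shell
# Demonstrates the append-as-root pattern with a throwaway file.
ENV_FILE=$(mktemp)              # stand-in for /etc/default/logstash
CLUSTER_ID="j-ABCDEFGH123"      # hypothetical EMR job flow id
echo "CLUSTER_ID=${CLUSTER_ID}" | tee -a "$ENV_FILE"
grep -c "^CLUSTER_ID=" "$ENV_FILE"   # prints 1: the line was appended
```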
- Add your Elasticsearch cluster certificate to your master node.
- Add the following file describing the Ganglia metrics pipeline:
cat << EOF | sudo tee /etc/logstash/conf.d/ganglia_input.conf
input {
  ganglia {
    id => "ganglia"
    add_field => {
      "cluster_id" => "\${CLUSTER_ID}"
    }
    host => "\${COMPLETE_HOST}"
    port => 8100
  }
}
filter {
  if [name] !~ /heartbeat/ {
    mutate {
      rename => {
        "pkts_in" => "packets_in"
        "pkts_out" => "packets_out"
      }
    }
  } else {
    drop {}
  }
}
output {
  stdout {}
  elasticsearch {
    hosts => ["YOUR_ELASTIC_SEARCH_HOST"]
    index => "ganglia-metrics-%{+YYYY.MM.dd}"
    user => "YOUR_ES_USER"
    password => "\${ES_PASS}"
    ssl => true
    ssl_certificate_verification => true
    cacert => "/etc/certs/elastic/ca.tls"
  }
}
EOF
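A note on the `\${CLUSTER_ID}` escapes above: the backslash keeps the calling shell from expanding the variable, so the literal text `${CLUSTER_ID}` lands in the config file, and Logstash resolves it at startup from the environment set in /etc/default/logstash. A quick sketch of the difference, using a temp file:

```shell
# The escaped form survives the heredoc as-is; the unescaped form is expanded now.
CONF=$(mktemp)
CLUSTER_ID="j-DEMO"   # hypothetical value, expanded by the current shell
cat << EOF > "$CONF"
escaped   => "\${CLUSTER_ID}"
expanded  => "${CLUSTER_ID}"
EOF
cat "$CONF"
```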
The filter section here is just an example of the kind of manipulation you can perform.
- On every node in the cluster, append a new UDP send channel to the gmond (Ganglia Monitoring Daemon) configuration file:
sudo bash -c 'cat << EOF >> /etc/ganglia/gmond.conf
udp_send_channel {
  host = $(hostname).eu-west-1.compute.internal
  port = 8100
  ttl = 1
}
EOF'
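To sanity-check the appended host line before restarting anything, you can expand it by hand (the eu-west-1 suffix matches the region used throughout this example; adjust it for your region):

```shell
# Prints the value that ends up in the udp_send_channel block for this node.
echo "host = $(hostname).eu-west-1.compute.internal"
```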
This is necessary because the default UDP channel (on port 8649) is already in use (probably by the Ganglia web app).
- Restart the gmond service on all nodes with:
sudo restart gmond
- Restart the logstash service on the master node:
sudo restart logstash
- Read the logs at /var/log/logstash/logstash-plain.log. You will probably get some warnings due to nil values in the data, or fields detected as float but arriving with different types. To fix these, add a filter section to the Ganglia pipeline in /etc/logstash/conf.d/ganglia_input.conf, following the examples in the Logstash docs (see: https://www.elastic.co/guide/en/logstash/current/filter-plugins.html).
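For instance, a type-coercion filter along these lines could be added to the pipeline (the field name `val` is hypothetical — use whichever field the warnings complain about):

```text
filter {
  # Drop events where the metric value is missing (nil)
  if ![val] {
    drop {}
  }
  # Force the value to a float so the index mapping stays consistent
  mutate {
    convert => { "val" => "float" }
  }
}
```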
- To see the logs in Kibana, log in as Admin and create a new index pattern (Your Kibana > Management > Kibana: Index Patterns > Create Index Pattern) based on the ganglia-metrics index.
- You can also directly read the Ganglia metrics, which are stored at /mnt/var/lib/ganglia/rrds/__SummaryInfo__/, using the preinstalled rrdtool utility.