@Hammond95
Last active July 6, 2020 11:09
Steps to install logstash-input-ganglia on an EMR cluster.

How to install 'logstash-input-ganglia' on an EMR cluster

I wasted a lot of time trying to make this work, so I am posting this gist to help anyone who has trouble setting up the plugin.

What you need

  • An AWS account
  • A running cluster on EMR with Ganglia installed.
  • A running Elasticsearch cluster.

Steps

1. Installing logstash and logstash-input-ganglia plugin.

  1. Create the file /etc/yum.repos.d/logstash.repo, with the following content (this may change based on your elastic stack version):
[logstash-7.x]
name=Elastic repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
  2. Run the following commands:
sudo yum install logstash -y
sudo usermod -a -G logstash hadoop
sudo usermod -a -G logstash ec2-user
sudo /usr/share/logstash/bin/logstash-plugin install logstash-input-ganglia
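
Before moving on, it can be worth confirming that the plugin really got installed. The sketch below relies on `logstash-plugin list`, which prints one plugin name per line. The `has_plugin` helper and the `LOGSTASH_PLUGIN_BIN` variable are names I made up for illustration, and the default path assumes the RPM install above; on some setups the command may need `sudo`.

```shell
# Path used by the RPM install above; adjust if Logstash lives elsewhere (assumption).
LOGSTASH_PLUGIN_BIN="${LOGSTASH_PLUGIN_BIN:-/usr/share/logstash/bin/logstash-plugin}"

# has_plugin NAME (hypothetical helper): succeeds if `logstash-plugin list`
# prints NAME on a line of its own.
has_plugin() {
    "$LOGSTASH_PLUGIN_BIN" list 2>/dev/null | grep -qx "$1"
}

if has_plugin logstash-input-ganglia; then
    echo "logstash-input-ganglia is installed"
else
    echo "logstash-input-ganglia is missing" >&2
fi
```

If the check reports the plugin as missing, re-run the install command and look at its output before continuing.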

2. Set up the pipeline to retrieve Ganglia metrics

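  1. Export the following values and append them to /etc/default/logstash, so that Logstash sees them as environment variables:
# job-flow.json holds the EMR cluster (job flow) ID.
export CLUSTER_ID=$(jq -r ".jobFlowId" /mnt/var/lib/info/job-flow.json)
# The region is hard-coded here; replace eu-west-1 with your cluster's region.
export COMPLETE_HOST="$(hostname).eu-west-1.compute.internal"

# tee runs as root, so echo itself does not need sudo.
echo "CLUSTER_ID=${CLUSTER_ID}" | sudo tee -a /etc/default/logstash
echo "COMPLETE_HOST=${COMPLETE_HOST}" | sudo tee -a /etc/default/logstash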
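
One caveat with `tee -a`: running the step twice leaves duplicate lines in /etc/default/logstash. A minimal sketch of an idempotent alternative follows. `set_env_var` is a hypothetical helper, not part of Logstash; it assumes GNU `sed` and values without `|` or `&` (true for cluster IDs and hostnames), and it has to run as root when pointed at the real file. The demo uses a temporary file and a made-up cluster ID.

```shell
# set_env_var FILE KEY VALUE (hypothetical helper):
# replace KEY's line in FILE if it exists, otherwise append it.
set_env_var() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# Demo on a temporary file; j-EXAMPLE1 and j-EXAMPLE2 are placeholder IDs.
demo_env_file="$(mktemp)"
set_env_var "$demo_env_file" CLUSTER_ID j-EXAMPLE1
set_env_var "$demo_env_file" CLUSTER_ID j-EXAMPLE2
cat "$demo_env_file"    # prints a single line: CLUSTER_ID=j-EXAMPLE2
```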
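  2. Add your Elasticsearch cluster certificate to the master node. The pipeline below expects it at /etc/certs/elastic/ca.tls.

  3. Add the following file describing the Ganglia metrics pipeline:

# Write through sudo tee: in "sudo cat ... > file" the redirection runs as the calling user, not as root.
sudo tee /etc/logstash/conf.d/ganglia_input.conf > /dev/null << EOF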
input {
    ganglia {
        id => "ganglia"
        add_field => {
            "cluster_id" => "\${CLUSTER_ID}"
        }
        host => "\${COMPLETE_HOST}"
        port => 8100
    }
}

filter {
    if [name] !~ /heartbeat/ {
        mutate {
            rename => {
                "pkts_in" => "packets_in"
                "pkts_out" => "packets_out"
            }
        }
    } else {
        drop {}
    }
}

output {
    stdout {}
    elasticsearch {
        hosts => ["YOUR_ELASTIC_SEARCH_HOST"]
        index => "ganglia-metrics-%{+YYYY.MM.dd}"
        user => "YOUR_ES_USER"
        password => "\${ES_PASS}"
        ssl => true
        ssl_certificate_verification => true
        cacert => "/etc/certs/elastic/ca.tls"
    }
}
EOF

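The filter section is only an example of the manipulations you can perform: it drops heartbeat events and renames pkts_in and pkts_out. Note that the output section reads ${ES_PASS}, so that variable must be available to Logstash as well, for example through /etc/default/logstash or the Logstash keystore.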
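
As far as I know, Logstash refuses to load a pipeline that references an environment variable which is undefined and has no default, so it helps to compare the `${...}` references in the config against the environment file before restarting. The following is a rough sketch; `missing_vars` is a hypothetical helper, and the demo files and the host name in them are made up.

```shell
# missing_vars CONF ENVFILE (hypothetical helper): print every ${NAME}
# referenced in CONF that has no NAME= line in ENVFILE.
missing_vars() {
    local pipeline_conf="$1" env_file="$2" name
    grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*' "$pipeline_conf" | tr -d '${' | sort -u |
    while read -r name; do
        grep -q "^${name}=" "$env_file" || echo "$name"
    done
}

# Demo on temporary files that mimic the pipeline and /etc/default/logstash.
demo_conf="$(mktemp)"
demo_env="$(mktemp)"
printf 'host => "${COMPLETE_HOST}"\npassword => "${ES_PASS}"\n' > "$demo_conf"
printf 'COMPLETE_HOST=ip-10-0-0-1.eu-west-1.compute.internal\n' > "$demo_env"
missing_vars "$demo_conf" "$demo_env"    # prints: ES_PASS
```

On the master node the call would be `missing_vars /etc/logstash/conf.d/ganglia_input.conf /etc/default/logstash`. Variables kept in the Logstash keystore are not visible to this check.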

  4. Append a new UDP send channel to the gmond (Ganglia Monitoring Daemon) configuration file on every node in the cluster:
# Keep a copy of the original configuration before editing it.
sudo cp /etc/ganglia/gmond.conf /etc/ganglia/gmond.conf.bak
# The heredoc is expanded by the calling shell, so $(hostname) becomes this node's hostname.
sudo tee -a /etc/ganglia/gmond.conf > /dev/null << EOF
udp_send_channel {
    host = $(hostname).eu-west-1.compute.internal
    port = 8100
    ttl = 1
}
EOF

This is necessary because the default UDP channel (port 8649) is already in use (probably by the Ganglia web app).
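
Appending the block a second time would leave two identical send channels in gmond.conf, so a guard can help when the step is scripted. Here is a minimal sketch: `has_send_channel` is a hypothetical helper that assumes the usual gmond.conf layout, with `port = N` on its own line inside the block. The demo runs against a temporary file.

```shell
# has_send_channel FILE PORT (hypothetical helper): succeeds if FILE
# contains a udp_send_channel block with a "port = PORT" line.
has_send_channel() {
    awk -v port="$2" '
        /^[[:space:]]*udp_send_channel[[:space:]]*[{]/ { in_block = 1 }
        in_block && $1 == "port" && $2 == "=" && $3 == port { found = 1 }
        in_block && /[}]/ { in_block = 0 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Demo: a config that only has a channel on another port.
demo_gmond="$(mktemp)"
printf 'udp_send_channel {\n    host = master\n    port = 8649\n}\n' > "$demo_gmond"
has_send_channel "$demo_gmond" 8100 || echo "no channel on 8100 yet"
```

On a node, the append would then only run when `has_send_channel /etc/ganglia/gmond.conf 8100` fails.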

  5. Restart the gmond service on all nodes:
sudo restart gmond
  6. Restart the logstash service on the master node:
sudo restart logstash
  7. Check the logs at /var/log/logstash/logstash-plain.log

You will probably see some warnings, caused by nil values in the data or by fields that were detected as float but sometimes arrive with a different type. To fix these, extend the filter section of the Ganglia pipeline in /etc/logstash/conf.d/ganglia_input.conf, following the examples in the Logstash docs (see: https://www.elastic.co/guide/en/logstash/current/filter-plugins.html).

  8. To see the metrics in Kibana, log in as an admin and create a new index pattern ( Your Kibana > Management > Kibana: Index Patterns > Create Index Pattern ) based on the ganglia-metrics index.

Additional suggestions

  • You can also read the Ganglia metrics directly: they are stored under /mnt/var/lib/ganglia/rrds/__SummaryInfo__/ and can be inspected with rrdtool, which is already installed.
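
A quick way to explore those files is sketched below. `list_rrd_metrics`, `show_last_hour` and `RRD_DIR` are names I introduced for illustration. `rrdtool fetch` is the standard rrdtool subcommand, and using `AVERAGE` assumes the RRDs were created with an AVERAGE archive, which I believe is the Ganglia default. Reading the directory may require `sudo`.

```shell
# Directory named in the suggestion above; override RRD_DIR if yours differs.
RRD_DIR="${RRD_DIR:-/mnt/var/lib/ganglia/rrds/__SummaryInfo__}"

# list_rrd_metrics DIR (hypothetical helper): print the metric name
# (file name without .rrd) for each RRD file in DIR.
list_rrd_metrics() {
    local rrd_file
    for rrd_file in "$1"/*.rrd; do
        [ -e "$rrd_file" ] || continue    # the glob matched nothing
        basename "$rrd_file" .rrd
    done
}

# show_last_hour FILE (hypothetical helper): print the AVERAGE series
# that rrdtool holds for the last hour.
show_last_hour() {
    rrdtool fetch "$1" AVERAGE --start -1h
}

list_rrd_metrics "$RRD_DIR"
```

Pick one of the printed names and run `show_last_hour "$RRD_DIR/<name>.rrd"` to see its recent values.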