Step-by-step Instructions to Set Up Ping and Filesystem Checks with Server-Side Monitoring Configuration

Summary

This is a step-by-step guide to manually setting up Cloud Monitoring through the server-side configuration feature, so that you can get familiar with the process. The target use case is automation with tools such as a Chef cookbook, an Ansible playbook, or scripts.

In this example, we are going to set up the filesystem check because it is always a good idea to get notified BEFORE you run out of space. We are going to use Ubuntu as the example OS to simplify the steps.

Steps

  • Upgrade agent
  • Create YAML file
  • Copy file to the conf.d directory
  • Restart agent

Upgrade Agent

Make sure that your agent is updated to a version that supports the server-side monitoring configuration feature. Check the installed version:

~$ rackspace-monitoring-agent --version
0.2.0-33

Update the monitoring agent using apt-get:

sudo apt-get install rackspace-monitoring-agent

Create YAML files for monitoring configuration

After you SSH into the server, create a YAML file named filesystem.yaml:

type: agent.filesystem
label: Filesystem on /
disabled: false
period: 60
timeout: 30
details:
    target: /
alarms:
    alarm-disk-size:
        label: usage on /
        notification_plan_id: npTechnicalContactsEmail
        criteria: |
            if (percentage(metric['used'], metric['total']) > 90) {
                return new AlarmStatus(CRITICAL, 'Disk usage is above 90%, #{used} out of #{total}');
            }
            if (percentage(metric['used'], metric['total']) > 80) {
                return new AlarmStatus(WARNING, 'Disk usage is above 80%, #{used} out of #{total}');
            }
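For automation, the file above can be generated from a template instead of being written by hand. The following is a minimal sketch in Python, assuming one file per mount point. The function name render_filesystem_check and its parameters are illustrative and not part of the agent; the template text mirrors filesystem.yaml above.

```python
from string import Template

# Template mirrors filesystem.yaml above; only the $-placeholders vary.
FILESYSTEM_CHECK = Template("""\
type: agent.filesystem
label: Filesystem on $target
disabled: false
period: 60
timeout: 30
details:
    target: $target
alarms:
    alarm-disk-size:
        label: usage on $target
        notification_plan_id: $plan_id
        criteria: |
            if (percentage(metric['used'], metric['total']) > $critical) {
                return new AlarmStatus(CRITICAL, 'Disk usage is above $critical%, #{used} out of #{total}');
            }
            if (percentage(metric['used'], metric['total']) > $warning) {
                return new AlarmStatus(WARNING, 'Disk usage is above $warning%, #{used} out of #{total}');
            }
""")


def render_filesystem_check(target, plan_id, warning=80, critical=90):
    """Return the YAML text of an agent.filesystem check for one mount point."""
    # The CRITICAL branch is evaluated first, so it must use the higher threshold.
    if not 0 < warning < critical <= 100:
        raise ValueError("expected 0 < warning < critical <= 100")
    return FILESYSTEM_CHECK.substitute(
        target=target, plan_id=plan_id, warning=warning, critical=critical
    )
```

Calling render_filesystem_check("/", "npTechnicalContactsEmail") should reproduce the file shown above; writing the result to filesystem.yaml is left to the calling script.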

Copy YAML file to conf.d

If this is the first time, you will need to create the agent conf.d directory:

sudo mkdir /etc/rackspace-monitoring-agent.conf.d

Copy the files you just created to the conf.d directory:

sudo cp filesystem.yaml /etc/rackspace-monitoring-agent.conf.d

Restart Agent

The monitoring agent reads the conf.d directory every time it restarts:

sudo service rackspace-monitoring-agent restart
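In a script, the copy and restart steps can be made idempotent so that the agent restarts only when a file actually changed. The following is a minimal sketch, assuming Python 3 and root privileges for the real paths. install_check and restart_agent are hypothetical helper names; the directory and the restart command are the ones used in this guide.

```python
import filecmp
import os
import shutil
import subprocess

CONF_DIR = "/etc/rackspace-monitoring-agent.conf.d"


def install_check(src, conf_dir=CONF_DIR):
    """Copy src into conf_dir; return True only if the destination changed."""
    os.makedirs(conf_dir, exist_ok=True)  # covers the first-time mkdir step
    dest = os.path.join(conf_dir, os.path.basename(src))
    # shallow=False compares file contents rather than only size and mtime.
    if os.path.exists(dest) and filecmp.cmp(src, dest, shallow=False):
        return False  # identical content, nothing to do
    shutil.copyfile(src, dest)
    return True


def restart_agent():
    """Restart the agent so that it re-reads conf.d (needs root)."""
    subprocess.run(["service", "rackspace-monitoring-agent", "restart"], check=True)
```

A caller would run install_check on each YAML file and call restart_agent once if any call returned True.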

Done!

That's it! You should see your checks showing up in no time!

Tips

Tail Agent Log for More Information

At the beginning, or whenever you need to gather more information from the agent, you can tail the log file:

sudo tail -f /var/log/rackspace-monitoring-agent.log

When all goes well, you will see lines in the log like the following.

Sun May 11 22:48:57 2014 INF: Confd -> config_file post overall success
Sun May 11 22:48:57 2014 INF: Confd -> config_file post operation result: success for file, handle: filesystem.yaml at parsing
Sun May 11 22:48:57 2014 INF: Confd -> config_file post operation result: success for check, handle: {"check":"default","filename":"filesystem.yaml"} at create
Sun May 11 22:48:57 2014 INF: Confd -> config_file post operation result: success for alarm, handle: {"alarm":"alarm-disk-size","filename":"filesystem.yaml"} at create
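An automation script can scan the log for these result lines rather than relying on a person reading them. The following is a minimal sketch; the pattern is inferred only from the sample lines in this guide (including the failure lines listed under Errors), not from a documented log format, so treat it as an assumption. parse_confd_results is a hypothetical helper name.

```python
import re

# Inferred from the sample log lines in this guide; not an official format.
RESULT = re.compile(
    r"(?P<level>INF|ERR): Confd -> config_file post operation result: "
    r"(?P<outcome>success|failure) for (?P<kind>\w+), "
    r"handle: (?P<handle>.+?) at (?P<stage>[a-z ]+?)(?:, error (?P<error>.*))?$"
)


def parse_confd_results(lines):
    """Yield one dict per 'post operation result' line; other lines are skipped."""
    for line in lines:
        match = RESULT.search(line.rstrip("\n"))
        if match:
            yield match.groupdict()


def failures(lines):
    """Return only the results whose outcome is 'failure'."""
    return [r for r in parse_confd_results(lines) if r["outcome"] == "failure"]
```

Each result carries the kind (file, check, or alarm), the handle, the stage, and the raw error text when there is one, which is enough for a deployment script to fail loudly when any entry reports a failure.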

Errors

When there is an error in the YAML file, the agent tries to provide as much detailed information as possible. The following is a list of error messages that you might see, each followed by its likely cause.

Tue May 13 02:03:56 2014 ERR: Confd -> config_file post operation result: failure for file, handle: filesystem.yaml at parsing, error {"message":"[object Object]","stack":"Error: [object Object]\n    at ConfigState.parseFile

-- You have a syntax error in the YAML file. Consider running it through a YAML lint tool (e.g. http://yamllint.com/) to figure out exactly what is wrong.

Sun May 11 22:41:45 2014 ERR: Confd -> config_file post operation result: failure for check, handle: {"check":"default","filename":"ping-us.yaml"} at create validation, error {"key":"monitoring_zones_poll","message":"monitoring_zones_poll may not be empty"}

-- The monitoring_zones_poll parameter is missing from the file.

Sun May 11 22:41:45 2014 ERR: Confd -> config_file post operation result: failure for alarm, handle: {"alarm":"packet-loss","filename":"ping-us.yaml"} at create validation, error {"message":"Not a string","key":"check_id","parentKeys":[]}

-- Chances are that the check this alarm is configured for failed to be created, so there is no check ID for the alarm to reference.

Sun May 11 22:45:40 2014 ERR: Confd -> config_file post operation result: failure for check, handle: {"check":"default","filename":"ping-us.yaml"} at create, error {"stack":"Error: Object \"MonitoringZone\" with key \"dfw\" does not exist\n    at Object.construct ...

-- The monitoring_zones_poll parameter takes a list of configured monitoring zones. The valid values start with mz followed by the zone name, for example, mzdfw.

Sun May 11 22:47:15 2014 ERR: Confd -> config_file post operation result: failure for alarm, handle: {"alarm":"packet-loss","filename":"ping-us.yaml"} at create validation, error {"message":"Object \"NotificationPlan\" with key \"pagerduty\" does not exist","key":"notification_plan_id","parentKeys":[]}

-- The notification plan needs to be referenced through its ID and NOT its label.
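Several of the errors above can be caught before the file ever reaches the agent. The following is a minimal sketch of such a pre-flight check, operating on a check definition that has already been parsed into a Python dict (the parsing step is left out). preflight is a hypothetical helper, and its rules are assumptions drawn from the error messages and examples in this guide: remote checks need a non-empty monitoring_zones_poll, zone values start with mz, and both notification plan IDs shown here start with np.

```python
def preflight(check):
    """Return a list of problems found in a parsed check definition (a dict)."""
    problems = []
    zones = check.get("monitoring_zones_poll") or []
    if str(check.get("type", "")).startswith("remote.") and not zones:
        problems.append("monitoring_zones_poll may not be empty")
    for zone in zones:
        if not str(zone).startswith("mz"):
            problems.append(f"monitoring zone {zone!r} should look like 'mz{zone}'")
    for name, alarm in (check.get("alarms") or {}).items():
        plan = str(alarm.get("notification_plan_id", ""))
        # Assumption: plan IDs start with "np", as both IDs in this guide do.
        if not plan.startswith("np"):
            problems.append(
                f"alarm {name!r}: {plan!r} does not look like a notification plan ID"
            )
    return problems
```

An empty list means none of these particular mistakes were found; it does not guarantee that the agent will accept the file.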

YAML Syntax Check Tool

You can use http://yamllint.com/ to quickly test your YAML syntax before submitting the files through the agent.

Other Checks

The following is an example of a ping check. You can find more examples at https://github.com/virgo-agent-toolkit/rackspace-monitoring-agent/tree/master/examples/rackspace_monitoring_agent.conf.d

The ping check is another popular check, used to get a sense of whether the server is available. Please note that the target_alias field can differ from server to server. We are looking into the possibility of making the name more consistent down the road.

ping-us-zones.yaml

type: remote.ping
label: pingv4 from US zones
disabled: false
period: 60
timeout: 30
details:
    count: 5
monitoring_zones_poll:
    - mzdfw
    - mziad
    - mzord
target_alias: public1_v4
alarms:
    packet-loss:
        label: Ping v4 packet loss
        notification_plan_id: npabEPlbCc
        criteria: |
            :set consistencyLevel=ONE
            if (metric['available'] < 80) {
                return new AlarmStatus(CRITICAL, 'Packet loss is greater than 20%, availability at #{available}');
            }
            
            if (metric['available'] < 95) {
                return new AlarmStatus(WARNING, 'Packet loss is greater than 5%, availability at #{available}');
            }
            
            return new AlarmStatus(OK, 'Packet loss is normal, availability at #{available}');
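
The criteria above are evaluated top to bottom, so the first matching condition decides the status. To sanity-check thresholds before deploying, the same logic can be mirrored in a few lines of Python. This is an illustrative sketch only; packet_loss_status is a hypothetical function and is not how the monitoring service evaluates alarms.

```python
def packet_loss_status(available):
    """Mirror the packet-loss criteria above for an availability percentage."""
    if available < 80:
        return "CRITICAL"  # more than 20% packet loss
    if available < 95:
        return "WARNING"  # more than 5% packet loss
    return "OK"
```

For example, an availability of exactly 80 falls through the first condition and yields WARNING, because the comparison is strictly less than.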

Feedback Appreciated

Your feedback is highly appreciated! Do you like it? Anything surprising? What do you see as good use for this feature?

More specifically, we wonder what you think about the design of treating YAML as the source of truth. This means that even though you can still use the API, UI, or CLI to modify the checks and alarms created through server-side monitoring configuration, those changes are temporary and will be overridden the next time the agent restarts.
