@mmoulton
Last active November 7, 2020 18:19
Docker Container Stats Collection Using Collectd

Docker stats collection for collectd

This script can be used to feed collectd with cpu and memory usage statistics for running docker containers using the collectd exec plugin.

For each running container, the script reports used and cached memory (read from the cgroup's memory.stat file) and user and system CPU usage (read from cpuacct.stat).

Usage

This script is intended to be executed by collectd on a host with running Docker containers. To use it, configure the exec plugin in collectd to run the collectd-docker.sh script. You may need to adjust the script for your system, for example the cgroup mount location.
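
If you are unsure where cgroups are mounted, the mount table can tell you. The following is a minimal sketch, not part of the original script: find_cgroup_mount is a hypothetical helper name, and it assumes a cgroup v1 layout in which each controller appears as a mount option on a "cgroup" entry in /proc/mounts. The cpuacct.stat file the script reads only exists under cgroup v1.

```shell
#!/bin/bash
# Sketch: find where a cgroup v1 controller is mounted.
# find_cgroup_mount is a hypothetical helper, not part of collectd-docker.sh.
# Usage: find_cgroup_mount CONTROLLER [MOUNTS_FILE]
find_cgroup_mount () {
  local controller="$1"
  local mounts_file="${2:-/proc/mounts}"
  # Fields in /proc/mounts: device, mount point, fs type, options, ...
  # Controllers show up as comma-separated options on "cgroup" entries.
  awk -v c="$controller" '
    $3 == "cgroup" {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++)
        if (opts[i] == c) { print $2; exit }
    }' "$mounts_file"
}

# Example use:
#   MEMORY_CGROUP=$(find_cgroup_mount memory)
#   CPUACCT_CGROUP=$(find_cgroup_mount cpuacct)
```

If the function prints nothing, the controller is not mounted as cgroup v1 on that host, and the script's assumptions about file names probably do not hold.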

Example collectd.conf snippet

LoadPlugin exec

<Plugin exec>
 Exec ubuntu "/usr/share/collectd/collectd-docker.sh"
</Plugin>

This runs the copy of collectd-docker.sh that you placed in /usr/share/collectd as the ubuntu user.
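
Before wiring the script into collectd, it can help to see the line format the exec plugin reads from the script's standard output. The sketch below reproduces the layout the script prints; putval is a hypothetical helper introduced only for illustration, and the host, container ID, and value are sample data.

```shell
#!/bin/bash
# Sketch of the PUTVAL line layout used by the script.
# putval is a hypothetical helper introduced for illustration only.
# Usage: putval HOST CONTAINER TYPE_INSTANCE INTERVAL VALUE
putval () {
  # Identifier layout: host/plugin-plugin_instance/type-type_instance.
  # "N" asks collectd to timestamp the value with the current time.
  printf 'PUTVAL "%s/docker-%s/%s" interval=%s N:%s\n' "$1" "$2" "$3" "$4" "$5"
}

putval localhost e1f1809fff1d memory-used 60 258969600
# prints: PUTVAL "localhost/docker-e1f1809fff1d/memory-used" interval=60 N:258969600
```

Any line on standard output that does not follow this layout is rejected by the exec plugin with an "Unable to parse command" error.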

#!/bin/bash
#
# CollectD - Docker
#
# This script collects CPU and memory stats for running Docker containers from
# their cgroup stat files and prints them in a collectd exec plugin friendly
# format.
#
# Author: Mike Moulton (mike@meltmedia.com)
# License: MIT
#

# Location of the cgroup mount point, adjust for your system
CGROUP_MOUNT="/sys/fs/cgroup"

HOSTNAME="${COLLECTD_HOSTNAME:-localhost}"
INTERVAL="${COLLECTD_INTERVAL:-60}"

collect ()
{
  # Skip directories we cannot enter
  cd "$1" || return

  # If the directory name is 64 characters long, it's likely a docker instance
  if [ "${#1}" -eq 64 ]; then

    # Shorten the name to 12 characters for brevity, like docker does
    NAME="${1:0:12}"

    # If we are in a cpuacct cgroup, we can collect cpu usage stats
    if [ -e cpuacct.stat ]; then
      USER=$(awk '$1 == "user" { print $2 }' cpuacct.stat)
      SYSTEM=$(awk '$1 == "system" { print $2 }' cpuacct.stat)
      echo "PUTVAL \"$HOSTNAME/docker-$NAME/cpu-user\" interval=$INTERVAL N:$USER"
      echo "PUTVAL \"$HOSTNAME/docker-$NAME/cpu-system\" interval=$INTERVAL N:$SYSTEM"
    fi

    # If we are in a memory cgroup, we can collect memory usage stats
    if [ -e memory.stat ]; then
      # Match keys exactly so that rss_huge is not picked up along with rss
      CACHE=$(awk '$1 == "cache" { print $2 }' memory.stat)
      RSS=$(awk '$1 == "rss" { print $2 }' memory.stat)
      echo "PUTVAL \"$HOSTNAME/docker-$NAME/memory-cached\" interval=$INTERVAL N:$CACHE"
      echo "PUTVAL \"$HOSTNAME/docker-$NAME/memory-used\" interval=$INTERVAL N:$RSS"
    fi
  fi

  # Iterate over all sub directories; each runs in a subshell so the
  # cd above does not affect the caller
  for d in *; do
    if [ -d "$d" ]; then
      ( collect "$d" )
    fi
  done
}

while sleep "$INTERVAL"; do
  # Collect stats on memory usage
  ( collect "$CGROUP_MOUNT/memory" )

  # Collect stats on cpu usage
  ( collect "$CGROUP_MOUNT/cpuacct" )
done
@kpnarayanan

Hi,

Is it possible to also extract disk and network I/O details for each container?

Thanks,
Krishnaprasad

@napalmz

napalmz commented Aug 20, 2020

@IngaFeick said:

I'm using this script right now and sometimes it produces invalid lines which contain nothing but a number:

   PUTVAL "localhost/docker-e1f1809fff1d/cpu-user" interval=60 N:69213
   PUTVAL "localhost/docker-e1f1809fff1d/memory-used" interval=60 N:258969600
   218103808
   PUTVAL "localhost/docker-e21a9943b639/memory-cached" interval=60 N:19808256
   PUTVAL "localhost/docker-e21a9943b639/memory-used" interval=60 N:116252672 
   50331648
   PUTVAL "localhost/docker-0fa7f4774522/cpu-user" interval=60 N:1385

This causes collectd to log error messages like these:

   [2016-02-02 16:02:31] [error] exec plugin: Unable to parse command, ignoring line: "218103808"
   [2016-02-02 16:02:31] [error] exec plugin: Unable to parse command, ignoring line: "50331648"

This happens quite rarely, though.

This happens because two keys in memory.stat start with "rss", so grep '^rss' matches both of them:

cache 3543040
rss 5881856
rss_huge 0 <<< THIS!

Change line 49 from:
RSS=$(cat memory.stat | grep '^rss' | awk '{ print $2; }');
to:
RSS=$(cat memory.stat | grep '^rss ' | awk '{ print $2; }');
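
The difference is easy to reproduce outside collectd. The sketch below writes the three sample lines quoted above to a temporary file and compares the prefix match with the two exact-key variants. The awk-only form is an alternative to the grep fix, not something proposed in this thread.

```shell
#!/bin/bash
# Sample memory.stat content taken from the comment above.
stat_file=$(mktemp)
printf '%s\n' 'cache 3543040' 'rss 5881856' 'rss_huge 0' > "$stat_file"

# Prefix match: also picks up rss_huge, so two values come back.
loose=$(grep '^rss' "$stat_file" | awk '{ print $2 }')

# Trailing space in the pattern: only the "rss" key matches.
strict=$(grep '^rss ' "$stat_file" | awk '{ print $2 }')

# Equivalent exact-key match in awk alone.
exact=$(awk '$1 == "rss" { print $2 }' "$stat_file")

printf 'loose:\n%s\nstrict: %s\nexact: %s\n' "$loose" "$strict" "$exact"
rm -f "$stat_file"
```

With the prefix match, the second value ends up on its own line after the PUTVAL line, which is exactly the stray number collectd complains about.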

@lucanello

lucanello commented Nov 7, 2020

How can I convert these values to CPU usage as a percentage and RAM usage in gigabytes?
I get numbers like 738967552 for RAM usage and 103358 for CPU usage for a single container. I would like to display them in my Grafana dashboard as CPU percentage and RAM in gigabytes (or also as a percentage).
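
One possible approach to the conversion, as a sketch rather than a tested recipe: the counters in cpuacct.stat are cumulative ticks in USER_HZ units (commonly 100 per second, see getconf CLK_TCK), so a percentage needs the difference between two readings divided by the elapsed time. The values in memory.stat are bytes. The helper names cpu_percent and bytes_to_gib are hypothetical, and the second CPU reading (104558) is made up for the example.

```shell
#!/bin/bash
# Sketch only: cpu_percent and bytes_to_gib are hypothetical helpers.

# Percent of one CPU core used between two cpuacct.stat readings.
# Usage: cpu_percent PREV_TICKS CUR_TICKS INTERVAL_SECONDS [TICKS_PER_SECOND]
cpu_percent () {
  local hz="${4:-$(getconf CLK_TCK)}"
  awk -v p="$1" -v c="$2" -v s="$3" -v hz="$hz" \
    'BEGIN { printf "%.2f\n", (c - p) / hz / s * 100 }'
}

# Convert a byte count from memory.stat to GiB.
bytes_to_gib () {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# 1200 ticks at 100 ticks/s is 12 CPU-seconds over a 60 s interval.
cpu_percent 103358 104558 60 100   # prints 20.00
bytes_to_gib 738967552             # prints 0.69
```

Depending on how the cpu type is defined in your collectd types.db, collectd may already store the counter as a per-second rate, in which case only the division by the tick rate remains. In Grafana, the same arithmetic can usually be applied in the query or with a unit setting instead of in the script.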
