#!/bin/bash
HOSTNAME="${COLLECTD_HOSTNAME:-localhost}"
INTERVAL="${COLLECTD_INTERVAL:-10}"
while sleep "$INTERVAL"; do
  VALUE=1.23
  echo "PUTVAL \"$HOSTNAME/exec-magic/gauge-magic_level\" interval=$INTERVAL N:$VALUE"
done
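To sanity-check the output format outside collectd, the PUTVAL line can be factored into a small function and printed once, without the loop. This is only a sketch: the function name `putval_line` is hypothetical and not part of collectd or of the script above.

```shell
#!/bin/bash
# Hypothetical helper: build one PUTVAL line in the same shape the script above emits.
putval_line() {
    local host="$1" interval="$2" value="$3"
    printf 'PUTVAL "%s/exec-magic/gauge-magic_level" interval=%s N:%s\n' \
        "$host" "$interval" "$value"
}

# Same fallbacks as the original script when collectd has not set the variables.
HOSTNAME="${COLLECTD_HOSTNAME:-localhost}"
INTERVAL="${COLLECTD_INTERVAL:-10}"
putval_line "$HOSTNAME" "$INTERVAL" 1.23
```

Running it by hand with `COLLECTD_INTERVAL=1` set in the environment should show the interval change in the printed line.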
root@jesse-precise-desktop:~# bin/test_collectd_exec.sh
Wed Jun 6 09:39:24 CST 2012: starting collectd
Starting statistics collection and monitoring daemon: collectd,,.
Wed Jun 6 09:39:24 CST 2012: exec started?
2461 pts/2 S+ 0:00 \_ /bin/bash bin/test_collectd_exec.sh
2470 pts/2 S+ 0:00 \_ grep collect
2466 ? Ss 0:00 /usr/sbin/collectdmon -P /var/run/collectdmon.pid -- -C /etc/collectd/collectd.conf
2468 ? SLl 0:00 \_ collectd -C /etc/collectd/collectd.conf -f
Wed Jun 6 09:39:26 CST 2012: stopping collectd
Stopping statistics collection and monitoring daemon: collectd.
collectd is stopped.
collectd is stopped.
collectd is stopped.
collectd is stopped.
Wed Jun 6 09:39:30 CST 2012: killing any remaining processes
collectd: no process found
Wed Jun 6 09:39:31 CST 2012: starting collectd
Starting statistics collection and monitoring daemon: collectd,,.
Wed Jun 6 09:39:31 CST 2012: exec started?
2461 pts/2 S+ 0:00 \_ /bin/bash bin/test_collectd_exec.sh
2520 pts/2 S+ 0:00 \_ grep collect
2516 ? Ss 0:00 /usr/sbin/collectdmon -P /var/run/collectdmon.pid -- -C /etc/collectd/collectd.conf
2518 ? RLl 0:00 \_ collectd -C /etc/collectd/collectd.conf -f
Wed Jun 6 09:39:33 CST 2012: stopping collectd
Stopping statistics collection and monitoring daemon: collectd.
collectd (2531) is running.
collectd (2531) is running.
^C
root@jesse-precise-desktop:~# ps afx | grep collectd
2531 ? S 0:00 collectd -C /etc/collectd/collectd.conf -f
root@jesse-precise-desktop:~# strace -p 2531
Process 2531 attached - interrupt to quit
futex(0x7fa983a9edb0, FUTEX_WAIT_PRIVATE, 2, NULL^C <unfinished ...>
Process 2531 detached
#!/bin/bash
# this starts and stops collectd and looks to see if any collectd processes remain
while true ; do
  echo "`date`: starting collectd"
  /etc/init.d/collectd start
  echo "`date`: exec started?"
  ps afx | grep collect
  sleep 2
  echo "`date`: stopping collectd"
  /etc/init.d/collectd stop
  /etc/init.d/collectd status
  sleep 1
  /etc/init.d/collectd status
  sleep 1
  /etc/init.d/collectd status
  sleep 1
  /etc/init.d/collectd status
  echo "`date`: killing any remaining processes"
  killall -9 collectd
  sleep 1
done
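The loop above kills any survivor with `killall -9`, which destroys the evidence. A variant that polls for leftover processes and stops at the first hang might look like the sketch below. The helper name `wait_for_exit` is hypothetical, and the sketch assumes `pgrep` (from procps) is installed.

```shell
#!/bin/bash
# Hypothetical helper: wait up to $2 seconds (default 5) for every process
# named exactly $1 to exit. Returns 0 once none remain, 1 on timeout.
wait_for_exit() {
    local name="$1" timeout="${2:-5}" waited=0
    # pgrep -x matches the process name exactly, so "collectd" does not
    # also match "collectdmon".
    while pgrep -x "$name" >/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage sketch (assumes the same init script as the loop above):
# /etc/init.d/collectd stop
# if ! wait_for_exit collectd 5; then
#     echo "$(date): collectd survived the stop:"
#     pgrep -xl collectd
#     exit 1   # leave the hung process in place for strace or gdb
# fi
```

Exiting instead of killing means the stuck process is still there to attach to, as was done by hand with `strace -p 2531` in the transcript.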
The collectd configuration (contents of /etc/collectd) from the jesse-precise-desktop VM can be downloaded here: http://jessereynolds.com/etc_collectd_precise_desktop-20120606-0940.tgz
I've managed to reproduce this on a Lucid VM (Ubuntu 10.04.3, 32-bit) with collectd 4.8.2, but only once in about fifty attempts:
Wed Jun 6 22:24:14 CST 2012: starting collectd
Starting statistics collection and monitoring daemon: collectd.
Wed Jun 6 22:24:15 CST 2012: exec started?
1518 pts/0 S+ 0:00 \_ /bin/bash ./test_collectd_exec.sh
2065 pts/0 S+ 0:00 \_ grep collect
2052 ? Ss 0:00 /usr/sbin/collectdmon -P /var/run/collectdmon.pid -- -C /etc/collectd/collectd.conf
2054 ? Sl 0:00 \_ collectd -C /etc/collectd/collectd.conf -f
2062 ? S 0:00 \_ collectd -C /etc/collectd/collectd.conf -f
Wed Jun 6 22:24:17 CST 2012: stopping collectd
Stopping statistics collection and monitoring daemon: collectd.
collectd (2062) is running.
collectd (2062) is running.
collectd (2062) is running.
collectd (2062) is running.
Wed Jun 6 22:24:21 CST 2012: killing any remaining processes
Linux lucid32 2.6.32-33-generic #70-Ubuntu SMP Thu Jul 7 21:09:46 UTC 2011 i686 GNU/Linux
On a Slackware-ish Linux running collectd 4.10.7 I get this error roughly one time in ten, but it seems to go away when I comment out the ping plugin:
[/var/log/collectd.log] 2012-06-06 16:06:24 UTC exec plugin: exec_read_one: Waiting for `/usr/bin/diskmonitor' to exit.
[/var/log/collectd.log] 2012-06-06 16:06:24 UTC exec plugin: Child 7067 exited with status 15.
[/var/log/collectd.log] 2012-06-06 16:06:24 UTC exec plugin: Sent SIGTERM to 0
[/var/log/syslog] [2012-06-06 16:06:24 UTC] [INFO] daemon 127.0.0.1 collectd[7055]: exec plugin: Sent SIGTERM to 0
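The `Sent SIGTERM to 0` message stands out because, under POSIX `kill` semantics, a pid of 0 addresses every process in the sender's process group rather than a single child. Whether that is what causes the hang is not established here. As an illustration only, a wrapper script could guard against signalling such a pid as sketched below; `safe_term` is a hypothetical name and is not how collectd's exec plugin is implemented.

```shell
#!/bin/bash
# Hypothetical guard: send SIGTERM only to a single, positive pid.
# Returns 2 without signalling anything if the pid is empty, non-numeric,
# or zero (pid 0 would target the caller's whole process group).
safe_term() {
    local pid="$1"
    case "$pid" in
        ''|*[!0-9]*)
            echo "refusing to signal invalid pid '$pid'" >&2
            return 2
            ;;
    esac
    if [ "$pid" -le 0 ]; then
        echo "refusing to signal pid $pid (would target a process group)" >&2
        return 2
    fi
    kill -TERM "$pid"
}
```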
With collectd 5 this happens much more often, and again commenting out the ping plugin makes the problem go away.
I've raised a bug report here: collectd/collectd#89
It seems to be easier to reproduce after a fresh reboot of the VM.
The problem was first observed on Ubuntu Precise 12.04 LTS (Server) 64-bit running on VMware ESX.
The environment used to reproduce this error (above) is Ubuntu Precise 12.04 LTS (Desktop) 64-bit running on VirtualBox 4 on Mac OS X 10.7.4.