
@conorsch
Created March 24, 2020 21:00
Helper scripts to manage Qubes memory balance service
#!/bin/bash
# Utility script to check whether Qubes memory balancing
# service has failed. Compares the timestamps of the last
# successful balance operation and the most recent "EOF"
# message available in the log file. If EOF is more
# recent, declare service broken. Recommended invocation:
#
# watch -n5 ./check-qmemman.sh
#
set -e
set -u
set -o pipefail
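# Print the epoch time of the last successful balance operation.
# xargs -r (--no-run-if-empty) skips date when grep finds no match,
# avoiding "date: option requires an argument -- 'd'".
get_last_balance_time() {
    grep -P 'balance_when_enough_memory' /var/log/qubes/qmemman.log \
        | tail -n1 \
        | perl -nE '/^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+)/ and say $1' \
        | xargs -r -d '\n' date +%s -d
}
# Print the epoch time of the most recent "EOF" message, if any.
get_last_eof_time() {
    grep -P 'EOF$' /var/log/qubes/qmemman.log \
        | tail -n1 \
        | perl -nE '/^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+)/ and say $1' \
        | xargs -r -d '\n' date +%s -d
}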
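# Compare numerically: inside [[ ]], ">" is a string comparison.
# A missing timestamp counts as 0, i.e. the event never happened.
last_eof="$(get_last_eof_time || true)"
last_balance="$(get_last_balance_time || true)"
if (( ${last_eof:-0} > ${last_balance:-0} )); then
    echo "Looks like qmemman has failed."
    echo "You should restart it with:"
    echo "sudo systemctl restart qubes-qmemman"
    exit 1
else
    echo "The qmemman service appears to be working correctly."
fi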
#!/bin/bash
# Utility script to restart the Qubes memory balancing
# service if it's failed. Depends on another script
# to determine whether it's failed or not.
set -e
set -u
set -o pipefail
if ! test -e check-qmemman.sh ; then
    echo "Could not find check-qmemman.sh script!"
    exit 1
fi
echo "$(date) Begin monitoring qmemman-behavior" >> /tmp/qmemman-check.log
while true; do
    clear
    if ! ./check-qmemman.sh ; then
        echo "$(date) qmemman service failed" >> /tmp/qmemman-check.log
        sudo systemctl restart qubes-qmemman
        echo "$(date) qmemman service restarted" >> /tmp/qmemman-check.log
    fi
    sleep 5
done
@eloquence

This gist now fails for me with

date: option requires an argument -- 'd'

Perhaps something in the logfile format is throwing it off? Here are my most recent log lines:

2020-04-06 13:39:50,644 qmemman.systemstate[1070]: mem-set domain 68 to 3064472926
2020-04-06 13:39:50,644 qmemman.systemstate[1070]: mem-set domain 66 to 1406969363
2020-04-06 13:39:50,645 qmemman.systemstate[1070]: mem-set domain 58 to 1390834826
2020-04-06 13:39:50,747 qmemman.systemstate[1070]: mem-set domain 56 to 1486505588
2020-04-06 13:39:50,747 qmemman.systemstate[1070]: mem-set domain 60 to 1524216867
2020-04-06 13:39:50,748 qmemman.systemstate[1070]: mem-set domain 0 to 4294967296
2020-04-06 13:39:50,748 qmemman.systemstate[1070]: mem-set domain 62 to 1906835891
2020-04-06 13:56:25,430 qmemman.daemon.algo[1070]: balance_when_enough_memory(xen_free_memory=16351358, total_mem_pref=5055738752.0, total_available_memory=10035415363.0)
2020-04-06 13:56:25,431 qmemman.daemon.algo[1070]: left_memory=1215775481 acceptors_count=6
2020-04-06 13:56:25,431 qmemman.systemstate[1070]: stat: dom '66' act=1406969363 pref=398247116.8 last_target=1406969363
2020-04-06 13:56:25,431 qmemman.systemstate[1070]: stat: dom '60' act=1524216867 pref=437251276.8 last_target=1524216867
2020-04-06 13:56:25,431 qmemman.systemstate[1070]: stat: dom '62' act=1906835891 pref=564535296.0 last_target=1906835891
2020-04-06 13:56:25,431 qmemman.systemstate[1070]: stat: dom '58' act=1390834826 pref=392879718.40000004 last_target=1390834826
2020-04-06 13:56:25,431 qmemman.systemstate[1070]: stat: dom '56' act=1486505588 pref=424706048.0 last_target=1486505588
2020-04-06 13:56:25,431 qmemman.systemstate[1070]: stat: dom '0' act=4294967296 pref=1848020659.2 last_target=4294967296
2020-04-06 13:56:25,431 qmemman.systemstate[1070]: stat: dom '68' act=3064472926 pref=990098636.8000001 last_target=3064472926
2020-04-06 13:56:25,431 qmemman.systemstate[1070]: stat: xenfree=68780158 memset_reqs=[('66', 1389987703), ('58', 1373982281), ('56', 1468887451), ('60', 1506296951), ('62', 1885854120), ('0', 4294967296), ('68', 3154871376)]
2020-04-06 13:56:25,431 qmemman.systemstate[1070]: mem-set domain 66 to 1389987703
2020-04-06 13:56:25,432 qmemman.systemstate[1070]: mem-set domain 58 to 1373982281
2020-04-06 13:56:25,432 qmemman.systemstate[1070]: mem-set domain 56 to 1468887451
2020-04-06 13:56:25,433 qmemman.systemstate[1070]: mem-set domain 60 to 1506296951
2020-04-06 13:56:25,433 qmemman.systemstate[1070]: mem-set domain 62 to 1885854120
2020-04-06 13:56:25,434 qmemman.systemstate[1070]: mem-set domain 0 to 4294967296
2020-04-06 13:56:25,434 qmemman.systemstate[1070]: mem-set domain 68 to 3154871376
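
One way to rule out the log format is to push a single sample line through the same extraction steps by hand. The following is only a sketch, assuming GNU date and perl are available; `extract_epoch` and `sample` are names made up for this illustration and do not appear in the gist.

```shell
#!/bin/bash
# Sketch: check that a qmemman log timestamp can be parsed by GNU date.
# extract_epoch is a hypothetical helper, not part of the gist.
extract_epoch() {
    # Keep only the leading "YYYY-MM-DD HH:MM:SS,mmm" stamp of each line,
    # then convert it to epoch seconds. With -r, xargs does not run date
    # at all when no line matched.
    perl -nE '/^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+)/ and say $1' \
        | xargs -r -d '\n' date +%s -d
}

# One line copied from the log excerpt above.
sample='2020-04-06 13:56:25,431 qmemman.daemon.algo[1070]: left_memory=1215775481 acceptors_count=6'

# Prints an epoch timestamp if the format is parseable.
printf '%s\n' "$sample" | extract_epoch
```

If this prints a number, the timestamp format itself is fine and the problem is more likely that one of the `grep` stages matched nothing.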

@eloquence

I'm also not noticing a single EOF line in the log currently so maybe get_last_eof_time is erroring out because of that.

@conorsch
Author

conorsch commented Apr 6, 2020

If no EOF, then yes, that's the problem. The --no-run-if-empty flag for xargs is made for this case. I'm currently running the patches from QubesOS/qubes-core-admin#331 so I also have zero EOFs locally.
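
To make the effect of that flag concrete, here is a minimal sketch, assuming GNU xargs and GNU date. The helpers `to_epoch` and `is_newer` are hypothetical names introduced for illustration and are not part of the gist. The fallback to 0 for a missing timestamp is also an assumption about the desired behavior.

```shell
#!/bin/bash
# Sketch only: to_epoch and is_newer are hypothetical helpers, not from the gist.

# Convert a timestamp read from stdin to epoch seconds.
# With -r (--no-run-if-empty), xargs skips date entirely on empty input
# instead of running "date +%s -d" with no argument.
to_epoch() {
    xargs -r -d '\n' date +%s -d
}

# Succeed if the first epoch value is newer than the second.
# Empty values fall back to 0, so an event that was never logged
# counts as older than any real timestamp.
is_newer() {
    (( ${1:-0} > ${2:-0} ))
}

no_eof="$(printf '' | to_epoch)"   # stays empty: date was never invoked
balance="$(printf '%s\n' '2020-04-06 13:56:25' | to_epoch)"

if is_newer "$no_eof" "$balance"; then
    echo "EOF is more recent"
else
    echo "no EOF after the last balance"
fi
```

Since `no_eof` is empty, the comparison treats it as 0 and the second branch runs.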

@eloquence

Yup, just adding -r to xargs resolves it. Will keep this running again during my next update.
