@markfaine
Created February 17, 2021 21:45
#!/bin/bash
# shellcheck shell=bash
message="Message from $0"
cpuload="$(uptime | cut -d"," -f4 | cut -d":" -f2 | cut -d" " -f2 | sed -e "s/\.//g")"
message="${message}\
cpuload: ${cpuload}"
threshold=500
message="${message}\
threshold: ${threshold}\n"
pid="$(ps -eo pid -eo pcpu -eo command | sort -k 2 -r | grep -v PID | head -n 1 | xargs | cut -d' ' -f 1)"
message="${message}\
Higest CPU Usage PID: ${pid}"
if ps -ef -o pid="$pid" | head -n 1 | xargs; then
name="$(tr -d '\0' <"/proc/${pid}/cmdline")"
message="${message}\
Higest CPU Usage Process: ${name}"
fi
pid2="$(ps -eo pid -eo pcpu -eo command | sort -k 2 -r | grep -v PID | head -n 2 | sort -r | head -n 1 | xargs | cut -d ' ' -f 1)"
message="${message}\
Second Higest CPU Usage PID: ${pid2}"
if ps -ef -o pid="${pid2}" | head -n 1 | xargs; then
name2="$(tr -d '\0' <"/proc/${pid}/cmdline")"
message="${message}\
Second Higest CPU Usage Process: ${name}"
fi
if [[ ${cpuload#0} -gt $threshold ]] ; then
message="${message}\
High CPU Threshold exceeded: (${cpuload}/{$threshold})"
if [[ -n "$name" ]]; then
# Exclude plex
if [[ "$name" =~ Plex ]]; then
if [[ -e "/proc/$pid2/cmdline" ]]; then
name="$(cat "/proc/${pid2}/cmdline")"
message="${message}\
Won't kill plex"
if [[ -n "$name" ]]; then
message="${message}\
Killing $name2"
fi
kill -9 "$pid2"
fi
else
message="${message}\
Killing ${name}"
kill -9 "$pid"
fi
fi
else
message="Message from $0\
Success, CPU Threshold maintained: (${cpuload}/${threshold})"
fi
if [[ -n "$message" ]]; then
/root/bin/pushover.sh "$message"
fi

boppy commented Feb 18, 2021

Please be very, very careful with this script. It does not work correctly on a lab Ubuntu machine I use.

(I needed a break from Golang, so I took a look at your script. Please file my nagging in the "German behavior" category 🤣 - it's only well-intentioned ^^)

Some smaller things I spotted:

  1. Line 19 does not sort naturally, so 4.0 would be considered greater than 10.0.
  2. Line 28 only removes the first "0". So it works if the load is 0.10, but not if it is 0.08, where the remaining "08" is not a valid octal number (forcing base 10 with $((10#$var)) avoids this). The problem with leading zeros is that bash will interpret the number as octal ([[ "010" -eq "10" ]] is false).
  3. The {} in line 30 are misplaced, leading to plain output of the braces.
  4. The condition in line 31 can only be false if the process exited between line 11 and line 14, which is highly unlikely.
  5. The condition in line 54 can never be false.
  6. You are calling ps 4 times, while 1 should be sufficient.
  7. None of the ps calls and pipes are needed if the threshold isn't reached at all, because in that case the message is overwritten and nothing they collected is used.
  8. All the line breaks inside your message are not stored as line breaks making the output one long line with many spaces.
  9. The "getting second most consuming" part is messed up. The second sort in line 19 sorts by PID not PCPU and the name collected in line 23 gets its command from $pid, not $pid2.
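  10. For checking if a process is still active, kill -0 "$pid" is enough: it sends no signal and only reports whether the process exists and can be signalled.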
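
Points 1 and 2 above can be reproduced without touching any real process. The following is only a sketch: the sample data and the helper name to_decimal are made up for illustration, and LC_ALL=C is set so the sort order does not depend on the local locale.

```bash
#!/bin/bash
# Point 1: lexical vs. numeric sorting of a CPU column (sample data, not real ps output).
sample=$'101 4.0 alpha\n102 10.0 beta'

# "-k 2 -r" compares the text from field 2 to the end of the line, so "4..." beats "1...".
lexical_top="$(LC_ALL=C sort -k 2 -r <<<"$sample" | head -n 1)"

# "-k 2,2nr" restricts the key to field 2 and compares it as a number, in reverse order.
numeric_top="$(LC_ALL=C sort -k 2,2nr <<<"$sample" | head -n 1)"

echo "$lexical_top"   # 101 4.0 alpha
echo "$numeric_top"   # 102 10.0 beta

# Point 2: hypothetical helper that turns a zero-padded decimal string into an integer.
to_decimal() {
    # The 10# prefix forces base 10, so "008" is read as 8 instead of failing as octal.
    echo "$((10#$1))"
}

to_decimal 010   # prints 10
to_decimal 008   # prints 8
```

With the numeric key, the 10.0 line is correctly treated as the top consumer, and the base-10 prefix makes the comparison against the threshold safe for any number of leading zeros.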

The first of the two bigger problems I see is that line 5 assumes a fixed number of "," in the output of uptime, which is not true in my case:

# Newly started Ubuntu 18.04 shows 4 commas in total
> 13:37:42 up 0 min,  1 user,  load average: 2.85, 0.69, 0.23

# Long running Ubuntu 18.04 shows 5 commas in total
> 13:37:42 up 191 days, 10:04,  2 users,  load average: 0.02, 0.01, 0.00

Instead of

uptime | cut -d"," -f4 | cut -d":" -f2 | cut -d" " -f2 | sed -e "s/\.//g"

I would therefore recommend (with loadIndex=3 if the 15-minute average should be considered):

cut -d" " -f"$loadIndex" /proc/loadavg | tr -d '.'
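
A minimal sketch of how that command could be wrapped so the result is safe to use in a numeric comparison. The function name load_times_100 is an assumption made for this example; it reads a line in /proc/loadavg format from standard input, so it can be tried with sample data.

```bash
#!/bin/bash
# load_times_100 INDEX: read a /proc/loadavg style line from stdin and print
# the selected load average (1, 2 or 3) multiplied by 100 as a plain integer.
load_times_100() {
    local raw
    # /proc/loadavg prints two decimals, so dropping the dot multiplies by 100.
    raw="$(cut -d' ' -f"$1" | tr -d '.')"
    # Force base 10 so values such as "069" are not treated as octal.
    echo "$((10#$raw))"
}

# Sample line matching the freshly booted machine shown above (PIDs are made up).
load_times_100 1 <<<"2.85 0.69 0.23 1/123 4567"   # prints 285
load_times_100 2 <<<"2.85 0.69 0.23 1/123 4567"   # prints 69
```

On a live system the call would be load_times_100 3 </proc/loadavg.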

The second most important thing is the massive use of pipes (and with that, spawned processes) where it is not necessary at all. Example:

ps -eo pid -eo pcpu -eo command | sort -k 2 -r | grep -v PID | head -n 1 | xargs | cut -d' ' -f 1

could use 50% fewer processes with

ps -eo pid:1=,pcpu=,command= --sort pcpu | tail -1 | cut -d" " -f1

Here, :1 on the pid field sets the field width to one, so the line will never start with a space character. Appending = to all fields suppresses the header line, and --sort pcpu will (oh, wonder ^^) sort the list by CPU load.
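
Going one step further, the remaining cut could in principle be replaced by the shell builtin read, which spawns no process at all. This is only a sketch under that assumption: split_ps_line and the ps_* variable names are invented for the example, and the input is a made-up line in the shape of the ps output above.

```bash
#!/bin/bash
# split_ps_line LINE: set ps_pid, ps_pcpu and ps_cmd from one "pid pcpu command" line.
split_ps_line() {
    # read trims leading blanks and assigns the whole remainder of the line
    # to the last name, so a command containing spaces stays intact.
    read -r ps_pid ps_pcpu ps_cmd <<<"$1"
}

# Sample line, not real data; the leading blanks mimic a padded pid column.
split_ps_line "  4242 97.3 /usr/bin/example --flag value"

echo "$ps_pid"    # 4242
echo "$ps_pcpu"   # 97.3
echo "$ps_cmd"    # /usr/bin/example --flag value
```

Because read discards leading whitespace itself, this variant would not depend on the :1 width trick.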


So, finally, I reimplemented the script with 47% fewer processes running (even fewer if the threshold isn't hit), better readability, and way less indentation (and a little more verbosity in line 50):

#!/bin/bash
# shellcheck shell=bash

function join() {
    local IFS="$1"
    shift
    echo "$*"
}

loadIndex=3   # Select the avg-load index that should be considered. One of (1 2 3)
threshold=500 # Set the load threshold multiplied by 100 (remember that full load on an n-Core system is n*1!)

messages=("Message from $0")
cpuload="$(cut -d" " -f"${loadIndex}" /proc/loadavg | tr -d '.')"

# Force base 10 so values with leading zeros (e.g. "008") are not parsed as octal
if [[ "$((10#${cpuload}))" -lt "$threshold" ]]; then
    messages+=("Success, CPU Threshold maintained: (${cpuload}/${threshold})")
else
    messages+=("High CPU Threshold exceeded: (${cpuload}/${threshold})")
    # messages+=("cpuload: ${cpuload}") # skipped, because info is already in line above...
    # messages+=("threshold: ${threshold}")

    mostConsumers="$(ps -eo pid:1=,pcpu=,command= --sort pcpu | tail -2 | tr -s " ")"

    high_line="$(tail -1 <<<"$mostConsumers")"
    high_pid="$(cut -d' ' -f1 <<<"$high_line")"
    high_cmd="$(cut -d' ' -f3- <<<"$high_line")"
    messages+=("Highest CPU Usage PID: ${high_pid}")
    messages+=("Highest CPU Usage Process: ${high_cmd}")

    killPid="$high_pid"
    killCmd="$high_cmd"

    low_line="$(head -1 <<<"$mostConsumers")"
    low_pid="$(cut -d' ' -f1 <<<"$low_line")"
    low_cmd="$(cut -d' ' -f3- <<<"$low_line")"
    messages+=("Second Highest CPU Usage PID: ${low_pid}")
    messages+=("Second Highest CPU Usage Process: ${low_cmd}")

    if [[ "$high_cmd" =~ Plex ]]; then
        messages+=("Won't kill plex")
        killPid="$low_pid"
        killCmd="$low_cmd"
    fi

    if kill -0 "$killPid" 2>/dev/null; then
        messages+=("Killing ${killCmd}")
        kill -9 "$killPid"
    else
        messages+=("Process to kill already shut down")
    fi
fi

/root/bin/pushover.sh "$(join $'\n' "${messages[@]}")"
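
To see why the array plus join approach fixes point 8, here is a small stand-alone sketch. The helper is renamed to join_by (a name chosen for this example) so it does not shadow the join utility, and the message texts are sample values.

```bash
#!/bin/bash
# join_by SEP ITEM...: print all items separated by the first character of SEP.
join_by() {
    local IFS="$1"
    shift
    # "$*" joins the positional parameters with the first character of IFS.
    echo "$*"
}

messages=("Message from demo" "cpuload: 520" "threshold: 500")
joined="$(join_by $'\n' "${messages[@]}")"
printf '%s\n' "$joined"   # three separate lines

# For contrast: inside double quotes, a backslash followed by a newline is a
# line continuation, so no line break ends up in the string.
broken="first\
second"
printf '%s\n' "$broken"   # firstsecond
```

In the original script the continued lines were presumably indented as well, which would explain the long single line with many spaces.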
