Skip to content

Instantly share code, notes, and snippets.

@ssimpson89
Last active April 30, 2022 13:54
Show Gist options
  • Star 54 You must be signed in to star a gist
  • Fork 26 You must be signed in to fork a gist
  • Save ssimpson89/7207165 to your computer and use it in GitHub Desktop.
Save ssimpson89/7207165 to your computer and use it in GitHub Desktop.
Just a simple Mysql Replication Health Check script I wrote. You can put this in a cron.
#!/bin/bash
### VARIABLES ### \
EMAIL=""
SERVER=$(hostname)
MYSQL_CHECK=$(mysql -e "SHOW VARIABLES LIKE '%version%';" || echo 1)
LAST_ERRNO=$(/usr/bin/mysql -e "SHOW SLAVE STATUS\G" | grep "Last_Errno" | awk '{ print $2 }')
SECONDS_BEHIND_MASTER=$(/usr/bin/mysql -e "SHOW SLAVE STATUS\G"| grep "Seconds_Behind_Master" | awk '{ print $2 }')
IO_IS_RUNNING=$(/usr/bin/mysql -e "SHOW SLAVE STATUS\G" | grep "Slave_IO_Running" | awk '{ print $2 }')
SQL_IS_RUNNING=$(/usr/bin/mysql -e "SHOW SLAVE STATUS\G" | grep "Slave_SQL_Running" | awk '{ print $2 }')
ERRORS=()
### Run Some Checks ###
## Check if I can connect to Mysql ##
if [ "$MYSQL_CHECK" == 1 ]
then
ERRORS=("${ERRORS[@]}" "Can't connect to MySQL (Check Pass)")
fi
## Check For Last Error ##
if [ "$LAST_ERRNO" != 0 ]
then
ERRORS=("${ERRORS[@]}" "Error when processing relay log (Last_Errno)")
fi
## Check if IO thread is running ##
if [ "$IO_IS_RUNNING" != "Yes" ]
then
ERRORS=("${ERRORS[@]}" "I/O thread for reading the master's binary log is not running (Slave_IO_Running)")
fi
## Check for SQL thread ##
if [ "$SQL_IS_RUNNING" != "Yes" ]
then
ERRORS=("${ERRORS[@]}" "SQL thread for executing events in the relay log is not running (Slave_SQL_Running)")
fi
## Check how slow the slave is ##
if [ "$SECONDS_BEHIND_MASTER" == "NULL" ]
then
ERRORS=("${ERRORS[@]}" "The Slave is reporting 'NULL' (Seconds_Behind_Master)")
elif [ "$SECONDS_BEHIND_MASTER" > 60 ]
then
ERRORS=("${ERRORS[@]}" "The Slave is at least 60 seconds behind the master (Seconds_Behind_Master)")
fi
### Send and Email if there is an error ###
if [ "${#ERRORS[@]}" -gt 0 ]
then
MESSAGE="An error has been detected on ${SERVER} involving the mysql replciation. Below is a list of the reported errors:\n\n
$(for i in $(seq 1 ${#ERRORS[@]}) ; do echo "\t${ERRORS[$i]}\n" ; done)
Please correct this ASAP
"
echo -e $MESSAGE | mail -s "Mysql Replication for $SERVER is reporting Error" ${EMAIL}
fi
@6d61726b760a
Copy link

Rather than running mysql multiple times, wouldnt it be better to run it just once and parse the output.?

also, if you are using pipefail (http://redsymbol.net/articles/unofficial-bash-strict-mode/) you dont specifically need to check if the connection to mysql succeeded: the script will abort if unable to connect to mysql.

https://gist.github.com/roadst4r/2cb42793c0a7f9a7237a#file-mysql_repl_check-sh

@ssimpson89
Copy link
Author

@roadst4r Im not sure why I didnt just store the output in a variable. Your right, its much better to make one call and then parse it from there.

@chungyan5
Copy link

chungyan5 commented Jul 9, 2016

@@ -1,13 +1,15 @@
 #!/bin/bash

 ### VARIABLES ### \
-EMAIL=""
+EMAIL="XXXX@xxx.xxx"
 SERVER=$(hostname)
-MYSQL_CHECK=$(mysql -e "SHOW VARIABLES LIKE '%version%';" || echo 1)
-LAST_ERRNO=$(/usr/bin/mysql -e "SHOW SLAVE STATUS\G" | grep "Last_Errno" | awk '{ print $2 }')
-SECONDS_BEHIND_MASTER=$(/usr/bin/mysql -e "SHOW SLAVE STATUS\G"| grep "Seconds_Behind_Master" | awk '{ print $2 }')
-IO_IS_RUNNING=$(/usr/bin/mysql -e "SHOW SLAVE STATUS\G" | grep "Slave_IO_Running" | awk '{ print $2 }')
-SQL_IS_RUNNING=$(/usr/bin/mysql -e "SHOW SLAVE STATUS\G" | grep "Slave_SQL_Running" | awk '{ print $2 }')
+MYSQL='/usr/bin/mysql -u mysql_user -pyourpassword'
+MYSQL_CHECK=$($MYSQL -e "SHOW VARIABLES LIKE '%version%';" || echo 1)
+LAST_ERRNO=$($MYSQL -e "SHOW SLAVE STATUS\G" | grep "Last_Errno" | awk '{ print $2 }')
+SECONDS_BEHIND_MASTER=$($MYSQL -e "SHOW SLAVE STATUS\G"| grep "Seconds_Behind_Master" | awk '{ print $2 }')
+IO_IS_RUNNING=$($MYSQL -e "SHOW SLAVE STATUS\G" | grep "Slave_IO_Running" | awk '{ print $2 }')
+SQL_IS_RUNNING=$($MYSQL -e "SHOW SLAVE STATUS\G" | grep "Slave_SQL_Running" | awk '{ print $2 }')
+ERR_FLAG="/tmp/mysql_replication_err_flag.dat"
 ERRORS=()

 ### Run Some Checks ###
@@ -40,7 +42,7 @@
 if [ "$SECONDS_BEHIND_MASTER" == "NULL" ]
 then
     ERRORS=("${ERRORS[@]}" "The Slave is reporting 'NULL' (Seconds_Behind_Master)")
-elif [ "$SECONDS_BEHIND_MASTER" > 60 ]
+elif [ "$SECONDS_BEHIND_MASTER" -gt 60 ]
 then
     ERRORS=("${ERRORS[@]}" "The Slave is at least 60 seconds behind the master (Seconds_Behind_Master)")
 fi
@@ -48,9 +50,16 @@
 ### Send and Email if there is an error ###
 if [ "${#ERRORS[@]}" -gt 0 ]
 then
-    MESSAGE="An error has been detected on ${SERVER} involving the mysql replciation. Below is a list of the reported errors:\n\n
-    $(for i in $(seq 1 ${#ERRORS[@]}) ; do echo "\t${ERRORS[$i]}\n" ; done)
-    Please correct this ASAP
-    "
-    echo -e $MESSAGE | mail -s "Mysql Replication for $SERVER is reporting Error" ${EMAIL}
+
+   if [ ! -f ${ERR_FLAG} ]
+   then
+       MESSAGE="An error has been detected on ${SERVER} involving the mysql replciation. Below is a list of the reported errors:\n\n
+       $(for i in $(seq 1 ${#ERRORS[@]}) ; do echo "\t${ERRORS[$i]}\n" ; done)
+       Please correct this ASAP\n\n
+       Please remove ${ERR_FLAG} after settle this issue.     
+       "
+       echo -e $MESSAGE | mail -s "Mysql Replication for $SERVER is reporting Error" ${EMAIL}
+       touch $ERR_FLAG
+   fi
+    
 fi

I modified as:

  1. select a mysql login user
  2. avoid repeating sending email during error situation but still not yet settle. (in my case, sometimes one mysql LAN is broken(unstable) and need to wait ISP recover it)
  3. [ "$SECONDS_BEHIND_MASTER" > 60 ] to [ "$SECONDS_BEHIND_MASTER" -gt 60 ]

@shollenstein
Copy link

shollenstein commented Jul 28, 2016

Thanks for the script! I corrected the output of the messages, since the first error was missing, caused by the index starting with 1.

MESSAGE="An error has been detected on ${SERVER} involving the mysql replication. Below is a list of the reported errors:\n\n
$(for i in "${ERRORS[@]}"; do
    echo "\t$i\n"
done)
Please correct this ASAP"

@IT-Juggler
Copy link

IT-Juggler commented Dec 21, 2016

Stephen, I also am grateful for your script. Thank you for posting it.

I made the following corrections:
Added a ":" to exclude "Slave_SQL_Running_State".
SQL_IS_RUNNING=$(/usr/bin/mysql -e "SHOW SLAVE STATUS\G" | grep "Slave_SQL_Running:" | awk '{ print $2 }')

Changed the 1 to 0 to get the entire error message list.
$(for i in $(seq 0 ${#ERRORS[@]}) ; do echo "\t${ERRORS[$i]}\n" ; done)

I also tweaked a few things to my liking:
Added a sending user for the email
SENDER=user@domain.com
echo -e $MESSAGE | mail -s "Mysql Replication for $SERVER is reporting Error" -r ${SENDER} ${EMAIL}

Added more detail if ERRNO is set.
## Check For Last Error ##
if [ "$LAST_ERRNO" != 0 ]
then
LAST_ERROR=$(mysql -e "SHOW SLAVE STATUS\G" | grep "Last_Error" | awk '{ print $2 }')
ERRORS=("${ERRORS[@]}" "Error when processing relay log (Last_Errno = $LAST_ERRNO)")
ERRORS=("${ERRORS[@]}" "(Last_Error = $LAST_ERROR)")
fi

Added more detail for seconds_behind_master and set the threshold to 1 hour.
## Check how slow the slave is ##
if [ "$SECONDS_BEHIND_MASTER" == "NULL" ]
then
ERRORS=("${ERRORS[@]}" "The Slave is reporting 'NULL' (Seconds_Behind_Master)")
elif [ "$SECONDS_BEHIND_MASTER" -gt 3600 ]
then
BEHINDHOURS=$(expr $SECONDS_BEHIND_MASTER / 3600)
ERRORS=("${ERRORS[@]}" "The Slave is more than an hour behind the master (Seconds_Behind_Master = $SECONDS_BEHIND_MASTER [$BEHINDHOURS+ hours])")
fi

I hope others find these tweaks useful.

@FireInTheHold
Copy link

FireInTheHold commented Feb 15, 2017

I another improvement. rather than perform SHOW SLAVE STATUS multiple times for each variable do this

MYSQL_CHECK=$(mysql --defaults-file=/var/lib/mysql/repl_chk.cnf -e "SHOW VARIABLES LIKE '%version%';" || echo 1)
STATUS_LINE=$(mysql --defaults-file=/var/lib/mysql/repl_chk.cnf -e "SHOW SLAVE STATUS\G")"1"
LAST_ERRNO=$(grep "Last_Errno" <<< "$STATUS_LINE" | awk '{ print $2 }')
SECONDS_BEHIND_MASTER=$( grep "Seconds_Behind_Master" <<< "$STATUS_LINE" | awk '{ print $2 }')
IO_IS_RUNNING=$( grep "Slave_IO_Running" <<< "$STATUS_LINE" | awk '{ print $2 }')
SQL_IS_RUNNING=$(grep "Slave_SQL_Running" <<< "$STATUS_LINE" | awk '{ print $2 }')
MASTER_LOG_FILE=$(grep " Master_Log_File" <<< "$STATUS_LINE" | awk '{ print $2 }')
RELAY_MASTER_LOG_FILE=$(grep "Relay_Master_Log_File" <<< "$STATUS_LINE" | awk '{ print $2 }')

@ouija
Copy link

ouija commented Mar 22, 2017

Just a note that (with Ubuntu 16.0-4) I had to wrap the 'greater than' comparisions in an extra set of square brackets for them to work properly.
For example:
elif [[ "$SECONDS_BEHIND_MASTER" > 60 ]]
and
if [[ "${#ERRORS[@]}" > 0 ]]

Also, there are two different lines found where returning the query for 'Slave_SQL_Running' (because of the additoinal 'Slave_SQL_Running_State' message, so I recommend adding a colon after 'Slave_SQL_Running' grep line (as IT-Juggler recommended), and I opted to add this to all grep queries except the first one (LAST_ERRNO, SECONDS_BEHIND_MASTER, IO_IS_RUNNING, and SQL_IS_RUNNING)

I have also opted to remove the 'mysql: [Warning] Using a password on the command line interface can be insecure.' message that returns when passing a sql password to the commands via the script, by appending 2>&1 | grep -v "Warning: Using a password" after the mysql command.

For example:
LAST_ERRNO=$(/usr/bin/mysql -u user -ppassword -e "SHOW SLAVE STATUS\G" 2>&1 | grep -v "Warning: Using a password" | grep "Last_Errno:" | awk '{ print $2 }')

@ascii-soup
Copy link

Hi, this is great. Any chance you could add a license? :)

@kschleicherql
Copy link

For the line:

MYSQL='/usr/bin/mysql -u mysql_user -pyourpassword'

You may want to set up the /var/lib/mysql/repl_chk.cnf file with the MySQL root user ID password and make sure that it's only readable by the root and/or mysql user and then use this:

MYSQL='/usr/bin/mysql --defaults-file=/var/lib/repl_chk.cnf'

This way your root password isn't readable in your script. You can also use this for other monitoring scripts as well.

@KevinBott
Copy link

I took a different approach to parsing. I used sed to delete the first row, then remove leading spaces, and replace colon space with colon (to strip the leading space from $value"):
mysql ${connect_options} -e "SHOW SLAVE STATUS \G" | sed '1d;s/^ *//;s/: /:/;' > $slave_status

Then reading through my file I split the keys and values with a colon delimiter.

while read line; do
  key=$(echo $line | cut -f1 -d:)
  value=$(echo $line | cut -f2 -d:)

  case $key in
          Slave_*|SQL_*|Last_*)
                #Use Indirect Variable Assigment to assign "Value" to all "Key"s
                eval "$key=\$value"
                ;;
  esac
done < $slave_status

When the case statement matches a key, then using eval for indirect variable assignment. For example, I end up with variables $Last_IO_Errno, $Slave_IO_Running, etc. If you prefer UPPERCASE variable names, use tr.
key=$(echo $line | cut -f1 -d: | tr [:lower:] [:upper:])

I'm working on adding threshold for cases where Slave is stopped gracefully - so some short maintenance is possible without getting alerted.

@lemnisk8
Copy link

lemnisk8 commented Nov 8, 2017

ubuntu 16.04 MariaDB-10.2.10 - needed to add the exit in awk to stop after the first line
SQL_IS_RUNNING=$(grep "Slave_SQL_Running" <<< "$STATUS_LINE" | awk '{ print $2; exit }')

@xikaos
Copy link

xikaos commented Jun 14, 2018

Checkout for the login-path option in MySQL client so you can sleep at night knowing your MySQL root user is not in pure plain-text, as suggested previously here.

  1. Generate a login-path using mysql_config_editor as explained here.
  2. Add the --login-path option: MYSQL_CHECK=$(mysql --login-path=login_path_generated -e "SHOW VARIABLES LIKE '%version%';" || echo 1) in other statements as well.

The file encryption can be cracked, so ensure no one beside root have access to it.

@OliverBailey
Copy link

OliverBailey commented May 15, 2019

Nice script.

Forked here if anyone wants to see the revision

Added some extra checks.

Used --login-path=mysql_login instead of using a mysql -u -p via the mysql_config_editor

Added SSMTP sending of email instead of something like postfix.

Added AWS S3 backup granted the check goes through ok.

Changed it to utilise the $STATUS_LINE variable instead of calling multiple times to SHOW SLAVE STATUS

Few other changes as per the comments above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment