I use this script to back up my QEMU/KVM/libvirt virtual machines. The script requires KVM 2.1+ since it uses the live blockcommit mode. This means the data in the snapshot disk is merged back into the original image instead of the other way around. The script does NOT handle spaces in paths.
#!/bin/bash
#
BACKUPDEST="$1"
DOMAIN="$2"
MAXBACKUPS="$3"
if [ -z "$BACKUPDEST" -o -z "$DOMAIN" ]; then
echo "Usage: ./vm-backup <backup-folder> <domain> [max-backups]"
exit 1
fi
if [ -z "$MAXBACKUPS" ]; then
MAXBACKUPS=6
fi
echo "Beginning backup for $DOMAIN"
#
# Generate the backup path
#
BACKUPDATE=`date "+%Y-%m-%d.%H%M%S"`
BACKUPDOMAIN="$BACKUPDEST/$DOMAIN"
BACKUP="$BACKUPDOMAIN/$BACKUPDATE"
mkdir -p "$BACKUP"
#
# Get the list of targets (disks) and the image paths.
#
TARGETS=`virsh domblklist "$DOMAIN" --details | grep ^file | awk '{print $3}'`
IMAGES=`virsh domblklist "$DOMAIN" --details | grep ^file | awk '{print $4}'`
#
# Create the snapshot.
#
DISKSPEC=""
for t in $TARGETS; do
DISKSPEC="$DISKSPEC --diskspec $t,snapshot=external"
done
virsh snapshot-create-as --domain "$DOMAIN" --name backup --no-metadata \
--atomic --disk-only $DISKSPEC >/dev/null
if [ $? -ne 0 ]; then
echo "Failed to create snapshot for $DOMAIN"
exit 1
fi
#
# Copy disk images
#
for t in $IMAGES; do
NAME=`basename "$t"`
cp "$t" "$BACKUP"/"$NAME"
done
#
# Merge changes back.
#
BACKUPIMAGES=`virsh domblklist "$DOMAIN" --details | grep ^file | awk '{print $4}'`
for t in $TARGETS; do
virsh blockcommit "$DOMAIN" "$t" --active --pivot >/dev/null
if [ $? -ne 0 ]; then
echo "Could not merge changes for disk $t of $DOMAIN. VM may be in invalid state."
exit 1
fi
done
#
# Cleanup left over backup images.
#
for t in $BACKUPIMAGES; do
rm -f "$t"
done
#
# Dump the configuration information.
#
virsh dumpxml "$DOMAIN" >"$BACKUP/$DOMAIN.xml"
#
# Cleanup older backups.
#
LIST=`ls -r1 "$BACKUPDOMAIN" | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}\.[0-9]+$'`
i=1
for b in $LIST; do
if [ $i -gt "$MAXBACKUPS" ]; then
echo "Removing old backup "`basename $b`
rm -rf "$b"
fi
i=$[$i+1]
done
echo "Finished backup"
echo ""
@vchakoshy

vchakoshy commented Jun 16, 2016

Good job. ;)

@boulate

boulate commented Oct 26, 2016

Thank you for sharing this! I was looking for a script to do exactly this kind of work without having to do it by hand.

Just one thing: I think there is a mistake at the end of the script:

    if [ $i -gt "$MAXBACKUPS" ]; then
        echo "Removing old backup "`basename $b`
        rm -rf "$b"
    fi

It should be:

    if [ $i -gt "$MAXBACKUPS" ]; then
        echo "Removing old backup "`basename $b`
        rm -rf "$BACKUPDOMAIN/$b"
    fi

if you want the script to take the original path into account, no? :)
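
The point above can be sketched as a small helper. This is only an illustration: `prune_old_backups` is a hypothetical name, and it assumes the timestamped folder naming (`YYYY-MM-DD.HHMMSS`) used by the original script. Because `ls` returns bare folder names, the parent directory has to be prepended before `rm` is called.

```bash
#!/bin/bash
# Sketch: keep the newest $2 timestamped backup folders under $1, delete the rest.
# prune_old_backups is a hypothetical helper, not part of the original script.
prune_old_backups() {
    local backup_domain="$1" max_backups="$2" i=0 b
    # Names sort chronologically, so a reverse sort lists the newest first.
    for b in $(ls -1 "$backup_domain" | grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}\.[0-9]+$' | sort -r); do
        i=$((i + 1))
        if [ "$i" -gt "$max_backups" ]; then
            echo "Removing old backup $b"
            # ls printed a bare name, so build the full path; :? aborts on an empty parent.
            rm -rf "${backup_domain:?}/$b"
        fi
    done
}
```

Usage would look like `prune_old_backups "$BACKUPDOMAIN" "$MAXBACKUPS"`. Like the original, it does not handle spaces in paths.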

@ASAPHAANING

ASAPHAANING commented Mar 2, 2017

Thank you very much for this useful utility script. I've added a wrapper that will handle the domains using virsh list --all and loop the backup script for all of those. By default this keeps 1 copy.

#!/bin/bash
#
# Get the list of all domains, active and inactive, iterate over the list with the main backup-script such that all domains get backed up.
test="$(virsh list --all | awk {'print $2'} | tail -n +3)"
while read -r line; do
    bash <vm-backup.sh location> <backup dir> "$line" 1
done <<< "$test"
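
A variation on this wrapper, as a sketch: `virsh list --all --name` prints one domain name per line, which avoids parsing the table with `tail` and `awk`. `BACKUP_SCRIPT` and `BACKUP_DIR` are placeholders (assumptions) standing in for the script location and backup directory left open above.

```bash
#!/bin/bash
# Sketch: back up every defined domain with the main script.
# BACKUP_SCRIPT and BACKUP_DIR are hypothetical placeholders; adjust to your setup.
BACKUP_SCRIPT="${BACKUP_SCRIPT:-./vm-backup.sh}"
BACKUP_DIR="${BACKUP_DIR:-/mnt/backups}"

backup_all_domains() {
    local domain
    # --name prints bare domain names, one per line, followed by a blank line.
    virsh list --all --name | while read -r domain; do
        [ -n "$domain" ] || continue      # skip the trailing blank line
        # </dev/null keeps the child script from eating the list on stdin.
        bash "$BACKUP_SCRIPT" "$BACKUP_DIR" "$domain" 1 </dev/null
    done
}
```

Calling `backup_all_domains` at the end of the file runs the backup for each domain and keeps one copy, as in the wrapper above.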

@Monkybusiness

Monkybusiness commented Mar 7, 2017

Thanks for this script, very helpful. I had some problems though which turned out to be whenever there is a CDROM or FLOPPY in the hardware list for a VM. The problem is that the domblklist command used isn't filtered enough. It is filtered so only lines starting with "file" are used for building targets but the problem here is that things like a cdrom or a floppy will also be included and have no source. Output from the filtered domblklist can look like this:

Type   Device   Target   Source
---------------------------------------------------
file   disk     hda      /mnt/whatever/filename-flat.vmdk
file   cdrom    hdb      -

This throws up an error as the snapshot command tries to create a snapshot for the cdrom without a source file. We need to get rid of any cdrom and floppy (could be other things, but this is what I had also starting with "file"), so I have substituted (in 3 places), where it says:

grep ^file

change to:

grep ^file | grep -v 'cdrom' | grep -v 'floppy'

After this, it works fine again. The alternative is of course to just remove all CDROM and floppy units from the VM.
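
An alternative to chaining `grep -v`, sketched here under the assumption that the `--details` columns are Type, Device, Target, Source as in the output above: match on the columns themselves, so that an image whose path happens to contain "cdrom" is not dropped by accident. `file_disks` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: select only file-backed disks that have a source image.
# Reads `virsh domblklist DOMAIN --details` output on stdin, prints "target source".
# file_disks is a hypothetical helper name.
file_disks() {
    awk '$1 == "file" && $2 == "disk" && $4 != "-" { print $3, $4 }'
}

# Example with the output quoted above (no libvirt needed to try it):
file_disks <<'EOF'
Type Device Target Source
---------------------------------------------------
file disk hda /mnt/whatever/filename-flat.vmdk
file cdrom hdb -
EOF
```

In the script this would be fed with `virsh domblklist "$DOMAIN" --details | file_disks`, taking targets from the first output column and images from the second.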

@hadrins

hadrins commented Sep 17, 2017

Great script. It helped me get my own started. I needed a script that would do a live backup of all the VMs on the KVM host. I was able to use your code to create a script that backs up all the running/paused VMs.

I will have to fine tune it and post.
Thanks.

@shurak

shurak commented Jan 21, 2018

Great script and great comments. FYI your script is running in a nuclear facility :>

@GerhardK90

GerhardK90 commented Feb 28, 2018

Thank you for the script.
I agree with @boulate: without the modification, the cleanup of old backups won't work properly.
Furthermore, be aware that the created backups might be in an inconsistent state. Nothing ensures that the disk is in a consistent state while the snapshot is being created, which can lead to partial data loss.
You should install qemu-guest-agent on your VMs and run the snapshot with --quiesce.
Another option is to use virsh domfsfreeze and virsh domfsthaw.
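
A minimal sketch of the `--quiesce` suggestion, assuming the guest agent is installed and running inside the guest. `snapshot_quiesced` is a hypothetical helper, and falling back to a non-quiesced snapshot is a design choice of this sketch. Some setups may prefer to fail instead.

```bash
#!/bin/bash
# Sketch: try a quiesced snapshot first, fall back to a plain one.
# Assumes the guest agent is running in the guest for --quiesce to succeed.
# snapshot_quiesced is a hypothetical helper; extra arguments go to virsh as-is.
snapshot_quiesced() {
    local domain="$1"; shift
    if virsh snapshot-create-as --domain "$domain" --name backup --no-metadata \
            --atomic --disk-only --quiesce "$@" >/dev/null 2>&1; then
        echo "quiesced"
    elif virsh snapshot-create-as --domain "$domain" --name backup --no-metadata \
            --atomic --disk-only "$@" >/dev/null 2>&1; then
        echo "not-quiesced"       # crash-consistent only
    else
        return 1
    fi
}
```

It would be called as `snapshot_quiesced "$DOMAIN" $DISKSPEC`, with the printed word logged so that crash-consistent backups can be told apart later.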

@sebastiaanfranken

sebastiaanfranken commented Mar 21, 2018

I've used your script as a starting point. My script does a backup of all domains on the KVM server it's on. I run this every week for our weekly backup cycle. Works like a charm so far.

#!/bin/bash

# Set the language to English so virsh produces its output
# in English as well
LANG=en_US

# Define the script name, this is used with systemd-cat to
# identify this script in the journald output
SCRIPTNAME=kvm-backup

# List domains
DOMAINS=$(virsh list | tail -n +3 | awk '{print $2}')

# Loop over the domains found above and do the
# actual backup

for DOMAIN in $DOMAINS; do

	echo "Starting backup for $DOMAIN on $(date +'%d-%m-%Y %H:%M:%S')" | systemd-cat -t $SCRIPTNAME

	# Generate the backup folder URI - this is something you should
	# change/check
	BACKUPFOLDER=/mnt/$DOMAIN/$(date +%d-%m-%Y)
	mkdir -p $BACKUPFOLDER

	# Get the target disk
	TARGETS=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $3}')

	# Get the image path
	IMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')

	# Create the snapshot/disk specification
	DISKSPEC=""

	for TARGET in $TARGETS; do
		DISKSPEC="$DISKSPEC --diskspec $TARGET,snapshot=external"
	done

	virsh snapshot-create-as --domain $DOMAIN --name "backup-$DOMAIN" --no-metadata --atomic --disk-only $DISKSPEC 1>/dev/null 2>&1

	if [ $? -ne 0 ]; then
		echo "Failed to create snapshot for $DOMAIN" | systemd-cat -t $SCRIPTNAME
		exit 1
	fi

	# Copy disk image
	for IMAGE in $IMAGES; do
		NAME=$(basename $IMAGE)
		cp $IMAGE $BACKUPFOLDER/$NAME
	done

	# Merge changes back
	BACKUPIMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')

	for TARGET in $TARGETS; do
		virsh blockcommit $DOMAIN $TARGET --active --pivot 1>/dev/null 2>&1

		if [ $? -ne 0 ]; then
			echo "Could not merge changes for disk of $TARGET of $DOMAIN. VM may be in invalid state." | systemd-cat -t $SCRIPTNAME
			exit 1
		fi
	done

	# Cleanup left over backups
	for BACKUP in $BACKUPIMAGES; do
		rm -f $BACKUP
	done

	# Dump the configuration information.
	virsh dumpxml $DOMAIN > $BACKUPFOLDER/$DOMAIN.xml 2>/dev/null

	echo "Finished backup of $DOMAIN at $(date +'%d-%m-%Y %H:%M:%S')" | systemd-cat -t $SCRIPTNAME
done

exit 0

@nebbian

nebbian commented May 4, 2018

Hey @sebastiaanfranken, thanks for posting. I'm manually going through this script to see how it works, and am a little confused by these lines:

BACKUPIMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')

# Cleanup left over backups
	for BACKUP in $BACKUPIMAGES; do
		rm -f $BACKUP
	done

It seems to me that this script will delete the images from the running VMs, instead of the old backups. Is this a copy/paste issue from the lines above (getting the original list of images), or am I reading this wrong?

@tevkar

tevkar commented May 12, 2018

@nebbian,

Setting BACKUPIMAGES before running blockcommit gives a list of the snapshot image files that will be committed. They can safely be removed if the blockcommit operation was successful.
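
To make the distinction explicit, one could compare the image list taken before the snapshot with the list taken after it and delete only the paths that are new. This is a sketch of that idea, not something the scripts above do. `overlay_files` is a hypothetical helper and both arguments are newline-separated lists.

```bash
#!/bin/bash
# Sketch: list the overlay files a snapshot created, i.e. paths present in the
# post-snapshot image list but absent from the pre-snapshot one.
# overlay_files is a hypothetical helper; both arguments are newline-separated.
overlay_files() {
    local before="$1" after="$2"
    # -v invert, -x whole-line match, -F fixed strings; each line of $before is a pattern.
    printf '%s\n' "$after" | grep -vxF -e "$before"
}
```

With this, `overlay_files "$IMAGES" "$BACKUPIMAGES"` would print only the temporary snapshot files, and nothing at all if the snapshot never happened.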

@v0112358

v0112358 commented May 24, 2018

Great! Thank you for your script. But I think you should use the '--sparse=always' option with the cp command.
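
A small demonstration of the GNU cp option `--sparse=always`, using a throwaway file in a temp directory (the file names are made up for the example). Whether the copy really stays small on disk depends on the filesystem supporting holes.

```bash
#!/bin/bash
# Sketch: copy a sparse image without expanding its holes (GNU coreutils cp).
# The file names are throwaway examples created in a temp directory.
workdir=$(mktemp -d)
truncate -s 64M "$workdir/disk.img"      # 64 MiB apparent size, no data written
cp --sparse=always "$workdir/disk.img" "$workdir/disk-backup.img"

# Apparent size (ls) is preserved; allocated size (du) stays small where holes are supported.
ls -l "$workdir"/disk*.img
du -k "$workdir"/disk*.img
```

In the backup script the copy line would become `cp --sparse=always "$t" "$BACKUP"/"$NAME"`.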

@churusaa

churusaa commented Jun 11, 2018

Updated version that uses rsync instead of cp so that progress, elapsed time, rate, etc. are shown. An example using pv is included (not tested) but not used, because pv doesn't preserve permissions.


# Set the language to English so virsh produces its output
# in English as well
# LANG=en_US

# Define the script name, this is used with systemd-cat to
# identify this script in the journald output
SCRIPTNAME=kvm-backup

# List domains
DOMAINS=$(virsh list | tail -n +3 | awk '{print $2}')

# Loop over the domains found above and do the
# actual backup

for DOMAIN in $DOMAINS; do

	echo "Starting backup for $DOMAIN on $(date +'%d-%m-%Y %H:%M:%S')" | systemd-cat -t $SCRIPTNAME

	# Generate the backup folder URI - this is something you should
	# change/check
	BACKUPFOLDER=/mnt/backups/$DOMAIN/$(date +%d-%m-%Y)
	mkdir -p $BACKUPFOLDER

	# Get the target disk
	TARGETS=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $3}')

	# Get the image path
	IMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')

	# Create the snapshot/disk specification
	DISKSPEC=""

	for TARGET in $TARGETS; do
		DISKSPEC="$DISKSPEC --diskspec $TARGET,snapshot=external"
	done

	virsh snapshot-create-as --domain $DOMAIN --name "backup-$DOMAIN" --no-metadata --atomic --disk-only $DISKSPEC 1>/dev/null 2>&1

	if [ $? -ne 0 ]; then
		echo "Failed to create snapshot for $DOMAIN" | systemd-cat -t $SCRIPTNAME
		exit 1
	fi

	# Copy disk image
	for IMAGE in $IMAGES; do
		NAME=$(basename $IMAGE)
                # cp $IMAGE $BACKUPFOLDER/$NAME
                # pv $IMAGE > $BACKUPFOLDER/$NAME
		rsync -ah --progress $IMAGE $BACKUPFOLDER/$NAME
	done

	# Merge changes back
	BACKUPIMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')

	for TARGET in $TARGETS; do
		virsh blockcommit $DOMAIN $TARGET --active --pivot 1>/dev/null 2>&1

		if [ $? -ne 0 ]; then
			echo "Could not merge changes for disk of $TARGET of $DOMAIN. VM may be in invalid state." | systemd-cat -t $SCRIPTNAME
			exit 1
		fi
	done

	# Cleanup left over backups
	for BACKUP in $BACKUPIMAGES; do
		rm -f $BACKUP
	done

	# Dump the configuration information.
	virsh dumpxml $DOMAIN > $BACKUPFOLDER/$DOMAIN.xml 2>/dev/null

	echo "Finished backup of $DOMAIN at $(date +'%d-%m-%Y %H:%M:%S')" | systemd-cat -t $SCRIPTNAME
done

exit 0

@bttrfngrs

bttrfngrs commented Nov 19, 2018

@churusaa I was loving this script... until a recent upgrade of our Ubuntu to the latest Bionic Beaver. Now the script quickly returns to the prompt without executing any commands.

@andreaswork

andreaswork commented Jan 8, 2019

I had some problems with a blockjob still being active when the script tried to blockcommit on VMs with more than one disk; I had to manually abort the blockjob and redo the blockcommit on both disks.

To prevent this issue, I added the "--wait" option on line 60, so it actually checks and doesn't assume the blockjob is complete before starting blockcommit.

P.S. This is not an issue with this script but with the virsh tool; --wait fixes the problem.

@davidshomelab

davidshomelab commented Jan 19, 2019

Loving this script, I just have one issue with it. Some of my VMs have physical disks attached to them meaning that this script returns the error: "error: unsupported configuration: source for disk 'sda' is not a regular file; refusing to generate external snapshot name"

I already have backups in place for the files on these disks so I don't need them included in the snapshot. Is there a way I can exclude block devices within the script?
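
One possible approach, offered as an untested sketch: pass `snapshot=no` in the `--diskspec` for every disk that is not file-backed, so `snapshot-create-as` leaves it alone. `build_diskspec` is a hypothetical helper. It assumes the `--details` columns are Type, Device, Target, Source.

```bash
#!/bin/bash
# Sketch: build --diskspec arguments that snapshot file-backed disks and
# explicitly skip other disks (snapshot=no), e.g. physical block devices.
# Reads `virsh domblklist DOMAIN --details` output on stdin.
# build_diskspec is a hypothetical helper name.
build_diskspec() {
    awk '
        $2 != "disk" { next }    # skip header, separator, cdrom, floppy
        $1 == "file" { printf " --diskspec %s,snapshot=external", $3; next }
                     { printf " --diskspec %s,snapshot=no", $3 }
        END          { print "" }
    '
}
```

The copy and blockcommit loops would then also need to iterate over the file-backed targets only, since the skipped disks have no overlay to merge.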

@berlinguyinca

berlinguyinca commented Feb 28, 2019

well done!

@reinistihovs

reinistihovs commented Jul 25, 2019

Here's my version, which creates backups with rsync --sparse and --inplace.
What does this mean?
The first time the script runs, the whole .qcow2 file is transferred.
On later runs, only the changes inside the .qcow2 file are synced into the backup file.
This makes the backup complete 10x faster.
This also saves a TON of space.

#!/bin/bash

#To exclude a domain, please add to its name "nobackup"
#First shutdown the guest, then use this command: virsh domrename oldname newname.

DATE=`date +%Y-%m-%d.%H:%M:%S`
LOG=/var/log/kvm-backup.$DATE.LOG
BACKUPROOT=/backup


DOMAINS=$(virsh list --all | tail -n +3 | awk '{print $2}')

for DOMAIN in $DOMAINS; do
        echo "-----------WORKER START $DOMAIN-----------" >> $LOG
        echo "Starting backup for $DOMAIN on $(date +'%d-%m-%Y %H:%M:%S')"  >> $LOG

        # Skip excluded domains but keep processing the remaining ones
        if [[ $DOMAIN == *"nobackup"* ]];then
                echo "Skipping $DOMAIN, because it is excluded." >> $LOG
                continue
        fi

        VMSTATE=`virsh list --all | grep $DOMAIN | awk '{print $3}'`
        if [[ $VMSTATE != "running" ]]; then
                echo "Skipping $DOMAIN, because it is not running." >> $LOG
                continue
        fi

        BACKUPFOLDER=$BACKUPROOT/KVM-BACKUPS/$DOMAIN
        mkdir -p $BACKUPFOLDER
        TARGETS=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $3}')
        IMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')
        DISKSPEC=""
        for TARGET in $TARGETS; do
                DISKSPEC="$DISKSPEC --diskspec $TARGET,snapshot=external"
        done

        virsh snapshot-create-as --domain $DOMAIN --name "backup-$DOMAIN" --no-metadata --atomic --disk-only $DISKSPEC >> $LOG
        if [ $? -ne 0 ]; then
                echo "Failed to create snapshot for $DOMAIN" >> $LOG
                exit 1
        fi

        for IMAGE in $IMAGES; do
                NAME=$(basename $IMAGE)
                if test -f "$BACKUPFOLDER/$NAME"; then
                echo "Backup exists, merging only changes to image" >> $LOG
                rsync -apvz --inplace $IMAGE $BACKUPFOLDER/$NAME >> $LOG
                else
                echo "Backup does not exist, creating a full sparse copy" >> $LOG
                rsync -apvz --sparse $IMAGE $BACKUPFOLDER/$NAME >> $LOG
                fi

        done

        BACKUPIMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')
        for TARGET in $TARGETS; do
                virsh blockcommit $DOMAIN $TARGET --active --pivot >> $LOG

                if [ $? -ne 0 ]; then
                        echo "Could not merge changes for disk of $TARGET of $DOMAIN. VM may be in invalid state." >> $LOG
                        exit 1
                fi
        done

        for BACKUP in $BACKUPIMAGES; do
                if [[ $BACKUP == *"backup-"* ]];then

                echo "deleted temporary image $BACKUP" >> $LOG
                rm -f $BACKUP
                fi
        done

        virsh dumpxml $DOMAIN > $BACKUPFOLDER/$DOMAIN.xml
        echo "-----------WORKER END $DOMAIN-----------" >> $LOG
        echo "Finished backup of $DOMAIN at $(date +'%d-%m-%Y %H:%M:%S')" >> $LOG
done

exit 0

@fuznutz04

fuznutz04 commented Aug 2, 2019

@reinistihovs, I'll give your version a try. I have been using the original one posted in this thread, and it is successfully taking snapshots, transferring them etc. It is also generating a domain.xml file. However, the file is blank. If I manually do a dumpxml, it generates the proper configuration file. But it does not do so properly via the script. Any ideas?

@fuznutz04

fuznutz04 commented Sep 5, 2019

Does anyone else have this issue? This will generate a domain.xml file, but the file is blank. If I manually do a dumpxml, it generates the proper configuration file. But it does not do so properly via the script. Any ideas?
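
One possible cause, offered as a guess: a redirection chain of the form `> file 1>/dev/null 2>&1` on the dumpxml line. Bash applies redirections left to right, so the file is created and truncated first, and then stdout is pointed at /dev/null, which leaves the file empty. The sketch below reproduces this with a stand-in function (`dump_config` is hypothetical and replaces `virsh dumpxml`).

```bash
#!/bin/bash
# Sketch: why "cmd > file 1>/dev/null 2>&1" leaves an empty file.
# Redirections are applied left to right, so stdout ends up on /dev/null;
# the file is still created (and truncated) by the first redirection.
workdir=$(mktemp -d)
dump_config() { echo "<domain>example</domain>"; }   # stand-in for: virsh dumpxml "$DOMAIN"

dump_config > "$workdir/empty.xml" 1>/dev/null 2>&1  # file exists but is empty
dump_config > "$workdir/good.xml" 2>/dev/null        # stdout goes to the file

wc -c "$workdir"/*.xml
```

If the dumpxml line in the script in use ends with `1>/dev/null 2>&1`, dropping the `1>/dev/null` part should produce a non-empty file.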

@brentl99

brentl99 commented Jan 9, 2020

Thank you for posting scripts, they are VERY VERY helpful. I have posted below a variation most suitable for my use:

#!/bin/bash
#
# This script backs up a list of VMs.
# An overview of the process is as follows:
# * invokes a "snapshot" which transfers VM disk I/O to new "snapshot" image file(s).
# * copy (and encrypt if applicable) the VM's image file(s) to a backup
# * invoke a "blockcommit" which merges (or "pivots") the snapshot image back
#   to the VM's primary image file(s)
# * delete the snapshot image file(s)
# * make a copy of the VM define/XML file
# * delete old images and XMLs, retaining images for x dates where x = IMAGECNT
#
# Note: On CentOS 7 snapshotting requires the "-ev" version of qemu.
#       yum install centos-release-qemu-ev qemu-kvm-ev libvirt
#
# The script uses gzip to compress the source image (e.g. qcow2) on the fly
# to the destination backup image. bzip2 was also tested; bzip2 (and
# other compression utilities) provide better compression (15-20%), but gzip
# is 7-10 times faster.
# 
# The script uses gpg symmetric encryption to encrypt the source image. The 
# encryption password is set using the ENCPASS field, and items must be decrypted
# before they are unzipped. Encrypted files will have an extension of .gz.gpg. By 
# leaving the ENCPASS field blank you can disable this feature.
#
# If the process fails part way through the snapshot, copy, or blockcommit, 
# the VM may be left running on the snapshot file which is Not desirable.
#

# define an emergency mail recipient
EMR=servermessages@mydomain.com
# encryption password. if left blank, files are not encrypted
ENCPASS="MyPassworD"
HOST="$(hostname)"
SHCMD="$(basename -- $0)"
BACKUPROOT=/mydata01/qemu_backups
IMAGECNT=3
[ ! -d $BACKUPROOT/logs ] && mkdir -p $BACKUPROOT/logs
DATE="$(date +%Y-%m-%d.%H%M%S)"
LOG="$BACKUPROOT/logs/qemu-backup.$(date +%Y-%m-%d).log"
ERRORED=0
BREAK=false
SNAPPREFIX=snaptemp-

#Optionally list all VMs and back them all up
#DOMAINS=$(virsh list --all | tail -n +3 | awk '{print $2}')
DOMAINS="myVMa myVMb"

# extract the date coding in filename (note: filename format must be YYYY-MM-DD)
dtmatch () { sed -n -e 's/.*\(2[0-1][0-9][0-9]-[0-1][0-9]-[0-3][0-9]\).*/\1/p'; }

echo "$SHCMD: Starting backups on $(date +'%d-%m-%Y %H:%M:%S')"  >> $LOG
for DOMAIN in $DOMAINS; do
	BREAK=false

        echo "---- VM Backup start $DOMAIN ---- $(date +'%d-%m-%Y %H:%M:%S')"  >> $LOG

        VMSTATE=$(virsh list --all | grep [[:space:]]$DOMAIN[[:space:]] | awk '{print $3}')
        if [[ $VMSTATE != "running" ]]; then
                echo "Skipping $DOMAIN , because it is not running." >> $LOG
                continue
        fi

        BACKUPFOLDER=$BACKUPROOT/$DOMAIN
        [ ! -d $BACKUPFOLDER ] && mkdir -p $BACKUPFOLDER
        TARGETS=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $3}')
        IMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')

	# check to make sure the VM is running on a standard image, not
	# a snapshot that may be from a backup that previously failed
        for IMAGE in $IMAGES; do
                set -o noglob
                if [[ $IMAGE == *${SNAPPREFIX}* ]]; then
                        set +o noglob
                	ERR="$SHCMD: Error VM $DOMAIN is running on a snapshot disk image: $IMAGE"
			echo $ERR >> $LOG
			echo "$ERR
Host:       $HOST
Disk Image: $IMAGE
Domain:     $DOMAIN
Command:    virsh domblklist $DOMAIN --details" | mail -s "$SHCMD snapshot Exception for $DOMAIN" $EMR
                	BREAK=true
			ERRORED=$(($ERRORED+1))
			break
                fi
		set +o noglob
        done
	[ $BREAK == true ] && continue

	# gather all the disks being used by the VM so they can be collectively snapshotted
        DISKSPEC=""
        for TARGET in $TARGETS; do
                set -o noglob
                if [[ $TARGET == *${SNAPPREFIX}* ]]; then
                        set +o noglob
                	ERR="$SHCMD: Error VM $DOMAIN is running on a snapshot disk image: $TARGET"
			echo $ERR >> $LOG
			echo "$ERR
Host:       $HOST
Disk Image: $IMAGE
Domain:     $DOMAIN
Command:    $CMD" | mail -s "$SHCMD snapshot Exception for $DOMAIN" $EMR
			BREAK=true
                	break
                fi
                set +o noglob
                DISKSPEC="$DISKSPEC --diskspec $TARGET,snapshot=external"
        done
	[ $BREAK == true ] && continue

	# transfer the VM to snapshot disk image(s)
        CMD="virsh snapshot-create-as --domain $DOMAIN --name ${SNAPPREFIX}$DOMAIN-$DATE --no-metadata --atomic --disk-only $DISKSPEC >> $LOG 2>&1"
        echo "Command: $CMD" >> $LOG 2>&1
        eval "$CMD"
        if [ $? -ne 0 ]; then
                ERR="Failed to create snapshot for $DOMAIN"
		echo $ERR >> $LOG
		echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD snapshot Exception for $DOMAIN" $EMR
		ERRORED=$(($ERRORED+1))
                continue
        fi

	# copy/back/compress the VM's disk image(s)
        for IMAGE in $IMAGES; do
		echo "Copying $IMAGE to $BACKUPFOLDER" >> $LOG
		ZFILE="$BACKUPFOLDER/$(basename -- $IMAGE)-$DATE.gz"
		# determine whether the gzip is to be encrypted or not
		if [ -z "${ENCPASS}" ]; then 
			CMD="gzip < $IMAGE > $ZFILE 2>> $LOG"
		else
			exec {pwout}> /tmp/agspw.$$
			exec {pwin}< /tmp/agspw.$$
			rm /tmp/agspw.$$
			echo $ENCPASS >&$pwout
			ZFILE="$ZFILE.gpg"
			CMD="gzip < $IMAGE --to-stdout | gpg --batch --yes -o $ZFILE --passphrase-fd $pwin -c >> $LOG"
		fi
		echo "Command: $CMD" >> $LOG
		SECS=$(printf "%.0f" $(/usr/bin/time -f %e sh -c "$CMD" 2>&1))
		printf '%s%dh:%dm:%ds\n' "Duration: " $(($SECS/3600)) $(($SECS%3600/60)) $(($SECS%60)) >> $LOG
		# clear fds if necessary
		if [ -n "${ENCPASS}" ]; then 
			exec {pwout}>&-
			exec {pwin}<&-
			unset pwout pwin
		fi
		BYTES=$(stat -c %s $IMAGE) 
		printf "%s%'d\n" "Source MB: " $(($BYTES/1024/1024)) >> $LOG
		printf "%s%'d\n" "kB/Second: " $(($BYTES/$SECS/1024)) >> $LOG
		ZBYTES=$(stat -c %s $ZFILE) 
		printf "%s%'d\n" "Destination MB: " $(($ZBYTES/1024/1024)) >> $LOG
		printf "%s%d%s\n" "Compression: " $((($BYTES-$ZBYTES)*100/$BYTES)) "%" >> $LOG
        done

	# Update the VM's disk image(s) with any changes recorded in the snapshot 
	# while the copy process was running.  In qemu lingo this is called a "pivot"
        BACKUPIMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')
        for TARGET in $TARGETS; do
                CMD="virsh blockcommit $DOMAIN $TARGET --active --pivot >> $LOG 2>&1"
                echo "Command: $CMD" >> $LOG 
                eval "$CMD"

                if [ $? -ne 0 ]; then
			ERR="Could not merge changes for disk of $TARGET of $DOMAIN. VM may be in an invalid state."
                        echo $ERR >> $LOG
			echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD blockcommit Exception for $DOMAIN" $EMR
                        BREAK=true
			ERRORED=$(($ERRORED+1))
			break
                fi
        done
	[ $BREAK == true ] && continue

	# Now that the VM's disk image(s) have been successfully committed/pivoted
	# back to the main disk image, remove the temporary snapshot image file(s)
        for BACKUP in $BACKUPIMAGES; do
                set -o noglob
                if [[ $BACKUP == *${SNAPPREFIX}* ]]; then
                        set +o noglob
			CMD="rm -f $BACKUP >> $LOG 2>&1"
                	echo " Deleting temporary image $BACKUP" >> $LOG
                	echo "Command: $CMD" >> $LOG
			eval "$CMD"
                fi
                set +o noglob
        done

	# capture the VM's definition in use at the time the backup was done
        CMD="virsh dumpxml $DOMAIN > $BACKUPFOLDER/$DOMAIN-$DATE.xml 2>> $LOG"
        echo "Command: $CMD" >> $LOG 
        eval "$CMD"

	# Tracks whether xmls have been cleared
	DDEL='no'
	# check image retention count
	for IMAGE in $IMAGES; do 
		COUNT=`find $BACKUPFOLDER -type f -name $(basename -- $IMAGE)-'*'.gz'*' -print | dtmatch | sort -u | wc -l`
		if [ $COUNT -gt $IMAGECNT  ]; then
			echo "$SHCMD: Count for BACKUPFOLDER ($BACKUPFOLDER) for image ($(basename -- $IMAGE)) too high ($COUNT), deleting historical files over $IMAGECNT..."
			LIST=`find $BACKUPFOLDER -type f -name $(basename -- $IMAGE)-'*'.gz'*' -print | dtmatch | sort -ur | sed -e "1,$IMAGECNT"d`
			
			# make sure LIST has a value otherwise fgrep will allow the entire find
			# result to be passed to xarg rm
			if [ -n "$LIST" ]; then
				# Delete the specific images in the dates list
				find $BACKUPFOLDER -type f -name $(basename -- $IMAGE)-'*' | fgrep "$LIST" | xargs rm
				# Only delete old xmls once
				if [[ $DDEL == 'no' ]]; then
					# Delete the xmls in the dates list
					find $BACKUPFOLDER -type f -name $DOMAIN-'*' | fgrep "$LIST" | xargs rm
					DDEL='yes'
				fi
			fi
		fi
	done

        echo "---- Backup done $DOMAIN ---- $(date +'%d-%m-%Y %H:%M:%S') ----" >> $LOG
done
echo "$SHCMD: Finished backups at $(date +'%d-%m-%Y %H:%M:%S')
====================" >> $LOG

exit $ERRORED

@panciom

panciom commented Jan 29, 2020

Great job.
I have modified the script to make remote rsync backups via SSH, with a bandwidth limit (bwlimit).
It requires password-less login to the remote server.
It enforces the --quiesce option to flush the cache to disk (guest agent required on the VM), except for VMs whose name contains "win", since it does not work very well there even with the guest agent installed.
It sends a mail at the end of the script.
It works on running and stopped VMs.

It's not very fast. Rsync on hundreds of GB of data is slow just from reading everything.
I'm thinking of writing a PHP script that takes multiple external live snapshots, so that only the changed data, already isolated in a single small file, gets transferred.
PHP because I'm comfortable with it.

Good backup.

#!/bin/bash

# Script: kvm-backup.sh
# By: Pancio <....@.....it>
# Date: 21/12/2019
#
# Descr: Live backup of KVM VM on host via SSH...
# Source: https://gist.github.com/cabal95/e36c06e716d3328b512b
# Credits: See link above...

# NOTES:
# To exclude a domain, please add to its name "nobackup"
# First shutdown the guest, then use this command: virsh domrename oldname newname.




###########################################################
################### START CONFIGURATION ###################

# FQDN server name...
SERVER_NAME=$(hostname -f)

# Mail sender and destination...
MAIL_TO=to@gmail.com
MAIL_FROM=from@gmail.com

# Path to log files...
LOG_PATH=/var/log/kvm-backup

# Path where to place backup files...
# Used with local or SSH remote...
# Use absolute path "/" or related to home "~/"...
BACKUP_PATH=/Backup-KVM

# If you have guest-agent installed on VM "--quiesce"
# can flush data to disk before taking snapshot... 
SNAP_CREATE_PARAMS="--quiesce"

# Disable --quiesce for win VM...
DISABLE_SNAP_CREATE_PARAMS_WIN=1

# Be verbose on pivoting snapshot to original disk-file...
SNAP_COMM_PARAMS="--verbose"

# SSH remote rsync backup these are examples...
# Remember to configure passwordless login to remote server...
RSYNC_SSH_PARAMS='-p2222'
SCP_SSH_PARAMS='-P2222'
RSYNC_DEST="root@srv.remote.it"

# Max KB/s rsync data transfer (ex. 1200KB/s=10Mbps)...
RSYNC_SPEED="--bwlimit=2400"

# Path to tmp for storing dumpxml...
PATH_TMP_XML=/tmp

# Path to XML files for stopped VM...
PATH_LIBVIRT_QEMU="/etc/libvirt/qemu"

################### STOP CONFIGURATION ###################
###########################################################




DATE=`date +%Y-%m-%d_%H-%M-%S`
LOG=$LOG_PATH/kvm-backup.$DATE.log
WARNING=0


DOMAINS=$(virsh list --all 2> /dev/null | tail -n +3 | awk '{print $2}')

if [ -z "$DOMAINS" ]; then
  echo "Error extracting VM list from libvirt!" >> $LOG
  cat $LOG | mail -s "[KVM-Backup] Node='$SERVER_NAME' ERROR EXTRACTING VM LIST" $MAIL_TO -aFrom:$MAIL_FROM	
  exit 1
fi

NR_DOMAINS=$(echo "$DOMAINS" | wc -l)


for DOMAIN in $DOMAINS; do
  echo "" >> $LOG
  echo "-----------WORKER START FOR VM $DOMAIN-----------" >> $LOG
  echo "-> Starting backup for VM $DOMAIN on $(date +'%d-%m-%Y %H:%M:%S')"  >> $LOG

  if [[ $DOMAIN == *"nobackup"* ]];then
    echo "-> Skipping VM $DOMAIN because its excluded." >> $LOG
    continue
  fi

  VM_RUNNING=1
  VMSTATE=`virsh list --all | grep $DOMAIN | awk '{print $3}'`
	if [[ $VMSTATE != "in" ]]; then
    echo "-> VM $DOMAIN not running. No snapshot and blockcommit. Only rsync." >> $LOG
    VM_RUNNING=0
  fi

	MY_SNAP_CREATE_PARAMS=$SNAP_CREATE_PARAMS
  if [[ $DOMAIN == *"win"* ]];then
    echo "-> Skipping SNAP_CREATE_PARAMS for $DOMAIN because its *win*." >> $LOG
    MY_SNAP_CREATE_PARAMS=""
  fi

	# Images to copy...
	IMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')

	if [[ $VM_RUNNING -eq 1 ]]; then
		TARGETS=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $3}')
		DISKSPEC=""

		for TARGET in $TARGETS; do
		  DISKSPEC="$DISKSPEC --diskspec $TARGET,snapshot=external"
		done

		virsh snapshot-create-as --domain $DOMAIN --name "backup-$DOMAIN" --no-metadata --atomic --disk-only $MY_SNAP_CREATE_PARAMS $DISKSPEC >> $LOG
		if [ $? -ne 0 ]; then
		  echo "-> Failed to create snapshot for VM $DOMAIN. Try verify GuestAgent is running inside." >> $LOG
		  WARNING=1    
		  continue
		fi
	fi


  BACKUPFOLDER=$BACKUP_PATH/$SERVER_NAME/$DOMAIN
  ssh $RSYNC_SSH_PARAMS $RSYNC_DEST ''mkdir -p $BACKUPFOLDER''

  for IMAGE in $IMAGES; do
    NAME=$(basename $IMAGE)
    DUMMY=$((ssh $RSYNC_SSH_PARAMS $RSYNC_DEST stat $BACKUPFOLDER/$NAME) 2>&1)

    if [ $? -eq 0 ]; then
      echo "-> Backup exists on $RSYNC_DEST:$BACKUPFOLDER/$NAME, merging only changes to image" >> $LOG
      rsync -apvz -e "ssh $RSYNC_SSH_PARAMS" $RSYNC_SPEED --inplace $IMAGE $RSYNC_DEST:$BACKUPFOLDER/$NAME >> $LOG
    else
      echo "-> Backup does not exist on $RSYNC_DEST:$BACKUPFOLDER/$NAME, creating a full sparse copy" >> $LOG
      rsync -apvz -e "ssh $RSYNC_SSH_PARAMS" $RSYNC_SPEED --sparse $IMAGE $RSYNC_DEST:$BACKUPFOLDER/$NAME >> $LOG
    fi
  done

	if [[ $VM_RUNNING -eq 1 ]]; then
		BACKUPIMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')
		
		for TARGET in $TARGETS; do
		  virsh blockcommit $DOMAIN $TARGET --active --pivot $SNAP_COMM_PARAMS >> $LOG

		  if [ $? -ne 0 ]; then
		    echo "-> Could not merge changes for disk of $TARGET of VM $DOMAIN. VM may be in invalid state." >> $LOG
		    WARNING=1    
		    continue
		  fi
		done

		for BACKUP in $BACKUPIMAGES; do
		  if [[ $BACKUP == *"backup-"* ]];then
		    echo "-> Deleted temporary image $BACKUP" >> $LOG
		    rm -f $BACKUP
		  fi
		done
	fi

	if [[ $VM_RUNNING -eq 1 ]]; then
  	virsh dumpxml $DOMAIN > $PATH_TMP_XML/$DOMAIN.xml
		scp $SCP_SSH_PARAMS $PATH_TMP_XML/$DOMAIN.xml $RSYNC_DEST:$BACKUPFOLDER/$DOMAIN.xml >> $LOG
		rm $PATH_TMP_XML/$DOMAIN.xml
	else
		scp $SCP_SSH_PARAMS $PATH_LIBVIRT_QEMU/$DOMAIN.xml $RSYNC_DEST:$BACKUPFOLDER/$DOMAIN.xml >> $LOG
	fi

  echo "-> Finished backup of $DOMAIN at $(date +'%d-%m-%Y %H:%M:%S')" >> $LOG
  echo "-----------WORKER END FOR VM $DOMAIN-----------" >> $LOG
done

echo "" >> $LOG


# Send email of results for convenience...
cat $LOG | mail -s "[KVM-Backup] Node='$SERVER_NAME' VMcount='$NR_DOMAINS' Warning='$WARNING'" $MAIL_TO -aFrom:$MAIL_FROM


exit 0

@brentl99

brentl99 commented Feb 5, 2020

I discovered a bug in posted sample scripts that include the following:
VMSTATE=`virsh list --all | grep $DOMAIN | awk '{print $3}'

If you have domain names like "windows10", "windows10vm", "dows10", etc, grep fails to extract the correct line from the virsh list.
In my example this occurs because "windows10" matches "windows10" and "windows10vm".
And "dows10" matches "windows10", "windows10vm" and "dows10".

The solution that I made to correct this was the following edit:
VMSTATE=`virsh list --all | grep [[:space:]]$DOMAIN[[:space:]] | awk '{print $3}'`
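
A whitespace-delimited grep still treats the domain name as a regular expression, so a name containing a character such as `.` could in principle match more than intended. A minimal sketch of an exact comparison on the name column using awk follows; the `vm_state` function name and the sample listing are illustrative assumptions, not part of the posted scripts:

```shell
#!/bin/bash
# Hypothetical helper: print the state of exactly one domain from
# `virsh list --all`-style output read on stdin.
# Columns are Id, Name, State; like the scripts above, only the first
# word of the state is printed (e.g. "shut" for "shut off").
vm_state() {
    awk -v d="$1" '$2 == d { print $3; exit }'
}

# Sample listing standing in for real `virsh list --all` output.
sample=' Id   Name          State
--------------------------------
 1    windows10     running
 2    windows10vm   running
 -    dows10        shut off'

printf '%s\n' "$sample" | vm_state windows10   # prints: running
printf '%s\n' "$sample" | vm_state dows10      # prints: shut
```

With a real host this would be called as `virsh list --all | vm_state "$DOMAIN"`.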

Ryushin commented Feb 18, 2020

Here is a version that uses Borg Backup. I used brentl99's version as my template. You can configure Borg to use local storage or remote storage over SSH. I've added quite a few options, including skipping specific domains or specific virtual disks. There are also options for checking the Borg repositories on a specific day of the week, or on that day only in certain weeks of the month. I've been running it in production for a couple of weeks now with good success.

#!/bin/bash
#
# This script backs up a list of VMs using Borg Backup.
# An overview of the process is as follows:
# * invokes a "snapshot" which transfers VM disk I/O to new "snapshot" image file(s).
# * Use borg to backup the VM's image file(s)
# * invoke a "blockcommit" which merges (or "pivots") the snapshot image back
#   to the VM's primary image file(s)
# * delete the snapshot image file(s)
# * make a copy of the VM define/XML file
#
# If the process fails part way through the snapshot, copy, or blockcommit,
# the VM may be left running on the snapshot file which is Not desirable.
#
# Note: Paths and the virtual domains cannot contain spaces

# Define email recipient
EMAIL_RECIPIENT="user@domain.net"

HOST="$(hostname)"
SHCMD="$(basename -- $0)"
LOGS_DIR=/var/log/vm_backups
DATE="$(date +%Y%m%d_%H%M)"
LOG="$LOGS_DIR/vm_backups.$DATE.log"
ERRORED=0
BREAK=false
QEMU_XML_BACKUPS="/var/log/vm_backups/qemu_xml_backups"

# List of any domains to not back up.  Separate with a pipe |
SKIP_DOMAINS=""

# List of specific virtual disks to not back up. Separate with a pipe |
SKIP_DISKS=""

# Virsh snapshot extension name
VIRSH_SNAP_NAME="snaptemp"

# Send summary email at the end of the backup?
EMAIL_SUMMARY="yes"

# Output borg summary at end of backup?
END_SUMMARY="yes"

# How many days to keep logs and qemu.xml files.
KEEP_FILES_FOR="14"

# Should ionice and nice be used when creating backups?
BE_NICE="yes"

# Borg environment variables
#export BORG_SSH_SERVER='borg-backup@nas.example.net'
export BORG_REPO='/volume1/borgbackuprepo'
#export BORG_REPO="ssh://$BORG_SSH_SERVER/volume1/BorgBackupRepo"
#export BORG_RSH='ssh -i /home/user/.ssh/id_ed25519 -o BatchMode=yes -o VerifyHostKeyDNS=yes'
# See https://borgbackup.readthedocs.io/en/stable/faq.html#it-always-chunks-all-my-files-even-unchanged-ones
export BORG_FILES_CACHE_TTL='100'

# Borg create options
# Note: ZFS and BTRFS should use native compression.  
#	ionice and nice is used with the borg create command.
BORG_CREATE_OPTIONS="--compression none --list --stats --files-cache=mtime,size --noctime --noatime"

# Borg init options
BORG_INIT_OPTIONS="--make-parent-dirs --encryption=none"

# How long to keep borg archives
# --keep-hourly=24 --keep-daily=14 --keep-weekly=4 --keep-monthly=2
# --keep-within=14d
BORG_PRUNE_OPTIONS="-v --list --keep-within=14d"

# Borg mtime touch file
# See: https://borgbackup.readthedocs.io/en/stable/faq.html#i-am-seeing-a-added-status-for-an-unchanged-file
BORG_MTIME_FILE="borg_touch_mtime.txt"

# How often to perform full check on borg repositories.
# Day of the week to perform the full check.  Use full weekday name (date +%A).
CHECK_DOW="Friday"
# Which week in the month to perform the full check.  Put any number(s) 1-5.
CHECK_WEEKS="12345"
# Send email of check results?
EMAIL_CHECK_RESULTS="yes"
# Borg check options
BORG_CHECK_OPTIONS=""

##################  End Configuration Options  ##################

# Create directories if they are missing
[ ! -d $LOGS_DIR ] && mkdir -p $LOGS_DIR
[ ! -d $QEMU_XML_BACKUPS ] && mkdir -p $QEMU_XML_BACKUPS

# SKIP_DOMAINS and SKIP_DISKS cannot be blank or egrep will match everything
[ "$SKIP_DOMAINS" = "" ] && SKIP_DOMAINS="$(mktemp --dry-run XXXXXXXXXXXXXXXXXXXX)"
[ "$SKIP_DISKS" = "" ] && SKIP_DISKS="$(mktemp --dry-run XXXXXXXXXXXXXXXXXXXX)"

# Check for $BE_NICE and set options
[ "$BE_NICE" = "yes" ] && NICE_OPTIONS="ionice -c2 -n7 nice -n19"

# Create list of domains to backup except those in SKIP_DOMAINS
DOMAINS=$(virsh list --all | tail -n +3 | egrep -v "$SKIP_DOMAINS" | awk '{print $2}' | sort | sed '/^[[:space:]]*$/d')

# Create summary temp file if required
[ "$EMAIL_SUMMARY" = "yes" -o "$END_SUMMARY" = "yes" ] && SUMMARY_FILE="$(mktemp /tmp/summary_XXXXXXX)"

# Check to see if the borg repo is using ssh and test connection.
echo "$BORG_REPO" | grep "ssh://" > /dev/null
if [ "$?" -eq "0" -a "$BORG_SSH_SERVER" != "" ]
then
	ssh -oBatchMode=yes $BORG_SSH_SERVER ls > /dev/null 2>&1
	if [ "$?" -ne "0" ]
	then
		ERR="$SHCMD: Error!  Cannot connect to $BORG_SSH_SERVER with SSH key."
		echo $ERR >> $LOG
		echo "$ERR
Host:       $HOST
Command:    ssh -oBatchMode=yes $BORG_SSH_SERVER ls" | mail -s "$SHCMD ssh connection failed" $EMAIL_RECIPIENT
		echo $ERR
		exit 1
	fi
fi

DAY_OF_WEEK="$(date +'%A')"
WEEK_IN_MONTH=$(echo $((($(date +%-d)-1)/7+1)))

echo "$SHCMD: Starting backups on $(date +'%d-%m-%Y %H:%M:%S')"  >> $LOG

# Check borg repositories if day of week and week in month match options.
echo $CHECK_WEEKS | grep -q $WEEK_IN_MONTH && CURRENT_WEEK="true"
if [ "$DAY_OF_WEEK" = "$CHECK_DOW" -a "$CURRENT_WEEK" = "true" ] 
then
	CHECK_RESULTS_FILE="$(mktemp /tmp/check_results_XXXXXXX)"
	echo -e "Perform full check of borg repositories\n" > $CHECK_RESULTS_FILE
	for DOMAIN in $DOMAINS; do
		borg info $BORG_REPO/$DOMAIN > /dev/null
		if [ "$?" -eq "0" ]
		then
			echo "Checking borg repository $BORG_REPO/$DOMAIN:" >> $CHECK_RESULTS_FILE
			borg --verbose check $BORG_CHECK_OPTIONS $BORG_REPO/$DOMAIN >> $CHECK_RESULTS_FILE 2>&1
			if [ "$?" -ne "0" ]
			then
				echo "Errors found in $BORG_REPO/$DOMAIN repository!" >> $CHECK_RESULTS_FILE
				echo "Manual intervention is required." >> $CHECK_RESULTS_FILE
				REPOSITORY_ERRORS="true"
			fi
		fi
	done
	if [ "$REPOSITORY_ERRORS" = "true" ]
	then
        echo -e "Borg repository errors found:\n\n
Host: $HOST

$(cat $CHECK_RESULTS_FILE)" | mail -s "Borg Repository Errors Found for $HOST" $EMAIL_RECIPIENT
	fi
	if [ "$EMAIL_CHECK_RESULTS" = "yes" -a "$REPOSITORY_ERRORS" != "true" ]
	then
        echo -e "Borg repository check results:\n\n
Host:       $HOST

$(cat $CHECK_RESULTS_FILE)" | mail -s "Borg Repository Check Results for $HOST" $EMAIL_RECIPIENT
	fi
fi


for DOMAIN in $DOMAINS; do
        BREAK=false

        echo "---- VM Backup start $DOMAIN ---- $(date +'%d-%m-%Y %H:%M:%S')"  >> $LOG

        VMSTATE=$(virsh list --all | grep [[:space:]]$DOMAIN[[:space:]] | awk '{print $3}')

        BORG_ARCHIVE_FOLDER=$BORG_REPO/$DOMAIN
        [ ! -d $BORG_ARCHIVE_FOLDER ] && mkdir -p $BORG_ARCHIVE_FOLDER
        TARGETS=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $3}')
        IMAGES=$(virsh domblklist $DOMAIN --details | grep disk | egrep -v "$SKIP_DISKS" | awk '{print $4}')
        # check to make sure the VM is running on a standard image, not
        # a snapshot that may be from a backup that previously failed
	if [ "$VMSTATE" = "running" ]
	then
	        for IMAGE in $IMAGES; do
	                if [[ $IMAGE == *"$VIRSH_SNAP_NAME-"* ]]; then
	                        ERR="$SHCMD: Error VM $DOMAIN is running on a snapshot disk image: $IMAGE"
	                        echo $ERR >> $LOG
	                        echo "$ERR
Host:       $HOST
Disk Image: $IMAGE
Domain:     $DOMAIN
Logfile:    $LOG
Command:    virsh domblklist $DOMAIN --details" | mail -s "$SHCMD snapshot Exception for $DOMAIN" $EMAIL_RECIPIENT
	                        BREAK=true
	                        ERRORED=$(($ERRORED+1))
	                        break
	                fi
	        done
	        [ $BREAK == true ] && continue

	        # gather all the disks being used by the VM so they can be collectively snapshotted
	        DISKSPEC=""
	        for TARGET in $TARGETS; do
	                if [[ $TARGET == *"$VIRSH_SNAP_NAME-"* ]]; then
	                        ERR="$SHCMD: Error VM $DOMAIN is running on a snapshot disk image: $TARGET"
	                        echo $ERR >> $LOG
	                        echo "$ERR
Host:       $HOST
Disk Image: $TARGET
Domain:     $DOMAIN
Logfile:    $LOG
Command:    virsh domblklist $DOMAIN --details" | mail -s "$SHCMD snapshot Exception for $DOMAIN" $EMAIL_RECIPIENT
	                        BREAK=true
	                        ERRORED=$(($ERRORED+1))
	                        break
	                fi
	                DISKSPEC="$DISKSPEC --diskspec $TARGET,snapshot=external"
	        done
	        [ $BREAK == true ] && continue

	        # Transfer the VM to snapshot disk image(s)
	        CMD="virsh snapshot-create-as --domain $DOMAIN --name $VIRSH_SNAP_NAME-$DOMAIN-$DATE --no-metadata --atomic --disk-only $DISKSPEC >> $LOG 2>&1"
	        echo "Command: $CMD" >> $LOG
	        eval "$CMD" >> $LOG 2>&1
	        if [ $? -ne 0 ]; then
	                ERR="Failed to create snapshot for $DOMAIN"
	                echo $ERR >> $LOG
	                echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Logfile: $LOG
Command: $CMD" | mail -s "$SHCMD snapshot Exception for $DOMAIN" $EMAIL_RECIPIENT
	                ERRORED=$(($ERRORED+1))
	                continue
	        fi
	fi

        # Use borg to backup the VM's disk image(s)
	if [ "$IMAGES" != "" ]
	then
		echo -e "\nUsing borg to backup $IMAGES to $BORG_ARCHIVE_FOLDER" >> $LOG
		# Check to see if the borg repo exists and if not, create it.
		echo -e "\nChecking to see if the borg $BORG_ARCHIVE_FOLDER repository exists using borg info..." >> $LOG
		CMD="borg info $BORG_ARCHIVE_FOLDER >> $LOG  2>&1"
		echo "Command: $CMD" >> $LOG
		eval "$CMD"
		if [ "$?" -ne "0" ]
		then
			echo -e "\nBorg Repository does not exist.  Creating $BORG_ARCHIVE_FOLDER" >> $LOG
			CMD="borg init $BORG_INIT_OPTIONS $BORG_ARCHIVE_FOLDER >> $LOG 2>&1"
			echo "Command: $CMD" >> $LOG
			eval "$CMD"
		fi

		# Backup using borg
		NOW=$(date +%Y%m%d_%H%M)
		echo -e "\nBacking up using borg.." >> $LOG
		echo "Create mtime file and wait two seconds." >> $LOG
		IMAGE_DIR=$(dirname $(echo $IMAGES | awk {'print $1'} | head -1))
		touch $IMAGE_DIR/$BORG_MTIME_FILE; sleep 2
		CMD="$NICE_OPTIONS borg create $BORG_CREATE_OPTIONS $BORG_ARCHIVE_FOLDER::${DOMAIN}_$NOW $IMAGE_DIR/${BORG_MTIME_FILE} ${IMAGES//$'\n'/ } >> $LOG 2>&1"
		echo "Command: $CMD" >> $LOG
		eval "$CMD"
		if [ "$?" -ne "0" ]
		then
			ERR="$SHCMD: Error!  Borg failed to backup $DOMAIN"
			echo $ERR >> $LOG
			echo "$ERR
Host:     $HOST
Domain:   $DOMAIN
Logfile:  $LOG
Command:  $CMD" | mail -s "$SHCMD borg backup failed" $EMAIL_RECIPIENT
			echo $ERR
		fi
		rm $IMAGE_DIR/$BORG_MTIME_FILE

		# Summary information about the last backup.
		echo -e "\nShow summary info about the last backup" >> $LOG
		CMD="borg info $BORG_ARCHIVE_FOLDER --last 1 >> $LOG 2>&1"
		echo "Command: $CMD" >> $LOG
		eval "$CMD"
                if [ "$?" -ne "0" ]
                then
                        ERR="$SHCMD: Error!  Borg summary failed for $DOMAIN"
                        echo $ERR >> $LOG
                        echo "$ERR
Host:     $HOST
Domain:   $DOMAIN
Logfile:  $LOG
Command:  $CMD" | mail -s "$SHCMD borg summary failed" $EMAIL_RECIPIENT
                        echo $ERR
                fi
		# Create summary temp file if required
		if [ "$EMAIL_SUMMARY" = "yes" -o "$END_SUMMARY" = "yes" ]
		then
			echo "Summary of $DOMAIN" >> $SUMMARY_FILE
			borg info $BORG_ARCHIVE_FOLDER --last 1 >> $SUMMARY_FILE
			echo -e "-----------------------------------------------\n\n" >> $SUMMARY_FILE
		fi

		# Prune borg archives
		echo -e "\nPrune borg archives using options: $BORG_PRUNE_OPTIONS" >> $LOG
		CMD="borg prune $BORG_PRUNE_OPTIONS $BORG_ARCHIVE_FOLDER >> $LOG 2>&1"
		echo "Command: $CMD" >> $LOG
		eval "$CMD"
		if [ "$?" -ne "0" ]
                then
                        ERR="$SHCMD: Error!  Failed pruning borg archive for $DOMAIN"
                        echo $ERR >> $LOG
                        echo "$ERR
Host:     $HOST
Domain:   $DOMAIN
Logfile:  $LOG
Command:  $CMD" | mail -s "$SHCMD borg prune failed" $EMAIL_RECIPIENT
                        echo $ERR
                fi
	fi

	if [ "$VMSTATE" = "running" ]
	then
	        # Update the VM's disk image(s) with any changes recorded in the snapshot
	        # while the copy process was running.  In qemu lingo this is called a "pivot"
	        BACKUP_IMAGES=$(virsh domblklist $DOMAIN --details | grep disk | egrep -v "$SKIP_DISKS" | awk '{print $4}')
	        for TARGET in $TARGETS; do
	                CMD="virsh blockcommit $DOMAIN $TARGET --active --pivot >> $LOG 2>&1"
	                echo "Command: $CMD" >> $LOG
	                eval "$CMD"
	                if [ $? -ne 0 ]; then
	                        ERR="Could not merge changes for disk of $TARGET of $DOMAIN. VM may be in an invalid state."
	                        echo $ERR >> $LOG
	                        echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Logfile: $LOG
Command: $CMD" | mail -s "$SHCMD blockcommit Exception for $DOMAIN" $EMAIL_RECIPIENT
        	                BREAK=true
	                        ERRORED=$(($ERRORED+1))
	                        break
	                fi
	        done
	        [ $BREAK == true ] && continue

	        # Now that the VM's disk image(s) have been successfully committed/pivoted to
	        # back to the main disk image, remove the temporary snapshot image file(s)
	        for BACKUP in $BACKUP_IMAGES; do
	                if [[ $BACKUP == *"$VIRSH_SNAP_NAME-"* ]]; then
	                        CMD="rm -f $BACKUP >> $LOG 2>&1"
	                        echo " Deleting temporary image $BACKUP" >> $LOG
	                        echo "Command: $CMD" >> $LOG
	                        eval "$CMD"
	                fi
	        done
	fi

        # Capture the VM's definition in use at the time the backup was done
        CMD="virsh dumpxml $DOMAIN > $QEMU_XML_BACKUPS/$DOMAIN.xml.qemuconfig.$(date +'%d-%m-%Y') 2>> $LOG"
        echo "Command: $CMD" >> $LOG
        eval "$CMD"
        echo "---- Backup done $DOMAIN ---- $(date +'%d-%m-%Y %H:%M:%S') ----" >> $LOG
	# Prune old definition backup files
	echo "Remove old qemu definition files older than $KEEP_FILES_FOR days." >> $LOG
	find $QEMU_XML_BACKUPS -maxdepth 1 -mtime +$KEEP_FILES_FOR -name "*.xml.qemuconfig.*" -exec rm -vf {} \; >> $LOG
done
# Remove old log files
echo "Remove log files older than $KEEP_FILES_FOR days" >> $LOG
find $LOGS_DIR -maxdepth 1 -mtime +$KEEP_FILES_FOR -name "*.log" -exec rm -vf {} \; >> $LOG

echo "$SHCMD: Finished backups at $(date +'%d-%m-%Y %H:%M:%S')
====================" >> $LOG

if [ "$EMAIL_SUMMARY" = "yes" ]
then

	echo -e "Summary of Borg Backup:\n\n
Host:       $HOST
Domains:    ${DOMAINS//$'\n'/ }

$(cat $SUMMARY_FILE)" | mail -s "Borg Backup Summary for $HOST" $EMAIL_RECIPIENT
fi

if [ "$END_SUMMARY" = "yes" ]
then
	echo -e "Borg Backup Summary:\n\n
Host:       $HOST
Domains:    ${DOMAINS//$'\n'/ }

$(cat $SUMMARY_FILE)\n"
fi

# Remove temp files
rm -f $SUMMARY_FILE $CHECK_RESULTS_FILE

exit $ERRORED

Marc176 commented Feb 24, 2020

@Ryushin how to restore such a backup ?

Ryushin commented Feb 24, 2020

@Ryushin how to restore such a backup ?

Just use the borg tool to list the backups. You can restore with the borg tool itself, or you can use borg to mount the backup archive as a FUSE file system and copy the files you want that way. The beauty of borg is that it's a deduplicating backup, so only the delta changes are saved between backups.

Marc176 commented Feb 24, 2020

@Ryushin how to restore such a backup ?

Just use the borg tool to list the backups. You can restore with the borg tool itself, or you can use borg to mount the backup archive as a FUSE file system and copy the files you want that way. The beauty of borg is that it's a deduplicating backup, so only the delta changes are saved between backups.

I've tested it a few times now. Thanks for that nice script.

I've run it 2-3 times now to back up VMs with 3 disks. I mounted it afterwards but I only had one backup in borg? I thought every backup run would create its own "version"?

brentl99 commented Feb 25, 2020

I recently updated my template. I was never happy with hard coding the temporary snapshot file prefix. Now the prefix of "snaptemp-" is provided for in a shell variable. The key to providing this change is handling globbing so the variable is not expanded by the shell before being passed as a command parameter. This is done by turning off globbing with "set -o noglob" and turning it back on using "set +o noglob". This further allowed me to remove poor/incorrect inline documentation on managing changing the prefix.

Ryushin commented Feb 25, 2020

I've run it 2-3 times now to back up VMs with 3 disks. I mounted it afterwards but I only had one backup in borg? I thought every backup run would create its own "version"?

It does create its own version. Just specify the "::" and the backup name. For example, I have a VM called Sonny that I back up:

borg list /netshares/borg_backup_repos/windwalker/Sonny
Sonny_20200217_1337                  Mon, 2020-02-17 13:37:31 [7aa5397cf62398704293993b3870c12ce4c3c2ce28fb3cef3484ae70cce0736c]
Sonny_20200219_2301                  Wed, 2020-02-19 23:01:16 [dfc50a6c7c4c030d035ad182d83af671bb8e5c7b3b7e2c58915daea762506c7d]
Sonny_20200220_2301                  Thu, 2020-02-20 23:01:07 [99084ae50a5adf168f4287700681f5b3131fa7419d0195553313646d305749ed]
Sonny_20200221_2327                  Fri, 2020-02-21 23:27:05 [8a8134ba7960f5f43957ce744887d22c99bdd06c3b9931e01c1278f160c91a76]
Sonny_20200222_2301                  Sat, 2020-02-22 23:01:21 [ceff36df7a6f37d5b268b514db5fe757ed721f0f82c05ada57891b286f4d8e8d]
Sonny_20200223_2305                  Sun, 2020-02-23 23:05:26 [3fb9f327729948d09b7cde055de6fa41f5e97ff8c05bb2627aad3021da56c5f3]
Sonny_20200224_2303                  Mon, 2020-02-24 23:03:37 [a8803b00a8744a58a2e848dde35567ebc72c8db51cb1e9efd4a3a8611bf45f49]

Now I mount the backup I want:
borg mount /netshares/borg_backup_repos/windwalker/Sonny::Sonny_20200220_2301 /tmp/borg_mount

I can see the backup from that date and you can restore your data:
ls -la /tmp/borg_mount/var/lib/libvirt/images/*
-rw-r----- 1 root         root                    0 Feb 20 23:01 /tmp/borg_mount/var/lib/libvirt/images/borg_touch_mtime.txt
-rw-r----- 1 libvirt-qemu libvirt-qemu 102097158144 Feb 20 23:01 /tmp/borg_mount/var/lib/libvirt/images/Sonny.qcow2

Then unmount it when you are done:
umount /tmp/borg_mount

Marc176 commented Feb 25, 2020

Yes, you made DAILY backups, but if you run a backup several times on the same day, with the script you only have one version!

Ryushin commented Feb 25, 2020

Yes, you made DAILY backups, but if you run a backup several times on the same day, with the script you only have one version!

There does seem to be a problem with multiple backups in the same day with the last backup replacing the previous one. I'll look into it.

Marc176 commented Feb 25, 2020

Thank you very much for your support - maybe also an option to disable "ionice" - if you want to do it fast? Would be nice

Marc176 commented Feb 25, 2020

Also, I am from Germany, and with the de_DE locale the state is called "laufend", not "running". Is there a way to make it generic? 💃
I mean here: if [ "$VMSTATE" = "running" ]

Ryushin commented Feb 26, 2020

Yes, you made DAILY backups, but if you run a backup several times on the same day, with the script you only have one version!

I found the issue with the borg prune. I've updated the script to use a different prune option to keep all archives within a certain number of days. I've also added the BE_NICE option for running nice and ionice or not.
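
For context, `borg prune --keep-daily=N` keeps only the most recent archive from each day, so several runs on one day collapse to a single archive, whereas `--keep-within=14d` keeps every archive created in the last 14 days. A hedged sketch of previewing a prune before applying it; the `prune_preview` function name and the repository path in the comments are placeholders, not part of the script:

```shell
#!/bin/bash
# Hypothetical helper: show what a prune would keep or delete without
# changing the repository. --dry-run and --list are borg prune options.
# Usage: prune_preview REPO [prune options...]
prune_preview() {
    local repo="$1"; shift
    borg prune --dry-run --list "$@" "$repo"
}

# Example calls (placeholder repository path):
#   prune_preview /volume1/borgbackuprepo/myvm --keep-daily=14
#   prune_preview /volume1/borgbackuprepo/myvm --keep-within=14d
```

Comparing the two previews on a repository with several same-day archives should make the difference visible before anything is deleted.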

I'm not sure how to deal with the different language issue. Not sure if there is a way around it except to be aware of it and modify the script appropriately.
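
One possible approach, offered as a sketch rather than a tested fix: virsh translates state names through the normal locale mechanism, so forcing the C locale for that one command should yield the English strings the script compares against. The function name `vm_is_running` is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the domain state is "running".
# LC_ALL=C is applied only to this virsh call, so the rest of the
# script keeps the user's locale.
vm_is_running() {
    local state
    state=$(LC_ALL=C virsh domstate "$1" 2>/dev/null)
    [ "$state" = "running" ]
}

# Usage inside the backup loop (DOMAIN comes from the script):
#   if vm_is_running "$DOMAIN"; then
#       ... create snapshot, back up, blockcommit ...
#   fi
```

The same prefix could be applied to the existing line instead, for example `VMSTATE=$(LC_ALL=C virsh list --all | ...)`, which leaves the rest of the logic unchanged.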

Ryushin commented Feb 26, 2020

I recently updated my template. I was never happy with hard coding the temporary snapshot file prefix. Now the prefix of "snaptemp-" is provided for in a shell variable. The key to providing this change is handling globbing so the variable is not expanded by the shell before being passed as a command parameter. This is done by turning off globbing with "set -o noglob" and turning it back on using "set +o noglob". This further allowed me to remove poor/incorrect inline documentation on managing changing the prefix.

I also removed the hard coding of the snapshot prefix. I purposely caused failures to pivot back, and the next backup detected the failed pivot and sent the alert email. I'm trying to see how the noglob is changing the behavior. Where did you find the shell variable not expanding correctly?

brentl99 commented Feb 26, 2020

When I originally coded my version of the script I found that the pattern match was being expanded by the shell with local matching files before being passed as a regular expression. This may only happen if there are lingering snapshots from a terminated snapshot. To be honest, I cannot recall the exact case; it goes back a while. My use of set -o/+o noglob may be overly cautious. In general, the issue can be shown in the following sample code:

#!/bin/bash
PATTERNMATCH='*'
set -o noglob
echo GLOBBING OFF
echo $PATTERNMATCH
echo
echo GLOBBING ON - default behavior
set +o noglob
echo $PATTERNMATCH

Does this help answer your question?

I also removed the hard coding of the snapshot prefix. I purposely caused failures to pivot back, and the next backup detected the failed pivot and sent the alert email. I'm trying to see how the noglob is changing the behavior. Where did you find the shell variable not expanding correctly?
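
Worth noting alongside the sample: an unquoted word in a simple command such as echo is subject to pathname expansion, but bash does not perform word splitting or pathname expansion on words inside `[[ ... ]]`, so a pattern test like the one used in the scripts should behave the same with or without noglob. A small sketch to check this, using a throwaway directory and made-up file names:

```shell
#!/bin/bash
# Work in a scratch directory that contains a file matching the pattern.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch snaptemp-leftover.qcow2

SNAPPREFIX=snaptemp-
IMAGE=/var/lib/libvirt/images/vm.snaptemp-vm-2020

# Inside [[ ]], the right-hand side is matched against $IMAGE as a pattern;
# it is not expanded to file names in the current directory.
if [[ $IMAGE == *${SNAPPREFIX}* ]]; then
    result="match"
else
    result="no match"
fi
echo "$result"   # prints: match

# By contrast, an unquoted word in a simple command is globbed.
globbed=$(echo *${SNAPPREFIX}*)
echo "$globbed"  # prints: snaptemp-leftover.qcow2

cd / && rm -rf "$workdir"
```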

Marc176 commented Mar 2, 2020

@Ryushin: btw, there is an exception in the logs if you run the backup for the first time for a VM that's not yet in the repo:

AssertionError: cleanup happened in Repository.__del__

Ryushin commented Mar 2, 2020

AssertionError: cleanup happened in Repository

I think the exception is caused by an older version of borg. I've been downloading the binaries from borg's git and just putting them in /usr/bin.
borgbackup/borg#4411

I just deleted a repo and ran the backup and I did not get any error. I think the only error that might occur is if you add a new VM that is not in the repo and that is the day you have chosen for a full repository check. I did not write logic for that occurrence since I thought it would be fairly rare.

Marc176 commented Mar 17, 2020

@Ryushin: One question, what happens if I turn out a VM, and leave it open for over 14 days? Do the daily backups get deleted then?

Ryushin commented Mar 17, 2020

@Ryushin: One question, what happens if I turn out a VM, and leave it open for over 14 days? Do the daily backups get deleted then?

I assume you mean turn off a VM. It backs up every day regardless. However, if the VM has not changed, the backup will only take a few seconds for that VM. One thing to note though: if you delete a VM, the script will not delete its disk volume backups, but it will delete its XML config after the retention period. So if you delete a VM, you will need to remove its backup yourself.

Marc176 commented Mar 17, 2020

@Ryushin: ok thank you.!

Marc176 commented Mar 24, 2020

@Ryushin: what actions are needed if this happens:

"If the process fails part way through the snapshot, copy, or blockcommit,
the VM may be left running on the snapshot file which is Not desirable."

Ryushin commented Mar 24, 2020

@Ryushin: what actions are needed if this happens:

If the process fails part way through the snapshot, copy, or blockcommit,
the VM may be left running on the snapshot file which is Not desirable.

The domain failed to back up, so the process is the same as for any of the other scripts.
First find out which device needs to be committed:
virsh domblklist domainvm
Then block commit the device. I'm using sda as the sample.
virsh blockcommit domainvm sda --active --pivot
Run virsh domblklist domainvm again to verify that it did pivot back. If it did, delete the temporary snapshot image that was used; it will have snaptemp in its name (the default in the script).
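
The manual steps above could be wrapped in a small helper. This is only a sketch under assumptions: the function name `recover_snapshot_disks` is invented, paths must not contain spaces (as in the scripts), and the leftover snapshot file is reported rather than deleted so that the pivot can be verified first:

```shell
#!/bin/bash
# Hypothetical helper: for every disk of domain $1 whose source path contains
# the marker $2 (default "snaptemp"), pivot it back to its base image.
# The leftover snapshot file is only reported, never deleted automatically.
recover_snapshot_disks() {
    local domain="$1" marker="${2:-snaptemp}" list target source
    # Columns of `virsh domblklist --details`: Type Device Target Source
    list=$(LC_ALL=C virsh domblklist "$domain" --details |
           awk '$2 == "disk" { print $3, $4 }') || return 1
    while read -r target source; do
        case "$source" in
            *"$marker"*)
                echo "Pivoting $target of $domain (snapshot: $source)"
                virsh blockcommit "$domain" "$target" --active --pivot < /dev/null || return 1
                echo "Check 'virsh domblklist $domain', then remove: $source"
                ;;
        esac
    done <<EOF
$list
EOF
}

# Usage: recover_snapshot_disks domainvm
```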

Marc176 commented Mar 24, 2020

@Ryushin: Thanks! :)

brentl99 commented Apr 16, 2020

I have posted an update to the script I posted some time ago. There was a bug in the file retention logic (the last loop of the script) that assumed the name of the image matched the VM name. Also added to this script is the option to stream encrypt the images using gpg.

tuomotalvitie commented Apr 21, 2020

Thank you for sharing these, and thank you @brentl99 for your variation. I added backing up the offline (well... not running) instances as well using the logic @panciom used for it, and created a check for skipping the backup if the modified timestamp of all the disks is older than the newest backup. Otherwise the logic is untouched (pivoting, retention, gpg).
It is quite possible that the check for not running should be explicitly for specific states (Shut off - Crashed? ...). Panciom checks for "in" which should cover "in shutdown".
Two disclaimers: 1. I don't speak .sh natively 2. I've been only running the script on dev servers.

#!/bin/bash
#
# [This script is slightly modified version of brentl99's script at
# https://gist.github.com/cabal95/e36c06e716d3328b512b - non-running instances 
# are also backed up, as implemented by panciom - additionally if the instance is
# offline, and the most recent backup is newer than disk modification times, the backup
# is skipped. Note: .xml is not checked for modification.]
#
# This script backs up a list of VMs.
# An overview of the process is as follows:
# * invokes a "snapshot" which transfers VM disk I/O to new "snapshot" image file(s).
# * copy (and encrypt if applicable) the VM's image file(s) to a backup
# * invoke a "blockcommit" which merges (or "pivots") the snapshot image back
#   to the VM's primary image file(s)
# * delete the snapshot image file(s)
# * make a copy of the VM define/XML file
# * delete old images and XMLs, retaining images for x dates where x = IMAGECNT
#
# Note: On CentOS 7 snapshotting requires the "-ev" version of qemu.
#       yum install centos-release-qemu-ev qemu-kvm-ev libvirt
#
# The script uses gzip to compress the source image (e.g. qcow2) on the fly
# to the destination backup image. bzip2 was also tested: bzip2 (and other
# compression utilities) provide better compression (15-20%), but gzip
# is 7-10 times faster.
# 
# The script uses gpg symmetric encryption to encrypt the source image. The 
# encryption password is set using the ENCPASS field, and items must be decrypted
# before they are unzipped. Encrypted files will have an extension of .gz.gpg. By 
# leaving the ENCPASS field blank you can disable this feature.
#
# If the process fails part way through the snapshot, copy, or blockcommit, 
# the VM may be left running on the snapshot file which is Not desirable.
#
# TODO: optionally check the xml for modification as well (and perhaps
# back it up only if it has changes)
# TODO: parametrize Domains for clarity?
# TODO: perhaps check the permissions before starting, or run in test mode
# if no parameters, including sending mail

# define an emergency mail recipient
EMR=servermessages@mydomain.com
# encryption password. if left blank, files are not encrypted
#### gpg logic untouched from brentl99 version ####
#ENCPASS="MyPassworD"
HOST="$(hostname)"
SHCMD="$(basename -- $0)"
BACKUPROOT=/mydata01/qemu_backups
IMAGECNT=3
[ ! -d $BACKUPROOT/logs ] && mkdir -p $BACKUPROOT/logs
DATE="$(date +%Y-%m-%d.%H%M%S)"
LOG="$BACKUPROOT/logs/qemu-backup.$(date +%Y-%m-%d).log"
ERRORED=0
BREAK=false
SNAPPREFIX=snaptemp-

#Uncomment to backup domains myVMa and myVMb
#DOMAINS="myVMa myVMb"

#Alternatively list all VMs and back them all up
#DOMAINS=$(virsh list --all | tail -n +3 | awk '{print $2}')

# Path to XML files for stopped VM...
PATH_LIBVIRT_QEMU="/etc/libvirt/qemu"

# extract the date coding in filename (note: filename format must be YYYY-MM-DD)
dtmatch () { sed -n -e 's/.*\(2[0-1][0-9][0-9]-[0-1][0-9]-[0-3][0-9]\).*/\1/p'; }

echo "$SHCMD: Starting backups on $(date +'%d-%m-%Y %H:%M:%S')"  >> $LOG
for DOMAIN in $DOMAINS; do
	BREAK=false

	echo "---- VM Backup start $DOMAIN ---- $(date +'%d-%m-%Y %H:%M:%S')"  >> $LOG

	VM_RUNNING=1
	VMSTATE=$(virsh list --all | grep [[:space:]]$DOMAIN[[:space:]] | awk '{print $3}')
	if [[ $VMSTATE != "running" ]]; then
		echo "-> VM $DOMAIN not running. No snapshot and blockcommit. Only copy." >> $LOG
    		VM_RUNNING=0
	fi

	BACKUPFOLDER=$BACKUPROOT/$DOMAIN
	[ ! -d $BACKUPFOLDER ] && mkdir -p $BACKUPFOLDER
	TARGETS=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $3}')
	IMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')

	# check to make sure the VM is running on a standard image, not
	# a snapshot that may be from a backup that previously failed
	for IMAGE in $IMAGES; do
		set -o noglob
		if [[ $IMAGE == *${SNAPPREFIX}* ]]; then
			set +o noglob
			ERR="$SHCMD: Error VM $DOMAIN is running on a snapshot disk image: $IMAGE"
			echo $ERR >> $LOG
			echo "$ERR
Host:       $HOST
Disk Image: $IMAGE
Domain:     $DOMAIN
Command:    virsh domblklist $DOMAIN --details" | mail -s "$SHCMD snapshot Exception for $DOMAIN" $EMR
			BREAK=true
			ERRORED=$(($ERRORED+1))
			break
		fi
		set +o noglob
	done
	[ $BREAK == true ] && continue

	if [[ $VM_RUNNING -eq 1 ]]; then
	# gather all the disks being used by the VM so they can be collectively snapshotted
	# TODO: Does this prefix check make sense for the targets (images certainly)
	DISKSPEC=""
	for TARGET in $TARGETS; do
		set -o noglob
		if [[ $TARGET == *${SNAPPREFIX}* ]]; then
			set +o noglob
			ERR="$SHCMD: Error VM $DOMAIN is running on a snapshot disk image: $TARGET"
			echo $ERR >> $LOG
			echo "$ERR
Host:       $HOST
Disk Image: $IMAGE
Domain:     $DOMAIN
Command:    $CMD" | mail -s "$SHCMD snapshot Exception for $DOMAIN" $EMR
			BREAK=true
			break
		fi
		set +o noglob
		DISKSPEC="$DISKSPEC --diskspec $TARGET,snapshot=external"
	done
		[ $BREAK == true ] && continue
		# transfer the VM to snapshot disk image(s)
			CMD="virsh snapshot-create-as --domain $DOMAIN --name ${SNAPPREFIX}$DOMAIN-$DATE --no-metadata --atomic --disk-only $DISKSPEC >> $LOG 2>&1"
			echo "Command: $CMD" >> $LOG 2>&1
			eval "$CMD"
			if [ $? -ne 0 ]; then
				ERR="Failed to create snapshot for $DOMAIN"
				echo $ERR >> $LOG
				echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD snapshot Exception for $DOMAIN" $EMR
				ERRORED=$(($ERRORED+1))
				continue
			fi
	else
		unset -v LATEST
		for FILE in "$BACKUPFOLDER"/$(basename -- $IMAGE)-*.gz*; do
			if [[ ! -e "$FILE" ]]; then continue; fi
			[[ $FILE -nt $LATEST ]] && LATEST=$FILE
		done
		BACKUP=false
		for IMAGE in $IMAGES; do
			if [[ $IMAGE -nt $LATEST ]]; then
				BACKUP=true 
				break
			fi
		done
		if [ $BACKUP == false ]; then
			echo "-> Images of VM $DOMAIN older than the last backup (based on Modified)." >> $LOG
			echo "-> NOTE: .xml NOT checked" >> $LOG
			continue;
		fi
	fi

	# copy/back/compress the VM's disk image(s)
	for IMAGE in $IMAGES; do
		echo "Copying $IMAGE to $BACKUPFOLDER" >> $LOG
		ZFILE="$BACKUPFOLDER/$(basename -- $IMAGE)-$DATE.gz"
		# determine whether the gzip is to be encrypted or not
		if [ -z "${ENCPASS}" ]; then 
			CMD="gzip < $IMAGE > $ZFILE 2>> $LOG"
		else
			exec {pwout}> /tmp/agspw.$$
			exec {pwin}< /tmp/agspw.$$
			rm /tmp/agspw.$$
			echo $ENCPASS >&$pwout
			ZFILE="$ZFILE.gpg"
			CMD="gzip < $IMAGE --to-stdout | gpg --batch --yes -o $ZFILE --passphrase-fd $pwin -c >> $LOG"
		fi
		echo "Command: $CMD" >> $LOG
		SECS=$(printf "%.0f" $(/usr/bin/time -f %e sh -c "$CMD" 2>&1))
		printf '%s%dh:%dm:%ds\n' "Duration: " $(($SECS/3600)) $(($SECS%3600/60)) $(($SECS%60)) >> $LOG
		# clear fds if necessary
		if [ -n "${ENCPASS}" ]; then 
			exec {pwout}>&-
			exec {pwin}<&-
			unset pwout pwin
		fi
		BYTES=$(stat -c %s $IMAGE)
		printf "%s%'d\n" "Source MB: " $(($BYTES/1024/1024)) >> $LOG
		# Commented out below, as seed image operation takes 0s
		# printf "%s%'d\n" "kB/Second: " $(($BYTES/$SECS/1024)) >> $LOG
		ZBYTES=$(stat -c %s $ZFILE)
		printf "%s%'d\n" "Destination MB: " $(($ZBYTES/1024/1024)) >> $LOG
		printf "%s%d%s\n" "Compression: " $((($BYTES-$ZBYTES)*100/$BYTES)) "%" >> $LOG
	done

	if [[ $VM_RUNNING -eq 1 ]]; then
		# Update the VM's disk image(s) with any changes recorded in the snapshot
		# while the copy process was running.  In qemu lingo this is called a "pivot"
			BACKUPIMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')
			for TARGET in $TARGETS; do
				CMD="virsh blockcommit $DOMAIN $TARGET --active --pivot >> $LOG 2>&1"
				echo "Command: $CMD" >> $LOG
				eval "$CMD"

				if [ $? -ne 0 ]; then
					ERR="Could not merge changes for disk $TARGET of $DOMAIN. VM may be in an invalid state."
					echo $ERR >> $LOG
					echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD blockcommit Exception for $DOMAIN" $EMR
					BREAK=true
					ERRORED=$(($ERRORED+1))
					break
				fi
			done
		[ $BREAK == true ] && continue

		# Now that the VM's disk image(s) have been successfully committed/pivoted
		# back to the main disk image, remove the temporary snapshot image file(s)
			for BACKUP in $BACKUPIMAGES; do
				set -o noglob
				if [[ $BACKUP == *${SNAPPREFIX}* ]]; then
					set +o noglob
					CMD="rm -f $BACKUP >> $LOG 2>&1"
					echo " Deleting temporary image $BACKUP" >> $LOG
					echo "Command: $CMD" >> $LOG
					eval "$CMD"
				fi
				set +o noglob
			done

		# capture the VM's definition in use at the time the backup was done
			CMD="virsh dumpxml $DOMAIN > $BACKUPFOLDER/$DOMAIN-$DATE.xml 2>> $LOG"
			echo "Command: $CMD" >> $LOG
			eval "$CMD"
	else
		# copy the VM's definition
		CMD="cp $PATH_LIBVIRT_QEMU/$DOMAIN.xml $BACKUPFOLDER/$DOMAIN-$DATE.xml 2>> $LOG"
		echo "Command: $CMD" >> $LOG
		eval "$CMD"
	fi

	# Tracks whether xmls have been cleared
	DDEL='no'
	# check image retention count
	for IMAGE in $IMAGES; do
		COUNT=`find $BACKUPFOLDER -type f -name $(basename -- $IMAGE)-'*'.gz'*' -print | dtmatch | sort -u | wc -l`
		if [ $COUNT -gt $IMAGECNT  ]; then
			echo "$SHCMD: Count for BACKUPFOLDER ($BACKUPFOLDER) for image ($(basename -- $IMAGE)) too high ($COUNT), deleting historical files over $IMAGECNT..."
			LIST=`find $BACKUPFOLDER -type f -name $(basename -- $IMAGE)-'*'.gz'*' -print | dtmatch | sort -ur | sed -e "1,$IMAGECNT"d`
			# make sure LIST has a value otherwise fgrep will allow the entire find
			# result to be passed to xarg rm
			if [ -n "$LIST" ]; then
				# Delete the specific images in the dates list
				find $BACKUPFOLDER -type f -name $(basename -- $IMAGE)-'*' | fgrep "$LIST" | xargs rm
				# Only delete old xmls once
				if [[ $DDEL == 'no' ]]; then
					# Delete the xmls in the dates list
					find $BACKUPFOLDER -type f -name $DOMAIN-'*' | fgrep "$LIST" | xargs rm
					DDEL='yes'
				fi
			fi
		fi
	done

	echo "---- Backup done $DOMAIN ---- $(date +'%d-%m-%Y %H:%M:%S') ----" >> $LOG
done
echo "$SHCMD: Finished backups at $(date +'%d-%m-%Y %H:%M:%S')
====================" >> $LOG

exit $ERRORED
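
The retention step near the end of the script above can be sketched in a simpler form. This is only an illustration, not the script's own logic: the function name `prune_backups` is hypothetical, it assumes file names of the form NAME-YYYY-MM-DD.HHMMSS.gz (optionally with a .gpg suffix) without whitespace, and it counts individual files rather than distinct dates as the `dtmatch` pipeline does. It relies on GNU find, tail and xargs.

```shell
#!/bin/bash
# Hypothetical helper: keep the newest KEEP backups of one image, delete the rest.
# Because the timestamp is embedded as YYYY-MM-DD.HHMMSS, a reverse
# lexicographic sort puts the newest file first.
prune_backups() {
    local dir=$1 name=$2 keep=$3
    find "$dir" -maxdepth 1 -type f -name "$name-*.gz*" -print \
        | sort -r \
        | tail -n +"$((keep + 1))" \
        | xargs -r rm -f --
}
```

A call such as `prune_backups "$BACKUPFOLDER" "$(basename -- "$IMAGE")" "$IMAGECNT"` would then replace the find/fgrep/xargs chain for the image files; the matching .xml files would still need their own cleanup.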

@lightninjay

lightninjay commented Apr 23, 2020

@tuomotalvitie
A small portion of your script has a spelling error. "SNAPPEFIX" should be "SNAPPREFIX"

  # Now that the VM's disk image(s) have been successfully committed/pivoted to
  # back to the main disk image, remove the temporary snapshot image file(s)
  	for BACKUP in $BACKUPIMAGES; do
  		set -o noglob
  		if [[ $BACKUP == *${SNAPPEFIX}* ]]; then
  			set +o noglob
  			CMD="rm -f $BACKUP >> $LOG 2>&1"
  			echo " Deleting temporary image $BACKUP" >> $LOG
  			echo "Command: $CMD" >> $LOG
  			eval "$CMD"
  		fi
  		set +o noglob
  	done

I took some time modifying the script for my own use, since I run libvirt through the qemu:///system hypervisor rather than the qemu:///session one. I had to change the virsh commands to include "-c qemu:///system" to direct calls to that hypervisor.
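
Instead of editing every virsh call, the connection URI could be set in one place. The following is a rough sketch under assumptions: the wrapper name `vsh` and the variable `VIRSH_URI` are made up for illustration, and the URI is the lowercase form of the one mentioned above. `LIBVIRT_DEFAULT_URI` is the environment variable libvirt consults when no `-c` option is given.

```shell
#!/bin/bash
# Hypothetical wrapper: route every virsh call through one connection URI.
VIRSH_URI="${VIRSH_URI:-qemu:///system}"

vsh() {
    virsh -c "$VIRSH_URI" "$@"
}

# Alternative without a wrapper: plain "virsh ..." calls pick this up.
export LIBVIRT_DEFAULT_URI="$VIRSH_URI"
```

With the wrapper, a line such as `virsh domblklist $DOMAIN --details` would become `vsh domblklist "$DOMAIN" --details`; with the exported variable the script's virsh lines can stay as they are.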

@tuomotalvitie

tuomotalvitie commented Apr 24, 2020

@lightninjay
Thank you, I updated the script to fix the spelling error.

( Ping @brentl99 )

@reinistihovs

reinistihovs commented Jun 25, 2020

Hi!

I ran into an interesting problem:
I wasn't able to blockcommit the backup image into the base image.
I got this error: failed to pivot job for disk vda error: block copy still active: disk 'vda' not ready for pivot yet
I then found some similar cases online and learned that it is probably a libvirt bug that is fixed in a newer release. I wasn't able to update libvirt, because this host had more live machines on it that couldn't be stopped.

I tried to restart the blockcommit many times, but without success; I got the same error every time.
In the end I solved the problem by turning off the VM and doing the blockcommit offline with qemu-img commit /path/to/uncommited/external/image/hostname.backup
I had to use qemu-img because virsh blockcommit doesn't work while the VM is offline :(
Then I edited the VM configuration with virsh edit VMNAME and changed the storage from the .backup image to the original .qcow2 image.

Here are some handy commands to solve a similar case:

By running this command, you will see all the VM disks, and where their data is stored physically.

virsh domblklist VMNAME --details

Output:

Type      Device     Target     Source
------------------------------------------------
file       disk       vda        /./.VMNAME.backup

If you see (VMNAME.backup), the data is still stored on the external image.

By running this command you can see the progress of the blockcommit (in my case it was stuck at 100%). You have to run it for each disk.
virsh blockjob VMNAME vda --info

To manually start a blockcommit:
virsh blockcommit VMNAME vda --active --pivot

To abort the blockcommit:
virsh blockjob VMNAME vda --abort

Here is a nice article that helped me:
https://kashyapc.fedorapeople.org/virt/lc-2012/snapshots-handout.html
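
The progress check above can be wrapped in a small polling loop. This is a sketch under assumptions, not a fix for the libvirt bug described in this comment (there the job sat at 100% and still refused to pivot): the function names are hypothetical, and the parser assumes `virsh blockjob --info` prints a line such as "Active Block Commit: [100 %]", which may differ between virsh versions.

```shell
#!/bin/bash
# Extract the integer percentage from "virsh blockjob --info" style text on stdin.
# Assumed input format: "... [100 %]" or "... [ 45 %]".
blockjob_percent() {
    sed -nE 's/.*\[ *([0-9]+)(\.[0-9]+)? *%\].*/\1/p' | head -n 1
}

# Poll until the job reports 100 %; return 1 after TRIES unsuccessful polls.
wait_for_blockjob() {
    local domain=$1 target=$2 tries=${3:-10} delay=${4:-5} pct n
    for (( n = 1; n <= tries; n++ )); do
        pct=$(virsh blockjob "$domain" "$target" --info 2>&1 | blockjob_percent)
        [ "${pct:-0}" -ge 100 ] && return 0
        sleep "$delay"
    done
    return 1
}
```

A possible use would be `wait_for_blockjob VMNAME vda && virsh blockjob VMNAME vda --pivot`, run once per disk.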

@Marc176

Marc176 commented Jun 25, 2020

Interesting - which version of libvirt had this issue? Which OS?

@reinistihovs

reinistihovs commented Jun 25, 2020

Hi!

Ubuntu 16.04.2 LTS
libvirt version: 1.3.1

The primary image is 950 GB in size.

@tmuhr85111

tmuhr85111 commented Jul 14, 2020

Hey folks,
thanks for this discussion! I did a bit of fine-tuning on @tuomotalvitie's script, and all the input I got from here certainly got me going. I like the concept this is following, and together with borg/borgmatic it now makes a pretty high-performing backup solution for our KVM hypervisors:

Among other things

echo "$SHCMD: Count for BACKUPFOLDER ($BACKUPFOLDER) for image ($(basename -- $IMAGE)) too high ($COUNT), deleting historical files over $IMAGECNT..."

was not going to the logfile, but instead triggered a cron mail when run automatically, which really is only a very small flaw...

Working on CentOS, I also spent some time figuring out how to optimize email for spam-proof transmission of logfiles (using msmtp for this), make the subject lines a bit more readable and informative, and send the mail body in monospaced HTML for better readability.

Currently I'm running this in a test environment for the next couple of weeks and will post the findings / modifications if they should prove to be working stably over time.

Meanwhile, thanks for the effort you spent on this!

@Ryushin

Ryushin commented Jul 14, 2020

Not sure if you saw above, but I spent a couple of weeks making a version that uses borg:
https://gist.github.com/cabal95/e36c06e716d3328b512b#gistcomment-3181333

@tmuhr85111

tmuhr85111 commented Jul 14, 2020

@Ryushin:
thanks for the hint! Yes, I actually used your work, too, which made a great reference. Sorry for not having mentioned you and @brentl99 as well as all the other contributors in my previous post.

@Marc176

Marc176 commented Jul 18, 2020

@Ryushin: I have also been using your script in production for a few months now and it works very well (CentOS 8 with libvirt).

I have one question: for performance reasons I have not used compression for the backups so far (to keep the backup as fast as possible), but since we keep one week of backups for every VM, they take up a lot of space.
Can I enable compression now, after the fact? Or do I have to create a new borg repo?

@Ryushin

Ryushin commented Jul 19, 2020

@Ryushin: I have also been using your script in production for a few months now and it works very well (CentOS 8 with libvirt).

I have one question: for performance reasons I have not used compression for the backups so far (to keep the backup as fast as possible), but since we keep one week of backups for every VM, they take up a lot of space.
Can I enable compression now, after the fact? Or do I have to create a new borg repo?

Good to hear my script is working well for someone else. I spent more time on it than I thought I would, but I have it running in four other installations and it's working well.

I would google the Borg compression question. I would think you could enable compression though. From what I remember though, borg is single threaded. So the compression by borg might greatly add to your time to backup a job. I highly recommend using a file system that supports compression such as ZFS. That is what I do for all my backups. Let ZFS handle the compression. You can change the type and level of compression in ZFS. Plus you know your data won't be affected by bit rot. All my standalone servers run ZFS. The VMs will use ext4 for Linux VMs and NTFS for Windows and since they reside on top of ZFS, all is good.

@brentl99

brentl99 commented Jul 19, 2020

Hello. You may be able to retrofit my compression method https://gist.github.com/cabal95/e36c06e716d3328b512b#gistcomment-3132817 back into the borg backup script. Although borg's differential backup strategy may not be effective when it doesn’t manage compression.

I used streamed compression specifically so that it would not impact backup duration. Most multiple cpu and/or multicore systems will be io bound during the backup not cpu bound and therefore able to compress the backup without affecting backup duration. Writing compressed data means you are writing fewer bytes to disk which may make the backup faster. Also, if you are later transferring this data across a network it is precompressed saving possible further compression cycles or network bandwidth.

Using a ZFS target, as suggested above, is a good solution assuming that your backups do not then need to be transferred across a network to another storage device.
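
As a rough illustration of the streamed compression described above, the image can be read and compressed in a single pass, so only compressed bytes are written to the backup target. The function names here are hypothetical; the linked script builds an equivalent gzip command line itself.

```shell
#!/bin/bash
# Compress SRC to DEST as a stream: gzip reads the image and writes the
# compressed bytes directly, so no uncompressed copy is written first.
stream_compress() {
    local src=$1 dest=$2
    gzip -c < "$src" > "$dest"
}

# Restore by streaming in the other direction.
stream_restore() {
    local src=$1 dest=$2
    gzip -dc < "$src" > "$dest"
}
```

Whether this actually shortens a backup depends on whether the host is I/O-bound or CPU-bound during the copy, as noted above.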

@flexoid

flexoid commented Jul 20, 2020

@Ryushin
Do you think it's worth publishing your script as a separate gist?
I would be glad to fork it to enhance it a bit (create a more robust snapshot with support of QEMU guest agent if possible).

@Ryushin

Ryushin commented Jul 20, 2020

@Ryushin
Do you think it's worth publishing your script as a separate gist?
I would be glad to fork it to enhance it a bit (create a more robust snapshot with support of QEMU guest agent if possible).

I have been thinking of doing that for a long time. There are links on the Internet that point here, though, even though this thread is getting very long and detailed. Almost all the scripts are derivatives of the original. Let me think on it for a couple of days. It would also be nice to hear from cabal95 whether he thinks these separate scripts should be in their own GitHub/gist site. I also don't want to take away from his work.

@brentl99

brentl99 commented Jul 20, 2020

By all means, my git knowledge and skills are modest. It has got a lot more mileage than I ever expected. I was lazy in not forking it. Certainly happy to see all the interest. :)

@brentl99

brentl99 commented Jul 20, 2020

If it is forked, it would be good to add a comment here that points to the fork, as this thread comes up readily via Google search, as Ryushin has alluded to.

@cabal95
Author

cabal95 commented Jul 20, 2020

I am fine either way. To be honest, I've been amazed at how much work people have done on what I originally posted. I am no longer using the script myself, as I am now using a NAS that has all those VM + backup features built in. I'm okay with either choice; if you want to fork and create a new gist, I can edit the original gist and add a link to the new one so that people don't have to dig down into the comments to find the latest info.

I also know QEMU, virsh etc. have come a long way since I originally wrote the script so it might make sense to "start clean" in that sense too.

@flexoid

flexoid commented Jul 20, 2020

Yep, sorry for missing some credits, it is a really great example of collaboration.

I spent quite a lot of time trying to find a simple, ready-to-use solution, as it's only for my home NAS and my job and professional skills are in a somewhat different area. It looks like there's not much on Google except for this gist and thread.

What would be great is to create a repo with a set of scripts based on this approach, that can be more extendable and configurable. For example, VM preparing (snapshot creation etc.) and actual backup (cp, rsync, borg, whatever...) can be implemented as separate modules, with the ability to add alternative ones. Having it as a repo would also mean that we can go forward with proper contribution, which is impossible with gist.

I know how open source works. Unfortunately, my bash skills are weak so most likely I won't be that person. Posting it just in case anyone will think it's a good idea to spend some time on it.

@Marc176

Marc176 commented Sep 8, 2020

@Ryushin: noticed that with the options:

# How often to perform full check on borg repositories.
# Day of the week to perform the full check. Use full weekday name (date %A).
CHECK_DOW="Friday"
# Which week in the month to perform the full check. Put any number(s) 1-5.
CHECK_WEEKS="12345"

a check will never be performed... any idea why?

@Marc176

Marc176 commented Sep 8, 2020

Ok... Weekday name must match language ;)
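
One way to make such a weekday comparison independent of the system language is to force the C locale for the date call, so that %A always yields English names. This is only a sketch: the function name `is_check_day` is hypothetical, it is not taken from the borg script, and it relies on GNU date's -d option.

```shell
#!/bin/bash
CHECK_DOW="Friday"

# Succeeds when the given date (default: today) falls on CHECK_DOW.
# LC_ALL=C makes "date +%A" print English weekday names on any system.
is_check_day() {
    [ "$(LC_ALL=C date -d "${1:-today}" +%A)" = "$CHECK_DOW" ]
}
```

With this, `CHECK_DOW` can stay in English regardless of the locale the script runs under, for example from cron.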

@tuomotalvitie

tuomotalvitie commented Sep 9, 2020

On a related note,
SECS=$(printf "%.0f" $(/usr/bin/time -f %e sh -c "$CMD" 2>&1))
ended up creating a bit of manual work on a machine due to language differences:
printf: 891.16: invalid number
which will result in a division by 0 error.
There's probably a smart way to do this, but as a quick fix I'm going with
LC_NUMERIC="en_US.UTF-8" SECS=$(printf "%.0f" $(/usr/bin/time -f %e sh -c "$CMD" 2>&1))
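
Another option that sidesteps decimal parsing altogether is bash's built-in SECONDS counter, which only deals in whole seconds. This is a hedged sketch rather than a drop-in patch: CMD and BYTES are placeholders standing in for the script's real values, and the guard against a 0-second run is an addition for the throughput line.

```shell
#!/bin/bash
CMD="sleep 1"        # placeholder for the real gzip/gpg pipeline
BYTES=1048576        # placeholder for the source image size in bytes

START=$SECONDS       # bash counts whole seconds since the shell started
sh -c "$CMD"
SECS=$((SECONDS - START))

printf 'Duration: %dh:%dm:%ds\n' $((SECS/3600)) $((SECS%3600/60)) $((SECS%60))
# Guard the throughput calculation against a 0-second run.
printf 'kB/Second: %d\n' $((BYTES / (SECS > 0 ? SECS : 1) / 1024))
```

Since no floating-point value is ever passed to printf, the locale's decimal separator no longer matters.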

@norik83

norik83 commented Dec 8, 2020
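
The script that follows writes most messages twice, once to $LOG and once to stdout. A small helper could do both in one call. This is a sketch with a hypothetical name (`log_msg`) and a placeholder default log path; it is not part of the posted script.

```shell
#!/bin/bash
# Placeholder default; the script below sets LOG from BACKUPROOT.
LOG="${LOG:-/tmp/qemu-backup.log}"

# Hypothetical helper: print a message to stdout and append it to $LOG.
log_msg() {
    printf '%s\n' "$*" | tee -a "$LOG"
}
```

Each pair of lines like `echo "$MSG" >> $LOG` and `echo "$MSG"` could then be written as `log_msg "$MSG"`.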

#!/bin/bash

#################### config variables########################

#Uncomment to backup domains myVMa and myVMb
DOMAINS="myVMa myVMb"

#Alternatively list all VMs and back them all up
#DOMAINS=$(virsh list --all | tail -n +3 | awk '{print $2}')

BACKUPROOT=/mnt/remote/backup_vms/$(hostname)

LOG="$BACKUPROOT/logs/qemu-backup.$(date +%Y-%m-%d).log"
SNAPPREFIX=snaptemp-

# Path to XML files for stopped VM...
PATH_LIBVIRT_QEMU="/etc/libvirt/qemu"

#pause before blockcommit
pause_domain_bc=false
#pause after blockcommit fail. if pause_domain_bc is set to true, this option is skipped
pause_domain_after_bc_fail=true
#on fail blockcommit try blockjob
enable_bj_attemts=true
blockjob_retrycount=1
blockjob_delay="30s"
pause_while_bj_attemtps=true
#if blockjob attempts fail, try pausing the domain? (if pause_while_bj_attemtps is set to true, the script ignores this option)
pause_domain_after_bj_fail=true

EMR="email@example.com"
#****************************
#you can create a conf file with variables that override those above
#for example, if your script is named backup_vms.sh, create a file named backup_vms.conf in the same directory as the script
#if you want a different name or path, change the value of the confFile variable below (outside the config variables block)
#this is useful when working with many hypervisors and making changes to the script
#****************************

#################### end of config variables#################

SHCMD="$(basename -- $0)"
SHCMD_path=$(dirname $(readlink -f $0))
confFile="$SHCMD_path/${SHCMD%.*}.conf"
if [ -f "$confFile" ]; then
    source "$confFile"
    echo "overwrite vars from config file"
else
    echo "no config file found; not needed if the variables at the top of the script are set"
fi

[ ! -d "$BACKUPROOT/logs" ] && mkdir -p "$BACKUPROOT/logs"
DATE="$(date +%Y-%m-%d.%H%M%S)"

ERRORED=0
COMMITSNAPTAB[0]=false
BREAK=false






# extract the date coding in FILEFULLNAME (note: filename format must be YYYY-MM-DD)
dtmatch () { sed -n -e 's/.*\(2[0-1][0-9][0-9]-[0-1][0-9]-[0-3][0-9]\).*/\1/p'; }
#check that no previous instance of this script is still running; this matters for the blockcommit step if an earlier run was interrupted for some reason
if pidof -x "$SHCMD" -o $$ >/dev/null;then
	ERR="Another instance of this script is already running; please stop all running instances of it before starting a new one.
	If you are sure you want to stop the previous instance, try: kill -9 \$(pidof -x \"$SHCMD\")"
	echo "$ERR" >> $LOG
	echo "$ERR"
	exit 1
fi
echo "$SHCMD: Starting backups on $(date +'%d-%m-%Y %H:%M:%S')"  >> $LOG
for DOMAIN in $DOMAINS; do
	BREAK=false
	#check if domain exists
	VMSRC=$(virsh list --all | grep "[[:space:]]$DOMAIN[[:space:]]" | awk '{print $3}')
	VMSRC=${#VMSRC}
	if [[ $VMSRC -eq 0 ]]; then
		ERR="Domain $DOMAIN does not exist on this hypervisor"
		echo "$ERR" >> $LOG
		echo "$ERR"	
		continue
	fi
	
	echo "---- VM Backup start $DOMAIN ---- $(date +'%d-%m-%Y %H:%M:%S')"  >> $LOG

	VM_RUNNING=1
	
	VMSTATE=$(LC_ALL=C virsh list --all | grep "[[:space:]]$DOMAIN[[:space:]]" | awk '{print $3}')
	echo "$VMSTATE"
	if [[ $VMSTATE == "running" || $VMSTATE == "paused" ]]; then
		MSG="-> VM $DOMAIN running or paused."
		echo "$MSG" >> $LOG
		echo "$MSG"		
	else

		MSG="-> VM $DOMAIN not running. No snapshot and blockcommit. Only copy."
		echo "$MSG" >> $LOG
		echo "$MSG"
    	VM_RUNNING=0
	fi

	BACKUPFOLDER=$BACKUPROOT/$DOMAIN
	[ ! -d $BACKUPFOLDER ] && mkdir -p $BACKUPFOLDER
	TARGETS=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $3}')
	IMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')

	# check to make sure the VM is running on a standard image, not
	# a snapshot that may be from a backup that previously failed
	unset COMMITSNAPTAB[*]
	i=0
	COMMITSNAP=false
	for IMAGE in $IMAGES; do
		set -o noglob
		if [[ $IMAGE == *${SNAPPREFIX}* ]]; then
			set +o noglob
			if [[ $VM_RUNNING -eq 0 ]]; then
				ERR="$SHCMD: Error VM $DOMAIN is not running but is on a snapshot disk image: $IMAGE"
				echo $ERR >> $LOG
				echo "$ERR"
				echo "$ERR
Host:       $HOST
Disk Image: $IMAGE
Domain:     $DOMAIN
Command:    virsh domblklist $DOMAIN --details" | mail -s "$SHCMD snapshot Exception for $DOMAIN" $EMR
				BREAK=true
				ERRORED=$(($ERRORED+1))
				break
			else
				COMMITSNAPTAB[$i]=true
				COMMITSNAP=true
				MSG="VM $DOMAIN is running but is on snapshot image $IMAGE, so marking it for blockcommit first"
				echo "$MSG" >> $LOG
				echo "$MSG"
			fi
		else
			# even if not running on a snapshot, check for a leftover temporary file; if present, it must be deleted before a new snapshot is started
			FILEPATH="${IMAGE%/*}"
			FILEFULLNAME=`basename "$IMAGE"`
			FILENAME="${FILEFULLNAME%.*}"
			#FILEEXT="${FILEFULLNAME#*.}
			TMPFILE="$FILEPATH/$FILENAME.$SNAPPREFIX$DOMAIN"
			if [ -f "$TMPFILE" ]; then
				CMD="rm -f $TMPFILE >> $LOG 2>&1"
				MSG=" Deleting temporary image $TMPFILE after making snapshot"
				echo "$MSG" >>$LOG
				echo "$MSG"
				echo "Command: $CMD" >> $LOG
				echo "Command: $CMD"
				eval "$CMD"
			fi
		fi
		set +o noglob
		i=$((i+1))
	done
	[ $BREAK == true ] && continue

	if [[ $VM_RUNNING -eq 1 ]]; then
		#if the VM is running on a snapshot, it must be merged first
		#i=0
		if [[ $COMMITSNAP == true ]]; then
			i=0
			for TARGET in $TARGETS; do
				if [[ ${COMMITSNAPTAB[$i]} == true ]]; then
					if [[ $VMSTATE == "paused" ]]; then
						pause_enabled=true
					else
						pause_enabled=false
					fi
					if [[ $pause_domain_bc == true && $pause_enabled == false ]]; then
						MSG="Try to pause domain $DOMAIN first"
						echo "$MSG" >> $LOG
						echo "$MSG"
						CMDst="virsh suspend $DOMAIN"
						echo "Command: $CMDst" >> $LOG
						echo "Command: $CMDst"
						eval "$CMDst"
						if [ $? -ne 0 ]; then
							ERR="can't pause domain $DOMAIN, continue anyway"
							echo "$ERR" >> $LOG
							echo "$ERR"
						else
							pause_enabled=true
						fi
					fi
					MSG="$TARGET BLOCKCOMMIT previous snapshot "
					echo "$MSG" >> $LOG
					echo "$MSG"
					CMD="virsh blockcommit $DOMAIN $TARGET --active --pivot >> $LOG 2>&1"
					echo "Command: $CMD" >> $LOG
					echo "Command: $CMD"
					eval "$CMD"
					if [ $? -ne 0 ]; then
						ERR="Could not merge changes for disk $TARGET of $DOMAIN with blockcommit. VM may be in an invalid state."
						echo "$ERR" >> $LOG
						echo "$ERR"
						echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD blockcommit Exception for $DOMAIN" $EMR
						if [[ $pause_domain_after_bc_fail == true && $pause_enabled == false ]]; then
							MSG="Try to pause domain $DOMAIN and retry bc"
							echo "$MSG" >> $LOG
							echo "$MSG"
							CMDst="virsh suspend $DOMAIN"
							echo "Command: $CMDst" >> $LOG
							echo "Command: $CMDst"
							eval "$CMDst"
							if [ $? -ne 0 ]; then
								ERR="can't pause domain $DOMAIN, continue anyway"
								echo "$ERR" >> $LOG
								echo "$ERR"
							else
								pause_enabled=true
							fi
							MSG="$TARGET BLOCKCOMMIT previous snapshot "
							echo "$MSG" >> $LOG
							echo "$MSG"
							CMD="virsh blockcommit $DOMAIN $TARGET --active --pivot >> $LOG 2>&1"
							echo "Command: $CMD" >> $LOG
							echo "Command: $CMD"
							eval "$CMD"
							if [ $? -ne 0 ]; then
								ERR="Could not merge changes for disk $TARGET of $DOMAIN (paused) with blockcommit. VM may be in an invalid state."
								echo "$ERR" >> $LOG
								echo "$ERR"
								echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD blockcommit Exception for $DOMAIN" $EMR
							else
								#if bc succes need to disable bj, so set to false
								enable_bj_attemts=false
							fi
						fi	
						if [ $enable_bj_attemts == true ]; then
							if [[ $pause_while_bj_attemtps == true ]]; then
								MSG="Try to pause domain $DOMAIN first"
								echo "$MSG" >> $LOG
								echo "$MSG"
								CMDst="virsh suspend $DOMAIN"
								echo "Command: $CMDst" >> $LOG
								echo "Command: $CMDst"
								eval "$CMDst"
								if [ $? -ne 0 ]; then
									ERR="can't pause domain $DOMAIN, continue anyway"
									echo "$ERR" >> $LOG
									echo "$ERR"
								else
									pause_enabled=true
								fi
							else
								if [[ $pause_enabled == true && $VMSTATE == "running" ]]; then
									MSG="Resume domain $DOMAIN"
									echo "$MSG" >> $LOG
									echo "$MSG"
									CMDst="virsh resume $DOMAIN"
									echo "Command: $CMDst" >> $LOG
									echo "Command: $CMDst"
									eval "$CMDst"
									if [ $? -ne 0 ]; then
										ERR="can't resume domain $DOMAIN, continue anyway"
										echo "$ERR" >> $LOG
										echo "$ERR"
									else
										pause_enabled=false
									fi
								fi
							fi
						
							for (( j=1; j<=$blockjob_retrycount; j++ )) do
								CMD="virsh blockjob --domain $DOMAIN --pivot $TARGET"
								echo "Command: $CMD" >> $LOG
								echo "Command: $CMD"
								eval "$CMD"
								if [ $? -ne 0 ]; then
									ERR="virsh blockjob attempt no $j --domain $DOMAIN --pivot $TARGET failed"
									echo "$ERR" >> $LOG
									echo "$ERR"
									if [[ $j -eq $blockjob_retrycount ]]; then
										ERR="all blockjob attempts --domain $DOMAIN --pivot $TARGET failed."
										echo "$ERR" >> $LOG
										echo "$ERR"
										echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD blockjob  Exception for $DOMAIN" $EMR
										if [[ $pause_domain_after_bj_fail == true && $pause_enabled == false && $VMSTATE == "running" ]]; then
											MSG="Try to pause domain $DOMAIN first"
											echo "$MSG" >> $LOG
											echo "$MSG"
											CMDst="virsh suspend $DOMAIN"
											echo "Command: $CMDst" >> $LOG
											echo "Command: $CMDst"
											eval "$CMDst"
											if [ $? -ne 0 ]; then
												ERR="can't pause domain $DOMAIN"
												echo "$ERR" >> $LOG
												echo "$ERR"
											else
												echo "Command: $CMD" >> $LOG
												echo "Command: $CMD"
												eval "$CMD"
													if [ $? -ne 0 ]; then
														ERR="can't blockjob --pivot $TARGET anyway, I'm giving up"
														echo "$ERR" >> $LOG
														echo "$ERR"
														echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD blockjob Exception for $DOMAIN" $EMR
														echo "$ERR" >> $LOG
														echo "$ERR"
														BREAK=true
														ERRORED=$(($ERRORED+1))
													fi
												MSG="Resume domain $DOMAIN"
												echo "$MSG" >> $LOG
												echo "$MSG"
												CMDst="virsh resume $DOMAIN"
												echo "Command: $CMDst" >> $LOG
												echo "Command: $CMDst"
												eval "$CMDst"
												if [ $? -ne 0 ]; then
													ERR="can't resume domain $DOMAIN, continue anyway"
													echo "$ERR" >> $LOG
													echo "$ERR"
												else
													pause_enabled=false
												fi
											fi
										else
											BREAK=true
											ERRORED=$(($ERRORED+1))
											break
										fi
									fi
									sleep $blockjob_delay
								else
									MSG="virsh blockjob attempt no $j --domain $DOMAIN --pivot $TARGET success"
									echo "$MSG" >> $LOG
									echo "$MSG"
									break
								fi
							done
							[ $BREAK == true ] && break
						fi
					fi
					if [[ $pause_enabled == true && $VMSTATE == "running" ]]; then
						MSG="Resume domain $DOMAIN"
						echo "$MSG" >> $LOG
						echo "$MSG"
						CMDst="virsh resume $DOMAIN"
						echo "Command: $CMDst" >> $LOG
						echo "Command: $CMDst"
						eval "$CMDst"
						if [ $? -ne 0 ]; then
							ERR="can't resume domain $DOMAIN, continue anyway"
							echo "$ERR" >> $LOG
							echo "$ERR"
						else
							pause_enabled=false
						fi
					fi
				fi
				i=$((i+1))
			done
			[ $BREAK == true ] && continue
			#delete tmp images after blockcommit
			for IMAGE in $IMAGES; do
				set -o noglob
				if [[ $IMAGE == *${SNAPPREFIX}* ]]; then
					set +o noglob
					CMD="rm -f $IMAGE >> $LOG 2>&1"
					MSG=" Deleting temporary image $IMAGE"
					echo "$MSG" >> $LOG
					echo "$MSG"
					echo "Command: $CMD" >> $LOG
					echo "Command: $CMD"
					eval "$CMD"
				fi
				set +o noglob
			done
			#Reload images after blockcommit
			IMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')
		fi
		DISKSPEC=""
		for TARGET in $TARGETS; do
			DISKSPEC="$DISKSPEC --diskspec $TARGET,snapshot=external"
		done

		# transfer the VM to snapshot disk image(s)
			CMD="virsh snapshot-create-as --domain $DOMAIN --name ${SNAPPREFIX}$DOMAIN --no-metadata --atomic --disk-only --quiesce $DISKSPEC >> $LOG 2>&1"
			echo "Command: $CMD" >> $LOG 2>&1
			echo "Command: $CMD"
			eval "$CMD"
			if [ $? -ne 0 ]; then
				MSG="Snapshot creation with --quiesce failed (it requires qemu-guest-agent in the guest); retrying without it"
				echo "$MSG" >> $LOG
				echo "$MSG"
				CMD="virsh snapshot-create-as --domain $DOMAIN --name ${SNAPPREFIX}$DOMAIN --no-metadata --atomic --disk-only $DISKSPEC >> $LOG 2>&1"
				echo "Command: $CMD" >> $LOG 2>&1
				echo "Command: $CMD"
				eval "$CMD"
			fi
			if [ $? -ne 0 ]; then
				ERR="Failed to create snapshot for $DOMAIN"
				echo "$ERR" >> $LOG
				echo "$ERR"
				echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD snapshot Exception for $DOMAIN" $EMR
				ERRORED=$(($ERRORED+1))
				continue
			fi
		#i=$i+1

	fi

	
	####################################
	#Rsync							   #
	####################################
	MSG="Rsync"
	echo "$MSG" >> $LOG
	echo "$MSG"
	for IMAGE in $IMAGES; do
		FILEFULLNAME=`basename "$IMAGE"`
		if test -f "$BACKUPFOLDER/$FILEFULLNAME"; then
			MSG="Backup exists, merging only changes to image $BACKUPFOLDER/$FILEFULLNAME"
			echo "$MSG" >> $LOG
			echo "$MSG"
			#CMD="rsync -apvhz --inplace --progress $IMAGE $BACKUPFOLDER/$FILEFULLNAME "
			CMD="rsync -apvhz --inplace --progress $IMAGE $BACKUPFOLDER/$FILEFULLNAME "
		else
			MSG="Backup does not exist, creating a full sparse copy of image $IMAGE"
			echo "$MSG" >> $LOG
			echo "$MSG"
			#CMD="rsync -apvhz --sparse --progress $IMAGE $BACKUPFOLDER/$FILEFULLNAME "
			CMD="rsync -apvhz --sparse --progress $IMAGE $BACKUPFOLDER/$FILEFULLNAME "
		fi
		echo "Command: $CMD" >> $LOG
		echo "Command: $CMD"
		eval "$CMD"
	done
	########################################
	# After sync file merge snapshots back #
	########################################

	if [[ $VM_RUNNING -eq 1 ]]; then
		# Update the VM's disk image(s) with any changes recorded in the snapshot
		# while the copy process was running.  In qemu lingo this is called a "pivot"
			BACKUPIMAGES=$(virsh domblklist $DOMAIN --details | grep disk | awk '{print $4}')
			for TARGET in $TARGETS; do
				#if [[ ${COMMITSNAPTAB[$i]} == true ]]; then
					#rsync can take a long time and the state could have changed, so check once again
					#if the domain has shut down there will simply be an error, because blockcommit and blockjob can't be done on a shut-down domain.
					VMSTATE=$(LC_ALL=C virsh list --all | grep "[[:space:]]$DOMAIN[[:space:]]" | awk '{print $3}')
					if [[ $VMSTATE == "paused" ]]; then
						pause_enabled=true
					else
						pause_enabled=false
					fi
					if [[ $pause_domain_bc == true && $pause_enabled == false ]]; then
						MSG="Try to pause domain $DOMAIN first"
						echo "$MSG" >> $LOG
						echo "$MSG"
						CMDst="virsh suspend $DOMAIN"
						echo "Command: $CMDst" >> $LOG
						echo "Command: $CMDst"
						eval "$CMDst"
						if [ $? -ne 0 ]; then
							ERR="can't pause domain $DOMAIN, continue anyway"
							echo "$ERR" >> $LOG
							echo "$ERR"
						else
							pause_enabled=true
						fi
					fi
					MSG="$TARGET BLOCKCOMMIT previous snapshot "
					echo "$MSG" >> $LOG
					echo "$MSG"
					CMD="virsh blockcommit $DOMAIN $TARGET --active --pivot >> $LOG 2>&1"
					echo "Command: $CMD" >> $LOG
					echo "Command: $CMD"
					eval "$CMD"
					if [ $? -ne 0 ]; then
						ERR="Could not merge changes for disk $TARGET of $DOMAIN with blockcommit. VM may be in an invalid state."
						echo "$ERR" >> $LOG
						echo "$ERR"
						echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD blockcommit Exception for $DOMAIN" $EMR
						if [[ $pause_domain_after_bc_fail == true && $pause_enabled == false ]]; then
							MSG="Try to pause domain $DOMAIN and retry bc"
							echo "$MSG" >> $LOG
							echo "$MSG"
							CMDst="virsh suspend $DOMAIN"
							echo "Command: $CMDst" >> $LOG
							echo "Command: $CMDst"
							eval "$CMDst"
							if [ $? -ne 0 ]; then
								ERR="can't pause domain $DOMAIN, continue anyway"
								echo "$ERR" >> $LOG
								echo "$ERR"
							else
								pause_enabled=true
							fi
							MSG="$TARGET BLOCKCOMMIT previous snapshot "
							echo "$MSG" >> $LOG
							echo "$MSG"
							CMD="virsh blockcommit $DOMAIN $TARGET --active --pivot >> $LOG 2>&1"
							echo "Command: $CMD" >> $LOG
							echo "Command: $CMD"
							eval "$CMD"
							if [ $? -ne 0 ]; then
								ERR="Could not merge changes for disk $TARGET of $DOMAIN (paused) with blockcommit. VM may be in an invalid state."
								echo "$ERR" >> $LOG
								echo "$ERR"
								echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD blockcommit Exception for $DOMAIN" $EMR
							else
								# blockcommit succeeded on the retry, so disable the blockjob attempts
								enable_bj_attemts=false
							fi
						fi	
						if [ $enable_bj_attemts == true ]; then
							if [[ $pause_while_bj_attemtps == true ]]; then
								MSG="Try to pause domain $DOMAIN first"
								echo "$MSG" >> $LOG
								echo "$MSG"
								CMDst="virsh suspend $DOMAIN"
								echo "Command: $CMDst" >> $LOG
								echo "Command: $CMDst"
								eval "$CMDst"
								if [ $? -ne 0 ]; then
									ERR="can't pause domain $DOMAIN, continue anyway"
									echo "$ERR" >> $LOG
									echo "$ERR"
								else
									pause_enabled=true
								fi
							else
								if [[ $pause_enabled == true && $VMSTATE == "running" ]]; then
									MSG="Resume domain $DOMAIN"
									echo "$MSG" >> $LOG
									echo "$MSG"
									CMDst="virsh resume $DOMAIN"
									echo "Command: $CMDst" >> $LOG
									echo "Command: $CMDst"
									eval "$CMDst"
									if [ $? -ne 0 ]; then
										ERR="can't resume domain $DOMAIN, continue anyway"
										echo "$ERR" >> $LOG
										echo "$ERR"
									else
										pause_enabled=false
									fi
								fi
							fi
						
							for (( j=1; j<=$blockjob_retrycount; j++ )) do
								CMD="virsh blockjob --domain $DOMAIN --pivot $TARGET"
								echo "Command: $CMD" >> $LOG
								echo "Command: $CMD"
								eval "$CMD"
								if [ $? -ne 0 ]; then
									ERR="virsh blockjob attempt no $j --domain $DOMAIN --pivot $TARGET failed"
									echo "$ERR" >> $LOG
									echo "$ERR"
									if [[ $j -eq $blockjob_retrycount ]]; then
										ERR="all blockjob attempt --domain $DOMAIN --pivot $TARGET failed."
										echo "$ERR" >> $LOG
										echo "$ERR"
										echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD blockjob  Exception for $DOMAIN" $EMR
										if [[ $pause_domain_after_bj_fail == true && $pause_enabled == false && $VMSTATE == "running" ]]; then
											MSG="Try to pause domain $DOMAIN first"
											echo "$MSG" >> $LOG
											echo "$MSG"
											CMDst="virsh suspend $DOMAIN"
											echo "Command: $CMDst" >> $LOG
											echo "Command: $CMDst"
											eval "$CMDst"
											if [ $? -ne 0 ]; then
												ERR="can't pause domain $DOMAIN"
												echo "$ERR" >> $LOG
												echo "$ERR"
											else
												echo "Command: $CMD" >> $LOG
												echo "Command: $CMD"
												eval "$CMD"
													if [ $? -ne 0 ]; then
														ERR="can't blockjob --pivot $TARGET anyway, I'm giving up"
														echo "$ERR" >> $LOG
														echo "$ERR"
														echo "$ERR
Host:    $HOST
Domain:  $DOMAIN
Command: $CMD" | mail -s "$SHCMD blockjob Exception for $DOMAIN" $EMR
														BREAK=true
														ERRORED=$((ERRORED+1))
													fi
												MSG="Resume domain $DOMAIN"
												echo "$MSG" >> $LOG
												echo "$MSG"
												CMDst="virsh resume $DOMAIN"
												echo "Command: $CMDst" >> $LOG
												echo "Command: $CMDst"
												eval "$CMDst"
												if [ $? -ne 0 ]; then
													ERR="can't resume domain $DOMAIN, continue anyway"
													echo "$ERR" >> $LOG
													echo "$ERR"
												else
													pause_enabled=false
												fi
											fi
										else
											BREAK=true
											ERRORED=$((ERRORED+1))
											break
										fi
									fi
									sleep $blockjob_delay
								else
									MSG="virsh blockjob attempt no $j --domain $DOMAIN --pivot $TARGET success"
									echo "$MSG" >> $LOG
									echo "$MSG"
									break
								fi
							done
							[ "$BREAK" == true ] && break
						fi
					fi
					if [[ $pause_enabled == true && $VMSTATE == "running" ]]; then
						MSG="Resume domain $DOMAIN"
						echo "$MSG" >> $LOG
						echo "$MSG"
						CMDst="virsh resume $DOMAIN"
						echo "Command: $CMDst" >> $LOG
						echo "Command: $CMDst"
						eval "$CMDst"
						if [ $? -ne 0 ]; then
							ERR="can't resume domain $DOMAIN, continue anyway"
							echo "$ERR" >> $LOG
							echo "$ERR"
						else
							pause_enabled=false
						fi
					fi
				#fi
				i=$((i+1))
			done
		[ "$BREAK" == true ] && continue

		# Now that the VM's disk image(s) have been successfully committed and pivoted
		# back to the main disk image, remove the temporary snapshot image file(s)
			for BACKUP in $BACKUPIMAGES; do
				set -o noglob
				if [[ $BACKUP == *${SNAPPREFIX}* ]]; then
					set +o noglob
					CMD="rm -f $BACKUP >> $LOG 2>&1"
					MSG=" Deleting temporary image $BACKUP"
					echo "$MSG" >> $LOG
					echo "$MSG"
					MSG="Command: $CMD"
					echo "$MSG" >> $LOG
					echo "$MSG"
					eval "$CMD"
				fi
				set +o noglob
			done

		# capture the VM's definition in use at the time the backup was done
			CMD="virsh dumpxml $DOMAIN > $BACKUPFOLDER/$DOMAIN.xml 2>> $LOG"
			MSG="Command: $CMD"
			echo "$MSG" >> $LOG
			echo "$MSG"
			eval "$CMD"
	else
		# copy the VM's definition
		CMD="cp $PATH_LIBVIRT_QEMU/$DOMAIN.xml $BACKUPFOLDER/$DOMAIN.xml 2>> $LOG"
		MSG="Command: $CMD"
		echo "$MSG" >> $LOG
		echo "$MSG"
		eval "$CMD"
	fi


	MSG="---- Backup done $DOMAIN ---- $(date +'%d-%m-%Y %H:%M:%S') ----"
	echo "$MSG" >> $LOG
	echo "$MSG"
done
MSG="$SHCMD: Finished backups at $(date +'%d-%m-%Y %H:%M:%S')
===================="
echo "$MSG" >> $LOG
echo "$MSG"

exit $ERRORED
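
The script above repeats the same "log the message, echo it, eval the command, check `$?`" sequence for every suspend, resume, blockcommit and blockjob call. As a minimal sketch (not part of the original script), that pattern could be folded into two small helpers. The names `log_msg` and `run_logged` are hypothetical, and `LOG` falls back to a temporary file here only so the example runs on its own.

```shell
#!/bin/bash
# Hypothetical helpers; in the real script LOG is already set.
LOG="${LOG:-$(mktemp)}"

# Print a message to stdout and append it to the log file.
log_msg() {
    echo "$1" | tee -a "$LOG"
}

# Log the command line, run it without eval, and return its exit status.
# The command's own output goes to the log file only.
run_logged() {
    log_msg "Command: $*"
    "$@" >> "$LOG" 2>&1
}

# Usage: report a failure but keep going, as the script does for suspend/resume.
# 'false' stands in for something like: virsh suspend "$DOMAIN"
if ! run_logged false; then
    log_msg "command failed, continue anyway"
fi
```

Passing the command as separate arguments instead of building a string for `eval` would also keep paths with spaces intact, which the original script does not attempt to handle.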

juliyvchirkov commented Aug 16, 2021

my 5 cents, live full and incremental backups of kvm guests

https://gist.github.com/juliyvchirkov/663eb6f5c18600a7414528beee6a7f3a

tuomotalvitie commented Oct 19, 2021

I guess I need to do some converting to qcow3 to use the tool (at least to full potential). Thank you @juliyvchirkov

Meanwhile, "virsh blockcommit $DOMAIN $TARGET --active --pivot" in the script will most likely cause havoc if one uses base images.

Shouldn't be too difficult to fix to shorten the chain to the original length:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/virtualization_administration_guide/sub-sect-domain_commands-using_blockcommit_to_shorten_a_backing_chain
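
One way this could look, as a sketch only: `virsh blockcommit` accepts `--shallow` (commit into the immediate backing file of the active layer) or `--base <file>` (stop at a named image), so the merge can be limited to the temporary backup overlay instead of flattening the whole chain into a shared base image. The helper name `build_blockcommit_cmd` and the example paths are hypothetical; the function only prints the command so it can be reviewed before anything touches a VM.

```shell
#!/bin/bash
# Hypothetical helper: build (not run) a blockcommit command that keeps the
# backing chain at its original length.
#   $1 = domain, $2 = disk target (e.g. vda), $3 = optional image to stop at
build_blockcommit_cmd() {
    local domain="$1" target="$2" base="${3:-}"
    if [ -n "$base" ]; then
        # Commit only the layers above the named image.
        printf 'virsh blockcommit %s %s --active --base %s --pivot\n' \
            "$domain" "$target" "$base"
    else
        # Commit only into the immediate backing file of the active layer.
        printf 'virsh blockcommit %s %s --active --shallow --pivot\n' \
            "$domain" "$target"
    fi
}

# Example invocations with made-up names:
build_blockcommit_cmd vm1 vda
build_blockcommit_cmd vm1 vda /var/lib/libvirt/images/vm1.qcow2
```

With `--base`, the image to pass would be the one that was active before the backup snapshot was taken, which the script already collects from `virsh domblklist` before creating the snapshot.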

Seed image operations also take 0s to complete, so I dropped the division by time line.

abbbi commented Nov 2, 2021

> I guess I need to do some converting to qcow3 to use the tool (at least to full potential). Thank you @juliyvchirkov

Yes, if you want to use the full feature set with incremental backups, you need to convert. Otherwise you can still use backup mode copy to do at least a full backup. Here's the full documentation:

https://github.com/abbbi/virtnbdbackup#readme
