
@melnik13 melnik13/diskhogs.sh
Last active Dec 8, 2019

Ever wanted a tool to monitor which files are growing faster than the others? Just run it from cron: diskhogs.sh /var/lib/libvirt/images
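For example, a crontab entry sampling every five minutes could look like this (the /usr/local/bin install path is just an assumption):

*/5 * * * * /usr/local/bin/diskhogs.sh /var/lib/libvirt/images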
#!/bin/sh

# Ensure the spool directory exists
SPOOL=/var/spool/diskhogs
if [ ! -e "${SPOOL}" ]; then
    mkdir -p "${SPOOL}"
fi
if [ ! -d "${SPOOL}" ]; then
    echo "There is no ${SPOOL} directory" >&2
    exit 1
fi

# Scan the directory given as the first argument, or the current one
if [ -z "${1}" ]; then
    DIR=.
else
    DIR="${1}"
fi

FILES=$(find "${DIR}" -type f)

TIME=$(date +%s)
if [ -z "${TIME}" ]; then
    echo "Can't determine current time" >&2
    exit 1
fi

# Record the current size of every file.
# NB: this assumes the 'sizes' table already exists in ${SPOOL}/db.
for FILE in ${FILES}; do
    SIZE=$(ls -nl "${FILE}" | awk '{ print $5 }')
    if [ -z "${SIZE}" ]; then
        echo "Can't determine size of the ${FILE} file" >&2
        continue
    fi
    sqlite3 "${SPOOL}/db" "INSERT INTO sizes VALUES ('${FILE}', '${TIME}', '${SIZE}');"
done

# For each period, report how much every file has grown since then
for PERIOD in 60 300 600 1800 3600 86400; do
    TIME_WAS=$((TIME - PERIOD))
    (
        echo "*** Since $(date --date="@${TIME_WAS}") (${PERIOD} seconds ago) ***"
        sqlite3 \
            "${SPOOL}/db" \
            "SELECT MAX(size) - MIN(size) AS mm, name
             FROM sizes
             WHERE time >= '${TIME_WAS}'
             GROUP BY name
             ORDER BY mm
             ;"
    ) > "${SPOOL}/report_${PERIOD}"
done
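The reports land in plain-text files under the spool directory, one per period, so the latest results can be read directly, for example:

cat /var/spool/diskhogs/report_3600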
@coder-sreeraj commented Oct 13, 2015

I'm getting this error:
Error: no such table: sizes
Error: no such table: sizes
Error: no such table: sizes

@calvinjtaylor commented Nov 14, 2016

I made some progress when I used

CREATE TABLE sizes(
    name TEXT NOT NULL,
    time TEXT NOT NULL,
    size INT  NOT NULL
);

and changed the SPOOL var in the script to point to /root/diskhogs

I ran diskhogs via

while true; do diskhogs.sh /var/log; sleep 10; done
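Putting the two together, a one-time initialization before the first run could look like this (a sketch, assuming the default /var/spool/diskhogs spool directory):

sqlite3 /var/spool/diskhogs/db \
    "CREATE TABLE IF NOT EXISTS sizes(name TEXT NOT NULL, time TEXT NOT NULL, size INT NOT NULL);"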
@baltar commented Dec 4, 2019

I created a version that runs on macOS (where date has different parameters) and adds some other improvements:

- creates the DB if it does not exist;
- works with filenames that have spaces, by streaming the results from find into the loop (which should also reduce memory usage for directories with a large number of files; see the sketch below);
- is usable by normal users, since it uses ~/.diskhogs instead of /var/spool/diskhogs;
- uses db and report file names that include the absolute path of the scanned directory, so different directories can be scanned without deleting each other's results.
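A minimal sketch of the space-safe streaming loop (note: read -d '' requires bash rather than plain sh, and stat flags differ between GNU and macOS):

find "${DIR}" -type f -print0 | while IFS= read -r -d '' FILE; do
    SIZE=$(stat -c %s "${FILE}")   # GNU coreutils; on macOS use: stat -f %z "${FILE}"
    [ -n "${SIZE}" ] || continue
    sqlite3 "${SPOOL}/db" "INSERT INTO sizes VALUES ('${FILE}', '${TIME}', '${SIZE}');"
done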

@baltar commented Dec 6, 2019

I also replaced the find + ls with du, which is much faster and creates entries for sub-directories instead of files.
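Under that change, the sampling loop might look roughly like this (a sketch; du -k reports sizes in KiB rather than bytes, and its output is tab-separated):

du -k "${DIR}" | while IFS="$(printf '\t')" read -r SIZE NAME; do
    # One row per (sub-)directory; NAME may contain spaces but not tabs
    sqlite3 "${SPOOL}/db" "INSERT INTO sizes VALUES ('${NAME}', '${TIME}', '${SIZE}');"
done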

@baltar commented Dec 8, 2019

I split it into separate scripts for sampling and querying, and created a repo for it on Bitbucket, also renaming it to "diskdelta".

I put in an acknowledgment for you @melnik13 and added you to the copyright in the license file; hope you're OK with this.
