Skip to content

Instantly share code, notes, and snippets.

@jcfr
Last active March 7, 2017 05:16
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jcfr/ea9ef199bd5a3e071b8f to your computer and use it in GitHub Desktop.
Save jcfr/ea9ef199bd5a3e071b8f to your computer and use it in GitHub Desktop.
Collection of SQL queries helping to find out the path of the Midas bitstream that should be removed from the asset store.

Slicer Midas Cleanup

Overview

Storing Slicer application and extension packages is becoming an issue.

Problem 1

First, the midas server itself is not able to support a growing collection of item without sizing up the PHP memory on the server. See mailing list [1] and associated midas issue [2].

[1] http://slicer-devel.65872.n3.nabble.com/Extensions-server-maintenance-tp4031117.html

[2] See midasplatform/Midas#69

Problem 2

The over growing collection of packages require more storage and impacts the overall backup process associated the hosting server.

Constraints

As discussed in [3], download statistics for both Slicer application and extension packages should be kept.

[3] http://slicer-devel.65872.n3.nabble.com/Extensions-server-maintenance-tp4031117p4031122.html

Proposed solutions

Problem 1 could be addressed ensuring issues like midasplatform/Midas#69 are addressed through the entire Midas stack.

Problem 2 could be addressed by removing the asset-store files associated with the packages. See below for more details.

Asset-store file removal

The query referenced below will return asset-store paths associated with packages matching the following conditions:

  • the package is older than the last stable release '4.6.2'

  • the package is not release

  • asset-store paths do NOT match the one associated with package we want to keep

Generate list of asset-store file paths:

Remove asset-store file paths:

  • Python script will be used to process the text file obtained after running one of the query.

Note: In order to keep the download stats, the file should NOT be removed from the asset-store using either the Midas web interface or REST API. Instead, a shell script (or similar method) should be used.

###########################################################################
#
# Query_1
#
# Get all packages that are
# (1) older than the last stable release '4.6.2'
# AND
# (2) not release themselves.
#
###########################################################################
SELECT *
FROM slicerpackages_package AS plist
WHERE plist.package_id <
(
SELECT min(p.package_id)
FROM slicerpackages_package AS p
WHERE p.release = "4.6.2"
)
and plist.release = "";
###########################################################################
#
# Query_2
#
# Get extension that are
#
# (1) associated with packages older than the last stable release '4.6.2'
#
# AND
#
# (2) associated with packages which are not release themselves.
#
###########################################################################
LAST_SLICER_STABLE_RELEASE=4.6.2
DB_PWD=$(cat /var/www/midas3/core/configs/database.local.ini | grep "database.params.password" | head -1 | cut -d"=" -f2)
WHAT=extension_list
FILE_PREFIX=~/$(date +%Y-%m-%d)_${LAST_SLICER_STABLE_RELEASE}_slicer_${WHAT}
SQL_SCRIPT=${FILE_PREFIX}.sql
echo '
SELECT
extension.item_id,
extension.productname,
extension.revision,
extension.slicer_revision,
package.release,
package.os,
package.arch,
package.item_id,
package.package_id
FROM
slicerpackages_extension AS extension,
(
SELECT
package_to_clean.item_id,
package_to_clean.package_id,
package_to_clean.revision,
package_to_clean.os,
package_to_clean.arch,
package_to_clean.release
FROM slicerpackages_package AS package_to_clean
WHERE package_to_clean.package_id <
(
SELECT min(p.package_id)
FROM slicerpackages_package AS p
WHERE p.release = @LAST_SLICER_STABLE_RELEASE
)
AND package_to_clean.release = ""
) AS package
WHERE
package.revision = extension.revision AND
package.os = extension.os AND
package.arch = extension.arch;
' > $SQL_SCRIPT
mysql -u midas -p${DB_PWD//\"} -D midas -e "
set @LAST_SLICER_STABLE_RELEASE='${LAST_SLICER_STABLE_RELEASE}';
source ${SQL_SCRIPT};
" \
| tr '\t' ',' > $OUTPUT
###########################################################################
#
# Query_3
#
# Get all extension bitstream paths for a given slicer_revision, os AND arch.
#
###########################################################################
SELECT
bitstream.path
FROM
slicerpackages_extension AS extension,
item,
itemrevision,
bitstream
WHERE
extension.slicer_revision = "22380" AND
extension.os = "win" AND
extension.arch = "amd64" AND
item.item_id = extension.item_id AND
itemrevision.item_id = item.item_id AND
bitstream.itemrevision_id = itemrevision.itemrevision_id;
###########################################################################
#
# Query_4 (Combining Query_2 AND Query_3)
#
# Get extension bitstream paths that are
#
# (1) associated with packages older than the last stable release '4.6.2'
#
# AND
#
# (2) associated with packages which are not release themselves.
#
#
# THIS QUERY ONLY WORK FOR THE 10 MOST RECENT PACKAGE
#
###########################################################################
# ----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<
LAST_SLICER_STABLE_RELEASE=4.6.2
WHERE=~/slicer_midas_cleanup
mkdir -p $WHERE
DB_PWD=$(cat /var/www/midas3/core/configs/database.local.ini | grep "database.params.password" | head -1 | cut -d"=" -f2)
WHAT=extension_packages_assetstore_paths
FILE_PREFIX=$WHERE/$(date +%Y-%m-%d)_${LAST_SLICER_STABLE_RELEASE}_slicer_${WHAT}
SQL_SCRIPT=${FILE_PREFIX}.sql
OUTPUT=${FILE_PREFIX}.txt
echo '
SELECT
concat(assetstore.path, "/", bitstream.path),
package.release,
package.os,
package.arch,
package.item_id,
package.package_id,
extension.productname,
extension.revision,
extension.slicer_revision
FROM
slicerpackages_extension AS extension,
(
SELECT
package_to_clean.item_id,
package_to_clean.package_id,
package_to_clean.revision,
package_to_clean.os,
package_to_clean.arch,
package_to_clean.release
FROM slicerpackages_package AS package_to_clean
WHERE package_to_clean.package_id <
(
SELECT min(p.package_id)
FROM slicerpackages_package AS p
WHERE p.release = @LAST_SLICER_STABLE_RELEASE
)
AND package_to_clean.release = ""
) AS package,
item,
itemrevision,
bitstream,
assetstore
WHERE package.revision = extension.slicer_revision AND
package.os = extension.os AND
package.arch = extension.arch AND
item.item_id = extension.item_id AND
itemrevision.item_id = item.item_id AND
bitstream.itemrevision_id = itemrevision.itemrevision_id AND
bitstream.assetstore_id = assetstore.assetstore_id;
' > $SQL_SCRIPT
mysql -u midas -p${DB_PWD//\"} -D midas -e "
set @LAST_SLICER_STABLE_RELEASE='${LAST_SLICER_STABLE_RELEASE}';
source ${SQL_SCRIPT};
" \
| tr '\t' ',' > $OUTPUT
#
# QA the generated file
#
NO_RELEASE_EXPECTED=$(tail -n +2 $OUTPUT | cut -d"," -f2 | sed '/^$/d')
[[ -z $NO_RELEASE_EXPECTED ]] || (echo "
Do NOT remove the asset-store files.
File $OUTPUT references package associated with one or more Slicer release"; exit 1)
# ----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<
###########################################################################
#
# Query_5
#
# Get all slicer package bitstream paths that are
#
# (1) associated with packages older than the last stable release '4.6.2'
#
# AND
#
# (2) associated with packages which are not release themselves.
#
###########################################################################
# ----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<
LAST_SLICER_STABLE_RELEASE=4.6.2
WHERE=~/slicer_midas_cleanup
mkdir -p $WHERE
DB_PWD=$(cat /var/www/midas3/core/configs/database.local.ini | grep "database.params.password" | head -1 | cut -d"=" -f2)
WHAT=application_packages_assetstore_paths
FILE_PREFIX=$WHERE/$(date +%Y-%m-%d)_${LAST_SLICER_STABLE_RELEASE}_slicer_${WHAT}
SQL_SCRIPT=${FILE_PREFIX}.sql
OUTPUT=${FILE_PREFIX}.txt
echo '
SELECT
concat(assetstore.path, "/", bitstream.path),
package.release,
package.os,
package.arch,
package.item_id,
package.package_id
FROM
slicerpackages_package AS package,
item,
itemrevision,
bitstream,
assetstore
WHERE
package.package_id <
(
SELECT min(p.package_id)
FROM slicerpackages_package AS p
WHERE p.release = @LAST_SLICER_STABLE_RELEASE
) AND
package.release = "" AND
item.item_id = package.item_id AND
itemrevision.item_id = item.item_id AND
bitstream.itemrevision_id = itemrevision.itemrevision_id AND
bitstream.assetstore_id = assetstore.assetstore_id AND
bitstream.path NOT IN
(
SELECT
bitstream_to_keep.path
FROM
slicerpackages_package AS package_to_keep,
item as item_to_keep,
itemrevision as itemrevision_to_keep,
bitstream as bitstream_to_keep,
assetstore as assetstore_to_keep
WHERE
package_to_keep.package_id >=
(
SELECT min(p.package_id)
FROM slicerpackages_package AS p
WHERE p.release = @LAST_SLICER_STABLE_RELEASE
) AND
item_to_keep.item_id = package_to_keep.item_id AND
itemrevision_to_keep.item_id = item_to_keep.item_id AND
bitstream_to_keep.itemrevision_id = itemrevision_to_keep.itemrevision_id AND
bitstream_to_keep.assetstore_id = assetstore_to_keep.assetstore_id
)
' > $SQL_SCRIPT
mysql -u midas -p${DB_PWD//\"} -D midas -e "
set @LAST_SLICER_STABLE_RELEASE='${LAST_SLICER_STABLE_RELEASE}';
source ${SQL_SCRIPT};
" \
| tr '\t' ',' > $OUTPUT
#
# QA the generated file
#
NO_RELEASE_EXPECTED=$(tail -n +2 $OUTPUT | cut -d"," -f2 | sed '/^$/d')
[[ -z $NO_RELEASE_EXPECTED ]] || (echo "
Do NOT remove the asset-store files.
File $OUTPUT references package associated with one or more Slicer release"; exit 1)
# ----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<
#!/usr/bin/env python
"""
To effectively remove the file, you will have to
(1) sudo as 'www-data' user.
$ sudo su - www-data
(2) update LAST_SLICER_STABLE_RELEASE in slicer_midas_cleanup_query(4|5).sh scripts
(3) run the updated scripts
(4) run this program using the text file generated by the scripts
"""
import argparse
import os
import sys
LOG_FILE = None
def log(msg, append=True):
with open(LOG_FILE, "a" if append else "w") as log_file:
log_file.write(msg + "\n")
def check_release(release):
if release.strip() != "":
msg = " [ERROR] release field is '%s'. It is expected to be an empty string." % release
print(msg)
log(msg)
print("\n>>>>> Check log file %s\n" % LOG_FILE)
sys.exit(1)
def parse_line(line):
tokens = line.split(",")
path = tokens[0]
release = tokens[1]
# os = tokens[2]
# arch = tokens[3]
# item_id = tokens[4]
# package_id = tokens[5]
# productname = tokens[6]
# revision = tokens[7]
# slicer_revision = tokens[8]
check_release(release)
return path
def remove_package_file(path):
if os.path.exists(path):
os.remove(path)
msg = " [OK]"
print(msg)
log(msg)
else:
msg = " [SKIPPED] File does NOT exist."
print(msg)
log(msg)
def main():
parser = argparse.ArgumentParser(description='Remove package files listed in '
'"packages_assetstore_paths.txt" file.')
parser.add_argument("--index-resume", type=int, default=1)
parser.add_argument(
"input_text", metavar="INPUT_TEXT",
help="Path to YYYY-MM-DD_X.Y.Z_slicer_(application|extension)_packages_assetstore_paths.txt")
args = parser.parse_args()
input_text = args.input_text
index_resume = args.index_resume
if not os.path.exists(input_text):
print("input_text [%s] does NOT exist" % input_text)
sys.exit(1)
if not os.path.isfile(input_text):
print("input_text [%s] is not a file" % input_text)
sys.exit(1)
if index_resume < 1:
print("index_resume [%d] is expected to be >= 1")
sys.exit(1)
# Initialize log
global LOG_FILE
LOG_FILE = os.path.splitext(input_text)[0] + "_REMOVE_LOG.txt"
print("LOG_FILE: %s" % LOG_FILE)
if index_resume == 1:
print("Cleared LOG_FILE")
log(LOG_FILE, append=False)
total = sum(1 for _ in open(input_text))
with open(input_text) as input_file:
for index, line in enumerate(input_file):
if index == 0:
continue
if line.strip() == "":
continue
path = parse_line(line)
print("%s/%s - Removing %s" % (index, total, path))
if index < index_resume:
print("Skip index %s" % index)
continue
log(line)
remove_package_file(path)
if __name__ == "__main__":
main()
#
# To effectively remove the file, you will have to
#
# (1) sudo as 'www-data' user.
# $ sudo su - www-data
#
# (2) update the INPUT variable to point to the txt file generated
# using one of the query.
#
# (3) copy all the line below.
#
# ----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<
remove_assets() {
INPUT=$1
REMOVE_LOG=$2
INDEX_RESUME=$3
TOTAL=$(cat $INPUT | wc -l)
INDEX=1
for line in $(tail -n +2 $INPUT); do
if [[ $INDEX -gt $INDEX_RESUME ]];
then
path=$(echo $line | cut -d"," -f1)
release=$(echo $line | cut -d"," -f2)
os=$(echo $line | cut -d"," -f3)
arch=$(echo $line | cut -d"," -f4)
item_id=$(echo $line | cut -d"," -f5)
package_id=$(echo $line | cut -d"," -f6)
productname=$(echo $line | cut -d"," -f7)
revision=$(echo $line | cut -d"," -f8)
slicer_revision=$(echo $line | cut -d"," -f9)
echo "$INDEX/$TOTAL - Removing $path";
if [[ $release ]]; then
msg=" [ERROR] release field is '$release'. It is expected to be an empty string."
echo "$msg"
echo "$msg" >> $REMOVE_LOG
printf "\n>>>>> Check log file $REMOVE_LOG\n\n";
break;
fi
echo $line >> $REMOVE_LOG
if [ -f $path ]; then
rm $path
retcode=$?
if [ $retcode -eq 0 ]; then
msg=" [OK]"
echo "$msg"
echo "$msg" >> $REMOVE_LOG
else
msg=" [ERROR] Failed to remove the file [retcode: $retcode]"
echo "$msg"
echo "$msg" >> $REMOVE_LOG
printf "\n>>>>> Check log file $REMOVE_LOG\n\n";
break;
fi
else
msg=" [SKIPPED] File does NOT exist."
echo "$msg"
echo "$msg" >> $REMOVE_LOG
fi
else
echo "Skip index $INDEX"
fi
INDEX=$((INDEX + 1));
done
}
err() {
echo -e >&2 ERROR: $@\\n
}
die() {
err $@
exit 1
}
INPUT=$1
if [ ! -e "${INPUT}" ]; then
cat >&2 <<ENDHELP
Usage: slicer_midas_cleanup_remove_assetstore YYYY-MM-DD_X.Y.Z_slicer_(application|extension)_packages_assetstore_paths.txt
ENDHELP
exit 1
fi
WHERE=~/slicer_midas_cleanup
mkdir -p $WHERE
filename=$(basename $INPUT)
REMOVE_LOG=$WHERE/${filename%.*}_REMOVE_LOG.txt
echo "REMOVE_LOG:$REMOVE_LOG"
INDEX_RESUME=1 # This can be used to restart removal where
# it has previously stopped because of an error.
if [[ $INDEX_RESUME -eq 1 ]];
then
echo "" > $REMOVE_LOG
fi
retcode=$?
if [ $retcode -eq 0 ]; then
remove_assets "$INPUT" "$REMOVE_LOG" "$INDEX_RESUME"
else
echo " [ERROR] Can NOT log messages to file $REMOVE_LOG"
fi
# ----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<----8<
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment