@dmabamboo
Last active April 16, 2024 14:50
Delete all archives from AWS Glacier vault
#!/usr/bin/env bash
# Checking pre-requisites (aws cli v2 and jq installed)
if ! command -v jq &> /dev/null
then
    echo "jq could not be found - check how to download and install it here https://stedolan.github.io/jq/download/"
    exit 1
fi
# command -v only needs the command name; it looks the name up and does not run it
if ! command -v aws &> /dev/null
then
    echo "AWS CLI could not be found - check how to download and install it here https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html"
    echo "How to configure the AWS CLI to use secrets for your Glacier IAM - https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html"
    echo "You need to configure it with your appropriate secrets for an IAM user that has full access over your Glacier resources (AmazonGlacierFullAccess)."
    echo "In JSON: "
    echo "{"
    echo "  \"Version\": \"2012-10-17\","
    echo "  \"Statement\": ["
    echo "    {"
    echo "      \"Action\": \"glacier:*\","
    echo "      \"Effect\": \"Allow\","
    echo "      \"Resource\": \"*\""
    echo "    }"
    echo "  ]"
    echo "}"
    exit 1
fi
account_id=$1
region=$2
vault_name=$3
if [[ -z "${account_id}" ]] || [[ -z "${region}" ]] || [[ -z "${vault_name}" ]]; then
    echo "#################################################################"
    echo "Attention!!! Parameters required are missing."
    echo "Account: ${account_id}"
    echo "Region: ${region}"
    echo "Vault: ${vault_name}"
    echo "#################################################################"
    # The script relies on bash features ([[ ]] and &>), so it must not be started with sh
    echo "run this command like: ./delete-aws-glacier-vault-archives.sh AWS_ACCOUNT_ID AWS_REGION AWS_GLACIER_VAULT_NAME"
    exit 1
fi
echo "Initiating delete process for the vault."
echo " Account:${account_id}"
echo " Region:${region}"
echo " Vault:${vault_name}"
echo "Starting Step 1/4 - Glacier Inventory Retrieval Job - it's Async and can take hours or days to complete"
# Step 1 - inventory retrieval job for the given vault
job_initiation_file="./glacier-inventory-retrieval-job-file-${account_id}-${region}-${vault_name}.json"
if test -f "${job_initiation_file}"; then
    echo "There is already a file for this job. Using it now. If you don't want to use it you need to delete the file ${job_initiation_file}."
else
    echo "No previous job file found for this vault."
    echo "Starting a new Job."
    aws glacier initiate-job --job-parameters '{"Type": "inventory-retrieval"}' --account-id "${account_id}" --region "${region}" --vault-name "${vault_name}" &> "${job_initiation_file}"
    echo "Job request made."
fi
echo "Checking if the job initiation file is in good shape."
job_id="Undefined"
# jq -e also fails when .jobId is missing or null, not only when the file is not valid JSON
if jq -e ".jobId" "${job_initiation_file}" > /dev/null 2>&1; then
    job_id=$(jq -r ".jobId" "${job_initiation_file}")
    echo "File is OK, jobId=${job_id}"
else
    echo "Failed to obtain Job Id from file, file may be corrupted or your initiate-job call failed - check parameters passed to this script or aws cli config and connectivity."
    echo "Delete ${job_initiation_file} before running this script again."
    # Without a job id the polling step below can never succeed, so stop here
    exit 1
fi
echo "Starting Step 2/4 - Checking state of the Job to see if it's completed and can have its inventory retrieved for deletion."
job_completed_flag=false
job_status_file="./glacier-describe-job-file-${account_id}-${region}-${vault_name}-${job_id}.json"
while [ "${job_completed_flag}" != "true" ]
do
    aws glacier describe-job --account-id "${account_id}" --region "${region}" --vault-name "${vault_name}" --job-id "${job_id}" &> "${job_status_file}"
    # has("Completed") is true only when describe-job returned a real job description
    if jq -e 'has("Completed")' "${job_status_file}" > /dev/null 2>&1; then
        job_completed_flag=$(jq -r ".Completed" "${job_status_file}")
        echo "File is OK. Job completed? ${job_completed_flag}"
        if [ "${job_completed_flag}" = "true" ]; then
            break
        fi
    else
        echo "$(date) Failed to check status from describe job."
    fi
    # sleeps for 1/2 hour - 1800 seconds before trying to fetch status again - Glacier is slow...
    echo "$(date) Will try again in 1/2 hour... "
    sleep 1800
done
echo "Starting Step 3/4 - Obtaining output from retrieval job - finally getting archive ids to delete"
inventory_output_file="./glacier-inventory-output-file-${account_id}-${region}-${vault_name}-${job_id}.json"
aws glacier get-job-output --account-id "${account_id}" --region "${region}" --vault-name "${vault_name}" --job-id "${job_id}" "${inventory_output_file}"
echo "Output file: ${inventory_output_file} created for vault ${vault_name} and job ${job_id}"
inventory_id_file="./glacier-inventory-output-file-${account_id}-${region}-${vault_name}-${job_id}.txt"
echo "Creating archive list from output file at ${inventory_id_file}"
if [[ ! -f "${inventory_id_file}" ]]; then
    # --stream emits [path, value] events, so a large inventory is not loaded into memory at once.
    # Archive ids sit at the path ["ArchiveList", <index>, "ArchiveId"].
    jq -r --stream 'select(length == 2 and .[0][2] == "ArchiveId") | .[1]' "${inventory_output_file}" > "${inventory_id_file}"
fi
total=$(wc -l < "${inventory_id_file}" | awk '{print $1}')
echo "Total archives to delete: ${total} in vault ${vault_name}"
echo "Starting Step 4/4 - Delete process starting now $(date)"
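case "$(uname -s)" in
    Linux*) numCPU="$(nproc)";;
    Darwin*) numCPU="$(sysctl -n hw.logicalcpu)";;
    *) numCPU=1;;
esac
num=0
while read -r archive_id; do
    num=$((num+1))
    # --archive-id=VALUE keeps ids that start with "-" from being parsed as options
    aws glacier delete-archive --account-id "${account_id}" --region "${region}" --vault-name "${vault_name}" --archive-id="${archive_id}" &
    # Once numCPU deletions are in flight, wait for them before starting more
    [ $( jobs | wc -l ) -ge $numCPU ] && wait
    # The delete runs in the background, so at this point it has been requested, not confirmed
    echo "Archive ${num}/${total} delete requested at $(date) - id: ${archive_id}"
done < "${inventory_id_file}"
wait
echo "Finished at $(date)"
echo "Deleted all archives listed in ${inventory_id_file}"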

Simple script to delete all archives from a given AWS Glacier Vault

I thought it would be good to share this with others facing the inconvenient process of deleting AWS Glacier vaults. For years, that process kept me paying AWS for file archives I no longer needed. Now I have finally decided to eliminate this waste.

I wrote my own because I couldn't find a convenient one to use. This script needs only an account id, an AWS region and a vault name to do its job for you (after you've followed the pre-requisites).

I've borrowed a lot from a previous gist, mentioned in the acknowledgements (including a number of comments from others that used it) but decided to build something more end-to-end and remove any manual steps so I could start it and leave it alone. Hope that's useful to you too!

Pre-requisites

  • I've run this on Mac and Linux (AWS) - I ended up running it on an EC2 instance as it can be a very long running process.
  • Install jq https://stedolan.github.io/jq/download/
  • Install the AWS CLI v2 https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html
  • Create an IAM user that has full access to AWS Glacier resources (AmazonGlacierFullAccess policy)
  • Copy the IAM user's Access Key ID and Secret Access Key for your CLI configuration.
  • Run aws configure, provide the IAM user's Access Key ID and Secret Access Key, and set the default output format to JSON (the region is not so important here, as I've decided to have it as a parameter on the script itself).
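
The pre-requisites above can be verified before starting a long run. The following is a minimal sketch and not part of the original script: `require_cmd` and `check_prerequisites` are hypothetical helper names. `aws sts get-caller-identity` is a standard AWS CLI call that reports which account the configured credentials belong to, which is also a convenient way to look up the account id the script expects.

```shell
# Hypothetical helper (not in the original script): succeeds only if the
# named command can be found by the shell.
require_cmd() {
    command -v "$1" > /dev/null 2>&1
}

# Hypothetical helper: reports every missing tool, then confirms that the
# configured credentials work by printing the account id they belong to.
check_prerequisites() {
    missing=0
    for tool in jq aws; do
        if ! require_cmd "$tool"; then
            echo "missing: $tool"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] || return 1
    aws sts get-caller-identity --query Account --output text
}
```

Calling `check_prerequisites` after `aws configure` should print the account id if everything is set up; a credentials error at this point is cheaper to find than one discovered hours into an inventory job.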

Setting it up

  • Copy the attached file named: delete-aws-glacier-vault-archives.sh
  • Make it executable by running $ chmod 744 ./delete-aws-glacier-vault-archives.sh

Running it

This script takes three parameters: the AWS account ID, the AWS region of the Glacier vault in question, and the name of the vault you want to delete archives from.

Based on comments from the gist mentioned in my acknowledgements, a good way to run it is in the background, logging to a file so you can check on progress. The command below does that:

$ nohup ./delete-aws-glacier-vault-archives.sh AWS_ACCOUNT_ID AWS_REGION AWS_GLACIER_VAULT_NAME > delete_AWS_ACCOUNT_ID_AWS_REGION_AWS_GLACIER_VAULT_NAME.log 2>&1 &

Please note that you should replace AWS_ACCOUNT_ID, AWS_REGION and AWS_GLACIER_VAULT_NAME in both the command and the log file name. Distinct log file names let you monitor each process when you run several in parallel to delete multiple vaults.
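
For several vaults in the same account and region, the command above can be wrapped in a loop. This is only a sketch: `log_file_for` and `start_vault_runs` are hypothetical helper names, and the script path assumes the file sits in the current directory.

```shell
# Hypothetical helper: builds the log file name used in the command above.
log_file_for() {
    printf 'delete_%s_%s_%s.log\n' "$1" "$2" "$3"
}

# Hypothetical helper: starts one background run per vault name, each with
# its own log file. Arguments: account id, region, then one or more vaults.
start_vault_runs() {
    account_id=$1
    region=$2
    shift 2
    for vault in "$@"; do
        nohup ./delete-aws-glacier-vault-archives.sh "$account_id" "$region" "$vault" \
            > "$(log_file_for "$account_id" "$region" "$vault")" 2>&1 &
    done
}
```

For example, `start_vault_runs 123456789012 us-east-1 vault-a vault-b` (placeholder values) starts two runs, and `tail -f delete_123456789012_us-east-1_vault-a.log` follows the first one.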

Hope it works out for you as it did for me :)

Notes on runtime behaviour

  • It tries to avoid creating unnecessary inventory-retrieval jobs, so it creates a file to store the job information for a given vault.
  • It will create files for the job output (which contains the JSON returned from Glacier containing all archives on the vault).
  • It will create an input file containing only the archive ids extracted from the previous JSON file.
  • Logs will give you a sense of where you are and will contain each archive id deleted.
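
The file names below are the ones the script builds in its Step 1 to Step 3 sections; the function names are hypothetical and only illustrate which file holds what. `reset_vault_state` is a sketch of how to force a fresh inventory-retrieval job, since the script reuses the saved job file whenever it exists.

```shell
# File names as built by the script. Arguments: account id, region, vault
# name and (for the last three) the Glacier job id.
job_file()       { echo "./glacier-inventory-retrieval-job-file-$1-$2-$3.json"; }
status_file()    { echo "./glacier-describe-job-file-$1-$2-$3-$4.json"; }
inventory_file() { echo "./glacier-inventory-output-file-$1-$2-$3-$4.json"; }
id_list_file()   { echo "./glacier-inventory-output-file-$1-$2-$3-$4.txt"; }

# Hypothetical helper: removes the saved job file so the next run starts a
# new inventory-retrieval job instead of reusing the old one.
reset_vault_state() {
    rm -f "$(job_file "$1" "$2" "$3")"
}
```

Because the other three names include the job id, a new job produces new status, inventory and id list files without touching the old ones.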

Notes on Glacier behaviour

Even after you delete all your archives, the AWS Glacier console will still show the vault as if nothing has happened, because the information there is only computed daily by AWS. You will only be able to delete the vault in the console after AWS refreshes its information and shows it as an empty vault. Annoying, but there is nothing I can do about it.
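
The same check can be made from the command line instead of the console. This is a sketch under assumptions, not part of the original script: the helper names are hypothetical, while `aws glacier describe-vault` (whose output includes `NumberOfArchives` as of the last inventory) and `aws glacier delete-vault` are standard AWS CLI commands.

```shell
# Prints the archive count Glacier currently reports for the vault. The
# number comes from the last inventory, so it can lag behind the deletions.
vault_archive_count() {
    aws glacier describe-vault --account-id "$1" --region "$2" --vault-name "$3" \
        --query NumberOfArchives --output text
}

# Hypothetical helper: deletes the vault only once the reported count is 0.
# Returns 2 while Glacier still reports archives.
delete_vault_if_empty() {
    count=$(vault_archive_count "$1" "$2" "$3") || return 1
    if [ "$count" = "0" ]; then
        aws glacier delete-vault --account-id "$1" --region "$2" --vault-name "$3"
    else
        echo "Vault $3 still reports ${count} archives; try again after the next inventory."
        return 2
    fi
}
```

Running `delete_vault_if_empty AWS_ACCOUNT_ID AWS_REGION AWS_GLACIER_VAULT_NAME` once a day until it succeeds avoids having to watch the console.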

Acknowledgements

@aimestereo

aimestereo commented Dec 7, 2020

@dmabamboo, many thanks!

The only issues I had were caused by AWS, though I will mention them:

  • On the first run, after the inventory was retrieved, aws glacier get-job-output ... produced an empty file(*). Timing was not the issue: I tried again many hours later with the same result. After a rerun with a new inventory-retrieval job, all was good. Maybe it's because I tried it on a recently created vault?
  • Assuming the goal is to remove the Glacier vault: after the script finishes its work, we need to run another inventory-retrieval job for Glacier to understand that the vault is now empty.
  • The script doesn't work with an AWS profile, so I had to help it manually with export AWS_PROFILE=<My_AWS_Profile>

(*) first inventory file just makes no sense:

{"VaultARN":"<MyArn>","InventoryDate":"1970-01-01T00:00:00Z","ArchiveList":[]}
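
One way to catch an inventory like this before the delete step is to check whether `ArchiveList` has any entries. This is a sketch only: `inventory_has_archives` is a hypothetical helper, and the cause of the empty first inventory is not established in this thread.

```shell
# Hypothetical helper: succeeds when the inventory JSON file lists at least
# one archive. An empty or missing ArchiveList makes it fail.
inventory_has_archives() {
    jq -e '.ArchiveList | length > 0' "$1" > /dev/null
}
```

In the script, a call such as `inventory_has_archives "${inventory_output_file}" || exit 1` after Step 3 would stop the run instead of reporting zero archives to delete.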

@dmabamboo
Author

Thanks for your feedback @aimestereo
On the first point, you might be on to something about recently created vaults. In my case they were ancient, and everything is batch/delayed when dealing with Glacier. I'll give it a try and see if I run into the same scenario.
On the second point, after a while Glacier does process the fact that the vault is now empty. If this can be achieved by doing another inventory retrieval, it sounds like a great improvement to this script. Do you want to send a snippet so we can add it here?
AWS profile support sounds like a great improvement as well. Do you mind adding a snippet?

Sorry for the late reply, I wasn't using Github for a while.
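
Profile support could be added with an optional fourth parameter. This is a sketch, not something the author has confirmed or added: `use_profile` is a hypothetical helper, while `AWS_PROFILE` is the environment variable the AWS CLI reads to select a named profile.

```shell
# Hypothetical helper: exports AWS_PROFILE when a profile name is given, so
# every later aws call in the same shell uses that profile. With no
# argument it leaves the environment untouched.
use_profile() {
    if [ -n "${1:-}" ]; then
        export AWS_PROFILE="$1"
    fi
}
```

In the script this would be called as `use_profile "$4"` right after the three existing parameter assignments, keeping the current three-parameter usage working unchanged.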

@johandebeurs

Note for others - nproc is not a default command on MacOS. Equivalent is sysctl -n hw.logicalcpu for step 4

@kellyatkinson

kellyatkinson commented Oct 27, 2021

I love this script. Thank you so much for adding this robustness, @dmabamboo .

After getting this going, I added a couple of $now into the echo outputs so I can keep track of the time of the waiting steps. Otherwise I was just counting the numbers of "will try again in 1/2 hour"s. :) It was great to be able to stop the script and restart it with the modification, without feeling like I was losing my progress in the inventory retrieval job.

Thanks also to @johandebeurs for the note on macOS compatibility - I've replaced nproc by the equivalent - works perfectly at the deletion stage.

@vanm

vanm commented May 20, 2022

Thank you @dmabamboo! This was vexing to figure out even what was necessary to clean up old Glacier vaults and your script made it much easier to wrap my head around. Thanks for the MacOS nproc replacement @johandebeurs 🙏

@dmabamboo
Author

> Thank you @dmabamboo! This was vexing to figure out even what was necessary to clean up old Glacier vaults and your script made it much easier to wrap my head around. Thanks for the MacOS nproc replacement @johandebeurs 🙏

Glad it has been useful to you and others. 😊

@woook

woook commented Oct 7, 2022

This is much better than my hacky version - thanks!

@JeffHochberg

JeffHochberg commented Nov 26, 2022

11/27/2022 (Follow-up):
Outside of what I covered below...this script is freaking awesome! I'd been trying to use FastGlacier to delete 5 Glacier vaults and it was not deleting the archives. Once I got past the errors, your script wiped the archives in each vault the first try. Thank you! You just saved me a lot of $$$ - now I don't need to keep paying Amazon for storage that I didn't really care to continue storing.

11/26/2022:
I tried running your script from an Ubuntu 20.04 LTS instance running in Windows Subsystem for Linux 2 on Windows 10. I'm not sure what the issue is, but the checks performed in the beginning to determine whether or not jq and aws-cli are installed are failing, however they are both installed and in the locations the script expects to see them in.

user@castamere:~/github/delete-aws-glacier-vault-archives$ command -v jq
/usr/bin/jq

user@castamere:~/github/delete-aws-glacier-vault-archives$ command -v aws
/usr/local/bin/aws

I ended up commenting out the checks in the beginning and things seem to be running OK. I guess I'll find out shortly! :-)

Also - I'm not sure why the script is complaining about not being able to find the --job-id argument:

aws: error: argument --job-id: expected one argument

Full output of the script thus far...

user@castamere:~/github/delete-aws-glacier-vault-archives$ sh ./delete-aws-glacier-vault-archives.sh 1234567891011 us-east-1 synology_<REDACTED>1
./delete-aws-glacier-vault-archives.sh: 32: [[: not found
./delete-aws-glacier-vault-archives.sh: 32: [[: not found
./delete-aws-glacier-vault-archives.sh: 32: [[: not found
Initiating delete process for the vault.
    Account:216572199283
    Region:us-east-1
    Vault:synology_<REDACTED>1
Starting Step 1/4 - Glacier Inventory Retrieval Job - it's Async and can take hours or days to complete
No previous job file found for this vault.
Starting a new Job.
Job request made.
Checking if the job initiation file is in good shape.
File is OK, jobId=
Starting Step 2/4 - Checking state of the Job to see if it's completed and can have its inventory retrieved for deletion.
File is OK. Job completed?
./delete-aws-glacier-vault-archives.sh: 80: =: not found
Sat Nov 26 03:26:56 EST 2022 Will try again in 1/2 hour...

usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help

aws: error: argument --job-id: expected one argument

{
    "location": "/<REDACTED>/vaults/synology_<REDACTED>1/jobs/Wi_u6uVh2EBWF<REDACTED>Wr-I3YHQVdG0nBU3OC93zXDxc5Omwe2_EXQ",
    "jobId": "Wi_u6uVh2EBWFzXCqd6<REDACTED>U3OC93zXDxc5Omwe2_EXQ"
}
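
The `[[: not found` and `=: not found` errors above are consistent with the script being started with `sh`, which on Ubuntu is dash rather than bash. dash does not support `[[ ]]`, and it reads `&>` as "run in the background, then redirect", which would also explain the failing pre-requisite checks and the empty jobId. A guard like the following sketch (hypothetical helper names, not part of the original script) makes that failure explicit:

```shell
# Hypothetical guard: succeeds only when the current shell is bash, which
# sets BASH_VERSION.
running_in_bash() {
    [ -n "${BASH_VERSION:-}" ]
}

# Hypothetical helper: prints a hint and fails when run by another shell.
require_bash() {
    if ! running_in_bash; then
        echo "Please run this script with bash, for example: bash ./delete-aws-glacier-vault-archives.sh AWS_ACCOUNT_ID AWS_REGION AWS_GLACIER_VAULT_NAME"
        return 1
    fi
}
```

Starting the script as `./delete-aws-glacier-vault-archives.sh ...` or `bash ./delete-aws-glacier-vault-archives.sh ...` lets the `#!/usr/bin/env bash` line or the explicit interpreter pick bash.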

@jubrele

jubrele commented Jan 6, 2023

Thank you for this resource. I am a little stuck on a basic step - exactly where in the script do I add the id, region and vault name? I've tried lines 28-30 but keep getting command not found errors in my log file. Are the three parameters meant to be entered on these lines? If so, can you provide an example of what they should look like (e.g. do I replace the "1" after $1 with the Acct ID?)

If not lines 28-30, do I fill in the parameters everywhere in the script where the generic labels for them appear?
I realize this is a very basic question - I am learning as I go, just trying to delete these darn vaults! Thank you.
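
For readers with the same question: the script is not meant to be edited. `$1`, `$2` and `$3` are positional parameters that the shell fills in from the words typed after the script name on the command line. The sketch below mirrors the script's three assignments inside a hypothetical function, with placeholder values, to show the mapping.

```shell
# Mirrors the three assignments in the script: $1, $2 and $3 are filled in
# by the shell from the words that follow the command name.
read_parameters() {
    account_id=$1
    region=$2
    vault_name=$3
    echo "Account:${account_id} Region:${region} Vault:${vault_name}"
}

# Placeholder values only; use your own account id, region and vault name.
read_parameters 123456789012 us-east-1 my-vault
```

The script itself is run the same way, for example `./delete-aws-glacier-vault-archives.sh 123456789012 us-east-1 my-vault`, with the file left exactly as downloaded.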

@blakenan-bellese

A sincere thank you for sharing this script. Super valuable. Thanks!

@deno825

deno825 commented Jun 2, 2023

+1 Agreed. Thank you for sharing this amazing script!

Note to self: Never, ever, use AWS Glacier again.

@vanm

vanm commented Jun 2, 2023 via email

@wildekek

wildekek commented Jun 4, 2023

Fuckings to glacier, glory to Daniel.

@phwolf01

phwolf01 commented Apr 16, 2024

For anyone like me struggling to get this script to run in a Windows environment: I made a very simple PowerShell version that takes a slightly modified output.json (inventory file) as input and runs through each archive ID, calling the AWS CLI from the command line.
(assuming you are working with a single account ID and vault)

It takes a long time to complete but gets the job done just fine. I'd be happy to share it, just drop me a line if anyone is interested.

Thanks to everyone above for inspiring me to finally break free from the clutches of AWS Glacier.
