@veuncent
Last active July 12, 2024 09:22
Delete all archives in an AWS Vault

AWS Glacier: Delete vault

Follow these steps to remove all archives from an AWS Glacier vault. Once they are all gone, you will be able to delete the vault itself through the browser console.

Step 1 / Retrieve inventory

This creates a job that collects an inventory of the vault, including the ID of every archive in it.

$ aws glacier initiate-job --job-parameters '{"Type": "inventory-retrieval"}' --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME 

This can take hours or even days, depending on the size of the vault. Use the following command to check if it is ready:

$ aws glacier list-jobs --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME

Copy the JobId (including the quotes) for the next step.
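
Rather than re-running list-jobs by hand, the check can be scripted. The sketch below polls `aws glacier describe-job` until the job's StatusCode is no longer InProgress. It rests on assumptions: wait_for_inventory_job and POLL_SECONDS are hypothetical names (not part of the AWS CLI), and the AWS_ACCOUNT_ID, AWS_REGION and AWS_VAULT_NAME variables used in step 3 are expected to be exported already.

```shell
# Hypothetical helper: block until the inventory job finishes.
# Returns 0 when Glacier reports Succeeded, 1 on Failed or on a CLI error.
wait_for_inventory_job() {
    local job_id="$1"
    local status
    while true; do
        # describe-job reports a StatusCode of InProgress, Succeeded or Failed.
        # The "=" form keeps a job ID that starts with a dash from being read as an option.
        status=$(aws glacier describe-job \
            --account-id "${AWS_ACCOUNT_ID}" \
            --region "${AWS_REGION}" \
            --vault-name "${AWS_VAULT_NAME}" \
            --job-id="${job_id}" \
            --query 'StatusCode' --output text) || return 1
        case "${status}" in
            Succeeded) return 0 ;;
            Failed)    return 1 ;;
        esac
        # POLL_SECONDS is an assumed knob; 15 minutes is an arbitrary default.
        sleep "${POLL_SECONDS:-900}"
    done
}
```

Usage, once the variables are exported: `wait_for_inventory_job YOUR_JOB_ID && echo "Inventory is ready"`. Here the job ID is passed without the surrounding quotes.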

Step 2 / Get the archive IDs

The following command will result in a file listing all archive IDs, required for step 3.

$ aws glacier get-job-output --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME --job-id YOUR_JOB_ID ./output.json
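
Before deleting anything, it can be worth checking what the inventory actually contains. A minimal sketch, assuming jq is installed and that output.json has the usual inventory layout (an InventoryDate field plus an ArchiveList array whose entries carry ArchiveId and Size); summarize_inventory is a hypothetical helper name:

```shell
# Hypothetical helper: print the inventory date, the number of archives
# and their combined size for a Glacier inventory file.
summarize_inventory() {
    local file="${1:-./output.json}"
    # "add // 0" falls back to 0 when the archive list is empty.
    jq -r '"inventory date: \(.InventoryDate)",
           "archives: \(.ArchiveList | length)",
           "total bytes: \([.ArchiveList[].Size] | add // 0)"' "${file}"
}
```

Running `summarize_inventory ./output.json` prints three lines. The archive count gives a rough idea of how many delete calls step 3 will make.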

Step 3 / Delete archives

Set the following parameters through environment variables:

export AWS_ACCOUNT_ID=YOUR_ACCOUNT_ID
export AWS_REGION=YOUR_REGION
export AWS_VAULT_NAME=YOUR_VAULT_NAME

Create a file with the following content and run it:

#!/bin/bash

# Inventory file written by get-job-output in step 2
file='./output.json'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
	echo "Please set the following environment variables: "
	echo "AWS_ACCOUNT_ID"
	echo "AWS_REGION"
	echo "AWS_VAULT_NAME"
	exit 1
fi

# -r prints the raw IDs, without the surrounding JSON quotes
archive_ids=$(jq -r '.ArchiveList[].ArchiveId' < "${file}")

for archive_id in ${archive_ids}; do
    echo "Deleting Archive: ${archive_id}"
    # Keep the "=" form: an archive ID may start with a dash
    aws glacier delete-archive --archive-id="${archive_id}" --vault-name "${AWS_VAULT_NAME}" --account-id "${AWS_ACCOUNT_ID}" --region "${AWS_REGION}"
done

echo "Finished deleting archives"
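
The loop above issues one CLI call at a time, which can be slow for large vaults. As a hedged alternative, the sketch below runs several delete calls concurrently with xargs -P. It assumes jq and an xargs that supports -P and -I, and that the three environment variables above are exported; delete_archives_parallel and its default of 4 workers are hypothetical choices, not something the original script provides.

```shell
# Hypothetical helper: delete every archive listed in an inventory file,
# running up to "workers" aws processes at the same time.
delete_archives_parallel() {
    local file="${1:-./output.json}"
    local workers="${2:-4}"
    # One raw archive ID per line; xargs substitutes each ID for {}.
    # The "=" form protects IDs that start with a dash.
    jq -r '.ArchiveList[].ArchiveId' "${file}" |
        xargs -P "${workers}" -I '{}' \
            aws glacier delete-archive --archive-id='{}' \
                --vault-name "${AWS_VAULT_NAME}" \
                --account-id "${AWS_ACCOUNT_ID}" \
                --region "${AWS_REGION}"
}
```

Usage: `delete_archives_parallel ./output.json 10`. The function's exit status is that of xargs, which is non-zero when a delete call fails, so a failed run can be repeated against the same inventory file.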

Acknowledgement

This tutorial is based on this one: https://gist.github.com/Remiii/507f500b5c4e801e4ddc

@oddly-fixated

> It's really quick in comparison with removing each archive one-by-one

It still has to iterate through a list of archive IDs from the inventory JSON blob and, without a retry_mode, parallel deletion calls could silently hit a Glacier rate limit (which can bring deletion TPS as low as 15).

@aivus

aivus commented Mar 6, 2024

@oddly-fixated I can see the logic which handles this:
https://github.com/leeroybrun/glacier-vault-remove/blob/2feb4accd12faab976a9d6bd59f121e7a195c3e7/removeVault.py#L29-L43

I just ran a removal of 100k archives with 10 parallel requests and didn't hit any limits. The removal finished in 20 minutes.

@oddly-fixated

> I can see the logic which handles this

You're right @aivus, but that doesn't adapt client behaviour based upon the API state - it's a try/fail/retry (which is helpful of course but can be bettered).

> I just ran a removal of 100k

Using the CLI, you'd average about one delete operation a second so clearly any Boto3 script is an improvement.

When you're processing 10 vaults at a time, each with +/- 100M archives, you'll find retry_mode = adaptive is worth your consideration.

Glacier's behaviour under load can be a little unpredictable. Granting more client-side retry behaviour is very useful. :-)
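
For the CLI-based script in this gist, the retry mode mentioned above can be supplied through environment variables instead of a config file. A minimal sketch, assuming an AWS CLI release that supports retry modes; the attempt count of 10 is an arbitrary illustration, not a recommendation from this thread:

```shell
# Assumption: the installed AWS CLI honours these two variables.
# "adaptive" adds client-side rate limiting on top of ordinary retries.
export AWS_RETRY_MODE=adaptive
# Upper bound on attempts per API call (hypothetical value).
export AWS_MAX_ATTEMPTS=10
```

Export these in the same shell before running the deletion script so that every `aws glacier delete-archive` call picks them up.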

@marcodpt

marcodpt commented Jun 6, 2024

This was very helpful. Anyone else feel like a Glacier vault is the digital equivalent of Hotel California...

They gathered for the feast
They stab it with their steely knives
But they just can't kill the beast

Relax, said the night man
We are programmed to receive
You can check out any time you like
But you can never leave
