Follow these steps to remove all archives from an AWS Glacier vault. Once all archives are deleted, you will be able to delete the vault itself through the browser console.
This will create a job that collects the required inventory information about the vault.
$ aws glacier initiate-job --job-parameters '{"Type": "inventory-retrieval"}' --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME
This can take hours or even days, depending on the size of the vault. Use the following command to check if it is ready:
$ aws glacier list-jobs --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME
Copy the `JobId` (including the quotes) for the next step.
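If `jq` is installed (step 3 relies on it anyway), you can pick out the completed job directly. The sample response below is illustrative, not real output:

```shell
# Illustrative list-jobs response, trimmed to the relevant fields
cat > /tmp/jobs.json <<'EOF'
{"JobList": [{"JobId": "EXAMPLE_JOB_ID", "Completed": true, "StatusCode": "Succeeded"}]}
EOF
# Print the JobId of any completed job, quotes included
jq '.JobList[] | select(.Completed) | .JobId' /tmp/jobs.json
```

In practice you would pipe the `aws glacier list-jobs` output into that same `jq` filter instead of reading it from a file.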
The following command will write a file listing all archive IDs, which is required for step 3.
$ aws glacier get-job-output --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME --job-id YOUR_JOB_ID ./output.json
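Before deleting anything, it's worth sanity-checking the inventory - for example by counting the archives (assumes `jq`; the sample file below is a trimmed illustration of the real `output.json` shape):

```shell
# Illustrative inventory in the same shape as Glacier's output.json
cat > /tmp/output.json <<'EOF'
{"ArchiveList": [{"ArchiveId": "id-1"}, {"ArchiveId": "id-2"}]}
EOF
# Number of archives the deletion loop will have to work through
jq '.ArchiveList | length' /tmp/output.json
```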
Set the following parameters through environment variables:
export AWS_ACCOUNT_ID=YOUR_ACCOUNT_ID
export AWS_REGION=YOUR_REGION
export AWS_VAULT_NAME=YOUR_VAULT_NAME
Create a file with the following content and run it:
#!/bin/bash
file='./output.json'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
    echo "Please set the following environment variables:"
    echo "AWS_ACCOUNT_ID"
    echo "AWS_REGION"
    echo "AWS_VAULT_NAME"
    exit 1
fi

# -r emits the IDs without surrounding quotes so they can be passed
# straight to the CLI
archive_ids=$(jq -r '.ArchiveList[].ArchiveId' < "${file}")

for archive_id in ${archive_ids}; do
    echo "Deleting archive: ${archive_id}"
    aws glacier delete-archive \
        --archive-id="${archive_id}" \
        --vault-name "${AWS_VAULT_NAME}" \
        --account-id "${AWS_ACCOUNT_ID}" \
        --region "${AWS_REGION}"
done

echo "Finished deleting archives"
This tutorial is based on this one: https://gist.github.com/Remiii/507f500b5c4e801e4ddc
Unfortunately, the `aws` CLI client is inefficient because of the overhead of client creation. After experimenting with the `aws` CLI client and the completely awesome GNU Parallel, I switched to Python. Yes, yes, I could refactor this in Go and deploy it to a fleet of Kubernetes-orchestrated services, but sometimes a hack is just enough.
In CLI examples, I tend to follow the O'Reilly Style Guide, in case you're unsure about `\` to break lines and a leading `>` for `$PS2`.

I'll have to assume you know your way around the *nix command line and the vagaries of AWS, Python, `pip` and module installation. If you don't, stop here and RTFM before you shoot yourself and your colleagues in the foot, face and backside.

A few notes on my configuration:

- `max_concurrent_requests = 2` because I use `t3.nano` worker instances to delete archives
- `max_concurrent_requests` ensures the `s3 cp` succeeds - YMMV
- I set this in `~/.aws/config` rather than setting client config in the script

Here's an example way to create a new `~/.aws/config`:

If you want to better understand `max_attempts` and `retry_mode`, then the AWS documentation is reasonable. I needed to change the default behaviour for reasons too. You may decide this is not needed, but I am managing hundreds of vaults - each with millions of archives and some interesting retention requirements.

You'll need to use `pip` to install `boto3`, and `jq` (as above) to stream the JSON inventory blob to the script - it reads the archive IDs from STDIN. The script accepts two arguments:
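The `~/.aws/config` example promised earlier didn't survive in this copy, so the following is only a sketch - the profile name and every value shown are assumptions, not the author's actual settings:

```ini
[default]
region = YOUR_REGION
# throttle S3 transfers for small worker instances
s3 =
    max_concurrent_requests = 2
# retry behaviour, as discussed above
retry_mode = standard
max_attempts = 10
```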
You're responsible for your own AWS MFA, auth, role, keys, etc.
Anyhow, try this to call the script:
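The original invocation was also lost here; this sketch assumes the script is saved as `delete_archives.py` and takes the vault name and region - both assumptions, not the author's actual interface:

```
# Hypothetical invocation: stream the archive IDs to the script on STDIN
jq -r '.ArchiveList[].ArchiveId' output.json | \
    python3 delete_archives.py YOUR_VAULT_NAME YOUR_REGION
```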
The log's date string is generated from `%s` because logging things per epoch second allows you to do trivial mathematics if you want to calculate runtime (the difference between the first `deleted archive` message and the last `deleted archive` message) and so on. As any fule kno, `%s` is seconds since 1970-01-01 00:00:00 UTC. If you're really bored, you can create a histogram showing deletions per second - this helps visualise API behaviour.
Here's the script:
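The script itself didn't make it into this copy. What follows is only a sketch of the approach described above (shared `boto3` client, thread pool, archive IDs on STDIN, epoch-second logging) - the structure, names, and two-argument interface are my assumptions, not the author's code:

```python
#!/usr/bin/env python3
"""Sketch of a threaded Glacier archive deleter.

NOT the original script: the two-argument interface
(vault name, region) and all names here are assumptions.
"""
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def read_archive_ids(stream):
    # jq -r streams one archive ID per line on STDIN
    return [line.strip() for line in stream if line.strip()]


def delete_archive(client, vault_name, archive_id):
    # accountId defaults to "-", i.e. the credential owner's account
    client.delete_archive(vaultName=vault_name, archiveId=archive_id)
    # %s-style epoch-second logging: runtime is last minus first
    print(f"{int(time.time())} deleted archive {archive_id}")


def main(vault_name, region):
    import boto3  # deferred so the helpers above work without boto3 installed

    # One shared client avoids the per-call client-creation overhead that
    # makes the aws CLI slow; boto3 clients are thread-safe
    client = boto3.client("glacier", region_name=region)
    with ThreadPoolExecutor(max_workers=8) as pool:
        for archive_id in read_archive_ids(sys.stdin):
            pool.submit(delete_archive, client, vault_name, archive_id)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```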