@veuncent
Last active July 12, 2024 09:22
Delete all archives in an AWS Vault

AWS Glacier: Delete vault

Follow these steps to remove all archives from an AWS Glacier vault. Once that is done, you will be able to delete the vault itself through the browser console.

Step 1 / Retrieve inventory

This will create a job that collects required information about the vault.

$ aws glacier initiate-job --job-parameters '{"Type": "inventory-retrieval"}' --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME 

This can take hours or even days, depending on the size of the vault. Use the following command to check if it is ready:

$ aws glacier list-jobs --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME

Copy the JobId (including the quotes) for the next step.
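Rather than re-running list-jobs by hand, you can poll until the job reports completion. Below is a minimal sketch, assuming bash: wait_for_job, MAX_TRIES and POLL_INTERVAL are made-up names, not AWS settings, and the command string you pass must print true once the job is done (describe-job's Completed field, extracted with jq, works for this).

```shell
#!/bin/bash
# Hypothetical polling helper: evaluate a "check" command repeatedly
# until it prints "true", sleeping between attempts.
wait_for_job() {
  local check_cmd=$1
  local tries=0
  until [ "$(eval "$check_cmd")" = "true" ]; do
    tries=$((tries + 1))
    if [ "$tries" -ge "${MAX_TRIES:-1000}" ]; then
      echo "Gave up waiting" >&2
      return 1
    fi
    sleep "${POLL_INTERVAL:-300}"  # inventory jobs usually take hours, so poll gently
  done
  echo "Job complete"
}
```

Called, for example, as wait_for_job 'aws glacier describe-job --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME --job-id YOUR_JOB_ID | jq -r .Completed'.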

Step 2 / Get the ArchivesIds

The following command will result in a file listing all archive IDs, required for step 3.

$ aws glacier get-job-output --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME --job-id YOUR_JOB_ID ./output.json
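Before deleting anything, it's worth a sanity check that the output actually contains archive entries. Here is a crude count that needs only grep and wc (each archive record carries one "ArchiveId" key; the function name is mine):

```shell
#!/bin/bash
# Rough sanity check: count archive entries in the inventory file by
# counting "ArchiveId" keys. tr strips the padding BSD wc adds.
count_archives() {
  grep -o '"ArchiveId"' "$1" | wc -l | tr -d ' '
}

if [ -f ./output.json ]; then
  count_archives ./output.json
fi
```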

Step 3 / Delete archives

Set the following parameters through environment variables:

export AWS_ACCOUNT_ID=YOUR_ACCOUNT_ID
export AWS_REGION=YOUR_REGION
export AWS_VAULT_NAME=YOUR_VAULT_NAME

Create a file with the following content and run it:

#!/bin/bash

file='./output.json'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
	echo "Please set the following environment variables: "
	echo "AWS_ACCOUNT_ID"
	echo "AWS_REGION"
	echo "AWS_VAULT_NAME"
	exit 1
fi

archive_ids=$(jq -r '.ArchiveList[].ArchiveId' < "$file")

for archive_id in ${archive_ids}; do
    echo "Deleting Archive: ${archive_id}"
    aws glacier delete-archive --archive-id="${archive_id}" --vault-name "${AWS_VAULT_NAME}" --account-id "${AWS_ACCOUNT_ID}" --region "${AWS_REGION}"
done
done

echo "Finished deleting archives"

Acknowledgement

This tutorial is based on this one: https://gist.github.com/Remiii/507f500b5c4e801e4ddc

@marshalleq

marshalleq commented Jan 7, 2022

Me too on the 'got 0' output. Weird.

Edit - You've got to install jq and in my case (Mac) had to rename the downloaded binary to jq and add it to the path. https://stedolan.github.io/jq/download/

I tried a few of the linked items above, but came back to this one as it's the simplest. Even so, I still don't know if it's working - I suspect Glacier has to update the inventory again before I know, which I guess I need to trigger. Will report back for others like me who are not developers, trying to figure out this nightmare. Never again Glacier, never again!

Reporting Back
Yeah, so once deleted you have to wait overnight and then check the next day for the deleted files to show up in the GUI - for me, one vault was able to be deleted straight away and the other had only a few files left. So I'm in the midst of running the process again for the vault with some files remaining, but it looks like that'll do it! 6-7 years of not being able to get rid of these files lol. They make it so easy to put them in!

@subscribe88

Thank you all, I am now on my way to deleting my ghost vaults. Yes, it does feel like Hotel California unfortunately ...

@jfprieur

jfprieur commented Jul 8, 2022

Thank you so much for this, Synology backup created over 200K archives when I backed up 8TB of data, was contemplating my next week of copy pasting archive IDs until I found your script!

@Islanoobar

Islanoobar commented Jul 22, 2022

I am still getting Getting archive ids from ./output.json...got 0 with this in Ubuntu. I have tried several of the latter scripts.

I have jq installed, configured, in path and can run simple commands that use jq, so I know that is working. I have regenerated the output.json as well

The scripts also work fine in AWS CloudShell, but with a 120 MB output.json file the 20-minute timeout there rules it out as an option.

Desperate to get these 270k archives gone.

(Edit - the first script is working, slowly, but working)

@david-montgomery

If the completed ids file './output-archive-ids.txt' (or whatever you've set id_file to) already exists, it won't read the input file and you'll see "got 0":

You have to delete or rename that file each time you run it.
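In other words, the reset between runs (using the completed-ids file name from that script variant) is just:

```shell
#!/bin/bash
# Remove the completed-ids file so the next run re-reads the full inventory.
# -f makes this safe to run even when the file is already gone.
rm -f ./output-archive-ids.txt
```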

@Islanoobar

Thanks much!

@mackaaij

Thanks for this! I changed $( nproc ) into 1 on my Macbook, otherwise the script would consume too much CPU (doesn't recognise nproc).

Also, I had a few network hiccups - the script missed some - so I created another job for the remaining archives to delete. These IDs get appended to id_file by default. So either (re-)create this file in the script or remove output-archive-ids.txt before the next run :)

@simplycloud

Thanks for sharing this, super helpful. I was able to get $( nproc ) working on MacOS by installing coreutils (e.g. brew install coreutils). May be helpful for others running a large archive deletion on a Macbook.
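A portable sketch that covers both cases without installing anything: use nproc where it exists, fall back to sysctl -n hw.ncpu (the stock macOS equivalent), and default to 1 worker otherwise.

```shell
#!/bin/bash
# Portable worker count: coreutils nproc on Linux, sysctl on macOS/BSD,
# and a conservative fallback of 1 if neither command is available.
workers=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
echo "$workers"
```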

@achekhirov

Good script, thanks a lot!

@hiway

hiway commented May 8, 2023

Thank you!

@ibrohimislam

ibrohimislam commented Aug 30, 2023

Here is a script that uses xargs with 8 parallel processes:

#!/bin/bash

file='./output.json'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
        echo "Please set the following environment variables: "
        echo "AWS_ACCOUNT_ID"
        echo "AWS_REGION"
        echo "AWS_VAULT_NAME"
        exit 1
fi

jq -r .ArchiveList[].ArchiveId < $file | xargs -P8 -n1 bash -c "echo \"Deleting: \$1\"; aws glacier delete-archive --archive-id=\$1 --vault-name ${AWS_VAULT_NAME} --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION}" {}
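If the nested quoting makes you nervous, the same fan-out can be dry-run by swapping the aws call for a plain echo - nothing is deleted, and the -P parallelism is exercised unchanged. The trailing sort only makes the interleaved output deterministic, and the _ placeholder fills $0 the way {} does above.

```shell
#!/bin/bash
# Dry run of the xargs fan-out: echo instead of aws, so nothing is deleted.
printf '%s\n' id-1 id-2 id-3 \
  | xargs -P8 -n1 bash -c 'echo "Deleting: $1"' _ \
  | sort
```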

@oddly-fixated

Unfortunately, the aws CLI client is inefficient because of the overhead of client creation.

After experimenting with the aws CLI client and the completely awesome GNU Parallel, I switched to Python.

Yes, yes, I could refactor this in Go and deploy it to a fleet of Kubernetes-orchestrated services but sometimes a hack is just enough.

In CLI examples, I tend to follow the O'Reilly Style Guide in case you're unsure about \ to break lines and a leading > for $PS2.

I'll have to assume you know your way around the *nix command line and the vagaries of AWS, Python, pip and module installation. If you don't, stop here and RTFM before you shoot yourself and your colleagues in the foot, face and backside.

  • Note, the BSD 4 Clause "attribution" license applies to the example script below
    • The example is abridged for copy/pasta purposes
    • The real script is more complicated and has extensive exception handling
      • It's bundled with a Systemd service that quietly polls for work, deletes stuff and sends reports to Slack
    • I set S3's max_concurrent_requests = 2 because I use t3.nano worker instances to delete archives
      • The Glacier inventories are held in S3, so need to be downloaded first
      • Setting max_concurrent_requests ensures the s3 cp succeeds - YMMV
  • For reasons, I modify ~/.aws/config rather than set client config in the script
    • You may choose to do otherwise - again, YMMV

Here's an example of one way to create a new ~/.aws/config:

$ cat > ~/.aws/config <<EOF
[default]
s3 =
    max_concurrent_requests = 2
region = <your region>
max_attempts = 20
retry_mode = adaptive
cli_pager =
EOF

If you want to better understand max_attempts and retry_mode, then the AWS documentation is reasonable. I needed to change the default behaviour for reasons too. You may decide this is not needed, but I am managing hundreds of vaults - each with millions of archives and some interesting retention requirements.

You'll need pip to install boto3, and jq (as above) to stream the JSON inventory blob to the script - it reads the archive IDs from STDIN.

The script accepts two arguments:

  1. The name of your Glacier vault from which the archive ID JSON blob was generated
  2. The name of the file to which the script will write its logs

You're responsible for your own AWS MFA, auth, role, keys, etc.

Anyhow, try this to call the script:

$ jq -r --stream ". | { (.[0][2]): .[1]} | select(.ArchiveId) | .ArchiveId" my-blob.json 2>/dev/null \
> | ./my-script.py my-vault ./my-vault-nuke.log

The log's date string is generated from %s because logging per epoch second lets you do trivial arithmetic - for example, runtime is just the difference between the first and last deleted-archive messages.

As any fule kno, %s is seconds since 1970-01-01 00:00:00 UTC.

If you're really bored, you can create a histogram showing deletions per second - this helps visualise API behaviour.
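That histogram is a one-liner over the log: with the %s datefmt, every line starts with the epoch second, so cut | sort | uniq -c does the counting (deletion_histogram is a made-up name; it prints "<epoch> <count>" per line):

```shell
#!/bin/bash
# Deletions-per-second histogram, assuming log lines shaped like
# "1700000000 - INFO - deleted archive <id>".
deletion_histogram() {
  grep 'deleted archive' "$1" | cut -d' ' -f1 | sort | uniq -c | awk '{print $2, $1}'
}
```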

Here's the script:

#!/usr/bin/env python

# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in
#    the documentation and/or other materials provided with the
#    distribution.
#
# 3. Neither the name of the copyright holder nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# 4. Redistributions of any form whatsoever must retain the following
#    acknowledgment: 'This product includes software developed by:
#    Oddly Fixated - https://github.com/oddly-fixated'
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
# TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import boto3
from botocore.exceptions import ClientError
import logging
import sys

def delete_glacier_archives(my_vault):
    glacier = boto3.client('glacier')

    # delete all the things streamed from STDIN
    for my_archive_id in sys.stdin:
        try:
            my_archive_id = my_archive_id.strip()
            logging.info('deleting archive %s',my_archive_id)
            response = glacier.delete_archive(
                vaultName=my_vault,
                archiveId=my_archive_id
            )
            # success
            logging.info('deleted archive %s', my_archive_id)
        # failure - ClientError
        except ClientError as e:
            logging.warning('client error')
            logging.warning('api response - %s', e)
        # failure - Exception
        except Exception:
            logging.warning('generic exception')

if __name__ == "__main__":

    # script name
    my_basename = sys.argv[0]

    # require both arguments: vault name and logfile
    if len(sys.argv) < 3:
        print(f"Usage: {my_basename} <my_vault> <logfile>")
        sys.exit(1)

    # vault name is the first argument
    my_vault = sys.argv[1]

    # logfile is the second argument
    my_log = sys.argv[2]

    # epoch seconds is good enough here
    logging.basicConfig(
            filename=my_log,
            format='%(asctime)s - %(levelname)s - %(message)s',
            datefmt='%s',level=logging.INFO)

    logging.info('vault name is %s', my_vault)

    # call the delete function
    delete_glacier_archives(my_vault)

Hopefully:

  • I have not introduced too many typos in hacking together the abridged script
  • You may find the script useful

@ivanhoe011

Step 1.1) You can check the status of the running jobs by:
$ aws glacier describe-job --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME --job-id JOB_ID_FROM_INITIATE_JOB_OUTPUT

@sethrj

sethrj commented Mar 2, 2024

It took me an hour to figure out that the reason my command failed:

$ aws glacier list-vaults --account-id xxxx-yyyy-zzzz

An error occurred (AccessDeniedException) when calling the ListVaults operation: User: arn:aws:iam::xxxxyyyyzzzz:user/foo is not authorized to perform: glacier:ListVaults on resource: arn:aws:glacier:us-east-1:xxxx-yyyy-zzzz:vaults/

is not because of a missing permission but because the account ID must not contain the - character 🤪

$ aws glacier list-vaults --account-id xxxxyyyyzzzz
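Bash parameter expansion will normalise the ID for you; the value here is a placeholder, not a real account ID:

```shell
#!/bin/bash
# Strip every "-" from a formatted account ID before passing it to --account-id.
raw_id='xxxx-yyyy-zzzz'   # placeholder
account_id=${raw_id//-/}
echo "$account_id"        # -> xxxxyyyyzzzz
```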

@aivus

aivus commented Mar 6, 2024

I would recommend using https://github.com/leeroybrun/glacier-vault-remove

It's really quick in comparison with removing each archive one-by-one

@oddly-fixated

It's really quick in comparison with removing each archive one-by-one

It still has to iterate through a list of archive IDs from the inventory JSON blob and, without a retry_mode, parallel deletion calls could silently hit a Glacier rate limit (which can bring deletion TPS as low as 15).
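For the CLI scripts above, a rough client-side stand-in (plain exponential backoff, nothing as clever as adaptive mode; with_backoff and BACKOFF_BASE are made-up names) looks like:

```shell
#!/bin/bash
# Retry a command with doubling sleeps: with_backoff <max_attempts> <cmd...>
with_backoff() {
  local max=$1; shift
  local attempt=1
  local delay=${BACKOFF_BASE:-1}
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

Used as, say, with_backoff 5 aws glacier delete-archive ... - each failed call waits twice as long as the last before retrying.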

@aivus

aivus commented Mar 6, 2024

@oddly-fixated I can see the logic which handles this:
https://github.com/leeroybrun/glacier-vault-remove/blob/2feb4accd12faab976a9d6bd59f121e7a195c3e7/removeVault.py#L29-L43

I just ran a removal of 100k archives with 10 parallel requests and didn't hit any limits. Removal was done in 20 mins.

@oddly-fixated

I can see the logic which handles this

You're right @aivus, but that doesn't adapt client behaviour based upon the API state - it's a try/fail/retry (which is helpful of course but can be bettered).

I just ran a removal of 100k

Using the CLI, you'd average about one delete operation a second so clearly any Boto3 script is an improvement.

When you're processing 10 vaults at a time, each with +/- 100M archives, you'll find retry_mode = adaptive is worth your consideration.

Glacier's behaviour under load can be a little unpredictable. Granting more client-side retry behaviour is very useful. :-)

@marcodpt

marcodpt commented Jun 6, 2024

This was very helpful. Anyone else feel like a Glacier vault is the digital equivalent of Hotel California...

They gathered for the feast
They stab it with their steely knives
But they just can't kill the beast

Relax, said the night man
We are programmed to receive
You can check out any time you like
But you can never leave
