@veuncent
Last active December 4, 2023 07:29
Delete all archives in an AWS Vault

AWS Glacier: Delete vault

Follow these steps to remove all archives from an AWS vault. After this is finished, you will be able to delete the vault itself through the browser console.

Step 1 / Retrieve inventory

This will create a job that collects required information about the vault.

$ aws glacier initiate-job --job-parameters '{"Type": "inventory-retrieval"}' --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME 

This can take hours or even days, depending on the size of the vault. Use the following command to check if it is ready:

$ aws glacier list-jobs --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME

Copy the JobId (including the quotes) for the next step.
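
Rather than re-running list-jobs by hand, the check can be left to a small polling loop. This is a minimal sketch, not part of the original steps: wait_for_job and POLL_SECONDS are names made up for this example, and it assumes the three AWS_* variables from step 3 are already exported. It uses aws glacier describe-job with --query to read the job's Completed flag.

```shell
#!/usr/bin/env bash
# Sketch: poll an inventory-retrieval job until Glacier reports it complete.
# Assumes AWS_ACCOUNT_ID, AWS_REGION and AWS_VAULT_NAME are exported.
# wait_for_job and POLL_SECONDS are hypothetical names for this example.

wait_for_job() {
  local job_id="$1"
  local interval="${POLL_SECONDS:-900}"   # default: check every 15 minutes
  local completed
  while true; do
    # --output text prints the boolean Completed field as True or False
    completed=$(aws glacier describe-job \
      --account-id "$AWS_ACCOUNT_ID" --region "$AWS_REGION" \
      --vault-name "$AWS_VAULT_NAME" --job-id "$job_id" \
      --query 'Completed' --output text) || return 1
    case "$completed" in
      True|true) echo "Job $job_id is complete"; return 0 ;;
    esac
    echo "Job $job_id not ready at $(date), retrying in ${interval}s"
    sleep "$interval"
  done
}

# Usage: wait_for_job YOUR_JOB_ID
```

Since the job can take hours or days, a long interval is a reasonable default.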

Step 2 / Get the ArchiveIds

The following command will result in a file listing all archive IDs, required for step 3.

$ aws glacier get-job-output --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME --job-id YOUR_JOB_ID ./output.json
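
Before moving on, it may help to confirm that the file really contains archives. The step 3 script reads IDs from a top-level ArchiveList array, so counting its entries is a quick sanity check. A small sketch, assuming jq is installed; count_archives is a name invented here. For very large inventories this loads the whole file into memory, so the streaming approach discussed in the comments may be the better choice.

```shell
#!/usr/bin/env bash
# Sketch: count the archives listed in the inventory file from step 2.
# count_archives is a hypothetical helper name; jq must be on the PATH.

count_archives() {
  # Prints the number of entries in the inventory's ArchiveList array.
  jq '.ArchiveList | length' "$1"
}

# Usage: count_archives ./output.json
```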

Step 3 / Delete archives

Set the following parameters through environment variables:

export AWS_ACCOUNT_ID=YOUR_ACCOUNT_ID
export AWS_REGION=YOUR_REGION
export AWS_VAULT_NAME=YOUR_VAULT_NAME

Create a file with the following content and run it:

#!/bin/bash

file='./output.json'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
	echo "Please set the following environment variables: "
	echo "AWS_ACCOUNT_ID"
	echo "AWS_REGION"
	echo "AWS_VAULT_NAME"
	exit 1
fi

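# -r prints the raw IDs without the surrounding JSON quotes
archive_ids=$(jq -r '.ArchiveList[].ArchiveId' < "$file")

for archive_id in ${archive_ids}; do
    echo "Deleting Archive: ${archive_id}"
    # The --archive-id=VALUE form keeps IDs that begin with a dash from being parsed as options
    aws glacier delete-archive --archive-id="${archive_id}" --vault-name "${AWS_VAULT_NAME}" --account-id "${AWS_ACCOUNT_ID}" --region "${AWS_REGION}"
done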

echo "Finished deleting archives"

Acknowledgement

This tutorial is based on this one: https://gist.github.com/Remiii/507f500b5c4e801e4ddc

@cwilper

cwilper commented May 1, 2019

Thanks for sharing this, @veuncent

Here's a tweaked version of the script that processes in a stream (lower memory requirement for huge vaults), gives counts and timestamps, incorporates @joel1di1's fix, and uses AWS_PROFILE, if defined.

#!/usr/bin/env bash

file='./output.json'
id_file='./output-archive-ids.txt'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
        echo "Please set the following environment variables: "
        echo "AWS_ACCOUNT_ID"
        echo "AWS_REGION"
        echo "AWS_VAULT_NAME"
        exit 1
fi

echo "Started at $(date)"

echo -n "Getting archive ids from $file..."
if [[ ! -f $id_file ]]; then
  cat $file | jq -r --stream ". | { (.[0][2]): .[1]} | select(.ArchiveId) | .ArchiveId" > $id_file 2> /dev/null
fi
total=$(wc -l $id_file | awk '{print $1}')
echo "got $total"

num=0
while read -r archive_id; do
  num=$((num+1))
  echo "Deleting archive $num/$total at $(date)"
  if [[ $AWS_PROFILE ]]; then
    aws --profile $AWS_PROFILE glacier delete-archive --archive-id=${archive_id} --vault-name ${AWS_VAULT_NAME} --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION}
  else
    aws glacier delete-archive --archive-id=${archive_id} --vault-name ${AWS_VAULT_NAME} --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION}
  fi
done < "$id_file"

echo "Finished at $(date)"
echo "Deleted archive ids are in $id_file"

I'd recommend naming it delete-archives.sh and running it in the background on a machine that's going to be on a network for a long time, e.g.:

chmod 755 delete-archives.sh
nohup ./delete-archives.sh > delete-archives.log 2>&1 &
tail -f delete-archives.log

@veuncent
Author

veuncent commented May 1, 2019

Nice, thanks for sharing @cwilper !

@veuncent
Author

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_ACCOUNT_ID} ]]; then

Should be:
if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then

Thanks @joel1di1 ! I updated the script. (sorry for the late reply, I didn't get a notification when you posted this)

@m-leishman

Hi @cwilper
I'm not getting anything back from the output.json file. There are archive IDs in there, but the script prints "Getting archive ids from ./output.json...got 0". Any thoughts?

@goodspeedal

goodspeedal commented Sep 10, 2019

Thank you @cwilper, your script let us delete a 128 TB vault (the inventory JSON was 24 GB).

Update:
My 1 GB CentOS VM estimates 28 months to delete all the files (about 1.5 seconds per deletion).
It would be great if anyone has a multi-threaded script :-)
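
An estimate like this is just archive count times seconds per deletion, divided by the number of parallel workers. Here is a rough sketch of that arithmetic. estimate_hours is a made-up helper, the archive count of one million is purely illustrative, and the division by workers assumes throughput scales linearly, which real runs may not achieve. Only the 1.5 seconds per deletion comes from the comment above.

```shell
#!/usr/bin/env bash
# Sketch: rough runtime estimate for a delete run.
# estimate_hours is a hypothetical helper, not part of any script in this thread.
# Arguments: archive count, milliseconds per deletion, parallel workers (default 1).

estimate_hours() {
  local count="$1" ms_per_delete="$2" workers="${3:-1}"
  # Integer arithmetic: total seconds, assuming perfect scaling across workers.
  local seconds=$(( count * ms_per_delete / 1000 / workers ))
  # Whole hours, rounded down.
  echo $(( seconds / 3600 ))
}

estimate_hours 1000000 1500 1   # prints 416 (one worker)
estimate_hours 1000000 1500 4   # prints 104 (four workers)
```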

@goodspeedal

goodspeedal commented Sep 25, 2019

I have finally combined the multi-threading idea with @cwilper's script; I hope someone finds it useful. The best part is that it can use all of the machine's RAM (as little as 2 GB) and every CPU thread to carry out the task without crashing the machine.

#!/usr/bin/env bash

file='./output.json'
id_file='./output-archive-ids.txt'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
        echo "Please set the following environment variables: "
        echo "AWS_ACCOUNT_ID"
        echo "AWS_REGION"
        echo "AWS_VAULT_NAME"
        exit 1
fi

echo "Started at $(date)"

echo -n "Getting archive ids from $file..."
if [[ ! -f $id_file ]]; then
  cat $file | jq -r --stream ". | { (.[0][2]): .[1]} | select(.ArchiveId) | .ArchiveId" > $id_file 2> /dev/null
fi
total=$(wc -l $id_file | awk '{print $1}')
echo "got $total"

num=0
while read -r archive_id; do
  num=$((num+1))
  echo "Deleting archive $num/$total at $(date)"
  aws glacier delete-archive --archive-id=${archive_id} --vault-name ${AWS_VAULT_NAME} --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION} &
  [ $( jobs | wc -l ) -ge $( nproc ) ] && wait
done < "$id_file"

wait
echo "Finished at $(date)"
echo "Deleted archive ids are in $id_file"

You are free to replace "$( nproc )" with the number of parallel jobs you want. In our tests, the script as written already picks up the machine's maximum thread count and gave the best performance.
Just execute it as @cwilper said.

chmod 755 delete-archives.sh
nohup ./delete-archives.sh > delete-archives.log 2>&1 &
tail -f delete-archives.log

@cwilper

cwilper commented Sep 25, 2019

Nice @goodspeedal, looks like a good approach to get better throughput.

ghost commented Oct 24, 2019

Great job, thanks @cwilper @goodspeedal !

@vbuser2004

This was very helpful. Anyone else feel like a Glacier vault is the digital equivalent of Hotel California...

They gathered for the feast
They stab it with their steely knives
But they just can't kill the beast

@veuncent
Author

:D

@codyburleson

This was very helpful. Anyone else feel like a Glacier vault is the digital equivalent of Hotel California...

They gathered for the feast
They stab it with their steely knives
But they just can't kill the beast

Yes! I'm kind of sorry I ever stored anything on Glacier.

@reedbn

reedbn commented Sep 11, 2020

Very helpful! Thank you for posting this!

@tonymet

tonymet commented Oct 9, 2020

awesome guidance here!

@dmabamboo

Hi @veuncent. Thanks for this. I've borrowed from it and created what I think is a more end-to-end version. You just need to pass account_id, aws_region and vault_name and it will do the rest. Hope it's useful to you and others.

https://gist.github.com/dmabamboo/e4c0c4a356116c026a86628c11864ab0

@adosztal

Works like a charm, thank you! Btw, the need for this script is one of the reasons I'm moving to Azure Archive Storage; the other is that it's half the price. :)

@guillem

guillem commented May 5, 2021

Oh, great. I just wrote this https://github.com/guillem/glacier-delete and now I find out you people already have made the two biggest improvements my version is missing (multithreading and "end to end" process) xD Anyway, here it is.

@shyward1

shyward1 commented Sep 5, 2021

Many thanks to everyone here!!

@cerealkella

This is awesome. Thanks!!! This saved me from a wasted morning writing code I did not want to write just to exit an ecosystem I no longer need or want.

@etawiah

etawiah commented Dec 31, 2021

Similar to @m-leishman, I'm getting the same result using the combined code posted by @goodspeedal:

Getting archive ids from ./output.json...got 0

echo -n "Getting archive ids from $file..."
+ echo -n 'Getting archive ids from ./output.json...'
Getting archive ids from ./output.json...if [[ ! -f $id_file ]]; then
cat $file | jq -r --stream ". | { (.[0][2]): .[1]} | select(.ArchiveId) | .ArchiveId" > $id_file 2> /dev/null
fi
+ [[ ! -f ./output-archive-ids.txt ]]
total=$(wc -l $id_file | awk '{print $1}')
++ awk '{print $1}'
++ wc -l ./output-archive-ids.txt
+ total=0
echo "got $total"
+ echo 'got 0'
got 0

num=0
+ num=0
while read -r archive_id;

@marshalleq

marshalleq commented Jan 7, 2022

Me too on the 'got 0' output. Weird.

Edit - You've got to install jq and in my case (Mac) had to rename the downloaded binary to jq and add it to the path. https://stedolan.github.io/jq/download/
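
Because the streaming scripts send jq's error output to /dev/null, a missing jq shows up only as "got 0". A preflight check near the top of the script would make that failure visible. This is a small sketch; require_cmds is a name invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: fail early when a required tool is not on the PATH.
# require_cmds is a hypothetical helper name.

require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Missing required command: $cmd" >&2
      missing=1
    fi
  done
  # Returns 0 when everything was found, 1 otherwise.
  return "$missing"
}

# Usage near the top of the delete script:
# require_cmds aws jq || exit 1
```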

I tried a few of the linked items above, but came back to this one as it's the simplest. Even so, I still don't know if it's working - I suspect Glacier has to update the inventory again before I know, which I guess I need to trigger. Will report back for others like me who are not developers, trying to figure out this nightmare. Never again Glacier, never again!

Reporting Back
Yeah, so once deleted you have to wait overnight and then check the next day for the deleted files to show up in the GUI - for me, one vault was able to be deleted straight away and the other had only a few files left. So I'm in the midst of running the process again for the vault with some files remaining, but it looks like that'll do it! 6-7 years of not being able to get rid of these files lol. They make it so easy to put them in!
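
The wait described here can also be checked from the command line: aws glacier describe-vault reports NumberOfArchives as of the vault's last inventory, and aws glacier delete-vault removes the vault once that count is zero. A minimal sketch, assuming the same AWS_* environment variables as the scripts above; delete_vault_if_empty is a made-up name.

```shell
#!/usr/bin/env bash
# Sketch: delete the vault only if its last inventory reports zero archives.
# Assumes AWS_ACCOUNT_ID, AWS_REGION and AWS_VAULT_NAME are exported.
# delete_vault_if_empty is a hypothetical helper name.

delete_vault_if_empty() {
  local count
  # NumberOfArchives reflects the last inventory, so it can lag behind deletions.
  count=$(aws glacier describe-vault \
    --account-id "$AWS_ACCOUNT_ID" --region "$AWS_REGION" \
    --vault-name "$AWS_VAULT_NAME" \
    --query 'NumberOfArchives' --output text) || return 1
  if [ "$count" = "0" ]; then
    aws glacier delete-vault \
      --account-id "$AWS_ACCOUNT_ID" --region "$AWS_REGION" \
      --vault-name "$AWS_VAULT_NAME"
  else
    echo "Vault still reports $count archives; try again after the next inventory"
    return 1
  fi
}

# Usage: delete_vault_if_empty
```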

@subscribe88

Thank you all, I am now on my way to deleting my ghost vaults. Yes, it does feel like Hotel California unfortunately ...

@jfprieur

jfprieur commented Jul 8, 2022

Thank you so much for this. Synology backup created over 200K archives when I backed up 8 TB of data, and I was contemplating spending my next week copy-pasting archive IDs until I found your script!

@Islanoobar

Islanoobar commented Jul 22, 2022

I am still getting "Getting archive ids from ./output.json...got 0" with this on Ubuntu. I have tried several of the later scripts.

I have jq installed, configured, in path and can run simple commands that use jq, so I know that is working. I have regenerated the output.json as well

The scripts also work fine in AWS CloudShell, but with a 120 MB output.json file the 20-minute timeout there rules it out as an option.

Desperate to get these 270k archives gone.

(Edit - the first script is working, slowly, but working)

@david-montgomery

If the completed ids file './output-archive-ids.txt' (or whatever you've set id_file to) already exists, the script won't read the input file again, and if that file is empty you'll see "got 0".

You have to delete or rename that file each time you run it.
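
The manual cleanup could also be automated with a staleness check before the extraction step. This is a sketch under assumptions, not a change to anyone's posted script: id_file_is_stale is a name invented here, and it treats the ID file as stale when it is missing, empty, or older than the inventory file.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the extracted ID file needs to be rebuilt.
# id_file_is_stale is a hypothetical helper name.
# Arguments: inventory file (output.json), ID file (output-archive-ids.txt).

id_file_is_stale() {
  local inventory="$1" id_file="$2"
  # Stale when missing or empty (-s fails), or older than the inventory (-nt).
  [ ! -s "$id_file" ] || [ "$inventory" -nt "$id_file" ]
}

# Usage before the "Getting archive ids" step:
# if id_file_is_stale "$file" "$id_file"; then rm -f "$id_file"; fi
```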

@Islanoobar

Thanks much!

@mackaaij

Thanks for this! I changed $( nproc ) into 1 on my MacBook, otherwise the script would consume too much CPU (macOS doesn't recognise nproc).

Also, I had a few network hiccups, so the script missed some archives and I created another job to delete the remaining ones. These IDs get appended to id_file by default, so either (re-)create this file in the script or remove output-archive-ids.txt before the next run :)
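
Transient network failures like these could also be handled inside the loop by retrying each deletion and logging the IDs that never succeed. A sketch only: delete_with_retry, MAX_ATTEMPTS, RETRY_SLEEP and FAILED_FILE are all names made up for this example, and it assumes the aws CLI exits non-zero when a delete fails.

```shell
#!/usr/bin/env bash
# Sketch: retry a single archive deletion and record IDs that keep failing.
# Assumes AWS_ACCOUNT_ID, AWS_REGION and AWS_VAULT_NAME are exported.
# delete_with_retry, MAX_ATTEMPTS, RETRY_SLEEP and FAILED_FILE are hypothetical names.

delete_with_retry() {
  local archive_id="$1"
  local max="${MAX_ATTEMPTS:-3}" pause="${RETRY_SLEEP:-5}"
  local failed_file="${FAILED_FILE:-./failed-archive-ids.txt}"
  local attempt=1
  while [ "$attempt" -le "$max" ]; do
    if aws glacier delete-archive --archive-id="$archive_id" \
         --vault-name "$AWS_VAULT_NAME" --account-id "$AWS_ACCOUNT_ID" \
         --region "$AWS_REGION"; then
      return 0
    fi
    echo "Attempt $attempt/$max failed for $archive_id" >&2
    attempt=$((attempt + 1))
    # Pause only when another attempt will follow.
    if [ "$attempt" -le "$max" ]; then
      sleep "$pause"
    fi
  done
  # Keep the ID so a later run can pick it up without a new inventory job.
  echo "$archive_id" >> "$failed_file"
  return 1
}

# Usage inside the loop: delete_with_retry "$archive_id"
```

The failed-ID file can then be fed back in as the id_file for a second pass.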

@simplycloud

Thanks for sharing this, super helpful. I was able to get $( nproc ) working on macOS by installing coreutils (e.g. brew install coreutils). May be helpful for others running a large archive deletion on a MacBook.
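
When nproc is missing, $( nproc ) expands to nothing, so the job-count test in the parallel script probably errors out and never waits, which would explain the CPU load. An alternative to installing coreutils is to fall back to other ways of reading the CPU count. A sketch, with cpu_count as an invented name; sysctl -n hw.ncpu is the macOS/BSD query and getconf _NPROCESSORS_ONLN is a common fallback elsewhere.

```shell
#!/usr/bin/env bash
# Sketch: portable CPU count for sizing the number of parallel deletes.
# cpu_count is a hypothetical helper name.

cpu_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                      # GNU coreutils (Linux, or macOS with coreutils)
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu          # macOS and the BSDs
  else
    # Last resort: ask getconf, or fall back to a single worker.
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
  fi
}

# Usage in the parallel script, replacing $( nproc ):
# [ "$( jobs | wc -l )" -ge "$( cpu_count )" ] && wait
```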

@achekhirov

Good script, thanks a lot!

@hiway

hiway commented May 8, 2023

Thank you!

@ibrohimislam

ibrohimislam commented Aug 30, 2023

Here is a script that uses xargs to run 8 processes in parallel:

#!/bin/bash

file='./output.json'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
        echo "Please set the following environment variables: "
        echo "AWS_ACCOUNT_ID"
        echo "AWS_REGION"
        echo "AWS_VAULT_NAME"
        exit 1
fi

jq -r '.ArchiveList[].ArchiveId' < "$file" | xargs -P8 -n1 bash -c "echo \"Deleting: \$1\"; aws glacier delete-archive --archive-id=\$1 --vault-name ${AWS_VAULT_NAME} --account-id ${AWS_ACCOUNT_ID} --region ${AWS_REGION}" {}
