
@cloventt
Created July 23, 2020 00:27
Bulk deleting of Amazon S3 delete markers to undelete a lot of data

A coworker of mine accidentally deleted a large S3 prefix containing tens of thousands of keys. Luckily we had versioning enabled on the bucket, or the data would have been gone for good. With versioning enabled, recovering a deleted object is trivial: deleting its Delete Marker restores the previous version. But this presented the next challenge: how to perform a bulk undelete of deleted keys in S3.
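For a single object, the manual recovery looks roughly like this (bucket, key, and version ID below are placeholders; the VersionId comes from the listing output):

# List the delete markers under a prefix to find the marker's VersionId.
aws s3api list-object-versions \
  --bucket my-bucket \
  --prefix path/to/file.txt \
  --query 'DeleteMarkers[].{Key:Key,VersionId:VersionId}'

# Deleting the marker itself (by its VersionId) makes the previous
# version of the object visible again.
aws s3api delete-object \
  --bucket my-bucket \
  --key path/to/file.txt \
  --version-id "EXAMPLE_VERSION_ID"

This works fine for one file, but not for tens of thousands.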

I found a few scripts online that claimed to do this, but they either performed a delete operation per object (incredibly slow), or they didn't deal with the S3 API limit of 1000 keys per delete operation, or they tried to pass a massive blob of JSON to the awscli as a command-line argument, which runs into the Linux kernel's 128 KiB limit on the length of a single argument.

This script resolves these issues by:

  • performing a bulk retrieval of the delete markers you want to delete
  • splitting the list of keys into chunks of 1000 keys at a time
  • passing the keys to aws s3api via a file so Bash doesn't freak out
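The chunking step relies on jq's (undocumented but long-present) _nwise builtin, which yields consecutive slices of an array. A toy demo with a chunk size of 2 standing in for 1000:

# _nwise(2) splits the array into groups of at most 2 elements;
# each group is wrapped in its own {Objects: ...} payload, which is
# exactly the shape delete-objects expects.
echo '{"Objects":[1,2,3,4,5]}' |
  jq -c '.Objects | {Objects: _nwise(2)}'
# Emits three lines:
# {"Objects":[1,2]}
# {"Objects":[3,4]}
# {"Objects":[5]}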

You will need jq installed for this script to work.

#!/bin/bash
set -euo pipefail

BUCKET="$1"
PREFIX="$2"

echo "Collecting delete markers to purge (may take a while)..."
aws s3api list-object-versions \
    --bucket "$BUCKET" \
    --prefix "$PREFIX" \
    --output=json \
    --query='{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}}' |
# Skip the loop entirely if there are no delete markers,
# then emit one {Objects: [...]} payload per 1000 keys.
jq -c '.Objects // [] | select(length > 0) | {Objects: _nwise(1000)}' |
while read -r OBJECTS
do
    echo "Deleting markers..."
    # Write the payload to a file so the JSON never hits the
    # kernel's argument-length limit.
    echo "$OBJECTS" > deleting.json
    aws s3api delete-objects --bucket "$BUCKET" --delete file://deleting.json
done
rm -f deleting.json
echo "Done"