
@zoltanctoth
Created September 22, 2020 07:25
delete thousands or millions of objects in S3
# Hint: If you are stuck with tens of millions of files under an S3 prefix, the easiest
# approach may be to set the prefix's Expiration to one day in the Lifecycle Management
# pane of the bucket in the Web UI; Amazon will then take care of the object deletion for you.
# The scripts below are based on this answer:
# https://serverfault.com/questions/679989/most-efficient-way-to-batch-delete-s3-files#comment1200074_917740
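The lifecycle hint above can also be applied from the CLI instead of the Web UI. A minimal sketch, assuming the bucket has no existing lifecycle rules you need to preserve (this call replaces the whole configuration); bucket name, rule ID, and prefix are placeholders:

```shell
# Expire everything under the prefix after one day; S3 then deletes the
# objects asynchronously, with no request cost on your side.
# NOTE: put-bucket-lifecycle-configuration REPLACES the bucket's existing
# lifecycle rules, so merge in any rules you already have.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-tmp-sandbox",
      "Status": "Enabled",
      "Filter": {"Prefix": "tmp/sandbox/"},
      "Expiration": {"Days": 1}
    }]
  }'
```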
# List all objects under the prefix into `to-delete.keys` (pv shows a running line count)
aws s3api list-objects --output text --bucket <<BUCKET_NAME>> --query 'Contents[].[Key]' --prefix <<prefix, like tmp/sandbox>> | pv -l > to-delete.keys
# DELETE OBJECTS (you can start running this as soon as the listing above begins writing `to-delete.keys`)
# Keys containing a single quote are skipped by the grep to avoid breaking the quoting below.
# The trailing `_` fills bash's $0, so the first key of each 1000-key batch is not silently dropped.
tail -n+0 to-delete.keys | pv -l | grep -v -e "'" | tr '\n' '\0' | xargs -0 -P1 -n1000 bash -c 'aws s3api delete-objects --bucket <<BUCKET_NAME>> --delete "Objects=[$(printf "{Key=%q}," "$@")],Quiet=true"' _
# Even though each HTTP request deletes 1000 objects, this can still take many hours when hundreds of millions of objects are involved.
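The `printf "{Key=%q},"` fragment is what turns each 1000-key batch into the `Objects=[...]` list for `delete-objects`. It can be tried standalone with made-up key names:

```shell
# printf reuses the format string once per argument, so each key becomes
# one {Key=...} entry; %q shell-quotes awkward characters such as spaces.
# This is also why keys containing single quotes are filtered out earlier
# rather than risking broken quoting inside the bash -c string.
printf "{Key=%q}," tmp/sandbox/a.txt "tmp/sandbox/my file.txt"
# -> {Key=tmp/sandbox/a.txt},{Key=tmp/sandbox/my\ file.txt},
```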
@russlamoreaux

russlamoreaux commented Aug 17, 2022

I have found that this use of xargs incorrectly skips the first element in the keys file for every batch: `bash -c` assigns the first argument after the script to `$0`, not to `$@`.
Entries in the file not included in the Key list are lines 1, 1001, 2001, etc.
The solution is to put a placeholder argument (an underscore) at the end of the line: `"$@")],Quiet=true"' _`
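The `$0` behavior is easy to reproduce locally without touching S3 (illustrative arguments only):

```shell
# bash -c binds the first argument after the script to $0, not $@,
# so without a placeholder the first item of every batch disappears:
printf 'a\nb\nc\n' | xargs bash -c 'echo "$@"'      # prints: b c
# A throwaway placeholder for $0 restores all items:
printf 'a\nb\nc\n' | xargs bash -c 'echo "$@"' _    # prints: a b c
```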
