@clementnuss
Created January 7, 2023 09:21

Batch deletion of S3 objects

If you have ever tried to delete more than a few hundred files on S3, you may have noticed how slow it is.

To speed up the deletion, we can use a few bash commands to run the requests in parallel, and we can pass S3 a JSON description of the objects we want to delete.

Concretely, this lets us delete up to 1000 objects (the per-request limit) with a single S3 API request.
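As a rough illustration of the savings, the number of requests needed for a given object count can be estimated with ceiling division. The `batches_needed` helper below is hypothetical and only meant as a sketch; its default batch size of 1000 matches the per-request limit mentioned above.

```shell
# Hypothetical helper: number of delete-objects requests needed
# for a given number of keys (ceiling division, 1000 keys per request by default).
batches_needed() {
  count=$1
  batch_size=${2:-1000}
  echo $(( (count + batch_size - 1) / batch_size ))
}

# 2500 objects fit in 3 batched requests, compared with one request per object.
batches_needed 2500
```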

How?

First, we fetch the list of objects that we want to delete. Then we split that list into batches, build the JSON description of the objects for each batch, and run the requests in parallel with xargs.
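To make the "JSON description" concrete, here is a minimal sketch of what one batch could look like when written out as JSON. The `keys_to_delete_json` helper is hypothetical (it is not part of the AWS CLI), and it assumes keys contain no control characters. The commands later in this post use the CLI's shorthand syntax instead, which carries the same `Objects` and `Quiet` fields.

```shell
# Hypothetical helper: reads one object key per line on stdin and prints
# a JSON document in the shape accepted by `aws s3api delete-objects --delete`.
# Only backslashes and double quotes are escaped; keys containing control
# characters would need a real JSON encoder.
keys_to_delete_json() {
  sep=''
  printf '{"Objects":['
  while IFS= read -r key; do
    escaped=$(printf '%s' "$key" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '%s{"Key":"%s"}' "$sep" "$escaped"
    sep=','
  done
  printf '],"Quiet":true}\n'
}

# Example with two keys, one of them containing a space.
printf 'videos/2022-04.mov\nvideos/2022-06 01-13.mov\n' | keys_to_delete_json
```

A file produced this way (with at most 1000 keys) could then be passed as `--delete file://batch.json`, where `batch.json` is a file name chosen here for illustration.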

Note: to work with a custom S3 endpoint, use for example:
alias aws="aws --endpoint-url https://s3.swiss-backup02.infomaniak.com"

Listing the objects

We list the objects with the command `aws s3 ls "s3://grange/videos/" --recursive`, which yields:

$ aws s3 ls "s3://grange/videos/" --recursive
2022-06-13 22:44:56          0 videos/
2022-06-14 07:48:12 1505900535 videos/2022-04.mov
2022-06-14 07:49:16 1768963999 videos/2022-05.mov
2022-06-14 07:51:56  766723187 videos/2022-06 01-13.mov
2022-08-29 09:16:14 1058135937 videos/2022-06--14-30.mov
2022-08-29 08:43:12 1929698829 videos/2022-07.mov
2022-10-14 09:51:06 1877769797 videos/2022-08.mov
...

We filter the output to keep only the paths (keys) of the files to remove:

aws s3 ls "s3://grange/videos/" --recursive | sed -nre "s|[0-9-]+ [0-9:]+ +[0-9]+ ||p" > objects-list
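To see what the filter does without touching a bucket, the same substitution can be run on a couple of sample lines copied from the listing above. This is a small sketch: the `extract_keys` function name is made up for the example, and it uses `sed -E` with an added `^` anchor rather than the exact flags of the command above.

```shell
# Hypothetical wrapper around the sed filter: strips the date, time and size
# columns of `aws s3 ls --recursive` output and prints only the object key.
# Lines that do not match the expected layout are dropped (-n ... p).
extract_keys() {
  sed -nE 's|^[0-9-]+ [0-9:]+ +[0-9]+ ||p'
}

sample='2022-06-14 07:48:12 1505900535 videos/2022-04.mov
2022-06-14 07:51:56  766723187 videos/2022-06 01-13.mov'

# Prints the two keys; the space inside the second key is preserved.
printf '%s\n' "$sample" | extract_keys
```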

We now select the objects we want to delete (here, every key containing `2022`) and use xargs and printf to generate batches of up to 1000 keys:

# Dry run: shows what will be sent to the S3 API endpoint.
# -d '\n' (GNU xargs) treats each line as one key, so keys containing spaces stay intact.
grep 2022 objects-list | xargs -d '\n' -P8 -n 1000 bash -c 'echo "$(printf "{Key=%s}," "$@")"' _
# Finally, run the deletion. The endpoint is spelled out here because shell
# aliases are not expanded inside `bash -c`.
grep 2022 objects-list | xargs -d '\n' -P8 -n 1000 bash -c 'aws --endpoint-url https://s3.swiss-backup02.infomaniak.com s3api delete-objects --bucket grange --delete "Objects=[$(printf "{Key=%s}," "$@")],Quiet=true"' _
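The two building blocks of these commands can be tried locally without calling S3. The sketch below first shows how `xargs -n 1000` splits 2500 inputs into batches, then builds the shorthand `--delete` argument for one batch. The `build_delete_arg` function is a hypothetical helper, and dropping the trailing comma is a conservative choice on my part rather than something the commands above require.

```shell
# How xargs batches its input: 2500 items with -n 1000 give three invocations.
# (-P is left out here so the output order is deterministic.)
seq 1 2500 | xargs -n 1000 sh -c 'echo "$#"' _

# Hypothetical helper: builds the shorthand value for --delete from the keys
# passed as arguments, without a trailing comma after the last entry.
build_delete_arg() {
  [ "$#" -gt 0 ] || return 1
  joined=$(printf '{Key=%s},' "$@")
  printf 'Objects=[%s],Quiet=true\n' "${joined%,}"
}

build_delete_arg 'videos/2022-04.mov' 'videos/2022-06 01-13.mov'
```

The first command prints the batch sizes (1000, 1000, 500), and the second prints the string that would be handed to `--delete` for a two-key batch.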