If you have ever tried to delete more than a few hundred files on S3, you may have noticed how slow it is.
To speed up the deletion, we can use a few bash commands to parallelize it, together with a compact, JSON-like description of the objects we want to delete.
Concretely, this lets us delete e.g. 1000 files with a single S3 API request.
To do so, we first need to fetch the list of objects that we want to delete.
Then we need to parallelize the requests (with `xargs`) and to build, for each request, the list of objects to delete.
Note: to work with a custom S3 endpoint, use for example:
alias aws="aws --endpoint-url https://s3.swiss-backup02.infomaniak.com"
we list the objects with the command `aws s3 ls "s3://grange/videos/" --recursive`, which yields:
$ aws s3 ls "s3://grange/videos/" --recursive
2022-06-13 22:44:56 0 videos/
2022-06-14 07:48:12 1505900535 videos/2022-04.mov
2022-06-14 07:49:16 1768963999 videos/2022-05.mov
2022-06-14 07:51:56 766723187 videos/2022-06 01-13.mov
2022-08-29 09:16:14 1058135937 videos/2022-06--14-30.mov
2022-08-29 08:43:12 1929698829 videos/2022-07.mov
2022-10-14 09:51:06 1877769797 videos/2022-08.mov
...
we filter the output to only keep the path of the files to remove:
aws s3 ls "s3://grange/videos/" --recursive | sed -nre "s|[0-9-]+ [0-9:]+ +[0-9]+ ||p" > objects-list
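as a sanity check, we can run the `sed` expression on a couple of sample lines (copied from the listing above) and confirm that only the object key survives:

```shell
# strip the "date time size " prefix printed by `aws s3 ls --recursive`;
# -n plus the trailing `p` prints only the lines where the substitution matched
printf '%s\n' \
  "2022-06-14 07:48:12 1505900535 videos/2022-04.mov" \
  "2022-08-29 08:43:12 1929698829 videos/2022-07.mov" \
  | sed -nre "s|[0-9-]+ [0-9:]+ +[0-9]+ ||p"
# → videos/2022-04.mov
# → videos/2022-07.mov
```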
we now filter the objects we want to delete, and use `xargs` and `printf` to generate the list of keys/objects to delete:
# shows what will be sent to the S3 API endpoint:
cat objects-list | grep 2022 | xargs -d '\n' -P8 -n 1000 bash -c 'echo $(printf "{Key=%s}," "$@")' _
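the trick here is that `printf` reapplies its format string once per remaining argument, so each key handed over by `xargs` becomes one `{Key=…},` element (the two keys below are hypothetical):

```shell
# printf repeats "{Key=%s}," for every argument; echo adds the final newline
printf "{Key=%s}," "videos/a.mov" "videos/b.mov"; echo
# → {Key=videos/a.mov},{Key=videos/b.mov},
```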
# and finally we can proceed with the deletion:
cat objects-list | grep 2022 | xargs -d '\n' -P8 -n 1000 bash -c 'aws --endpoint-url https://s3.swiss-backup02.infomaniak.com s3api delete-objects --bucket grange --delete "Objects=[$(printf "{Key=%s}," "$@")],Quiet=true"' _
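note that `xargs` splits its input on any whitespace by default, so a key containing a space, such as `videos/2022-06 01-13.mov` in the listing above, would be split into two bogus keys; GNU `xargs`'s `-d '\n'` option makes it split on newlines only:

```shell
# with -d '\n', each input line becomes exactly one argument, even if
# the key contains spaces; without it, the first key would be split in two
printf '%s\n' "videos/2022-06 01-13.mov" "videos/2022-07.mov" \
  | xargs -d '\n' bash -c 'printf "{Key=%s}," "$@"' _; echo
# → {Key=videos/2022-06 01-13.mov},{Key=videos/2022-07.mov},
```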