Workers to ensure all objects in an S3 bucket have the 'public-read' canned ACL permission. Designed to work with Iron.io's IronWorker product, using its scalable workers to set the permissions quickly and affordably.
This software needs a few third-party resources to work. All of these resources can simply be dropped into this directory.
- aws.phar - The Amazon AWS SDK for PHP.
- iron_worker.phar - The iron_worker PHP library.
- iron.json - Your IronWorker credentials.
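If you don't already have an iron.json file, it is a small JSON file holding your Iron.io project ID and token, which typically looks like this (the values below are placeholders):

    {
      "project_id": "YOUR-IRON-PROJECT-ID",
      "token": "YOUR-IRON-TOKEN"
    }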
You will also need the iron_worker_ng RubyGem installed so you can upload and queue the workers.
First, the workers need to be uploaded to your project. The following commands will do that:
iron_worker upload list
iron_worker upload assign
Next, simply queue a list task with the proper arguments. Here is an example:
iron_worker queue list --payload '{"key": "YOUR-AMAZON-KEY", "secret": "YOUR-AMAZON-SECRET", "region": "us-east-1", "bucket": "your-amazon-bucket"}'
Replace the key, secret, and bucket arguments with the values for your Amazon account, and update the region if needed. The full arguments for the worker are:
bucket: The bucket whose objects we are assigning permissions on.
key: Your Amazon access key.
secret: Your Amazon secret key.
region: The region your bucket is in.
Optional additional arguments are:
list_size: Maximum number of objects listed per request. Defaults to 1000.
chunk_size: Maximum number of objects per child worker. Defaults to 100.
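For example, to queue a list task with a smaller chunk size (placeholder credentials, as above):

iron_worker queue list --payload '{"key": "YOUR-AMAZON-KEY", "secret": "YOUR-AMAZON-SECRET", "region": "us-east-1", "bucket": "your-amazon-bucket", "list_size": 1000, "chunk_size": 50}'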
The basic workflow of this script is:
- Get a list of all objects in the bucket. This requires multiple requests, as Amazon returns a maximum of 1000 objects at a time. You can configure list_size to be less than 1000, but there is really no reason to, since that only slows things down and increases costs due to the higher number of requests. (A sketch of this listing loop appears after this list.)
- It then breaks the list up into chunks of objects (100 by default). For each chunk it fires up a new child worker to assign the permissions on all objects in that chunk.
- Each child worker queries the current permissions on each object and, if the object does not already have public-read, assigns it that canned permission. Technically this increases the number of requests, since it requires a GET (to read the current permissions) and then possibly a PUT (to assign permissions if necessary). We could reduce the number of requests by just doing a PUT on every object, but the extra GETs are kept because GET requests are cheaper than PUT requests ($0.01 per 10,000 vs. $0.01 per 1,000) and also faster. So, assuming most objects already have the correct permissions, things will get done sooner and more affordably. (A sketch of this check-then-set logic also appears below.)
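Here is a minimal sketch of the list worker's flow, assuming the AWS SDK for PHP 2 bundled as aws.phar and the iron_worker PHP library's getPayload()/postTask() helpers; the variable names and payload handling are illustrative, not the worker's literal code:

    <?php
    require_once 'aws.phar';
    require_once 'iron_worker.phar';

    use Aws\S3\S3Client;

    // getPayload() is provided by the iron_worker runner; the cast assumes a
    // flat JSON payload like the one queued above.
    $params     = (array) getPayload();
    $list_size  = isset($params['list_size'])  ? $params['list_size']  : 1000;
    $chunk_size = isset($params['chunk_size']) ? $params['chunk_size'] : 100;

    $s3 = S3Client::factory(array(
        'key'    => $params['key'],
        'secret' => $params['secret'],
        'region' => $params['region'],
    ));

    // List every object, paging with Marker since S3 caps each response
    // at 1000 keys.
    $keys   = array();
    $marker = '';
    do {
        $args = array('Bucket' => $params['bucket'], 'MaxKeys' => $list_size);
        if ($marker !== '') {
            $args['Marker'] = $marker;
        }
        $result = $s3->listObjects($args);
        foreach ((array) $result['Contents'] as $object) {
            $keys[] = $object['Key'];
        }
        $marker = end($keys); // the next page starts after the last key seen
    } while ($result['IsTruncated']);

    // Queue one child 'assign' worker per chunk of keys.
    $iron = new IronWorker('iron.json');
    foreach (array_chunk($keys, $chunk_size) as $chunk) {
        $iron->postTask('assign', array_merge($params, array('objects' => $chunk)));
    }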
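And a similar sketch of the assign worker's check-then-set logic, again assuming the AWS SDK for PHP 2 (the AllUsers group URI is the standard one S3 uses for public grants):

    <?php
    require_once 'aws.phar';
    require_once 'iron_worker.phar';

    use Aws\S3\S3Client;

    $params = (array) getPayload(); // includes the 'objects' chunk queued above

    $s3 = S3Client::factory(array(
        'key'    => $params['key'],
        'secret' => $params['secret'],
        'region' => $params['region'],
    ));

    foreach ($params['objects'] as $key) {
        // GET the current ACL first; GETs are cheaper and faster than PUTs.
        $acl = $s3->getObjectAcl(array(
            'Bucket' => $params['bucket'],
            'Key'    => $key,
        ));

        // 'public-read' means the AllUsers group has READ; skip the PUT if
        // that grant is already present.
        $isPublic = false;
        foreach ((array) $acl['Grants'] as $grant) {
            if (isset($grant['Grantee']['URI'])
                && $grant['Grantee']['URI'] === 'http://acs.amazonaws.com/groups/global/AllUsers'
                && $grant['Permission'] === 'READ') {
                $isPublic = true;
                break;
            }
        }

        if (!$isPublic) {
            $s3->putObjectAcl(array(
                'Bucket' => $params['bucket'],
                'Key'    => $key,
                'ACL'    => 'public-read',
            ));
        }
    }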
If you increase the chunk_size you reduce the parallelization, which slows down the overall process. But there are diminishing returns in the other direction, since there is overhead in spinning up child workers. So while you could technically make the chunk size 1, it wouldn't have as big an impact on overall performance as you might hope.