S3 - Set permissions public on all files in a bucket using IronWorker.

Purpose

Workers to ensure all objects in an S3 bucket have the 'public-read' canned ACL permission. Designed to work with iron.io's IronWorker product, using its scalable workers to set the permissions quickly and affordably.

Setup

This software needs a few third-party resources to work. All of them can simply be dropped into this directory:

  • aws.phar - The Amazon AWS SDK for PHP.
  • iron_worker.phar - The iron_worker PHP library.
  • iron.json - Your IronWorker credentials.

You will also need the iron_worker_ng RubyGem installed so you can upload and queue the workers.

Usage

First the workers need to be uploaded to your project. The following commands will do that:

iron_worker upload list
iron_worker upload assign

Next simply queue a list task with the proper arguments. Here is an example:

iron_worker queue list --payload '{"key": "YOUR-AMAZON-KEY", "secret": "YOUR-AMAZON-SECRET", "region": "us-east-1", "bucket": "your-amazon-bucket"}'

Replace the Amazon arguments with the values for your own account, and update the region if needed. The full arguments for the worker are:

bucket: The bucket we are assigning permissions on
key:    Your Amazon access key
secret: Your Amazon secret key
region: The region your bucket is in

Optional additional arguments are:

list_size:  Max number of objects listed per request. Defaults to 1000
chunk_size: Max number of objects per child worker. Defaults to 100

Implementation

The basic workflow of this script is:

  1. Get a list of all objects in the bucket. This requires multiple iterations, as Amazon returns a maximum of 1000 objects per request. You can configure it to return fewer, but there is really no reason to, since that only slows things down and increases costs due to the higher number of requests.
  2. It then breaks the list into chunks of objects (100 by default). For each chunk it fires up a new worker to assign the permissions on all objects in that chunk.
  3. The worker queries the current permissions on each object and, if it does not already have public-read, assigns it that canned permission. Technically this increases the number of requests, since it takes a GET (to read the current permissions) and then possibly a PUT (to assign permissions if necessary). We could reduce the number of requests by just doing a PUT on every object, but the extra GETs are kept since GET requests are cheaper than PUT requests ($0.01 per 10,000 vs $0.01 per 1,000). GET requests are also faster. So assuming most objects already have the correct permissions, things get done sooner and more affordably.
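The GET-then-PUT tradeoff above is easy to sanity-check with a little arithmetic. The sketch below uses the S3 prices quoted in this README ($0.01 per 10,000 GETs, $0.01 per 1,000 PUTs); the 90% share of objects that are already public is a made-up illustrative figure, not a measurement.

```python
# Request-cost comparison for the two strategies described above.
# Prices come from this README; already_public is a hypothetical ratio.
GET_COST = 0.01 / 10_000  # dollars per GET request
PUT_COST = 0.01 / 1_000   # dollars per PUT request

def cost_put_always(objects):
    # Blindly PUT the ACL on every object, no permission check.
    return objects * PUT_COST

def cost_get_then_put(objects, already_public=0.9):
    # GET every object's ACL, then PUT only on the ones that need fixing.
    return objects * GET_COST + objects * (1 - already_public) * PUT_COST

# For a million objects: roughly $10 for PUT-always vs roughly $2 for
# GET-then-PUT when 90% of objects are already public.
print(cost_put_always(1_000_000))
print(cost_get_then_put(1_000_000))
```

If very few objects are already public, the check stops paying for itself, so the script's approach assumes the common case of a mostly-correct bucket.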

Performance Notes

If you increase the chunk_size you reduce the parallelization, which slows down the overall process. But there are diminishing returns, since there is overhead to spinning up child workers. So while you could technically make the chunk size 1, it wouldn't have as big an impact on overall performance as you might hope.
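The diminishing returns can be seen with a toy model: child workers run in parallel, so wall time is roughly the per-worker spin-up overhead plus the time one worker spends on its own chunk. The overhead and per-object timings below are made-up illustrative numbers, not measurements of IronWorker or S3.

```python
import math

SPINUP_OVERHEAD_S = 10.0  # hypothetical cost of launching one child worker
PER_OBJECT_S = 0.05       # hypothetical GET (+ occasional PUT) per object

def wall_time(total_objects, chunk_size):
    # With full parallelism, wall time is dominated by a single worker's
    # spin-up plus its share of the objects; also report the worker count.
    workers = math.ceil(total_objects / chunk_size)
    return SPINUP_OVERHEAD_S + chunk_size * PER_OBJECT_S, workers

for chunk in (1, 100, 10_000):
    seconds, workers = wall_time(100_000, chunk)
    print(f"chunk_size={chunk}: {workers} workers, ~{seconds:.1f}s wall time")
```

Under these assumptions a chunk size of 1 needs 100,000 workers to shave only a few seconds off what 1,000 workers at chunk size 100 achieve, which is why the default sits well above 1.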

assign.php
<?php
# Queries and assigns the permissions as directed by the list worker.

require_once 'aws.phar';

$args = (array) getPayload();
echo "Assigning permissions\n";

require 's3_connect.php';
$all_users = array('URI' => 'http://acs.amazonaws.com/groups/global/AllUsers');

foreach( $args['keys'] as $key ) {
  echo "Querying $key permission\n";
  $grants = $s3->getObjectAcl(array(
    'Bucket' => $args['bucket'],
    'Key' => $key,
  ))->get('Grants');

  // The object is already public if the AllUsers group holds a READ grant
  $public_read = false;
  foreach( $grants as $grant ) {
    $is_all_users = isset($grant['Grantee']['URI'])
      && $grant['Grantee']['URI'] == $all_users['URI'];
    $is_read = $grant['Permission'] == 'READ';
    if( $is_all_users && $is_read ) $public_read = true;
  }

  if( !$public_read ) {
    echo "- Assigning public-read\n";
    $s3->putObjectAcl(array(
      'Bucket' => $args['bucket'],
      'Key' => $key,
      'ACL' => 'public-read',
    ));
  }
}
assign.worker
runtime "php"
exec 'assign.php'
file 's3_connect.php'
file 'aws.phar'
list.php
<?php
# Gets a list of all objects, breaks the work up into chunks and
# creates a new assign worker for each chunk to actually set permissions.

require_once 'aws.phar';
require_once 'iron_worker.phar';
$worker = new IronWorker();

$args = (array) getPayload();
if(!isset($args['list_size'])) $args['list_size'] = 1000;
if(!isset($args['chunk_size'])) $args['chunk_size'] = 100;

echo "Starting permissions reset using the following specs\n";
print_r($args);

require 's3_connect.php';

$marker = null;
$fetch_more = true;

while( $fetch_more ) {
  $results = $s3->listObjects(array(
    'Bucket' => $args['bucket'],
    'Marker' => $marker,
    'MaxKeys' => $args['list_size'],
  ));
  $contents = $results->get('Contents');
  echo count($contents) . " objects retrieved\n";

  // Reduce each chunk to a flat list of keys, advance the list marker,
  // and queue an assign worker for the chunk
  $chunks = array_chunk($contents, $args['chunk_size']);
  foreach( $chunks as $chunk ) {
    for( $i = 0; $i < count($chunk); $i++ )
      $marker = $chunk[$i] = $chunk[$i]['Key'];
    echo "- Spin up worker\n";
    $worker->postTask('assign', array_merge($args, array('keys' => $chunk)));
  }

  if( !$results->get('IsTruncated') ) $fetch_more = false;
}
list.worker
runtime "php"
exec 'list.php'
file 's3_connect.php'
file 'iron.json'
file 'aws.phar'
file 'iron_worker.phar'
s3_connect.php
<?php
# Shared S3 connection, built from the credentials in the worker payload.

use Aws\Common\Aws;

$s3_credentials = array(
  'key' => $args['key'],
  'secret' => $args['secret'],
  'region' => $args['region'],
);
$s3 = Aws::factory($s3_credentials)->get('s3');
echo "Connected to S3\n";
