S3 - Set permissions public on all files in a bucket using IronWorker.

Purpose

Workers to ensure all objects in an S3 bucket have the 'public-read' canned ACL permission. Designed to work with iron.io's IronWorker product, using its scalable workers to set the permissions quickly and affordably.

Setup

This software needs a few 3rd-party resources to work. All of these resources can simply be dropped into this directory.

  • aws.phar - The Amazon AWS SDK for PHP.
  • iron_worker.phar - The iron_worker PHP library.
  • iron.json - Your IronWorker credentials.

You will also need the iron_worker_ng RubyGem installed so you can upload and queue the workers.
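
The iron_worker_ng gem can be installed in the usual way:

gem install iron_worker_ng

For reference, iron.json holds your iron.io credentials and typically looks like the following (both values are placeholders):

{"project_id": "YOUR-IRON-PROJECT-ID", "token": "YOUR-IRON-TOKEN"}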

Usage

First the workers need to be uploaded to your project. The following commands will do that:

iron_worker upload list
iron_worker upload assign

Next simply queue a list task with the proper arguments. Here is an example:

iron_worker queue list --payload '{"key": "YOUR-AMAZON-KEY", "secret": "YOUR-AMAZON-SECRET", "region": "us-east-1", "bucket": "your-amazon-bucket"}'

Replace the Amazon arguments with the values for your Amazon account. Also update the region if needed. The full arguments for the worker are:

bucket: The bucket whose objects' permissions are being assigned
key:    Your Amazon access key
secret: Your Amazon secret key
region: The region your bucket is in

Optional additional arguments are:

list_size:  Maximum number of objects listed per request. Defaults to 1000
chunk_size: Maximum number of objects per child worker. Defaults to 500
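
For example, to queue the list task with smaller page and chunk sizes (the values here are purely illustrative):

iron_worker queue list --payload '{"key": "YOUR-AMAZON-KEY", "secret": "YOUR-AMAZON-SECRET", "region": "us-east-1", "bucket": "your-amazon-bucket", "list_size": 500, "chunk_size": 50}'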

Implementation

The basic workflow of this script is:

  1. Get a list of all objects in the bucket. This requires multiple iterations, as Amazon can only return a maximum of 1000 objects at a time. You can configure it to be less than 1000, but there is really no reason to, since that only slows things down and increases costs due to the higher number of requests.
  2. It then breaks the list up into chunks of objects (default 500). For each chunk it fires up a new worker to assign the permissions on all objects in that chunk.
  3. The worker queries the current permissions on each object and, if it does not already have public-read, assigns it that canned permission. Technically this increases the number of requests, as it requires a GET (to read the current permissions) and then possibly a PUT (to assign permissions if necessary). We could reduce the number of requests by just doing a PUT on every object. But the extra requests are kept since GET requests are cheaper than PUT requests ($0.01 per 10,000 vs $0.01 per 1,000) and also faster. So assuming most objects already have the correct permissions, things will get done sooner and more affordably (see the worked example after this list).
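
To make the trade-off concrete, here is a rough cost sketch using the request prices above (the object counts are purely illustrative): with 1,000,000 objects of which 90% are already public-read, the check-first approach makes 1,000,000 GETs ($1.00) plus 100,000 PUTs ($1.00), roughly $2.00 total, while blindly PUTting every object makes 1,000,000 PUTs, roughly $10.00.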

Performance Notes

If you increase the chunk_size you reduce the parallelization, which slows down the overall process. But there are diminishing returns going the other way, since there is overhead to spinning up child workers. So while technically you could make the chunk size 1, it wouldn't have as big an impact on overall performance as you might hope.

assign.php

<?php
# Will query and assign the permissions as directed by the list worker.
require_once 'aws.phar';

$args = (array) getPayload();
echo "Assigning permissions\n";
require 's3_connect.php';

# Grantee URI representing "everyone" in S3 ACLs.
$all_users = array('URI' => 'http://acs.amazonaws.com/groups/global/AllUsers');

foreach( $args['keys'] as $key ) {
  echo "Querying $key permission\n";

  # Read the object's current ACL grants.
  $grants = $s3->getObjectAcl(array(
    'Bucket' => $args['bucket'],
    'Key'    => $key,
  ))->get('Grants');

  # Check whether the AllUsers group already has READ access.
  $public_read = false;
  foreach($grants as $grant) {
    $is_all_users = $grant['Grantee'] == $all_users;
    $is_read      = $grant['Permission'] == 'READ';
    if( $is_all_users && $is_read ) $public_read = true;
  }

  # Only PUT a new ACL if the object is not already public.
  if( !$public_read ) {
    echo "- Assigning public-read\n";
    $s3->putObjectAcl(array(
      'Bucket' => $args['bucket'],
      'Key'    => $key,
      'ACL'    => 'public-read',
    ));
  }
}
runtime "php"
exec 'assign.php'
file 'connect.php'
file 'aws.phar'

list.php

<?php
# Will get a list of all objects, break the work up into chunks and
# create a new job worker for each chunk to actually set permissions.
require_once 'aws.phar';
require_once 'iron_worker.phar';

$worker = new IronWorker();
$args = (array) getPayload();

# Apply defaults for the optional arguments.
if(!isset($args['list_size'])) $args['list_size'] = 1000;
if(!isset($args['chunk_size'])) $args['chunk_size'] = 500;

echo "Starting permissions reset using the following specs\n";
print_r($args);
require 's3_connect.php';

# Page through the bucket listing. Each request returns at most
# list_size objects; the marker tracks where the next page starts.
$marker = null;
$fetch_more = true;
while( $fetch_more ) {
  $results = $s3->listObjects(array(
    'Bucket'  => $args['bucket'],
    'Marker'  => $marker,
    'MaxKeys' => $args['list_size'],
  ));
  $contents = $results->get('Contents');
  echo count($contents) . " objects retrieved\n";

  # Split the page into chunks and queue an assign worker for each.
  $chunks = array_chunk($contents, $args['chunk_size']);
  foreach($chunks as $chunk) {
    # Reduce each entry to its key; the last key becomes the next marker.
    for($i=0; $i<count($chunk); $i++)
      $marker = $chunk[$i] = $chunk[$i]['Key'];
    echo "- Spin up worker\n";
    $worker->postTask('assign', array_merge($args, array('keys' => $chunk)));
  }

  if( !$results->get('IsTruncated') ) $fetch_more = false;
}
runtime "php"
exec 'list.php'
file 'connect.php'
file 'iron.json'
file 'aws.phar'
file 'iron_worker.phar'

s3_connect.php

<?php
# Shared connection code. Expects $args to already contain the
# key, secret and region from the worker payload.
use Aws\Common\Aws;

$s3_credentials = array(
  'key'    => $args['key'],
  'secret' => $args['secret'],
  'region' => $args['region'],
);
$s3 = Aws::factory($s3_credentials)->get('s3');
echo "Connected to S3\n";