@scpike
Forked from scharfie/copy-s3-bucket.rb
Created September 18, 2012 20:26
Copy the contents of an S3 bucket to another bucket using an EC2 instance and a simple Ruby script. Useful for transferring large amounts of data; works across geographic regions.
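The script reads S3 credentials and bucket names from a YAML file passed via the FILE environment variable; the sections to use as source and destination are selected by FROM and TO. A minimal config might look like the following (the section names and file name are hypothetical; the key names `access_key_id`, `secret_access_key`, and `bucket` are the ones the script reads):

```yaml
# s3.yml -- example config (hypothetical section names)
production:
  access_key_id: YOUR_ACCESS_KEY_ID
  secret_access_key: YOUR_SECRET_ACCESS_KEY
  bucket: source-bucket/optional/prefix
backup:
  access_key_id: YOUR_ACCESS_KEY_ID
  secret_access_key: YOUR_SECRET_ACCESS_KEY
  bucket: destination-bucket/optional/prefix
```

It would then be invoked as, for example, `FILE=s3.yml FROM=production TO=backup ruby copy-s3-bucket.rb`. Note that the script authenticates with the source section's credentials only, so those credentials need access to both buckets.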
require 'rubygems'
require 'right_aws'
require 'yaml'

filename    = ENV['FILE'].to_s
source      = ENV['FROM'].to_s
destination = ENV['TO'].to_s
dry_run     = true # set to false to actually perform the copy

if filename == ""
  puts "Please provide filename of s3 configuration"
  exit(1)
end

s3_config = YAML.load(File.open(filename))

aws_access_key_id     = s3_config[source]['access_key_id']
aws_secret_access_key = s3_config[source]['secret_access_key']

source_bucket      = s3_config[source]['bucket']
destination_bucket = s3_config[destination]['bucket']

source_bucket, source_prefix           = source_bucket.split('/', 2)
destination_bucket, destination_prefix = destination_bucket.split('/', 2)

puts "Preparing to copy from #{source_bucket} (prefix: #{source_prefix}) to #{destination_bucket} (prefix: #{destination_prefix})"

s3 = RightAws::S3Interface.new(aws_access_key_id, aws_secret_access_key)

# Build a list of keys already present in the destination so they can be skipped
destination_keys = Array.new
s3.incrementally_list_bucket(destination_bucket, 'prefix' => destination_prefix) do |key_set|
  destination_keys << key_set[:contents].map { |k| k[:key] }
end
destination_keys.flatten!

s3.incrementally_list_bucket(source_bucket, 'prefix' => source_prefix) do |key_set|
  key_set[:contents].each do |key|
    source_key      = key[:key]
    destination_key = source_prefix ? source_key.sub(%r{^#{Regexp.escape(source_prefix)}}, destination_prefix) : source_key

    if destination_keys.include?(destination_key)
      puts "Skipping: #{destination_bucket}/#{destination_key}"
    else
      puts " Copying: #{source_bucket}/#{source_key}\n" +
           "      to: #{destination_bucket}/#{destination_key}"
      retries = 0
      begin
        if dry_run
          puts " * dry-run, no copying will be performed"
        else
          s3.copy(source_bucket, source_key, destination_bucket, destination_key)
        end
      rescue Exception => e
        retries += 1
        puts "cannot copy key, #{e.inspect}\nretrying #{retries} out of 10 times..."
        retry if retries <= 10
      end
    end
  end
end
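The prefix rewrite in the copy loop is easy to check in isolation: the destination key is the source key with the source prefix swapped for the destination prefix. A small sketch (the prefixes and key name here are made up for illustration):

```ruby
# Rewrite a source key's prefix into the destination prefix,
# mirroring the sub call in the copy loop above.
source_prefix      = "photos/2012"
destination_prefix = "archive/photos-2012"

key = "photos/2012/summer/img_001.jpg"

# Regexp.escape guards against prefixes containing regex metacharacters
destination_key = key.sub(%r{^#{Regexp.escape(source_prefix)}}, destination_prefix)

puts destination_key
```

Keys that do not start with the source prefix pass through unchanged, which is why the script only lists keys under `source_prefix` in the first place.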
sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install ruby
sudo apt-get install rubygems
sudo apt-get install libopenssl-ruby
gem install right_aws
# create Ruby script (see copy-s3-bucket.rb above)
# run the script in the background
nohup ruby copy-s3-bucket.rb &
# watch the output
tail -f nohup.out
# for total bucket size, you can install and use s3cmd
sudo apt-get install s3cmd
# configure with s3 credentials
s3cmd --configure
s3cmd du s3://your.bucket