@flah00
Last active July 19, 2018 08:49
Upload an archive to AWS Glacier, automatically splitting the file, if necessary
#!/usr/bin/env ruby
# Inspired by https://github.com/aws/aws-sdk-core-ruby/blob/5bdfbebc6d845ac24fb977cb078a920b03b791ac/features/glacier/step_definitions.rb#L28
# Usage: glacier_uploader.rb VAULT /PATH/TO/FILE [NUM_PARTS] [RETRIES]
# Files larger than 1 GiB are sent as a multipart upload, split into parts of
# 1 MiB - 4 GiB. The part size must be a power of two and the total number of
# parts cannot exceed 10,000. Upload metadata is stored in SimpleDB.
require 'aws-sdk'
require 'logger'
require 'json'
require 'date'
require 'pp'

GDRIVE_DOMAIN = 'GlacierGdrive'

Aws.config[:logger] = Logger.new($stdout)

@vault_name  = ARGV[0] || raise("Missing vault name")
@file_path   = ARGV[1] || raise("Missing file path")
@nparts      = ARGV[2].nil? ? 9_999 : ARGV[2].to_i
@max_retries = ARGV[3].nil? ? 3 : ARGV[3].to_i

# Two clients: @glacier_n disables logging so multi-megabyte part bodies are
# not dumped to the log; @glacier uses the logger configured above.
@glacier_n = Aws::Glacier::Client.new(logger: nil)
@glacier   = Aws::Glacier::Client.new
@sdb       = Aws::SimpleDB::Client.new
@file      = File.open(@file_path, "r")
# Abort any in-progress multipart upload so Glacier does not keep billing for
# orphaned parts, then report the error and exit.
def abort_upload(e)
  if @upload_id
    @glacier.abort_multipart_upload(
      vault_name: @vault_name,
      upload_id: @upload_id,
    )
  end
  $stderr.puts "ERROR #{e.class}: #{e}"
  exit(1)
end
# Collect every SimpleDB domain name, following pagination, and create the
# metadata domain if it does not exist yet.
sdomains = []
d = @sdb.list_domains
loop do
  sdomains += d.domain_names
  break unless d.next_page?
  d = d.next_page
end
unless sdomains.include?(GDRIVE_DOMAIN)
  @sdb.create_domain(domain_name: GDRIVE_DOMAIN)
  puts "INFO Waiting for domain creation..."
  sleep(11)
end
if (size = File.size?(@file_path)) > 1_073_741_824 # 1 GiB
  # Round the naive part size up to the next power of two, as Glacier requires.
  s = size / @nparts
  @part_size = 2**Math.log2(s).ceil
  resp = @glacier.initiate_multipart_upload(
    vault_name: @vault_name,
    part_size: @part_size,
    archive_description: @file.path,
  )
  @upload_id = resp.data.upload_id
  # Keep a rolling tree hash of the entire file, required to complete the
  # multipart upload at the end.
  tree_hash = Aws::TreeHash.new
  # Upload the file in chunks and extract from each response the tree hash
  # for that chunk. This eliminates the need to compute the total tree hash
  # of the object in a second pass.
  offset  = 0
  retries = 0
  until @file.eof?
    chunk = @file.read(@part_size)
    begin
      resp = @glacier_n.upload_multipart_part(
        vault_name: @vault_name,
        upload_id: @upload_id,
        body: chunk,
        range: "bytes #{offset}-#{offset + chunk.bytesize - 1}/*"
      )
    rescue Aws::Glacier::Errors::ServiceError => e
      if retries > @max_retries
        abort_upload(e)
      else
        puts "#{Time.now.to_json} ERROR offset #{offset} #{e.class}: #{e}"
        retries += 1
        sleep 60
        retry
      end
    end
    tree_hash.hashes.concat(resp.context[:tree_hash].hashes)
    puts "INFO Uploaded offset #{offset}"
    offset += chunk.bytesize
    retries = 0
  end
  # Complete the multipart upload; Glacier verifies the reassembled archive
  # against the tree hash computed while uploading.
  begin
    resp = @glacier.complete_multipart_upload(
      vault_name: @vault_name,
      upload_id: @upload_id,
      archive_size: File.size?(@file.path),
      checksum: tree_hash.digest
    )
  rescue Aws::Glacier::Errors::ServiceError => e
    abort_upload(e)
  end
else
  # Small files go up in a single request; the SDK computes the checksum.
  begin
    resp = @glacier_n.upload_archive(
      vault_name: @vault_name,
      archive_description: @file.path,
      body: @file
    )
  rescue Aws::Glacier::Errors::ServiceError => e
    abort_upload(e)
  end
end
puts "ARCHIVE_ID: #{resp.archive_id}"
attributes = [
  {name: "archive_id", value: resp.archive_id, replace: true},
  {name: "size",       value: size.to_s,       replace: true},
  {name: "date",       value: Date.today.to_s, replace: true}
]
begin
  resp = @sdb.put_attributes(
    domain_name: GDRIVE_DOMAIN,
    item_name: @file.path,
    attributes: attributes,
  )
  pp resp
  # Read the record back: SimpleDB is eventually consistent, so the
  # attributes are sometimes not visible immediately after the write.
  resp = @sdb.get_attributes(
    domain_name: GDRIVE_DOMAIN,
    item_name: @file.path
  )
  pp resp
rescue Aws::SimpleDB::Errors::ServiceError => e
  puts "File successfully uploaded"
  puts "ERROR Unable to save attributes for #{@file.path}: #{attributes}"
  puts "ERROR #{e.class}: #{e}"
  exit(2)
end
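The part-size rule the script applies (divide by the requested part count, round up to a power of two) can be sketched in isolation. This is a minimal standalone version; the `glacier_part_size` helper name is mine, not part of the script or the SDK, and it adds a 1 MiB floor since Glacier's minimum part size is 1 MiB:

```ruby
# Pick a Glacier-legal part size: a power of two between 1 MiB and 4 GiB,
# chosen so the archive fits in at most max_parts parts.
def glacier_part_size(total_bytes, max_parts = 9_999)
  mib = 1024 * 1024
  raw = [total_bytes / max_parts, 1].max   # naive size if split evenly
  [2**Math.log2(raw).ceil, mib].max        # round up to a power of two, >= 1 MiB
end
```

For a 5 GiB file this yields 1 MiB parts (5,120 of them); for 100 GiB it yields 16 MiB parts, keeping the count under the 10,000-part limit.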
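The `Aws::TreeHash` object the script accumulates implements Glacier's tree-hash algorithm: SHA-256 each 1 MiB chunk, then repeatedly hash concatenated pairs of digests until a single root remains. A minimal pure-Ruby sketch (the `tree_hash` helper name is mine), handy for verifying an archive checksum offline:

```ruby
require 'digest'

MIB = 1024 * 1024

# Compute the Glacier tree hash of an in-memory string and return it as hex.
def tree_hash(data)
  # Leaf level: SHA-256 digest of each 1 MiB chunk.
  hashes = (0...data.bytesize).step(MIB).map do |i|
    Digest::SHA256.digest(data.byteslice(i, MIB))
  end
  hashes = [Digest::SHA256.digest('')] if hashes.empty?
  # Combine adjacent pairs; an odd digest is carried up unchanged.
  until hashes.one?
    hashes = hashes.each_slice(2).map { |a, b| b ? Digest::SHA256.digest(a + b) : a }
  end
  hashes.first.unpack1('H*')
end
```

For data of 1 MiB or less the tree hash equals the plain SHA-256 digest, which makes the function easy to spot-check.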