@cowlicks /upload.py
Last active Jan 30, 2017

Get data from githubarchive.com, then upload all files in the current directory to an S3 bucket.
"""
Script for uploading all the files in a directory to an S3 bucket.
If this does not work check your permissions to the bucket and that you have your ~/.boto file with access keys.
A few comments are include for generalizing this script.
"""
import os
import boto
# put your bucket name here
bucket = boto.connect_s3().get_bucket('githubarchive-data')
# put the directory you want to upload here. Subdirectories might break this.
filenames = os.listdir('.')
if 'upload.py' in filenames:  # skip this script itself if it lives in the same directory.
    filenames.remove('upload.py')
for fn in filenames:
    print(fn)
    k = bucket.new_key(fn)
    k.set_contents_from_filename(fn)
# {..} ranges are inclusive, requesting days that don't exist is okay (like 2015-02-30-9.json.gz).
wget http://data.githubarchive.org/2015-{01..05}-{01..31}-{0..23}.json.gz
python upload.py
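The shell's {..} brace expansion above generates one URL per month, day, and hour in the range. A sketch of the same list built in Python, for environments without bash-style expansion (the helper name archive_urls is an assumption):

```python
from itertools import product

def archive_urls():
    # Mirror the brace expansion: months 01-05, days 01-31, hours 0-23.
    # Ranges are inclusive; as noted above, nonexistent days are harmless
    # because the corresponding downloads simply fail.
    return ['http://data.githubarchive.org/2015-%02d-%02d-%d.json.gz' % (m, d, h)
            for m, d, h in product(range(1, 6), range(1, 32), range(24))]
```

This yields 5 * 31 * 24 = 3720 candidate URLs, matching what the wget command requests.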