@cowlicks
Last active January 30, 2017 14:44
Get data from githubarchive.org, then upload all files in the current directory to an S3 bucket.
"""
Script for uploading all the files in a directory to an S3 bucket.
If this does not work check your permissions to the bucket and that you have your ~/.boto file with access keys.
A few comments are include for generalizing this script.
"""
import os
import boto
# put your bucket name here
bucket = boto.connect_s3().get_bucket('githubarchive-data')
# put the directory you want to upload here. Subdirectories might breake this.
filenames = os.listdir('.')
if 'upload.py' in filenames: # remove this file if we're in the same directory.
filenames.remove('upload.py')
for fn in filenames:
print(fn)
k = bucket.new_key(fn)
k.set_contents_from_filename(fn)
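The docstring above mentions a ~/.boto file. A minimal sketch of that file, using the standard [Credentials] section that boto reads (the values are placeholders for your own keys):

[Credentials]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY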
# The {..} ranges are brace-expanded by bash and are inclusive; requesting
# days that don't exist (like 2015-02-30-9.json.gz) is okay, wget just
# reports a 404 for those URLs and moves on.
wget http://data.githubarchive.org/2015-{01..05}-{01..31}-{0..23}.json.gz
python upload.py
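To spot-check that everything landed, a minimal sketch reusing the same boto connection and bucket name as above, listing every key and its size in bytes:

# Sketch: confirm the upload by listing the keys now in the bucket.
import boto

bucket = boto.connect_s3().get_bucket('githubarchive-data')
for key in bucket.list():
    print('{} {}'.format(key.name, key.size))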