@donwany
Last active April 1, 2018 03:04
These code snippets show how to upload files to and download files from Amazon S3 buckets, given the required permissions to read and write objects.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Author: Theophilus Siameh\n",
"### Email: theodondre@gmail.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"Boto 3 is the Amazon Web Services (AWS) SDK for Python, which allows Python developers to\n",
"write software that makes use of Amazon services like S3 and EC2.\n",
"Boto 3 provides an easy-to-use, object-oriented API as well as low-level direct service access."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install the latest Boto 3 release via pip:\n",
"\n",
"- `pip install boto3`\n",
"\n",
"You may also install a specific version:\n",
"\n",
"- `pip install boto3==1.0.0`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install AWS CLI\n",
"Read more:\n",
"\n",
"https://docs.aws.amazon.com/cli/latest/userguide/installing.html\n",
"\n",
"Install or upgrade the AWS CLI with pip, then check the installed version:\n",
"\n",
"- `pip install awscli --upgrade --user`\n",
"\n",
"- `aws --version`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configuration\n",
"Before you can begin using Boto 3, you should set up authentication credentials. Credentials for your AWS account can be found in the IAM Console. You can create a new user or use an existing one. Go to manage access keys and generate a new set of keys.\n",
"\n",
"If you have the AWS CLI installed, then you can use it to configure your credentials file:\n",
"\n",
"    aws configure\n",
"\n",
"Alternatively, you can create the credentials file yourself. By default, it is located at `~/.aws/credentials`:\n",
"\n",
"    [default]\n",
"    aws_access_key_id = YOUR_ACCESS_KEY\n",
"    aws_secret_access_key = YOUR_SECRET_KEY\n",
"\n",
"You may also want to set a default region. This can be done in the configuration file. By default, it is located at `~/.aws/config`:\n",
"\n",
"    [default]\n",
"    region=us-east-1\n",
"\n",
"Alternatively, you can pass a `region_name` when creating clients and resources.\n",
"\n",
"This sets up credentials for the default profile as well as a default region to use when creating connections. See the Credentials guide in the Boto 3 documentation for in-depth configuration sources and options."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Boto 3\n",
"\n",
"To use Boto 3, you must first import it and tell it what service you are going to use:\n",
"\n",
"    import boto3\n",
"\n",
"    # Let's use Amazon S3\n",
"    s3 = boto3.resource('s3')\n",
"\n",
"Now that you have an s3 resource, you can make requests and process responses from the service. The following uses the buckets collection to print out all bucket names:\n",
"\n",
"    # Print out bucket names\n",
"    for bucket in s3.buckets.all():\n",
"        print(bucket.name)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"import boto3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize variables"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"bucket_name = \"siameh-sagemaker\"\n",
"file_name = \"demo_file.csv\"\n",
"key = \"demo_file.csv\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uploading files to an S3 bucket"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def write_to_s3(filename, bucket, key):\n",
"    '''\n",
"    Upload the local file `filename` to `bucket` under the object name `key`.\n",
"\n",
"    - Guide: http://boto3.readthedocs.io/en/latest/guide/s3.html\n",
"    - Writing to and reading from S3 are equally straightforward.\n",
"    - Files are referred to as objects in S3.\n",
"    - A file's name is referred to as its key in S3.\n",
"    - With the default storage class, objects stored in S3 are automatically replicated\n",
"      across multiple Availability Zones in the region where the bucket was created.\n",
"    '''\n",
"    with open(filename, 'rb') as f:  # Read in binary mode\n",
"        return boto3.Session().resource('s3').Bucket(bucket).Object(key).upload_fileobj(f)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Calling the write_to_s3 function:\n",
"# 1. Pass in the path of an existing local file.\n",
"# 2. Pass in the bucket name; make sure this bucket has already been created, e.g. in the AWS console.\n",
"# 3. Pass in the key, i.e. the name the object will have in S3.\n",
"write_to_s3(file_name, bucket_name, key)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Downloading from an S3 bucket"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"file_n = 'demo_file_from_s3.csv'"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def download_from_s3(filename, bucket, key):\n",
"    '''\n",
"    Download the object stored under `key` in `bucket` to the local file `filename`.\n",
"\n",
"    - Guide: http://boto3.readthedocs.io/en/latest/guide/s3.html\n",
"    '''\n",
"    with open(filename, 'wb') as f:  # Write in binary mode\n",
"        return boto3.Session().resource('s3').Bucket(bucket).Object(key).download_fileobj(f)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Calling the download_from_s3 function\n",
"download_from_s3(file_n, bucket_name, key)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading the downloaded file from the local file system"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>x1</th>\n",
" <th>x2</th>\n",
" <th>x3</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.221993</td>\n",
" <td>153</td>\n",
" <td>2.041547</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.870732</td>\n",
" <td>180</td>\n",
" <td>1.190954</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.206719</td>\n",
" <td>127</td>\n",
" <td>8.779031</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.918611</td>\n",
" <td>144</td>\n",
" <td>5.236753</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.488411</td>\n",
" <td>177</td>\n",
" <td>4.921360</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>0.611744</td>\n",
" <td>175</td>\n",
" <td>7.318711</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>0.765908</td>\n",
" <td>165</td>\n",
" <td>0.145808</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>0.518418</td>\n",
" <td>147</td>\n",
" <td>0.933630</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>0.296801</td>\n",
" <td>130</td>\n",
" <td>8.265542</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>0.187721</td>\n",
" <td>184</td>\n",
" <td>8.334927</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" x1 x2 x3 y\n",
"0 0.221993 153 2.041547 0\n",
"1 0.870732 180 1.190954 0\n",
"2 0.206719 127 8.779031 1\n",
"3 0.918611 144 5.236753 1\n",
"4 0.488411 177 4.921360 1\n",
"5 0.611744 175 7.318711 1\n",
"6 0.765908 165 0.145808 0\n",
"7 0.518418 147 0.933630 0\n",
"8 0.296801 130 8.265542 0\n",
"9 0.187721 184 8.334927 1"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv(file_n).head(20)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
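
The upload and download helpers in the notebook above can be checked end to end by uploading a file, downloading it again, and comparing checksums. The following is a minimal sketch and not part of the original notebook: `sha256_of`, `round_trip`, and `InMemoryBucket` are hypothetical names introduced here. `InMemoryBucket` is an offline stand-in that only mimics the `upload_fileobj(fileobj, key)` and `download_fileobj(key, fileobj)` methods of a boto3 `Bucket` resource, so the example runs without AWS credentials.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a local file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip(bucket, local_path, key, download_path):
    """Upload local_path under key, download it again, and compare digests.

    `bucket` is assumed to expose upload_fileobj(fileobj, key) and
    download_fileobj(key, fileobj), as a boto3 Bucket resource does.
    Returns True when the downloaded copy matches the original.
    """
    with open(local_path, "rb") as f:
        bucket.upload_fileobj(f, key)
    with open(download_path, "wb") as f:
        bucket.download_fileobj(key, f)
    return sha256_of(local_path) == sha256_of(download_path)


class InMemoryBucket:
    """Hypothetical stand-in for a boto3 Bucket, for offline checks only."""

    def __init__(self):
        self._objects = {}  # maps key -> bytes

    def upload_fileobj(self, fileobj, key):
        self._objects[key] = fileobj.read()

    def download_fileobj(self, key, fileobj):
        fileobj.write(self._objects[key])


if __name__ == "__main__":
    # Offline demonstration using a temporary directory and the fake bucket.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo_file.csv")
        dst = os.path.join(tmp, "demo_file_from_s3.csv")
        with open(src, "w") as f:
            f.write("x1,x2,x3,y\n0.221993,153,2.041547,0\n")
        print(round_trip(InMemoryBucket(), src, "demo_file.csv", dst))
```

With credentials configured as described in the notebook, the same check could presumably be run against a real bucket by passing `boto3.resource('s3').Bucket(bucket_name)` in place of `InMemoryBucket()`.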