@monkut
Last active September 11, 2017 04:10
Getting Started with AWS and Dask

Prereqs

NOTE: this assumes Python >= 3.6

On Ubuntu, make sure the python3.6-dev package is installed; it is needed to build the necessary packages.

  • python3.6
  • jq
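
A quick way to confirm the prerequisites above are on your PATH is sketched below. The function name `check_prereqs` is hypothetical (it is not part of any tool mentioned here); it only relies on the standard `command -v` builtin.

```shell
# Print one line per required tool that is not found on PATH.
# Prints nothing when every tool is available.
check_prereqs() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
    fi
  done
}
```

Calling `check_prereqs python3.6 jq` before continuing should print nothing if both tools are installed.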

Preparation

  1. Set up an AWS account
# if you don't have an account, sign up through aws.amazon.com

# Obtain the 'AWS Access Key ID'
1. Log in to aws.amazon.com

2. In IAM, create a user and a group:
- user: dask
- group: dask-group (Permissions: x, y, z)
> When you create the user, download the resulting `credentials.csv`; it contains the access information you'll need below.

# Necessary Permissions
- ecr:CreateRepository
  2. Install awscli
# install awscli
sudo python -m pip install awscli

# configure awscli
$ aws configure
AWS Access Key ID [None]: [ENTER info from the credentials.csv]
AWS Secret Access Key [None]: [ENTER info from the credentials.csv]
Default region name [None]: us-west-2
Default output format [None]: [LEAVE Blank (defaults to JSON)]
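
Once `aws configure` has been run, it can help to confirm that the stored credentials actually work before going further. The sketch below wraps `aws sts get-caller-identity`; the function name `whoami_aws` is hypothetical, and the error message is only a suggestion.

```shell
# Print the AWS account ID for the configured credentials,
# or fail with a hint if the credentials cannot be used.
whoami_aws() {
  account_id=$(aws sts get-caller-identity --query 'Account' --output text) || {
    echo "credentials check failed; try re-running 'aws configure'" >&2
    return 1
  }
  echo "$account_id"
}
```

If `whoami_aws` prints an account ID, the access key and secret from `credentials.csv` were entered correctly.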
  3. Collect sample data:

  4. Send Data to s3:

  • Create Bucket
aws s3api create-bucket --bucket dask-taxi-data --region us-west-2 --create-bucket-configuration LocationConstraint=us-west-2
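
Bucket creation fails if the bucket already exists, so when re-running these notes it may be convenient to check first. This is a minimal sketch, assuming `aws s3api head-bucket` succeeds only for a bucket you can access; the function name `ensure_bucket` is hypothetical.

```shell
# Create the bucket only if head-bucket cannot find it.
# Assumption: a failing head-bucket means "does not exist"; it can also fail
# when the name is taken by another account, in which case create-bucket
# will report the error.
# Note: us-east-1 does not accept a LocationConstraint, so this sketch is
# meant for other regions such as us-west-2.
ensure_bucket() {
  bucket="$1"
  region="$2"
  if aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
    echo "bucket $bucket already exists"
    return 0
  fi
  aws s3api create-bucket --bucket "$bucket" --region "$region" \
    --create-bucket-configuration "LocationConstraint=$region"
}
```

For the bucket used in these notes the call would be `ensure_bucket dask-taxi-data us-west-2`.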

  5. Generate the key pair used to access your EC2 instances (AWS keeps the public key; you save the private key):

# make sure you have a ~/.ssh directory
mkdir -p ~/.ssh

# create your key
aws ec2 create-key-pair --key-name MYKEYNAME --query 'KeyMaterial' --output text > ~/.ssh/my-aws-key.pem

# update so only you can read the private key
chmod 400 ~/.ssh/my-aws-key.pem
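
One pitfall with redirecting `create-key-pair` straight into the `.pem` file is that the shell truncates the file before the AWS call runs, so a failed call (for example, a duplicate key name) can wipe an existing private key. A more defensive variant is sketched below; the function name `create_key_file` and the `.tmp` suffix are hypothetical choices.

```shell
# Create an EC2 key pair and save the private key, without ever
# overwriting an existing file or leaving a partial file behind.
create_key_file() {
  key_name="$1"
  key_file="$2"
  if [ -e "$key_file" ]; then
    echo "refusing to overwrite existing file: $key_file" >&2
    return 1
  fi
  mkdir -p "$(dirname "$key_file")"
  # Write to a temporary file first; umask 077 keeps it private from the start.
  tmp_file="$key_file.tmp"
  if (umask 077 && aws ec2 create-key-pair --key-name "$key_name" \
        --query 'KeyMaterial' --output text > "$tmp_file"); then
    mv "$tmp_file" "$key_file"
    chmod 400 "$key_file"
  else
    rm -f "$tmp_file"
    return 1
  fi
}
```

With the names used above, the call would be `create_key_file MYKEYNAME ~/.ssh/my-aws-key.pem`.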
  6. Prepare dask-ec2:
# create venv
python -m venv .venv

# enter .venv
source .venv/bin/activate

# install dask[complete]
python -m pip install "dask[complete]"

# install dask-ec2
python -m pip install dask-ec2

  7. Boot up an EC2 instance:

# copy the default dask AMI to your region
aws ec2 copy-image --source-image-id ami-d05e75b8 --source-region us-east-1 --region us-west-2 --name "dask-ec2-ami"
# Take note of the ImageId in the output and pass it to the dask-ec2 up command as --ami

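# start the dask-ec2 cluster
# --keyname must match the key pair name created above (MYKEYNAME); AMI-ID is the ImageId returned by copy-image
dask-ec2 up --keyname MYKEYNAME --keypair ~/.ssh/my-aws-key.pem --region-name us-west-2 --ami AMI-ID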
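
A copied AMI is not usable immediately, so starting the cluster right after `copy-image` may fail while the image is still pending. The sketch below captures the new ImageId and blocks on `aws ec2 wait image-available` before returning it; the function name `copy_ami_and_wait` is hypothetical, and the image name "dask-ec2-ami" is taken from the command above.

```shell
# Copy an AMI into the target region, wait until it is available,
# then print the new ImageId on stdout.
copy_ami_and_wait() {
  source_ami="$1"
  source_region="$2"
  target_region="$3"
  ami_id=$(aws ec2 copy-image --source-image-id "$source_ami" \
    --source-region "$source_region" --region "$target_region" \
    --name "dask-ec2-ami" --query 'ImageId' --output text) || return 1
  # The waiter polls a limited number of times, so a slow copy may need a re-run.
  aws ec2 wait image-available --image-ids "$ami_id" \
    --region "$target_region" >&2 || return 1
  echo "$ami_id"
}
```

The result can then be passed along, for example `AMI_ID=$(copy_ami_and_wait ami-d05e75b8 us-east-1 us-west-2)` followed by `dask-ec2 up ... --ami "$AMI_ID"`.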

NOTE: If your account is new, you may see a PendingVerification error; in that case you need to verify your account before instances can be launched.

- scheduler
- workers (how many?)

-- Can we set up only the scheduler with this?
  8. Connect Dask Client

Create Repository in ECR

aws ecr create-repository --repository-name REPOSITORY_NAME