NOTE: This assumes Python >= 3.6.
On Ubuntu, make sure the python3.6-dev package is installed; it is needed to build some of the required packages.
- python3.6
- jq
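A quick check for the prerequisites above can save a failed build later. This is a sketch, not part of any tool: the function names are hypothetical, and it assumes the interpreter is on your PATH as `python3`. It does not check for the python3.6-dev headers.

```shell
# Hypothetical helpers: check the prerequisites listed above.
# Assumes the interpreter is installed as `python3`.

# have CMD: succeed if CMD is on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# python_ok: succeed if python3 exists and is at least version 3.6.
python_ok() {
  have python3 &&
    python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)'
}

# check_prereqs: report every missing prerequisite on stderr; fail if any is missing.
check_prereqs() {
  local status=0
  python_ok || { echo "missing: python3 >= 3.6" >&2; status=1; }
  have jq || { echo "missing: jq" >&2; status=1; }
  return "$status"
}
```

After sourcing the snippet, `check_prereqs && echo "prerequisites OK"` reports each missing item before anything else is installed.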
- Set up an AWS account
# If you don't have an account, sign up at aws.amazon.com
# Obtain 'AWS Access Key ID'
1. Log in to aws.amazon.com
2. In IAM, create a user and a group:
- dask
- dask-group (Permissions: x, y, z)
> When you create the user, download the resulting `credentials.csv`; it contains the access keys you'll need below.
# Necessary Permissions
- ecr:CreateRepository
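If you already have an administrator profile configured for the AWS CLI, the permission above could be attached to `dask-group` from the command line instead of the console. A minimal sketch, assuming such a profile exists: the policy name `dask-permissions` and the helper names are hypothetical, `"Resource": "*"` is broader than strictly necessary, and only `ecr:CreateRepository` is included because it is the only permission listed so far. The `AWS` variable is a hypothetical hook; set `AWS=echo` to print the command instead of running it.

```shell
# Hypothetical helpers: attach the listed permission to dask-group as an inline policy.
AWS="${AWS:-aws}"

# dask_policy_json: print an IAM policy document allowing ecr:CreateRepository.
dask_policy_json() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ecr:CreateRepository"],
      "Resource": "*"
    }
  ]
}
EOF
}

# attach_dask_policy [GROUP]: write the policy to a temp file and attach it to GROUP.
attach_dask_policy() {
  local group="${1:-dask-group}" file
  file="$(mktemp)"
  dask_policy_json > "$file"
  "$AWS" iam put-group-policy --group-name "$group" \
    --policy-name dask-permissions --policy-document "file://$file"
  local status=$?
  rm -f "$file"   # clean up the temporary policy file
  return "$status"
}
```

Add the remaining permissions to the `Action` list once they are known.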
- Install awscli
# install awscli
sudo python3 -m pip install awscli
# configure awscli
$ aws configure
AWS Access Key ID [None]: [ENTER info from the credentials.csv]
AWS Secret Access Key [None]: [ENTER info from the credentials.csv]
Default region name [None]: us-west-2
Default output format [None]: [LEAVE Blank (defaults to JSON)]
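To confirm the keys from `credentials.csv` actually work, you can ask AWS which identity the CLI is using; `aws configure set` can also apply the same region and output defaults without the interactive prompts. A sketch with hypothetical helper names; `AWS=echo` prints the commands instead of running them.

```shell
# Hypothetical helpers around the AWS CLI configuration step.
AWS="${AWS:-aws}"

# configure_defaults [REGION]: set the same defaults as the prompts above,
# non-interactively (the access keys still have to be entered separately).
configure_defaults() {
  "$AWS" configure set region "${1:-us-west-2}" &&
    "$AWS" configure set output json
}

# whoami_aws: print the ARN of the identity the CLI is currently using.
whoami_aws() {
  "$AWS" sts get-caller-identity --query Arn --output text
}
```

If the `dask` user's keys were entered, `whoami_aws` should print an ARN ending in `user/dask`.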
- Collect sample data:
- Send data to S3:
- Create a bucket
aws s3api create-bucket --bucket dask-taxi-data --region us-west-2 --create-bucket-configuration LocationConstraint=us-west-2
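Once the bucket exists, the sample data can be copied up and listed. A sketch assuming the data sits in a local directory (the `data` directory name, any prefix, and the helper names are hypothetical); `aws s3 sync` only uploads files that are new or changed, so it is safe to re-run.

```shell
# Hypothetical helpers for moving sample data into the bucket.
AWS="${AWS:-aws}"

# upload_data LOCAL_DIR BUCKET [PREFIX]: sync a local directory to s3://BUCKET/PREFIX.
upload_data() {
  local src="$1" bucket="$2" prefix="${3:-}"
  "$AWS" s3 sync "$src" "s3://$bucket/$prefix"
}

# list_data BUCKET: recursively list the objects in the bucket.
list_data() {
  "$AWS" s3 ls "s3://$1/" --recursive
}
```

For example, `upload_data data dask-taxi-data` followed by `list_data dask-taxi-data`.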
#. Create a key pair for accessing your EC2 instances (AWS keeps the public key; you save the private key locally):
# make sure you have a ~/.ssh directory (-p: no error if it already exists)
mkdir -p ~/.ssh
# create your key
aws ec2 create-key-pair --key-name MYKEYNAME --query 'KeyMaterial' --output text > ~/.ssh/my-aws-key.pem
# update so only you can read the private key
chmod 400 ~/.ssh/my-aws-key.pem
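Before launching anything, it can help to confirm that the private key has the restrictive mode set above and that the key pair is registered with AWS. A sketch with hypothetical helper names; `key_registered` relies on `aws ec2 describe-key-pairs` failing when the name is unknown, and assumes the key pair was created in us-west-2, the default region configured earlier.

```shell
# Hypothetical helpers for checking the key pair.
AWS="${AWS:-aws}"

# key_perms_ok FILE: succeed only if FILE is a regular file with mode exactly 0400.
key_perms_ok() {
  [ -f "$1" ] && [ -n "$(find "$1" -prune -perm 0400)" ]
}

# key_registered NAME [REGION]: succeed if AWS knows a key pair called NAME.
key_registered() {
  "$AWS" ec2 describe-key-pairs --key-names "$1" --region "${2:-us-west-2}" >/dev/null
}
```

For example: `key_perms_ok ~/.ssh/my-aws-key.pem && key_registered MYKEYNAME`.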
- Prepare dask-ec2:
# create venv
python3 -m venv .venv
# enter .venv
source .venv/bin/activate
# install dask[complete] (the quotes stop shells such as zsh from expanding the brackets)
python -m pip install "dask[complete]"
# install dask-ec2
python -m pip install dask-ec2
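A quick sanity check that the virtual environment has what the next step needs. A sketch with hypothetical helper names; it assumes the venv is active, so `python3` resolves to the venv's interpreter, and that `distributed` came in with `dask[complete]`, which includes it.

```shell
# Hypothetical helpers: confirm the packages installed above are usable.

# module_ok NAME: succeed if the active interpreter can import module NAME.
module_ok() {
  python3 -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$1" 2>/dev/null
}

# check_dask_env: report any missing module or command; fail if anything is missing.
check_dask_env() {
  local status=0 mod
  for mod in dask distributed; do
    module_ok "$mod" || { echo "missing module: $mod" >&2; status=1; }
  done
  command -v dask-ec2 >/dev/null 2>&1 || { echo "missing command: dask-ec2" >&2; status=1; }
  return "$status"
}
```

Running `check_dask_env` inside the activated venv should print nothing and exit 0.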
#. Boot up the EC2 instances:
# copy the default dask AMI to your region
aws ec2 copy-image --source-image-id ami-d05e75b8 --source-region us-east-1 --region us-west-2 --name "dask-ec2-ami"
# note the ImageId this returns and pass it to dask-ec2 up as the --ami value (AMI-ID below)
# start the dask-ec2 cluster (--keyname must match the key pair name used with create-key-pair above)
dask-ec2 up --keyname MYKEYNAME --keypair ~/.ssh/my-aws-key.pem --region-name us-west-2 --ami AMI-ID
NOTE: If you have just signed up, you may see a PendingVerification error; you'll need to verify your account before continuing.
- scheduler
- workers (how many?)
-- Can we set up only the scheduler with this?
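The AMI copy, wait, and launch steps above can be chained so the ImageId does not have to be copied by hand. A sketch under the same assumptions as those commands (us-west-2, the `~/.ssh/my-aws-key.pem` key file); the helper names are hypothetical, and `AWS` and `DASK_EC2` are hooks that can be set to `echo` for a dry run. `aws ec2 wait image-available` polls until the copied image is ready, so the launch does not start against an image that is still pending. Only the `dask-ec2 up` flags already used above are passed.

```shell
# Hypothetical helpers: copy the AMI, wait for it, then launch the cluster.
AWS="${AWS:-aws}"
DASK_EC2="${DASK_EC2:-dask-ec2}"

# copy_dask_ami [REGION]: copy the default dask-ec2 AMI and print the new ImageId.
copy_dask_ami() {
  "$AWS" ec2 copy-image --source-image-id ami-d05e75b8 --source-region us-east-1 \
    --region "${1:-us-west-2}" --name "dask-ec2-ami" --query ImageId --output text
}

# wait_for_ami AMI_ID [REGION]: block until the copied image is available.
wait_for_ami() {
  "$AWS" ec2 wait image-available --image-ids "$1" --region "${2:-us-west-2}"
}

# launch_cluster AMI_ID KEYNAME [KEYFILE] [REGION]: run dask-ec2 up with the flags above.
launch_cluster() {
  local ami="$1" keyname="$2" keyfile="${3:-$HOME/.ssh/my-aws-key.pem}"
  "$DASK_EC2" up --keyname "$keyname" --keypair "$keyfile" \
    --region-name "${4:-us-west-2}" --ami "$ami"
}
```

For example: `ami="$(copy_dask_ami)"`, then `wait_for_ami "$ami"`, then `launch_cluster "$ami" MYKEYNAME`.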
- Connect Dask Client
aws ecr create-repository --repository
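Once the cluster is up, a Dask client on your machine can connect to the scheduler. A sketch, assuming the scheduler listens on Dask's default port 8786 and that port is reachable from your machine (security group permitting); `HOST` is the scheduler's public address, and the helper names are hypothetical. The Python part runs through a heredoc so the snippet stays in shell like the rest of these notes.

```shell
# Hypothetical helpers for connecting a local Dask client to the cluster.

# scheduler_address HOST [PORT]: build a scheduler address (default port 8786).
scheduler_address() {
  printf 'tcp://%s:%s\n' "$1" "${2:-8786}"
}

# connect_client HOST [PORT]: open a dask.distributed Client, print it, and close it.
connect_client() {
  python3 - "$(scheduler_address "$1" "${2:-8786}")" <<'EOF'
import sys
from dask.distributed import Client

client = Client(sys.argv[1])  # connect to the remote scheduler
print(client)                 # summary of the connected cluster
client.close()
EOF
}
```

Run `connect_client HOST` (with your scheduler's address) from the activated venv; it should print a `Client` summary.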