@mrocklin
Last active November 17, 2015 17:56

Notes on how I set up a four-node cluster on EC2 with anaconda-cluster

Disclaimer: I'm dogfooding anaconda-cluster, a non-free Continuum product. I only use the free parts of acluster (up to four nodes), and I had a good experience, but you should still take anything I say with a grain of salt.

Credentials

I signed up for AWS and anaconda.org.

From AWS I had to collect:

  1. AWS Key
  2. AWS Secret
  3. SSH key

From anaconda.org I had to collect a username and password.

Install anaconda-cluster

To install anaconda-cluster I first needed to log in to anaconda.org:

$ conda install -y anaconda-client
$ anaconda login  # provide username and password
$ conda install anaconda-cluster -c anaconda-cluster

Set up credentials and profile

I ran acluster once to populate the ~/.acluster directory:

$ acluster

Then I added my credentials to ~/.acluster/providers.yaml, notably the following fields:

  • private_key:
  • secret_id:
  • secret_key:
  • keyname:
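Before creating a cluster it can help to confirm that none of these fields was left blank. The following is a minimal sketch, not part of anaconda-cluster: the function name `missing_fields` and the sample contents are hypothetical, and because I am not assuming anything about how providers.yaml nests its entries, the check scans line by line rather than parsing YAML.

```python
REQUIRED_FIELDS = ("private_key", "secret_id", "secret_key", "keyname")


def missing_fields(yaml_text, required=REQUIRED_FIELDS):
    """Return the required field names that are absent or have an empty value."""
    filled = set()
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#") or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        # Drop any trailing comment before deciding whether a value is present.
        value = value.split(" #", 1)[0].strip()
        if key.strip() in required and value:
            filled.add(key.strip())
    return [name for name in required if name not in filled]


# Hypothetical contents: placeholder values, and the nesting is an assumption.
sample = """
aws_east:
  private_key: ~/.ssh/my-key.pem
  secret_id: AKIAEXAMPLE
  secret_key:
  keyname: my-key
"""

print(missing_fields(sample))  # ['secret_key']
```

To run it against the real file, read the text from `os.path.expanduser("~/.acluster/providers.yaml")` and pass it to `missing_fields`.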

I also tweaked the sample profile to look like this:

$ cat ~/.acluster/profiles.d/aws_profile_sample.yaml
name: aws_profile_sample
node_id: ami-08faa660
node_type: m3.xlarge
num_nodes: 4
provider: aws_east
user: ubuntu
profiles:
  - notebook

Apparently the free version only gets you up to 4 nodes. I'm sufficiently averse to administrative tasks that I'm not bothering to get a license, so I'll just stick with 4.
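A quick way to guard against accidentally asking for more nodes than the free tier allows is to read `num_nodes` back out of the profile before creating the cluster. This is only a sketch: `read_num_nodes` and `FREE_NODE_LIMIT` are names I made up, the limit of 4 is the one reported in these notes rather than anything read from license terms, and the parsing assumes the flat `key: value` layout shown in the profile above.

```python
FREE_NODE_LIMIT = 4  # limit as reported in these notes, not checked against a license


def read_num_nodes(profile_text):
    """Pull the integer after 'num_nodes:' out of a flat key: value profile."""
    for line in profile_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "num_nodes":
            return int(value.strip())
    raise ValueError("profile has no num_nodes field")


# Same fields as the sample profile above, minus the nested list.
profile = """\
name: aws_profile_sample
node_id: ami-08faa660
node_type: m3.xlarge
num_nodes: 4
provider: aws_east
user: ubuntu
"""

nodes = read_num_nodes(profile)
print(nodes, "nodes; within free limit:", nodes <= FREE_NODE_LIMIT)
```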

Start cluster, install stuff

I created a cluster:

$ acluster create test1 --profile aws_profile_sample
# lots of stuff happens, takes a few minutes

I installed packages with conda:

$ acluster conda install ipython numpy pandas dill dask toolz futures

I installed the rest without conda, notably my own software from GitHub:

$ acluster cmd "sudo apt-get install -y git && pip install git+https://github.com/mrocklin/distributed.git && pip install git+https://github.com/blaze/dask.git --upgrade"

SSH into boxes and run my custom software

I opened four terminal windows, one per node, and ran the following:

mrocklin@notebook $ acluster ssh 0  # center node
     ubuntu@node0 $ dcenter
                  Start Center at node0:8787
                  
mrocklin@notebook $ acluster ssh 1  # worker node
     ubuntu@node1 $ dworker node0:8787
                  Start worker at node1:8787

mrocklin@notebook $ acluster ssh 2  # worker node
     ubuntu@node2 $ dworker node0:8787
                  Start worker at node2:8787

mrocklin@notebook $ acluster ssh 3  # worker node
     ubuntu@node3 $ dworker node0:8787
                  Start worker at node3:8787

I think that I could have used acluster cmd to do this somehow, but this wasn't too bad.
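The layout above follows a simple rule: node 0 runs the center and every other node runs a worker pointed at it. A small sketch of that rule, useful mainly if the node count changes, is below. The function `launch_plan` is hypothetical, it only prints the commands rather than running them, and it assumes the `nodeN` hostnames and the 8787 port that appear in the session above.

```python
def launch_plan(num_nodes, port=8787):
    """Return (node_index, command) pairs: node 0 runs the center, the rest run workers."""
    if num_nodes < 2:
        raise ValueError("need one center node and at least one worker")
    # Address the center reported in the session above; assumed, not discovered.
    center_address = "node0:%d" % port
    plan = [(0, "dcenter")]
    plan += [(i, "dworker " + center_address) for i in range(1, num_nodes)]
    return plan


for index, command in launch_plan(4):
    print("acluster ssh %d   # then run: %s" % (index, command))
```

This reproduces the four terminal sessions shown above; it does not attempt the `acluster cmd` route, since I have not worked out how that would look.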

SSH one more time and open IPython

$ acluster ssh 0
$ ipython
In [1]: from distributed import Executor
In [2]: pool = Executor('node0:8787')

Or use the Jupyter notebook:

$ acluster start notebook
<opens browser>
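Once a pool is connected, from IPython or from the notebook, the next step is to hand it work. I can't reproduce the cluster in a self-contained snippet, so the sketch below uses the standard library's ThreadPoolExecutor as a local stand-in. It assumes that distributed's Executor follows the same submit-then-result pattern, which you should check against the distributed documentation for the version you installed; `inc` is just a toy function.

```python
from concurrent.futures import ThreadPoolExecutor


def inc(x):
    """Toy task standing in for real work."""
    return x + 1


# Local stand-in for Executor('node0:8787'); no cluster is contacted here.
pool = ThreadPoolExecutor(max_workers=3)

# Submit one task per input, then block until each result is ready.
futures = [pool.submit(inc, i) for i in range(5)]
results = [f.result() for f in futures]
print(results)  # [1, 2, 3, 4, 5]

pool.shutdown()
```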

Better ways

If I were doing this for real, I would probably turn distributed into a service that anaconda-cluster could launch and manage. I'll save that for another day.
