Disclaimer: I'm dogfooding anaconda-cluster, a non-free Continuum product. I only use the free parts of acluster (up to four nodes), and while I had a good experience, you should probably take anything I say with a grain of salt.
I signed up for AWS and anaconda.org.
From AWS I had to collect:
- AWS Key
- AWS Secret
- SSH key
From anaconda.org I had to collect a username and password.
To install anaconda-cluster I first needed to log in:
$ conda install -y anaconda-client
$ anaconda login # provide username and password
$ conda install anaconda-cluster -c anaconda-cluster
I ran acluster once to populate the ~/.acluster directory:
$ acluster
Then I added my credentials to ~/.acluster/providers.yaml, notably the following fields:
- private_key:
- secret_id:
- secret_key:
- keyname:
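Filled in, the provider entry looked roughly like this. This is a reconstruction, not a copy of my actual file: the four field names come from the list above and the aws_east provider name comes from my profile below, but the surrounding structure, the cloud_provider and location keys, and all the placeholder values are my assumptions about the providers.yaml format.

```
aws_east:
  cloud_provider: ec2            # assumption: provider-type key
  location: us-east-1            # assumption: region key
  keyname: my-aws-keypair        # name of the SSH key pair in AWS
  private_key: ~/.ssh/my-aws-keypair.pem
  secret_id: <AWS Key>
  secret_key: <AWS Secret>
```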
And tweaked the sample profile to look like the following:
$ cat .acluster/profiles.d/aws_profile_sample.yaml
name: aws_profile_sample
node_id: ami-08faa660
node_type: m3.xlarge
num_nodes: 4
provider: aws_east
user: ubuntu
profiles:
- notebook
Apparently the free version only gets you up to four nodes. I'm sufficiently averse to administrative tasks that I'm not bothering to get a license, so I'll just stick with four.
I created a cluster:
$ acluster create test1 --profile aws_profile_sample
# lots of stuff happens, takes a few minutes
I installed packages across the cluster with conda:
$ acluster conda install ipython numpy pandas dill dask toolz futures
I installed packages without conda, notably my own software from GitHub:
$ acluster cmd "sudo apt-get install -y git && pip install git+https://github.com/mrocklin/distributed.git && pip install git+https://github.com/blaze/dask.git --upgrade"
I opened four terminal windows and ran the following:
mrocklin@notebook $ acluster ssh 0 # center node
ubuntu@node0 $ dcenter
Start Center at node0:8787
mrocklin@notebook $ acluster ssh 1 # worker node
ubuntu@node1 $ dworker node0:8787
Start worker at node1:8787
mrocklin@notebook $ acluster ssh 2 # worker node
ubuntu@node2 $ dworker node0:8787
Start worker at node2:8787
mrocklin@notebook $ acluster ssh 3 # worker node
ubuntu@node3 $ dworker node0:8787
Start worker at node3:8787
I think that I could have used acluster cmd to do this somehow, but this wasn't too bad.
Then I connected to the cluster from IPython on the center node:
$ acluster ssh 0
$ ipython
In [1]: from distributed import Executor
In [2]: pool = Executor('node0:8787')
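The distributed Executor is modeled on the standard library's concurrent.futures interface, so the submit/result pattern looks the same as the local version. Here is a sketch of that pattern using the stdlib ThreadPoolExecutor in place of the cluster, so it runs anywhere; the square function is just an illustration, not something from distributed:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Toy function standing in for real work.
    return x * x

# Same submit/result pattern you would use with the distributed
# Executor above, but against a local thread pool instead of a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(square, i) for i in range(10)]
    results = [f.result() for f in futures]

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```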
Or I could use the Jupyter notebook:
$ acluster start notebook
<opens browser>
If I were doing this for real I would probably make distributed into a service that could be launched and managed by anaconda-cluster. I'll save that for another day.