- See documentation at http://star.mit.edu/cluster/docs/latest/installation.html
- pip install StarCluster
- Follow directions at http://star.mit.edu/cluster/docs/latest/quickstart.html to create config template, SSH keypair, etc.
- Get 12-digit user ID for your AWS account (not your IAM username) from AWS dashboard.
- Specify a name for the SSH key that is unique to the user (e.g., "janedoe_ec2" instead of "mykey")
- In the config file, set NODE_IMAGE_ID to the HVM AMI ID
- Set NODE_INSTANCE_TYPE (e.g., c3.xlarge); see the example config snippet after this list
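A minimal sketch of the relevant entries in the StarCluster config file (~/.starcluster/config), assuming a cluster template named mycluster, a key named janedoe_ec2, and placeholder values for the credentials and AMI ID:
[aws info]
AWS_ACCESS_KEY_ID = <your access key>
AWS_SECRET_ACCESS_KEY = <your secret key>
AWS_USER_ID = <12-digit account ID>
[key janedoe_ec2]
KEY_LOCATION = ~/.ssh/janedoe_ec2.rsa
[cluster mycluster]
KEYNAME = janedoe_ec2
CLUSTER_SIZE = 2
NODE_IMAGE_ID = ami-xxxxxxxx
NODE_INSTANCE_TYPE = c3.xlarge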
Start the cluster and SSH into the master node:
starcluster start mycluster
starcluster sshmaster mycluster
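To verify that the cluster came up, you can list active clusters from your local machine:
starcluster listclusters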
The StarCluster AMI includes numpy, scipy, matplotlib, ipython, pip, unzip, git, libmpich2, mpich2.
Change directory to /home/sgeadmin so that any downloaded software goes there; only files under /home are visible to all nodes:
cd /home/sgeadmin
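As a quick check that /home is shared across the cluster, you can list the directory from a worker node (assuming the default worker name node001; StarCluster sets up passwordless SSH between nodes):
ssh node001 ls /home/sgeadmin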
The mpi4py package included with the StarCluster AMI was compiled against the wrong MPI library (mpich instead of openmpi). To fix this, it needs to be uninstalled and explicitly compiled against openmpi:
pip uninstall -y mpi4py
update-alternatives --set mpi /usr/lib/openmpi/include
pip install mpi4py
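A quick sanity check that mpi4py now runs under Open MPI (a minimal test; the rank count of 4 is arbitrary):
mpirun -np 4 python -c 'from mpi4py import MPI; print(MPI.COMM_WORLD.Get_rank())'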
The above steps (and any other custom configuration or dependency setup for your specific project) can be put into a script which is copied over to the cluster and then run, as in the following:
starcluster put mycluster provision_cluster.sh /home/sgeadmin
starcluster sshmaster mycluster '/home/sgeadmin/provision_cluster.sh'
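A minimal sketch of what provision_cluster.sh might contain (the mpi4py fix from above plus placeholders for project-specific dependencies):
#!/bin/bash
set -e
cd /home/sgeadmin
# Rebuild mpi4py against Open MPI (see above)
pip uninstall -y mpi4py
update-alternatives --set mpi /usr/lib/openmpi/include
pip install mpi4py
# Project-specific dependencies would go here, e.g.:
# pip install <your-package>
Make the script executable (chmod +x provision_cluster.sh) before copying it over, or invoke it with bash explicitly.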
It may be preferable for MPI jobs to allocate slots in a fill-up fashion (assign all slots on a given machine before moving to the next) rather than round-robin (assign one slot from each machine in turn). To check the current allocation method, run the following and inspect the allocation_rule field:
qconf -sp orte
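To switch to fill-up allocation, edit the parallel environment definition (this opens it in an editor; change allocation_rule to $fill_up):
qconf -mp orte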
Parallel jobs can be submitted through SGE with:
qsub -b y -cwd -pe orte 24 mpirun ./mpi-executable arg1 arg2 [...]
-b y specifies that the executable is a binary.
-cwd executes the job from the current working directory.
-pe orte 24 specifies the name of the parallel environment and the number of slots requested (24).
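Once submitted, the job can be monitored with qstat; its stdout and stderr are written to files named after the job with .o<jobid> and .e<jobid> suffixes in the working directory:
qstat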
EC2 vCPUs are hyperthreads rather than full cores, so if your workload needs full cores (e.g., FPU-heavy code) you should schedule half as many MPI processes per instance as there are vCPUs.
If you are allocating one CPU per worker and your code uses OpenBLAS, disable OpenBLAS's automatic threading by exporting OMP_NUM_THREADS=1 in your environment, as shown below.
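One way to do this at submission time (a sketch; qsub's -v flag sets the variable in the job's environment, and Open MPI's mpirun -x forwards it to ranks on other nodes):
qsub -b y -cwd -v OMP_NUM_THREADS=1 -pe orte 24 mpirun -x OMP_NUM_THREADS ./mpi-executable arg1 arg2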