cfnCluster on AWS: a tutorial based on the AWS cfnCluster documentation
CfnCluster constructs an HPC environment with the “look and feel” of conventional HPC clusters but with the added benefit of being scalable:
- Jobs are submitted to a queue
- Nodes spin up as needed
- Jobs are automatically launched
- As nodes become idle, they are automatically shut down
cfnCluster is written in Python. The source code is available on GitHub and releases are available via PyPI.
I recommend installing cfnCluster with pip into the root conda environment, enabling its use in different conda environments. (Once a conda environment is active, packages installed via pip are also tracked by conda.)
# switch to the root conda environment
source activate
pip install cfncluster
To create the default configuration, run the following command and answer the prompts:
cfncluster configure
- Accept the defaults for the first three entries (Cluster template, AWS Access Key ID, AWS Secret Access Key ID). Pick a suitable region, SSH key, etc.
- Choose the VPC ID ending in 84.
- Choose the subnet ID ending in 67.
The config file is generated at ~/.cfncluster/config and looks like this:
[aws]
aws_region_name = us-west-2
[cluster default]
vpc_settings = *** redacted ***
key_name = *** redacted ***
[vpc public]
master_subnet_id = *** redacted ***
vpc_id = *** redacted ***
[global]
update_check = true
sanity_check = true
cluster_template = default
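The file is a chain of references: [global] cluster_template names a [cluster ...] section, whose vpc_settings value names a [vpc ...] section. A minimal sketch of following that chain with Python's standard configparser, assuming the file is plain INI syntax. The helper names (read_config_text, resolve_cluster) and the placeholder IDs and key name are hypothetical, not part of cfnCluster.

```python
import configparser
import os

# Sample mirroring the generated file; "public" is the VPC section name chosen
# during setup. The key name and IDs are made-up placeholders.
SAMPLE_CONFIG = """
[aws]
aws_region_name = us-west-2

[cluster default]
vpc_settings = public
key_name = my-key

[vpc public]
master_subnet_id = subnet-00000000
vpc_id = vpc-00000000

[global]
update_check = true
sanity_check = true
cluster_template = default
"""


def read_config_text(path="~/.cfncluster/config"):
    """Read the config file written by `cfncluster configure`."""
    with open(os.path.expanduser(path)) as handle:
        return handle.read()


def resolve_cluster(config_text, cluster_name=None):
    """Hypothetical helper: return (cluster, vpc) settings as plain dicts.

    With no cluster_name, follow [global] cluster_template; then follow the
    cluster's vpc_settings value to the matching [vpc NAME] section.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if cluster_name is None:
        cluster_name = parser.get("global", "cluster_template")
    cluster = dict(parser[f"cluster {cluster_name}"])
    vpc_name = cluster.get("vpc_settings")
    vpc = dict(parser[f"vpc {vpc_name}"]) if vpc_name else {}
    return cluster, vpc


if __name__ == "__main__":
    cluster, vpc = resolve_cluster(SAMPLE_CONFIG)
    print(cluster["key_name"], vpc["vpc_id"])  # my-key vpc-00000000
```

On a real machine, resolve_cluster(read_config_text()) would do the same against your own file.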
For the full list of customization options, see the documentation.
When operating in a private network where public IPs are not needed, avoid creating (and paying for) them by adding the following line to the [vpc public] section of the config file (the public part is just the name you chose for the VPC during setup):
use_public_ips = false
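If you prefer to script this change rather than edit the file by hand, here is a minimal sketch with the standard configparser module. disable_public_ips is a hypothetical helper, and it assumes the file is plain INI; note that configparser.write() drops comments, so keep a backup of the original.

```python
import configparser
import io


def disable_public_ips(config_text, vpc_name="public"):
    """Hypothetical helper: add use_public_ips = false to [vpc <vpc_name>].

    configparser.write() does not preserve comments in the original text.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = f"vpc {vpc_name}"
    if not parser.has_section(section):
        raise KeyError(f"no [{section}] section in config")
    parser.set(section, "use_public_ips", "false")
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    before = "[vpc public]\nvpc_id = vpc-00000000\nmaster_subnet_id = subnet-00000000\n"
    print(disable_public_ips(before))
```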
The configuration file can define one or more clusters for different types of jobs or workloads. Each cluster is defined in its own section, identified by the [cluster CLUSTERNAME] header (replace CLUSTERNAME with your own cluster name, e.g. bigmemory, smalljobs). See an example config file here.
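Because every cluster lives in a section whose header starts with "cluster ", the clusters defined in a file can be listed by filtering section names. A short sketch using configparser; cluster_names and the bigmemory/smalljobs values below are hypothetical illustrations.

```python
import configparser


def cluster_names(config_text):
    """Hypothetical helper: names of all [cluster CLUSTERNAME] sections, in file order."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    prefix = "cluster "
    return [name[len(prefix):] for name in parser.sections() if name.startswith(prefix)]


if __name__ == "__main__":
    example = """
[global]
cluster_template = smalljobs

[cluster bigmemory]
max_queue_size = 4

[cluster smalljobs]
max_queue_size = 10
"""
    print(cluster_names(example))  # ['bigmemory', 'smalljobs']
```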
Here is an abbreviated list of important options that can be specified (example value, followed by the default):
- compute_instance_type = t2.micro (default: t2.micro)
- master_instance_type = t2.micro (default: t2.micro)
- initial_queue_size = 0 (default: 2)
- max_queue_size = 3 (default: 10)
- scheduler = sge (default: sge; valid options are sge, openlava, torque, or slurm)
- cluster_type = ondemand (default: ondemand; valid options are ondemand or spot)
- custom_ami = NONE (default by region)
- s3_read_write_resource = NONE (default: NONE, see here)
- pre_install = NONE (default: NONE)
- ephemeral_dir = /scratch (default: /scratch)
- shared_dir = /shared (default: /shared, see here)
- master_root_volume_size = 10 (default: 10)
- compute_root_volume_size = 10 (default: 10)
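To see how a cluster section combines with these defaults, here is a sketch that overlays a section on the defaults listed above and checks the enumerated options. DEFAULTS, ALLOWED and effective_settings are hypothetical names, and the rule that initial_queue_size must not exceed max_queue_size is an assumed sanity check; cfnCluster's own validation may differ.

```python
# Defaults and allowed values transcribed from the option list above;
# effective_settings is a hypothetical helper, not part of cfnCluster.
DEFAULTS = {
    "compute_instance_type": "t2.micro",
    "master_instance_type": "t2.micro",
    "initial_queue_size": "2",
    "max_queue_size": "10",
    "scheduler": "sge",
    "cluster_type": "ondemand",
    "ephemeral_dir": "/scratch",
    "shared_dir": "/shared",
    "master_root_volume_size": "10",
    "compute_root_volume_size": "10",
}
ALLOWED = {
    "scheduler": {"sge", "openlava", "torque", "slurm"},
    "cluster_type": {"ondemand", "spot"},
}


def effective_settings(section):
    """Overlay one [cluster ...] section (str -> str) on DEFAULTS and check it."""
    settings = {**DEFAULTS, **section}
    for key, allowed in ALLOWED.items():
        if settings[key] not in allowed:
            raise ValueError(f"{key}={settings[key]!r}, expected one of {sorted(allowed)}")
    initial = int(settings["initial_queue_size"])
    maximum = int(settings["max_queue_size"])
    # Assumed sanity rule: the queue cannot start larger than its ceiling.
    if not 0 <= initial <= maximum:
        raise ValueError("expected 0 <= initial_queue_size <= max_queue_size")
    return settings


if __name__ == "__main__":
    settings = effective_settings({"initial_queue_size": "0", "max_queue_size": "2"})
    print(settings["scheduler"], settings["initial_queue_size"], settings["max_queue_size"])  # sge 0 2
```

Values are kept as strings because that is what configparser returns for every option.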
For testing, specify a cluster called test with at most two compute nodes (and none at startup) by adding the following lines to the ~/.cfncluster/config file:
[cluster test]
initial_queue_size = 0
max_queue_size = 2
To launch your CfnCluster, enter the following at the command line prompt:
cfncluster create test
You can follow the progress of the deployment (which may take a while) in the AWS CloudFormation console.
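The same progress can be followed from a script by polling the stack status with boto3's CloudFormation describe_stacks call. This is a sketch assuming boto3 is installed and AWS credentials are configured. The stack name cfncluster-test is an assumption (cfnCluster is expected to prefix the cluster name with cfncluster-), so confirm the actual name in the console; is_settled and wait_for_stack are hypothetical helpers.

```python
import time


def is_settled(status):
    """True once a CloudFormation stack status is no longer *_IN_PROGRESS."""
    return not status.endswith("_IN_PROGRESS")


def wait_for_stack(stack_name="cfncluster-test", region="us-west-2", poll_seconds=30):
    """Poll describe_stacks until the stack settles; needs boto3 and credentials.

    The default stack_name is an assumed naming convention, not verified here.
    """
    import boto3  # imported lazily so is_settled works without boto3 installed

    cfn = boto3.client("cloudformation", region_name=region)
    while True:
        stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
        status = stack["StackStatus"]
        print(stack_name, status)
        if is_settled(status):
            return status  # e.g. CREATE_COMPLETE, or a *_FAILED / ROLLBACK_* state
        time.sleep(poll_seconds)
```

boto3 also ships a ready-made waiter (cfn.get_waiter("stack_create_complete")); the explicit loop is shown only because it prints each intermediate status.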
It is common to install large, frequently used HPC applications on the shared drive /shared, which resides on an Amazon EBS volume. The bcbio-nextgen workflow is one example.
By creating a snapshot of this EBS volume, you can deploy the same pre-configured software on future clusters.
To create a snapshot via the AWS console, navigate to the master instance in the AWS EC2 console and scroll to the block devices section. Look for /dev/sdb, click on the volume ID (vol-xxxxxxxx) to bring up the volume dashboard, and create a snapshot of the volume.
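The console steps above can also be scripted with boto3's EC2 client: describe_instances lists the master's block device mappings, and create_snapshot snapshots the volume mapped at /dev/sdb. This is a sketch assuming boto3 is installed, credentials are configured, and you know the master's instance ID; find_volume_id, snapshot_shared_volume and the description text are hypothetical.

```python
def find_volume_id(describe_instances_response, device_name="/dev/sdb"):
    """Return the EBS volume ID attached at device_name in a describe_instances response."""
    for reservation in describe_instances_response["Reservations"]:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                if mapping["DeviceName"] == device_name:
                    return mapping["Ebs"]["VolumeId"]
    raise LookupError(f"no volume attached at {device_name}")


def snapshot_shared_volume(master_instance_id, region="us-west-2"):
    """Snapshot the master's /dev/sdb (/shared) volume; needs boto3 and credentials."""
    import boto3  # imported lazily so find_volume_id works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instances(InstanceIds=[master_instance_id])
    volume_id = find_volume_id(response)
    snapshot = ec2.create_snapshot(VolumeId=volume_id, Description="cfnCluster /shared volume")
    return snapshot["SnapshotId"]  # the snap-... ID to put in the [ebs] section
```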
The snapshot ID (e.g. snap-0896bea72d42813f3) can be specified in the [ebs] section of the cluster configuration file.
Different [ebs] sections can be specified for the different clusters defined in the same configuration.
The following example specifies a snapshot specifically for the test cluster.
[cluster test]
initial_queue_size = 0
max_queue_size = 2
ebs_settings = testebs
[ebs testebs]
# replace with your EBS snapshot ID
ebs_snapshot_id = snap-XXXXXXXXXXXXXXXX
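Here ebs_settings in the cluster section points at an [ebs NAME] section, which carries ebs_snapshot_id. A sketch that follows that link for one cluster and checks that the value looks like a snapshot ID (snap- followed by 8 or 17 hex digits, the older and newer EC2 ID formats), so a placeholder or stray trailing text is caught before launch. snapshot_for_cluster and SNAPSHOT_ID are hypothetical names; the sketch assumes plain INI parsing with configparser.

```python
import configparser
import re

# snap- followed by 8 or 17 lowercase hex digits (older and newer EC2 ID formats)
SNAPSHOT_ID = re.compile(r"snap-(?:[0-9a-f]{8}|[0-9a-f]{17})")


def snapshot_for_cluster(config_text, cluster_name):
    """Hypothetical helper: the snapshot ID a cluster's [ebs ...] section uses, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    cluster = parser[f"cluster {cluster_name}"]
    ebs_name = cluster.get("ebs_settings")
    if ebs_name is None:
        return None  # this cluster does not reference an [ebs ...] section
    snapshot_id = parser.get(f"ebs {ebs_name}", "ebs_snapshot_id")
    if not SNAPSHOT_ID.fullmatch(snapshot_id):
        raise ValueError(f"{snapshot_id!r} does not look like an EBS snapshot ID")
    return snapshot_id


if __name__ == "__main__":
    config = (
        "[cluster test]\ninitial_queue_size = 0\nmax_queue_size = 2\nebs_settings = testebs\n"
        "[ebs testebs]\nebs_snapshot_id = snap-0896bea72d42813f3\n"
    )
    print(snapshot_for_cluster(config, "test"))  # snap-0896bea72d42813f3
```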