Introduction to provisioning HPC clusters on AWS with cfnCluster

HPC cluster deployment on AWS

Based on the cfnCluster on AWS tutorial from the official cfnCluster documentation

CfnCluster constructs an HPC environment with the “look and feel” of conventional HPC clusters but with the added benefit of being scalable:

  • Jobs are submitted to a queue (see the example below)
  • Nodes spin up as needed
  • Jobs are automatically launched
  • As nodes become idle, they are automatically shut down
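
For example, with the default SGE scheduler, jobs are submitted from the master node using the standard SGE tools (the script name below is a placeholder):

# submit a batch script to the queue (run on the master node)
qsub myjob.sh
# inspect queued and running jobs
qstat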

Installation

cfnCluster is written in Python. The source code is available on GitHub, and releases are published on PyPI.

I recommend installing cfnCluster with pip inside a conda environment, enabling its use in different conda environments. (Once a conda environment is active, packages installed via pip will also be tracked within it.)

# switch to the root conda environment
source activate
pip install cfncluster
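
To verify the installation, ask the CLI for its version (this assumes the cfncluster executable landed on the PATH of the active environment):

# confirm that the cfncluster CLI is available
cfncluster version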

Creating the default cluster configuration file

Documentation

To create the default configuration, execute the following steps:

cfncluster configure
  • Accept the defaults for the first three entries (cluster template, AWS Access Key ID, AWS Secret Access Key). Pick a suitable region, ssh keys, etc.
  • Choose the VPC ID ending in 84.
  • Choose the subnet ID ending in 67.

The config file will be generated as ~/.cfncluster/config and will look like this:

[aws]
aws_region_name = us-west-2

[cluster default]
vpc_settings = *** redacted ***
key_name = *** redacted ***

[vpc public]
master_subnet_id = *** redacted ***
vpc_id = *** redacted ***

[global]
update_check = true
sanity_check = true
cluster_template = default

Customizing the cluster configuration

For the full list of customization options, see the documentation.

Public IPs

When operating in a private network where public IPs are not needed, avoid creating (and paying for) them by adding the following line to the [vpc public] section of the config file (public is just the name you chose for the VPC section during setup):

use_public_ips = false
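
With the setting added, the section would look like this (IDs redacted, as in the generated file):

[vpc public]
master_subnet_id = *** redacted ***
vpc_id = *** redacted ***
use_public_ips = false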

Clusters

The configuration file can define one or more clusters for different types of jobs or workloads.

Documentation

Each cluster is defined in its own section, identified by the [cluster CLUSTERNAME] header (replace CLUSTERNAME with your own cluster name, e.g. bigmemory, smalljobs, etc.).

See an example config file here

Here is an abbreviated list of important options that can be specified:

  • compute_instance_type = t2.micro (default: t2.micro)
  • master_instance_type = t2.micro (default: t2.micro)
  • initial_queue_size = 0 (default: 2)
  • max_queue_size = 3 (default: 10)
  • scheduler = sge (default: sge; valid options are sge, openlava, torque, or slurm)
  • cluster_type = ondemand (default: ondemand, valid options are ondemand or spot)
  • custom_ami = NONE (default by region)
  • s3_read_write_resource = NONE (default: NONE, see here)
  • pre_install = NONE (default: NONE)
  • ephemeral_dir = /scratch (default: /scratch)
  • shared_dir = /shared (default: /shared, see here)
  • master_root_volume_size = 10 (default: 10)
  • compute_root_volume_size = 10 (default: 10)

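As an illustrative sketch, a hypothetical bigmemory cluster using spot pricing and a larger compute instance type could be defined like this (the instance type and queue sizes are placeholder choices, not recommendations):

[cluster bigmemory]
compute_instance_type = r4.2xlarge
master_instance_type = t2.micro
initial_queue_size = 0
max_queue_size = 5
scheduler = sge
cluster_type = spot
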
For testing, specify a cluster called test with at most two compute nodes (0 <= n <= 2) by adding the following lines to the ~/.cfncluster/config file:

[cluster test]
initial_queue_size = 0
max_queue_size = 2

Launching a cluster

To launch your CfnCluster, enter the following at the command line prompt:

cfncluster create test

You can follow the progress of the deployment (which may take a while) in the AWS CloudFormation console.
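
You can also monitor progress from the command line; once the stack is complete, cfncluster prints the master node's public IP, which you can use to log in (the key file path below is a placeholder, and the login user depends on the base AMI, e.g. ec2-user on the default Amazon Linux AMI):

# query the state of the CloudFormation stack backing the cluster
cfncluster status test
# log into the master node once the cluster is up
ssh -i ~/.ssh/mykey.pem ec2-user@<master-public-ip>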

Creating an EBS Volume Snapshot for Cluster Reusability

It is common to install large, frequently used HPC applications on the shared drive /shared, which resides on an Amazon EBS volume; the bcbio-nextgen workflow is one example.

By creating a snapshot of this EBS volume, you can deploy the same pre-configured software on future clusters.

To create a snapshot via the AWS console, navigate to the master instance in the AWS EC2 console and scroll to the block devices section. Look for /dev/sdb, click on the volume ID (vol-xxxxxxxx) to bring up the volume dashboard, and create a snapshot of the volume.
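
Alternatively, the snapshot can be created with the AWS CLI (the volume ID below is a placeholder):

# snapshot the EBS volume backing /shared
aws ec2 create-snapshot --volume-id vol-xxxxxxxx --description "cfnCluster shared software"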

The snapshot id (e.g. snap-0896bea72d42813f3) can be specified in the [ebs] section of the cluster configuration file.

Defining cluster-specific EBS volumes

Separate [ebs] sections can be specified for the different clusters defined in the same configuration file.

The following example specifies a snapshot specifically for the test cluster.

[cluster test]
initial_queue_size = 0
max_queue_size = 2
ebs_settings = testebs

[ebs testebs]
# replace with your EBS snapshot ID
ebs_snapshot_id = snap-XXXXXXXXXXXXXXXX
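
With these sections in place, the /shared volume of a newly created test cluster is restored from the snapshot at launch:

cfncluster create test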