Introduction to provisioning HPC clusters on AWS with cfnCluster

HPC cluster deployment on AWS

Based on the cfnCluster on AWS tutorial from the official cfnCluster documentation

CfnCluster constructs an HPC environment with the “look and feel” of conventional HPC clusters but with the added benefit of being scalable:

  • Jobs are submitted to a queue (see the example below)
  • Nodes spin up as needed
  • Jobs are automatically launched
  • As nodes become idle, they are automatically shut down
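
For example, with the default SGE scheduler, jobs are submitted from the master node using the standard SGE tools (the script name below is a placeholder):

# submit a batch script to the queue (run on the master node)
qsub myjob.sh
# inspect queued and running jobs
qstat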

Installation

cfnCluster is written in Python. The source code is available on GitHub, and releases are published on PyPI.

I recommend installing cfnCluster with pip inside a conda environment, enabling its use in different conda environments. (Once a conda environment is active, packages installed via pip will also be tracked within it.)

# switch to the root conda environment
source activate
pip install cfncluster
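
To verify the installation, ask the CLI for its version (this assumes the cfncluster executable landed on the PATH of the active environment):

# confirm that the cfncluster CLI is available
cfncluster version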

Creating the default cluster configuration file

Documentation

To create the default configuration, execute the following steps:

cfncluster configure
  • Accept the defaults for the first three entries (cluster template, AWS Access Key ID, AWS Secret Access Key). Pick a suitable region, ssh keys, etc.
  • Choose the VPC ID ending in 84.
  • Choose the subnet ID ending in 67.

The config file will be generated as ~/.cfncluster/config and will look like this:

[aws]
aws_region_name = us-west-2

[cluster default]
vpc_settings = *** redacted ***
key_name = *** redacted ***

[vpc public]
master_subnet_id = *** redacted ***
vpc_id = *** redacted ***

[global]
update_check = true
sanity_check = true
cluster_template = default

Customizing the cluster configuration

For the full list of customization options, see the documentation.

Public IPs

When operating in a private network where public IPs are not needed, avoid creating (and paying for) them by adding the following line to the [vpc public] section of the config file (public is just the name you chose for the VPC section during setup):

use_public_ips = false
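
With the setting added, the section would look like this (IDs redacted, as in the generated file):

[vpc public]
master_subnet_id = *** redacted ***
vpc_id = *** redacted ***
use_public_ips = false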

Clusters

The configuration file can define one or more clusters for different types of jobs or workloads.

Documentation

Each cluster is defined in its own section, identified by the [cluster CLUSTERNAME] header (replace CLUSTERNAME with your own cluster name, e.g. bigmemory, smalljobs, etc.).

See an example config file here

Here is an abbreviated list of important options that can be specified:

  • compute_instance_type = t2.micro (default: t2.micro)
  • master_instance_type = t2.micro (default: t2.micro)
  • initial_queue_size = 0 (default: 2)
  • max_queue_size = 3 (default: 10)
  • scheduler = sge (default: sge; valid options are sge, openlava, torque, or slurm)
  • cluster_type = ondemand (default: ondemand, valid options are ondemand or spot)
  • custom_ami = NONE (default by region)
  • s3_read_write_resource = NONE (default: NONE, see here)
  • pre_install = NONE (default: NONE)
  • ephemeral_dir = /scratch (default: /scratch)
  • shared_dir = /shared (default: /shared, see here)
  • master_root_volume_size = 10 (default: 10)
  • compute_root_volume_size = 10 (default: 10)

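As an illustrative sketch, a hypothetical bigmemory cluster using spot pricing and a larger compute instance type could be defined like this (the instance type and queue sizes are placeholder choices, not recommendations):

[cluster bigmemory]
compute_instance_type = r4.2xlarge
master_instance_type = t2.micro
initial_queue_size = 0
max_queue_size = 5
scheduler = sge
cluster_type = spot
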
For testing, specify a cluster called test with at most two compute nodes (0 <= n <= 2) by adding the following lines to the ~/.cfncluster/config file:

[cluster test]
initial_queue_size = 0
max_queue_size = 2

Launching a cluster

To launch your CfnCluster, enter the following at the command line prompt:

cfncluster create test

You can follow the progress of the deployment (which may take a while) in the AWS CloudFormation console.
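
You can also monitor progress from the command line; once the stack is complete, cfncluster prints the master node's public IP, which you can use to log in (the key file path below is a placeholder, and the login user depends on the base AMI, e.g. ec2-user on the default Amazon Linux AMI):

# query the state of the CloudFormation stack backing the cluster
cfncluster status test
# log into the master node once the cluster is up
ssh -i ~/.ssh/mykey.pem ec2-user@<master-public-ip>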

Creating an EBS Volume Snapshot for Cluster Reusability

It is common to install large, frequently used HPC applications on the shared drive /shared, which resides on an Amazon EBS volume; the bcbio-nextgen workflow is one example.

By creating a snapshot of this EBS volume, you can deploy the same pre-configured software on future clusters.

To create a snapshot via the AWS console, navigate to the master instance in the AWS EC2 console and scroll to the block devices section. Look for /dev/sdb, click on the volume ID (vol-xxxxxxxx) to bring up the volume dashboard, and create a snapshot of the volume.
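
Alternatively, the snapshot can be created with the AWS CLI (the volume ID below is a placeholder):

# snapshot the EBS volume backing /shared
aws ec2 create-snapshot --volume-id vol-xxxxxxxx --description "cfnCluster shared software"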

The snapshot id (e.g. snap-0896bea72d42813f3) can be specified in the [ebs] section of the cluster configuration file.

Defining cluster-specific EBS volumes

Separate [ebs] sections can be specified for the different clusters defined in the same configuration file.

The following example specifies a snapshot specifically for the test cluster.

[cluster test]
initial_queue_size = 0
max_queue_size = 2
ebs_settings = testebs

[ebs testebs]
# replace with your EBS snapshot ID
ebs_snapshot_id = snap-XXXXXXXXXXXXXXXX
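
With these sections in place, the /shared volume of a newly created test cluster is restored from the snapshot at launch:

cfncluster create test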