@dpwrussell
Created August 16, 2018 17:32
Create/Update/Delete Batch Cluster and Job Queue

Example Usage:

Creating, updating, or deleting a Batch cluster and job queue requires a YAML configuration file of the following shape (referred to as config.yml in the commands below).

Note: These values are not the actual values for our account.

Region: us-east-1
StackPrefix: stack-name
Stage: prod
ProjectTag: aws-billing-tag
# Default Security Group
SecurityGroup: sg-00000001
# Use existing public subnets
Subnets:
  - subnet-00000001
  - subnet-00000002
# Batch compute environments
BatchClusterEC2MinCpus: 0
# Allow up to 512 CPUs in EC2
BatchClusterEC2MaxCpus: 512
BatchClusterEC2DesiredCpus: 0
BatchClusterSpotMinCpus: 0
# Allow up to 1024 CPUs in Spot
BatchClusterSpotMaxCpus: 1024
BatchClusterSpotDesiredCpus: 0
# Accept a Spot price up to 75% of the on-demand EC2 cost
BatchClusterSpotBidPercentage: 75
BatchServiceRole: arn:aws:iam::123456789012:role/service-role/AWSBatchServiceRole
EcsInstanceProfile: arn:aws:iam::123456789012:instance-profile/ecsInstanceRole
SpotFleetRole: arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-role
# Create
python batch_cluster.py config.yml create

# Update
python batch_cluster.py config.yml update

# Delete
python batch_cluster.py config.yml delete
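
Before creating the stack it can be worth confirming that the subnet and security group IDs in the configuration exist in the target region, since a typo only surfaces once CloudFormation starts rolling back. A minimal pre-flight sketch, assuming the same config.yml layout as above (the helper name preflight_check.py is hypothetical and not part of this gist):

import sys

import boto3
from ruamel.yaml import YAML

# Usage: python preflight_check.py config.yml (hypothetical helper)
with open(sys.argv[1]) as f:
    config = YAML().load(f)

ec2 = boto3.client('ec2', region_name=config['Region'])

# Both calls raise a ClientError if any supplied ID does not exist
ec2.describe_subnets(SubnetIds=list(config['Subnets']))
ec2.describe_security_groups(GroupIds=[config['SecurityGroup']])

print('Subnets and security group found in {}'.format(config['Region']))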
batch_cluster.py:

import sys
import argparse

import boto3
from ruamel.yaml import YAML

yaml = YAML()

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('configfile', type=argparse.FileType('r'),
                        help='YAML configuration filename')
    parser.add_argument('operation',
                        choices=['create', 'update', 'delete', 'validate'],
                        help='Operation')
    args = parser.parse_args()

    try:
        config = yaml.load(args.configfile)
    except Exception as e:
        print('Error reading configuration YAML: {}'.format(e))
        sys.exit(1)

    # Validate the number of subnets
    if len(config['Subnets']) <= 1:
        print('More than one subnet required')
        sys.exit(1)

    region = config['Region']
    prefix = config['StackPrefix']
    stage = config['Stage']
    name = '{}-batch-cluster'.format(prefix)
    project_tag = config['ProjectTag']
    batch_cluster_ec2_min_cpus = config['BatchClusterEC2MinCpus']
    batch_cluster_ec2_max_cpus = config['BatchClusterEC2MaxCpus']
    batch_cluster_ec2_desired_cpus = config['BatchClusterEC2DesiredCpus']
    batch_cluster_spot_min_cpus = config['BatchClusterSpotMinCpus']
    batch_cluster_spot_max_cpus = config['BatchClusterSpotMaxCpus']
    batch_cluster_spot_desired_cpus = config['BatchClusterSpotDesiredCpus']
    batch_cluster_spot_bid_percentage = config['BatchClusterSpotBidPercentage']
    subnets = ','.join(config['Subnets'])
    batch_service_role = config['BatchServiceRole']
    ecs_instance_profile = config['EcsInstanceProfile']
    spot_fleet_role = config['SpotFleetRole']
    security_group = config['SecurityGroup']

    with open('main.yml', 'r') as f:
        template_body = f.read()

    cf = boto3.client('cloudformation', region_name=region)
    if args.operation in ['create', 'update']:
        if args.operation == 'create':
            cf_method = cf.create_stack
        elif args.operation == 'update':
            cf_method = cf.update_stack

        response = cf_method(
            StackName=name,
            TemplateBody=template_body,
            Parameters=[
                {
                    'ParameterKey': 'StackPrefix',
                    'ParameterValue': prefix
                },
                {
                    'ParameterKey': 'Stage',
                    'ParameterValue': stage
                },
                {
                    'ParameterKey': 'ProjectTag',
                    'ParameterValue': project_tag
                },
                {
                    'ParameterKey': 'BatchClusterEC2MinCpus',
                    'ParameterValue': str(batch_cluster_ec2_min_cpus)
                },
                {
                    'ParameterKey': 'BatchClusterEC2MaxCpus',
                    'ParameterValue': str(batch_cluster_ec2_max_cpus)
                },
                {
                    'ParameterKey': 'BatchClusterEC2DesiredCpus',
                    'ParameterValue': str(batch_cluster_ec2_desired_cpus)
                },
                {
                    'ParameterKey': 'BatchClusterSpotMinCpus',
                    'ParameterValue': str(batch_cluster_spot_min_cpus)
                },
                {
                    'ParameterKey': 'BatchClusterSpotMaxCpus',
                    'ParameterValue': str(batch_cluster_spot_max_cpus)
                },
                {
                    'ParameterKey': 'BatchClusterSpotDesiredCpus',
                    'ParameterValue': str(batch_cluster_spot_desired_cpus)
                },
                {
                    'ParameterKey': 'BatchClusterSpotBidPercentage',
                    'ParameterValue': str(batch_cluster_spot_bid_percentage)
                },
                {
                    'ParameterKey': 'Subnets',
                    'ParameterValue': subnets
                },
                {
                    'ParameterKey': 'BatchServiceRole',
                    'ParameterValue': batch_service_role
                },
                {
                    'ParameterKey': 'EcsInstanceProfile',
                    'ParameterValue': ecs_instance_profile
                },
                {
                    'ParameterKey': 'SpotFleetRole',
                    'ParameterValue': spot_fleet_role
                },
                {
                    'ParameterKey': 'SecurityGroup',
                    'ParameterValue': security_group
                }
            ],
            Capabilities=[
                'CAPABILITY_NAMED_IAM',
            ],
            Tags=[{
                'Key': 'project',
                'Value': project_tag
            }]
        )
    elif args.operation == 'delete':
        # delete_stack does not return a StackId, so report and return here
        cf.delete_stack(StackName=name)
        print('Stack {} initiated for {}'.format(args.operation, name))
        return
    else:
        # 'validate' is accepted by argparse but not implemented yet
        print('Method not implemented')
        sys.exit(1)

    # create_stack/update_stack return as soon as the request is accepted;
    # the stack itself builds asynchronously
    print('Stack {} initiated: {}'.format(args.operation, response['StackId']))


if __name__ == "__main__":
    main()
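
Note that create_stack and update_stack return as soon as CloudFormation accepts the request, so the script prints the StackId before the compute environments actually exist. To block until the operation finishes, a boto3 CloudFormation waiter could be added after the create/update call in main(); a sketch using the cf client and stack name already defined there:

    # Optional: block until CloudFormation reports the operation finished.
    # 'stack_create_complete' and 'stack_update_complete' are standard
    # boto3 CloudFormation waiter names.
    waiter_name = {
        'create': 'stack_create_complete',
        'update': 'stack_update_complete',
    }[args.operation]
    waiter = cf.get_waiter(waiter_name)
    waiter.wait(StackName=name)
    print('Stack {} finished for {}'.format(args.operation, name))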
main.yml:

Parameters:
  StackPrefix:
    Type: String
    Description: Unique prefix used in related stacks for use by export
  Stage:
    Type: String
    Description: Deployment stage
  ProjectTag:
    Type: String
    Description: Project tag
  BatchClusterEC2MinCpus:
    Type: String
    Description: Minimum EC2 cluster size
  BatchClusterEC2MaxCpus:
    Type: String
    Description: Maximum EC2 cluster size
  BatchClusterEC2DesiredCpus:
    Type: String
    Description: Desired EC2 cluster size
  BatchClusterSpotMinCpus:
    Type: String
    Description: Minimum Spot cluster size
  BatchClusterSpotMaxCpus:
    Type: String
    Description: Maximum Spot cluster size
  BatchClusterSpotDesiredCpus:
    Type: String
    Description: Desired Spot cluster size
  BatchClusterSpotBidPercentage:
    Type: String
    Description: Spot cluster maximum bid percentage
  Subnets:
    Type: List<AWS::EC2::Subnet::Id>
    Description: Public subnet IDs. Must be exactly two!
  BatchServiceRole:
    Type: String
    Description: Batch service role ARN
  EcsInstanceProfile:
    Type: String
    Description: ECS instance profile ARN
  SpotFleetRole:
    Type: String
    Description: Spot fleet role ARN
  SecurityGroup:
    Type: String
    Description: Default VPC Security Group ID

Resources:

  # EC2 Compute Environment
  Ec2Env:
    Type: AWS::Batch::ComputeEnvironment
    Properties:
      Type: MANAGED
      ComputeEnvironmentName: !Sub ${StackPrefix}-${Stage}-ec2
      ServiceRole: !Ref BatchServiceRole
      ComputeResources:
        Type: EC2
        MinvCpus: !Ref BatchClusterEC2MinCpus
        MaxvCpus: !Ref BatchClusterEC2MaxCpus
        DesiredvCpus: !Ref BatchClusterEC2DesiredCpus
        InstanceRole: !Ref EcsInstanceProfile
        InstanceTypes:
          - optimal
        SecurityGroupIds:
          - !Ref SecurityGroup
        Subnets: !Ref Subnets
        Tags:
          project: !Ref ProjectTag
      State: ENABLED

  # Spot Compute Environment
  SpotEnv:
    Type: AWS::Batch::ComputeEnvironment
    Properties:
      Type: MANAGED
      ComputeEnvironmentName: !Sub ${StackPrefix}-${Stage}-spot
      ServiceRole: !Ref BatchServiceRole
      ComputeResources:
        Type: SPOT
        MinvCpus: !Ref BatchClusterSpotMinCpus
        MaxvCpus: !Ref BatchClusterSpotMaxCpus
        DesiredvCpus: !Ref BatchClusterSpotDesiredCpus
        InstanceRole: !Ref EcsInstanceProfile
        InstanceTypes:
          - optimal
        SecurityGroupIds:
          - !Ref SecurityGroup
        Subnets: !Ref Subnets
        Tags:
          project: !Ref ProjectTag
        SpotIamFleetRole: !Ref SpotFleetRole
        BidPercentage: !Ref BatchClusterSpotBidPercentage
      State: ENABLED

  # Job Queue
  JobQueue:
    Type: AWS::Batch::JobQueue
    Properties:
      JobQueueName: !Sub ${StackPrefix}-${Stage}-queue
      ComputeEnvironmentOrder:
        - ComputeEnvironment: !Ref SpotEnv
          Order: 1
        - ComputeEnvironment: !Ref Ec2Env
          Order: 2
      Priority: 10
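
Once the stack reaches CREATE_COMPLETE, the resulting compute environments and job queue can be checked directly through the Batch API. A small sketch, assuming the example values above (StackPrefix stack-name, Stage prod, Region us-east-1); submitting actual work would additionally require a job definition, which this stack does not create:

import boto3

batch = boto3.client('batch', region_name='us-east-1')
prefix, stage = 'stack-name', 'prod'  # values from config.yml

# Compute environments created by main.yml
envs = batch.describe_compute_environments(
    computeEnvironments=['{}-{}-ec2'.format(prefix, stage),
                         '{}-{}-spot'.format(prefix, stage)])
for env in envs['computeEnvironments']:
    print(env['computeEnvironmentName'], env['state'], env['status'])

# Job queue created by main.yml
queues = batch.describe_job_queues(
    jobQueues=['{}-{}-queue'.format(prefix, stage)])
for queue in queues['jobQueues']:
    print(queue['jobQueueName'], queue['state'], queue['status'])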