Amazon Machine Image Automation

Image Automation

Images are built using packer. Packer takes care of booting a source image, bootstrapping it, running your configuration management, and then shutting down and creating an AMI. It can run multiple builds in parallel.

To make image building super simple, there is a Python script, build_image.py, which takes the name of an image to build; the various images are configured in images.yml. The build_image.py script will find the appropriate source AMI, generate the configuration for packer, and then run packer.

Installation

  • brew install packer
  • brew install python
  • pip install docopt
  • pip install boto
  • pip install PyYAML

For help using the build_image.py script, just pass the help option: build_image.py --help
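For example, to build the base image defined in images.yml, optionally from a branch or in debug mode:

  build_image.py base
  build_image.py base test-branch --debug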

Image Configuration

The images.yml file defines images and how they are built. Each image is the result of running an ansible playbook against a source image.
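Each top-level key in images.yml names an image, and its entry specifies the playbook to run, the source to search, and the output combinations to build. For example, the base image from the images.yml at the end of this gist (abridged, filters omitted):

  base:
    playbook: apt.yml
    source:
      type: ubuntu-api
      release_name: trusty
    output:
      virtualization_type: ['hvm']
      architecture: ['x86_64']
      region: ['us-east-1', 'us-west-1']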

Sources

First, a collection of source images is queried from a source. The available source types are: ubuntu-api, image-search and dependency.

ubuntu-api

  • this queries the Ubuntu cloud image release list at cloud-images.ubuntu.com for the release named by the release_name parameter
  • optional filters match the listed fields, e.g. root_device_type, architecture, region
  • the matching images are then re-queried from AWS, since the fields returned by the Ubuntu API are inconsistent with those from AWS

image-search

  • this searches AWS for AMIs using boto; the filters parameter is passed through to the EC2 image query (e.g. 'tag:Name')
  • owners and image_ids can optionally be supplied to narrow the search

dependency

  • this searches for an AMI built by us with the name given in the depends_on parameter
  • the filters follow the same format as the image-search
  • where there are multiple versions of the same image, the one with the latest timestamp is returned
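For illustration, a hypothetical image-search source stanza; the 'tag:Name' filter mirrors the commented-out example in images.py, and filters use boto's EC2 format:

  source:
    type: image-search
    filters:
      'tag:Name': Varnish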

Outputs

Then the results of the search are matched to outputs. Each combination of outputs will result in a separate image, so if you have config like this:

  output:
    virtualization_type: ['hvm', 'paravirtual']
    architecture:  ['x86_64']
    region: ['eu-west-1', 'sa-east-1', 'us-west-1', 'us-west-2']

Then you would end up with 8 images (2 x 1 x 4).
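Under the hood, build_image.py enumerates these combinations with itertools.product and emits one packer builder per combination; a minimal sketch of the expansion:

  import itertools

  output = {
      'virtualization_type': ['hvm', 'paravirtual'],
      'architecture': ['x86_64'],
      'region': ['eu-west-1', 'sa-east-1', 'us-west-1', 'us-west-2'],
  }

  # one build per (region, virtualization_type, architecture) combination
  for combo in itertools.product(output['region'], output['virtualization_type'], output['architecture']):
      print '/'.join(combo)  # e.g. eu-west-1/hvm/x86_64; 8 combinations in total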

Playbook

Then packer will build the images, running the given playbook. Note that packer runs ansible in local mode; to make this possible, the zoo repo is checked out on the remote host, so you must have committed and pushed any changes you want to take effect. To make this easier, the build script optionally takes a branch name, so you can test out changes on a branch before merging to master, e.g. build_image.py base test-branch

If you want to cancel the image builds, you can interrupt packer and it will clean up the AWS resources it was using for the build. If you interrupt it a second time, it will force quit, and the source AWS instance / security group / EC2 key might not be removed. You should clean up manually if this happens.

Debugging

Sometimes AWS instances take ages to boot; sometimes they never get out of the pending state. Generally you just have to try again :(

If this is happening consistently, you can try the --debug flag, e.g. build_image.py base --debug. This runs packer in debug mode: it will prompt you at each stage of the build and give you the connection details and an SSH key you can use to get into the instance. You should SSH with the ubuntu username.
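For example, assuming packer reported an instance at 203.0.113.5 and wrote its key to ec2_base.pem (both hypothetical values; use the ones packer actually prints):

  # hypothetical host and key file, taken from packer's debug output
  chmod 600 ec2_base.pem
  ssh -i ec2_base.pem ubuntu@203.0.113.5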

I've found that builds sometimes magically work when you add the debug flag, which would indicate a race condition in packer. God knows.

build_image.py

#!/usr/bin/env python
"""Usage: build_image.py <image-name> [<branch>]
                         [--except <exclude-builds> | --only <include-builds>]
                         [--parallel <parallel>]
                         [--debug]
                         [--help]

This command should be run from the root of the ansible directory.
You can configure images to be built in the file 'images.yml'

--help      show this
--debug     run packer in debug mode
--except    exclude some builds e.g. hvm,pv
--only      run only some builds e.g. hvm,pv
--parallel  true or false, run builds in parallel
"""
import os
import sys
import itertools
import yaml
import json
import tempfile
from distutils.spawn import find_executable
from images import get_aws_images, get_ubuntu_images
from docopt import docopt


def _dependency(config):
    # A 'dependency' source is just an image-search for an AMI we built
    # ourselves, tagged with the name given in depends_on
    filters = config.get('filters', {})
    filters['tag:Name'] = config['depends_on']
    config['filters'] = filters
    return get_aws_images(config)


def main():
    args = docopt(__doc__)
    image_name = args.get('<image-name>')
    config = yaml.load(file('images/images.yml', 'r'))
    config = config[image_name]
    source_type = config['source']['type']
    branch = args.get('<branch>') or 'master'
    func = {
        'ubuntu-api': get_ubuntu_images,
        'dependency': _dependency,
        'image-search': get_aws_images
    }[source_type]
    images = func(config['source'])
    output = config['output']

    # Index the source images by region/virtualization_type/architecture
    # so each output combination can be matched to a source AMI
    grouped_images = {}
    for image in images:
        key = '/'.join([
            image['region'],
            image['virtualization_type'],
            image['architecture'],
        ])
        grouped_images[key] = image

    def generate_builders():
        for k in itertools.product(output['region'], output['virtualization_type'], output['architecture']):
            key = '/'.join(k)
            image = grouped_images.get(key, None)
            if not image:
                sys.stderr.write('No suitable source image found for ' + str(k) + '\n')
                sys.exit(1)
            yield {
                "name": '-'.join(k),
                "type": "amazon-ebs",
                "access_key": "{{user `aws_access_key`}}",
                "secret_key": "{{user `aws_secret_key`}}",
                "region": image['region'],
                "source_ami": image['id'],
                "instance_type": config.get("instance_type", "c3.large"),
                "ssh_username": "ubuntu",
                "ami_name": '-'.join([image_name, k[1], k[2], '{{timestamp}}']),
                "tags": {
                    "Name": image_name,
                    "branch": branch,
                    "timestamp": "{{timestamp}}",
                    "verification": "pending"
                }
            }

    builders = list(generate_builders())
    template = json.load(file('images/image_template.json'))
    template['builders'] = builders
    packer = find_executable('packer')
    if packer is None:
        sys.stderr.write('packer not found, please make sure the packer executable is on your path\n')
        sys.stderr.write('see http://www.packer.io for more information\n')
        sys.exit(1)
    for builder in builders:
        print "".join([builder["name"].ljust(30), "Image Id: ", builder["source_ami"]])
    # pass the path to packer executable as the first arg
    # since this is like sys.argv[0]
    build = [packer, 'build']
    playbook = config['playbook']
    options = [
        '-var', '='.join(['playbook', playbook]),
        '-var', '='.join(['branch', branch])]
    if args.get('--debug'):
        options.append('-debug')
    include = args.get('<include-builds>')
    if include:
        options.extend(['-only', include])
    exclude = args.get('<exclude-builds>')
    if exclude:
        options.extend(['-except', exclude])
    parallel = args.get('<parallel>')
    if parallel:
        options.append('-parallel=' + parallel)
    temp_file = tempfile.NamedTemporaryFile(delete=False)
    temp_file.write(json.dumps(template))
    temp_file.close()
    command = build + options + [temp_file.name]
    print " ".join(command)  # Print the command for reference
    # This call to the OS replaces this process with the spawned one, so we don't have to care about
    # managing the subprocess or passing signals. It all just works. magic
    os.execv(packer, command)


if __name__ == '__main__':
    main()

image_template.json

{
    "variables": {
        "playbook": null,
        "branch": null,
        "aws_access_key": "{{env `AWS_ACCESS_KEY_ID`}}",
        "aws_secret_key": "{{env `AWS_SECRET_ACCESS_KEY`}}"
    },
    "builders": [],
    "provisioners": [{
        "type": "file",
        "source": "images/ssh_config",
        "destination": "/home/ubuntu/.ssh/config"
    }, {
        "type": "file",
        "source": "aws_image_creator/assets/git/id_rsa",
        "destination": "/home/ubuntu/.ssh/id_rsa"
    }, {
        "type": "shell",
        "inline": [
            "set -e",
            "export AWS_ACCESS_KEY_ID='{{user `aws_access_key`}}'",
            "export AWS_SECRET_ACCESS_KEY='{{user `aws_secret_key`}}'",
            "sudo apt-get update",
            "sudo apt-get update",
            "sudo apt-get -y install python-dev python-pip git",
            "sudo pip install ansible",
            "sudo mkdir -p /tmp/ansible",
            "sudo chown -R ubuntu:ubuntu /tmp/ansible",
            "sudo chmod 600 /home/ubuntu/.ssh/*",
            "git clone git@github.com:academia-edu/academia-zoo.git /tmp/ansible",
            "cd /tmp/ansible",
            "git checkout {{user `branch`}}",
            "sudo chown -R root:root /tmp/ansible",
            "ansible-playbook playbooks/{{user `playbook`}} -i 'inventory/local' -s",
            "sudo rm -rf /tmp/ansible"
        ]
    }]
}

images.py

import traceback
import itertools
import multiprocessing as multi
import sys
import urllib2
import json
from boto.ec2 import connect_to_region
from collections import defaultdict

all_regions = [
    "ap-northeast-1",
    "ap-southeast-1",
    "ap-southeast-2",
    "eu-central-1",
    "eu-west-1",
    "sa-east-1",
    "us-east-1",
    "us-west-1",
    "us-west-2"
]


def get_aws_images(config):
    # Query every region in parallel, one worker per region
    regions = config.get('regions', all_regions)
    print str(config)
    pool = multi.Pool(processes=len(regions))
    try:
        results = pool.map_async(_query_with_interrupt, [[config, region] for region in regions]).get(999)
        pool.terminate()
    except:
        traceback.print_exc()
        pool.terminate()
        sys.exit(1)
    return map(_aws_image_to_dict, itertools.chain.from_iterable(results))


def _query_with_interrupt(args):
    try:
        return _query_aws_images(args)
    except KeyboardInterrupt:
        raise Exception('Interrupted!')


def _query_aws_images(args):
    [config, region] = args
    conn = connect_to_region(region)
    images = {}
    # group by a key built with these values & find the most recent image in each
    key_parts = ['virtualization_type', 'architecture', 'root_device_type']
    kwargs = {
        'filters': config.get('filters') or {}
    }
    if config.has_key('owners'):
        kwargs['owners'] = config.get('owners')
    if config.has_key('image_ids'):
        kwargs['image_ids'] = config.get('image_ids')
    print kwargs
    for i in conn.get_all_images(**kwargs):
        timestamp = int(i.tags.get('timestamp', 0))
        key = '/'.join([getattr(i, v) for v in key_parts])
        if not images.has_key(key):
            images[key] = i
        elif timestamp > int(images.get(key).tags.get('timestamp', 0)):
            images[key] = i
    return images.values()


def _aws_image_to_dict(image):
    return {
        "region": image.region.name,
        "name": image.name,
        "architecture": image.architecture,
        "virtualization_type": image.virtualization_type,
        "root_device_type": image.root_device_type,
        "id": image.id,
    }


def get_ubuntu_images(config):
    url = "https://cloud-images.ubuntu.com/query/{}/server/released.current.txt".format(config['release_name'])
    response = urllib2.urlopen(url).read()

    def parse_images():
        # Each line of the release list is a tab-separated record
        for line in response.split('\n'):
            fields = line.split('\t')
            if len(fields) < 11:
                continue
            yield {
                "id": fields[7],
                "root_device_type": fields[4],
                "architecture": fields[5],
                "region": fields[6],
                "virtualization_type": fields[10],
            }

    data = list(parse_images())
    filters = config.get('filters', {})

    def filter_images(images):
        for image in images:
            match = True
            for key, value in image.iteritems():
                if filters.has_key(key) and str(filters[key]) != str(image[key]):
                    match = False
            if match:
                yield image

    images = list(filter_images(data))
    # The fields from the ubuntu api are inconsistent with those from aws, so re-query the data from aws
    images_by_region = defaultdict(list)
    for image in images:
        images_by_region[image["region"]].append(image)

    def _aws_images():
        for region, images in images_by_region.iteritems():
            conn = connect_to_region(region)
            aws_images = conn.get_all_images(image_ids=map(lambda i: i["id"], images))
            for i in aws_images:
                yield _aws_image_to_dict(i)

    return list(_aws_images())


def main():
    print json.dumps(get_ubuntu_images({
        'release_name': 'trusty',
        'root_device_type': 'ebs-ssd'
    }))
    # print json.dumps(get_aws_images({
    #     'filters': {
    #         'tag:Name': 'Varnish'
    #     }
    # }))


if __name__ == '__main__':
    main()

images.yml

base:
  playbook: apt.yml
  source:
    type: ubuntu-api
    release_name: trusty
    filters:
      root_device_type: ebs
      architecture: amd64
  output:
    virtualization_type: ['hvm']
    architecture: ['x86_64']
    region: ['us-east-1', 'us-west-1']

test:
  playbook: update-ansible.yml
  source:
    type: dependency
    depends_on: base
  output:
    virtualization_type: ['hvm']
    architecture: ['x86_64']
    region: ['us-east-1']

ssh_config

Host *
    StrictHostKeyChecking no