@garnaat
Last active August 29, 2015 14:11

Finding AWS Resources Across Regions, Services, and Accounts

If you only have one account in AWS and you only use one service in one region, this article probably isn't for you. However, if you are like me and manage resources in many accounts, across multiple regions, and in many different services, stick around.

There are a lot of great tools to help you manage your AWS resources. There is the AWS Web Console, the AWSCLI, various language SDKs like boto, and a host of third-party tools. The biggest problem I have with most of these tools is that they limit your view of resources to a single region, a single account, and a single service at a time. For example, you have to log in to the AWS Console with one set of credentials representing a single account. And once you are logged in, you have to select a single region. And then, finally, you drill into a particular service. The AWSCLI and the SDKs follow this same basic model.

But what if you want to look at resources across regions? Across accounts? Across services? Well, that's where skew comes in.

Skew

Skew is a Python library built on top of botocore. The main purpose of skew is to provide a flat, uniform address space for all of your AWS resources.

The name skew is a homophone for SKU (Stock Keeping Unit). SKUs are the numbers that show up on the bar codes of just about everything you purchase, and that SKU number uniquely identifies the product in the vendor's inventory. When you make a purchase, they scan the barcode containing the SKU and can instantly find the pricing data for the item.

Similarly, skew uses a unique identifier for each one of your AWS resources and allows you to scan the SKU and quickly find the details for that resource. It also provides some powerful mechanisms to find sets of resources by allowing wildcarding and regular expressions within the SKUs.

ARN't You Glad You Are Reading This?

So, what do we use for a unique identifier for all of our AWS resources? Well, as it turns out, AWS has already solved that problem for us. Each resource in AWS can be identified by an Amazon Resource Name or ARN. The general forms for ARNs are:

arn:aws:service:region:account:resource
arn:aws:service:region:account:resourcetype/resource
arn:aws:service:region:account:resourcetype:resource

So, the ARN for an EC2 instance might look like this:

arn:aws:ec2:us-west-2:123456789012:instance/i-12345678

This tells us the instance is in the us-west-2 region, running in the account identified by the account number 123456789012 and the instance has an instance ID of i-12345678.
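
To make the structure concrete, here is a minimal sketch of pulling an ARN apart in plain Python. The `ArnParts` tuple and the `split_arn` function are names made up for this illustration; they are not part of skew or any AWS SDK.

```python
from collections import namedtuple

# Hypothetical container for illustration; this is not skew's ARN class.
ArnParts = namedtuple(
    'ArnParts', 'scheme provider service region account resource')


def split_arn(arn):
    """Split an ARN string into its six colon-separated components."""
    # Limit the split to 5 so that any colons inside the resource part
    # (the resourcetype:resource form) stay together.
    parts = arn.split(':', 5)
    if len(parts) != 6:
        raise ValueError('not a valid ARN: %r' % arn)
    return ArnParts(*parts)


parts = split_arn('arn:aws:ec2:us-west-2:123456789012:instance/i-12345678')
print(parts.region)
print(parts.account)
print(parts.resource)
```

Capping the split at five separators is what lets the third general form, where the resource itself contains a colon, survive as a single component.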

Getting Started With Skew

The easiest way to install skew is via pip.

% pip install skew

Because skew is based on botocore, as is AWSCLI, it will use the same credentials as those tools. You need to make a small addition to your ~/.aws/config file to help skew map AWS account IDs to the profiles in the config file. Check the README for details on that.

Let's Find Some Stuff

Once we have skew installed and configured, we can use it to find resources based on their ARNs. For example, using the example ARN above:

>>> import skew
>>> arn = skew.scan('arn:aws:ec2:us-west-2:123456789012:instance/i-12345678')
>>> arn
arn:aws:ec2:us-west-2:123456789012:instance/i-12345678
>>>

Ok, that wasn't very exciting. How do I get at my actual resource in AWS? Well, the scan method returns an ARN object and this object supports the iterator pattern in Python. This makes sense since, as we will see later, an ARN can actually return a lot of objects, not just one. So to get our object we can do this:

>>> instance = list(arn)[0]
>>> instance.id
'i-12345678'
>>> instance.data
{u'AmiLaunchIndex': 1,
 u'Architecture': 'x86_64',
 u'BlockDeviceMappings': [{u'DeviceName': '/dev/sda1',
   u'Ebs': {u'AttachTime': datetime.datetime(2014, 12, 14, 13, 48, tzinfo=tzutc()),
   u'DeleteOnTermination': True,
   u'Status': 'attached',
   u'VolumeId': 'vol-63276b7b'}}],
 u'ClientToken': '425f1a07-2e61-4089-a7dc-7344b302731e_us-east-1d_2',
 u'EbsOptimized': False,
 u'Hypervisor': 'xen',
 u'ImageId': 'ami-6227460a',
 u'InstanceId': 'i-12345678',
 u'InstanceType': 'c3.2xlarge',
 ...
 }
>>>

Iterating on an ARN yields Resource objects, and each of these Resource objects represents one resource in AWS. Resource objects have a number of attributes like id, and they also have an attribute called data that contains all of the data about that resource. This is the same information that would be returned by the AWSCLI or an SDK.

Wildcards And Regular Expressions

Finding a single resource in AWS is okay but one of the nice things about skew is that it allows you to quickly find lots of resources in AWS. And you don't have to worry about which region those resources are in or in which account they reside.

For example, let's say we want to find all EC2 instances running in all regions and in all of my accounts:

arn = skew.scan('arn:aws:ec2:*:*:instance/*')
for instance in arn:
    # Print the ARN of each matching instance, not the scan pattern.
    print(instance.arn)

In those few lines of Python code, a lot of stuff is happening. Skew will iterate through all of the regions supported by the EC2 service and, in each region, will authenticate with each of the account profiles listed in your AWS config file. It will then find all EC2 instances and finally return the complete list of those instances as Resource objects.
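
The fan-out described above can be pictured as a loop over every region and profile pair. The sketch below is only an illustration of that idea and not skew's implementation: the region list, the profile names, and the `find_instances` stand-in (with its fake inventory) are all made up.

```python
import itertools

# Made-up regions and profile names, for illustration only.
regions = ['us-east-1', 'us-west-2', 'eu-west-1']
profiles = ['dev', 'prod']


def find_instances(region, profile):
    """Hypothetical stand-in for one API call made with one set of
    credentials against one region. Returns fake instance IDs."""
    fake_inventory = {
        ('us-west-2', 'prod'): ['i-12345678'],
        ('eu-west-1', 'dev'): ['i-0000aaaa', 'i-0000bbbb'],
    }
    return fake_inventory.get((region, profile), [])


def scan_all(regions, profiles):
    """Yield (region, profile, instance_id) for every combination."""
    for region, profile in itertools.product(regions, profiles):
        for instance_id in find_instances(region, profile):
            yield region, profile, instance_id


results = list(scan_all(regions, profiles))
for region, profile, instance_id in results:
    print('%s %s %s' % (region, profile, instance_id))
```

Writing the scan as a generator means a caller can start working with the first results before every region and account has been visited.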

In addition to wildcards, you can also use regular expressions as components in the ARN. For example:

arn = skew.scan('arn:aws:dynamodb:us-.*:*:table/*')

This will find all DynamoDB tables in all US regions for all accounts.
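
If you want a feel for how a pattern narrows down a component, the sketch below filters a hard-coded list of region names with Python's re module. The `match_component` function is hypothetical, and the rule it applies (a bare * selects everything, anything else must match the whole value as a regular expression) is an assumption made for this example, not a description of skew's internals.

```python
import re

# A handful of region names, hard-coded here for illustration.
regions = ['us-east-1', 'us-west-1', 'us-west-2',
           'eu-west-1', 'ap-southeast-2']


def match_component(pattern, choices):
    """Return the choices selected by a pattern.

    Assumption: '*' means "everything" and any other pattern is a
    regular expression that must match the whole value.
    """
    if pattern == '*':
        return list(choices)
    regex = re.compile(pattern)
    return [choice for choice in choices if regex.fullmatch(choice)]


us_regions = match_component('us-.*', regions)
print(us_regions)
```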

Some Useful Examples

Here are some examples of things you can do quickly and easily with skew that would be difficult in most other tools.

Find all unattached EBS volumes across all regions and accounts and tally the size of wasted space.

import skew
 
total_size = 0
total_volumes = 0
 
for volume in skew.scan('arn:aws:ec2:*:*:volume/*'):
    if not volume.data['Attachments']:
        total_volumes += 1
        total_size += volume.data['Size']
        print('%s: %dGB' % (volume.arn, volume.data['Size']))
print('Total unattached volumes: %d' % total_volumes)
print('Total size (GB): %d' % total_size)

Audit all EC2 security groups to find CIDR rules that are not whitelisted.

import skew

# Add whitelisted CIDR blocks here, e.g. 192.168.1.1/32.
# Any addresses not in this list will be flagged.
whitelist = []
 
for secgrp in skew.scan('arn:aws:ec2:*:*:security-group/*'):
    for ipperms in secgrp.data['IpPermissions']:
        for ip in ipperms['IpRanges']:
            if ip['CidrIp'] not in whitelist:
                print('%s: %s is not whitelisted' % (secgrp.arn, ip['CidrIp']))
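
The audit above compares CIDR strings exactly, so a rule for 10.1.2.0/24 would be flagged even if 10.0.0.0/8 were whitelisted. If you would rather treat anything inside a whitelisted network as acceptable, here is one possible variation using the standard ipaddress module (`subnet_of` needs Python 3.7 or later). The `is_whitelisted` helper, the placeholder networks, and the sample permissions are all made up for this sketch; the sample data only mimics the shape of `IpPermissions`.

```python
import ipaddress

# Placeholder networks; substitute your own.
whitelist = [ipaddress.ip_network('10.0.0.0/8'),
             ipaddress.ip_network('192.168.1.1/32')]


def is_whitelisted(cidr, allowed):
    """Return True if cidr lies entirely inside any allowed network."""
    network = ipaddress.ip_network(cidr, strict=False)
    # subnet_of() raises TypeError when IPv4 and IPv6 are mixed,
    # so compare versions first.
    return any(network.version == net.version and network.subnet_of(net)
               for net in allowed)


# Made-up data shaped like a security group's IpPermissions list.
sample_permissions = [
    {'IpRanges': [{'CidrIp': '10.1.2.0/24'}, {'CidrIp': '0.0.0.0/0'}]},
    {'IpRanges': [{'CidrIp': '192.168.1.1/32'}]},
]

flagged = [ip['CidrIp']
           for perms in sample_permissions
           for ip in perms['IpRanges']
           if not is_whitelisted(ip['CidrIp'], whitelist)]
print(flagged)
```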

Find all EC2 instances that are not tagged in any way.

import skew
 
for instance in skew.scan('arn:aws:ec2:*:*:instance/*'):
    if not instance.tags:
        print('%s is untagged' % instance.arn)

Building ARNs Interactively

The ARN provides a great way to uniquely identify AWS resources but it doesn't exactly roll off the tongue. Skew provides some help for constructing ARNs interactively.

First, start off with a new ARN object.

>>> from skew.arn import ARN
>>> arn = ARN()
>>> arn
arn:aws:*:*:*:*
>>>

Each ARN object contains 6 components:

  • scheme - for now this will always be arn
  • provider - again, for now always aws
  • service - the Amazon Web Service
  • region - the AWS region
  • account - the ID of the AWS account
  • resource - the resource type and resource ID

All of these are available as attributes of the ARN object.

>>> arn.scheme
arn
>>> arn.provider
aws
>>> arn.service
*
>>> arn.region
*
>>> arn.account
*
>>> arn.resource
*
>>>

If you want to build up the ARN interactively, you can ask each of the components what choices are available.

>>> arn.service.choices
['autoscaling',
 'cloudformation',
 'cloudfront',
 'cloudsearch',
 'cloudsearchdomain',
 'cloudtrail',
 'cloudwatch',
 'codedeploy',
 ...
 'storagegateway',
 'sts',
 'support',
 'swf']
>>>

You can also try out your regular expressions to make sure they return the results you expect.

>>> arn.service.match('cloud.*')
['cloudformation',
 'cloudfront',
 'cloudsearch',
 'cloudsearchdomain',
 'cloudtrail',
 'cloudwatch']
>>>

To set the value of a particular component, use the pattern attribute.

>>> arn.service.pattern = 'cloud.*'
>>> arn.service.matches
['cloudformation',
 'cloudfront',
 'cloudsearch',
 'cloudsearchdomain',
 'cloudtrail',
 'cloudwatch']
>>>

Once you have the ARN that you want, you can enumerate it like this:

>>> for resource in arn:
...     print(resource.arn)  # or do something amazing
...
>>>

Running Queries Against Returned Data

A recent feature of skew allows you to run queries against the resource data. This feature makes use of jmespath, which is a really nice JSON query engine. It was originally written in Python for use in the AWSCLI but is now available in a number of other languages. If you have ever used the --query option of the AWSCLI, then you have used jmespath.

If you append a jmespath query to the end of the ARN (using a | as a separator) skew will send the data for each of the returned resources through the jmespath query and store the result in the filtered_data attribute of the resource object. The original data is still available as the data attribute. For example:

arn = skew.scan('arn:aws:ec2:*:*:instance/*|InstanceType')

Then each resource returned would have the instance type stored in the filtered_data attribute of the Resource object. This is obviously a very simple example but jmespath is very powerful and the interactive query tool available on http://jmespath.org/ allows you to try your queries out beforehand to get exactly what you want.
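
To show the mechanics without needing skew or jmespath installed, here is a rough sketch in plain Python. It separates the query from the ARN at the | and then emulates only the simplest kind of query, a bare top-level key such as InstanceType. Both functions are hypothetical helpers written for this illustration, returning None when there is no query is an assumption of the sketch, and real jmespath expressions can do far more than a key lookup.

```python
def split_query(arn_with_query):
    """Separate the ARN pattern from an optional query after '|'."""
    arn, _, query = arn_with_query.partition('|')
    return arn, (query or None)


def apply_simple_query(query, data):
    """Emulate only the simplest query form: a bare top-level key.

    A missing key gives None. Returning None when there is no query
    at all is an assumption made for this sketch.
    """
    if query is None:
        return None
    return data.get(query)


arn, query = split_query('arn:aws:ec2:*:*:instance/*|InstanceType')

# Made-up resource data with the same keys as the earlier example.
data = {'InstanceId': 'i-12345678', 'InstanceType': 'c3.2xlarge'}
filtered_data = apply_simple_query(query, data)

print(arn)
print(query)
print(filtered_data)
```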

CloudWatch Metrics

One other feature of skew is easy access to CloudWatch metrics for AWS resources. If we refer back to the very first interactive session in the post, we can show how you would access those CloudWatch metrics for the instance.

>>> instance.metric_names
['CPUUtilization',
 'NetworkOut',
 'StatusCheckFailed',
 'StatusCheckFailed_System',
 'NetworkIn',
 'DiskWriteOps',
 'DiskReadBytes',
 'DiskReadOps',
 'StatusCheckFailed_Instance',
 'DiskWriteBytes']
>>> instance.get_metric_data('CPUUtilization')
[{u'Average': 0.134, u'Timestamp': '2014-12-13T14:04:00Z', u'Unit': 'Percent'},
 {u'Average': 0.066, u'Timestamp': '2014-12-13T13:54:00Z', u'Unit': 'Percent'},
 {u'Average': 0.066, u'Timestamp': '2014-12-13T14:09:00Z', u'Unit': 'Percent'},
 {u'Average': 0.134, u'Timestamp': '2014-12-13T13:34:00Z', u'Unit': 'Percent'},
 {u'Average': 0.066, u'Timestamp': '2014-12-13T14:19:00Z', u'Unit': 'Percent'},
 {u'Average': 0.068, u'Timestamp': '2014-12-13T13:44:00Z', u'Unit': 'Percent'},
 {u'Average': 0.134, u'Timestamp': '2014-12-13T14:14:00Z', u'Unit': 'Percent'},
 {u'Average': 0.066, u'Timestamp': '2014-12-13T13:29:00Z', u'Unit': 'Percent'},
 {u'Average': 0.132, u'Timestamp': '2014-12-13T13:59:00Z', u'Unit': 'Percent'},
 {u'Average': 0.134, u'Timestamp': '2014-12-13T13:49:00Z', u'Unit': 'Percent'},
 {u'Average': 0.134, u'Timestamp': '2014-12-13T13:39:00Z', u'Unit': 'Percent'}]
>>>

We can find the available CloudWatch metrics with the metric_names attribute and then we can retrieve the desired metric using the get_metric_data method. The README for skew contains a bit more information about accessing CloudWatch metrics.
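
Notice that the data points in the output above are not in time order. Below is a small sketch of putting them in order and summarizing them in plain Python. It uses three of the data points shown above, and the `summarize` function is a made-up helper, not part of skew. Sorting on the timestamp strings works here because they all share the same ISO 8601 format.

```python
# Three of the data points from the session above.
datapoints = [
    {'Average': 0.134, 'Timestamp': '2014-12-13T14:04:00Z', 'Unit': 'Percent'},
    {'Average': 0.066, 'Timestamp': '2014-12-13T13:54:00Z', 'Unit': 'Percent'},
    {'Average': 0.066, 'Timestamp': '2014-12-13T14:09:00Z', 'Unit': 'Percent'},
]


def summarize(points):
    """Order data points by time and compute a few simple statistics."""
    ordered = sorted(points, key=lambda point: point['Timestamp'])
    values = [point['Average'] for point in ordered]
    return {
        'first': ordered[0]['Timestamp'],
        'last': ordered[-1]['Timestamp'],
        'mean': sum(values) / len(values),
        'peak': max(values),
    }


summary = summarize(datapoints)
print('%s to %s' % (summary['first'], summary['last']))
print('mean %.3f, peak %.3f' % (summary['mean'], summary['peak']))
```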

Wrap Up

Skew is pretty new and is still changing a lot. It currently supports only a subset of available AWS resource types but more are being added all the time. If you manage a lot of AWS resources, I encourage you to give it a try. Feedback, as always, is very welcome as are pull requests!
