If you only have one account in AWS and you only use one service in one region, this article probably isn't for you. However, if you are like me and manage resources in many accounts, across multiple regions, and in many different services, stick around.
There are a lot of great tools to help you manage your AWS resources: the AWS Web Console, the AWSCLI, various language SDKs like boto, and a host of third-party tools. The biggest problem I have with most of these tools is that they limit your view of resources to a single region, a single account, and a single service at a time. For example, you have to log in to the AWS Console with one set of credentials representing a single account. Once you are logged in, you have to select a single region. Then, finally, you drill into a particular service. The AWSCLI and the SDKs follow this same basic model.
But what if you want to look at resources across regions? Across accounts? Across services? Well, that's where skew comes in.
Skew is a Python library built on top of botocore. The main purpose of skew is to provide a flat, uniform address space for all of your AWS resources.
The name skew is a homophone of SKU (Stock Keeping Unit). SKUs are the numbers that show up on the bar codes of just about everything you purchase, and that SKU number uniquely identifies the product in the vendor's inventory. When you make a purchase, they scan the barcode containing the SKU and can instantly find the pricing data for the item.
Similarly, skew uses a unique identifier for each one of your AWS resources and allows you to scan the SKU and quickly find the details for that resource. It also provides some powerful mechanisms to find sets of resources by allowing wildcards and regular expressions within the SKUs.
So, what do we use for a unique identifier for all of our AWS resources? Well, as it turns out, AWS has already solved that problem for us. Each resource in AWS can be identified by an Amazon Resource Name or ARN. The general forms of an ARN are:
arn:aws:service:region:account:resource
arn:aws:service:region:account:resourcetype/resource
arn:aws:service:region:account:resourcetype:resource
So, the ARN for an EC2 instance might look like this:
arn:aws:ec2:us-west-2:123456789012:instance/i-12345678
This tells us the instance is in the us-west-2 region, running in the account identified by the account number 123456789012, and the instance has an instance ID of i-12345678.
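To make the structure concrete, here is a minimal sketch of pulling an ARN apart with nothing but the standard library. The `ParsedARN` tuple and the `parse_arn` function are names I made up for this illustration; they are not part of skew, and skew's own parsing may well differ.

```python
import re
from collections import namedtuple

# Hypothetical container for this sketch; not a skew class.
ParsedARN = namedtuple(
    'ParsedARN',
    ['scheme', 'provider', 'service', 'region', 'account',
     'resource_type', 'resource_id'])


def parse_arn(arn):
    """Split an ARN string into its components."""
    # Split on the first five colons only, because the resource part
    # may itself contain colons (the resourcetype:resource form).
    parts = arn.split(':', 5)
    if len(parts) != 6:
        raise ValueError('not a valid ARN: %r' % arn)
    scheme, provider, service, region, account, resource = parts
    # The resource type, if present, is separated from the resource
    # by the first '/' or ':' in the resource part.
    pieces = re.split(r'[/:]', resource, maxsplit=1)
    if len(pieces) == 2:
        resource_type, resource_id = pieces
    else:
        resource_type, resource_id = None, resource
    return ParsedARN(scheme, provider, service, region, account,
                     resource_type, resource_id)


parsed = parse_arn('arn:aws:ec2:us-west-2:123456789012:instance/i-12345678')
print(parsed.region, parsed.account, parsed.resource_type, parsed.resource_id)
# prints: us-west-2 123456789012 instance i-12345678
```

Splitting with a maximum of five splits is the one design choice worth noting: it keeps the third ARN form, where the resource itself contains a colon, in one piece.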
The easiest way to install skew is via pip.
% pip install skew
Because skew is based on botocore, as is the AWSCLI, it will use the same credentials as those tools. You need to make a small addition to your ~/.aws/config file to help skew map AWS account IDs to the profiles in the config file. Check the README for details on that.
Once we have skew installed and configured, we can use it to find resources based on their ARNs. For example, using the example ARN above:
>>> import skew
>>> arn = skew.scan('arn:aws:ec2:us-west-2:123456789012:instance/i-12345678')
>>> arn
arn:aws:ec2:us-west-2:123456789012:instance/i-12345678
>>>
OK, that wasn't very exciting. How do I get at my actual resource in AWS? Well, the scan method returns an ARN object, and this object supports the iterator pattern in Python. This makes sense since, as we will see later, this ARN can actually return a lot of objects, not just one. So if we want to get our object, we can do this:
>>> instance = list(arn)[0]
>>> instance.id
'i-12345678'
>>> instance.data
{u'AmiLaunchIndex': 1,
 u'Architecture': 'x86_64',
 u'BlockDeviceMappings': [{u'DeviceName': '/dev/sda1',
                           u'Ebs': {u'AttachTime': datetime.datetime(2014, 12, 14, 13, 48, tzinfo=tzutc()),
                                    u'DeleteOnTermination': True,
                                    u'Status': 'attached',
                                    u'VolumeId': 'vol-63276b7b'}}],
 u'ClientToken': '425f1a07-2e61-4089-a7dc-7344b302731e_us-east-1d_2',
 u'EbsOptimized': False,
 u'Hypervisor': 'xen',
 u'ImageId': 'ami-6227460a',
 u'InstanceId': 'i-12345678',
 u'InstanceType': 'c3.2xlarge',
 ...
}
>>>
Iterating on an ARN yields Resource objects, and each of these Resource objects represents one resource in AWS. Resource objects have a number of attributes like id, and they also have an attribute called data that contains all of the data about that resource. This is the same information that would be returned by the AWSCLI or an SDK.
Finding a single resource in AWS is okay but one of the nice things about skew is that it allows you to quickly find lots of resources in AWS. And you don't have to worry about which region those resources are in or in which account they reside.
For example, let's say we want to find all EC2 instances running in all regions and in all of my accounts:
arn = skew.scan('arn:aws:ec2:*:*:instance/*')
for instance in arn:
    print(instance.arn)
In that one little line of Python code, a lot of stuff is happening. Skew will iterate through all of the regions supported by the EC2 service and, in each region, will authenticate with each of the account profiles listed in your AWS config file. It will then find all EC2 instances and finally return the complete list of those instances as Resource objects.
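The fan-out described here is essentially a cross product of regions and account profiles. Here is a rough sketch of that idea using made-up inputs. The `REGIONS` list, the `PROFILES` mapping, and the `expand` function are all hypothetical names for illustration; skew discovers the real regions and profiles itself, and its internals may look quite different.

```python
import itertools

# Hypothetical inputs for this sketch. In practice the regions come from
# the service definition and the profiles from your AWS config file.
REGIONS = ['us-east-1', 'us-west-2', 'eu-west-1']
PROFILES = {'123456789012': 'prod', '234567890123': 'dev'}


def expand(regions, profiles):
    """Yield one (region, account_id, profile) target per combination."""
    for region, account_id in itertools.product(regions, sorted(profiles)):
        yield region, account_id, profiles[account_id]


targets = list(expand(REGIONS, PROFILES))
for region, account_id, profile in targets:
    print('would query EC2 in %s using profile %s (%s)'
          % (region, profile, account_id))
```

Three regions and two accounts give six separate queries, which is exactly the bookkeeping you would otherwise do by hand.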
In addition to wildcards, you can also use regular expressions as components in the ARN. For example:
arn = skew.scan('arn:aws:dynamodb:us-.*:*:table/*')
This will find all DynamoDB tables in all US regions for all accounts.
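If you want a feel for how a pattern in one ARN component narrows the candidates, the following sketch filters a sample list of region names with Python's `re` module. The `match_component` function and the `REGIONS` list are assumptions made for this example, and treating the pattern as a whole-name match is my choice here, not necessarily how skew evaluates it.

```python
import re

# Sample region names for illustration only.
REGIONS = ['us-east-1', 'us-west-1', 'us-west-2', 'eu-west-1',
           'ap-southeast-1']


def match_component(pattern, choices):
    """Return the choices selected by a wildcard or regular expression."""
    # A bare '*' is the wildcard and selects everything.
    if pattern == '*':
        return list(choices)
    # Anything else is treated as a regex that must match the whole name.
    regex = re.compile(pattern)
    return [choice for choice in choices if regex.fullmatch(choice)]


print(match_component('us-.*', REGIONS))
# prints: ['us-east-1', 'us-west-1', 'us-west-2']
print(len(match_component('*', REGIONS)))
# prints: 5
```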
Here are some examples of things you can do quickly and easily with skew that would be difficult in most other tools.
Find all unattached EBS volumes across all regions and accounts and tally the size of wasted space.
import skew
total_size = 0
total_volumes = 0
for volume in skew.scan('arn:aws:ec2:*:*:volume/*'):
    # An empty Attachments list means the volume is not attached.
    if not volume.data['Attachments']:
        total_volumes += 1
        total_size += volume.data['Size']
        print('%s: %dGB' % (volume.arn, volume.data['Size']))
print('Total unattached volumes: %d' % total_volumes)
print('Total size (GB): %d' % total_size)
Audit all EC2 security groups to find CIDR rules that are not whitelisted.
import skew
# Add whitelisted CIDR blocks here, e.g. 192.168.1.1/32.
# Any addresses not in this list will be flagged.
whitelist = []
for secgrp in skew.scan('arn:aws:ec2:*:*:security-group/*'):
    for ipperms in secgrp.data['IpPermissions']:
        for ip in ipperms['IpRanges']:
            if ip['CidrIp'] not in whitelist:
                print('%s: %s is not whitelisted' % (secgrp.arn, ip['CidrIp']))
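The audit above compares CIDR strings exactly, so a rule for 10.1.2.0/24 would be flagged even if 10.0.0.0/8 is on the whitelist. If you would rather treat anything inside a whitelisted block as acceptable, a sketch along these lines with the standard `ipaddress` module might help. The `is_whitelisted` helper is a hypothetical name of mine, not something skew provides, and `subnet_of` needs Python 3.7 or later.

```python
import ipaddress


def is_whitelisted(cidr, whitelist):
    """Return True if cidr falls entirely inside any whitelisted block."""
    network = ipaddress.ip_network(cidr, strict=False)
    for allowed in whitelist:
        allowed_net = ipaddress.ip_network(allowed, strict=False)
        # subnet_of() raises TypeError when IPv4 and IPv6 are mixed,
        # so compare versions first.
        if (network.version == allowed_net.version
                and network.subnet_of(allowed_net)):
            return True
    return False


whitelist = ['10.0.0.0/8', '192.168.1.1/32']
print(is_whitelisted('10.1.2.0/24', whitelist))   # prints: True
print(is_whitelisted('0.0.0.0/0', whitelist))     # prints: False
```

You could then swap the `not in whitelist` test in the audit loop for `not is_whitelisted(ip['CidrIp'], whitelist)`.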
Find all EC2 instances that are not tagged in any way.
import skew
for instance in skew.scan('arn:aws:ec2:*:*:instance/*'):
    if not instance.tags:
        print('%s is untagged' % instance.arn)
The ARN provides a great way to uniquely identify AWS resources, but it doesn't exactly roll off the tongue. Skew provides some help for constructing ARNs interactively.
First, start off with a new ARN object.
>>> from skew.arn import ARN
>>> arn = ARN()
>>> arn
arn:aws:*:*:*:*
>>>
Each ARN object contains 6 components:
- scheme - for now this will always be arn
- provider - again, for now always aws
- service - the Amazon Web Service
- region - the AWS region
- account - the ID of the AWS account
- resource - the resource type and resource ID
All of these are available as attributes of the ARN object.
>>> arn.scheme
arn
>>> arn.provider
aws
>>> arn.service
*
>>> arn.region
*
>>> arn.account
*
>>> arn.resource
*
>>>
If you want to build up the ARN interactively, you can ask each of the components what choices are available.
>>> arn.service.choices
['autoscaling',
'cloudformation',
'cloudfront',
'cloudsearch',
'cloudsearchdomain',
'cloudtrail',
'cloudwatch',
'codedeploy',
...
'storagegateway',
'sts',
'support',
'swf']
>>>
You can also try out your regular expressions to make sure they return the results you expect.
>>> arn.service.match('cloud.*')
['cloudformation',
'cloudfront',
'cloudsearch',
'cloudsearchdomain',
'cloudtrail',
'cloudwatch']
>>>
To set the value of a particular component, use the pattern attribute.
>>> arn.service.pattern = 'cloud.*'
>>> arn.service.matches
['cloudformation',
'cloudfront',
'cloudsearch',
'cloudsearchdomain',
'cloudtrail',
'cloudwatch']
>>>
Once you have the ARN that you want, you can enumerate it like this:
>>> for resource in arn:
...     <do something amazing>
>>>
A recent feature of skew allows you to run queries against the resource data. This feature makes use of jmespath, which is a really nice JSON query engine. It was originally written in Python for use in the AWSCLI but is now available in a number of other languages. If you have ever used the --query option of the AWSCLI, then you have used jmespath.
If you append a jmespath query to the end of the ARN (using a | as a separator), skew will send the data for each of the returned resources through the jmespath query and store the result in the filtered_data attribute of the resource object. The original data is still available as the data attribute. For example:
arn = skew.scan('arn:aws:ec2:*:*:instance/*|InstanceType')
Then each resource returned would have the instance type stored in the filtered_data attribute of the Resource object. This is obviously a very simple example, but jmespath is very powerful, and the interactive query tool available on http://jmespath.org/ allows you to try your queries out beforehand to get exactly what you want.
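To see what the separator convention amounts to, here is a small sketch that splits an ARN from its trailing query and, if the jmespath package happens to be installed, runs the query against a sample dictionary. The `split_query` function and the `sample` data are my own inventions for illustration; skew's actual handling of the separator may differ.

```python
def split_query(arn_with_query):
    """Separate 'arn|query' into (arn, query); query is None if absent."""
    arn, sep, query = arn_with_query.partition('|')
    return arn, (query if sep else None)


arn, query = split_query('arn:aws:ec2:*:*:instance/*|InstanceType')
print(arn)    # prints: arn:aws:ec2:*:*:instance/*
print(query)  # prints: InstanceType

# jmespath is a third-party package, so treat it as optional here.
try:
    import jmespath
except ImportError:
    jmespath = None

# Made-up resource data shaped like the EC2 output shown earlier.
sample = {'InstanceId': 'i-12345678', 'InstanceType': 'c3.2xlarge'}
if jmespath is not None:
    # jmespath.search(expression, data) evaluates the query on the data.
    print(jmespath.search(query, sample))
```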
One other feature of skew is easy access to CloudWatch metrics for AWS resources. If we refer back to the very first interactive session in the post, we can show how you would access those CloudWatch metrics for the instance.
>>> instance.metric_names
['CPUUtilization',
'NetworkOut',
'StatusCheckFailed',
'StatusCheckFailed_System',
'NetworkIn',
'DiskWriteOps',
'DiskReadBytes',
'DiskReadOps',
'StatusCheckFailed_Instance',
'DiskWriteBytes']
>>> instance.get_metric_data('CPUUtilization')
[{u'Average': 0.134, u'Timestamp': '2014-12-13T14:04:00Z', u'Unit': 'Percent'},
{u'Average': 0.066, u'Timestamp': '2014-12-13T13:54:00Z', u'Unit': 'Percent'},
{u'Average': 0.066, u'Timestamp': '2014-12-13T14:09:00Z', u'Unit': 'Percent'},
{u'Average': 0.134, u'Timestamp': '2014-12-13T13:34:00Z', u'Unit': 'Percent'},
{u'Average': 0.066, u'Timestamp': '2014-12-13T14:19:00Z', u'Unit': 'Percent'},
{u'Average': 0.068, u'Timestamp': '2014-12-13T13:44:00Z', u'Unit': 'Percent'},
{u'Average': 0.134, u'Timestamp': '2014-12-13T14:14:00Z', u'Unit': 'Percent'},
{u'Average': 0.066, u'Timestamp': '2014-12-13T13:29:00Z', u'Unit': 'Percent'},
{u'Average': 0.132, u'Timestamp': '2014-12-13T13:59:00Z', u'Unit': 'Percent'},
{u'Average': 0.134, u'Timestamp': '2014-12-13T13:49:00Z', u'Unit': 'Percent'},
{u'Average': 0.134, u'Timestamp': '2014-12-13T13:39:00Z', u'Unit': 'Percent'}]
>>>
We can find the available CloudWatch metrics with the metric_names attribute, and then we can retrieve the desired metric using the get_metric_data method. The README for skew contains a bit more information about accessing CloudWatch metrics.
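Notice that the datapoints in the session above do not come back in chronological order. Here is a short sketch of putting them in order and summarizing them, using a few of the datapoints copied from that output. It assumes the timestamps are strings in the format shown; if your version hands back datetime objects instead, the same sort key works unchanged.

```python
# A few datapoints copied from the CPUUtilization output above.
datapoints = [
    {'Average': 0.134, 'Timestamp': '2014-12-13T14:04:00Z', 'Unit': 'Percent'},
    {'Average': 0.066, 'Timestamp': '2014-12-13T13:54:00Z', 'Unit': 'Percent'},
    {'Average': 0.066, 'Timestamp': '2014-12-13T14:09:00Z', 'Unit': 'Percent'},
    {'Average': 0.134, 'Timestamp': '2014-12-13T13:34:00Z', 'Unit': 'Percent'},
]

# ISO 8601 UTC timestamps in a single format sort correctly as strings.
ordered = sorted(datapoints, key=lambda d: d['Timestamp'])
mean = sum(d['Average'] for d in ordered) / len(ordered)

for d in ordered:
    print('%s  %.3f %s' % (d['Timestamp'], d['Average'], d['Unit']))
print('mean: %.3f' % mean)  # prints: mean: 0.100
```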
Skew is pretty new and is still changing a lot. It currently supports only a subset of available AWS resource types but more are being added all the time. If you manage a lot of AWS resources, I encourage you to give it a try. Feedback, as always, is very welcome as are pull requests!