Finding AWS Resources Across Regions, Services, and Accounts

If you only have one account in AWS and you only use one service in one
region, this article probably isn't for you.  However, if you are like
me and manage resources in many accounts, across multiple regions, and in
many different services, stick around.
There are a lot of great tools to help you manage your AWS resources.  There is
the AWS Web Console, the AWSCLI, various language SDKs like boto, and a host
of third-party tools.  The biggest problem I have with most of these tools is
that they limit your view of resources to a single region, a single account,
and a single service at a time.  For example, you have to log in to the AWS
Console with one set of credentials representing a single account.  And once
you are logged in, you have to select a single region.  And then, finally, you
drill into a particular service.  The AWSCLI and the SDKs follow this same
basic model.
But what if you want to look at resources across regions?  Across accounts?
Across services?  Well, that's where skew comes in.
Skew

Skew is a Python library built on top of
botocore.  The main purpose of skew
is to provide a flat, uniform address space for all of your AWS resources.
The name skew is a homophone of SKU (Stock Keeping Unit).  SKUs are the
numbers that show up on the bar codes of just about everything you purchase,
and each SKU uniquely identifies a product in the vendor's inventory.  When
you make a purchase they scan the barcode containing the SKU and can instantly
find the pricing data for the item.
Similarly, skew uses a unique identifier for each one of your
AWS resources and allows you to scan the SKU and quickly find the details
for that resource.  It also provides some powerful mechanisms to find sets
of resources by allowing wildcarding and regular expressions within the SKUs.
ARN't You Glad You Are Reading This?

So, what do we use for a unique identifier for all of our AWS resources?
Well, as it turns out, AWS has already solved that problem for us.  Each
resource in AWS can be identified by an
Amazon Resource Name or ARN.  The general forms of an ARN are:
arn:aws:service:region:account:resource
arn:aws:service:region:account:resourcetype/resource
arn:aws:service:region:account:resourcetype:resource

So, the ARN for an EC2 instance might look like this:
arn:aws:ec2:us-west-2:123456789012:instance/i-12345678

This tells us the instance is in the us-west-2 region, running in the
account identified by the account number 123456789012 and the instance
has an instance ID of i-12345678.
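
To make the layout concrete, here is a minimal sketch that splits an ARN string into its six colon-separated fields using nothing but the standard library.  The `ArnParts` tuple and the `parse_arn` function are hypothetical helpers written for this illustration; they are not part of skew or of any AWS SDK.

```python
from collections import namedtuple

# Hypothetical container for the six ARN fields; not something skew provides.
ArnParts = namedtuple(
    'ArnParts', 'scheme provider service region account resource')


def parse_arn(arn):
    # Split on at most five colons so that a resource part written as
    # "resourcetype:resource" stays in one piece.
    parts = arn.split(':', 5)
    if len(parts) != 6:
        raise ValueError('not a valid ARN: %r' % arn)
    return ArnParts(*parts)


parts = parse_arn('arn:aws:ec2:us-west-2:123456789012:instance/i-12345678')
print(parts.region)    # us-west-2
print(parts.account)   # 123456789012
print(parts.resource)  # instance/i-12345678
```

Limiting the split to five colons is the one design choice worth noting: it keeps the third ARN form, where the resource type and resource are themselves separated by a colon, intact in the last field.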
Getting Started With Skew

The easiest way to install skew is via pip.
% pip install skew

Because skew is based on botocore, as is AWSCLI, it will use the same
credentials as those tools.  You need to make a small addition to your
~/.aws/config file to help skew map AWS account IDs to the profiles in the
config file.  Check the README for details on that.
Let's Find Some Stuff

Once we have skew installed and configured, we can use it to find
resources based on their ARNs.  For example, using the example ARN
above:
>>> import skew
>>> arn = skew.scan('arn:aws:ec2:us-west-2:123456789012:instance/i-12345678')
>>> arn
arn:aws:ec2:us-west-2:123456789012:instance/i-12345678
>>>

OK, that wasn't very exciting.  How do I get at my actual resource in AWS?
Well, the scan method returns an ARN object and this object supports
the iterator pattern in Python.  This makes sense since, as we will see
later, an ARN can actually return a lot of objects, not just one.  So if
we want to get our object we can do this:
>>> instance = list(arn)[0]
>>> instance.id
'i-12345678'
>>> instance.data
{u'AmiLaunchIndex': 1,
 u'Architecture': 'x86_64',
 u'BlockDeviceMappings': [{u'DeviceName': '/dev/sda1',
   u'Ebs': {u'AttachTime': datetime.datetime(2014, 12, 14, 13, 48, tzinfo=tzutc()),
   u'DeleteOnTermination': True,
   u'Status': 'attached',
   u'VolumeId': 'vol-63276b7b'}}],
 u'ClientToken': '425f1a07-2e61-4089-a7dc-7344b302731e_us-east-1d_2',
 u'EbsOptimized': False,
 u'Hypervisor': 'xen',
 u'ImageId': 'ami-6227460a',
 u'InstanceId': 'i-12345678',
 u'InstanceType': 'c3.2xlarge',
 ...
 }
>>>

Iterating over an ARN yields Resource objects, and each of these
Resource objects represents one resource in AWS.  Resource objects
have a number of attributes like id and they also have an attribute
called data that contains all of the data about that resource.  This is
the same information that would be returned by the AWSCLI or an SDK.
Wildcards And Regular Expressions

Finding a single resource in AWS is okay but one of the nice things about
skew is that it allows you to quickly find lots of resources in AWS.  And
you don't have to worry about which region those resources are in or in
which account they reside.
For example, let's say we want to find all EC2 instances running in all
regions and in all of my accounts:
arn = skew.scan('arn:aws:ec2:*:*:instance/*')
for instance in arn:
    print(instance.arn)

In that one little line of Python code, a lot of stuff is happening.  Skew
will iterate through all of the regions supported by the EC2 service and, in
each region, will authenticate with each of the account profiles listed in
your AWS config file.  It will then find all EC2 instances and finally
return the complete list of those instances as Resource objects.
In addition to wildcards, you can also use regular expressions as components
in the ARN.  For example:
arn = skew.scan('arn:aws:dynamodb:us-.*:*:table/*')

This will find all DynamoDB tables in all US regions for all accounts.
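
To illustrate the idea, here is a small sketch of how a single ARN component pattern could be expanded against a list of candidate values.  This is not skew's actual implementation: the `match_component` helper, the short `REGIONS` list, and the choice to require a match of the whole string are all assumptions made for the example.

```python
import re

# Hypothetical, abbreviated region list used only for this illustration.
REGIONS = ['us-east-1', 'us-west-1', 'us-west-2', 'eu-west-1', 'ap-southeast-2']


def match_component(pattern, choices):
    # A bare "*" is treated as "every available choice".
    if pattern == '*':
        return list(choices)
    # Anything else is treated as a regular expression that must match
    # the entire candidate string (an assumption of this sketch).
    regex = re.compile(pattern)
    return [choice for choice in choices if regex.fullmatch(choice)]


print(match_component('us-.*', REGIONS))      # ['us-east-1', 'us-west-1', 'us-west-2']
print(match_component('eu-west-1', REGIONS))  # ['eu-west-1']
print(len(match_component('*', REGIONS)))     # 5
```

A literal value such as `eu-west-1` is just a regular expression that matches itself, so the same code path handles exact names and patterns.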
Some Useful Examples

Here are some examples of things you can do quickly and easily with skew
that would be difficult in most other tools.
Find all unattached EBS volumes across all regions and accounts and tally
the size of wasted space.
import skew
 
total_size = 0
total_volumes = 0
 
for volume in skew.scan('arn:aws:ec2:*:*:volume/*'):
    if not volume.data['Attachments']:
        total_volumes += 1
        total_size += volume.data['Size']
        print('%s: %dGB' % (volume.arn, volume.data['Size']))
print('Total unattached volumes: %d' % total_volumes)
print('Total size (GB): %d' % total_size)

Audit all EC2 security groups to find CIDR rules that are not whitelisted.
import skew

# Add whitelisted CIDR blocks here, e.g. 192.168.1.1/32.
# Any addresses not in this list will be flagged.
whitelist = []
 
for secgrp in skew.scan('arn:aws:ec2:*:*:security-group/*'):
    for ipperms in secgrp.data['IpPermissions']:
        for ip in ipperms['IpRanges']:
            if ip['CidrIp'] not in whitelist:
                print('%s: %s is not whitelisted' % (secgrp.arn, ip['CidrIp']))

Find all EC2 instances that are not tagged in any way.
import skew
 
for instance in skew.scan('arn:aws:ec2:*:*:instance/*'):
    if not instance.tags:
        print('%s is untagged' % instance.arn)
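
Because every resource carries its full ARN, results from a multi-account, multi-region scan can be grouped without any extra lookups.  The sketch below tallies resources per account and region from their ARN strings.  The ARNs are made up for the illustration and stand in for the `arn` attribute of scanned resources, and `count_by_account_and_region` is a hypothetical helper, not part of skew.

```python
from collections import Counter

# Made-up ARNs standing in for the `arn` attribute of scanned resources.
arns = [
    'arn:aws:ec2:us-west-2:123456789012:instance/i-11111111',
    'arn:aws:ec2:us-west-2:123456789012:instance/i-22222222',
    'arn:aws:ec2:us-east-1:123456789012:instance/i-33333333',
    'arn:aws:ec2:us-east-1:210987654321:instance/i-44444444',
]


def count_by_account_and_region(arns):
    counts = Counter()
    for arn in arns:
        # The fourth and fifth colon-separated fields of an ARN are the
        # region and the account ID.
        _, _, _, region, account, _ = str(arn).split(':', 5)
        counts[(account, region)] += 1
    return counts


for (account, region), n in sorted(count_by_account_and_region(arns).items()):
    print('%s %s: %d' % (account, region, n))
```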

Building ARNs Interactively

The ARN provides a great way to uniquely identify AWS resources but it
doesn't exactly roll off the tongue.  Skew provides some help for
constructing ARNs interactively.
First, start off with a new ARN object.
>>> from skew.arn import ARN
>>> arn = ARN()
>>> arn
arn:aws:*:*:*:*
>>>

Each ARN object contains 6 components:

scheme - for now this will always be arn
provider - again, for now always aws
service - the Amazon Web Service
region - the AWS region
account - the ID of the AWS account
resource - the resource type and resource ID

All of these are available as attributes of the ARN object.
>>> arn.scheme
arn
>>> arn.provider
aws
>>> arn.service
*
>>> arn.region
*
>>> arn.account
*
>>> arn.resource
*
>>>

If you want to build up the ARN interactively, you can ask each of the
components what choices are available.
>>> arn.service.choices
['autoscaling',
 'cloudformation',
 'cloudfront',
 'cloudsearch',
 'cloudsearchdomain',
 'cloudtrail',
 'cloudwatch',
 'codedeploy',
 ...
 'storagegateway',
 'sts',
 'support',
 'swf']
>>>

You can also try out your regular expressions to make sure they return
the results you expect.
>>> arn.service.match('cloud.*')
['cloudformation',
 'cloudfront',
 'cloudsearch',
 'cloudsearchdomain',
 'cloudtrail',
 'cloudwatch']
>>>

To set the value of a particular component, use the pattern attribute.
>>> arn.service.pattern = 'cloud.*'
>>> arn.service.matches
['cloudformation',
 'cloudfront',
 'cloudsearch',
 'cloudsearchdomain',
 'cloudtrail',
 'cloudwatch']
>>>

Once you have the ARN that you want, you can enumerate it like this:
>>> for resource in arn:
    <do something amazing>
>>>

Running Queries Against Returned Data

A recent feature of skew allows you to run queries against the resource
data.  This feature makes use of jmespath, which is
a really nice JSON query engine.  It was originally written in Python for
use in the AWSCLI but is now available in a number of other languages.
If you have ever used the --query option of the AWSCLI, then you have
used jmespath.
If you append a jmespath query to the end of the ARN (using a | as a
separator) skew will send the data for each of the returned resources
through the jmespath query and store the result in the filtered_data
attribute of the resource object.  The original data is still available
as the data attribute.  For example:
arn = skew.scan('arn:aws:ec2:*:*:instance/*|InstanceType')

Then each resource returned would have the instance type stored in the
filtered_data attribute of the Resource object.  This is obviously
a very simple example but jmespath is very powerful and the interactive
query tool available on http://jmespath.org/ allows you to try your
queries out beforehand to get exactly what you want.
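
As a rough sketch of the mechanics, the string passed to scan can be thought of as an ARN pattern and an optional query separated by the first `|`.  The code below only illustrates that split; it is not skew's implementation.  `split_query` and `apply_bare_key` are hypothetical helpers, and the plain dictionary lookup is a stand-in for the jmespath engine that covers only the simplest case, a bare top-level key such as InstanceType.

```python
def split_query(arn_string):
    # Everything before the first "|" is the ARN pattern and everything
    # after it is the query.  Without a "|" there is no query.
    arn, separator, query = arn_string.partition('|')
    return arn, (query if separator else None)


def apply_bare_key(query, data):
    # Stand-in for the jmespath engine: handles only a bare top-level key.
    if query is None:
        return None
    return data.get(query)


arn, query = split_query('arn:aws:ec2:*:*:instance/*|InstanceType')
print(arn)    # arn:aws:ec2:*:*:instance/*
print(query)  # InstanceType

data = {'InstanceId': 'i-12345678', 'InstanceType': 'c3.2xlarge'}
print(apply_bare_key(query, data))  # c3.2xlarge
```

Real jmespath expressions can do far more than pick a single key, so anything beyond this trivial case should go through the jmespath library itself.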
CloudWatch Metrics

One other feature of skew is easy access to CloudWatch metrics for AWS
resources.  If we refer back to the very first interactive session in the
post, we can show how you would access those CloudWatch metrics for the
instance.
>>> instance.metric_names
['CPUUtilization',
 'NetworkOut',
 'StatusCheckFailed',
 'StatusCheckFailed_System',
 'NetworkIn',
 'DiskWriteOps',
 'DiskReadBytes',
 'DiskReadOps',
 'StatusCheckFailed_Instance',
 'DiskWriteBytes']
>>> instance.get_metric_data('CPUUtilization')
[{u'Average': 0.134, u'Timestamp': '2014-12-13T14:04:00Z', u'Unit': 'Percent'},
 {u'Average': 0.066, u'Timestamp': '2014-12-13T13:54:00Z', u'Unit': 'Percent'},
 {u'Average': 0.066, u'Timestamp': '2014-12-13T14:09:00Z', u'Unit': 'Percent'},
 {u'Average': 0.134, u'Timestamp': '2014-12-13T13:34:00Z', u'Unit': 'Percent'},
 {u'Average': 0.066, u'Timestamp': '2014-12-13T14:19:00Z', u'Unit': 'Percent'},
 {u'Average': 0.068, u'Timestamp': '2014-12-13T13:44:00Z', u'Unit': 'Percent'},
 {u'Average': 0.134, u'Timestamp': '2014-12-13T14:14:00Z', u'Unit': 'Percent'},
 {u'Average': 0.066, u'Timestamp': '2014-12-13T13:29:00Z', u'Unit': 'Percent'},
 {u'Average': 0.132, u'Timestamp': '2014-12-13T13:59:00Z', u'Unit': 'Percent'},
 {u'Average': 0.134, u'Timestamp': '2014-12-13T13:49:00Z', u'Unit': 'Percent'},
 {u'Average': 0.134, u'Timestamp': '2014-12-13T13:39:00Z', u'Unit': 'Percent'}]
>>>

We can find the available CloudWatch metrics with the metric_names
attribute and then we can retrieve the desired metric using the
get_metric_data method.  The README for skew contains a bit more
information about accessing CloudWatch metrics.
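
Note that the datapoints in the session above are not in chronological order, so it usually helps to sort them before doing anything else.  The following sketch works on plain dictionaries shaped like that output (three of the datapoints are copied from it) and uses only the standard library.  The `summarize` helper is hypothetical and not part of skew.

```python
# Three datapoints copied from the session above, in the order returned.
datapoints = [
    {'Average': 0.134, 'Timestamp': '2014-12-13T14:04:00Z', 'Unit': 'Percent'},
    {'Average': 0.066, 'Timestamp': '2014-12-13T13:54:00Z', 'Unit': 'Percent'},
    {'Average': 0.066, 'Timestamp': '2014-12-13T14:09:00Z', 'Unit': 'Percent'},
]


def summarize(datapoints):
    if not datapoints:
        raise ValueError('no datapoints to summarize')
    # ISO 8601 timestamps in the same timezone sort correctly as strings.
    ordered = sorted(datapoints, key=lambda d: d['Timestamp'])
    values = [d['Average'] for d in ordered]
    return {
        'first': ordered[0]['Timestamp'],
        'last': ordered[-1]['Timestamp'],
        'mean': sum(values) / len(values),
        'peak': max(values),
    }


summary = summarize(datapoints)
print(summary['first'])  # 2014-12-13T13:54:00Z
print(summary['last'])   # 2014-12-13T14:09:00Z
print(summary['peak'])   # 0.134
```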
Wrap Up

Skew is pretty new and is still changing a lot.  It currently supports only
a subset of available AWS resource types but more are being added all the
time.  If you manage a lot of AWS resources, I encourage you to give it a
try.  Feedback, as always, is very welcome as are pull requests!