Skip to content

Instantly share code, notes, and snippets.

@vsouza
Forked from oddskool/parse_aws_s3_billing.py
Created November 17, 2015 17:48
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save vsouza/62ea409491cb605b5633 to your computer and use it in GitHub Desktop.
Save vsouza/62ea409491cb605b5633 to your computer and use it in GitHub Desktop.
Simplistic script to parse the detailed AWS billing CSV file. Script displays cost of S3 operations broken down per region, bucket and usage type (either storage or network). It also sums up the amount of storage used per bucket. Output is filtered wrt to costs < 1$. See http://docs.aws.amazon.com/awsaccountbilling/latest/about/programaccess.html
# -*- coding:utf-8 -*-
'''
Simplistic script to parse the detailed AWS billing CSV file.
Script displays cost of S3 operations broken down per region, bucket and usage
type (either storage or network). It also sums up the amount of storage used per bucket.
Output is filtered wrt to costs < 1$.
See http://docs.aws.amazon.com/awsaccountbilling/latest/about/programaccess.html for
how to set up programmatic access to your billing.
Should be simple enough to enhance this script and use it for other AWS resources
(EC2, EMR, etc)
@author: @oddskool <https://github.com/oddskool>
@license: BSD 3 clauses
'''
import sys
import csv
from collections import defaultdict
def add_type(d):
if d['RecordType'] == 'UsageQuantity':
return None
for field in ('Cost', 'UsageQuantity'):
d[field] = float(d[field])
for field in ('LinkedAccountId', 'InvoiceID', 'RecordType', 'RecordId',
'PayerAccountId', 'SubscriptionId'):
del d[field]
return d
def parse(stats, d):
d = add_type(d)
if not d:
return
if d['ProductName'] != 'Amazon Simple Storage Service':
return
stats[(d['AvailabilityZone'] or 'N/A')+' * '+d['ResourceId']+' * '+d['UsageType']]['Cost'] += d['Cost']
stats[(d['AvailabilityZone'] or 'N/A')+' * '+d['ResourceId']+' * '+d['UsageType']]['UsageQuantity'] += d['UsageQuantity']
if __name__ == '__main__':
fd = open(sys.argv[1]) if len(sys.argv) > 1 else sys.stdin
reader = csv.reader(fd, delimiter=',', quotechar='"')
legend = None
stats = defaultdict(lambda: defaultdict(int))
for row in reader:
if not legend:
legend = row
continue
d = dict(zip(legend, row))
try:
parse(stats, d)
except Exception as e:
print e
print row
print d
data = [ (resource, cost_usage) for resource, cost_usage in
stats.iteritems() if cost_usage['Cost'] > 1.0 ]
data.sort(key=lambda x:x[-1]['Cost'], reverse=True)
for d in data:
print "%50s : $%.2f - %.2f GB" % (d[0],d[1]['Cost'],d[1]['UsageQuantity'])
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment