@nathants nathants/s3-du.py
Last active Jun 18, 2018

"""
prints a table of the largest prefixes in an s3 bucket, along with some metadata.
help:
$ python3.4 s3_du.py -h
usage:
$ aws s3 ls $BUCKET --recursive | python3.4 s3_du.py | head -n 100 | column -t
example:
$ echo '
2016-05-27 00:05:32 100 dir-a/foo.txt
2016-05-28 00:05:32 200 dir-a/bar.txt
2016-05-18 00:05:32 10 dir-b/bar.txt
2016-05-19 00:05:32 20 dir-b/bar.txt
' | python3.4 s3_du.py | column -t
> path bytes num_keys min_date:max_date
> dir-a 300 2 2016-05-27:2016-05-28
> dir-b 30 2 2016-05-18:2016-05-19
"""
import sys
import argh # https://pypi.python.org/pypi/argh/0.26.2

@argh.dispatch_command
def main(max_depth: 'how many directory levels to show' = 1):
    result = {}
    for line in sys.stdin:
        if not line.strip():
            continue
        try:
            # each `aws s3 ls --recursive` line is: date time size key
            date, _, size, path = line.strip().split(maxsplit=3)
            size = int(size)
        except ValueError:
            print('skipping bad line:', line, file=sys.stderr)
            continue
        # replace spaces so `column -t` does not split a key into several columns
        path = path.replace(' ', '_')
        parts = path.split('/')
        # cap the depth at the number of path components, otherwise a short
        # key would be counted more than once under the same prefix
        for i in range(min(max_depth, len(parts))):
            key = '/'.join(parts[:i + 1])
            val = result.setdefault(key, {})
            val['min'] = min(val.get('min', date), date)
            val['max'] = max(val.get('max', date), date)
            val['bytes'] = val.get('bytes', 0) + size
            val['keys'] = val.get('keys', 0) + 1
    # sort by path, then stably by size, so equal sizes stay in path order
    result = sorted(result.items(), key=lambda kv: kv[0])
    result = sorted(result, key=lambda kv: kv[1]['bytes'], reverse=True)
    print('path bytes num_keys min_date:max_date')
    for k, v in result:
        print(k, format(v['bytes'], ','), v['keys'], v['min'] + ':' + v['max'])
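
The aggregation step can also be tried without stdin or argh. The following is a minimal sketch, not part of the original script: `summarize` and `sample` are hypothetical names introduced here for illustration, and it assumes input lines in the same `date time size key` layout shown in the docstring example.

```python
from collections import defaultdict

def summarize(lines, max_depth=1):
    """Aggregate `date time size key` lines by key prefix (hypothetical helper)."""
    totals = defaultdict(lambda: {'bytes': 0, 'keys': 0, 'min': None, 'max': None})
    for line in lines:
        fields = line.strip().split(maxsplit=3)
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # skip blank or malformed lines
        date, _, size, path = fields
        parts = path.split('/')
        # never go deeper than the key actually is, to avoid double counting
        for i in range(min(max_depth, len(parts))):
            entry = totals['/'.join(parts[:i + 1])]
            entry['bytes'] += int(size)
            entry['keys'] += 1
            entry['min'] = date if entry['min'] is None else min(entry['min'], date)
            entry['max'] = date if entry['max'] is None else max(entry['max'], date)
    return dict(totals)

# same sample data as the docstring example
sample = [
    '2016-05-27 00:05:32 100 dir-a/foo.txt',
    '2016-05-28 00:05:32 200 dir-a/bar.txt',
    '2016-05-18 00:05:32 10 dir-b/bar.txt',
    '2016-05-19 00:05:32 20 dir-b/bar.txt',
]
summary = summarize(sample)
for prefix, stats in sorted(summary.items(), key=lambda kv: -kv[1]['bytes']):
    print(prefix, stats['bytes'], stats['keys'], stats['min'] + ':' + stats['max'])
```

Returning a plain dict keeps the counting logic separate from the printing and sorting, which makes it easier to check against a few hand-written lines before pointing it at a real bucket listing.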