@wontonst
Last active September 3, 2019 18:06
Find all leaf-node "directories" (common prefixes) in an S3 bucket under a root prefix
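S3 has no real directories. A "directory" is a common prefix that S3 reports when a ListObjects request sets a Delimiter: every key under the requested Prefix that contains the delimiter after that prefix is rolled up into one CommonPrefixes entry. The sketch below is a hypothetical in-memory model of that grouping, not S3's implementation. The function name `common_prefixes` and the sample keys are assumptions made for illustration, and no AWS calls are made.

```python
# Hypothetical in-memory model of how S3 groups keys into CommonPrefixes
# when a ListObjects request supplies Prefix and Delimiter. No AWS calls.


def common_prefixes(keys, prefix, delimiter="/"):
    """Return the sorted CommonPrefixes S3 would report for these keys."""
    found = set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # keys outside the prefix are not listed at all
        rest = key[len(prefix):]
        idx = rest.find(delimiter)
        if idx != -1:
            # Everything up to and including the first delimiter after the
            # prefix collapses into a single common prefix.
            found.add(prefix + rest[: idx + len(delimiter)])
        # Keys with no delimiter after the prefix would appear under
        # Contents instead, so they are ignored here.
    return sorted(found)


keys = [
    "2019/2/1/myfile.txt",
    "2019/2/12/myfile.txt",
    "2019/2/8/myfile.txt",
    "2019/3/1/myfile.txt",
]
print(common_prefixes(keys, "2019/2"))   # ['2019/2/']
print(common_prefixes(keys, "2019/2/"))  # ['2019/2/1/', '2019/2/12/', '2019/2/8/']
```

Listing "2019/2" yields only "2019/2/". This is why the script below has to keep descending level by level until a listing returns no common prefixes.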
"""
Readings:
https://docs.aws.amazon.com/AmazonS3/latest/dev/ListingKeysHierarchy.html
https://github.com/boto/boto3/blob/develop/boto3/examples/s3.rst#list-top-level-common-prefixes-in-amazon-s3-bucket
To get all subdirectories instead of only leaves: https://gist.github.com/wontonst/fd2b03dd4b5ac656cc264fe70a7d37bd
"""
from collections import deque

import boto3


def find_all_leaf_common_prefixes(bucket_name, root_prefix):
    """Find all common prefixes (effectively S3 "directories") that are leaf nodes under root_prefix.

    For example, if the root prefix is '2019/2' and keys look like year/month/day/myfile.txt,
    the result is a set like {'2019/2/1/', '2019/2/12/', '2019/2/8/', ...}.
    """
    queue = deque([root_prefix])  # breadth-first queue of prefixes to list
    output = set()
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects')  # create once, reuse per prefix
    while queue:
        common_prefix = queue.popleft()
        pages = paginator.paginate(
            Bucket=bucket_name,
            Delimiter='/',
            Prefix=common_prefix,
        )
        # search() yields None for pages that have no CommonPrefixes, so skip those.
        # Collecting into a list also avoids re-issuing the requests on a second pass.
        children = [p['Prefix'] for p in pages.search('CommonPrefixes') if p is not None]
        if not children:
            # Nothing below this prefix, so it is a leaf.
            output.add(common_prefix)
            continue
        queue.extend(children)
    return output
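The leaf search itself is an ordinary breadth-first walk. It lists one level, enqueues each child prefix, and records a prefix as a leaf when its listing returns no CommonPrefixes. Below is a minimal offline sketch of that walk. It assumes a hypothetical `list_children` callable in place of the paginated `list_objects` call and a hand-written fake listing table in place of a real bucket. Both are illustrations, not part of the gist.

```python
from collections import deque
from typing import Callable, Iterable, Set


def find_leaf_prefixes(root_prefix: str,
                       list_children: Callable[[str], Iterable[str]]) -> Set[str]:
    """Breadth-first leaf search with the S3 listing replaced by a
    caller-supplied function (hypothetical stand-in for the paginator)."""
    queue = deque([root_prefix])
    leaves = set()
    while queue:
        prefix = queue.popleft()
        children = list(list_children(prefix))
        if children:
            queue.extend(children)   # descend one level
        else:
            leaves.add(prefix)       # no sub-prefixes: this is a leaf
    return leaves


# Fake listing table standing in for S3 responses with Delimiter='/'.
fake_listing = {
    "2019/2": ["2019/2/"],
    "2019/2/": ["2019/2/1/", "2019/2/12/", "2019/2/8/"],
}

leaves = find_leaf_prefixes("2019/2", lambda p: fake_listing.get(p, []))
print(sorted(leaves))  # ['2019/2/1/', '2019/2/12/', '2019/2/8/']
```

Passing a function that wraps the real paginator would restore the S3 behavior. Keeping the listing injectable lets the traversal be checked without AWS credentials.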