@headquarters
Last active March 15, 2023 01:33
List S3 objects with depth (ChatGPT generated)
import boto3


def list_s3_objects(bucket_name, prefix='', depth=0, page_size=100, start_after='', max_pages=10):
    """
    Lists objects in an S3 bucket whose keys contain exactly `depth` slashes,
    optionally filtered by a prefix, reading the listing page by page.
    Returns a list of dictionaries containing object metadata.

    Args:
        bucket_name (str): Name of the S3 bucket.
        prefix (str): Prefix to filter objects by (default '').
        depth (int): Number of '/' characters a key must contain, counting any
            in the prefix (default 0, meaning keys with no slashes).
        page_size (int): Maximum number of objects to request per page (default 100).
        start_after (str): Object key to start listing after (default '').
        max_pages (int): Maximum number of pages to retrieve (default 10).

    Returns:
        list: List of dictionaries containing object metadata.
    """
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')

    # StartAfter is a parameter of list_objects_v2 itself. The paginator's
    # StartingToken is a resume token from a previous run, not an object key.
    list_kwargs = {'Bucket': bucket_name, 'Prefix': prefix}
    if start_after:
        list_kwargs['StartAfter'] = start_after

    page_iterator = paginator.paginate(
        PaginationConfig={'PageSize': page_size},
        **list_kwargs,
    )

    objects = []
    page_count = 0
    for page in page_iterator:
        # 'Contents' is absent when a page holds no objects.
        for obj in page.get('Contents', []):
            if obj['Key'].count('/') == depth:
                objects.append(obj)
        # Count pages scanned, not objects kept, against the max_pages limit.
        page_count += 1
        if page_count >= max_pages:
            break
    return objects
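
The page loop can be exercised without AWS access by separating it from the boto3 calls. The following is a minimal sketch, not part of the original gist: `collect_at_depth` and `fake_pages` are hypothetical names, and the fake pages only imitate the shape of `list_objects_v2` responses. It shows that `max_pages` limits how many pages are scanned, not how many matches come back.

```python
def collect_at_depth(pages, depth, max_pages):
    """Collect objects whose key has exactly `depth` slashes from at most `max_pages` pages."""
    objects = []
    for page_count, page in enumerate(pages, start=1):
        # A page with no objects has no 'Contents' entry.
        for obj in page.get('Contents', []):
            if obj['Key'].count('/') == depth:
                objects.append(obj)
        if page_count >= max_pages:
            break
    return objects


# Hypothetical pages shaped like list_objects_v2 responses (only 'Key' is filled in).
fake_pages = [
    {'Contents': [{'Key': 'data/a.csv'}, {'Key': 'data/2023/b.csv'}]},
    {},  # a page without 'Contents'
    {'Contents': [{'Key': 'data/2023/c.csv'}]},
]

two_pages = collect_at_depth(fake_pages, depth=2, max_pages=2)
all_pages = collect_at_depth(fake_pages, depth=2, max_pages=10)
print([o['Key'] for o in two_pages])  # ['data/2023/b.csv']
print([o['Key'] for o in all_pages])  # ['data/2023/b.csv', 'data/2023/c.csv']
```

With `max_pages=2` the third page is never read, so the second match is missed even though only one object was returned so far.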
@headquarters

Example:

# Replace 'bucket-name' with the name of your S3 bucket.
objects = list_s3_objects(bucket_name='bucket-name', prefix='data/', depth=2, page_size=100, max_pages=5)

# Each object is a dictionary containing metadata about the object.
for obj in objects:
    print(obj['Key'], obj['Size'], obj['LastModified'])

In this example, list_s3_objects lists the objects in the bucket named by the bucket_name parameter whose keys start with the given prefix and contain exactly two slashes (a depth of 2). The slash in the prefix 'data/' counts toward that total, so a key of the form data/2023/file.csv would match. The function requests up to page_size objects per page, begins listing after the key given by the start_after parameter (if any), and reads at most max_pages pages. Because the depth filter is applied after each page is fetched, the result may hold far fewer than page_size × max_pages objects. The function returns a list of dictionaries, where each dictionary contains metadata about one S3 object (such as its key, size, and last-modified date).

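To make the depth rule concrete, here is a small sketch on made-up keys. It compares the slash count used by list_s3_objects (over the whole key) with a prefix-relative count. `depth_below_prefix` is a hypothetical helper, not part of the gist, and the keys are illustrative only.

```python
def depth_below_prefix(key, prefix=''):
    """Count the slashes in the part of `key` that follows `prefix`."""
    if not key.startswith(prefix):
        raise ValueError(f"{key!r} does not start with {prefix!r}")
    return key[len(prefix):].count('/')


# Illustrative keys, as they might appear under the 'data/' prefix.
keys = ['data/a.csv', 'data/2023/b.csv', 'data/2023/03/c.csv']

# Whole-key counting, as in list_s3_objects: the slash in 'data/' is included.
absolute = [k for k in keys if k.count('/') == 2]

# Prefix-relative counting: only slashes after 'data/' are included.
relative = [k for k in keys if depth_below_prefix(k, 'data/') == 2]

print(absolute)  # ['data/2023/b.csv']
print(relative)  # ['data/2023/03/c.csv']
```

Which convention is preferable depends on whether callers think of depth from the bucket root or from the prefix they pass in.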