@headquarters
Last active April 17, 2023 16:45
List S3 objects (ChatGPT generated)
import boto3


def list_s3_objects(bucket_name, prefix='', page_size=100, start_after='', max_pages=10):
    """
    Lists objects in an S3 bucket, optionally filtered by a prefix, and paginates results.
    Stops after max_pages pages, so at most page_size * max_pages objects are returned.

    Args:
        bucket_name (str): Name of the S3 bucket.
        prefix (str): Prefix to filter objects by (default '').
        page_size (int): Maximum number of objects to return per page (default 100).
        start_after (str): Object key to start listing after (default '').
        max_pages (int): Maximum number of pages to retrieve (default 10).

    Returns:
        list: List of dictionaries containing object metadata.
    """
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')

    # StartAfter is a list_objects_v2 request parameter (an object key), not a
    # pagination token, so it goes next to Bucket and Prefix, not in StartingToken.
    params = {'Bucket': bucket_name, 'Prefix': prefix}
    if start_after:
        params['StartAfter'] = start_after

    page_iterator = paginator.paginate(
        PaginationConfig={'PageSize': page_size},
        **params,
    )

    objects = []
    page_count = 0
    for page in page_iterator:
        # 'Contents' is absent when a page holds no matching objects.
        objects.extend(page.get('Contents', []))
        page_count += 1
        if page_count >= max_pages:
            break
    return objects
@headquarters
Author

Example use:

# Replace 'bucket-name' with the name of your S3 bucket.
objects = list_s3_objects(bucket_name='bucket-name', prefix='data/', page_size=100, max_pages=5)

# Each object is a dictionary containing metadata about the object.
for obj in objects:
    print(obj['Key'], obj['Size'], obj['LastModified'])

Generated by ChatGPT. Here's its explanation:
"In this example, list_s3_objects lists all objects in the S3 bucket whose name is specified by the bucket_name parameter, filtered by the prefix specified by the prefix parameter. It retrieves up to page_size objects per page, starting with the object whose key is specified by the start_after parameter (if any), and returns up to max_pages pages of results. The function returns a list of dictionaries, where each dictionary contains metadata about an S3 object (such as its key, size, and last modified date)."

@headquarters
Author

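One bug in the code as ChatGPT generated it: start_after was passed as StartingToken, but these are two different things. StartAfter is a list_objects_v2 request parameter naming the object key after which the listing should begin. StartingToken tells the paginator where to resume, using the NextToken sent back by a previous paginated call.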
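
To make the difference concrete, here is a rough sketch of how the two values might be used side by side. The helper name list_batch and its parameters are made up for illustration and are not part of the original gist. It assumes the caller passes in a boto3 S3 client, and it relies on the paginator's MaxItems setting together with build_full_result() to get a NextToken back when the listing is cut short.

```python
def list_batch(client, bucket_name, prefix='', start_after=None,
               starting_token=None, max_items=1000):
    """Fetch one batch of objects and the token needed to fetch the next batch.

    Hypothetical helper: `client` is assumed to be a boto3 S3 client,
    e.g. boto3.client('s3').
    """
    # StartAfter is part of the list_objects_v2 request itself: an object key.
    params = {'Bucket': bucket_name, 'Prefix': prefix}
    if start_after:
        params['StartAfter'] = start_after

    # StartingToken belongs to the paginator: an opaque value taken from the
    # 'NextToken' of an earlier result, never an object key.
    config = {'MaxItems': max_items}
    if starting_token:
        config['StartingToken'] = starting_token

    paginator = client.get_paginator('list_objects_v2')
    result = paginator.paginate(PaginationConfig=config, **params).build_full_result()

    # 'NextToken' is only present when MaxItems truncated the listing.
    return result.get('Contents', []), result.get('NextToken')


# Example use (needs boto3 and AWS credentials, so it is left commented out):
#
#   import boto3
#   client = boto3.client('s3')
#   objects, token = list_batch(client, 'bucket-name', prefix='data/', max_items=500)
#   while token:
#       more, token = list_batch(client, 'bucket-name', prefix='data/',
#                                starting_token=token, max_items=500)
#       objects.extend(more)
```

The first call can optionally use start_after to skip ahead to a known key; every later call only needs the token returned by the call before it.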
