@pashri
Created May 14, 2024 15:43
Make an iterator of boto3 request results, which handles pagination.
from collections.abc import Callable, Iterator
from typing import Any, Optional


def boto_request(
        func: Callable,
        params: Optional[dict[str, Any]] = None,
        items_key: str = 'Items',
        last_key: str = 'LastEvaluatedKey',
        start_key: str = 'ExclusiveStartKey',
) -> Iterator[dict]:
    """Make an iterator of a boto3 request's results

    Parameters
    ----------
    func : Callable
        The boto3 function that requires multiple calls
    params : dict, optional
        The parameters to be passed to that function
    items_key : str, default: 'Items'
        The key in the response that corresponds with the query results
    last_key : str, default: 'LastEvaluatedKey'
        The key in the response that refers to the last key returned
    start_key : str, default: 'ExclusiveStartKey'
        The key in the function that refers to the starting point,
        based on the last key

    Returns
    -------
    Iterator[dict]
        An iterator of records

    Notes
    -----
    The default keys correspond to dynamodb

    Examples
    --------
    Loading all data from a dynamodb table into a pandas DataFrame:

    ```python
    table = boto3.resource('dynamodb').Table('Records')
    records = boto_request(table.scan)
    df = pd.DataFrame(records)
    ```
    """

    # Copy so the caller's dict is not mutated when the start key is added
    params = {} if params is None else params.copy()
    continue_request: bool = True

    while continue_request:
        results: dict[str, Any] = func(**params)
        # Some APIs (e.g. S3 list_objects_v2) omit the items key on an empty page
        yield from results.get(items_key, [])
        if last_key in results:
            # Feed the returned position back in as the next call's start point
            params[start_key] = results[last_key]
        else:
            continue_request = False
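
To see the paging without an AWS account, here is a minimal sketch that swaps the boto3 call for a stand-in. `fake_scan`, `RECORDS`, and `calls` are hypothetical names invented for this illustration, and the integer start key is a simplification (a real DynamoDB `LastEvaluatedKey` is a dict of key attributes). The generator is repeated in condensed form only so the snippet runs on its own.

```python
from itertools import islice

RECORDS = [{'id': i} for i in range(5)]
calls = []  # start keys seen by the fake, one entry per page request


def boto_request(func, params=None, items_key='Items',
                 last_key='LastEvaluatedKey', start_key='ExclusiveStartKey'):
    """Condensed variant of the generator above, repeated so this runs alone."""
    params = {} if params is None else params.copy()
    while True:
        results = func(**params)
        yield from results.get(items_key, [])
        if last_key not in results:
            return
        params[start_key] = results[last_key]


def fake_scan(ExclusiveStartKey=0, Limit=2):
    """Hypothetical stand-in for table.scan: serves RECORDS in pages of Limit."""
    calls.append(ExclusiveStartKey)
    response = {'Items': RECORDS[ExclusiveStartKey:ExclusiveStartKey + Limit]}
    next_start = ExclusiveStartKey + Limit
    if next_start < len(RECORDS):
        # More data remains, so report where the next page should begin
        response['LastEvaluatedKey'] = next_start
    return response


# Consuming everything walks all three pages
all_items = list(boto_request(fake_scan, {'Limit': 2}))
pages_for_full_walk = list(calls)

# Taking only two records requests only the first page
calls.clear()
first_two = list(islice(boto_request(fake_scan, {'Limit': 2}), 2))
pages_for_first_two = list(calls)
```

Because the function is a generator, a page is requested only when the consumer runs out of records from the previous one, which is why stopping early avoids the later calls.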
pashri commented May 14, 2024

You can do something like this:

from functools import partial
import boto3

TABLE_NAME = 'spam'

session = boto3.Session()
dynamodb = session.resource('dynamodb')
table = dynamodb.Table(TABLE_NAME)

scan_table = partial(boto_request, table.scan)

items = scan_table()  # This is an iterable

for item in items:  # This will load in chunks, and paginate when needed
    ... # The item is the record

pashri commented May 14, 2024

Let's say you want to list all the objects in an S3 bucket:

from collections.abc import Iterable

import boto3

...

def list_objects(
        bucket_name: str,
        prefix: str | None = None,
        session: boto3.Session | None = None,
) -> Iterable[dict]:
    """List the objects in an S3 bucket"""

    session = session or boto3.Session()
    client = session.client('s3')

    # boto3 rejects Prefix=None, so only pass the parameter when one is given
    params = {'Bucket': bucket_name}
    if prefix is not None:
        params['Prefix'] = prefix

    items: Iterable[dict] = boto_request(
        func=client.list_objects_v2,
        params=params,
        items_key='Contents',
        last_key='NextContinuationToken',
        start_key='ContinuationToken',
    )

    return items
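
A rough way to check the S3 key mapping offline is to run it against a stand-in client. `FakeS3Client` and `list_keys` below are hypothetical helpers made up for this sketch; the fake only mimics the response keys used here (`Contents`, `NextContinuationToken`) and uses a numeric string as its token, whereas real tokens are opaque. The condensed `boto_request` is repeated so the snippet is self-contained, and it assumes a missing items key should be treated as an empty page.

```python
def boto_request(func, params=None, items_key='Items',
                 last_key='LastEvaluatedKey', start_key='ExclusiveStartKey'):
    """Condensed variant of boto_request, repeated so this runs alone."""
    params = {} if params is None else params.copy()
    while True:
        results = func(**params)
        yield from results.get(items_key, [])  # empty pages may omit the key
        if last_key not in results:
            return
        params[start_key] = results[last_key]


class FakeS3Client:
    """Hypothetical stand-in for the boto3 S3 client (illustration only)."""

    def __init__(self, keys, page_size=2):
        self.keys = sorted(keys)
        self.page_size = page_size

    def list_objects_v2(self, Bucket, Prefix='', ContinuationToken=None):
        matching = [k for k in self.keys if k.startswith(Prefix)]
        start = int(ContinuationToken) if ContinuationToken else 0
        page = matching[start:start + self.page_size]
        response = {'KeyCount': len(page), 'IsTruncated': False}
        if page:
            # Like the real API, leave 'Contents' out when nothing matched
            response['Contents'] = [{'Key': k} for k in page]
        if start + self.page_size < len(matching):
            response['IsTruncated'] = True
            response['NextContinuationToken'] = str(start + self.page_size)
        return response


def list_keys(client, bucket_name, prefix=None):
    """Collect object keys, passing Prefix only when one is given."""
    params = {'Bucket': bucket_name}
    if prefix is not None:
        params['Prefix'] = prefix
    objects = boto_request(
        client.list_objects_v2,
        params,
        items_key='Contents',
        last_key='NextContinuationToken',
        start_key='ContinuationToken',
    )
    return [obj['Key'] for obj in objects]


client = FakeS3Client(['logs/a', 'logs/b', 'logs/c', 'tmp/x'])
log_keys = list_keys(client, 'example-bucket', prefix='logs/')
all_keys = list_keys(client, 'example-bucket')
no_keys = list_keys(client, 'example-bucket', prefix='missing/')
```

The same three keyword arguments are all that change between services, so a fake like this is a cheap way to confirm the names before pointing the code at a real bucket.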
