@dougireton
Last active December 14, 2016 20:00
AWS Advent 2016

Paginating AWS API Results using the Boto3 Python SDK

Boto3 is Amazon's officially supported AWS SDK for Python. It's the de facto way to interact with AWS via Python.

If you've used Boto3 to query AWS resources, you may have run into limits on how many resources a single API call returns: generally 50 or 100 results, although S3 returns up to 1000. The AWS APIs return "pages" of results. To retrieve more than one page, you need a paginator to issue multiple API requests on your behalf.

Introduction

Boto3 provides Paginators to automatically issue multiple API requests to retrieve all the results (e.g. on an API call to EC2.DescribeInstances). Paginators are straightforward to use, but not all Boto3 services provide paginator support. For those services you'll need to write your own paginator in Python.

In this post, I'll show you how to retrieve all query results for Boto3 services which provide Pagination support, and I'll show you how to write a custom paginator for services which don't provide built-in pagination support.

Built-In Paginators

Most services in the Boto3 SDK provide Paginators. See S3 Paginators for example.

Once you determine you need to paginate your results, you'll need to call the get_paginator() method.

How do I know I need a Paginator?

If you suspect you aren't getting all the results from your Boto3 API call, there are a couple of ways to check. You can look in the AWS console (e.g. number of Running Instances), or run a query via the aws command-line interface.

Here's an example of counting the objects in an S3 bucket via the AWS command line. A single Boto3 call returns at most the first 1000 S3 objects from the bucket, and since this bucket holds 1002 objects in total, you'll need to paginate.

Counting results using the AWS CLI

$ aws s3 ls my-example-bucket|wc -l

-> 1002

Here's a boto3 example which, by default, will return the first 1000 objects from a given S3 bucket.

Determining if the results are truncated

import boto3

# use default profile
s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='my-example-bucket')

print('list_objects_v2 returned {}/{} files.'.format(resp['KeyCount'], resp['MaxKeys']))
if resp['IsTruncated']:
    print('There are more files available. You will need to paginate the results.')

# Output:
# list_objects_v2 returned 1000/1000 files.
# There are more files available. You will need to paginate the results.

The S3 response dictionary provides some helpful keys, like IsTruncated, KeyCount, and MaxKeys, which tell you if the results were truncated. If resp['IsTruncated'] is True, you know you'll need to use a Paginator to return all the results.
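Not every service reports truncation through IsTruncated; many signal it only by including a token key such as NextToken in the response. As a rough sketch, a check that covers both styles might look like the following. The helper name has_more_results and the list of token keys are my own assumptions for illustration, so confirm the actual key names in the "Response Syntax" section of the method you are calling.

```python
# Hypothetical helper, not part of Boto3. The key names below are common
# in AWS responses, but they vary by service and operation.
TOKEN_KEYS = ('NextToken', 'NextMarker', 'NextContinuationToken')


def has_more_results(resp):
    """Return True if the response dict signals that more pages exist."""
    if resp.get('IsTruncated'):
        return True
    return any(resp.get(key) for key in TOKEN_KEYS)


# Example response dicts, trimmed to the relevant keys.
s3_page = {'KeyCount': 1000, 'MaxKeys': 1000, 'IsTruncated': True}
last_page = {'KeyCount': 2, 'MaxKeys': 1000, 'IsTruncated': False}
config_page = {'EvaluationResults': [], 'NextToken': 'abc123'}

print(has_more_results(s3_page))      # True
print(has_more_results(last_page))    # False
print(has_more_results(config_page))  # True
```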

Using Boto3's Built-In Paginators

The Boto3 documentation provides a good overview of how to use the built-in paginators, so I won't repeat it here.

If a given service has Paginators built-in, they are documented in the Paginators section of the service docs, e.g. AutoScaling, and EC2.
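For quick reference, here is a minimal sketch of the built-in paginator pattern applied to the S3 example from earlier. The function name list_all_keys is an assumption of mine; get_paginator() and paginate() are the Boto3 client and paginator methods. The helper takes the client as an argument, so you can try it with a stub before pointing it at a real bucket.

```python
def list_all_keys(s3_client, bucket):
    """Collect every object key in `bucket` using the built-in paginator."""
    paginator = s3_client.get_paginator('list_objects_v2')
    keys = []
    # Each page is a response dict shaped like a single list_objects_v2 call.
    for page in paginator.paginate(Bucket=bucket):
        # 'Contents' is absent when a page holds no objects.
        keys.extend(obj['Key'] for obj in page.get('Contents', []))
    return keys


def main():
    # Imported here so list_all_keys can be tried without AWS access.
    import boto3

    s3 = boto3.client('s3')  # use default profile
    keys = list_all_keys(s3, 'my-example-bucket')
    print('Found {} objects.'.format(len(keys)))
```

Calling main() runs the helper against the default profile; for the 1002-object bucket above, it should report all of the objects rather than only the first 1000.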

Determine if a method can be paginated

You can also verify if the boto3 service provides Paginators via the client.can_paginate() method.

import boto3

s3 = boto3.client('s3')
print(s3.can_paginate('list_objects_v2')) # => True

So, that's it for built-in paginators. In this section I showed you how to determine if your API results are being truncated, pointed you to Boto3's excellent documentation on Paginators, and showed you how to use the can_paginate() method to verify if a given service method supports pagination.

If the Boto3 service you are using provides paginators, you should use them. They are tested and well documented. In the next section, I'll show you how to write your own paginator.

How to Write Your Own Paginator

Some Boto3 services, such as AWS Config, don't provide paginators (as of this writing, in December 2016). For these services, you will have to write your own paginator code in Python to retrieve all the query results. In this section, I'll show you how to write your own paginator.

You Might Need To Write Your Own Paginator If...

Some Boto3 SDK services aren't as built-out as S3 or EC2. For example, the AWS Config service doesn't provide paginators. The first clue is that the Boto3 AWS ConfigService docs don't have a "Paginators" section.

The can_paginate Method

You can also ask the service client's can_paginate() method whether a given operation supports pagination. For example, here's how to do that for the AWS Config client. In the example below, we determine that the Config service doesn't support pagination for the get_compliance_details_by_config_rule method.

import boto3

config = boto3.client('config')
can_paginate = config.can_paginate('get_compliance_details_by_config_rule')

if not can_paginate:
    print('There is no built-in paginator for that method')

# Output:
# There is no built-in paginator for that method

Operation Not Pageable Error

If you try to paginate a method without a built-in paginator, you will get an error similar to this:

config.get_paginator('get_compliance_details_by_config_rule')

.../python2.7/site-packages/botocore/client.pyc in get_paginator(self, operation_name)
    591         if not self.can_paginate(operation_name):
--> 592             raise OperationNotPageableError(operation_name=operation_name)
    593         else:
    594             actual_operation_name = self._PY_TO_OP_NAME[operation_name]

"OperationNotPageableError: Operation cannot be paginated: get_compliance_details_by_config_rule"

If you get an error like this, it's time to roll up your sleeves and write your own paginator.
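If you'd rather not hit this error at runtime, one option is to guard the get_paginator() call with can_paginate(). The sketch below shows that guard; the helper name get_paginator_or_none is hypothetical, and returning None is just one possible way to signal "no built-in paginator".

```python
def get_paginator_or_none(client, operation_name):
    """Return the built-in paginator for an operation, or None if there isn't one."""
    if client.can_paginate(operation_name):
        return client.get_paginator(operation_name)
    return None


def main():
    # Imported here so the helper above can be tried without AWS access.
    import boto3

    config = boto3.client('config')
    paginator = get_paginator_or_none(config, 'get_compliance_details_by_config_rule')
    if paginator is None:
        print('No built-in paginator; fall back to a hand-written loop.')
```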

Writing a Paginator

Writing a paginator is fairly straightforward. When you call the AWS service API, it returns at most one page of results and, if there are more results, a long token string in the NextToken field of the response.

Approach

To create a paginator for this, you call the service API in a loop, passing the token from the previous response back to the API, until a response no longer contains a token. You collect the results from each loop iteration in a list. At the end of the loop, you will have all the results in the list.
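Stripped of any service-specific detail, the loop can be sketched as below. This is a minimal illustration and not a Boto3 API: collect_all and fake_api_call are names I made up, and the sketch assumes the operation uses NextToken for both the request parameter and the response key, which you should verify for your service.

```python
def collect_all(api_call, result_key, **kwargs):
    """Call `api_call` repeatedly, following NextToken, and return all items."""
    items = []
    next_token = None
    while True:
        if next_token:
            kwargs['NextToken'] = next_token
        response = api_call(**kwargs)
        items.extend(response.get(result_key, []))
        next_token = response.get('NextToken')
        if not next_token:  # no token means this was the last page
            return items


# A stand-in for a real client method: serves five items, `Limit` per page.
def fake_api_call(NextToken=None, Limit=2):
    data = list(range(5))
    start = int(NextToken) if NextToken else 0
    page = {'Items': data[start:start + Limit]}
    if start + Limit < len(data):
        page['NextToken'] = str(start + Limit)
    return page


print(collect_all(fake_api_call, 'Items', Limit=2))  # [0, 1, 2, 3, 4]
```

With a real client, the same helper would be given the bound client method and the response key that holds the results, for example config.get_compliance_details_by_config_rule and 'EvaluationResults'.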

In the example code below, I'm calling the AWS Config service to get a list of resources (e.g. EC2 instances), which are not compliant with the required-tags Config rule.

As you read the example code below, it might help to read the Boto3 SDK docs for the get_compliance_details_by_config_rule method, especially the "Response Syntax" section.

Example Paginator

import boto3


def get_resources_from(compliance_details):
    """Return the resources on this page and the token for the next page."""
    results = compliance_details['EvaluationResults']
    resources = [result['EvaluationResultIdentifier']['EvaluationResultQualifier'] for result in results]
    # 'NextToken' is missing on the last page; treat a missing or empty
    # token as None so the caller's loop terminates.
    next_token = compliance_details.get('NextToken') or None

    return resources, next_token


def main():
    config = boto3.client('config')

    next_token = ''  # variable to hold the pagination token
    resources = []   # list for the entire resource collection

    # Call the `get_compliance_details_by_config_rule` method in a loop
    # until we have all the results from the AWS Config service API.
    while next_token is not None:
        params = {
            'ConfigRuleName': 'required-tags',
            'ComplianceTypes': ['NON_COMPLIANT'],
            'Limit': 100,
        }
        # Send NextToken only once the API has given us one; the first
        # request has no token to pass.
        if next_token:
            params['NextToken'] = next_token

        compliance_details = config.get_compliance_details_by_config_rule(**params)

        current_batch, next_token = get_resources_from(compliance_details)
        resources += current_batch

    print(resources)


if __name__ == "__main__":
    main()

Example Paginator - main() Function

In the example above, the main() function creates the config client and initializes the next_token variable. The resources list will hold the final result set.

The while loop is the heart of the paginating code. In each loop iteration, we call the get_compliance_details_by_config_rule method, passing next_token as the NextToken parameter. Again, next_token is a long token string returned by the given AWS service API method. It's our "claim check" for the next set of results.

Next, we extract the current_batch of AWS resources and the next_token string from the compliance_details dictionary returned by our API call. The loop ends when the response no longer includes a token, that is, when next_token is None.

Example Paginator - get_resources_from() Helper Function

get_resources_from(compliance_details) is an extracted helper function for parsing the compliance_details dictionary. It returns our current batch of resources (up to 100 results) and our next_token "claim check" so we can get the next page of results from config.get_compliance_details_by_config_rule().

I hope the example is helpful in writing your own custom paginator.


In this section on writing your own paginators, I showed you a Boto3 documentation example of a service without built-in Paginator support. I discussed the can_paginate method and showed you the error you get if you call get_paginator on a method which doesn't support pagination. Finally, I discussed an approach for writing a custom paginator in Python and showed a concrete example of a custom paginator which passes the NextToken "claim check" string to fetch the next page of results.

Summary

In this post, I covered paginating AWS API responses with the Boto3 SDK. Like most APIs (Twitter, GitHub, Atlassian, etc.), AWS paginates API responses over a set limit, generally 50 or 100 resources. Knowing how to paginate results is crucial when dealing with large AWS accounts which may contain thousands of resources.

I hope this post has taught you a bit about paginators and how to get all your results from the AWS APIs.

About the Author

Doug is a Sr. DevOps engineer at 1Strategy, an AWS Consulting Partner specializing in Amazon Web Services (AWS). He has 23 years of experience in IT, working at Microsoft, Washington Mutual Bank, and Nordstrom in diverse roles including tester, Windows Server engineer, developer, and Chef engineer, helping app and platform teams manage thousands of servers via automation.
