
@erichiggins
Last active November 25, 2020 17:31
Efficiently page over a Query to fetch all entities from the Google App Engine Datastore.
#!/usr/bin/python
"""
Functions are provided for both the DB and NDB Datastore APIs.
References:
* https://cloud.google.com/appengine/docs/python/datastore/queries
* https://cloud.google.com/appengine/docs/python/ndb/queries
"""
def db_fetch_all(query, limit=100, cursor=None):
    """Fetch all function for the DB Datastore API."""
    results = []
    more = True
    if cursor:
        query = query.with_cursor(cursor)
    # Fetch entities in batches.
    while more:
        entities = query.fetch(limit)
        results.extend(entities)
        query = query.with_cursor(query.cursor())
        more = bool(entities)
    return results


def ndb_fetch_all(query, limit=100, cursor=None):
    """Fetch all function for the NDB Datastore API."""
    results = []
    more = True
    # Fetch entities in batches.
    while more:
        entities, cursor, more = query.fetch_page(limit, start_cursor=cursor)
        results.extend(entities)
    return results
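The `ndb_fetch_all` pagination loop can be exercised without App Engine by substituting a stand-in query object that mimics `fetch_page`'s `(results, cursor, more)` return value. This is only a sketch: `FakeQuery` is a hypothetical mock (a real NDB cursor is an opaque token, not an integer offset), but it demonstrates that the loop drains every page and terminates.

```python
# Minimal, self-contained sketch of the fetch_page loop above.
# FakeQuery is a hypothetical stand-in for ndb.Query; an integer offset
# stands in for the (normally opaque) Datastore cursor.

class FakeQuery(object):
    """Mimics ndb.Query.fetch_page(limit, start_cursor=...)."""

    def __init__(self, entities):
        self._entities = entities

    def fetch_page(self, limit, start_cursor=None):
        start = start_cursor or 0
        page = self._entities[start:start + limit]
        next_cursor = start + len(page)
        more = next_cursor < len(self._entities)
        return page, next_cursor, more


def ndb_fetch_all(query, limit=100, cursor=None):
    """Same loop as above: fetch pages until fetch_page reports no more."""
    results = []
    more = True
    while more:
        entities, cursor, more = query.fetch_page(limit, start_cursor=cursor)
        results.extend(entities)
    return results


# 250 entities fetched 100 at a time -> 3 pages, all entities collected.
print(len(ndb_fetch_all(FakeQuery(list(range(250))), limit=100)))
```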
@erichiggins (Author):
Based on some more thorough testing and research:

  • .fetch_page() is best-suited for user-facing pagination w/ cursors
  • .fetch() is best-suited when the number of results is known to be under 2000 or so
  • .iter() with a batch_size of 200 or so appears to be the fastest way to iterate over a query with an unknown number of results
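The `.iter()` approach noted above would look like `for entity in query.iter(batch_size=200): ...` against a real NDB query. As a self-contained sketch, a generator stands in for the query iterator here (`fake_iter` is a hypothetical mock, not an NDB API):

```python
# Hedged sketch of iterating a query in batches, as with ndb's
# query.iter(batch_size=200). fake_iter is a stand-in generator that
# yields entities batch by batch, like the real iterator does under the hood.

def fake_iter(entities, batch_size=200):
    """Yields all entities, fetching them internally in batch_size chunks."""
    for start in range(0, len(entities), batch_size):
        for entity in entities[start:start + batch_size]:
            yield entity


# 2100 entities (as in the benchmark below) streamed in batches of 200.
results = list(fake_iter(list(range(2100)), batch_size=200))
print(len(results))  # 2100
```

The win over `fetch_page` is that iteration overlaps fetching the next batch with processing the current one, rather than round-tripping a cursor per page.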

Sample performance data from a basic query with no filters and 2100 entities:

  • fetch_page
    • 2.044650s
    • 3.684160s
    • 4.055870s
    • 4.400940s
    • 4.795300s
    • 4.839350s
    • 11.897800s
    • 12.700310s
    • 3.950250s
    • 3.813200s
    • 4.106560s
    • 3.774050s
    • 4.628290s
  • [x for x in query]
    • 1.222110s
    • 6.526030s
    • 10.406020s
    • 9.368520s
    • 2.529480s
    • 2.404470s
    • 3.003570s
    • 3.604640s
    • 3.675460s
    • 3.448450s
    • 3.297120s
    • 3.561790s
    • 3.327240s
    • 3.144370s
  • ndb.tasklet and query.iter()
    • 2.166430s
    • 2.114100s
  • list(query)
    • 1.627440s
    • 2.659040s
  • list(query) with batch_size=100
    • 1.344340s
    • 2.518590s
    • 3.010650s
  • fetch()
    • 0.663910s
    • 1.000210s
    • 1.211190s

Relevant discussions:
