@psyonara
Last active September 2, 2020 12:43
Speed up bulk-exists check with python sets
# Code for blog article:
# https://www.helmut.dev/speed-up-bulk-exists-check-with-python-sets.html
# When you need to do an "exists" check for a high volume of
# items (think hundreds of thousands or more), doing a query
# for each will take ages and put a strain on your database.
# Rather, you could extract all the relevant field values
# into a list and then check whether it contains each record's
# reference value. However, using a list for such a check
# is also expensive in terms of computing time. But using a set
# instead improves performance dramatically.
# Standard "exists" check
for item in external_records:
    if not Data.objects.filter(external_id=item.id).exists():
        pass  # Do something
# Bulk "exists" check for very high volumes
existing_ids = set(Data.objects.all().values_list("external_id", flat=True))
for item in external_records:
    if item.id not in existing_ids:
        pass  # Do something
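
To illustrate why the set matters, here is a minimal, framework-free sketch. The function names and the sample data are hypothetical and not part of the original gist; plain integers stand in for the `external_id` values. Both functions return the same result, but the list version scans the whole list on every lookup, while the set version does a hash lookup that takes roughly constant time on average.

```python
def find_missing_with_list(incoming_ids, existing_ids):
    """Return incoming IDs not present in existing_ids, using a list lookup."""
    existing = list(existing_ids)
    # Each `in` test walks the list, so the cost grows with len(existing).
    return [i for i in incoming_ids if i not in existing]


def find_missing_with_set(incoming_ids, existing_ids):
    """Return incoming IDs not present in existing_ids, using a set lookup."""
    existing = set(existing_ids)
    # Each `in` test is a hash lookup, independent of len(existing) on average.
    return [i for i in incoming_ids if i not in existing]


# Hypothetical sample data: the "database" holds the even numbers below 2000.
existing_ids = range(0, 2000, 2)
incoming_ids = range(0, 20)

missing_via_list = find_missing_with_list(incoming_ids, existing_ids)
missing_via_set = find_missing_with_set(incoming_ids, existing_ids)
```

With hundreds of thousands of IDs on both sides, the difference between the two lookups is what dominates the running time.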
@adamchainz

At a certain scale you can't pull back all the external IDs in your DB. Better to ask which IDs from your incoming set already exist:

existing_ids = set(Data.objects.filter(
    external_id__in={r.id for r in external_records}
).values_list("external_id", flat=True))
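
A rough sketch of how the filtered result might then be used, written without Django so it runs on its own. `missing_ids` and `fake_fetch` are hypothetical names; `fake_fetch` merely stands in for the `external_id__in` query above and is not a real ORM call.

```python
def missing_ids(incoming_ids, fetch_existing):
    """Return the incoming IDs that the data store does not contain yet.

    `fetch_existing` is a hypothetical callable that takes a set of candidate
    IDs and returns those that already exist (the role of the filtered query).
    """
    candidates = set(incoming_ids)
    existing = set(fetch_existing(candidates))
    # Set difference leaves only the IDs with no matching stored record.
    return candidates - existing


# Stand-in for the database: a small set of stored external IDs (made up).
stored = {1, 2, 3, 5, 8}


def fake_fetch(ids):
    # Mimics "which of these IDs exist?" without touching a real database.
    return [i for i in stored if i in ids]


result = missing_ids([2, 4, 5, 6], fake_fetch)
```

Only the candidate IDs travel to the data store and only the matches come back, so memory use scales with the incoming batch rather than with the whole table.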

@psyonara
Author

psyonara commented Sep 2, 2020

@adamchainz Thanks for the pointer! Hadn't considered that possibility initially. :)
