- Incorrect usage of the Django ORM is a common cause of performance issues, so it is important to know when querysets are evaluated (i.e., hit the database).
  - Common mistakes that cause the queryset to be evaluated include iterating over the queryset or calling any of `len()`, `list()`, or `bool()` with the queryset as an argument.
  - Reference: QuerySet API reference (Django documentation)
- Use `exists()` to check if a queryset has any results in it. `len()` evaluates the whole queryset and loads every row just to count them.
  - Good:

        queryset = Account.objects.filter(some_attribute=True)
        if queryset.exists():
            # do something
            ...
  - Bad:

        queryset = Account.objects.filter(some_attribute=True)
        if len(queryset):
            # do something
            ...
- Consider breaking up complex queries by assigning “intermediate” querysets to variables. This can be very helpful in debugging and does not impact performance, since the intermediate steps do not cause the queryset to be evaluated (see above).

      queryset = Account.objects.filter(some_attribute=True)
      queryset = queryset.exclude(in_production=False)
      queryset = queryset.exclude(theme__old_context=False)
- Avoid looping over querysets. This causes Django to create an object for each item and load it into memory. Reference: "Why is iterating through a large Django QuerySet consuming massive amounts of memory?" (Stack Overflow)
- If you cannot avoid looping, consider using `iterator()`: "Evaluates the queryset (by performing the query) and returns an iterator (see PEP 234) over the results. A queryset typically caches its results internally so that repeated evaluations do not result in additional queries. In contrast, `iterator()` will read results directly, without doing any caching at the queryset level (internally, the default iterator calls `iterator()` and caches the return value). For a queryset which returns a large number of objects that you only need to access once, this can result in better performance…"
- Use `update()` to bulk update a queryset, rather than looping, updating each item, and saving individually.
  - Be aware that `update()` does not call `save()` or trigger `pre_save` and `post_save` signals. This can be useful when you need to make updates without running that logic (e.g., in a data migration), but if used in other contexts it may bypass important business logic.
  - Reference: Making queries (Django documentation)
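To show why a bulk update is cheaper than a loop, the sketch below counts SQL statements using the standard-library `sqlite3` module instead of Django. The `account` table and its columns are assumptions made up for this example. In Django, the bulk form would correspond to something like `Account.objects.filter(...).update(in_production=True)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, in_production INTEGER)")
conn.executemany("INSERT INTO account (in_production) VALUES (?)", [(0,)] * 100)
conn.commit()

# Record every SQL statement the connection executes from here on.
statements = []
conn.set_trace_callback(statements.append)

# Looping approach: fetch the rows, then issue one UPDATE per row.
ids = [row[0] for row in conn.execute("SELECT id FROM account")]
for account_id in ids:
    conn.execute("UPDATE account SET in_production = 1 WHERE id = ?", (account_id,))
conn.commit()
loop_updates = sum(s.startswith("UPDATE") for s in statements)

statements.clear()

# Bulk approach: a single UPDATE statement covers every matching row.
conn.execute("UPDATE account SET in_production = 0")
conn.commit()
bulk_updates = sum(s.startswith("UPDATE") for s in statements)

remaining = conn.execute(
    "SELECT COUNT(*) FROM account WHERE in_production = 1"
).fetchone()[0]
```

Here the loop issues one `UPDATE` per row while the bulk form issues exactly one, which is the saving `update()` provides. The cost is the one noted above: no per-object `save()` logic or signals run.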
- Leverage `select_related` to follow foreign key relationships and select related objects in the same query, reducing the number of queries performed. This works with ForeignKey and OneToOne relationships, but not ManyToMany.
  - A potential example of `select_related` usage is a situation where data cannot be retrieved via the ORM and iteration is necessary (e.g., accessing a model property instead of a model database field).

        display_names = []
        # display_name is a calculated property rather than a database field,
        # so it can't be retrieved using the ORM.
        for menu in Menu.objects.select_related('location'):
            display_names.append(menu.location.display_name)
  - The above example results in a single query. However, if you were to remove `select_related` and use `all()`, an additional query would run for each menu the first time `menu.location` is accessed.
  - Be aware that calling `select_related` without any arguments will follow every non-null foreign key it can find, which may decrease performance.
  - Reference: https://docs.djangoproject.com/en/2.2/ref/models/querysets/#select-related
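The difference in query counts can be sketched with the standard-library `sqlite3` module. This illustrates the underlying SQL pattern, not what Django literally emits. The `menu` and `location` tables mirror the example above, but their columns and rows are assumptions, and `display_name` is stored as a plain column here purely to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (id INTEGER PRIMARY KEY, display_name TEXT);
    CREATE TABLE menu (id INTEGER PRIMARY KEY, location_id INTEGER REFERENCES location(id));
    INSERT INTO location (id, display_name) VALUES (1, 'Downtown'), (2, 'Airport');
    INSERT INTO menu (id, location_id) VALUES (1, 1), (2, 2), (3, 1);
""")

# Record only the SELECT statements executed from here on.
selects = []
conn.set_trace_callback(
    lambda sql: selects.append(sql) if sql.lstrip().startswith("SELECT") else None
)

# Without a join: one query for the menus, then one more per menu (1 + N).
lazy_names = []
for menu_id, location_id in conn.execute(
    "SELECT id, location_id FROM menu ORDER BY id"
).fetchall():
    row = conn.execute(
        "SELECT display_name FROM location WHERE id = ?", (location_id,)
    ).fetchone()
    lazy_names.append(row[0])
lazy_query_count = len(selects)

selects.clear()

# With a join (the select_related-style pattern): everything in one query.
joined_names = [
    name
    for (name,) in conn.execute(
        "SELECT location.display_name FROM menu "
        "JOIN location ON location.id = menu.location_id ORDER BY menu.id"
    )
]
joined_query_count = len(selects)
```

Both approaches produce the same names, but the first runs four queries for three menus while the joined version runs one. The gap grows linearly with the number of rows.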
- Similar to `select_related`, `prefetch_related` can be used to reduce the number of queries, but it also works with ManyToMany relationships. One key difference to note, however, is that the join is done in Python rather than in SQL.
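A "join done in Python" can be sketched as two queries whose results are stitched together in application code. The example below uses the standard-library `sqlite3` module and only approximates the pattern. The `tag` and `menu_tags` tables are hypothetical, introduced here to stand in for a many-to-many relationship.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE menu (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE menu_tags (menu_id INTEGER, tag_id INTEGER);
    INSERT INTO menu VALUES (1, 'Lunch'), (2, 'Dinner');
    INSERT INTO tag VALUES (1, 'vegan'), (2, 'seasonal');
    INSERT INTO menu_tags VALUES (1, 1), (1, 2), (2, 2);
""")

# Query 1: fetch the primary objects.
menus = conn.execute("SELECT id, name FROM menu ORDER BY id").fetchall()
menu_ids = [menu_id for menu_id, _ in menus]

# Query 2: fetch every related row for those objects in one go.
placeholders = ", ".join("?" for _ in menu_ids)
rows = conn.execute(
    "SELECT menu_tags.menu_id, tag.label FROM tag "
    "JOIN menu_tags ON menu_tags.tag_id = tag.id "
    f"WHERE menu_tags.menu_id IN ({placeholders}) ORDER BY tag.id",
    menu_ids,
).fetchall()

# The "join" between menus and tags happens here, in Python.
tags_by_menu = defaultdict(list)
for menu_id, label in rows:
    tags_by_menu[menu_id].append(label)

menu_tags = {name: tags_by_menu[menu_id] for menu_id, name in menus}
```

The total stays at two queries regardless of how many menus there are, at the cost of holding both result sets in memory while they are matched up.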
- Only call `.format()` to perform interpolation on unicode strings to avoid `UnicodeEncodeError`.
- Pass object IDs into Celery tasks and not objects. Use the ID to get a fresh object out of the database. If an object is passed into a Celery task, the object may have changed by the time the task runs.
  - Good: `send_email.delay(order.id)`
  - Bad: `send_email.delay(order)`