EricFries/python_pr.md

## python_pr.md

      
            
          
    Django Performance Guidelines

ORM


Incorrect use of the Django ORM is a common cause of performance issues, so it is important to know when querysets are evaluated (i.e., when they hit the database).

Common mistakes that evaluate a queryset unintentionally include iterating over it or passing it to len(), list(), or bool().
Reference: (QuerySet API reference | Django documentation | Django)


Use exists() to check if a queryset has any results in it.

Good:

 queryset = Account.objects.filter(some_attribute=True)
 if queryset.exists():
     # do something


Bad:

 queryset = Account.objects.filter(some_attribute=True)
 if len(queryset):
     # do something


Reference: QuerySet API reference | Django documentation | Django
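To make the cost difference concrete, here is a minimal sketch of the kind of SQL each check roughly corresponds to. It does not use Django: the standard-library sqlite3 module and a hypothetical account table stand in for the ORM, under the assumption that exists() behaves like a LIMIT 1 probe while len() has to fetch every matching row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, some_attribute INTEGER)")
conn.executemany(
    "INSERT INTO account (some_attribute) VALUES (?)",
    [(1,) for _ in range(1000)],
)

# Roughly what len(queryset) costs: every matching row is fetched into Python.
all_rows = conn.execute(
    "SELECT id, some_attribute FROM account WHERE some_attribute = 1"
).fetchall()
has_rows_slow = len(all_rows) > 0

# Roughly what queryset.exists() costs: the database can stop at the first match.
probe = conn.execute(
    "SELECT 1 FROM account WHERE some_attribute = 1 LIMIT 1"
).fetchall()
has_rows_fast = len(probe) > 0
```

Both checks give the same answer, but the first transfers 1000 rows and the second transfers one.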


Consider breaking up complex queries by assigning “intermediate” querysets to variables.  This can be very helpful in debugging, and does not impact performance since the intermediate steps do not cause the queryset to be evaluated (see above).
queryset = Account.objects.filter(some_attribute=True)
queryset = queryset.exclude(in_production=False)
queryset = queryset.exclude(theme__old_context=False)
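The laziness that makes this safe can be illustrated with a toy model. The LazyQuery class below is a hypothetical stand-in written for this sketch, not Django's implementation: it only records filters when chained and counts how often it is actually evaluated.

```python
class LazyQuery:
    """Toy stand-in for a queryset: records filters, runs only when iterated."""

    executions = 0  # how many times the pretend database was hit

    def __init__(self, rows, predicates=()):
        self._rows = rows
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # Returns a new lazy object; nothing is evaluated here.
        return LazyQuery(self._rows, self._predicates + (predicate,))

    def exclude(self, predicate):
        return self.filter(lambda row: not predicate(row))

    def __iter__(self):
        # Evaluation happens only at this point.
        LazyQuery.executions += 1
        return iter([r for r in self._rows if all(p(r) for p in self._predicates)])


accounts = [
    {"id": 1, "some_attribute": True, "in_production": True},
    {"id": 2, "some_attribute": True, "in_production": False},
    {"id": 3, "some_attribute": False, "in_production": True},
]

query = LazyQuery(accounts)
query = query.filter(lambda a: a["some_attribute"])
query = query.exclude(lambda a: not a["in_production"])
executions_before = LazyQuery.executions  # chaining alone evaluated nothing

result_ids = [a["id"] for a in query]  # iteration triggers the single evaluation
executions_after = LazyQuery.executions
```

Each intermediate variable can be inspected in a debugger without changing how many times the final query runs.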


Avoid looping over querysets.  This causes Django to create an object for each item and load it into memory. (sql - Why is iterating through a large Django QuerySet consuming massive amounts of memory? - Stack Overflow)


If you cannot avoid looping, consider using iterator()

Evaluates the queryset (by performing the query) and returns an iterator (see PEP 234) over the results. A queryset typically caches its results internally so that repeated evaluations do not result in additional queries. In contrast, iterator() will read results directly, without doing any caching at the queryset level (internally, the default iterator calls iterator() and caches the return value). For a queryset which returns a large number of objects that you only need to access once, this can result in better performance…


Reference: (QuerySet API reference | Django documentation | Django)
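The caching versus streaming distinction can be sketched without Django. The example below is an analogy only: it uses sqlite3 with a hypothetical event table, where fetchall() plays the role of the queryset cache and iterating the cursor plays the role of iterator().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO event (amount) VALUES (?)", [(n,) for n in range(10000)])

# Cached style: the whole result set is held in a list at once.
cached = conn.execute("SELECT amount FROM event").fetchall()
total_cached = sum(amount for (amount,) in cached)

# Streaming style: rows are consumed one at a time and never stored together.
total_streamed = 0
for (amount,) in conn.execute("SELECT amount FROM event"):
    total_streamed += amount
```

The totals are identical; the difference is that the streaming loop never holds all 10000 rows in memory, which only pays off when the results are needed once.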


Use update() to bulk update a queryset, rather than looping, updating each item and saving individually.

Be aware that update() does not call save() or trigger pre_save and post_save signals.  This can be useful when you need to make updates without running that logic (e.g., in a data migration), but in other contexts it may bypass important business logic.
Reference: Making queries | Django documentation | Django
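The saving from a bulk update shows up in the number of statements sent to the database. The sketch below is not Django code: it uses sqlite3, a hypothetical account table, and a small run() helper (an assumption of this example) that records each statement it executes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, in_production INTEGER)")
conn.executemany("INSERT INTO account (in_production) VALUES (?)", [(0,)] * 500)

statements = []

def run(sql, params=()):
    """Execute a statement, record it, and return all rows."""
    statements.append(sql)
    return conn.execute(sql, params).fetchall()

# Looping style: one SELECT plus one UPDATE per row.
for (account_id,) in run("SELECT id FROM account"):
    run("UPDATE account SET in_production = 1 WHERE id = ?", (account_id,))
loop_statement_count = len(statements)

statements.clear()

# Bulk style: a single UPDATE touches every matching row.
run("UPDATE account SET in_production = 0 WHERE in_production = 1")
bulk_statement_count = len(statements)

remaining = run("SELECT COUNT(*) FROM account WHERE in_production = 1")[0][0]
```

With 500 rows, the loop issues 501 statements where the bulk form issues one.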


Leverage select_related to follow foreign key relationships in order to select related objects, and reduce the number of queries performed.  This will work with ForeignKey and OneToOne relationships, but not ManyToMany.


A potential example of select_related usage is a situation where data cannot be retrieved via the ORM and iteration is necessary (e.g., accessing a model property instead of a model database field).
display_names = []
# display_name is a calculated property rather than a database field, so it can't be retrieved directly through the ORM.
for menu in Menu.objects.select_related('location'):
    display_names.append(menu.location.display_name)


The above example results in a single query.  However, if you removed select_related and used all() instead, an additional query would run for each menu the first time menu.location is accessed.


Be aware that calling select_related without any arguments follows every non-null foreign key it can find.  This usually makes the query more complex and returns more data than needed, and may decrease performance.


Reference: https://docs.djangoproject.com/en/2.2/ref/models/querysets/#select-related
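The query counts described for select_related can be reproduced at the SQL level. This sketch does not use Django: sqlite3, hypothetical menu and location tables, and a recording run() helper (all assumptions of this example) show the per-row lookups next to the single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE menu (id INTEGER PRIMARY KEY, location_id INTEGER REFERENCES location(id));
    INSERT INTO location (id, name) VALUES (1, 'Downtown'), (2, 'Uptown');
    INSERT INTO menu (id, location_id) VALUES (1, 1), (2, 2), (3, 1);
""")

queries = []

def run(sql, params=()):
    """Execute a query, record it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()

# Without select_related: one query for the menus, then one more per menu.
names_n_plus_one = []
for _menu_id, location_id in run("SELECT id, location_id FROM menu ORDER BY id"):
    (name,) = run("SELECT name FROM location WHERE id = ?", (location_id,))[0]
    names_n_plus_one.append(name)
n_plus_one_count = len(queries)

queries.clear()

# With select_related: a single JOIN returns menus and locations together.
names_joined = [
    name
    for _menu_id, name in run(
        "SELECT menu.id, location.name FROM menu "
        "JOIN location ON location.id = menu.location_id ORDER BY menu.id"
    )
]
joined_count = len(queries)
```

Three menus cost four queries in the first form and one in the second; the gap grows linearly with the number of rows.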


Similar to select_related, prefetch_related can be used to reduce the number of queries, and it also works with ManyToMany relationships.  One key difference to note, however, is that it runs a separate query for each relationship and does the JOIN in Python rather than in SQL.

Reference: https://docs.djangoproject.com/en/1.11/ref/models/querysets/#prefetch-related
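What "joining in Python" means can be sketched as two queries followed by a dictionary lookup. This is an approximation of the idea, not Django's implementation: it uses sqlite3 with hypothetical menu, item, and menu_items tables, plus a recording run() helper that is an assumption of this example.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE menu (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE menu_items (menu_id INTEGER, item_id INTEGER);
    INSERT INTO menu VALUES (1, 'Lunch'), (2, 'Dinner');
    INSERT INTO item VALUES (1, 'Soup'), (2, 'Salad'), (3, 'Steak');
    INSERT INTO menu_items VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

queries = []

def run(sql, params=()):
    """Execute a query, record it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()

# Query 1: the primary objects.
menus = run("SELECT id, title FROM menu ORDER BY id")

# Query 2: every related item for those menus in one round trip.
menu_ids = [menu_id for menu_id, _title in menus]
placeholders = ", ".join("?" for _ in menu_ids)
links = run(
    "SELECT menu_items.menu_id, item.name FROM menu_items "
    "JOIN item ON item.id = menu_items.item_id "
    f"WHERE menu_items.menu_id IN ({placeholders}) ORDER BY item.id",
    menu_ids,
)

# The "join" between menus and their items happens here, in Python.
items_by_menu = defaultdict(list)
for menu_id, item_name in links:
    items_by_menu[menu_id].append(item_name)

menu_items = {title: items_by_menu[menu_id] for menu_id, title in menus}
query_count = len(queries)
```

The number of queries stays at two regardless of how many menus are loaded, at the cost of building the grouping in application memory.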


Common Bugs


In Python 2, only call .format() on unicode strings (e.g., u'...') when performing interpolation.  Calling it on a byte string with non-ASCII unicode arguments raises UnicodeEncodeError.

https://stackoverflow.com/questions/3235386/python-using-format-on-a-unicode-escaped-string


Pass object IDs into Celery tasks, not objects, and use the ID to get a fresh object out of the database inside the task.  If an object is passed into a Celery task, the underlying database row may have changed by the time the task runs, leaving the task with stale data.

Good:
send_email.delay(order.id)
Bad:
send_email.delay(order)
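The staleness problem can be simulated without Celery. In this sketch a dict stands in for the database and a list for the task queue; both task functions are hypothetical and exist only to show what each one sees when it finally runs.

```python
# A dict stands in for the database and a list for the task queue.
database = {1: {"id": 1, "email": "old@example.com"}}
task_queue = []

def send_email_with_object(order):
    # Uses whatever snapshot was handed over at enqueue time.
    return order["email"]

def send_email_with_id(order_id):
    # Fetches a fresh copy when the task actually runs.
    order = database[order_id]
    return order["email"]

# Enqueue both variants; nothing runs yet, as with .delay().
stale_snapshot = dict(database[1])  # mimics serializing the object into the message
task_queue.append((send_email_with_object, stale_snapshot))
task_queue.append((send_email_with_id, 1))

# The order changes before a worker picks up the tasks.
database[1]["email"] = "new@example.com"

results = [task(arg) for task, arg in task_queue]
```

The object-based task sends to the old address, while the ID-based task picks up the change.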