@gthomas
Created July 20, 2012 17:47
Django Doesn't Scale

Django doesn't scale and what you can do about it

Django doesn't scale. Frameworks don't scale -- they're general purpose.

Development speed

Frameworks remove the initial learning curve -- you can get an application started in a day, but you don't know what's going on underneath. Later, productivity falls while you learn the internals, then ramps back up.

Most apps stop before this point, so it doesn't necessarily matter. Performance works kind of the same way.

Performance

Start from scratch and you'll probably start with poor performance. The Django devs have thought about performance, so Django probably performs better than your scratch solution.

However, under huge (success!) load, Django's performance assumptions may not hold up.

What can you do about this? Five big problems, solutions for those problems. Things that aren't in Django, but maybe should be.

Metrics

Measure twice, cut once.

"Our site is slow" -- if you've never measured you don't know why.

Django doesn't help with data collection (aside from the Python logging module).

Sentry: capture and diagnose production errors (gets debugesque tracebacks outside of debug mode)

Also, you can put data into graphite with these things... (Graphite does real time graphing. Track response times, errors etc etc etc)

python-statsd: low level. Handles putting data into Graphite for you.

mmstats: higher level, from Urban Airship. Stores stats in a memory-mapped file, which is very efficient. A separate process comes by to slurp the data and feed it to Graphite or another backend. Behaves well at very large scale.

Metrology: Easiest API for collecting stats and pushing to Graphite. Probably the best for beginners.

Put your statistics gathering into a middleware. Keep it light. 1-5% overhead. Small performance hit.
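
A minimal sketch of the "stats in a middleware" idea. It uses a plain WSGI wrapper rather than Django's middleware API so it runs without Django installed; the `emit` callback and the `response_time` metric name are assumptions standing in for whichever stats client you pick.

```python
import time


class TimingMiddleware:
    """WSGI middleware that reports how long each request took."""

    def __init__(self, app, emit):
        self.app = app
        self.emit = emit  # hypothetical callback: emit(metric_name, milliseconds)

    def __call__(self, environ, start_response):
        started = time.perf_counter()
        try:
            # Measures the time until the app returns its response iterable.
            return self.app(environ, start_response)
        finally:
            elapsed_ms = (time.perf_counter() - started) * 1000.0
            # One cheap call per request keeps the overhead small.
            self.emit("response_time", elapsed_ms)


def demo_app(environ, start_response):
    """Stand-in application so the sketch runs on its own."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


recorded = []
timed_app = TimingMiddleware(demo_app, lambda name, ms: recorded.append((name, ms)))
body = timed_app({"PATH_INFO": "/"}, lambda status, headers: None)
```

In a real deployment `emit` would hand the number to a stats client instead of appending to a list.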

Graphite: graphite.readthedocs.org. Graphite is a Django app. It's a bit of a pain to set up, but you already know how to deploy one.

Generates ad-hoc real-time graphs. For example, create a graph of failed vs. successful logins. (Seems like a good way to get around the 'Python/Django has no logging' -- just add it)
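
For the failed vs. successful logins graph, something has to count the events. A rough sketch, assuming a statsd daemon in front of Graphite: it formats counter increments in the statsd line protocol (`name:value|c`) and sends them over UDP. The metric names and the `127.0.0.1:8125` address are assumptions, not something from the talk.

```python
import socket


def statsd_counter(name, value=1):
    """Format one counter increment in the statsd line protocol."""
    return "{}:{}|c".format(name, value).encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port are assumed local statsd defaults."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()


def record_login(success, send=send_metric):
    """Count a login attempt under one of two hypothetical metric names."""
    name = "auth.login.success" if success else "auth.login.failure"
    payload = statsd_counter(name)
    send(payload)
    return payload
```

Call `record_login(True)` or `record_login(False)` from the login view, then graph the two series on top of each other.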

You can overlay graphs to compare things.

Also there's New Relic... fantastic, no work, costs mad $$, can ignore all the tools. You'll need to add tooling for custom metrics to track more interesting things.

Caching

The difference between a site that feels fast and one that doesn't. Watch "Cache rules everything around me".

"But my app isn't cacheable!" -- probably not exactly right.

Handle it with resource decomposition. Decompose views into trivially cacheable resources.

Techniques...

Edge-side includes

Let your front-end cache (Varnish or similar) resolve the ESI tags. django-esi makes it easy to integrate and resolves the tags internally in development.

<esi:include src="/user/username" />

Now varnish knows how to cache it, front end knows where to get it. Slightly more complicated stack.
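
To make the mechanics concrete, here is a sketch of what resolving ESI tags internally could look like. This is not django-esi's implementation; `resolve_esi`, `fetch_fragment` and the `FRAGMENTS` table are hypothetical names for illustration.

```python
import re

# Matches tags such as <esi:include src="/user/username" />
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def resolve_esi(html, fetch_fragment):
    """Replace each ESI include tag with the fragment served at its src URL."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), html)


# Hypothetical fragments standing in for separately cacheable views.
FRAGMENTS = {"/user/username": "<span>alice</span>"}

page = '<header>Welcome, <esi:include src="/user/username" /></header>'
resolved = resolve_esi(page, FRAGMENTS.__getitem__)
```

The outer page and each fragment can then be cached with different lifetimes, which is the point of the decomposition.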

Two-phase template rendering

Render twice. First pass, no request info. Second pass, add request dependent info.

Use django-phased

{% load phased_tags %} {% phased with user %} Dynamic shit {% endphased %}

Read the documentation carefully.

Client side composition

Simplest. Use JavaScript to fetch stuff from the server and put it in the DOM.

Some say client side evaluation is an integral part of REST. Basically like an ESI from the client.

Cache invalidation

Rough! @cache_page decorator is easy

Delete cache on update? What about your keyspace? Django cache keys are an MD5 sum of a lot of stuff, so invalidating the Django cache is a big pain in the ass.

Technique: serve everything from the cache.

View hits cache, displays data. Checks the data for validity on the way out. Serve it to the user and rewrite that cache key. You can call the same regeneration operation from models etc.

Queueing background tasks

Celery! Rule of thumb, writes are more expensive than reads. Save your updates for later, off the request/response cycle.

You need to pay a little bit in efficiency to make the app more responsive for users.
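
The shape of the idea, sketched with the standard library's `queue` and `threading` as a stand-in for Celery (which needs a broker and a worker process to run). `record_page_view` and `handle_request` are hypothetical names.

```python
import queue
import threading

tasks = queue.Queue()
completed = []


def worker():
    """Run queued jobs outside the request/response cycle."""
    while True:
        job = tasks.get()
        if job is None:  # sentinel: stop the worker
            tasks.task_done()
            break
        func, args = job
        completed.append(func(*args))
        tasks.task_done()


def record_page_view(page_id):
    """Hypothetical slow write that the user should not have to wait for."""
    return "recorded view of page {}".format(page_id)


def handle_request(page_id):
    """Respond immediately and defer the write to the worker."""
    tasks.put((record_page_view, (page_id,)))
    return "200 OK"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
response = handle_request(42)
tasks.join()     # demo only: wait so the result can be inspected
tasks.put(None)  # stop the worker
thread.join()
```

The total work goes up slightly (queueing plus the write), but the user gets a response before the write happens.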

Query count

Keep it low! Every query makes your view slower, because the view must wait for each one to finish. Use select_related, prefetch_related, and raw queries.
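
To see why the count matters, here is a sketch that counts statements with `sqlite3` instead of the Django ORM (so it runs without Django). The first function is the classic one-query-per-row pattern; the second fetches the same data with a single JOIN, which is roughly the kind of query select_related produces. The tables and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 1);
""")

statements = []
conn.set_trace_callback(statements.append)  # log every SQL statement executed


def authors_one_query_per_book():
    """One query for the books, then one more per book for its author."""
    rows = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    return [
        (title, conn.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()[0])
        for title, author_id in rows
    ]


def authors_joined():
    """The same data fetched with a single JOIN."""
    return conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()


slow = authors_one_query_per_book()
slow_count = len(statements)
statements.clear()
fast = authors_joined()
fast_count = len(statements)
```

Comparing `slow_count` with `fast_count` shows the gap, and it grows with every extra row in the first version.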

Don't be afraid of the database. "ORM is the Vietnam of Computer Science." The ORM hides complexity, sometimes too much. You need to know what's happening underneath to know when that is the case.

Inefficiencies

Queryset cloning: each call in Model.objects.filter().filter().order_by() clones the queryset, so the chain is slower than just writing it by hand. (Q objects will help.)

Model instantiation: 40k inits per second. See bitly/Muepgo. Use values, values_list or cursors if you need a lot of data.

Saving models: the whole row is written, which is slow! Use update().

Bulk inserts: a for loop is slow, a transaction is better, copy_from is even better.
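
The first two bulk-insert options, sketched with `sqlite3` so it runs anywhere. The table and row data are made up. copy_from is not shown because it is a psycopg2 cursor method and needs a PostgreSQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT)")
rows = [("event-{}".format(i),) for i in range(1000)]


def insert_one_by_one(rows):
    """Slow pattern: one statement and one commit per row."""
    for row in rows:
        conn.execute("INSERT INTO event (name) VALUES (?)", row)
        conn.commit()


def insert_in_one_transaction(rows):
    """Better: every row shares a single transaction and a single commit."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO event (name) VALUES (?)", rows)


insert_one_by_one(rows)
insert_in_one_transaction(rows)
total = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
```

On an in-memory database the difference is small; against a database that syncs to disk on every commit, the per-row commit is where the time goes.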

Django is database agnostic, but you don't have to be. Learn the database you're using. Modern databases are incredible.

Measure everything. Cache well. Use queues. Count queries. Use your database.
