Going Beyond Request-Response - Django Channels
-----------------------------------------------
For a long time, Django has been directly tied to the request-response cycle
of HTTP. This has worked, and continues to work, really well; it's a nice,
simple abstraction that fits concepts like middleware.
However, we've also seen the rise of things like long-polling and websockets,
and now HTTP/2 has push built in, and Django is not designed for this future,
where the server must actively hold open client connections and send
information as it happens.
I'm proposing a moderately large change to the way Django works that maintains
a lot of backwards compatibility, but allows new ways of writing Django code
that not only allow neat handling of pushes and socket-type interactions,
but also provide a framework that would let larger sites distribute
server workloads better.
Goals
-----
Backwards Compatibility
    While there will always be a few edge cases, the intention is that all
    existing code will keep working, with no major changes to the current
    request-response level APIs.
Request-Response maintained
    If you don't want this new stuff, Django will still work as it does today,
    in normal request-response mode tied to WSGI (though this work may slow
    that down a bit if it still routes through the new layer in-memory). You'll
    still be able to write a hello world view that takes a request and returns
    a response.
Native WebSocket and long-polling support
    These are the two most popular methods right now for push-style
    interactions with browsers, and Django should ship native or semi-native
    support for them, with a relatively painless setup experience.
Does not require greenlets, asyncio, stackless...
    The solution I propose does not use async techniques inside the Python
    process itself; instead, it's designed to work across multiple processes
    and across a network, relying on the "channel layer" to do most of the
    async-related work and a worker model to get the maximum out of individual
    processes.
    This also means that we don't need to re-educate users about async
    "gotchas", such as missing yields locking your entire process, or not
    knowing which functions will context-switch and which ones won't.
    The proposal actually does use some in-process async for the "interface
    servers", but this is not a place end-developers will write code, and it
    can be swapped out for implementations with other async solutions or even
    other languages.
Still runs in a single process
    The solution is designed to still run in a single "runserver" process
    for development while behaving the same way it would in production,
    just slower. WebSocket and long-polling support would not necessarily be
    bundled into runserver, but the custom events and model-save stuff would
    work.
Basic Design
------------
The crux of the design is making Django what I will refer to as
"task-oriented". Rather than a series of views that take a request and return
a response, it becomes a series of consumers that receive messages from a single
channel and write messages back to other channels.
Notably, each task can only receive from the one channel it is assigned to,
the channel that causes it to be called; you cannot wait on channels in the
middle of a consumer, as this would require in-process async support like
greenlets. No task should block like this, though tasks are fine doing long
but not indefinite operations, like complex SQL queries.
A "channel", in this definition, is similar to the Go definition of a
channel - it is a multi-reader, multi-writer, FIFO queue, with messages
received at-most-once by only one of the readers. Messages must be serialisable
to JSON, and will always be a dict at the top-level.
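To make the examples below concrete, here is a minimal sketch of what the
Channel wrapper used throughout this proposal might look like (the
channel_layer object and its send() method are names assumed for
illustration, not a final API)::

    # A sketch only - "channel_layer" and its send() method are assumed.
    class Channel(object):
        def __init__(self, name):
            self.name = name

        def send(self, message):
            # Messages are always dicts, serialisable to JSON.
            assert isinstance(message, dict)
            channel_layer.send(self.name, message)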
Django would be run as two separate components in a production environment;
interface servers, which would service WSGI, WebSocket or long-polling requests
(and in the latter cases, would have to be written using in-process async or
an alternate language), and worker servers, which actually run the business
logic code. They communicate via the "channel layer", details of which are
covered later but which is essentially an external queue server.
In the runserver environment or a WSGI-only environment, we would
still allow the interface and worker layers to be run in the same process
and use an in-memory channel layer to communicate.
Here's an ASCII diagram:
           HTTP
            |
        --------
        gunicorn
        --------
            |
           WSGI                 WebSocket
            |                       |
     --------------       -------------------
     WSGI interface       WebSocket interface
     --------------       -------------------
            |                       |
            V                       V
                  -------------
                  Channel layer
                  -------------
                        |
              ----------+----------
              |         |         |
           ------    ------
           worker    worker      ...
           ------    ------
Serving Requests
----------------
These interface servers do one thing - they translate incoming requests and
connections into a pair of channels: one for receiving data, and one for
sending it. Notably, unlike in some other async solutions, there is a single
common channel for receiving data for a given protocol, and only the sending
channels differ per client - because of the limitations placed on us by having
no in-process async, we cannot subscribe consumers to channels dynamically, and
so we must have one channel for all received data.
For now, let's ignore the Django routing and rendering framework and deal with
raw responses to HTTP and WebSockets. The specific naming and keys I use here
are obviously not final, but should give you some idea of what I mean.
Incoming HTTP requests are handled by a WSGI server and passed to Django's WSGI
interface, which invents a unique channel name for the response - we'll use
"django.wsgi.response.12345" - and posts a message to the "django.wsgi.request"
channel::
    {
        "response_channel": "django.wsgi.response.12345",
        "method": "GET",
        "path": "/foo/bar/",
        "GET": [["query", "value"]],
        ...
    }
Meanwhile, in your Django app you have declared a consumer for the request
channel::
@consumer("django.wsgi.request")
def hello_world(response_channel, path, **kwargs):
Channel(response_channel).send({
"mimetype": "text/plain",
"content": "Hello World",
})
This takes an incoming request, constructs a basic response, and then sends it
back. Obviously this isn't very useful, but now compare it with a websocket
echo server::
@consumer("django.websocket.receive")
def hello_world(send_channel, path, **kwargs):
Channel(send_channel).send({
"content": "Hello World",
})
The pattern works again here, though of course WebSocket responses don't have
a mimetype as they're just byte streams.
Now, let's look at how this works with normal Django. If you just want to
run a site as it does now, then all that you need is this sort of function
(which will ship with Django and be tied in by default)::
@consumer("django.wsgi.request")
def handle_request(response_channel, **kwargs):
request = HTTPRequest.from_data(kwargs)
view = url_resolver.resolve(request.path)
response = run_middleware_and_view(request, view)
Channel(response_channel).send(response.to_data)
This seamlessly maps the requests transported over the channel layer into
normal Django requests, with some translation to and from a standard JSON
format for requests and responses, and is likely how most people would
use Django. Use cases for overriding this behaviour, though, are things
like streaming via chunked responses, which you can kind of consider
as one way of doing push messages.
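As a sketch of how a chunked streaming consumer might look, assuming a
"more_content" key that tells the interface server to hold the connection
open (an invented convention, not a settled message format)::

    @consumer("django.wsgi.request")
    def count_stream(response_channel, **kwargs):
        # Send the body as several messages rather than one single
        # response; the interface server keeps the connection open
        # until it sees "more_content": False.
        channel = Channel(response_channel)
        channel.send({"mimetype": "text/plain", "content": "one\n", "more_content": True})
        channel.send({"content": "two\n", "more_content": True})
        channel.send({"content": "three\n", "more_content": False})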
The URL resolver itself could also be implemented as a consumer which publishes
to other named channels, for places where you wish to service different URLs
using lower-level handling than request response (or for different websocket
behaviour based on path, for example).
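For example, a hand-rolled routing consumer might look like this sketch,
where the feature-specific channel names are made up for illustration::

    @consumer("django.websocket.receive")
    def route_by_path(send_channel, path, data, **kwargs):
        # Republish the message onto a feature-specific channel based on
        # the path the socket was opened against.
        if path.startswith("/chat/"):
            target = "myapp.chat.receive"
        else:
            target = "myapp.default.receive"
        Channel(target).send(dict(kwargs, send_channel=send_channel,
                                  path=path, data=data))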
Let's look now at a typical example for push-based systems, a simple chatroom
application. We'll assume the use of websockets here::
    members = RedisBackedSet()

    @consumer("django.websocket.connect")
    def connected(send_channel, **kwargs):
        "New user connected, add them to the room list"
        members.add(send_channel)

    @consumer("django.websocket.disconnect")
    def disconnected(send_channel, **kwargs):
        "User disconnected, remove them from the room list"
        members.remove(send_channel)

    @consumer("django.websocket.receive")
    def new_message(send_channel, data, **kwargs):
        "Someone sent a message, broadcast it to the room"
        for member in members:
            Channel(member).send(data)
This example assumes the use of some network-wide store for the channel names
of those sockets that are in the chat room. Here it's an imaginary Redis set
class, but it could just as easily be ORM queries.
Because we don't have the pattern of asynchronous programming where we can
leave the initial connecting thread around listening for messages, we have to
persist the set of connections rather than relying on threads running in the
background; on the plus side, this means better resilience, as even if
a worker crashes, the room state and sockets are not lost.
More Events
-----------
Websocket and HTTP requests are one thing, but the same model can then be
extended to enable a powerful mixing of request/response style views and
push-based actions, thanks to custom events and model events.
First, let's look at model events. The idea here is that we'd basically turn
the post_save signal into an event as well - so, for example, if you wanted
a page that updated live as new people signed up to your event (let's use
a long-poll example here, with a consumer routed from the urlconf)::
    import json

    from django.apps import apps
    from django.http import HttpResponse
    from django.shortcuts import render

    from myapp.models import Signup

    listeners = RedisBackedSet()

    def signup_view(request):
        if request.method == "POST":
            Signup.objects.create(email=request.POST["email"])
            return render(request, "thanks.html")
        return render(request, "signup.html")

    # Corresponding URL entry:
    # url(r'^longpoll/$', Channel("messaging.long-poll").as_view())

    @consumer("messaging.long-poll")
    def connected(response_channel, **kwargs):
        "New user connected, add them to the listener list"
        listeners.add(response_channel)

    @consumer("myapp.models.signup.post_save")
    def new_signup(app_label, model_name, pk, **kwargs):
        "Someone signed up, broadcast it to all long-poll listeners"
        instance = apps.get_model(app_label, model_name).objects.get(pk=pk)
        content = json.dumps({"action": "signup", "email": instance.email})
        message = HttpResponse(content).channel_encode()
        for listener in listeners:
            Channel(listener).send(message)
            listeners.remove(listener)
The same mechanism could be used for things like thumbnailing and calculations
that don't require the execution guarantee of a traditional task queue (this
is not intended to replace task queues like Celery; they have different
delivery and latency requirements).
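A thumbnailing consumer might look something like this sketch; the Photo
model, its image and thumbnail fields, and the generate_thumbnail() helper
are all invented for illustration::

    from django.apps import apps

    @consumer("myapp.models.photo.post_save")
    def make_thumbnail(app_label, model_name, pk, **kwargs):
        # Best-effort: if the message expires before a worker runs it,
        # the thumbnail is simply regenerated on the next save.
        photo = apps.get_model(app_label, model_name).objects.get(pk=pk)
        photo.thumbnail.save("thumb.png", generate_thumbnail(photo.image))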
Now let's look at a custom event - for this contrived example, we'll update
a database every time a page is viewed - but we'll speed up the request/response
cycle by offloading this to a separate consumer::
    from myapp.models import Stats

    def some_view(request):
        Channel("myapp.new_view").send({"ip_address": request.META["REMOTE_ADDR"]})
        return render(request, "signup.html")

    @consumer("myapp.new_view")
    def save_stat(ip_address, **kwargs):
        entry = Stats.objects.filter(ip_address=ip_address).first()
        if entry is None:
            entry = Stats(ip_address=ip_address)
        entry.count += 1
        entry.save()
What This Means
---------------
Django basically gains a worker model, natively enabling it to offload things
outside the request/response cycle but still running inside "normal" Django,
with no startup and shutdown overhead and the ability to natively handle
push messages, fan-out and other desirable asynchronous behaviours.
It doesn't enable the full power that asynchronous programming itself might,
but the restrictions imposed (consumers can only be called when an event happens,
and nothing can live-wait on events except via the consumer subscribe system)
mean that it's basically impossible to deadlock a set of worker processes, and
let us sidestep Python not having native asynchrony in 2.7.
The Channel System
------------------
The channel layer will be a pluggable backend, with three shipping with Django by default:
- An in-memory backend, used for single-process deploys of Django (interface
  and workers are run in separate threads; likely WSGI/runserver only)
- A Django ORM-backed backend, for development and testing only - it won't
  perform well, but it requires no new dependencies.
- A Redis backend, using Redis lists and the BLPOP/RPUSH/SET commands to
  implement the channels and message expiry.
We're not introducing Redis as a required dependency, since you can just run
it with the in-memory backend and not use any new features,
and we'll be encouraging further implementations of channel backends with a
well-defined pluggable API and acceptable set of delivery behaviours.
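That pluggable API might end up looking roughly like this sketch (the
method names and blocking semantics here are assumptions, not a settled
interface)::

    class BaseChannelBackend(object):
        def send(self, channel, message):
            "Add the message dict onto the named channel."
            raise NotImplementedError

        def receive_many(self, channels, timeout=None):
            """
            Block until a message is available on any of the given
            channels, then return a (channel, message) pair; each
            message is delivered to at most one caller.
            """
            raise NotImplementedError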
The Interface Servers
---------------------
The initial plan is to ship with three interface servers:
- A synchronous WSGI server, using Django's existing WSGI code
- A WebSocket server, using one of greenlets/asyncio/Twisted/etc (TBD)
- A long polling-capable WSGI server, using one of greenlets/asyncio/Twisted/etc (TBD)
These should be relatively simple pieces of code and it's likely that more than
one version of the ones that require in-process async will be made to support
different runtimes (e.g. asyncio is Python 3.3 and up only).
It's also conceivable that an interface server could be written in a
non-Python language as long as it talked a common channel server format,
though this isn't in the initial scope.
The Worker Servers
------------------
These will just be processes that loop, running one consumer at a time. On
startup, the server compiles a list of all consumers and the channels they're
listening on, and then picks the first available message from those channels,
runs it, and repeats.
It's likely that worker servers would have options to only run a subset of
consumers based on channel name.
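In rough pseudocode, using the hypothetical backend API sketched earlier,
a worker's main loop (and the registry the @consumer decorator might fill
in) could look like this::

    # Registry filled in at import time by the @consumer decorator.
    consumers = {}  # channel name -> consumer function

    def consumer(*channels):
        def decorator(func):
            for channel in channels:
                consumers[channel] = func  # one consumer per channel
            return func
        return decorator

    def run_worker(backend):
        # Block on all subscribed channels at once, run one consumer
        # per received message, then loop.
        channels = list(consumers)
        while True:
            channel, message = backend.receive_many(channels)
            consumers[channel](**message)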
Built-in channels
-----------------
The following are the proposed channels that would come with Django:
* django.wsgi.request - Incoming WSGI requests with response channel and request data
* django.wsgi.disconnect - Client disconnected before response sent. Useful for long-poll.
* django.websocket.connect - New websocket connected with response channel, IP address, path etc.
* django.websocket.receive - New data on websocket with response channel, IP address, path etc.
* django.websocket.disconnect - Websocket closed with (closed) response channel, IP address, path etc.
Django wouldn't ship with built-in channels that mirror things like the post_save
signal, but we'd supply easy-to-use shortcuts for this and documentation around
how to do it (including highlighting how channels are single-receiver, not
broadcast).
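Such a shortcut could be as simple as this sketch; the function name and
the channel naming scheme are assumptions for illustration::

    from django.db.models.signals import post_save

    from myapp.models import Signup

    def send_post_save(sender, instance, **kwargs):
        # Bridge the post_save signal into a per-model channel. Remember
        # that each message goes to a single consumer, not to every
        # subscriber - fan-out needs explicit extra channels.
        name = "%s.models.%s.post_save" % (sender._meta.app_label,
                                           sender._meta.model_name)
        Channel(name).send({
            "app_label": sender._meta.app_label,
            "model_name": sender._meta.model_name,
            "pk": instance.pk,
        })

    post_save.connect(send_post_save, sender=Signup)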
Channel Lifetime
----------------
The first version of this proposal had the ability to mark channels as
"closed", intended for both marking the end of a streaming response and to aid
in garbage collection.
However, given that channels are a global, string-based namespace, this makes
a lot less sense than it does for in-process channels in a language with
memory allocation and/or ownership. Given this, channels are considered to
always exist; sending to a channel should always work, and receiving from an
empty channel should block until a message is available.
Response channels do, of course, share this global namespace, and so it is
crucial that there is an ability to name channels uniquely and avoid collisions
(though there are some protocols where you may want response channels to be
predictable, if you can somehow load-balance responses separately from requests -
this would be possible with a UDP-based protocol, for example). I suggest
that we just use either UUIDs or client address plus a random value for this.
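For example, an interface server might name a response channel like this
(a sketch)::

    import uuid

    # Globally-unique response channel name; a UUID4 suffix makes
    # collisions vanishingly unlikely.
    response_channel = "django.wsgi.response.%s" % uuid.uuid4().hex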
Some element of "garbage collection" is still needed, mostly because of
response channels where the interface server has already disconnected
(nothing will ever consume those messages). Thus, messages will have a maximum
lifetime, with the default value being 60 seconds, but adjustable by
the user. This is also the value interface servers will use to time out if they
are actively waiting for a response (like HTTP).
There is a possible situation where a more push-oriented interface server
(like a WebSocket one) persists a connection longer than the system expects -
for example, a user has a chatroom as described above and restarts Redis,
thus losing all the response channel names, even though the interface server is
still holding them open. This is something client code should work around,
with a keep-alive or time-since-last-seen pattern (we can assume that these
connections will close eventually, and users can restart these interface servers
too if this is a common issue, though it's also easy to make a very resilient
system on top of this).
The channel layer will be responsible for expiring messages, including any
clock sync issues if applicable (the layer is allowed to let the message
last longer than expiry, of course, just not less time).
This expiry will also hopefully reinforce to users that Django Channels are not
a task queue, and do not come with a guarantee of delivery, though it is probably
inevitable someone will massively increase the expiry and use them as such.
Potential Issues
----------------
* Large file uploads aren't going to work as a single request message by
  themselves, and the current strategy of offloading them onto local files
  if they're too big is going to be problematic. They could either be
  sent over in chunks through messages, or some other storage solution devised.
* Middleware would only be applied to requests sent via the request-response
  method; this isn't too important, as WSGI middleware exists for the few
  things you might want to do to raw responses. Middleware would also only
  see the "request" portion of consumers routed via the urlconf.
* There needs to be discovery or registration code for finding consumers and
  registering them, either like models and autoloading, or like url resolvers
  and views with a manual single registry.
* Large responses might overflow the per-message limits of whatever system
  is handling channels (e.g. even though Redis allows 512MB per key, you could
  imagine hundreds of 2MB responses overflowing that). Each channel
  backend would be responsible for managing this - e.g. the Redis backend
  would probably put response data over 20KB or so in its own key.
* Custom response classes aren't possible without implementing the
  encode/decode methods and overriding the class the interface server uses,
  though this is not a problem if the response only affects already-serialised
  attributes like "content" and "headers".
* The existing method of doing streamed responses will need adapting to do
  chunked response messages, but this seems feasible to implement.
Other Notes
-----------
* It would probably be nice to upgrade the URL resolver to also support
  dispatching directly to channels, which can probably be done with a
  Channel.as_view.
* This would likely be developed as a third-party app initially, working
  with 1.8 and perhaps 1.7, to prove itself and make sure the ideas are sound
  before being merged into core. This is mostly possible as this code is
  pretty much additive around existing Django.
* The @consumer decorator (or whatever it is renamed to) will accept multiple
  channel names to subscribe to, and be called for any of them.
* It's likely that there will be a built-in limit of one consumer per channel,
  as the single-worker consumption would make it nondeterministic which consumer
  gets run otherwise. If a user wants to run more than one consumer per message,
  they can manually fan out to multiple other channels.
* As this will slightly slow down the main request-response cycle, we could
  implement it for everything except that cycle, giving up the ability to drop
  down to raw HTTP, but I'd be against this, as I think having everything work
  off the same system is worth it (and the ability to easily run multiple
  threads or processes of workers would hopefully make scaling easier anyway).
@shaib commented Jun 3, 2015

I'm trying to understand how this will work with ATOMIC_REQUESTS: each message handling will be tied to a transaction? So, if I understand correctly, the post_* channel messages will always be handled in separate transactions?

@andrewgodwin (Author) commented

shaib: Yes, we'd move the per-request-style transaction handling to be per-consumer-call - it's a very analogous situation, and it would also work as it currently does for "normal" views and middleware.
