Django Channels Proposal
Going Beyond Request-Response - Django Channels
-----------------------------------------------
For a long time, Django has been directly tied to the request-response cycle
of HTTP. This has worked, and continues to work, really well: it is a nice,
simple abstraction that combines well with concepts like middleware.
However, we have also seen the rise of things like long-polling and
WebSockets, and now HTTP/2 has push built in; Django is not well designed
for this future, where the server must actively hold open client connections
and send information as it happens.
I'm proposing a moderately large change to the way Django works that maintains
a lot of backwards compatibility, but allows new ways to write Django code
that not only allow neat handling of pushes and socket-type interactions,
but also provide a framework that would let larger sites distribute
server workloads better.
Goals
-----
Backwards compatibility
    While there will always be a few edge cases, the intention is that all
    existing code will keep working, with no major changes to the current
    request-response level APIs.
Request-response maintained
    If you don't want this new stuff, Django will still work as it does today,
    in normal request-response mode tied to WSGI (though this work may slow
    that down a bit if it still routes through the new layer in-memory). You'll
    still be able to write a hello-world view that takes a request and returns
    a response.
Native WebSocket and long-polling support
    These are the two most popular methods right now for push-style
    interactions with browsers, and Django should ship native or semi-native
    support for them, with a relatively painless setup experience.
Does not require greenlets, asyncio, Stackless...
    The solution I propose does not use async techniques inside the Python
    process itself; instead, it's designed to work across multiple processes
    and across a network, relying on the "channel layer" to do most of the
    async-related work and on a worker model to get the maximum out of
    individual processes.
    This also means that we don't need to re-educate users about async
    "gotchas", such as missing yields locking your entire process, or not
    knowing which functions will context-switch and which ones won't.
    The proposal actually does use some in-process async for the "interface
    servers", but this is not a place end developers will write code, and it
    can be swapped out for implementations using other async solutions or
    even other languages.
Still runs in a single process
    The solution is designed to still run in a single "runserver" process
    for development while behaving the same way it would in production -
    just slower. WebSocket and long-polling support would not necessarily
    be bundled into runserver, but the custom events and model-save features
    would work.
Basic Design
------------
The crux of the design is making Django what I will refer to as
"task-oriented". Rather than a series of views that take a request and return
a response, it becomes a series of consumers that receive messages from a
single channel and write messages back to other channels.
Notably, each task can only receive from the one channel it is assigned to -
the one that causes it to be called; you cannot wait on channels in the middle
of a task, as this would require in-process async support like greenlets. No
task should block like this, though tasks are fine doing long but not
indefinite actions, like complex SQL queries.
A "channel", in this definition, is similar to the Go definition of a
channel - it is a multi-reader, multi-writer, FIFO queue, with each message
received at most once, by only one of the readers. Messages must be
serialisable to JSON, and will always be a dict at the top level.
Django would be run as two separate components in a production environment:
interface servers, which service WSGI, WebSocket or long-polling requests
(and in the latter two cases would have to be written using in-process async
or an alternate language), and worker servers, which actually run the
business logic code. They communicate via the "channel layer", details of
which are covered later, but which is essentially an external queue server.
In the runserver environment, or a WSGI-only environment, we would
still allow the interface and worker layers to run in the same process
and use an in-memory channel layer to communicate.
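To make the channel semantics concrete, here is a minimal sketch of what an in-memory channel layer could look like. The class name and method names are illustrative assumptions, not the proposal's final API; it just demonstrates the multi-reader, multi-writer FIFO behaviour with JSON-serialisable dict messages:

```python
import json
from collections import defaultdict, deque

class InMemoryChannelLayer:
    """Toy channel layer: named FIFO queues with at-most-once delivery.
    (Illustrative only; not the proposal's actual interface.)"""

    def __init__(self):
        self.queues = defaultdict(deque)

    def send(self, channel, message):
        if not isinstance(message, dict):
            raise TypeError("messages must be dicts at the top level")
        # Round-trip through JSON to enforce serialisability
        self.queues[channel].append(json.loads(json.dumps(message)))

    def receive_many(self, channels):
        # Return the first available message from any of the given channels,
        # or (None, None) if all are empty
        for channel in channels:
            if self.queues[channel]:
                return channel, self.queues[channel].popleft()
        return None, None
```

A worker would call something like ``receive_many`` in a loop; each message is handed to exactly one reader, which is what makes fan-out an explicit, application-level operation rather than an implicit broadcast.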
Here's an ASCII diagram::

                  HTTP
                   |
                --------
                gunicorn
                --------
                   |
           WSGI         WebSocket
            |               |
    --------------  -------------------
    WSGI interface  WebSocket interface
    --------------  -------------------
            |               |
            V               V
            -------------
            Channel layer
            -------------
                   |
         ----------+----------
         |         |         |
       ------    ------
       worker    worker    ...
       ------    ------
Serving Requests
----------------
These interface layers do one thing - they translate incoming requests and
connections into a pair of channels: one for receiving data, and one for
sending it. Notably, unlike in some other async solutions, there is a single
common channel for receiving data for a given protocol, and only the sending
channels differ per client - because of the limitations placed on us by having
no in-process async, we cannot subscribe consumers to channels dynamically,
and so we must have one channel for all received data.
For now, let's ignore the Django routing and rendering framework and deal with
raw responses to HTTP and WebSockets. The specific naming and keys I use here
are obviously not final, but should give you some idea of what I mean.
Incoming HTTP requests are handled by a WSGI server and passed to Django's
WSGI interface, which invents a unique channel name for the response - we'll
use "django.wsgi.response.12345" - and posts a message to the
"django.wsgi.request" channel::

    {
        "response_channel": "django.wsgi.response.12345",
        "method": "GET",
        "path": "/foo/bar/",
        "GET": [["query", "value"]],
        ...
    }
Meanwhile, in your Django app, you have declared a consumer for the request
channel::

    @consumer("django.wsgi.request")
    def hello_world(response_channel, path, **kwargs):
        Channel(response_channel).send({
            "mimetype": "text/plain",
            "content": "Hello World",
        })

This takes an incoming request, constructs a basic response, and then sends it
back. Obviously this isn't very useful, but now compare it with a WebSocket
echo server::

    @consumer("django.websocket.receive")
    def hello_world(send_channel, path, **kwargs):
        Channel(send_channel).send({
            "content": "Hello World",
        })

The pattern works again here, though of course WebSocket responses don't have
a mimetype, as they're just byte streams.
Now, let's look at how this works with normal Django. If you just want to
run a site as it does now, then all you need is this sort of function
(which will ship with Django and be tied in by default)::

    @consumer("django.wsgi.request")
    def handle_request(response_channel, **kwargs):
        request = HTTPRequest.from_data(kwargs)
        view = url_resolver.resolve(request.path)
        response = run_middleware_and_view(request, view)
        Channel(response_channel).send(response.to_data())

This seamlessly maps the requests transported over the channel layer into
normal Django requests, with some translation to and from a standard JSON
format for requests and responses, and is likely how most people would
use Django. Use cases for overriding this behaviour, though, are things
like streaming via chunked responses, which you can consider
as one way of doing push messages.
The URL resolver itself could also be implemented as a consumer which
publishes to other named channels, for places where you wish to service
different URLs using lower-level handling than request-response (or for
different WebSocket behaviour based on path, for example).
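The ``@consumer`` decorator used in these examples could plausibly be backed by a simple registry mapping channel names to callables. The sketch below is an assumption about how such a registry might work, not the proposal's actual implementation; note it enforces the one-consumer-per-channel restriction discussed later:

```python
# Hypothetical registry; names and structure are illustrative only.
CONSUMERS = {}

def consumer(*channel_names):
    """Register the decorated function as the consumer for the named channels."""
    def decorator(func):
        for name in channel_names:
            if name in CONSUMERS:
                # One consumer per channel keeps dispatch deterministic
                raise ValueError("channel %r already has a consumer" % name)
            CONSUMERS[name] = func
        return func
    return decorator

def dispatch(channel, message):
    # Called by a worker when a message arrives on `channel`;
    # message keys become keyword arguments, mirroring the examples above
    return CONSUMERS[channel](**message)

@consumer("demo.echo")
def echo(content, **kwargs):
    return content
```

With this in place, ``dispatch("demo.echo", {"content": "hi"})`` would call ``echo`` with ``content="hi"``.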
Let's look now at a typical example for push-based systems: a simple chatroom
application. We'll assume the use of WebSockets here::

    members = RedisBackedSet()

    @consumer("django.websocket.connect")
    def connected(send_channel, **kwargs):
        "New user connected, add them to the room list"
        members.add(send_channel)

    @consumer("django.websocket.disconnect")
    def disconnected(send_channel, **kwargs):
        "User disconnected, remove them from the room list"
        members.remove(send_channel)

    @consumer("django.websocket.receive")
    def new_message(send_channel, data, **kwargs):
        "Someone sent a message, broadcast it to the room"
        for member in members:
            Channel(member).send(data)

This example assumes the use of some network-wide store for the channel names
of the sockets that are in the room. Here it's an imaginary Redis-backed set
class, but it could just as easily be ORM queries.
Because we don't have the asynchronous-programming pattern where we can
leave the initial connecting thread around listening for messages, we have to
persist the set of connections rather than relying on threads running in the
background; on the plus side, this means better resilience, as even if
a worker crashes, the room state and sockets are not lost.
More Events
-----------
WebSocket and HTTP requests are one thing, but the same model can then be
extended to enable a powerful mixing of request/response-style views and
push-based actions, thanks to custom events and model events.
First, let's look at model events. The idea here is that we'd basically turn
the post_save signal into an event as well - so, for example, if you wanted
a page that updated live as new people signed up to your event (let's use
a long-poll example here, with a consumer routed from the urlconf)::

    from myapp.models import Signup

    listeners = RedisBackedSet()

    def signup_view(request):
        if request.method == "POST":
            Signup.objects.create(email=request.POST["email"])
            return render(request, "thanks.html")
        return render(request, "signup.html")

    # Corresponding URL entry:
    # url(r'^longpoll/$', Channel("messaging.long-poll").as_view())

    @consumer("messaging.long-poll")
    def connected(response_channel, **kwargs):
        "New user connected, add them to the listener list"
        listeners.add(response_channel)

    @consumer("myapp.models.signup.post_save")
    def new_message(app_label, model_name, pk, **kwargs):
        "Someone signed up, broadcast it"
        instance = apps.get_model(app_label, model_name).objects.get(pk=pk)
        content = json.dumps({"action": "signup", "email": instance.email})
        message = HttpResponse(content).channel_encode()
        for listener in listeners:
            Channel(listener).send(message)
            listeners.remove(listener)

The same mechanism could be used for things like thumbnailing and calculations
that don't require the execution guarantees of a traditional task queue (this
is not intended to replace task queues like Celery; they have different
delivery and latency requirements).
Now let's look at a custom event - for this contrived example, we'll update
a database every time a page is viewed, but we'll speed up the
request/response cycle by offloading this to a separate consumer::

    from myapp.models import Stats

    def some_view(request):
        Channel("myapp.new_view").send({"ip_address": request.META['REMOTE_ADDR']})
        return render(request, "signup.html")

    @consumer("myapp.new_view")
    def save_stat(ip_address, **kwargs):
        # Note: this read-modify-write is not atomic; a real implementation
        # might prefer get_or_create() and an F() expression.
        entry = Stats.objects.filter(ip_address=ip_address).first()
        if entry is None:
            entry = Stats(ip_address=ip_address)
        entry.count += 1
        entry.save()
What This Means
---------------
Django basically gains a worker model, natively enabling it to offload things
outside the request/response cycle while still running inside "normal" Django,
with no startup and shutdown overhead and the ability to natively handle
push messages, fan-out and other desirable asynchronous behaviours.
It doesn't enable the full power that asynchronous programming itself might,
but the restrictions imposed (consumers can only be called when an event
happens; nothing can live-wait on events except via the consumer subscribe
system) mean that it's basically impossible to deadlock a set of worker
processes, and let us sidestep Python 2.7's lack of native asynchrony.
The Channel System
------------------
The channel layer will be pluggable, with three backends shipping with Django
by default:

- An in-memory backend, used for single-process deploys of Django (interface
  and workers are run in separate threads; likely WSGI/runserver only)
- A Django ORM-backed backend, for development and testing only - it won't
  perform well, but it also requires no new dependencies.
- A Redis backend, using Redis lists and the BLPOP/RPUSH/SET commands to
  implement the channels and message expiry.

We're not introducing Redis as a required dependency, since you can just run
Django with the in-memory backend and not use any new features,
and we'll be encouraging further implementations of channel backends with a
well-defined pluggable API and an acceptable set of delivery behaviours.
The Interface Servers
---------------------
The initial plan is to ship with three interface servers:

- A synchronous WSGI server, using Django's existing WSGI code
- A WebSocket server, using one of greenlets/asyncio/Twisted/etc. (TBD)
- A long-polling-capable WSGI server, using one of
  greenlets/asyncio/Twisted/etc. (TBD)

These should be relatively simple pieces of code, and it's likely that more
than one version of the ones that require in-process async will be made, to
support different runtimes (e.g. asyncio is Python 3.3 and up only).
It's also conceivable that an interface server could be written in a
non-Python language as long as it talked a common channel server format,
though this isn't in the initial scope.
The Worker Servers
------------------
These will just be processes that loop, running one consumer at a time. On
startup, a worker compiles a list of all consumers and the channels they're
listening on, then picks the first available message from those channels,
runs the matching consumer, and repeats.
It's likely that worker servers would have options to only run a subset of
consumers, based on channel name.
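The worker loop described above could be sketched as follows. The function and
method names here are assumptions for illustration; in particular, the layer is
assumed to expose a ``receive_many(channels)`` call returning ``(channel,
message)`` or ``(None, None)``:

```python
from collections import deque

def worker_loop(layer, consumers, max_runs):
    """Toy worker: take the first available message from any subscribed
    channel and run the matching consumer, passing message keys as kwargs.
    (Names are illustrative; a real worker would block, not break.)"""
    channels = list(consumers)
    runs = 0
    while runs < max_runs:
        channel, message = layer.receive_many(channels)
        if message is None:
            break  # nothing available; a real worker would block here
        consumers[channel](**message)
        runs += 1

# Minimal stand-in channel layer, just for demonstration
class StubLayer:
    def __init__(self):
        self.q = {"myapp.new_view": deque([{"ip_address": "10.0.0.1"}])}
    def receive_many(self, channels):
        for c in channels:
            if self.q.get(c):
                return c, self.q[c].popleft()
        return None, None

seen = []
worker_loop(StubLayer(),
            {"myapp.new_view": lambda ip_address, **kw: seen.append(ip_address)},
            max_runs=10)
```

Restricting a worker to a subset of consumers then falls out naturally: you simply pass it a smaller ``consumers`` mapping.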
Built-in channels
-----------------
The following are the proposed channels that would come with Django:

* django.wsgi.request - Incoming WSGI request, with response channel and request data
* django.wsgi.disconnect - Client disconnected before the response was sent. Useful for long-poll.
* django.websocket.connect - New WebSocket connected, with response channel, IP address, path, etc.
* django.websocket.receive - New data on a WebSocket, with response channel, IP address, path, etc.
* django.websocket.disconnect - WebSocket closed, with (closed) response channel, IP address, path, etc.

Django wouldn't ship with built-in channels that mirror things like the
post_save signal, but we'd supply easy-to-use shortcuts for this and
documentation around how to do it (including highlighting how channels are
single-receiver, not broadcast).
Channel Lifetime
----------------
The first version of this proposal had the ability to mark channels as
"closed", intended both for marking the end of a streaming response and to
aid in garbage collection.
However, given that channels are a global, string-based namespace,
this makes a lot less sense than it does in a language with local memory
allocation and/or ownership. Given this, channels are considered to always
exist: sending to a channel should always work, and receiving from a channel
with no messages should block until a message is available.
Response channels do, of course, share this global namespace, and so it is
crucial that we can name channels uniquely and avoid collisions
(though there are some protocols where you may want response channels to be
predictable, if you can somehow load-balance responses separately from
requests - this would be possible with a UDP-based protocol, for example).
I suggest that we just use either UUIDs or client address plus a random value
for this.
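A UUID-based naming scheme is trivial to sketch; the prefix shown is the one used in the earlier example, and the function name is illustrative:

```python
import uuid

def new_response_channel(prefix="django.wsgi.response"):
    """Generate a response channel name that is, for practical purposes,
    collision-free within the shared global namespace."""
    return "%s.%s" % (prefix, uuid.uuid4().hex)
```

Each call yields a distinct name such as ``django.wsgi.response.<32 hex chars>``, so two interface servers inventing names concurrently will not clash.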
Some element of "garbage collection" is still needed, mostly because of
response channels whose interface server has already disconnected
(nothing will ever consume those messages). Thus, messages will have a
maximum lifetime, with a default value of 60 seconds, adjustable by
the user. This is also the value interface servers will use to time out if
they are actively waiting for a response (as in HTTP).
There is a possible situation where a more push-oriented interface server
(like a WebSocket one) persists a connection longer than the system expects -
for example, a user has a chatroom as described above and restarts Redis,
thus losing all the response channel names, even though the interface server
is still holding the connections open. This is something client code should
work around, with a keep-alive or time-since-last-seen pattern (we can assume
that these connections will close eventually, and users can restart these
interface servers too if this is a common issue, though it's also easy to
make a very resilient system on top of this).
The channel layer will be responsible for expiring messages, including
handling any clock sync issues if applicable (the layer is allowed to let a
message last longer than the expiry, of course, just not less).
This expiry will also hopefully reinforce to users that Django channels are
not a task queue, and do not come with a guarantee of delivery, though it is
probably inevitable that someone will massively increase the expiry and use
them as such.
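Per-message expiry can be sketched by stamping each message with a deadline and discarding stale messages lazily on receive. This is an illustrative in-memory model only; a real backend such as Redis would more likely lean on key TTLs:

```python
import time
from collections import defaultdict, deque

class ExpiringChannelLayer:
    """Toy layer demonstrating message expiry: messages older than
    `expiry` seconds are silently dropped when a reader next looks.
    (Names and API are assumptions, not the proposal's interface.)"""

    def __init__(self, expiry=60):
        self.expiry = expiry
        self.queues = defaultdict(deque)

    def send(self, channel, message):
        # Stamp the message with its drop-dead time on the way in
        self.queues[channel].append((time.time() + self.expiry, message))

    def receive(self, channel):
        while self.queues[channel]:
            deadline, message = self.queues[channel].popleft()
            if deadline >= time.time():
                return message
            # expired: discard and keep looking
        return None
```

Because expired messages are dropped rather than redelivered, a crashed or vanished interface server leaves at most 60 seconds' worth of garbage behind, which is the behaviour the proposal wants.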
Potential Issues
----------------
* Large file uploads aren't going to work as a single request message by
  themselves, and the current strategy of offloading them onto local files
  if they're too big is going to be problematic. They could either be
  sent over in chunks through messages, or some other storage solution
  devised.
* Middleware would only be applied to requests sent via the request-response
  method; this isn't too important, as WSGI middleware exists for the few
  things you might want to do to raw responses. Middleware would also only
  see the "request" portion of consumers routed via the urlconf.
* There needs to be discovery or registration code for finding consumers and
  registering them, either like models and autoloading, or like URL resolvers
  and views with a manual single registry.
* Large responses might overflow the per-message limits of whatever system
  is handling channels (e.g. even though Redis allows 512MB per key, you
  could imagine hundreds of 2MB responses overflowing that). Each channel
  backend would be responsible for managing this - e.g. the Redis backend
  would probably put response data over 20KB or so in its own key.
* Custom response classes aren't possible without implementing the
  encode/decode methods and overriding the class the interface server uses,
  though this is not a problem if the response only affects already-serialised
  attributes like "content" and "headers".
* The existing method of doing streamed responses will need adapting to do
  chunked response messages, but this seems feasible to implement.
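The chunked-response idea raised in the list above can be sketched simply: split the body into ordered messages and flag the final one. The field names (``content``, ``more_content``) are assumptions for illustration, not a defined wire format:

```python
def chunk_response(content, chunk_size):
    """Split a response body into ordered chunk messages, with a flag
    marking whether more chunks follow (field names are hypothetical)."""
    chunks = [content[i:i + chunk_size]
              for i in range(0, len(content), chunk_size)] or [content]
    return [
        {"content": part, "more_content": i < len(chunks) - 1}
        for i, part in enumerate(chunks)
    ]

def reassemble(messages):
    # What an interface server would do: concatenate chunks in order
    return "".join(m["content"] for m in messages)
```

The same shape would keep individual messages under any per-message backend limit, at the cost of the interface server having to hold the response open until it sees ``more_content: False``.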
Other Notes
-----------
* It would probably be nice to upgrade the URL resolver to also support
  dispatching directly to channels, which can probably be done with a
  Channel.as_view method.
* This would likely be developed as a third-party app initially, working
  with 1.8 and perhaps 1.7, to prove itself and make sure the ideas are
  sound before being merged into core. This is mostly possible because this
  code is pretty much additive around existing Django.
* The @consumer decorator (or whatever it is renamed to) will accept multiple
  channel names to subscribe to, and be called for any of them.
* It's likely that there will be a built-in limit of one consumer per
  channel, as single-worker consumption would otherwise make it
  nondeterministic which consumer gets run. If a user wants to run more than
  one consumer per message, they can manually fan out to multiple other
  channels.
* As this will slightly slow down the main request-response cycle, we could
  implement it for everything but that, and drop the ability to drop down to
  raw HTTP handling, but I'd be against this, as I think having everything
  work off the same system is worth it (and the ability to easily run
  multiple threads or processes of workers would hopefully make scaling
  easier anyway).
Comments
--------
shaib: I'm trying to understand how this will work with ATOMIC_REQUESTS:
will each message's handling be tied to a transaction? So, if I understand
correctly, the post_* channel messages will always be handled in separate
transactions?

andrewgodwin: Yes, we'd move the per-request-style transaction handling to be
per-consumer-call - it's a very analogous situation, and would also work as
it currently does for "normal" views and middleware.