zpoint/gunicorn.md

## gunicorn.md

      
    Raw
  

              gunicorn.md
            
          
    gunicorn workers

We've learned SyncWorker for gunicorn in part1, now let's see how other workers work

contents


eventlet
gevent
thread
tornado
read more

Eventlet

If you visite the official site of eventlet

Eventlet is a concurrent networking library for Python that allows you to change how you run your code, not how you write it.

It uses epoll or kqueue or libevent for highly scalable non-blocking I/O.
Coroutines ensure that the developer uses a blocking style of programming that is similar to threading, but provide the benefits of non-blocking I/O.
The event dispatch is implicit, which means you can easily use Eventlet from the Python interpreter, or as a small part of a larger application.


EventletWorker inherit from AsyncWorker, it override the init_process method and run method
def patch(self):
    hubs.use_hub()
    eventlet.monkey_patch()
    patch_sendfile()

def init_process(self):
    self.patch()
    super().init_process()
After fork from the master process, the init_process calls eventlet.monkey_patch()  , which replace the following modules by the corresponding eventlet support module by default
for name, modules_function in [
    ('os', _green_os_modules),
    ('select', _green_select_modules),
    ('socket', _green_socket_modules),
    ('thread', _green_thread_modules),
    ('time', _green_time_modules),
    ('MySQLdb', _green_MySQLdb),
    ('builtins', _green_builtins),
    ('subprocess', _green_subprocess_modules),
]
Eventlet replaced the default IO module by it's green module, when you calls the socket function, you are actually calling _green_socket_modules  , which implements nonblocking IO
On every socket read/write, or time.sleep, it actually save the current context and add the current gthread to the pooling list, and then calls pool to wait for next ready IO event
It's like the async keyword in python3, but with less code invasion
If you run your app in eventlet mode
gunicorn --workers 2 --worker-class eventlet mysite.wsgi

EventletWorker will spawn a new gthread, which in charge of accept connection from socket, after accept a new connection from socket, the gthread pass the django handle function to the greenpool, and use the greenpool to start the django function
Thanks for eventlet, we can simply change --worker-class to make our django application blocking IO to nonblocking IO
Compare to define async function directly, your code can run both in blocking and nonblocking mode, and easier to debug
But defining async function with async keyword directly, require you to design your code in async style from the top down, gives you more power about async control. For example, eventlet with django parallel two different request, while async function is able to parallel different IO operation in the same request
Gevent

If you visite the official site of gevent

gevent is a coroutine -based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libev or libuv event loop.
gevent is inspired by eventlet but features a more consistent API, simpler implementation and better performance.

The differences


gevent is built on top of libevent(since 1.0, gevent uses libev and c-ares.)

Signal handling is integrated with the event loop.
Other libevent-based libraries can integrate with your app through single event loop.
DNS requests are resolved asynchronously rather than via a threadpool of blocking calls.
WSGI server is based on the libevent’s built-in HTTP server, making it super fast.


gevent’s interface follows the conventions set by the standard library
gevent does not have all the features that Eventlet has.


If you had another library (written in C) that used libevent’s event loop and want to integrate them together in a single process, gevent support while eventlet does not
Let's go back to gunicorn
GeventWorker inherit from AsyncWorker, it also override the init_process method and run method
def patch(self):
    monkey.patch_all()

def init_process(self):
    self.patch()
    hub.reinit()
    super().init_process()
After fork from the master process, the init_process calls gevent.monkey()  , which replace the following modules by the corresponding gevent support module
def patch_all(socket=True, dns=True, time=True, select=True, thread=True, os=True, ssl=True,
              subprocess=True, sys=False, aggressive=True, Event=True,
              builtins=True, signal=True,
              queue=True, contextvars=True,
              **kwargs):
              pass
The pattern is similar to eventlet, the interface is different, so the actual function being called in run is slightly different
# gunicorn/workers/ggevent.py
from gevent.pool import Pool
from gevent.server import StreamServer

def run(self):
	# ...
	pool = Pool(self.worker_connections)
	# ...
	server = StreamServer(s, handle=hfun, spawn=pool, **ssl_args)
	# ...
	server.start()
If you run
gunicorn --workers 2 --worker-class eventlet mysite.wsgi

The pros and cons of using gevent is the same as eventlet, we are not repeating it again
If you focus more on performance, or you've  C lib that use libevent’s(or libev) event loop that want to integrate into Python in a single process, consider using gevent
If you rely on some specific features on eventlet such as eventlet.db_pool or eventlet.processes, you probably should keep using eventlet
thread

By default gunicorn use the sync mode, It prefork workers number of process and each worker is able to handle one request at a time
ThreadWorker inherit from Worker, it also override the init_process method and run method
def init_process(self):
    self.tpool = self.get_thread_pool()
    self.poller = selectors.DefaultSelector()
    self._lock = RLock()
    super().init_process()

def enqueue_req(self, conn):
    conn.init()
    # submit the connection to a worker
    fs = self.tpool.submit(self.handle, conn)
    self._wrap_future(fs, conn)

def accept(self, server, listener):
    try:
        sock, client = listener.accept()
        # initialize the connection object
        conn = TConn(self.cfg, sock, client, server)
        self.nr_conns += 1
        # enqueue the job
        self.enqueue_req(conn)
    except EnvironmentError as e:
        if e.errno not in (errno.EAGAIN, errno.ECONNABORTED,
                           errno.EWOULDBLOCK):
            raise
            
def run(self):
    # ....
We can see that init_process create a thread pool, and accept just push the established connection to the queue inside the ThreadPool object


If there is a concern about the application memory footprint, using threads and its corresponding gthread worker class in favor of workers yields better performance because the application is loaded once per worker and every thread running on the worker shares some memory, this comes to the expense of some additional CPU consumption.


Let's see an example
gunicorn --workers 1 --worker-class gthread --threads 2 mysite.wsgi
The --threads will only affect gthread worker class, other worker class will not be affected by --threads parameter

Each worker initialize a ThreadPool with size --threads threads, whenever the main thread accept a socket object, the object is pushed into the queue, and the working thread in ThreadPool will pop it from the queue and delegate the actual request to django application
tornado

The last worker class is tornado, the code is pretty simple
# gunicorn/gunicorn/workers/gtornado.py
def init_process(self):
    # IOLoop cannot survive a fork or be shared across processes
    # in any way. When multiple processes are being used, each process
    # should create its own IOLoop. We should clear current IOLoop
    # if exists before os.fork.
    IOLoop.clear_current()
    super().init_process()
    
def run(self):
    # ...
The run method initlaize monitor utility in gunicorn , start a tornado server instance, bind the listening sockets to the tornado server, and finally runs the IOLoop
read more


what are you using gevent for?
Comparing gevent to eventlet
Better performance by optimizing Gunicorn config