Python Concurrency

References

Concurrency

Options:

  • multiprocessing
  • threading
  • event-based programming
  • gevent (green threads)

multiprocessing

import multiprocessing as mp

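def downloader():
    # users and download_photo are assumed to be defined elsewhere
    pool = []
    for user in users:
        # one OS process per user; target and args must be passed by keyword
        p = mp.Process(target=download_photo, args=(user,))
        pool.append(p)
        p.start()

    # wait for every download to finish
    for p in pool:
        p.join()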
  • Each network request still blocks, but it only blocks the process handling that user.
  • Gives parallelism across multiple cores, as well as concurrency.
  • Processes carry high overhead (memory and startup cost).

multi-threading

import threading

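def downloader():
    # users and download_photo are assumed to be defined elsewhere
    pool = []
    for user in users:
        # one thread per user; target and args must be passed by keyword
        t = threading.Thread(target=download_photo, args=(user,))
        pool.append(t)
        t.start()

    # wait for every download to finish
    for t in pool:
        t.join()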
  • Lighter weight than processes
  • Gives concurrency
  • Multi-threaded programming is hard: writing correct code is difficult, and troubleshooting it is harder still
  • Another problem: CPython and the GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time
      • GIL talk by Larry Hastings: https://www.youtube.com/watch?v=P3AyI_u66Bw
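To make the "writing correct code is difficult" point concrete, here is a minimal sketch of guarding shared state with a lock. The names Counter, bump and run_workers are hypothetical and not from the talk; the point is only that a read-modify-write on shared data needs explicit protection.

```python
import threading

class Counter:
    """Shared counter whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def bump(self):
        # "value += 1" is a read-modify-write, not one atomic step;
        # holding the lock keeps two threads from interleaving inside it.
        with self._lock:
            self.value += 1

def run_workers(n_threads=8, bumps_per_thread=10_000):
    """Start n_threads threads that each bump the same counter."""
    counter = Counter()

    def work():
        for _ in range(bumps_per_thread):
            counter.bump()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, run_workers(4, 1000) returns 4000. Without it, the result may come up short, depending on how the interpreter happens to schedule the threads.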

event-based programming

  • twisted
import twisted
  • Code becomes very complicated
  • You write an event loop
  • You wire callbacks to the loop, etc.
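The "write a loop, wire callbacks to it" style can be sketched with the standard library's selectors module. This is not Twisted code; it only assumes a local socket pair standing in for a network peer, and run_event_loop_demo and on_readable are hypothetical names.

```python
import selectors
import socket

def run_event_loop_demo():
    """Register a read callback, trigger one event, and run the loop."""
    sel = selectors.DefaultSelector()
    sender, receiver = socket.socketpair()  # connected pair of local sockets
    received = []

    def on_readable(sock):
        # Callback: the loop calls this once the socket has data to read.
        received.append(sock.recv(1024))
        sel.unregister(sock)  # nothing more to wait for in this demo

    # Wire the callback to the loop: "call on_readable when receiver is readable".
    sel.register(receiver, selectors.EVENT_READ, data=on_readable)
    sender.sendall(b"photo bytes")

    # The loop: wait for events, then dispatch each one to its callback.
    while sel.get_map():
        events = sel.select(timeout=1.0)
        if not events:
            break  # timed out; do not spin forever in a demo
        for key, _mask in events:
            key.data(key.fileobj)

    sel.close()
    sender.close()
    receiver.close()
    return received
```

Even in this tiny case the control flow is inverted: the logic lives in callbacks, and the order in which they run is decided by the loop rather than by the order of the code.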

green threads

  • great for I/O bound apps that need to be highly concurrent
  • they are user space, OS does not create / schedule them
  • they are cooperatively scheduled
  • extremely lightweight compared to threads
  • Can handle 20-30K concurrent connections; threads cannot, because of their memory overhead
  • Used at "web scale" at Pinterest, Facebook, PayPal, Disqus, ...
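Cooperative, user-space scheduling can be illustrated without gevent. The sketch below swaps in plain Python generators for green threads: each yield marks a point where a task voluntarily gives up control, and a tiny round-robin scheduler (run_round_robin, a hypothetical name) resumes the tasks in turn, all inside one OS thread.

```python
from collections import deque

def download(user, chunks, log):
    """Hypothetical task; each yield is where it would wait on I/O."""
    for i in range(chunks):
        log.append((user, i))
        yield  # hand control back to the scheduler

def run_round_robin(tasks):
    """Run generator tasks in a single OS thread until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)      # resume the task until its next yield
        except StopIteration:
            continue        # task finished; drop it
        ready.append(task)  # not finished; schedule it again

log = []
run_round_robin([download("alice", 2, log), download("bob", 2, log)])
# log is now [("alice", 0), ("bob", 0), ("alice", 1), ("bob", 1)]
```

The OS never sees these tasks, which is why they are so cheap. A task that never yields would starve all the others, which is the flip side of cooperative scheduling.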
import gevent
from gevent import monkey
monkey.patch_all()

def downloader():
    pool = []
    for user in users:
        g = gevent.Greenlet(download_photo, user)        
        g.start()
        pool.append(g)

    gevent.joinall(pool)
  • The API is exposed as a synchronous API: the code reads like ordinary blocking code

The building blocks

from greenlet import greenlet

def print_red():
    print('red')
    gr2.switch()  # switches to the print_blue function
    print('red done')

def print_blue():
    print('blue')
    gr1.switch()  # switches back to print_red; it does not re-run but resumes, so 'red done' is printed
    print('blue done')  # never reached in this example: nothing switches back to gr2

# create the greenlets once the functions they wrap are defined
gr1 = greenlet(print_red)
gr2 = greenlet(print_blue)
gr1.switch()  # prints: red, blue, red done
  • .switch() did two things:
      • paused the current greenlet and yielded control flow to the next greenlet
      • the next time .switch() was called on the paused greenlet, it resumed where it had stopped
  • greenlet is written in C
  • Every greenlet has a parent
  • gevent uses greenlets for coroutines; greenlet uses assembly-based stack slicing to provide cooperative execution units
  • gevent uses libev, a high-performance event loop written in C
      • libev gives you an API to register event-handler callbacks
      • libev's event loop watches for events
      • when an event occurs, libev calls the registered callbacks
g = gevent.Greenlet(download_photo, user)  
  • gevent instantiates the Greenlet class
  • Class initialization creates a small greenlet and sets its parent to the 'Hub' greenlet
  • The Hub is where the looping happens
class Greenlet(greenlet):
    def __init__(self, run=None, ...):
        greenlet.__init__(self, None, get_hub())

where get_hub() returns the Hub greenlet, so g.parent = Hub
  • The Hub is the greenlet that runs the event loop; there is one per thread
  • Greenlet() does two things:
      • creates a greenlet for our function (download_photo)
      • sets its .parent to the event loop (i.e. Hub) greenlet

g.start() registers the greenlet's switch function with the event loop: self.parent.loop.run_callback(self.switch) becomes Hub.loop.run_callback(self.switch). This is registered as a pre_block_watcher (run it before you block).

  • gevent.joinall(pool) runs the loop: it switches to the Hub's loop
  • g.join() is the single-greenlet version
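The start / Hub / joinall flow described above can be modelled with a toy. This is not gevent's implementation: ToyHub, ToyGreenlet and the joinall helper below are hypothetical stand-ins built on generators, meant only to show that start() merely registers a callback and that nothing runs until control is handed to the hub.

```python
from collections import deque

class ToyHub:
    """Toy stand-in for the Hub: owns the queue of callbacks to run."""

    def __init__(self):
        self.callbacks = deque()

    def run_callback(self, callback):
        self.callbacks.append(callback)

    def run(self):
        # The "event loop": keep running callbacks until none are left.
        while self.callbacks:
            self.callbacks.popleft()()

class ToyGreenlet:
    """Toy stand-in for gevent.Greenlet, built on a generator function."""

    def __init__(self, hub, func, *args):
        self.parent = hub  # every toy greenlet's parent is the hub
        self.done = False
        self._gen = func(*args)

    def start(self):
        # Like g.start(): do not run yet, only register switch with the loop.
        self.parent.run_callback(self.switch)

    def switch(self):
        try:
            next(self._gen)  # run until the task yields
        except StopIteration:
            self.done = True
        else:
            self.parent.run_callback(self.switch)  # reschedule the rest

def joinall(hub, greenlets):
    # Like gevent.joinall(): hand control to the hub until the work is done.
    hub.run()
    return all(g.done for g in greenlets)

def download_photo(user, log):
    log.append(f"start {user}")
    yield  # stands in for a blocking socket call
    log.append(f"done {user}")

hub = ToyHub()
log = []
pool = [ToyGreenlet(hub, download_photo, u, log) for u in ("alice", "bob")]
for g in pool:
    g.start()
started_before_join = list(log)  # still empty: start() only registered callbacks
all_done = joinall(hub, pool)
```

After joinall, log holds "start alice", "start bob", "done alice", "done bob": both downloads are in flight before either finishes, even though only one thread is involved.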
from gevent import monkey
monkey.patch_all()
  • The code above replaces standard library modules on the fly, e.g. socket with gevent.socket
  • Monkey patching makes libraries cooperative and non-blocking
  • When a blocking call (e.g. on a socket) is made, it registers with the loop (Hub) and runs the loop
  • gevent gives us non-blocking I/O

gevent Minuses

  • gevent does not give you parallelism
  • Non-cooperative code will block the entire process
      • C extensions (e.g. database drivers)
          • -> use pure Python libraries (they can take advantage of greenlets)
      • compute-bound greenlets (can hog the CPU)
          • -> use gevent.sleep(0)
          • -> use greenlet blocking detection
  • Monkey patching may have confusing implications
      • the order of imports matters!
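Why import order matters can be shown with a small sketch that does not touch the real standard library. It assumes a made-up module object, fake_socket, and a hypothetical patch_all function: code that copied a function out of the module before the patch keeps the old, blocking version.

```python
import types

# Hypothetical stand-in for a standard library module such as socket.
fake_socket = types.ModuleType("fake_socket")
fake_socket.connect = lambda: "blocking"

# "Early" code copies the function out of the module before patching,
# the equivalent of "from socket import connect" running at import time.
early_connect = fake_socket.connect

def patch_all():
    """Replace the module attribute, which is what a monkey patch does."""
    fake_socket.connect = lambda: "cooperative"

patch_all()

# "Late" code looks the attribute up on the module after patching.
late_result = fake_socket.connect()
early_result = early_connect()

print(late_result)   # cooperative
print(early_result)  # blocking: the old reference was never replaced
```

This is the usual reason for calling monkey.patch_all() as early as possible, before importing modules that might hold on to the unpatched functions.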