@aronasorman
Last active August 8, 2018 22:09
I expected the ioloop method to win here, and it does hold its ground when run on my local machine. But I'm surprised how much the ordering changes when run on a server on GCP: the naive method is still slow, but it finishes in roughly a third of the time it took on my local machine, which clocked in at 120 seconds.

I'm honestly surprised that the naive HTTP2.0 method is so slow. Maybe it's the overhead of the HTTP/2 library I use.

As usual, the threaded method was the winner here. I'm guessing that's because requests' built-in connection handling is fast enough for HEADs. The HTTP2.0 method would probably win if we actually fetched the content.
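To test that last hypothesis, one more case could be bolted onto the benchmark below: a threaded HTTP2.0 variant that downloads the full body with GET instead of issuing a HEAD. This is only a sketch and was not measured; it assumes the same urls list, imports, Pool, and counter setup as the script below, and the names handle_url_get and test_http2_get_multiprocessing are invented here for illustration.

# Hypothetical extra case: full GET downloads over the threaded HTTP/2 setup,
# to see whether HTTP/2 multiplexing pays off once bodies are actually transferred.
# Assumes the urls list, imports, Pool, and Value from the benchmark script below.
print("Running the threaded HTTP2.0 GET method. Same as above, but downloading the full body.")
c = Value("i", 0)
session = requests.Session()
session.mount("https://storage.googleapis.com", HTTP20Adapter())

def handle_url_get(url):
    resp = session.get(url)  # GET transfers the PNG body, unlike HEAD
    if resp.status_code == 200:
        with c.get_lock():
            c.value += 1

def test_http2_get_multiprocessing():
    pool = Pool(3)
    pool.map(handle_url_get, urls)
    print("Number of successful calls: {}".format(c.value))

print("time for threaded http2 GET method: {}\n\n\n".format(timeit.timeit(test_http2_get_multiprocessing, number=10)))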
#!/usr/bin/env python
# Benchmark: several ways of issuing many HEAD requests against Google Cloud Storage.
import timeit

import requests
from hyper.contrib import HTTP20Adapter  # HTTP/2 transport adapter for requests

from multiprocessing import Value
from multiprocessing.dummy import Pool  # a thread pool behind the multiprocessing API

from tornado import gen, ioloop
from tornado.httpclient import AsyncHTTPClient

url = "https://storage.googleapis.com/studio-content/storage/0/0/000021c2b641f5fcc7ca1fb605af4460.png"
urls = [url] * 100

print("This test requests a URL from GCS, 100 times per run. We do 10 runs, so a total of 1000 HEAD requests.\n\n\n")
print("Running the naive, synchronous technique. This one makes a HEAD request serially, without any session reuse. Our baseline.")
c = Value("i", 0)
def test_synchronous():
for url in urls:
r = requests.head(url)
if r.status_code == 200:
c.value += 1 # no need to acquire lock, since it's synchronous
print("Number of successful calls: {}".format(c.value))
print "time for naive synchronous method: {}\n\n\n".format(timeit.timeit(test_synchronous, number=10))
print("Running the HTTP2.0, synchronous method. This reuses sessions across requests, and also implements HTTP2.0 for faster downloading.")
c = Value("i", 0)
def test_synchronous_http2():
session = requests.Session()
session.mount("https://storage.googleapis.com", HTTP20Adapter())
for url in urls:
r = session.head(url)
if r.status_code == 200:
c.value += 1 # no need to acquire lock, since it's synchronous
print("Number of successful calls: {}".format(c.value))
print "time for http2 synchronous method: {}\n\n\n".format(timeit.timeit(test_synchronous_http2, number=10))
print("Running the threaded method. This splits the request calls into three threads, and reuses sessions too.")
c = Value("i", 0)
session = requests.Session()
def handle_url(url):
resp = session.head(url)
if resp.status_code == 200:
with c.get_lock():
c.value += 1
def test_multiprocessing():
pool = Pool(3)
pool.map(handle_url, urls)
print("Number of successful calls: {}".format(c.value))
print "time for threaded method: {}\n\n\n".format(timeit.timeit(test_multiprocessing, number=10))
print("Running the threaded HTTP2.0 method. This splits requests into three threads, and makes requests use HTTP2.0.")
c = Value("i", 0)
session = requests.Session()
session.mount("https://storage.googleapis.com", HTTP20Adapter())
def handle_url(url):
resp = session.head(url)
if resp.status_code == 200:
with c.get_lock():
c.value += 1
def test_http2_multiprocessing():
pool = Pool(3)
pool.map(handle_url, urls)
print("Number of successful calls: {}".format(c.value))
print "time for threaded http2 method: {}\n\n\n".format(timeit.timeit(test_http2_multiprocessing, "gc.enable()", number=10))
print("Running the IO loop method. This uses coroutines instead of separate OS threads.")
http_client = AsyncHTTPClient()
c = Value("i", 0)
@gen.coroutine
def async_fetch_gen(url):
response = yield http_client.fetch(url)
raise gen.Return(response.code)
@gen.coroutine
def async_main():
futures = []
for url in urls:
futures.append(async_fetch_gen(url))
results = yield gen.multi(futures)
for r in results:
if r == 200:
c.value += 1
def test_ioloop():
io_loop = ioloop.IOLoop.current()
io_loop.run_sync(async_main)
print("Number of successful calls: {}".format(c.value))
print "time for ioloop method: {}\n\n\n".format(timeit.timeit(test_ioloop, number=10))
requests
hyper
tornado
singledispatch
backports_abc
This test requests a URL from GCS, 100 times per run. We do 10 runs, so a total of 1000 HEAD requests.
Running the naive, synchronous technique. This one makes a HEAD request serially, without any session reuse. Our baseline.
Number of successful calls: 100
Number of successful calls: 200
Number of successful calls: 300
Number of successful calls: 400
Number of successful calls: 500
Number of successful calls: 600
Number of successful calls: 700
Number of successful calls: 800
Number of successful calls: 900
Number of successful calls: 1000
time for naive synchronous method: 36.1137590408
Running the HTTP2.0, synchronous method. This reuses sessions across requests, and also implements HTTP2.0 for faster downloading.
Number of successful calls: 100
Number of successful calls: 200
Number of successful calls: 300
Number of successful calls: 400
Number of successful calls: 500
Number of successful calls: 600
Number of successful calls: 700
Number of successful calls: 800
Number of successful calls: 900
Number of successful calls: 1000
time for http2 synchronous method: 57.045787096
Running the threaded method. This splits the request calls into three threads, and reuses sessions too.
Number of successful calls: 100
Number of successful calls: 200
Number of successful calls: 300
Number of successful calls: 400
Number of successful calls: 500
Number of successful calls: 600
Number of successful calls: 700
Number of successful calls: 800
Number of successful calls: 900
Number of successful calls: 1000
time for threaded method: 6.38562583923
Running the threaded HTTP2.0 method. This splits requests into three threads, and makes requests use HTTP2.0.
Number of successful calls: 100
Number of successful calls: 200
Number of successful calls: 300
Number of successful calls: 400
Number of successful calls: 500
Number of successful calls: 600
Number of successful calls: 700
Number of successful calls: 800
Number of successful calls: 900
Number of successful calls: 1000
time for threaded http2 method: 11.7339289188
Running the IO loop method. This uses coroutines instead of separate OS threads.
Number of successful calls: 100
Number of successful calls: 200
Number of successful calls: 300
Number of successful calls: 400
Number of successful calls: 500
Number of successful calls: 600
Number of successful calls: 700
Number of successful calls: 800
Number of successful calls: 900
Number of successful calls: 1000
time for ioloop method: 17.4569571018