@aronasorman
Last active August 8, 2018 22:09
I expected the ioloop method to win here, and it does hold its ground when run on my local machine. But I'm surprised how much the ordering changes when run on a server on GCP: the naive method is still slow, but it finishes in roughly a third of the time it took on my local machine, which clocked in at 120 seconds.

I'm honestly surprised that the naive HTTP2.0 method is so slow. Maybe it's the overhead of the HTTP/2 library I use.

As usual, the threaded method was the winner here. I'm guessing that's because requests' built-in connection handling is fast enough for HEADs. The HTTP2.0 method would probably win if we actually fetched the content.
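To test that last hypothesis, one more case could be bolted onto the benchmark below: a threaded HTTP2.0 variant that downloads the full body with GET instead of issuing a HEAD. This is only a sketch and was not measured; it assumes the same urls list, imports, Pool, and counter setup as the script below, and the names handle_url_get and test_http2_get_multiprocessing are invented here for illustration.

# Hypothetical extra case: full GET downloads over the threaded HTTP/2 setup,
# to see whether HTTP/2 multiplexing pays off once bodies are actually transferred.
# Assumes the urls list, imports, Pool, and Value from the benchmark script below.
print("Running the threaded HTTP2.0 GET method. Same as above, but downloading the full body.")
c = Value("i", 0)
session = requests.Session()
session.mount("https://storage.googleapis.com", HTTP20Adapter())

def handle_url_get(url):
    resp = session.get(url)  # GET transfers the PNG body, unlike HEAD
    if resp.status_code == 200:
        with c.get_lock():
            c.value += 1

def test_http2_get_multiprocessing():
    pool = Pool(3)
    pool.map(handle_url_get, urls)
    print("Number of successful calls: {}".format(c.value))

print("time for threaded http2 GET method: {}\n\n\n".format(timeit.timeit(test_http2_get_multiprocessing, number=10)))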
#!/usr/bin/env python
# Benchmark: several ways of issuing many HEAD requests against Google Cloud Storage.
import timeit

import requests
from hyper.contrib import HTTP20Adapter  # HTTP/2 transport adapter for requests

from multiprocessing import Value
from multiprocessing.dummy import Pool  # a thread pool behind the multiprocessing API

from tornado import gen, ioloop
from tornado.httpclient import AsyncHTTPClient

url = "https://storage.googleapis.com/studio-content/storage/0/0/000021c2b641f5fcc7ca1fb605af4460.png"
urls = [url] * 100

print("This test requests a URL from GCS, 100 times per run. We do 10 runs, so a total of 1000 HEAD requests.\n\n\n")
print("Running the naive, synchronous technique. This one makes a HEAD request serially, without any session reuse. Our baseline.")
c = Value("i", 0)
def test_synchronous():
for url in urls:
r = requests.head(url)
if r.status_code == 200:
c.value += 1 # no need to acquire lock, since it's synchronous
print("Number of successful calls: {}".format(c.value))
print "time for naive synchronous method: {}\n\n\n".format(timeit.timeit(test_synchronous, number=10))
print("Running the HTTP2.0, synchronous method. This reuses sessions across requests, and also implements HTTP2.0 for faster downloading.")
c = Value("i", 0)
def test_synchronous_http2():
session = requests.Session()
session.mount("https://storage.googleapis.com", HTTP20Adapter())
for url in urls:
r = session.head(url)
if r.status_code == 200:
c.value += 1 # no need to acquire lock, since it's synchronous
print("Number of successful calls: {}".format(c.value))
print "time for http2 synchronous method: {}\n\n\n".format(timeit.timeit(test_synchronous_http2, number=10))
print("Running the threaded method. This splits the request calls into three threads, and reuses sessions too.")
c = Value("i", 0)
session = requests.Session()
def handle_url(url):
resp = session.head(url)
if resp.status_code == 200:
with c.get_lock():
c.value += 1
def test_multiprocessing():
pool = Pool(3)
pool.map(handle_url, urls)
print("Number of successful calls: {}".format(c.value))
print "time for threaded method: {}\n\n\n".format(timeit.timeit(test_multiprocessing, number=10))
print("Running the threaded HTTP2.0 method. This splits requests into three threads, and makes requests use HTTP2.0.")
c = Value("i", 0)
session = requests.Session()
session.mount("https://storage.googleapis.com", HTTP20Adapter())
def handle_url(url):
resp = session.head(url)
if resp.status_code == 200:
with c.get_lock():
c.value += 1
def test_http2_multiprocessing():
pool = Pool(3)
pool.map(handle_url, urls)
print("Number of successful calls: {}".format(c.value))
print "time for threaded http2 method: {}\n\n\n".format(timeit.timeit(test_http2_multiprocessing, "gc.enable()", number=10))
print("Running the IO loop method. This uses coroutines instead of separate OS threads.")
http_client = AsyncHTTPClient()
c = Value("i", 0)
@gen.coroutine
def async_fetch_gen(url):
response = yield http_client.fetch(url)
raise gen.Return(response.code)
@gen.coroutine
def async_main():
futures = []
for url in urls:
futures.append(async_fetch_gen(url))
results = yield gen.multi(futures)
for r in results:
if r == 200:
c.value += 1
def test_ioloop():
io_loop = ioloop.IOLoop.current()
io_loop.run_sync(async_main)
print("Number of successful calls: {}".format(c.value))
print "time for ioloop method: {}\n\n\n".format(timeit.timeit(test_ioloop, number=10))
requests
hyper
tornado
singledispatch
backports_abc
This test requests a URL from GCS, 100 times per run. We do 10 runs, so a total of 1000 HEAD requests.
Running the naive, synchronous technique. This one makes a HEAD request serially, without any session reuse. Our baseline.
Number of successful calls: 100
Number of successful calls: 200
Number of successful calls: 300
Number of successful calls: 400
Number of successful calls: 500
Number of successful calls: 600
Number of successful calls: 700
Number of successful calls: 800
Number of successful calls: 900
Number of successful calls: 1000
time for naive synchronous method: 36.1137590408
Running the HTTP2.0, synchronous method. This reuses sessions across requests, and also implements HTTP2.0 for faster downloading.
Number of successful calls: 100
Number of successful calls: 200
Number of successful calls: 300
Number of successful calls: 400
Number of successful calls: 500
Number of successful calls: 600
Number of successful calls: 700
Number of successful calls: 800
Number of successful calls: 900
Number of successful calls: 1000
time for http2 synchronous method: 57.045787096
Running the threaded method. This splits the request calls into three threads, and reuses sessions too.
Number of successful calls: 100
Number of successful calls: 200
Number of successful calls: 300
Number of successful calls: 400
Number of successful calls: 500
Number of successful calls: 600
Number of successful calls: 700
Number of successful calls: 800
Number of successful calls: 900
Number of successful calls: 1000
time for threaded method: 6.38562583923
Running the threaded HTTP2.0 method. This splits requests into three threads, and makes requests use HTTP2.0.
Number of successful calls: 100
Number of successful calls: 200
Number of successful calls: 300
Number of successful calls: 400
Number of successful calls: 500
Number of successful calls: 600
Number of successful calls: 700
Number of successful calls: 800
Number of successful calls: 900
Number of successful calls: 1000
time for threaded http2 method: 11.7339289188
Running the IO loop method. This uses coroutines instead of separate OS threads.
Number of successful calls: 100
Number of successful calls: 200
Number of successful calls: 300
Number of successful calls: 400
Number of successful calls: 500
Number of successful calls: 600
Number of successful calls: 700
Number of successful calls: 800
Number of successful calls: 900
Number of successful calls: 1000
time for ioloop method: 17.4569571018