@skamensky
Created June 6, 2021 13:48
Is asyncio worth it if you're only doing http requests?
is_asyncio_worth_it.py:

import asyncio
from concurrent.futures import ThreadPoolExecutor
from functools import wraps
from threading import Thread
from time import monotonic, sleep

import aiohttp
import requests

# taken from https://gist.githubusercontent.com/demersdesigns/4442cd84c1cc6c5ccda9b19eac1ba52b/raw/cf06109a805b661dd12133f9aa4473435e478569/craft-popular-urls
URLS = open("urls.txt").read().splitlines()


# Decorator that reports wall-clock time; works for both sync and async functions.
def timeit(func):
    @wraps(func)
    def sync_wrapper(*args, **kwargs):
        start = monotonic()
        results = func(*args, **kwargs)
        end = monotonic()
        print(f"Took {round(end - start, 2)} seconds to run {func.__name__}")
        return results

    @wraps(func)
    async def async_wrapper(*args, **kwargs):
        start = monotonic()
        results = await func(*args, **kwargs)
        end = monotonic()
        print(f"Took {round(end - start, 2)} seconds to run {func.__name__}")
        return results

    if asyncio.iscoroutinefunction(func):
        return async_wrapper
    else:
        return sync_wrapper


# Fetch every URL concurrently on one event loop with a single shared session.
@timeit
async def do_requests_async():
    responses = {}
    awaitables = []

    async def set_response(url, session):
        response = await session.get(url)
        responses[url] = await response.text()

    async with aiohttp.ClientSession() as session:
        for url in URLS:
            awaitables.append(set_response(url, session))
        await asyncio.gather(*awaitables)
    return responses


# Fetch URLs one at a time over a single keep-alive session.
@timeit
def do_requests_sync():
    responses = {}
    with requests.Session() as session:
        for url in URLS:
            response = session.get(url)
            responses[url] = response.text
    return responses


# One thread per URL; note this stores Response objects, not decoded text.
@timeit
def do_requests_threaded():
    responses = {}

    def get_response(url):
        responses[url] = requests.get(url)

    threads = [Thread(target=get_response, args=(url,)) for url in URLS]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return responses


# A bounded pool of 50 worker threads instead of one thread per URL.
@timeit
def do_requests_threadpool():
    def get_response(url):
        return requests.get(url)

    futures = {}
    responses = {}
    with ThreadPoolExecutor(50) as pool:
        for url in URLS:
            futures[url] = pool.submit(get_response, url)
        for url, future in futures.items():
            responses[url] = future.result()
    return responses


def main():
    # sleep in between to avoid exponential backoffs skewing our results
    do_requests_sync()
    sleep(20)
    do_requests_threaded()
    sleep(20)
    do_requests_threadpool()
    sleep(20)
    asyncio.get_event_loop().run_until_complete(do_requests_async())


if __name__ == "__main__":
    main()
requirements.txt:

aiohttp==3.7.4.post0
requests==2.25.1
urls.txt:

http://www.youtube.com
http://www.facebook.com
http://www.baidu.com
http://www.yahoo.com
http://www.amazon.com
http://www.google.co.in
http://www.twitter.com
http://www.live.com
http://www.taobao.com
http://www.bing.com
http://www.instagram.com
http://www.weibo.com
http://www.sina.com.cn
http://www.linkedin.com
http://www.yahoo.co.jp
http://www.msn.com
http://www.vk.com
http://www.google.de
http://www.yandex.ru
http://www.hao123.com
http://www.google.co.uk
http://www.reddit.com
http://www.ebay.com
http://www.google.fr
http://www.t.co
http://www.tmall.com
http://www.google.com.br
http://www.360.cn
http://www.sohu.com
http://www.amazon.co.jp
http://www.pinterest.com
http://www.netflix.com
http://www.google.it
http://www.google.ru
http://www.microsoft.com
http://www.google.es
http://www.wordpress.com
http://www.gmw.cn
http://www.tumblr.com
http://www.paypal.com
http://www.blogspot.com
http://www.imgur.com
http://www.stackoverflow.com
http://www.aliexpress.com
http://www.naver.com
http://www.ok.ru
http://www.apple.com
http://www.github.com
http://www.chinadaily.com.cn
http://www.imdb.com
http://www.google.co.kr
http://www.fc2.com
http://www.jd.com
http://www.blogger.com
http://www.163.com
http://www.google.ca
http://www.whatsapp.com
http://www.amazon.in
http://www.office.com
http://www.tianya.cn
http://www.google.co.id
http://www.youku.com
http://www.rakuten.co.jp
http://www.craigslist.org
http://www.amazon.de
http://www.nicovideo.jp
http://www.google.pl
http://www.soso.com
http://www.bilibili.com
http://www.dropbox.com
http://www.xinhuanet.com
http://www.outbrain.com
http://www.pixnet.net
http://www.alibaba.com
http://www.alipay.com
http://www.booking.com
http://www.googleusercontent.com
http://www.google.com.au
http://www.popads.net
http://www.cntv.cn
http://www.zhihu.com
http://www.amazon.co.uk
http://www.diply.com
http://www.coccoc.com
http://www.cnn.com
http://www.bbc.co.uk
http://www.twitch.tv
http://www.wikia.com
http://www.google.co.th
http://www.go.com
http://www.google.com.ph
http://www.doubleclick.net
http://www.onet.pl
http://www.googleadservices.com
http://www.accuweather.com
http://www.googleweblight.com
http://www.answers.yahoo.com
@skamensky (author) commented:
Output from running python is_asyncio_worth_it.py:

Took 189.0 seconds to run do_requests_sync
Took 12.63 seconds to run do_requests_threaded
Took 12.75 seconds to run do_requests_threadpool
Took 13.52 seconds to run do_requests_async
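
One caveat about these numbers: do_requests_threadpool caps concurrency at 50 worker threads, while do_requests_async fires off every URL at once. A fairer comparison might bound the coroutines the same way; here is a minimal sketch using asyncio.Semaphore (do_requests_async_bounded is an illustrative name, not part of the gist):

import asyncio

import aiohttp

async def do_requests_async_bounded(urls, limit=50):
    # Allow at most `limit` requests in flight, mirroring ThreadPoolExecutor(50).
    semaphore = asyncio.Semaphore(limit)
    responses = {}

    async def fetch(url, session):
        async with semaphore:
            response = await session.get(url)
            responses[url] = await response.text()

    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetch(url, session) for url in urls))
    return responses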

So according to this quick, unscientific benchmark, it may not be worth refactoring to asyncio just to get concurrent HTTP requests: threads deliver essentially the same wall-clock time without the added complexity of an event loop, and without giving up libraries that block.
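
On the "giving up libraries that block" point, one mitigation: a blocking client like requests can still be driven from the event loop via run_in_executor, at the cost of one worker thread per in-flight call. A minimal sketch reusing the same idea as URLS (do_requests_via_executor is an illustrative name, not part of the gist):

import asyncio
from concurrent.futures import ThreadPoolExecutor

import requests

async def do_requests_via_executor(urls):
    # Offload each blocking requests.get call to a worker thread; the event
    # loop stays responsive while those calls block their threads.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(50) as pool:
        resps = await asyncio.gather(
            *(loop.run_in_executor(pool, requests.get, url) for url in urls)
        )
    return {url: resp.text for url, resp in zip(urls, resps)}

This doesn't avoid the thread cost, but it lets a codebase adopt asyncio incrementally without abandoning blocking libraries.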
