

@darwing1210
Last active March 12, 2024 04:42
Script to download files asynchronously, using Python asyncio
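import os
import asyncio
import aiohttp  # pip install aiohttp
import aiofile  # pip install aiofile

REPORTS_FOLDER = "reports"
FILES_PATH = os.path.join(REPORTS_FOLDER, "files")


def download_files_from_report(urls):
    os.makedirs(FILES_PATH, exist_ok=True)

    async def fetch_file(session, sema, url):
        # Use the last path segment of the URL as the local file name.
        fname = url.split("/")[-1]
        # The semaphore caps how many downloads are in flight at once.
        async with sema:
            async with session.get(url) as resp:
                # Raises aiohttp.ClientResponseError on 4xx/5xx responses;
                # unlike assert, this check is not stripped by `python -O`.
                resp.raise_for_status()
                data = await resp.read()

        async with aiofile.async_open(
            os.path.join(FILES_PATH, fname), "wb"
        ) as outfile:
            await outfile.write(data)

    async def main():
        # Create the semaphore inside the running loop so that it is bound
        # to the loop started by asyncio.run() on older Python versions too.
        sema = asyncio.BoundedSemaphore(5)
        # A single session is shared by every request.
        async with aiohttp.ClientSession() as session:
            tasks = [fetch_file(session, sema, url) for url in urls]
            await asyncio.gather(*tasks)

    # asyncio.run() creates the event loop, runs main(), and closes the loop.
    asyncio.run(main())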
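
To see what the `BoundedSemaphore(5)` in the script buys you, here is a minimal, standard-library-only sketch. It is not part of the gist: `run_with_limit` and `job` are made-up names, and `asyncio.sleep` stands in for the HTTP request. It starts more jobs than the limit and records the highest number that were ever inside the semaphore at the same time.

```python
import asyncio


async def run_with_limit(n_jobs, limit):
    """Run n_jobs dummy jobs and return the highest concurrency observed."""
    sema = asyncio.BoundedSemaphore(limit)
    active = 0
    peak = 0

    async def job():
        nonlocal active, peak
        async with sema:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the HTTP request
            active -= 1

    await asyncio.gather(*(job() for _ in range(n_jobs)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(run_with_limit(n_jobs=20, limit=5)))
```

All jobs are scheduled at once by `asyncio.gather`, but only `limit` of them can hold the semaphore at any moment; the rest wait their turn.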
@H4dr1en

H4dr1en commented Jan 1, 2021

Use aiofile (not aiofiles) for better performance. aiofiles uses multithreading behind the scenes, while aiofile uses the platform's native API on Linux/macOS, which makes it truly asynchronous.
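
For anyone wondering what "multithreading behind the scenes" looks like in practice: the thread-based approach hands an ordinary blocking write to a worker thread so the event loop stays free. The sketch below shows that general pattern with the standard library only. It is an illustration, not aiofiles' actual code, and `_write_blocking`, `write_in_thread` and `demo` are hypothetical names.

```python
import asyncio
import os
import tempfile


def _write_blocking(path, data):
    # Ordinary blocking file I/O; it runs in a worker thread, not on the event loop.
    with open(path, "wb") as f:
        f.write(data)


async def write_in_thread(path, data):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    await loop.run_in_executor(None, _write_blocking, path, data)


async def demo():
    # Write a small file through the thread pool, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.bin")
        await write_in_thread(path, b"hello")
        with open(path, "rb") as f:
            return f.read()
```

The coroutine is awaitable like any other, but the write itself still blocks a thread; that is the overhead the comment above is pointing at.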

@HanslettTheDev

Brilliant code, thanks!

@SmyczekF

Great piece of code!

@gabrielfreitash

You should refactor to reuse the session rather than creating a new one for each request.

@darwing1210
Author

Applied the suggestions.

@callumprentice

Very elegant solution but after experimenting with it for a while, I have started getting errors while downloading files (never the same file and if you try enough times, the downloads for all files succeed):

aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Is the solution to change the code to catch that exception and try N times before giving up or is the root cause known?

@darwing1210
Author

> Very elegant solution but after experimenting with it for a while, I have started getting errors while downloading files (never the same file and if you try enough times, the downloads for all files succeed):
>
> aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed
>
> Is the solution to change the code to catch that exception and try N times before giving up or is the root cause known?

Please check the aiohttp documentation on ClientPayloadError. And yes, you can use aiohttp-retry to handle the failure cases.
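
For the "catch that exception and try N times" idea, a hand-rolled retry can be sketched with the standard library alone. This is only an illustration under assumptions, not the aiohttp-retry API: `retry_async` and its parameters (`make_call`, `attempts`, `delay`, `exceptions`) are hypothetical names, and the back-off is a simple linear wait.

```python
import asyncio


async def retry_async(make_call, attempts=3, delay=0.5, exceptions=(Exception,)):
    """Await make_call() up to `attempts` times, re-raising the last failure."""
    for attempt in range(1, attempts + 1):
        try:
            # make_call must build a fresh coroutine on every attempt.
            return await make_call()
        except exceptions:
            if attempt == attempts:
                raise
            # Wait a little longer after each failed attempt.
            await asyncio.sleep(delay * attempt)
```

In the script this would mean passing a lambda that calls `fetch_file` for one URL, with `exceptions=(aiohttp.ClientPayloadError,)`. Since the error is raised while the response body is being read, the retried call needs to cover both the request and `resp.read()`, not just `session.get`.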
