@gearbox · Created August 13, 2020 09:08
Use concurrent I/O jobs for a speedup

A typical approach is to put the I/O-heavy part, like fetching data over the internet, and the data processing into the same function:

import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor

import requests


def fetch_and_process_file(url):
    # current_thread() is the non-deprecated spelling of currentThread()
    thread_name = threading.current_thread().name

    print(thread_name, "fetch", url)
    data = requests.get(url).text

    # "process" result
    time.sleep(random.random() / 4)  # simulate work
    print(thread_name, "process data from", url)

    result = len(data) ** 2
    return result


threads = 2
urls = ["https://google.com", "https://python.org", "https://pypi.org"]

with ThreadPoolExecutor(max_workers=threads) as executor:
    results = executor.map(fetch_and_process_file, urls)

print()
print("results:", list(results))

outputs (the exact interleaving varies from run to run):

ThreadPoolExecutor-0_0 fetch https://google.com
ThreadPoolExecutor-0_1 fetch https://python.org
ThreadPoolExecutor-0_0 process data from https://google.com
ThreadPoolExecutor-0_0 fetch https://pypi.org
ThreadPoolExecutor-0_0 process data from https://pypi.org
ThreadPoolExecutor-0_1 process data from https://python.org
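
The interleaved lines show the point of the thread pool: while one worker waits on a network response, the other can fetch or process, so total wall-clock time approaches that of the slowest requests rather than the sum of all of them.

As a rough check on that speedup, the threaded run can be timed against a sequential baseline. This is a minimal sketch, not part of the original gist; it reuses fetch_and_process_file, urls, and threads from the snippet above and assumes the network round trips dominate the runtime:

import time
from concurrent.futures import ThreadPoolExecutor

# Sequential baseline: one URL after another on the main thread.
start = time.perf_counter()
sequential_results = [fetch_and_process_file(url) for url in urls]
print("sequential:", time.perf_counter() - start, "s")

# Threaded run: up to `threads` fetches overlap while waiting on the network.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=threads) as executor:
    threaded_results = list(executor.map(fetch_and_process_file, urls))
print("threaded:", time.perf_counter() - start, "s")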