@kerphi
Last active November 10, 2023 09:23
Parallelizing requests to the `<ppn>.rdf` API
```python
SUDOC_URIS = [
    "https://www.sudoc.fr/190605901.rdf",
    "https://www.sudoc.fr/244375704.rdf",
    "https://www.sudoc.fr/224827553.rdf",
    "https://www.sudoc.fr/261936891.rdf",
    "https://www.sudoc.fr/256277346.rdf",
    "https://www.sudoc.fr/272009032.rdf",
    "https://www.sudoc.fr/248493957.rdf",
    "https://www.sudoc.fr/23011279X.rdf",
    "https://www.sudoc.fr/252448200.rdf",
    "https://www.sudoc.fr/250190559.rdf",
    "https://www.sudoc.fr/153848030.rdf",
    "https://www.sudoc.fr/254932762.rdf",
    "https://www.sudoc.fr/158105621.rdf",
    "https://www.sudoc.fr/267968612.rdf",
    "https://www.sudoc.fr/230858554.rdf",
    "https://www.sudoc.fr/230858538.rdf",
    "https://www.sudoc.fr/254823750.rdf",
    "https://www.sudoc.fr/204182174.rdf",
    "https://www.sudoc.fr/189422262.rdf",
    "https://www.sudoc.fr/043828302.rdf",
    "https://www.sudoc.fr/203893433.rdf",
    "https://www.sudoc.fr/263214168.rdf",
    "https://www.sudoc.fr/267968868.rdf",
    "https://www.sudoc.fr/267968558.rdf",
    "https://www.sudoc.fr/238688380.rdf",
    "https://www.sudoc.fr/190605901.rdf",
    "https://www.sudoc.fr/244375704.rdf",
    "https://www.sudoc.fr/224827553.rdf",
    "https://www.sudoc.fr/261936891.rdf",
    "https://www.sudoc.fr/256277346.rdf",
    "https://www.sudoc.fr/272009032.rdf",
    "https://www.sudoc.fr/248493957.rdf",
    "https://www.sudoc.fr/23011279X.rdf",
    "https://www.sudoc.fr/252448200.rdf",
    "https://www.sudoc.fr/250190559.rdf",
    "https://www.sudoc.fr/153848030.rdf",
    "https://www.sudoc.fr/254932762.rdf",
    "https://www.sudoc.fr/158105621.rdf",
    "https://www.sudoc.fr/267968612.rdf",
    "https://www.sudoc.fr/230858554.rdf",
    "https://www.sudoc.fr/230858538.rdf",
    "https://www.sudoc.fr/254823750.rdf",
    "https://www.sudoc.fr/204182174.rdf",
    "https://www.sudoc.fr/189422262.rdf",
    "https://www.sudoc.fr/043828302.rdf",
    "https://www.sudoc.fr/203893433.rdf",
    "https://www.sudoc.fr/263214168.rdf",
    "https://www.sudoc.fr/267968868.rdf",
    "https://www.sudoc.fr/267968558.rdf",
    "https://www.sudoc.fr/238688380.rdf",
]

GOOGLE_URIS = [
    f"https://www.google.com/search?q={animal}"
    for animal in [
        "cat",
        "dog",
        "horse",
        "cow",
        "pig",
        "sheep",
        "goat",
        "chicken",
        "duck",
        "goose",
        "turkey",
        "pigeon",
        "rabbit",
        "guinea pig",
        "ferret",
        "rat",
        "mouse",
        "hamster",
        "gerbil",
        "chinchilla",
        "hedgehog",
        "sugar glider",
        "parrot",
        "cockatiel",
        "parakeet",
        "canary",
        "finch",
        "lovebird",
        "macaw",
        "cockatoo",
        "conure",
        "budgie",
        "african grey",
        "amazon",
        "cockatiel",
        "parakeet",
        "canary",
        "finch",
        "lovebird",
        "macaw",
        "cockatoo",
        "lion",
        "tiger",
        "leopard",
        "jaguar",
        "cheetah",
        "cougar",
        "wolf",
        "fox",
        "bear",
    ]
]

import asyncio
import time

import aiohttp


async def parallel_queries(uris: list[str]):
    # Schedule every request at once, then wait for all of them to finish
    tasks = [asyncio.create_task(fetch_publication(uri)) for uri in uris]
    start_time = time.time()
    await asyncio.gather(*tasks)
    print("Total time --- %s seconds ---" % (time.time() - start_time))


async def sequential_queries(uris: list[str]):
    # Send the requests one after the other
    start_time = time.time()
    for uri in uris:
        await fetch_publication(uri)
    print("Total time --- %s seconds ---" % (time.time() - start_time))


async def fetch_publication(uri: str) -> str:
    # One ClientSession per request, as in the original experiment
    async with aiohttp.ClientSession(
        connector=aiohttp.TCPConnector(limit=0)  # 0 = no limit on simultaneous connections
    ) as session:
        start_time = time.time()
        async with session.get(uri) as response:
            body = await response.text()
            print("HTTP response --- %s seconds --- %s" % ((time.time() - start_time), uri))
            return body


async def main():
    # Other runs, disabled in the original script:
    # print("****************************")
    # print("Google URIs sequentially")
    # await sequential_queries(GOOGLE_URIS)
    # print("****************************")
    # print("Google URIs in parallel")
    # await parallel_queries(GOOGLE_URIS)
    # print("****************************")
    # print("Sudoc URIs sequentially")
    # await sequential_queries(SUDOC_URIS)
    print("Sudoc URIs in parallel")
    await parallel_queries(SUDOC_URIS)


if __name__ == "__main__":
    # In a notebook (Jupyter, Colab), where an event loop is already running,
    # replace this line with: await main()
    asyncio.run(main())
```
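
The difference between `sequential_queries` and `parallel_queries` can be reproduced without touching the network. This is an illustration only: `fake_fetch`, `sequential` and `parallel` are hypothetical stand-ins that sleep for a fixed delay instead of sending an HTTP request, so the sequential total grows with the number of URIs while `asyncio.gather` keeps it close to a single delay.

```python
import asyncio
import time


async def fake_fetch(uri: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for fetch_publication: waits instead of doing HTTP."""
    await asyncio.sleep(delay)
    return uri


async def sequential(uris: list[str]) -> float:
    start = time.perf_counter()
    for uri in uris:
        await fake_fetch(uri)  # each wait starts only after the previous one ends
    return time.perf_counter() - start


async def parallel(uris: list[str]) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_fetch(uri) for uri in uris))  # all waits overlap
    return time.perf_counter() - start


if __name__ == "__main__":
    uris = [f"uri-{i}" for i in range(10)]
    print(f"sequential: {asyncio.run(sequential(uris)):.2f} s")  # roughly 10 x 0.05 s
    print(f"parallel:   {asyncio.run(parallel(uris)):.2f} s")  # roughly 0.05 s
```

A real server does not answer like a fixed sleep: in the log below, with all 50 requests sent at once, individual response times range from under 1 second to over 17 seconds.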

kerphi commented Nov 10, 2023

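Running this script prints the following. Response times climb from under 1 second to over 17 seconds, and the run stops with a `CancelledError` before all 50 requests have completed: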

Sudoc URIs in parallel
HTTP response --- 0.7648427486419678 seconds --- https://www.sudoc.fr/248493957.rdf
HTTP response --- 0.7997128963470459 seconds --- https://www.sudoc.fr/244375704.rdf
HTTP response --- 0.8402276039123535 seconds --- https://www.sudoc.fr/224827553.rdf
HTTP response --- 0.8613252639770508 seconds --- https://www.sudoc.fr/256277346.rdf
HTTP response --- 0.9453489780426025 seconds --- https://www.sudoc.fr/254932762.rdf
HTTP response --- 0.9460756778717041 seconds --- https://www.sudoc.fr/158105621.rdf
HTTP response --- 0.97591233253479 seconds --- https://www.sudoc.fr/153848030.rdf
HTTP response --- 0.984994649887085 seconds --- https://www.sudoc.fr/254823750.rdf
HTTP response --- 1.0033390522003174 seconds --- https://www.sudoc.fr/272009032.rdf
HTTP response --- 0.9939169883728027 seconds --- https://www.sudoc.fr/204182174.rdf
HTTP response --- 1.0167269706726074 seconds --- https://www.sudoc.fr/23011279X.rdf
HTTP response --- 1.023681879043579 seconds --- https://www.sudoc.fr/224827553.rdf
HTTP response --- 1.0870180130004883 seconds --- https://www.sudoc.fr/261936891.rdf
HTTP response --- 1.1968104839324951 seconds --- https://www.sudoc.fr/190605901.rdf
HTTP response --- 1.2967240810394287 seconds --- https://www.sudoc.fr/252448200.rdf
HTTP response --- 1.9157030582427979 seconds --- https://www.sudoc.fr/158105621.rdf
HTTP response --- 2.0624425411224365 seconds --- https://www.sudoc.fr/250190559.rdf
HTTP response --- 2.134915828704834 seconds --- https://www.sudoc.fr/267968612.rdf
HTTP response --- 3.9443376064300537 seconds --- https://www.sudoc.fr/254932762.rdf
HTTP response --- 3.9546995162963867 seconds --- https://www.sudoc.fr/153848030.rdf
HTTP response --- 3.989595413208008 seconds --- https://www.sudoc.fr/204182174.rdf
HTTP response --- 4.00037693977356 seconds --- https://www.sudoc.fr/189422262.rdf
HTTP response --- 4.6783223152160645 seconds --- https://www.sudoc.fr/238688380.rdf
HTTP response --- 8.002959489822388 seconds --- https://www.sudoc.fr/189422262.rdf
HTTP response --- 8.133774042129517 seconds --- https://www.sudoc.fr/250190559.rdf
HTTP response --- 8.693456649780273 seconds --- https://www.sudoc.fr/267968558.rdf
HTTP response --- 16.38490653038025 seconds --- https://www.sudoc.fr/230858538.rdf
HTTP response --- 16.407524824142456 seconds --- https://www.sudoc.fr/203893433.rdf
HTTP response --- 16.41716194152832 seconds --- https://www.sudoc.fr/190605901.rdf
HTTP response --- 16.46275568008423 seconds --- https://www.sudoc.fr/261936891.rdf
HTTP response --- 17.110564470291138 seconds --- https://www.sudoc.fr/267968558.rdf
---------------------------------------------------------------------------
CancelledError                            Traceback (most recent call last)
<ipython-input-2-61c0a0664370> in fetch_publication(uri)
    134         start_time = time.time()
--> 135         async with session.get(uri) as response:
    136             await response.text()
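
The log does not show what cancelled the requests. Whatever the cause, a per-request timeout combined with `asyncio.gather(..., return_exceptions=True)` keeps one slow or failing request from aborting the whole batch and reports which URIs failed. This is a minimal sketch under that assumption: `fake_fetch`, `fetch_with_timeout` and `run_all` are hypothetical helpers, and `fake_fetch` sleeps instead of calling the Sudoc API; in the real script `asyncio.wait_for` would wrap `fetch_publication(uri)`.

```python
import asyncio


async def fake_fetch(uri: str, delay: float) -> str:
    """Hypothetical stand-in for fetch_publication: sleeps instead of sending HTTP."""
    await asyncio.sleep(delay)
    return f"body of {uri}"


async def fetch_with_timeout(uri: str, delay: float, timeout: float) -> str:
    # Each request gets its own deadline, so a slow one fails on its own
    return await asyncio.wait_for(fake_fetch(uri, delay), timeout)


async def run_all(jobs: dict[str, float], timeout: float) -> dict[str, object]:
    results = await asyncio.gather(
        *(fetch_with_timeout(uri, delay, timeout) for uri, delay in jobs.items()),
        # Exceptions are returned in the results list instead of propagating out of gather
        return_exceptions=True,
    )
    return dict(zip(jobs, results))  # gather keeps the input order


if __name__ == "__main__":
    jobs = {"fast.rdf": 0.01, "slow.rdf": 1.0, "ok.rdf": 0.02}
    for uri, result in asyncio.run(run_all(jobs, timeout=0.1)).items():
        status = "timed out" if isinstance(result, asyncio.TimeoutError) else result
        print(uri, "->", status)
```

With aiohttp, a session-wide deadline can also be set by passing `timeout=aiohttp.ClientTimeout(total=...)` to `ClientSession`.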


kerphi commented Nov 10, 2023

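Limiting the number of concurrent requests (here to 5, with an `asyncio.Semaphore`) gives a better result: every request completes, each in about 0.7 to 2.1 seconds, whereas without the limit some responses took more than 16 seconds and the run ended with a `CancelledError`.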

SUDOC_URIS = [
    "https://www.sudoc.fr/190605901.rdf",
    "https://www.sudoc.fr/244375704.rdf",
    "https://www.sudoc.fr/224827553.rdf",
    "https://www.sudoc.fr/261936891.rdf",
    "https://www.sudoc.fr/256277346.rdf",
    "https://www.sudoc.fr/272009032.rdf",
    "https://www.sudoc.fr/248493957.rdf",
    "https://www.sudoc.fr/23011279X.rdf",
    "https://www.sudoc.fr/252448200.rdf",
    "https://www.sudoc.fr/250190559.rdf",
    "https://www.sudoc.fr/153848030.rdf",
    "https://www.sudoc.fr/254932762.rdf",
    "https://www.sudoc.fr/158105621.rdf",
    "https://www.sudoc.fr/267968612.rdf",
    "https://www.sudoc.fr/230858554.rdf",
    "https://www.sudoc.fr/230858538.rdf",
    "https://www.sudoc.fr/254823750.rdf",
    "https://www.sudoc.fr/204182174.rdf",
    "https://www.sudoc.fr/189422262.rdf",
    "https://www.sudoc.fr/043828302.rdf",
    "https://www.sudoc.fr/203893433.rdf",
    "https://www.sudoc.fr/263214168.rdf",
    "https://www.sudoc.fr/267968868.rdf",
    "https://www.sudoc.fr/267968558.rdf",
    "https://www.sudoc.fr/238688380.rdf",
    "https://www.sudoc.fr/190605901.rdf",
    "https://www.sudoc.fr/244375704.rdf",
    "https://www.sudoc.fr/224827553.rdf",
    "https://www.sudoc.fr/261936891.rdf",
    "https://www.sudoc.fr/256277346.rdf",
    "https://www.sudoc.fr/272009032.rdf",
    "https://www.sudoc.fr/248493957.rdf",
    "https://www.sudoc.fr/23011279X.rdf",
    "https://www.sudoc.fr/252448200.rdf",
    "https://www.sudoc.fr/250190559.rdf",
    "https://www.sudoc.fr/153848030.rdf",
    "https://www.sudoc.fr/254932762.rdf",
    "https://www.sudoc.fr/158105621.rdf",
    "https://www.sudoc.fr/267968612.rdf",
    "https://www.sudoc.fr/230858554.rdf",
    "https://www.sudoc.fr/230858538.rdf",
    "https://www.sudoc.fr/254823750.rdf",
    "https://www.sudoc.fr/204182174.rdf",
    "https://www.sudoc.fr/189422262.rdf",
    "https://www.sudoc.fr/043828302.rdf",
    "https://www.sudoc.fr/203893433.rdf",
    "https://www.sudoc.fr/263214168.rdf",
    "https://www.sudoc.fr/267968868.rdf",
    "https://www.sudoc.fr/267968558.rdf",
    "https://www.sudoc.fr/238688380.rdf",
]

GOOGLE_URIS = [
    f"https://www.google.com/search?q={animal}"
    for animal in [
        "cat",
        "dog",
        "horse",
        "cow",
        "pig",
        "sheep",
        "goat",
        "chicken",
        "duck",
        "goose",
        "turkey",
        "pigeon",
        "rabbit",
        "guinea pig",
        "ferret",
        "rat",
        "mouse",
        "hamster",
        "gerbil",
        "chinchilla",
        "hedgehog",
        "sugar glider",
        "parrot",
        "cockatiel",
        "parakeet",
        "canary",
        "finch",
        "lovebird",
        "macaw",
        "cockatoo",
        "conure",
        "budgie",
        "african grey",
        "amazon",
        "cockatiel",
        "parakeet",
        "canary",
        "finch",
        "lovebird",
        "macaw",
        "cockatoo",
        "lion",
        "tiger",
        "leopard",
        "jaguar",
        "cheetah",
        "cougar",
        "wolf",
        "fox",
        "bear",
    ]
]

import asyncio
import time

import aiohttp

MAX_CONCURRENT_REQUESTS = 5  # Maximum number of requests in flight at the same time
# A module-level semaphore needs Python 3.10+: older versions bind it, at creation,
# to the loop from asyncio.get_event_loop(), not to the one asyncio.run() creates.
semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)


async def parallel_queries(uris: list[str]):
    tasks = [asyncio.create_task(fetch_publication(uri)) for uri in uris]
    start_time = time.time()
    await asyncio.gather(*tasks)
    print("Total time --- %s seconds ---" % (time.time() - start_time))


async def sequential_queries(uris: list[str]):
    start_time = time.time()
    for uri in uris:
        await fetch_publication(uri)
    print("Total time --- %s seconds ---" % (time.time() - start_time))


async def fetch_publication(uri: str) -> str:
    # Tasks beyond MAX_CONCURRENT_REQUESTS wait here until a running request finishes
    async with semaphore:
        async with aiohttp.ClientSession(
            connector=aiohttp.TCPConnector(limit=0)  # 0 = no limit on simultaneous connections
        ) as session:
            start_time = time.time()
            async with session.get(uri) as response:
                body = await response.text()
                print("HTTP response --- %s seconds --- %s" % ((time.time() - start_time), uri))
                return body


async def main():
    # Other runs, disabled in the original script:
    # print("****************************")
    # print("Google URIs sequentially")
    # await sequential_queries(GOOGLE_URIS)
    # print("****************************")
    # print("Google URIs in parallel")
    # await parallel_queries(GOOGLE_URIS)
    # print("****************************")
    # print("Sudoc URIs sequentially")
    # await sequential_queries(SUDOC_URIS)
    print("Sudoc URIs in parallel")
    await parallel_queries(SUDOC_URIS)


if __name__ == "__main__":
    # In a notebook (Jupyter, Colab), where an event loop is already running,
    # replace this line with: await main()
    asyncio.run(main())
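
A semaphore bounds how many requests are in flight at the same moment; it does not set a number of requests per second (a request that finishes quickly frees its slot sooner, so more requests per second can start). The sketch below checks the bound without network access: each hypothetical `simulated_request` holds a slot while it sleeps, and `measure_peak` (also hypothetical) records the largest number of simultaneous holders. It creates the semaphore inside the running event loop, which sidesteps the loop-binding issue that a module-level semaphore has before Python 3.10.

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 5


async def simulated_request(sem: asyncio.Semaphore, stats: dict[str, int]) -> None:
    """Hypothetical stand-in for fetch_publication: holds a slot while sleeping."""
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # stands in for the HTTP round trip
        stats["in_flight"] -= 1


async def measure_peak(n_requests: int, limit: int) -> int:
    sem = asyncio.Semaphore(limit)  # created inside the running event loop
    stats = {"in_flight": 0, "peak": 0}
    await asyncio.gather(*(simulated_request(sem, stats) for _ in range(n_requests)))
    return stats["peak"]


if __name__ == "__main__":
    print(asyncio.run(measure_peak(50, MAX_CONCURRENT_REQUESTS)))  # prints 5
```

In the script above the same bound applies to `fetch_publication`: at most five Sudoc requests are open at once, and the other tasks wait at `async with semaphore:`.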


kerphi commented Nov 10, 2023

Here is the output. All 50 requests complete, in about 11 seconds in total:

Sudoc URIs in parallel
HTTP response --- 1.0189406871795654 seconds --- https://www.sudoc.fr/244375704.rdf
HTTP response --- 1.0590949058532715 seconds --- https://www.sudoc.fr/261936891.rdf
HTTP response --- 0.878089427947998 seconds --- https://www.sudoc.fr/272009032.rdf
HTTP response --- 1.9887638092041016 seconds --- https://www.sudoc.fr/224827553.rdf
HTTP response --- 1.9910883903503418 seconds --- https://www.sudoc.fr/190605901.rdf
HTTP response --- 1.9936246871948242 seconds --- https://www.sudoc.fr/256277346.rdf
HTTP response --- 0.9469921588897705 seconds --- https://www.sudoc.fr/248493957.rdf
HTTP response --- 0.8781380653381348 seconds --- https://www.sudoc.fr/23011279X.rdf
HTTP response --- 0.8263168334960938 seconds --- https://www.sudoc.fr/254932762.rdf
HTTP response --- 0.8461554050445557 seconds --- https://www.sudoc.fr/153848030.rdf
HTTP response --- 1.0137169361114502 seconds --- https://www.sudoc.fr/250190559.rdf
HTTP response --- 1.0274970531463623 seconds --- https://www.sudoc.fr/252448200.rdf
HTTP response --- 1.0272748470306396 seconds --- https://www.sudoc.fr/158105621.rdf
HTTP response --- 1.0015439987182617 seconds --- https://www.sudoc.fr/230858554.rdf
HTTP response --- 0.9988603591918945 seconds --- https://www.sudoc.fr/230858538.rdf
HTTP response --- 1.0026895999908447 seconds --- https://www.sudoc.fr/254823750.rdf
HTTP response --- 0.8878316879272461 seconds --- https://www.sudoc.fr/189422262.rdf
HTTP response --- 0.8609805107116699 seconds --- https://www.sudoc.fr/203893433.rdf
HTTP response --- 2.1018757820129395 seconds --- https://www.sudoc.fr/267968612.rdf
HTTP response --- 0.937183141708374 seconds --- https://www.sudoc.fr/043828302.rdf
HTTP response --- 0.8924369812011719 seconds --- https://www.sudoc.fr/263214168.rdf
HTTP response --- 1.929969310760498 seconds --- https://www.sudoc.fr/204182174.rdf
HTTP response --- 1.1308016777038574 seconds --- https://www.sudoc.fr/267968868.rdf
HTTP response --- 0.7630870342254639 seconds --- https://www.sudoc.fr/190605901.rdf
HTTP response --- 1.578714370727539 seconds --- https://www.sudoc.fr/267968558.rdf
HTTP response --- 0.7819344997406006 seconds --- https://www.sudoc.fr/244375704.rdf
HTTP response --- 1.6291162967681885 seconds --- https://www.sudoc.fr/238688380.rdf
HTTP response --- 0.8392984867095947 seconds --- https://www.sudoc.fr/224827553.rdf
HTTP response --- 0.8056166172027588 seconds --- https://www.sudoc.fr/261936891.rdf
HTTP response --- 0.729097843170166 seconds --- https://www.sudoc.fr/272009032.rdf
HTTP response --- 0.7410075664520264 seconds --- https://www.sudoc.fr/248493957.rdf
HTTP response --- 0.8177986145019531 seconds --- https://www.sudoc.fr/256277346.rdf
HTTP response --- 0.7702634334564209 seconds --- https://www.sudoc.fr/23011279X.rdf
HTTP response --- 0.8679947853088379 seconds --- https://www.sudoc.fr/252448200.rdf
HTTP response --- 0.7291524410247803 seconds --- https://www.sudoc.fr/254932762.rdf
HTTP response --- 0.7622101306915283 seconds --- https://www.sudoc.fr/153848030.rdf
HTTP response --- 0.8571999073028564 seconds --- https://www.sudoc.fr/250190559.rdf
HTTP response --- 0.8232910633087158 seconds --- https://www.sudoc.fr/158105621.rdf
HTTP response --- 0.7298517227172852 seconds --- https://www.sudoc.fr/230858554.rdf
HTTP response --- 0.7318100929260254 seconds --- https://www.sudoc.fr/230858538.rdf
HTTP response --- 0.7777314186096191 seconds --- https://www.sudoc.fr/254823750.rdf
HTTP response --- 0.9067645072937012 seconds --- https://www.sudoc.fr/267968612.rdf
HTTP response --- 0.8124053478240967 seconds --- https://www.sudoc.fr/204182174.rdf
HTTP response --- 0.7459297180175781 seconds --- https://www.sudoc.fr/043828302.rdf
HTTP response --- 0.7642695903778076 seconds --- https://www.sudoc.fr/189422262.rdf
HTTP response --- 0.738013505935669 seconds --- https://www.sudoc.fr/203893433.rdf
HTTP response --- 0.7079699039459229 seconds --- https://www.sudoc.fr/263214168.rdf
HTTP response --- 1.0488197803497314 seconds --- https://www.sudoc.fr/267968868.rdf
HTTP response --- 1.4072630405426025 seconds --- https://www.sudoc.fr/267968558.rdf
HTTP response --- 1.4552791118621826 seconds --- https://www.sudoc.fr/238688380.rdf
Total time --- 11.022102117538452 seconds ---
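
The total is consistent with the concurrency limit. Summing the 50 per-request durations printed above gives about 51.6 s (a mean of roughly 1.03 s). Each printed duration covers only the timed `session.get` call and reading the body, so the time a request actually holds its semaphore slot is at least that long. With at most $k = 5$ requests running at once, the batch therefore cannot finish in less than:

$$
T_{\text{total}} \;\ge\; \frac{1}{k}\sum_{i=1}^{N} t_i \;\approx\; \frac{51.6\ \text{s}}{5} \;\approx\; 10.3\ \text{s}, \qquad N = 50,\; k = 5
$$

The measured 11.0 s is close to this bound, which suggests the five slots stay busy almost the whole time; the remaining 0.7 s or so plausibly comes from session setup and teardown, which the per-request timer does not include, and from the last requests finishing with fewer than five running.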
