@thehappycheese
Last active November 20, 2023 03:10
GET faster in Python. Maps a pandas Series of URL strings to the data returned from the web, using urllib3.
import concurrent.futures

import pandas as pd
import urllib3


def _load_url(arg):
    """Fetch one URL; return the decoded body, or an error string on a non-200 status."""
    url, http = arg
    response = http.request("GET", url)
    if response.status != 200:
        return f"ERROR: {response.status}"
    return response.data.decode("utf8")


def get_faster(series: pd.Series, max_workers=500):
    """Map a Series of URL strings to a Series of response bodies with the same index."""
    result = None
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor, urllib3.PoolManager(num_pools=50) as http:
        # executor.map consumes the generator immediately and submits every task;
        # results are then yielded in the same order as the input URLs
        responses = executor.map(
            _load_url,
            ((url, http) for index, url in series.items()),
        )
        result = pd.Series(responses, index=series.index)
    return result

# See https://github.com/thehappycheese/nicklinref_rust for context for this example.
# `df` is assumed to be an existing DataFrame with columns road, cwy, slk_from and slk_to.
from get_faster import get_faster

PORT = 8080

map_cwy = {
    "L": "LS",
    "R": "RS",
    "S": "LRS",
}

df["url_geom"] = (
    f"http://localhost:{PORT}/query/"
    + "?road="     + df['road']
    + "&cwy="      + df['cwy'].map(map_cwy)
    + "&slk_from=" + df['slk_from'].astype("str")
    + "&slk_to="   + df['slk_to'].astype("str")
)

df["url_latlon"] = (
    f"http://localhost:{PORT}/query/"
    + "?road=" + df['road']
    + "&cwy="  + df['cwy'].map(map_cwy)
    + "&slk="  + ((df['slk_to'] + df['slk_from']) / 2).astype("str")
    + "&f=latlon"
)

df["geom"] = get_faster(df["url_geom"])

Notes

Since switching from requests.get() to urllib3, this has become the fastest and simplest solution for bulk HTTP requests that I have tried. Performance is in the same ballpark as Excel's =webservice() function, except that Python probably doesn't use all CPU cores.
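
Part of what keeps this approach simple is that `executor.map` yields results in the order of its inputs, not in the order the requests finish, so the results can be assigned straight back onto `series.index`. Here is a minimal sketch of that behaviour, with `time.sleep` standing in for the HTTP request; `slow_echo` and the delays are hypothetical and only there for illustration.

```python
import concurrent.futures
import time


def slow_echo(item):
    """Stand-in for an HTTP request: sleep for `delay`, then return `label`."""
    delay, label = item
    time.sleep(delay)
    return label


# The first item takes the longest, so it finishes last.
items = [(0.3, "first"), (0.2, "second"), (0.1, "third")]

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # map() returns results in input order, regardless of completion order
    results = list(executor.map(slow_echo, items))

print(results)  # ['first', 'second', 'third']
```

Because the order is preserved, there is no need to track which future belongs to which URL.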

TODO: I have read that aiohttp may be able to do better. I have not had time to investigate, but I'd prefer the async/await syntax.
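
For reference, here is a rough, untested sketch of what an aiohttp version might look like. It assumes aiohttp is installed; `fetch_text`, `fetch_all` and the `max_connections` default are hypothetical names and values chosen for illustration, and whether this would actually be faster than the threaded version has not been measured.

```python
import asyncio


async def fetch_text(session, url):
    """Fetch one URL; like _load_url, return an error string on a non-200 status."""
    # session.get() is used as an async context manager that yields the response
    async with session.get(url) as response:
        if response.status != 200:
            return f"ERROR: {response.status}"
        return await response.text(encoding="utf8")


async def fetch_all(urls, max_connections=100):
    """Fetch every URL concurrently and return the bodies in input order."""
    import aiohttp  # third-party; imported here so fetch_text can be used without it

    # TCPConnector(limit=...) caps the number of simultaneous connections
    connector = aiohttp.TCPConnector(limit=max_connections)
    async with aiohttp.ClientSession(connector=connector) as session:
        # asyncio.gather returns results in the order the awaitables were passed
        return await asyncio.gather(*(fetch_text(session, url) for url in urls))


# Hypothetical usage, assuming the `df["url_geom"]` column from the example above:
# df["geom"] = pd.Series(asyncio.run(fetch_all(df["url_geom"])), index=df.index)
```

In a notebook that already has a running event loop, `await fetch_all(...)` would be needed in place of `asyncio.run(...)`.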
