@robbintt
Created November 24, 2015 05:06
an example of threading requests
"""
Specification for retrieve_data:
If the status code is 200, the JSON object is recorded in RESULTS_LIST.
If not, the status code and URL are printed as an error.

Instead of subclassing Thread to store the data, we use a global variable,
avoiding objects for now to keep everything accessible.
"""
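import time
import threading

import requests

# Shared by all worker threads; each thread appends one entry.
RESULTS_LIST = list()


def retrieve_data(target_url):
    """Retrieve JSON data from a URL and append it to RESULTS_LIST."""
    r = requests.get(target_url)
    if not r.ok:
        print("{} {}".format(r.status_code, target_url))
    try:
        data = r.json()
    except ValueError:
        # The body was not valid JSON (for example, an HTML error page).
        data = []
    RESULTS_LIST.append([target_url, r.ok, data])

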
if __name__ == "__main__":
    # This block plays the role of a main() function in Python.
    endpoint = "https://data.sfgov.org/resource/zfw6-95su.json"
    # The offset value is appended per request, so the string ends at "=".
    query_string = "?$limit=10&$offset="
    first_record_to_get = 0
    max_records_to_get = 500
    threads = []
    # The offset increases by ten on each request; that is the only change.
    # If there is no more data, Socrata returns an empty result, [].
    # For the sake of sanity, retrieve only the first 500 records.
    for offset in range(first_record_to_get, max_records_to_get, 10):
        constructed_endpoint = endpoint + query_string + str(offset)
        t = threading.Thread(target=retrieve_data, args=(constructed_endpoint,))
        t.start()
        threads.append(t)
        print(constructed_endpoint)
        time.sleep(1)  # throttle to one new request per second (about 50 seconds in total)
        print(len(RESULTS_LIST))
    # Wait for every outstanding request instead of sleeping for a fixed time.
    for t in threads:
        t.join()
    print(len(RESULTS_LIST))  # expected to be 50 results long
    for request in RESULTS_LIST:
        print("{} {} {}".format(request[0], request[1], len(request[2])))
@robbintt

This code would benefit from subclassing Thread and using callbacks in the future. We've avoided exposing any classes and the concept of callbacks here by using a global mutable list to store our results.
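
For reference, here is a rough sketch of what the Thread subclass with a callback might look like. It is only one possible shape, not a finished design: `FetchThread`, `fake_fetch`, and `on_done` are hypothetical names, the example URLs are placeholders, and `fake_fetch` stands in for the real `requests.get(...).json()` call so the sketch runs without network access.

```python
import threading


class FetchThread(threading.Thread):
    """Thread that keeps its own result instead of writing to a global list."""

    def __init__(self, target_url, fetch, callback=None):
        super(FetchThread, self).__init__()
        self.target_url = target_url
        self.fetch = fetch        # hypothetical: any callable that takes a URL
        self.callback = callback  # hypothetical: called as callback(url, result)
        self.result = None

    def run(self):
        # Runs in the worker thread once start() is called.
        self.result = self.fetch(self.target_url)
        if self.callback is not None:
            self.callback(self.target_url, self.result)


def fake_fetch(url):
    # Stand-in for requests.get(url).json() so the sketch runs offline.
    return [{"url": url}]


collected = []
collected_lock = threading.Lock()


def on_done(url, result):
    # The callback runs in the worker thread, so guard the shared list.
    with collected_lock:
        collected.append((url, len(result)))


urls = ["https://example.invalid/data.json?offset=%d" % n for n in (0, 10, 20)]
threads = [FetchThread(u, fake_fetch, on_done) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After `join()` returns, each thread object carries its own `result`, so the global list is no longer needed; the callback is optional and only useful if you want to react as each request finishes.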
