#
# This small example shows you how to access JS-based requests via Selenium.
# This way, one can access raw data for scraping,
# for example on many JS-intensive/React-based websites.
#
import json
from time import sleep

from selenium import webdriver
from selenium.webdriver import DesiredCapabilities

# make chrome log requests
capabilities = DesiredCapabilities.CHROME
capabilities["loggingPrefs"] = {"performance": "ALL"}  # newer: goog:loggingPrefs
driver = webdriver.Chrome(
    desired_capabilities=capabilities, executable_path="./chromedriver"
)

# fetch a site that does xhr requests
driver.get("https://sitewithajaxorsomething.com")
sleep(5)  # wait for the requests to take place

# extract requests from logs
logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]


def log_filter(log_):
    return (
        # is an actual response
        log_["method"] == "Network.responseReceived"
        # and json
        and "json" in log_["params"]["response"]["mimeType"]
    )


for log in filter(log_filter, logs):
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    print(f"Caught {resp_url}")
    print(driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}))
Hello, thanks for sharing this gist. Your code works fine, but I have one small issue I can't get my head around.
I'm trying to log an XHR call made by a web worker,
so the performance log of the main thread doesn't list the request I want.
In Chrome, I only get that request when I select the worker in the Console tab and run "performance.getEntries()".
Any idea how to do that in Selenium?
I used this method for a while, but at some point during a script run, and for no clear reason, the "driver.execute_cdp_cmd" function started throwing this error:
'WebDriver' object has no attribute 'execute_cdp_cmd'
I'm looking for an alternative solution, so feel free to suggest what could be done...
Hey @milanbog92, how about:
- https://pypi.org/project/mitmproxy/ to catch requests
- a regular browser (e.g. by hotkeys) or maybe playwright with some adaptations to be undetectable
@lorey Thanks for the fast response!
Since I am executing "python3 script.py" from an external script, it seems that my system loaded the wrong Python version. Python 3.6 shows the error consistently, while Python 3.9 works as expected. Hopefully this will help someone...
I went through all the available solutions and I believe there is no better one: Selenium can't load a Chrome extension that uses the chrome.debugger API, and I have had no luck with hotkeys so far in my complex environment.
@lorey, thanks for your fantastic work. Just one more thing:
is there a way to get only the response data from a specific URL?
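One way to approach the question above is to match on the response URL before asking Chrome for the body. This is only a sketch: `find_responses` and `print_bodies` are hypothetical helper names, `url_part` is an arbitrary substring you choose, and it assumes the driver was started with performance logging enabled as in the gist.

```python
import json


def find_responses(logs_raw, url_part):
    """Return (request_id, url) pairs for responses whose URL contains url_part."""
    matches = []
    for entry in logs_raw:
        message = json.loads(entry["message"])["message"]
        if message["method"] != "Network.responseReceived":
            continue
        url = message["params"]["response"]["url"]
        if url_part in url:
            matches.append((message["params"]["requestId"], url))
    return matches


def print_bodies(driver, url_part):
    """Print the body of every logged response whose URL contains url_part."""
    for request_id, url in find_responses(driver.get_log("performance"), url_part):
        body = driver.execute_cdp_cmd(
            "Network.getResponseBody", {"requestId": request_id}
        )
        print(url, body)
```

Calling `print_bodies(driver, "/api/items")` after the page has loaded would then print only the responses for that (made-up) path.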
Hi,
I use performance_logs instead of the logs_raw variable name, skip "chrome://favicon2" and search for image_name:
performance_logs = driver.get_log("performance")
for performance_log in performance_logs:
    performance_log_json = json.loads(performance_log["message"])
    if performance_log_json["message"]["method"] == 'Network.responseReceived':
        if performance_log_json["message"]["params"]["response"]["url"].find('chrome://favicon2/') != -1:
            continue
        if performance_log_json["message"]["params"]["response"]["url"].find(image_name) != -1:
            print(performance_log_json["message"]["params"]["response"]["url"])
            print(performance_log_json["message"]["params"]["requestId"])
            print(performance_log_json["message"]["params"]["type"])
Desired Capabilities is deprecated and can't be used anymore. How can I achieve this without it?
@LiamKrenn take a look at this. I haven't tried it, but hopefully it works: https://stackoverflow.com/questions/76622916/converting-desired-capabilities-to-options-in-selenium-python
Hello guys, for Selenium 4.x use this:
driver.options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver.get(url)
Then just follow the steps from line 24 onward.
Works for Selenium 4.13.0.
For Selenium 4.15, set the option:
options = webdriver.ChromeOptions()
options.set_capability(
    "goog:loggingPrefs", {"performance": "ALL"}
)
driver = webdriver.Chrome(options=options)
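Putting the options-based setup together with the extraction steps from the gist could look roughly like this. It is a sketch rather than a tested replacement: `parse_messages`, `is_json_response` and `main` are names made up here, the 5-second sleep is carried over from the gist, and it assumes a Selenium 4 install that can locate a matching chromedriver on its own.

```python
import json
from time import sleep


def parse_messages(logs_raw):
    """Unwrap the DevTools message from each raw performance log entry."""
    return [json.loads(entry["message"])["message"] for entry in logs_raw]


def is_json_response(message):
    """True for Network.responseReceived events with a JSON mime type."""
    return (
        message["method"] == "Network.responseReceived"
        and "json" in message["params"]["response"].get("mimeType", "")
    )


def main(url):
    # Imported here so the helpers above stay usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        sleep(5)  # wait for the requests to take place
        messages = parse_messages(driver.get_log("performance"))
        for message in filter(is_json_response, messages):
            request_id = message["params"]["requestId"]
            print(f"Caught {message['params']['response']['url']}")
            print(
                driver.execute_cdp_cmd(
                    "Network.getResponseBody", {"requestId": request_id}
                )
            )
    finally:
        driver.quit()
```

Run it with `main("https://sitewithajaxorsomething.com")`, the placeholder URL from the gist.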
Something I noticed is that you need to filter out Preflight requests.
if event['params']['type'] != 'Preflight':
    . . .
Otherwise, you might get this error:
{"code":-32000,"message":"No resource with given identifier found"}
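A small sketch of how that check might be folded into the loop from the gist. `is_fetchable` and `fetch_bodies` are hypothetical names, and `events` is assumed to be the list of already-parsed log messages (the `logs` variable in the gist).

```python
def is_fetchable(event):
    """True for response events that are not CORS preflight requests."""
    return (
        event["method"] == "Network.responseReceived"
        and event["params"].get("type") != "Preflight"
    )


def fetch_bodies(driver, events):
    """Yield (url, body) for every non-preflight response in events."""
    for event in filter(is_fetchable, events):
        request_id = event["params"]["requestId"]
        body = driver.execute_cdp_cmd(
            "Network.getResponseBody", {"requestId": request_id}
        )
        yield event["params"]["response"]["url"], body
```

Other responses can still fail with the same error, for example if Chrome no longer holds the resource, so this only removes one known cause.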
I want to print the response after clicking a button, so I can tell whether the click succeeded or failed. Right now the only way to see the status is to open DevTools, go to the Network tab and check the response manually.
So I need a way to print this status to the log.
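The HTTP status is part of each `Network.responseReceived` event in the performance log, so it could be read from there without opening DevTools. A minimal sketch, assuming performance logging is enabled as above; `response_statuses` is a hypothetical helper and `url_part` is whatever substring identifies the request triggered by the click.

```python
import json


def response_statuses(logs_raw, url_part=""):
    """Return (status, url) for each logged response whose URL contains url_part."""
    results = []
    for entry in logs_raw:
        message = json.loads(entry["message"])["message"]
        if message["method"] != "Network.responseReceived":
            continue
        response = message["params"]["response"]
        if url_part in response["url"]:
            results.append((response["status"], response["url"]))
    return results


# Possible usage around a click ("/api/" and `button` are placeholders):
#   driver.get_log("performance")   # discard entries logged before the click
#   button.click()
#   sleep(2)                        # give the request time to complete
#   for status, url in response_statuses(driver.get_log("performance"), "/api/"):
#       print(status, url)
```

Reading the log once before the click relies on `get_log` returning only entries recorded since the previous call, which is worth verifying with your driver version.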