@lorey
Last active February 24, 2024 01:23
Access Chrome's network tab (e.g. XHR requests) with Selenium
#
# This small example shows how to access JS-based requests via Selenium.
# This gives you access to the raw data for scraping,
# for example on many JS-intensive/React-based websites.
#
import json
from time import sleep

from selenium import webdriver
from selenium.webdriver import DesiredCapabilities

# make chrome log requests
capabilities = DesiredCapabilities.CHROME
capabilities["loggingPrefs"] = {"performance": "ALL"}  # newer: goog:loggingPrefs
driver = webdriver.Chrome(
    desired_capabilities=capabilities, executable_path="./chromedriver"
)

# fetch a site that does xhr requests
driver.get("https://sitewithajaxorsomething.com")
sleep(5)  # wait for the requests to take place

# extract requests from logs
logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]


def log_filter(log_):
    return (
        # is an actual response
        log_["method"] == "Network.responseReceived"
        # and json
        and "json" in log_["params"]["response"]["mimeType"]
    )


for log in filter(log_filter, logs):
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    print(f"Caught {resp_url}")
    print(driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}))
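
The script above prints the raw result of `Network.getResponseBody`. In the Chrome DevTools Protocol that result is a dict with a `body` string and a `base64Encoded` flag, so a small decoding step is usually needed before the JSON can be used. A minimal sketch, assuming the payload is UTF-8 JSON; the helper name `decode_cdp_body` and the sample dict are hypothetical and not part of the gist or of Selenium:

```python
import base64
import json


def decode_cdp_body(result):
    """Turn a Network.getResponseBody result into parsed JSON.

    `result` is the dict returned by driver.execute_cdp_cmd(...).
    The helper name is hypothetical, not a Selenium API.
    """
    body = result["body"]
    if result.get("base64Encoded"):
        # Some payloads arrive base64-encoded; decode them to text first.
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)


# Stand-in for a real CDP result, so the sketch runs without a browser.
sample = {"body": '{"items": [1, 2, 3]}', "base64Encoded": False}
print(decode_cdp_body(sample)["items"])
```

In the loop above, this would wrap the `execute_cdp_cmd` call in place of printing the raw dict.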
@lorey
Author

lorey commented Nov 3, 2020

@sahin52

sahin52 commented Mar 1, 2021

I've been working on this for at least a week, and thinking about it for a few months. You are great.

@lee-hodg

lee-hodg commented May 6, 2021

This is really great. However, at the final step, getting the response body using the requestId, I get:

self.driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})
2021-05-06 14:04:12 jim-ThinkPad-S5-S540 selenium.webdriver.remote.remote_connection[36958] DEBUG POST http://127.0.0.1:42437/session/b29c0918324a3defb5d6d11100dd3bec/goog/cdp/execute {"cmd": "Network.getResponseBody", "params": {"requestId": "37056.284"}}
2021-05-06 14:04:12 jim-ThinkPad-S5-S540 urllib3.connectionpool[36958] DEBUG http://127.0.0.1:42437 "POST /session/b29c0918324a3defb5d6d11100dd3bec/goog/cdp/execute HTTP/1.1" 500 253
2021-05-06 14:04:12 jim-ThinkPad-S5-S540 selenium.webdriver.remote.remote_connection[36958] DEBUG Finished Request
*** selenium.common.exceptions.WebDriverException: Message: unknown error: unhandled inspector error: {"code":-32000,"message":"No resource with given identifier found"}
  (Session info: chrome=89.0.4389.114)

@shans0535

Can you please help me do this with the Firefox browser? I tried a few steps, but it didn't work out.

@lorey
Author

lorey commented Jun 2, 2021

Sorry, this is not intended for Firefox, @shans0535. Have you tried selenium-wire or just a mitm-proxy instead?

@JaeEon-Ryu

@lee-hodg

I think the error comes from requesting the body of a response for which Chrome holds no resource.
Wrapping the call in try-except works well.
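
The try-except approach could look like the sketch below. The helper name `get_body_or_none` and the `FakeDriver` stand-in are hypothetical, added only so the example runs without a browser. With a real Selenium driver the error is raised as `selenium.common.exceptions.WebDriverException`, and narrowing the `except` clause to that class is preferable to the broad `Exception` used here:

```python
def get_body_or_none(driver, request_id):
    """Fetch a response body via CDP, returning None when Chrome has none."""
    try:
        return driver.execute_cdp_cmd(
            "Network.getResponseBody", {"requestId": request_id}
        )
    except Exception as exc:  # assumption: narrow to WebDriverException in real use
        print(f"No body for request {request_id}: {exc}")
        return None


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver (not a real API)."""

    def execute_cdp_cmd(self, cmd, params):
        if params["requestId"] == "missing":
            raise RuntimeError("No resource with given identifier found")
        return {"body": "{}", "base64Encoded": False}


driver = FakeDriver()
print(get_body_or_none(driver, "1000.1"))
print(get_body_or_none(driver, "missing"))
```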

@SvenTheSwede

I was working on a way to do this for a week or two before I found your post. Works beautifully for what I needed, thanks a bunch.

@megapegabot

It works! Thanks :) I was looking for a solution for a long time, and you helped! 👍

@lorey
Author

lorey commented Nov 8, 2021

Thanks for the kindness everyone. Glad I could help you out. Please feel free to check out my profile with similar tools and libraries at https://github.com/lorey <3

@billy8407

Awesome!!

@BlondinkaQ

How can I get XHR requests from a real browser online?

@lorey
Author

lorey commented Jan 10, 2022

Selenium is using a real browser. If you want to do it manually yourself, check out developer tools (e.g. F12 in Chrome, tab "Network").

@nikolaysm

@lorey, thanks for sharing.

For Chrome >= 75 we have to make a small change.

As specified in the release notes for ChromeDriver 75.0.3770.8, the capability loggingPrefs has been renamed to goog:loggingPrefs.

@skndrvoip

I'm looking to print the response after clicking a button, so I can tell whether the click succeeded or failed. Right now the only way to know the status is to open DevTools, go to the Network tab, and check the response manually.
So I need a method to print this status to the log.
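
One possible approach, sketched under assumptions: each `Network.responseReceived` event in the performance log carries the HTTP status in `params.response.status`, so the log fetched after the click can be reduced to (status, URL) pairs. The helper name `response_statuses` and the sample entry are hypothetical; in a real run `logs_raw` would come from `driver.get_log("performance")`:

```python
import json


def response_statuses(logs_raw):
    """Return (status, url) for every response found in the performance log."""
    statuses = []
    for entry in logs_raw:
        message = json.loads(entry["message"])["message"]
        if message["method"] == "Network.responseReceived":
            response = message["params"]["response"]
            statuses.append((response["status"], response["url"]))
    return statuses


# Hand-written sample entry so the sketch runs without a browser.
sample_logs = [
    {
        "message": json.dumps(
            {
                "message": {
                    "method": "Network.responseReceived",
                    "params": {
                        "requestId": "1000.1",
                        "response": {"status": 200, "url": "https://example.com/api"},
                    },
                }
            }
        )
    }
]
for status, url in response_statuses(sample_logs):
    print(status, url)
```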

@hamzaadad

hamzaadad commented May 23, 2022

Hello, thanks for sharing this gist. Your code works fine, but I have one small issue that I can't get my head around.
I'm trying to log an XHR call made by a web worker.
The performance log of the main thread doesn't list the request I want.
In Chrome, when I select the worker in the Console tab, I can execute "performance.getEntries()", and only then do I get the request I want.
Any idea how to do that in Selenium?

@milanbog92

milanbog92 commented Nov 22, 2022

I used this method for a while. After some time, during a script run and for no clear reason, the "driver.execute_cdp_cmd" function throws this error:
'WebDriver' object has no attribute 'execute_cdp_cmd'

I'm looking for an alternative solution, so feel free to suggest what could be done...

@lorey
Author

lorey commented Nov 22, 2022

Hey @milanbog92, how about:

@milanbog92

@lorey Thanks for the fast response!

Since I am executing my "python3 script.py" from an external script, it seems that my system loaded the wrong Python version. Python 3.6 shows the error consistently, while Python 3.9 works as expected. Hopefully this will help someone...

I went through all the solutions available, and I believe there is no better one. Selenium can't load a Chrome extension that uses the chrome.debugger API, and I have had no luck with hotkeys so far in my complex environment.

@FaMousNoob

@lorey, thanks for your fantastic work. Just one more thing:
is there a way to get only the response data from a specific URL?

@milanbog92

milanbog92 commented Dec 5, 2022

Hi,

I use performance_logs instead of logs_raw as the variable name, skip "chrome://favicon2" entries, and search for image_name:

performance_logs = driver.get_log("performance")
for performance_log in performance_logs:
    performance_log_json = json.loads(performance_log["message"])
    if performance_log_json["message"]["method"] == 'Network.responseReceived':
        # skip Chrome's internal favicon requests
        if performance_log_json["message"]["params"]["response"]["url"].find('chrome://favicon2/') != -1:
            continue
        # print only responses whose URL contains image_name
        if performance_log_json["message"]["params"]["response"]["url"].find(image_name) != -1:
            print(performance_log_json["message"]["params"]["response"]["url"])
            print(performance_log_json["message"]["params"]["requestId"])
            print(performance_log_json["message"]["params"]["type"])
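
To get only the response data for a specific URL, the same filtering can be wrapped in a function that returns the matching request IDs, which can then be passed to `Network.getResponseBody`. This is a sketch under assumptions, not the gist author's method; the helper name `request_ids_for_url` and the sample entries are hypothetical:

```python
import json


def request_ids_for_url(logs_raw, url_part):
    """Return requestIds of received responses whose URL contains url_part."""
    request_ids = []
    for entry in logs_raw:
        message = json.loads(entry["message"])["message"]
        if message["method"] != "Network.responseReceived":
            continue
        if url_part in message["params"]["response"]["url"]:
            request_ids.append(message["params"]["requestId"])
    return request_ids


def _entry(request_id, url):
    """Build a hand-written log entry so the sketch runs without a browser."""
    event = {
        "method": "Network.responseReceived",
        "params": {"requestId": request_id, "response": {"url": url}},
    }
    return {"message": json.dumps({"message": event})}


sample_logs = [
    _entry("1000.1", "https://example.com/api/items"),
    _entry("1000.2", "chrome://favicon2/"),
]
print(request_ids_for_url(sample_logs, "/api/items"))
```

With a real driver, each returned ID would go into `driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})`.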

@LiamKrenn

Desired Capabilities is deprecated and can't be used anymore, how can I achieve this without it?

@Newtoniano

@danbailo

danbailo commented Oct 7, 2023

Hello guys, for Selenium 4.x use this:

driver.options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver.get(url)

Then just follow the steps from line 24 onward.

Works for Selenium 4.13.0.

@tinyhare

For Selenium 4.15, set the option:

options = webdriver.ChromeOptions()
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
driver = webdriver.Chrome(options=options)

@nathan-fiscaletti

nathan-fiscaletti commented Jan 25, 2024

Something I noticed is that you need to filter out Preflight requests.

if event['params']['type'] != 'Preflight':
    . . .

Otherwise, you might get this error:

{"code":-32000,"message":"No resource with given identifier found"}
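
Combined with the gist's JSON check, the Preflight filter could be sketched as a single predicate. The name `is_fetchable_json_response` and the sample events are hypothetical; the field names follow the `Network.responseReceived` event as used in the gist:

```python
def is_fetchable_json_response(event):
    """True for JSON responses that are not CORS preflight requests."""
    if event.get("method") != "Network.responseReceived":
        return False
    params = event["params"]
    if params.get("type") == "Preflight":
        # Preflight (OPTIONS) responses have no body to fetch.
        return False
    return "json" in params["response"].get("mimeType", "")


# Hand-written sample events so the sketch runs without a browser.
events = [
    {
        "method": "Network.responseReceived",
        "params": {"type": "XHR", "response": {"mimeType": "application/json"}},
    },
    {
        "method": "Network.responseReceived",
        "params": {"type": "Preflight", "response": {"mimeType": "application/json"}},
    },
]
print([is_fetchable_json_response(e) for e in events])
```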
