# see rkengler.com for related blog post
# https://www.rkengler.com/how-to-capture-network-traffic-when-scraping-with-selenium-and-python/
import json
import pprint

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# copy so the shared DesiredCapabilities.CHROME dict is not mutated
capabilities = DesiredCapabilities.CHROME.copy()
# capabilities["loggingPrefs"] = {"performance": "ALL"}  # chromedriver < ~75
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}  # chromedriver 75+

driver = webdriver.Chrome(
    r"chromedriver.exe",
    desired_capabilities=capabilities,
)


def process_browser_logs_for_network_events(logs):
    """
    Return only logs whose method contains "Network.response", "Network.request", or "Network.webSocket"
    since we're interested in the network events specifically.
    """
    for entry in logs:
        # each entry's "message" is a JSON string wrapping the DevTools event
        log = json.loads(entry["message"])["message"]
        if (
            "Network.response" in log["method"]
            or "Network.request" in log["method"]
            or "Network.webSocket" in log["method"]
        ):
            yield log


driver.get("https://www.rkengler.com")

logs = driver.get_log("performance")

events = process_browser_logs_for_network_events(logs)
with open("log_entries.txt", "wt") as out:
    for event in events:
        pprint.pprint(event, stream=out)
So you want the script extended for capturing HAR files because you are exporting your web traffic?
good!!!
How can I make it capture only a specific request URL?
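One possible approach is to filter the events after they have been collected, rather than changing what Chrome logs. The sketch below assumes the events come from process_browser_logs_for_network_events in the gist; the name filter_events_by_url and the "/api/" fragment in the usage line are hypothetical. It relies on request events carrying the URL under params.request.url and response events under params.response.url, so events with no URL at all (for example Network.loadingFinished) are skipped.

```python
def filter_events_by_url(events, url_fragment):
    """Yield network events whose URL contains url_fragment (hypothetical helper)."""
    for event in events:
        params = event.get("params", {})
        # Request events keep the URL under params.request, response events
        # under params.response, and webSocketCreated directly under params.
        url = (
            params.get("request", {}).get("url")
            or params.get("response", {}).get("url")
            or params.get("url")
            or ""
        )
        if url_fragment in url:
            yield event
```

Usage would look like: matching = list(filter_events_by_url(events, "/api/")). Events that carry only a requestId would need to be matched through the requestId of an event that did pass the filter.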
Use this to get the bearer token:
logs = driver.get_log("performance")
for entry in logs:
    if "Bearer" in str(entry["message"]):
        token = (entry["message"].split()[3]).split('"')[0]
        print(token)
        break
Maybe my answer is a bit late, but I ran into a problem getting the token this way when using headless Chromium.
The log messages are in JSON format, so we can use a solution like this:
logs = browser.get_log("performance")
for entry in logs:
    if "Bearer" in str(entry["message"]):
        json_message_data = json.loads(str(entry["message"]))
        authorization_json = json_message_data['message']['params']['request']['headers']['Authorization']
        print(authorization_json)
        break
The result will be: Bearer xxxxxxx
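The lookup above raises a KeyError when the first entry containing "Bearer" is not a request event, or when the header name is spelled "authorization". A slightly more defensive sketch follows; the function name find_authorization_header is hypothetical, and it assumes the header shows up either in a Network.requestWillBeSent event (under params.request.headers) or in a Network.requestWillBeSentExtraInfo event (under params.headers).

```python
import json


def find_authorization_header(logs):
    """Return the first Authorization header value in the performance log, or None."""
    for entry in logs:
        message = json.loads(entry["message"])["message"]
        method = message.get("method")
        params = message.get("params", {})
        if method == "Network.requestWillBeSent":
            headers = params.get("request", {}).get("headers", {})
        elif method == "Network.requestWillBeSentExtraInfo":
            headers = params.get("headers", {})
        else:
            continue  # not a request event, so no request headers to inspect
        for name, value in headers.items():
            # header names are case-insensitive, so compare in lowercase
            if name.lower() == "authorization":
                return value
    return None
```

It takes the raw list returned by get_log("performance"), for example: print(find_authorization_header(browser.get_log("performance"))).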
Where is the response? I can't find it for any of the requests.
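The performance log holds response metadata (status, headers, URL) but not the body. One way to get bodies, sketched below, is to ask Chrome for them through the DevTools command Network.getResponseBody, using the requestId from each Network.responseReceived event. The helper name get_response_bodies is hypothetical, and this assumes a Chrome driver (which exposes execute_cdp_cmd) that is still on the page that made the requests; Chrome may already have discarded some bodies, so failures are skipped.

```python
def get_response_bodies(driver, events):
    """Map requestId -> {"body": ..., "base64Encoded": ...} for received responses."""
    bodies = {}
    for event in events:
        if event.get("method") != "Network.responseReceived":
            continue
        request_id = event["params"]["requestId"]
        try:
            result = driver.execute_cdp_cmd(
                "Network.getResponseBody", {"requestId": request_id}
            )
        except Exception:
            # Selenium raises WebDriverException when Chrome no longer has the
            # body; caught broadly here so the sketch needs no Selenium import.
            continue
        bodies[request_id] = result
    return bodies
```

Because process_browser_logs_for_network_events is a generator, wrap it in list(...) first if the events are also needed elsewhere. When base64Encoded is true, the body has to be decoded with base64.b64decode before use.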
If anyone needs this for Kotlin
val options = ChromeOptions()
options.setCapability(ChromeOptions.LOGGING_PREFS, mapOf("performance" to "ALL"))
val driver = ChromeDriver(options)
/*
Do some stuff here .... get post ..
*/
val logs: Logs = driver.manage().logs()
val performance = logs.get("performance")
This code won't work on the latest Selenium. Try selenium==4.9.1.
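For newer Selenium 4 releases, where webdriver.Chrome no longer accepts desired_capabilities or a positional driver path, the same logging preference can be set on a ChromeOptions object instead of pinning an old version. This is a minimal sketch, assuming chromedriver can be located automatically (on PATH or through Selenium Manager); the names PERFORMANCE_LOGGING and make_logging_driver are hypothetical.

```python
PERFORMANCE_LOGGING = {"performance": "ALL"}


def make_logging_driver():
    """Start Chrome with performance logging enabled via options."""
    # Imported inside the function so this module loads even where
    # Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # same capability key the gist uses for chromedriver 75+
    options.set_capability("goog:loggingPrefs", PERFORMANCE_LOGGING)
    return webdriver.Chrome(options=options)
```

The rest of the gist should then work unchanged: driver = make_logging_driver(), followed by driver.get(...) and driver.get_log("performance").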
Hi, this tool is cool and I would like to see it extended to capture HAR files if possible.
Thanks in advance,
Naveen
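As a starting point for the HAR request, the events the gist already collects can be paired by requestId into HAR-style entries. This is only a sketch of the idea, not a conforming HAR writer: timings, cookies, bodies and the other fields the format expects are left out, and a redirected request (which reuses its requestId) overwrites the earlier entry. The names build_har_like_log and _to_name_value_list are hypothetical.

```python
def _to_name_value_list(headers):
    """Convert a {name: value} dict to HAR-style [{"name": ..., "value": ...}]."""
    return [{"name": name, "value": value} for name, value in headers.items()]


def build_har_like_log(events):
    """Pair request and response events by requestId into HAR-style entries."""
    entries = {}
    for event in events:
        method = event.get("method")
        params = event.get("params", {})
        request_id = params.get("requestId")
        if method == "Network.requestWillBeSent":
            request = params["request"]
            entries[request_id] = {
                "request": {
                    "method": request["method"],
                    "url": request["url"],
                    "headers": _to_name_value_list(request.get("headers", {})),
                },
                "response": None,  # filled in if a response arrives
            }
        elif method == "Network.responseReceived" and request_id in entries:
            response = params["response"]
            entries[request_id]["response"] = {
                "status": response["status"],
                "statusText": response.get("statusText", ""),
                "headers": _to_name_value_list(response.get("headers", {})),
            }
    return {"log": {"entries": list(entries.values())}}
```

The result can be written out with json.dump(build_har_like_log(events), out, indent=2); tools that expect a real HAR file will likely need the missing fields added first.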