@rengler33
Last active May 1, 2024 05:50
How to Capture Network Traffic When Scraping with Selenium & Python
# see rkengler.com for related blog post
# https://www.rkengler.com/how-to-capture-network-traffic-when-scraping-with-selenium-and-python/
import json
import pprint

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# This setup targets Selenium releases before 4.10, which removed the desired_capabilities argument.
# Copy the dict so the shared DesiredCapabilities.CHROME class attribute is not mutated.
capabilities = DesiredCapabilities.CHROME.copy()
# capabilities["loggingPrefs"] = {"performance": "ALL"}  # chromedriver < ~75
capabilities["goog:loggingPrefs"] = {"performance": "ALL"}  # chromedriver 75+

driver = webdriver.Chrome(
    r"chromedriver.exe",
    desired_capabilities=capabilities,
)


def process_browser_logs_for_network_events(logs):
    """
    Yield only the log messages whose method begins with "Network.response",
    "Network.request", or "Network.webSocket", since we're interested in the
    network events specifically.
    """
    for entry in logs:
        log = json.loads(entry["message"])["message"]
        if (
            "Network.response" in log["method"]
            or "Network.request" in log["method"]
            or "Network.webSocket" in log["method"]
        ):
            yield log


driver.get("https://www.rkengler.com")
logs = driver.get_log("performance")
events = process_browser_logs_for_network_events(logs)

with open("log_entries.txt", "wt") as out:
    for event in events:
        pprint.pprint(event, stream=out)

driver.quit()
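Every Network.* event carries a `requestId`, so the scattered events written to log_entries.txt can be regrouped per request. Here is a minimal sketch, assuming the parsed events yielded by `process_browser_logs_for_network_events`. `summarize_requests` is a hypothetical helper. It reads only the Chrome DevTools Protocol fields `request.url` and `request.method` (from `Network.requestWillBeSent`) and `response.status`, `response.mimeType`, and `response.url` (from `Network.responseReceived`).

```python
def summarize_requests(events):
    """Group parsed Network.* events by requestId into one summary dict per request."""
    summary = {}
    for event in events:
        params = event.get("params", {})
        request_id = params.get("requestId")
        if request_id is None:
            continue  # skip events that are not tied to a request
        item = summary.setdefault(
            request_id, {"url": None, "method": None, "status": None, "mime_type": None}
        )
        if event.get("method") == "Network.requestWillBeSent":
            # A redirect re-sends this event with the same requestId; the last URL wins.
            item["url"] = params["request"]["url"]
            item["method"] = params["request"]["method"]
        elif event.get("method") == "Network.responseReceived":
            item["status"] = params["response"]["status"]
            item["mime_type"] = params["response"].get("mimeType")
            # Fall back to the response URL if the request event was not captured.
            item["url"] = item["url"] or params["response"]["url"]
    return summary
```

The file-writing loop consumes `events`, because it is a generator. Build a fresh one with `process_browser_logs_for_network_events(logs)`, or keep a `list(...)` of it, before you call `summarize_requests(events)`.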
@naveenmandepudi
Hi, this tool is cool, and I would like to see it extended to capture HAR files, if possible.
Thanks in advance,
Naveen

@Zalasyu
Zalasyu commented Sep 10, 2021

So you want the script extended for capturing HAR files because you are exporting your web traffic?
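Chrome's performance log does not produce a HAR file directly, but the same events can be assembled into one. Here is a minimal sketch, assuming the parsed events from `process_browser_logs_for_network_events` and the HAR 1.2 layout (`log.version`, `log.creator`, `log.entries`). `events_to_har` and `write_har` are hypothetical names. The performance log does not give this sketch timings, sizes, cookies, or query strings, so those fields are filled with placeholders (0, -1, or empty lists). Strict HAR validators or viewers may still object to the result.

```python
import json
from datetime import datetime, timezone


def _har_headers(headers):
    """Convert a DevTools header dict into HAR's list of {"name", "value"} pairs."""
    return [{"name": name, "value": str(value)} for name, value in (headers or {}).items()]


def events_to_har(events, creator_name="selenium-performance-log"):
    """Build a minimal HAR 1.2 dict from parsed Network.* DevTools events."""
    requests, responses, order = {}, {}, []
    for event in events:
        params = event.get("params", {})
        request_id = params.get("requestId")
        if event.get("method") == "Network.requestWillBeSent":
            if request_id not in requests:
                order.append(request_id)
            # A redirect re-sends this event with the same id; keep the last hop.
            requests[request_id] = params
        elif event.get("method") == "Network.responseReceived":
            responses[request_id] = params["response"]

    entries = []
    for request_id in order:
        sent, resp = requests[request_id], responses.get(request_id)
        if resp is None:
            continue  # no response captured (failed, cancelled, or still pending)
        request = sent["request"]
        started = datetime.fromtimestamp(sent.get("wallTime", 0), tz=timezone.utc)
        entries.append({
            "startedDateTime": started.isoformat(),
            "time": 0,  # placeholder: timings are not reconstructed in this sketch
            "request": {
                "method": request["method"],
                "url": request["url"],
                "httpVersion": resp.get("protocol", ""),
                "cookies": [],
                "headers": _har_headers(request.get("headers")),
                "queryString": [],
                "headersSize": -1,
                "bodySize": -1,
            },
            "response": {
                "status": resp["status"],
                "statusText": resp.get("statusText", ""),
                "httpVersion": resp.get("protocol", ""),
                "cookies": [],
                "headers": _har_headers(resp.get("headers")),
                "content": {"size": 0, "mimeType": resp.get("mimeType", "")},  # size is a placeholder
                "redirectURL": "",
                "headersSize": -1,
                "bodySize": -1,
            },
            "cache": {},
            "timings": {"send": 0, "wait": 0, "receive": 0},  # placeholders
        })

    return {"log": {"version": "1.2",
                    "creator": {"name": creator_name, "version": "0.1"},
                    "entries": entries}}


def write_har(events, path):
    """Serialize the HAR dict to `path`, e.g. "capture.har"."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(events_to_har(events), fh, indent=2)
```

The sketch pairs events by `requestId`, which is how the DevTools protocol links a request to its response. Requests that never got a response are dropped rather than written as incomplete entries.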

ghost commented Mar 17, 2022

good!!!

@danieln-12
How can I make it capture only a specific request URL?
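One way to do this is to filter the parsed events afterwards by a URL substring. Here is a sketch. `events_for_url` is a hypothetical helper. It assumes the DevTools layout where `Network.requestWillBeSent` has `params.request.url` and `Network.responseReceived` has `params.response.url`. Other events only carry the `requestId`, so the helper matches ids first and then keeps every event that shares them.

```python
def events_for_url(events, url_fragment):
    """Keep only the events belonging to requests whose URL contains url_fragment."""
    events = list(events)  # accept a generator; two passes are needed
    matching_ids = set()
    for event in events:
        params = event.get("params", {})
        # Only request/response events carry the URL...
        url = params.get("request", {}).get("url") or params.get("response", {}).get("url", "")
        request_id = params.get("requestId")
        if request_id is not None and url_fragment in url:
            matching_ids.add(request_id)
    # ...so a second pass picks up other events (e.g. *ExtraInfo) sharing those ids.
    return [e for e in events if e.get("params", {}).get("requestId") in matching_ids]
```

For example, `events_for_url(process_browser_logs_for_network_events(logs), "/api/")` keeps only the traffic for URLs containing "/api/".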

@AbinashNS
AbinashNS commented Dec 15, 2022

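Use this to get the bearer token:
logs = driver.get_log("performance")
for entry in logs:
    if "Bearer" in str(entry["message"]):
        # Assumes the token is the 4th whitespace-separated chunk of the raw message,
        # which depends on how that message happens to be laid out.
        token = (entry["message"].split()[3]).split('"')[0]
        print(token)
        break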

@damiantrx
damiantrx commented Apr 30, 2023

Maybe my answer is a bit late, but getting the token this way failed for me when using headless Chromium.
The log messages are JSON, so they can be parsed instead:

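import json

logs = browser.get_log("performance")  # `browser` is the WebDriver instance
for entry in logs:
    if "Bearer" in entry["message"]:
        params = json.loads(entry["message"])["message"].get("params", {})
        # Network.requestWillBeSent nests headers under "request"; other events may not
        headers = params.get("request", {}).get("headers", {})
        authorization = headers.get("Authorization")
        if authorization:
            print(authorization)
            break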

Result will be: Bearer xxxxxxx

@pichumeta
Where is the response? I can't find it for any of the requests.
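The performance log only records response metadata such as status, headers, and MIME type. The body is not in it. One option, sketched here under assumptions, is to ask Chrome for each body with the DevTools `Network.getResponseBody` command through Selenium's `driver.execute_cdp_cmd` (Chrome/Chromium drivers only). The command takes the `requestId` from a `Network.responseReceived` event. Bodies can usually only be fetched while the browser still holds them, for example before navigating away. Some responses, such as redirects, have no body at all, so failures are skipped. `get_response_bodies` is a hypothetical helper.

```python
import base64


def get_response_bodies(driver, events):
    """Map requestId -> response body bytes for each Network.responseReceived event.

    `driver` is a Chrome/Chromium WebDriver (anything with execute_cdp_cmd);
    `events` are parsed DevTools messages, e.g. from process_browser_logs_for_network_events.
    """
    bodies = {}
    for event in events:
        if event.get("method") != "Network.responseReceived":
            continue
        request_id = event["params"]["requestId"]
        try:
            result = driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id})
        except Exception:  # e.g. selenium's WebDriverException when no body is available
            continue
        body = result["body"]
        # Binary bodies (images, fonts, ...) come back base64-encoded.
        bodies[request_id] = base64.b64decode(body) if result.get("base64Encoded") else body.encode("utf-8")
    return bodies
```

The helper returns bytes in both cases so callers handle text and binary bodies the same way. Call `.decode("utf-8")` on a body you know is text, such as JSON.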

@ilyasKerbal
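If anyone needs this for Kotlin:

import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions
import org.openqa.selenium.logging.LogEntries
import org.openqa.selenium.logging.Logs

val options = ChromeOptions()
options.setCapability(ChromeOptions.LOGGING_PREFS, mapOf("performance" to "ALL"))

val driver = ChromeDriver(options)
/*
Do some stuff here: driver.get(...), POST a form, etc.
*/
val logs: Logs = driver.manage().logs()
val performance: LogEntries = logs.get("performance")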

@masummuhammad
This code won't work on the latest Selenium. Try selenium==4.9.1.
