@ilovefreesw
Last active May 4, 2023 09:54
A Python-Selenium script that takes screenshots of web pages in bulk using headless Chrome, reading the URLs from a text file. Tutorial: https://www.ilovefreesoftware.com/26/tutorial/how-to-take-full-page-screenshot-in-bulk-from-multiple-urls.html
import os
import time

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
from tqdm import tqdm

Links_File = r''  # path to a text file with one URL per line
OP_DIR = r''      # directory where the screenshots are saved
i = 1             # counter used to name the output files

# Height of the rendered page in pixels, plus an optional offset X
S = lambda X: driver.execute_script('return document.body.scrollHeight') + X

# Read the URLs, dropping surrounding whitespace and blank lines
with open(Links_File, "r") as f:
    lines = [line.strip() for line in f if line.strip()]

options = webdriver.ChromeOptions()
# options.headless = True is deprecated in recent Selenium 4 releases
options.add_argument('--headless')
options.add_argument('--log-level=3')
driver = webdriver.Chrome(options=options)
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.4103.97 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))

for link in tqdm(lines, ncols=65):
    try:
        driver.get(link)
        time.sleep(5)  # fixed wait for the page to finish loading
        driver.set_window_size(1024, S(0))  # May need manual adjustment
        # os.path.join avoids the invalid '\{' escape of the original f-string
        driver.find_element(By.TAG_NAME, "body").screenshot(os.path.join(OP_DIR, f'{i}.png'))
        i = i + 1
    except WebDriverException:
        print(link)  # report the URL that failed and move on
        continue
driver.quit()
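The script reads its URLs straight from the text file, so duplicate or commented-out lines would each be passed to the browser. A minimal sketch of a stricter loader follows; `load_urls` is a hypothetical helper that is not part of the gist, and treating lines that start with `#` as comments is an assumption rather than something the script defines.

```python
from pathlib import Path


def load_urls(path):
    """Return the non-empty, de-duplicated URLs from a text file, in file order."""
    seen = set()
    urls = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        url = raw.strip()
        # Skip blank lines and lines commented out with '#' (assumed convention)
        if not url or url.startswith("#"):
            continue
        # Keep only the first occurrence of each URL
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

With this in place, `lines = load_urls(Links_File)` could stand in for the `with open(...)` block. Note that the screenshot counter then follows the order of the unique URLs only.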
fazio79 commented Jan 10, 2023

Thank you, following your suggestions worked fine!

You can deal with cookie banners and other popups in two ways.

  1. Inject JavaScript tailored to the website you are taking a screenshot of.
  2. Load Chrome with an extension installed that blocks cookie banners and other popups. Try https://adlock.com/ or https://crumbs.org/en/
    You will need the CRX file of one of these extensions, which you can get using this: https://chrome.google.com/webstore/detail/get-crx/dijpllakibenlejkbajahncialkbdkjc

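Now you can load the extension from its CRX file like this:

options.add_extension('pathToCRX')

Add this after the options.add_argument('--log-level=3') line, before webdriver.Chrome(options=options) is called, and replace pathToCRX with the path to the extension's CRX file.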
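The first option, injecting JavaScript, might look roughly like the sketch below. `build_removal_script` is a hypothetical helper and the CSS selectors are placeholders; the real selectors depend on the page being captured and have to be found by inspecting it.

```python
import json


def build_removal_script(selectors):
    """Build JavaScript that removes every element matching the given CSS selectors."""
    # json.dumps yields a valid JavaScript array literal with proper quoting
    return (
        "for (const sel of " + json.dumps(list(selectors)) + ") {"
        " document.querySelectorAll(sel).forEach(el => el.remove());"
        " }"
    )


# Placeholder selectors (assumptions); replace them with ones from the target site
script = build_removal_script(["#cookie-banner", ".newsletter-popup"])

# In the screenshot loop, after driver.get(link) and the wait:
# driver.execute_script(script)
```

Running the script just before `set_window_size` means the page height is measured after the popups are gone.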
