A Python-Selenium script to bulk take screenshots of webpage using headless Chrome by reading a text file full of URLs
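from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
from tqdm import tqdm
import os
import time

Links_File = r''  # path to a text file with one URL per line
OP_DIR = r''      # directory where the screenshots are saved
i = 1             # counter used to name the output files

# Return the full scroll width or height of the currently loaded page
S = lambda X: driver.execute_script('return document.body.parentNode.scroll' + X)

# Read the URLs and strip trailing whitespace and newlines
with open(Links_File, "r") as f:
    lines = [line.rstrip() for line in f]

options = webdriver.ChromeOptions()
options.add_argument('--headless')  # used instead of options.headless = True, which newer Selenium 4 releases deprecate
options.add_argument('--log-level=3')
driver = webdriver.Chrome(options=options)
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))

for link in tqdm(lines, ncols=65):
    try:
        driver.get(link)
        time.sleep(5)  # give the page time to finish loading
        driver.set_window_size(S('Width'), S('Height'))  # May need manual adjustment
        # find_element_by_tag_name was removed in Selenium 4; use find_element with By instead
        driver.find_element(By.TAG_NAME, 'body').screenshot(os.path.join(OP_DIR, f'{i}.png'))
        i = i + 1
    except WebDriverException:
        print(link)  # report the URL that failed and move on
        continue
driver.quit()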

okabak123 commented Mar 7, 2022

@ilovefreesw How can I change the width but leave the height auto? I get this error:
Traceback (most recent call last):
  File "c:\Users\ozgun\OneDrive\Desktop\code\bulk_webpage_screenshots\bulk_webpage_screenshots.py", line 27, in <module>
    driver.set_window_size(S('1440'),S('Height')) # May need manual adjustment
  File "C:\Users\ozgun\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1386, in set_window_size
    self.set_window_rect(width=int(width), height=int(height))
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

Author

ilovefreesw commented Mar 9, 2022

To my knowledge, it is not possible. You will have to set it manually. I usually work with 3500.
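One possible reading of the traceback above: S('1440') asks the browser for document.body.parentNode.scroll1440, which does not exist, so None reaches int(). The sketch below is a possible workaround, not something the author confirmed. It assumes the goal is a fixed width with a page-derived height, and passes the width as a plain integer instead of routing it through S. The names fit_window_to_page and fallback_height are hypothetical, and the 3500 default is only borrowed from the reply above.

```python
def fit_window_to_page(driver, width=1440, fallback_height=3500):
    """Fix the window width and size the height to the rendered page.

    `driver` is expected to be a Selenium WebDriver (or any object with the
    same get_window_size, set_window_size and execute_script methods).
    """
    # Apply the width first so the page reflows before the height is measured.
    current = driver.get_window_size()
    driver.set_window_size(width, current['height'])

    # Same expression that S('Height') evaluates in the script above.
    height = driver.execute_script('return document.body.parentNode.scrollHeight')
    if not height:
        # Assumption: fall back to a manual value when the page reports nothing.
        height = fallback_height

    driver.set_window_size(width, int(height))
    return width, int(height)
```

In the loop of the script, the call driver.set_window_size(S('Width'), S('Height')) could then be swapped for fit_window_to_page(driver, width=1440). Pages that lazy-load content may still need a manual height.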


thebotmakercom commented Apr 24, 2022

Hi, this works well, but it takes a screenshot of the login page.

How can I take a screenshot of a page that requires the user to be logged in?

Thank you for sharing and for the hard work.

Author

ilovefreesw commented May 19, 2022

This script is not meant for that. To log into a website, more lines of code need to be added, and what they look like depends on the type of website.
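As a rough illustration of what those extra lines might look like, here is a minimal sketch for a site with a plain HTML login form. Everything site-specific is an assumption: the function name log_in, the login URL, and the three CSS selectors are hypothetical placeholders that would have to be adapted to the real page. Sites that use captchas, two-factor prompts, or single sign-on will need a different approach.

```python
import time

# Locator strategy string; it has the same value as
# selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"

def log_in(driver, login_url, username, password,
           user_selector="input[name='username']",   # hypothetical selector
           pass_selector="input[name='password']",   # hypothetical selector
           submit_selector="button[type='submit']",  # hypothetical selector
           wait_seconds=5):
    """Fill in and submit a simple login form before taking screenshots."""
    driver.get(login_url)
    driver.find_element(CSS, user_selector).send_keys(username)
    driver.find_element(CSS, pass_selector).send_keys(password)
    driver.find_element(CSS, submit_selector).click()
    # Fixed pause, in the same spirit as the time.sleep(5) in the script,
    # so the post-login redirect can finish.
    time.sleep(wait_seconds)
```

Calling log_in(driver, ...) once, after the driver is created and before the loop over the links, should let the later driver.get(link) calls reuse the logged-in session, since the same browser instance keeps its cookies.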


Unscrew5772 commented Aug 24, 2022

Doesn't work for me. Is it due to Selenium removing find_element_by_tag_name?


FunFair-chiraagshah commented Sep 21, 2022

I used the following:
from selenium.webdriver.common.by import By  # import needed for By.TAG_NAME
driver.find_element(By.TAG_NAME, 'body')
