@dannguyen
Last active February 15, 2023 15:59
Using Selenium and Python to screenshot a JavaScript-heavy page

As websites become more JavaScript-heavy, it's harder to automate things like screenshotting for archival purposes. I've seen examples and suggestions to use PhantomJS for visual testing/archiving of websites, but have run into issues such as the non-rendering of webfonts. I'd never tried out Selenium until today, and while I'm not thinking about performance implications yet, Selenium seems far more accurate than PhantomJS, which makes sense since it actually opens a real browser.

It's also not too hard to script complex interactions: here's an example of how to log in to Twitter, write a tweet, upload an image, and send the tweet via Selenium and DOM element selection. Obviously, you shouldn't be automating Twitter via a browser when the API and tweepy are so much better for that, though it is fun to watch the browser go through the steps without you touching a thing.

The example snippet below, which is not particularly well coded, opens up YouTube's homepage and clunkily scrolls to the bottom, triggering the AJAX events that load video previews below the browser fold. It then "clicks" the Load more button, scrolls to the bottom, then scrolls back up before taking a screenshot of the entire page:

(note: I realize my arithmetic is crap. oh well)

from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.youtube.com")


# scroll down in steps (1/4, 1/3, 1/2, then the full page height)
# so that the lazily loaded previews below the fold get triggered
for isec in (4, 3, 2, 1):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight / %s);" % isec)
    sleep(1)

# load more
sleep(2)
print("push Load more...")
# find_element_by_css_selector() was removed in Selenium 4; use find_element(By.CSS_SELECTOR, ...)
# this selector matched YouTube's markup when this was written and may need updating
driver.find_element(By.CSS_SELECTOR, 'button.load-more-button').click()

print("wait a bit...")
sleep(2)

print("Jump to the bottom, work our way back up")
for isec in (1, 2, 3, 4, 5):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight / %s);" % isec)
    sleep(1)

driver.execute_script("window.scrollTo(0, 0)")
print("Pausin a bit...")
sleep(2)
print("Scrollin to the top so that the nav bar isn't funny looking")
driver.execute_script("window.scrollTo(0, 0);")


sleep(1)
print("Screenshotting...")
# save_screenshot() always writes PNG data, so give the file a .png name
driver.save_screenshot("/tmp/youtube.com.png")

Result

image
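
About that arithmetic: dividing the page height by 4, 3, 2, 1 gives uneven jumps (25%, 33%, 50%, 100% of the page). Here is a minimal sketch of evenly spaced steps instead. The helper names `scroll_fractions`, `scroll_script`, and `scroll_down` are made up for this example and are not part of Selenium; `driver` is assumed to be any object with an `execute_script` method, such as the Firefox driver above.

```python
from time import sleep


def scroll_fractions(steps):
    """Return evenly spaced fractions of the page height, ending at 1.0."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [(i + 1) / steps for i in range(steps)]


def scroll_script(fraction):
    """Build the JavaScript that scrolls to the given fraction of the page height."""
    return "window.scrollTo(0, document.body.scrollHeight * %s);" % fraction


def scroll_down(driver, steps=4, pause=1.0):
    """Scroll from the top to the bottom in equal steps, pausing after each one.

    `driver` is assumed to expose execute_script(), as a Selenium WebDriver does.
    """
    for fraction in scroll_fractions(steps):
        driver.execute_script(scroll_script(fraction))
        sleep(pause)
```

With `steps=4` the fractions are 0.25, 0.5, 0.75 and 1.0, so `scroll_down(driver)` could stand in for the first `for` loop of the script. The one-second pause is still a guess at how long the AJAX loads take.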

Bloomberg: What is Code?

Firefox crashes when trying to screenshot a page as big as Bloomberg's "What is Code?". Installing chromedriver to run Chrome mitigates part of the issue; however, Chrome only captures the viewport:

(partial code in progress)

from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
driver.implicitly_wait(5) # element lookups retry for up to 5 seconds; this does not wait for the page to finish rendering
driver.get("http://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/")
driver.save_screenshot("/tmp/bloomberg-what-is-code.com.png")

# # http://stackoverflow.com/questions/30648765/screen-capture-error-what-does-it-mean
# # brew install chromedriver



# # scroll some more
# for n in range(30):
#     inc = round((n + 1) / 30, 2)
#     driver.execute_script("window.scrollTo(0, document.body.scrollHeight * %s);" % inc)
#     sleep(0.2)

# # work our way up
# for n in range(5):
#     inc = round((5 - (n + 1)) / 5, 2)
#     driver.execute_script("window.scrollTo(0, document.body.scrollHeight * %s);" % inc)
#     sleep(0.2)
# sleep(1)
# print("Screenshotting...")
# # screenshot
# driver.save_screenshot("/tmp/bloomberg-what-is-code.com.png")
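
Since Chrome only captures the viewport, one possible workaround (not what the code above does) is to scroll through the page in viewport-sized steps, call `save_screenshot` at each step, and stitch the pieces together afterwards. Below is a rough sketch of just the offset arithmetic. The helper name `viewport_offsets` is hypothetical, and it assumes you have already read the two heights from the page, for example `document.body.scrollHeight` and `window.innerHeight` via `execute_script`.

```python
def viewport_offsets(page_height, viewport_height):
    """Return the vertical scroll offsets (in pixels) needed to cover the page.

    Each offset is the top edge of one viewport-sized capture. The last
    offset is pulled back so the final capture ends exactly at the bottom
    of the page instead of running past it.
    """
    if page_height <= 0 or viewport_height <= 0:
        raise ValueError("heights must be positive")
    offsets = list(range(0, page_height, viewport_height))
    # Largest offset the browser can actually scroll to.
    last_possible = max(page_height - viewport_height, 0)
    if offsets[-1] > last_possible:
        offsets[-1] = last_possible
    return offsets
```

For a 2500px page and a 1000px viewport this gives offsets 0, 1000 and 1500, so the last capture overlaps the previous one by 500px and would need cropping before stitching. Fixed or "sticky" page elements would show up in every capture, so they would probably have to be hidden first.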

tomleo commented Mar 24, 2016

Do you have a method for dealing with "sticky" templates?

@fmartingr

@tomleo Maybe you can inject some fixed css to them? (or hide them)

Viach commented Jul 18, 2017

When you save the screenshot:

driver.save_screenshot("/tmp/youtube.com.jpg")

is the output format really JPG, or is it PNG instead?

QA-Rahul commented Feb 8, 2020

I used the same code with Firefox on YouTube, but it does not take a full-page screenshot. Any update on the code?
