Using Selenium and Python to screenshot a javascript-heavy page

As websites become more JavaScript-heavy, it gets harder to automate things like screenshotting for archival purposes. I've seen examples and suggestions to use PhantomJS for visual testing and archiving of websites, but have run into issues such as webfonts not rendering. I'd never tried Selenium until today, and while I'm not thinking about performance implications yet, Selenium seems far more accurate than PhantomJS, which makes sense since it actually opens a real browser. It's also not too hard to script complex interactions: here's an example of how to log in to Twitter, write a tweet, upload an image, and send the tweet via Selenium and DOM element selection. Obviously, you shouldn't be automating Twitter via a browser when the API and tweepy are so much better for that, though it is fun to watch the browser go through the steps without you touching a thing.

The example snippet below, which is not particularly well coded, opens up YouTube's homepage and clunkily scrolls to the bottom, triggering the AJAX events that load video previews below the browser fold. It then "clicks" the Load more button, scrolls to the bottom, then scrolls back up before taking a screenshot of the entire page:

(note: I realize my arithmetic is crap. oh well)

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
driver = webdriver.Firefox()
driver.get("https://www.youtube.com")


# scroll down in steps: 1/4, 1/3, 1/2, then the full page height
for isec in (4, 3, 2, 1):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight / %s);" % isec)
    sleep(1)

# load more
sleep(2)
print("push Load more...")
# find_element_by_css_selector was removed in Selenium 4; use find_element with By
driver.find_element(By.CSS_SELECTOR, 'button.load-more-button').click()

print("wait a bit...")
sleep(2)

print("Jump to the bottom, work our way back up")
for isec in (1, 2, 3, 4, 5):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight / %s);" % isec)
    sleep(1)

driver.execute_script("window.scrollTo(0, 0)")
print("Pausin a bit...")
sleep(2)
print("Scrollin to the top so that the nav bar isn't funny looking")
driver.execute_script("window.scrollTo(0, 0);")


sleep(1)
print("Screenshotting...")
# screenshot (note: Selenium writes PNG data here, whatever the file extension says)
driver.save_screenshot("/tmp/youtube.com.jpg")
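
The fixed sleep() calls in the snippet above are guesses at how long the AJAX loads take. As a rough alternative, here is a minimal sketch that keeps jumping to the bottom until the page height stops growing. The helper name scroll_until_stable and its pause and max_rounds parameters are my own inventions for illustration, not part of Selenium; the only driver method it relies on is execute_script.

```python
from time import sleep


def scroll_until_stable(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom until document.body.scrollHeight stops growing.

    Hypothetical helper: `driver` is any object with a Selenium-style
    execute_script() method. Returns the last height that was measured.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)  # give lazily loaded content a moment to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so assume we reached the end
        last_height = new_height
    return last_height
```

You would call it as scroll_until_stable(driver) right after driver.get(...). The max_rounds cap is there because an infinite-scroll page would otherwise never settle.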

Result

image

Bloomberg what is code

Firefox crashes when trying to screenshot a page as big as Bloomberg's What is Code? Installing chromedriver to run Chrome mitigates part of the issue. However, Chrome only captures the viewport:

(partial code in progress)

from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
driver.implicitly_wait(5)  # implicit wait: element lookups retry for up to 5 seconds
driver.get("http://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/")
driver.save_screenshot("/tmp/bloomberg-what-is-code.com.png")

# # http://stackoverflow.com/questions/30648765/screen-capture-error-what-does-it-mean
# # brew install chromedriver



# # scroll some more
# for n in range(30):
#     inc = round((n + 1) / 30, 2)
#     driver.execute_script("window.scrollTo(0, document.body.scrollHeight * %s);" % inc)
#     sleep(0.2)

# # work our way up
# for n in range(5):
#     inc = round((5 - (n + 1)) / 5, 2)
#     driver.execute_script("window.scrollTo(0, document.body.scrollHeight * %s);" % inc)
#     sleep(0.2)
# sleep(1)
# print("Screenshotting...")
# # screenshot
# driver.save_screenshot("/tmp/bloomberg-what-is-code.com.png")
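
One possible workaround for the viewport-only capture is to make the viewport as tall as the page before screenshotting. The sketch below is untested against the Bloomberg page and rests on an assumption: that the browser will actually honor a very tall window, which in my understanding generally requires running Chrome headless, since a visible window is limited by the screen size. The helper name resize_to_full_page and the max_height cap are hypothetical; execute_script and set_window_size are the Selenium driver methods it relies on.

```python
def resize_to_full_page(driver, max_height=20000):
    """Resize the browser window to the page's full scroll size.

    Hypothetical helper: `driver` is any object with Selenium-style
    execute_script() and set_window_size() methods.
    Returns the (width, height) that was requested.
    """
    width = driver.execute_script("return document.documentElement.scrollWidth")
    height = driver.execute_script("return document.documentElement.scrollHeight")
    # Cap the height so an enormous page does not request an absurd window.
    height = min(height, max_height)
    driver.set_window_size(width, height)
    return width, height


# Usage sketch (assumes a headless Chrome driver was created elsewhere):
#   resize_to_full_page(driver)
#   driver.save_screenshot("/tmp/bloomberg-what-is-code.com.png")
```

If the page is taller than max_height, the screenshot is still truncated, so the cap is a trade-off between completeness and not crashing the browser.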

tomleo commented Mar 24, 2016

Do you have a method for dealing with "sticky" templates?

fmartingr commented Apr 1, 2016

@tomleo Maybe you can inject some fixed css to them? (or hide them)
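
A rough sketch of the "hide them" idea, assuming the sticky parts are elements whose computed position is fixed or sticky. The names HIDE_STICKY_JS and hide_sticky_elements are made up for illustration; the only Selenium call used is execute_script, and the rest is plain DOM JavaScript.

```python
# JavaScript that hides every element whose computed position is fixed or
# sticky, and returns how many elements it hid.
HIDE_STICKY_JS = """
var hidden = 0;
var nodes = document.querySelectorAll('body *');
for (var i = 0; i < nodes.length; i++) {
    var pos = window.getComputedStyle(nodes[i]).position;
    if (pos === 'fixed' || pos === 'sticky') {
        nodes[i].style.setProperty('visibility', 'hidden', 'important');
        hidden++;
    }
}
return hidden;
"""


def hide_sticky_elements(driver):
    """Run HIDE_STICKY_JS in the page and return the number of hidden elements.

    Hypothetical helper: `driver` is any object with a Selenium-style
    execute_script() method.
    """
    return driver.execute_script(HIDE_STICKY_JS)
```

Calling hide_sticky_elements(driver) just before driver.save_screenshot(...) should keep headers and banners from repeating or overlapping content, though it will also hide fixed elements you might have wanted in the image.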

Viach commented Jul 18, 2017

when you save screenshot:

driver.save_screenshot("/tmp/youtube.com.jpg")

is output format really JPG or PNG instead?
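
One way to check without trusting the file extension is to look at the first bytes of the saved file: PNG and JPEG files each start with a well-known signature. This is only a sketch, and the function name sniff_image_format is made up for illustration.

```python
def sniff_image_format(path):
    """Guess the image format of a file from its leading magic bytes.

    Returns "png", "jpeg", or "unknown".
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(b"\x89PNG\r\n\x1a\n"):  # 8-byte PNG signature
        return "png"
    if header.startswith(b"\xff\xd8\xff"):  # JPEG start-of-image marker
        return "jpeg"
    return "unknown"
```

Running sniff_image_format("/tmp/youtube.com.jpg") on the saved screenshot shows which format the bytes really are.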