@Kasahs
Created July 2, 2015 06:26
Use Selenium with PhantomJS (with custom capabilities) for screen scraping
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from bs4 import BeautifulSoup
# edit desired capabilities
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
"(KHTML, like Gecko) Chrome/15.0.87"
)
# skip image downloads to speed up page loads
dcap['phantomjs.page.settings.loadImages'] = False
driver = webdriver.PhantomJS('/path/to/bin/phantomjs', desired_capabilities=dcap)
driver.get('http://scrape.this.url/please')
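# name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(driver.page_source, 'html.parser')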
# do stuff with your soup
# useful links
# https://coderwall.com/p/9jgaeq/set-phantomjs-user-agent-string
# http://stackoverflow.com/a/15699761 # phantomjs + selenium example
# http://stackoverflow.com/a/6300672 # link for using selenium with xvfb (virtual display)
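
The capability keys above are easy to mistype, and a misspelled key just adds an unused entry to the dict without raising an error. Below is a minimal sketch of a helper that builds the keys from a single prefix. The function name `phantomjs_capabilities` and its parameters are hypothetical and not part of Selenium. In real use, `base` would be `DesiredCapabilities.PHANTOMJS`, assuming a Selenium version that still ships `webdriver.PhantomJS`.

```python
# Hypothetical helper (not part of Selenium): builds PhantomJS page-setting
# capabilities from one shared prefix so the key names cannot drift apart.
PAGE_SETTINGS_PREFIX = "phantomjs.page.settings."


def phantomjs_capabilities(user_agent=None, load_images=True, base=None):
    """Return a new capabilities dict with PhantomJS page settings applied.

    `base` is copied, never modified in place.
    """
    caps = dict(base or {})
    if user_agent is not None:
        caps[PAGE_SETTINGS_PREFIX + "userAgent"] = user_agent
    caps[PAGE_SETTINGS_PREFIX + "loadImages"] = load_images
    return caps


if __name__ == "__main__":
    # A plain dict stands in for DesiredCapabilities.PHANTOMJS here.
    dcap = phantomjs_capabilities(
        user_agent="Mozilla/5.0 (X11; Linux x86_64)",
        load_images=False,
        base={"browserName": "phantomjs"},
    )
    for key in sorted(dcap):
        print(key, "=", dcap[key])
```

The returned dict can be passed as `desired_capabilities` in the same way as `dcap` in the script above.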