@irfani
Created June 24, 2011 16:10
Scrapyd with Selenium Spider
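```python
import lxml.html
from scrapy.spider import BaseSpider  # Scrapy < 1.0; newer releases use scrapy.Spider
from selenium import selenium         # Selenium RC client API (removed in Selenium 3)


class SeleniumSpider(BaseSpider):
    name = "selenium"
    allowed_domains = ['selenium.com']
    # Scrapy fetches this URL only so that parse() gets called; the actual
    # browsing is done by Selenium against the base URL given in __init__.
    start_urls = ["http://localhost"]

    def __init__(self, **kwargs):
        super(SeleniumSpider, self).__init__(**kwargs)
        print(kwargs)  # spider arguments passed by scrapyd or `scrapy crawl -a`
        # Connect to a Selenium RC server on localhost:4444 that drives Firefox;
        # relative paths passed to open() resolve against http://selenium.com/.
        self.sel = selenium("localhost", 4444, "*firefox", "http://selenium.com/")
        self.sel.start()

    def parse(self, response):
        sel = self.sel
        sel.open("/index.aspx")
        sel.click("id=radioButton1")
        sel.select("genderOpt", "value=male")
        sel.type("nameTxt", "irfani")
        sel.click("link=Submit")
        # Wait (up to 30 s) for the submitted page instead of a fixed 1 s sleep.
        sel.wait_for_page_to_load("30000")
        root = lxml.html.fromstring(sel.get_html_source())
        # ... extract data from `root` (e.g. with root.xpath(...)) and yield items

    def closed(self, reason):
        # Shut down the browser session; Scrapy calls closed() when the spider
        # finishes (in releases that provide this shortcut).
        self.sel.stop()
```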
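The calls in `parse()` identify page elements with Selenium RC locator strings: `id=radioButton1` and `link=Submit` name their strategy explicitly, while a bare `nameTxt` falls back to the default identifier strategy (match `@id`, else the first `@name`). The second argument to `select()` (`value=male`) uses a separate option-locator syntax whose default is `label=`. The sketch below only illustrates how such strings split into a strategy and a value; `split_locator` and `split_option_locator` are hypothetical helpers, not part of the Selenium client, and the prefix lists cover the common strategies rather than claiming to be exhaustive.

```python
# Hypothetical helpers: approximate how Selenium RC interprets locator strings.
ELEMENT_STRATEGIES = ("identifier", "id", "name", "dom", "xpath", "link", "css", "ui")
OPTION_STRATEGIES = ("label", "value", "id", "index")


def split_locator(locator):
    """Return (strategy, value) for an element locator such as 'id=radioButton1'."""
    prefix, sep, rest = locator.partition("=")
    if sep and prefix in ELEMENT_STRATEGIES:
        return prefix, rest
    # No explicit strategy: guess from the shape of the string.
    if locator.startswith("//"):
        return "xpath", locator
    if locator.startswith("document."):
        return "dom", locator
    return "identifier", locator  # match @id, else fall back to @name


def split_option_locator(option):
    """Return (strategy, value) for a select() option such as 'value=male'."""
    prefix, sep, rest = option.partition("=")
    if sep and prefix in OPTION_STRATEGIES:
        return prefix, rest
    return "label", option  # bare strings match the visible option text


print(split_locator("id=radioButton1"))    # ('id', 'radioButton1')
print(split_locator("nameTxt"))            # ('identifier', 'nameTxt')
print(split_option_locator("value=male"))  # ('value', 'male')
```

Checking the explicit prefix first means an XPath such as `//input[@name='x']`, which also contains `=`, is still recognised by its leading `//`.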
@lesolorzanov

Quick question: is the allowed domain the same one you passed when creating the sel object?
Why is the start URL localhost? Does it mean localhost on selenium.com or the localhost of your own machine (for some strange reason)?
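One way to reason about this: `allowed_domains` only constrains requests that pass through Scrapy, and Scrapy's default `start_requests()` builds each start URL with `dont_filter=True`, so `http://localhost` (the machine running the crawl) is fetched even though it is not in the list. Something must answer there with a successful response, otherwise `parse()` is never called. The Firefox instance driven by Selenium RC talks to `http://selenium.com/` directly and never goes through Scrapy's middleware. The sketch below approximates the offsite rule (a host is allowed if it equals an allowed domain or is a subdomain of one); `is_offsite` is a hypothetical helper, not Scrapy's API.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains, dont_filter=False):
    """Approximate Scrapy's offsite check for one request URL (hypothetical helper)."""
    if dont_filter or not allowed_domains:
        # Requests marked dont_filter (e.g. default start requests) and spiders
        # without allowed_domains are never filtered.
        return False
    host = urlparse(url).hostname or ""  # hostname drops the port, lowercases
    # Allowed if the host is an allowed domain or a subdomain of one.
    return not any(host == d or host.endswith("." + d) for d in allowed_domains)


allowed = ["selenium.com"]
print(is_offsite("http://localhost", allowed))                    # True
print(is_offsite("http://localhost", allowed, dont_filter=True))  # False
print(is_offsite("http://www.selenium.com/index.aspx", allowed))  # False
```

Matching on `"." + d` rather than a plain suffix keeps a host like `notselenium.com` from slipping through as if it were a subdomain.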