@haveaguess
Created October 8, 2013 22:07
example
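# Imports for the Scrapy API in use when this gist was written (2013).
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

# BlogscrapeItem and getLinks are not defined in this gist; they are assumed
# to be provided by the surrounding project.


class DmozSpider(BaseSpider):
    # The original assigned name twice ("twitter.com", then "dmoz");
    # only the last assignment took effect, so only that one is kept.
    name = "dmoz"
    allowed_domains = ["codinginmysleep.com"]
    start_urls = [
        "http://codinginmysleep.com",
        # "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        # logging.warning(response.body)
        hxs = HtmlXPathSelector(response)
        # Select the href attribute of every anchor on the page.
        links = hxs.select("//a/@href")
        items = []
        for link in links:
            text = link.extract()
            getLinks(text)
            # Wrap each extracted URL in an item for the pipeline.
            item = BlogscrapeItem()
            item['link'] = text
            items.append(item)
        return items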