@abkosar
Created May 28, 2016 18:59
Collecting info
# Fragment of a spider callback. Requires: from scrapy.http import TextResponse
for href in urls:
    print(href)
    url = href.extract()
    # Load the page in Selenium so JavaScript-rendered content is present.
    self.driver.get(url)
    response = TextResponse(url=self.driver.current_url, body=self.driver.page_source, encoding='utf-8')
    # One item per job page.
    item = IndeedItem()
    for sel in response.xpath('//div[@class="col-md-5 col-lg-6"]'):
        # Relative paths keep each query inside the matched header div.
        item['job_title'] = sel.xpath('./h1/text()').extract()
        item['location'] = sel.xpath('./ul/li[2]/text()').extract()
        item['company_name'] = sel.xpath('./ul/li[1]/a/text()').extract()
    for sel_1 in response.xpath('//*[@id="bd"]/div/div[1]'):
        # A leading "." scopes the search to sel_1 instead of the whole document.
        item['job_type'] = sel_1.xpath('.//div[2]/div/div[2]/span/text()').extract()
        item['job_salary'] = sel_1.xpath('.//div[3]/div/div[2]/span/text()').extract()
    yield item
# Close the browser only after every URL has been visited.
self.driver.close()
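
The point of scoping each query to its own container can be shown with a small sketch. It uses the standard library's xml.etree.ElementTree in place of Scrapy selectors, and the markup, the function name extract_jobs, and the sample values are all hypothetical, made up only for illustration; the real page layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with two job header blocks (not taken from the real site).
PAGE = """
<html><body>
  <div class="col-md-5 col-lg-6"><h1>Data Analyst</h1>
    <ul><li><a>Acme</a></li><li>New York, NY</li></ul></div>
  <div class="col-md-5 col-lg-6"><h1>Data Engineer</h1>
    <ul><li><a>Globex</a></li><li>Boston, MA</li></ul></div>
</body></html>
"""


def extract_jobs(markup):
    """Return one dict per header block, using paths relative to that block."""
    root = ET.fromstring(markup)
    jobs = []
    for block in root.findall(".//div[@class='col-md-5 col-lg-6']"):
        jobs.append({
            # "./" starts at the current block, so each lookup sees only
            # the h1 and list items that belong to this block.
            "job_title": block.findtext("./h1"),
            "company_name": block.findtext("./ul/li[1]/a"),
            "location": block.findtext("./ul/li[2]"),
        })
    return jobs


jobs = extract_jobs(PAGE)
for job in jobs:
    print(job)
```

With Scrapy selectors, a path that starts with // inside a loop searches the whole document on every iteration, so every block would report the same values; starting the path with a dot avoids that.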