@khunreus
Last active March 25, 2019 04:43
"""
python 3.6
Scrapy + Selenium
"""
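from scrapy.selector import Selector

# Parse the page Selenium has rendered with Scrapy's selector.
scrapy_selector = Selector(text=self.driver.page_source)
homes_selector = scrapy_selector.xpath('//*[@itemtype="http://schema.org/ListItem"]')
self.logger.info("There's a total of " + str(len(homes_selector)) + ' links.')

# The XPath is absolute, so it matches every URL on the page; extract them once.
urls = scrapy_selector.xpath('//*[@itemprop="url"]/@content').extract()

profile_urls_distinct = []
# URLs are paired with list items by position: at most one URL per list item.
for url in urls[:len(homes_selector)]:
    # Skip "plus" listings; rewrite the guest parameters on the rest.
    if '/rooms/plus/' not in url:
        profile_url = 'https://' + url.replace(
            'adults=0&children=0&infants=0&guests=0',
            'adults=1&guests=1&toddlers=0')
        profile_urls_distinct.append(profile_url)

# Fewer URLs than list items means the loop ran out of URLs early.
if len(urls) < len(homes_selector):
    self.logger.info('Reached last iteration #' + str(len(urls)))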