
@barseghyanartur
Last active January 23, 2019 15:11
XPath recipes

Tutorials

Code examples

The Scrapy way of iterating over the matched elements:

import requests

from scrapy.selector import Selector

url = 'https://f4e.europa.eu/careers/vacancies/Default.aspx'
response = requests.get(url)

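# Keep the matches as selectors (no .extract()), so that each item can
# be queried further; .extract() would turn them into plain strings.
items = Selector(text=response.text).xpath(
    "//div[contains(@class, 'careersPurple')]"
    "//div[contains(@class, 'careerList')]"
    "//div[contains(@class, 'careersItemPanel')]"
)
for item in items:
    # The leading "." makes the query relative to the current item; a bare
    # "//" would search the whole document and return the same title each time.
    title = item.xpath(
        ".//div[contains(@class, 'careersTitle')]//text()"
    ).extract_first(default='').strip()
    print(title)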
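Starting an item-level query with "//" searches the whole document instead of the current item. A minimal offline sketch of the difference, assuming lxml is installed: it uses lxml directly (Scrapy's selectors evaluate XPath through lxml, so the semantics carry over) and a hypothetical HTML snippet that only mimics the class names above, not the real page.

```python
from lxml import html

# Hypothetical markup mimicking the class names above; the real page differs.
SAMPLE = """
<div class="careersPurple">
  <div class="careerList">
    <div class="careersItemPanel"><div class="careersTitle"> Engineer </div></div>
    <div class="careersItemPanel"><div class="careersTitle"> Analyst </div></div>
  </div>
</div>
"""

tree = html.fromstring(SAMPLE)
panels = tree.xpath("//div[contains(@class, 'careersItemPanel')]")

# "//" restarts at the document root, so every panel yields the first title.
absolute = [
    p.xpath("//div[contains(@class, 'careersTitle')]//text()")[0].strip()
    for p in panels
]

# ".//" searches only below the current panel.
relative = [
    p.xpath(".//div[contains(@class, 'careersTitle')]//text()")[0].strip()
    for p in panels
]

print(absolute)  # ['Engineer', 'Engineer']
print(relative)  # ['Engineer', 'Analyst']
```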

You can also do it with lxml:

import requests

# soupparser parses the page with BeautifulSoup, so bs4 must be installed.
from lxml.html.soupparser import fromstring

url = 'https://f4e.europa.eu/careers/vacancies/Default.aspx'
response = requests.get(url)

tree = fromstring(response.content)

items = tree.xpath(
    "//div[contains(@class, 'careersPurple')]"
    "//div[contains(@class, 'careerList')]"
    "//div[contains(@class, 'careersItemPanel')]"
)

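for item in items:
    # ".//" (equivalent to "self::*//") searches only below the current item.
    texts = item.xpath(".//div[contains(@class, 'careersTitle')]//text()")
    # Guard against items without a title instead of indexing blindly.
    title = texts[0].strip() if texts else ''
    print(title)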
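Indexing an XPath result with [0] raises IndexError when an item has no title. A minimal sketch of a safer pattern, run against hypothetical sample markup in which the second panel deliberately lacks a title; first_text is an illustrative helper name, not part of lxml. It also checks that the prefixes "self::*//" and ".//" select the same nodes.

```python
from lxml import html

# Hypothetical markup; the second panel deliberately has no title.
SAMPLE = """
<div class="careerList">
  <div class="careersItemPanel"><div class="careersTitle"> Engineer </div></div>
  <div class="careersItemPanel"><span>No title here</span></div>
</div>
"""

TITLE_STEP = "div[contains(@class, 'careersTitle')]//text()"


def first_text(element, query, default=""):
    """Hypothetical helper: first stripped XPath result, or `default`."""
    results = element.xpath(query)
    return results[0].strip() if results else default


tree = html.fromstring(SAMPLE)
panels = tree.xpath("//div[contains(@class, 'careersItemPanel')]")

# Both prefixes restrict the search to descendants of the current panel.
same_results = all(
    panel.xpath("self::*//" + TITLE_STEP) == panel.xpath(".//" + TITLE_STEP)
    for panel in panels
)

# The panel without a title yields "" instead of raising IndexError.
titles = [first_text(panel, ".//" + TITLE_STEP) for panel in panels]

print(same_results)  # True
print(titles)        # ['Engineer', '']
```

The default-value helper keeps the loop going on incomplete items; pass a different default (for example None) if missing titles should be filtered out later.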