@diyclassics
Created June 4, 2018 19:39
Extract DC.Creator (Dublin Core creator) meta elements from a page by URL
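```python
import requests
from lxml import html

url = 'http://dlib.nyu.edu/awdl/isaw/isaw-papers/13/'
page = requests.get(url, timeout=30)  # fetch the page
page.raise_for_status()  # stop early on 4xx/5xx responses
html_content = html.fromstring(page.content)  # parse the bytes into an element tree

# Select every <meta name="DC.creator"> element (XPath matching is case-sensitive)
creator_elements = html_content.xpath('//meta[@name="DC.creator"]')

# Collect the value of each element's content attribute
creators = []
for creator in creator_elements:
    creators.append(creator.attrib['content'])
```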
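The description says DC.Creator while the XPath matches DC.creator, and XPath string comparison is case-sensitive, so a page that capitalizes the name differently would yield an empty list. The sketch below is one possible variation, not the gist's method: a hypothetical helper, extract_dc_creators, that takes HTML you already have (string or bytes), matches the name attribute case-insensitively via XPath 1.0's translate(), and skips meta tags with no content attribute. The function name and the sample markup are illustrative assumptions.

```python
from lxml import html


# Hypothetical helper: operates on HTML already in hand (str or bytes),
# so it can be exercised without a network request.
def extract_dc_creators(html_source):
    """Return content values of <meta name="DC.creator"> tags, matching the name case-insensitively."""
    tree = html.fromstring(html_source)
    # XPath 1.0 has no lower-case(), so translate() lowercases the name attribute
    xpath = ('//meta[translate(@name, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", '
             '"abcdefghijklmnopqrstuvwxyz") = "dc.creator"]')
    creators = []
    for meta in tree.xpath(xpath):
        content = meta.get('content')  # None when the attribute is missing
        if content:
            creators.append(content.strip())
    return creators


# Illustrative markup only; not taken from the ISAW page
sample = """<html><head>
<meta name="DC.creator" content="Author One">
<meta name="DC.Creator" content=" Author Two ">
<meta name="DC.title" content="Not a creator">
<meta name="DC.creator">
</head><body></body></html>"""

print(extract_dc_creators(sample))  # ['Author One', 'Author Two']
```

To combine it with the fetch above, pass the downloaded bytes: extract_dc_creators(page.content).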