
@neilernst
Created November 30, 2015 17:39
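Retrieves a list of authors, with how many papers each has, by screen-scraping the author tags on DBLP's per-year proceedings pages. It is currently set up for SCAM; change base_url to point at your conference. Runs best with Python 3, because many author names contain non-ASCII (Unicode) characters.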
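from lxml import html
import requests
import operator

# Maps author name -> number of papers across all scraped editions.
authors_dict = {}
base_url = 'http://dblp.uni-trier.de/db/conf/scam/scam'

# One DBLP table-of-contents page per edition, e.g. .../scam2015.html
for year in range(2000, 2016):  # 2000-2015; range end is exclusive
    page = requests.get(base_url + str(year) + '.html')
    if page.status_code != 200:
        continue  # skip years without a proceedings page (e.g. HTTP 404)
    tree = html.fromstring(page.content)
    # Every author of every paper is marked up with schema.org microdata.
    authors = tree.xpath('//span[@itemprop="author"]/a/span[@itemprop="name"]/text()')
    for author in authors:
        authors_dict[author] = authors_dict.get(author, 0) + 1

# Sort ascending by paper count and print one "name,count" line per author.
sorted_x = sorted(authors_dict.items(), key=operator.itemgetter(1))
for name, count in sorted_x:
    print(name + ',' + str(count))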
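The script above needs lxml and live DBLP pages, which makes it hard to test. What follows is a minimal, offline-testable sketch of the same counting idea, assuming only the Python 3 standard library. `html.parser.HTMLParser` is swapped in for lxml and approximates the XPath by tracking which elements are currently open. `collections.Counter` replaces the manual dictionary, and `csv.writer` quotes any name that happens to contain a comma. The class and function names, and the sample HTML fragments, are hypothetical. The fragments only imitate the microdata pattern the XPath expects, not DBLP's real page layout.

```python
from collections import Counter
from html.parser import HTMLParser
import csv
import io

# Elements that never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr'}


class AuthorNameParser(HTMLParser):
    """Hypothetical stdlib stand-in for the XPath
    //span[@itemprop="author"]/a/span[@itemprop="name"]/text()."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, itemprop) for each currently open element
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get('itemprop')))

    def handle_endtag(self, tag):
        # Close the innermost matching element; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text counts only inside span[author] > a > span[name].
        tail = self.stack[-3:]
        if (len(tail) == 3 and tail[0] == ('span', 'author')
                and tail[1][0] == 'a' and tail[2] == ('span', 'name')):
            self.names.append(data)


def count_authors(pages):
    """Count author occurrences across an iterable of HTML strings."""
    counts = Counter()
    for page in pages:
        parser = AuthorNameParser()
        parser.feed(page)
        parser.close()
        counts.update(parser.names)
    return counts


def to_csv(counts):
    """Render 'name,count' rows, most prolific first, ties broken by name."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    for name, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([name, n])
    return out.getvalue()


# Hypothetical sample pages; the title span also has itemprop="name"
# but is not inside an author span, so it is not counted.
PAGE_1 = ('<li><span class="title" itemprop="name">A Paper</span>'
          '<span itemprop="author"><a href="#a1" itemprop="url">'
          '<span itemprop="name">Jürgen Müller</span></a></span>, '
          '<span itemprop="author"><a href="#a2">'
          '<span itemprop="name">Ada Lovelace</span></a></span></li>')
PAGE_2 = ('<li><span itemprop="author"><a href="#a1">'
          '<span itemprop="name">Jürgen Müller</span></a></span></li>')

if __name__ == '__main__':
    print(to_csv(count_authors([PAGE_1, PAGE_2])), end='')
```

Running it prints `Jürgen Müller,2` followed by `Ada Lovelace,1`. Unlike the original script, this sketch sorts most-prolific-first. Swap the key back to ascending count if you prefer the original order.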