@siwells
Created May 17, 2013 16:34
A little script to retrieve and print my total number of citations, h-index, and i10-index from Google Scholar. It is written for Python 2 (it uses urllib2 and the print statement), and its only prerequisite is BeautifulSoup 3, which does the heavy lifting of HTML parsing.
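For reference, the two indices the script reports can be computed from per-paper citation counts. The h-index is the largest h such that h papers each have at least h citations, and the i10-index is the number of papers with at least 10 citations. The sketch below is Python 3; the function names and the sample counts are made up for illustration and are not part of the script, which simply reads the values Google Scholar has already computed.

```python
def h_index(citation_counts):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts are sorted, so no later paper can qualify
    return h


def i10_index(citation_counts):
    """Number of papers with at least 10 citations."""
    return sum(1 for cites in citation_counts if cites >= 10)


# Hypothetical per-paper citation counts, for illustration only
papers = [25, 12, 10, 8, 5, 3, 0]
# Same output order as the script: total citations, h-index, i10-index
print(sum(papers), h_index(papers), i10_index(papers))  # prints: 63 5 3
```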
import urllib2
from BeautifulSoup import BeautifulSoup
base_url = "http://scholar.google.co.uk/citations?"
lang = "en"            # interface language (hl parameter)
user = "NJ4EZFwAAAAJ"  # Google Scholar profile ID (user parameter)
# Build the profile URL: .../citations?hl=en&user=NJ4EZFwAAAAJ
url = base_url + "hl=" + lang + "&user=" + user
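# Fetch the profile page and parse it
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)
# The citation statistics live in the table with id="stats"
table = soup.find(id="stats")
if table is None:
    raise SystemExit("Could not find the stats table; the page layout may have changed.")
citations = 0
hindex = 0
i10 = 0
# Skip the header row; the next three rows hold citations, h-index and
# i10-index, with the value in the second cell of each row
for idx, row in enumerate(table.findAll('tr')[1:]):
    cols = row.findAll('td')
    if idx == 0:
        citations = "".join(cols[1].contents)
    elif idx == 1:
        hindex = "".join(cols[1].contents)
    elif idx == 2:
        i10 = "".join(cols[1].contents)
print citations, hindex, i10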
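Python 2 has reached end of life and BeautifulSoup 3 is no longer maintained, and Google Scholar's markup may have changed since this was written. As a hedged sketch only, the same two steps can be done in Python 3 with just the standard library: urllib.parse.urlencode builds the query string, and a small html.parser.HTMLParser subclass stands in for BeautifulSoup. The StatsTableParser class, the parse_stats function and the sample HTML are hypothetical; the sample merely mimics the structure the script assumes (a table with id="stats", a header row, then citation, h-index and i10-index rows with the value in the second cell), and no network request is made.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Same query the script builds; urlencode inserts the "&" separators and escapes values
base_url = "http://scholar.google.co.uk/citations?"
url = base_url + urlencode({"hl": "en", "user": "NJ4EZFwAAAAJ"})


class StatsTableParser(HTMLParser):
    """Collects the text of every <td> cell, row by row, inside <table id="stats">."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.rows = []  # one list of cell strings per <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == "stats":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag == "td" and self.rows:
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.in_table:
            self.in_table = False
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_table and self.in_cell:
            self.rows[-1][-1] += data


def parse_stats(html):
    """Return (citations, h-index, i10-index) as strings, mirroring the script's loop."""
    parser = StatsTableParser()
    parser.feed(html)
    # Skip the header row, then take the second cell of the next three rows
    citations, hindex, i10 = [row[1].strip() for row in parser.rows[1:4]]
    return citations, hindex, i10


# Made-up sample page fragment, for illustration only
sample = """
<table id="stats">
  <tr><th>Metric</th><th>All</th></tr>
  <tr><td>Citations</td><td>63</td></tr>
  <tr><td>h-index</td><td>5</td></tr>
  <tr><td>i10-index</td><td>3</td></tr>
</table>
"""

print(url)                   # prints: http://scholar.google.co.uk/citations?hl=en&user=NJ4EZFwAAAAJ
print(*parse_stats(sample))  # prints: 63 5 3
```

To use it against the live page you would fetch url with urllib.request.urlopen and pass the decoded body to parse_stats, assuming the page still contains a table with id="stats".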