@shazeline
Created May 26, 2014 07:39
Herbert Simon 153
Terrence Sejnowski 123
Hector Garcia-Molina 120
John R Anderson 118
Wil Van Der Aalst 108
Nick Jennings 101
Alon Halevy 88
Amit Sheth 85
Moshe Y. Vardi 85
Sushil Jajodia 82
Tim Finin 81
Alice Eagly 78
Katia Sycara 77
Judea Pearl 75
Mark Musen 74
Michael Wooldridge 74
Steffen Staab 74
Allen Newell 73
John Mylopoulos 73
Robert M Gray 71
Daniel Weld 71
Oren Etzioni 68
Krishna Saraswat 68
Qiang Yang 68
Pedro Domingos 68
Anupam Joshi 66
Ravi Sandhu 66
Manuela M. Veloso 66
Raymond Mooney 66
Tuomas Sandholm 66
Trevor Darrell 66
K Anders Ericsson 66
Stuart K. Card 65
Tom Mitchell 64
Georg Gottlob 63
Stefano Ceri 63
Noam Nisan 61
Russ Altman 61
Salvatore Stolfo 60
Munindar P Singh 59
Peter Stone 59
Edmund Kieran Burke 59
Frank Van Harmelen 59
Craig A. Knoblock 59
Beatrice De Gelder 59
Paschal Sheeran 59
Milind Tambe 58
Yi Chen 58
Pierangela Samarati 58
Gio Wiederhold 57
Michelene T.H. Chi 57
William W. Cohen 56
Edward Marcotte 56
Craig Boutilier 55
Deborah L. Mcguinness 55
Wolfgang Nejdl 55
Stefan Decker 55
Thomas P. Moran 55
Benjamin Kuipers 54
Yin Zhang (章寅) 54
William Hersh 54
Jeffry A. Simpson 54
Toby Walsh 53
Peter F. Patel-Schneider 53
Nigel Shadbolt 53
Kurt Vanlehn 52
Sarit Kraus 52
Fabio Casati 52
Inderjit S. Dhillon 52
Richard Snodgrass 52
Robin Miles Hogarth 51
Bruce Buchanan 51
Riccardo Poli 51
Joydeep Ghosh 50
H. Van Dyke Parunak 49
Salim Roukos, Salim Roucos 49
Wendy Wood 48
Fausto Giunchiglia 47
Lynne Reder 47
Henry Lieberman 47
Shamkant B Navathe 47
Rada Mihalcea 47
Diane Litman 47
Martha Stone Palmer 47
M Dahlin 47
Michael Luck 46
Stephen Roberts 46
Jonathan Schaeffer 46
Peter Gärdenfors 46
Edward Y. Chang 張智威 46
Franco Zambonelli 46
Bruce Donald 46
Li Ding 46
Robert F. Murphy 45
David Klahr 45
Simon Parsons 45
John Laird 45
Frank Dignum 45
Carles Sierra 45
Stacy Marsella 45
Jude Shavlik 45
John Mccarthy 45
Yi Sun 45
Frank Wolter 45
Aaron Sloman 45
Dragomir Radev 45
Erhard Rahm 45
Patrick Hayes 44
Jean Côté 44
Ram Duvvuru Sriram 44
Sven Koenig 44
Kewen Wang 44
Samson Tu 44
Weixiong Zhang 44
Foster Provost 44
Jeffrey S. Rosenschein 43
David De Roure 43
Pat Hayes 43
David C. Parkes 43
Janyce Wiebe 43
Sharon Oviatt 43
Krzysztof R. Apt 43
Alessandro Cimatti 42
Alice F. Healy 42
Wiebe Van Der Hoek 42
Barbara Pernici 42
Lise Getoor 42
Jonathan Gratch 42
Ellen Riloff 41
Yolanda Gil 41
Paolo Giorgini 41
David Traum 41
William F Brewer 41
Carsten Lutz 41
Marco Pistore 41
Walter Daelemans 40
Luc Moreau 40
Steven N. Minton 40
Paul Rosenbloom 40
Bonnie E. John 40
Heiner Stuckenschmidt 40
Claire Cardie 40
Catherine Pelachaud 40
Nils Nilsson 40
Vincent Conitzer 40
Makoto Yokoo 40
Marianne Winslett 40
Jérôme Euzenat 40
Cyrus Shahabi 40
Subbarao Kambhampati 40
import operator
from collections import deque

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0',
}


def get_id_from_url(url):
    """Return the value of the `user` query parameter in a profile URL."""
    start = url.find('user=') + len('user=')
    end = url.find('&', start)
    if end == -1:
        # `user` is the last parameter, so the id runs to the end of the URL.
        end = len(url)
    return url[start:end]


def crawl(user_id):
    """Fetch one Scholar profile; return (name, h-index, linked profile ids)."""
    url = 'http://scholar.google.com/citations?user=%s&hl=en' % user_id
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Selectors match the profile markup this script was written against.
    name = soup.find('span', {'id': 'cit-name-display'}).text
    metrics = soup.find_all('td', {'class': 'cit-borderleft cit-data'})
    h_index = metrics[2].text
    # Anchors without an href return None from get(), so default to ''.
    links = [get_id_from_url(link.get('href'))
             for link in soup.find_all('a')
             if '/citations?user' in link.get('href', '')]
    return name, int(h_index), links


visited = set()
start_id = '9d7rMrkAAAAJ'
q = deque([start_id])
data = {}

# Breadth-first crawl; stop once 150 authors with an h-index >= 40 are found.
while q and len(data) < 150:
    node = q.popleft()
    if node in visited:
        continue
    visited.add(node)
    name, h_index, neighbors = crawl(node)
    if h_index >= 40:
        data[name] = h_index
    q.extend(neighbors)
    print(name, h_index, len(data))

print('-----')
# Rank the collected authors by h-index, highest first.
ranked = sorted(data.items(), key=operator.itemgetter(1), reverse=True)
for name, score in ranked:
    print(name.lower().title(), score)