Created
May 26, 2014 07:39
Gist: shazeline/d9881d06be31a59a93d3
Herbert Simon 153
Terrence Sejnowski 123
Hector Garcia-Molina 120
John R Anderson 118
Wil Van Der Aalst 108
Nick Jennings 101
Alon Halevy 88
Amit Sheth 85
Moshe Y. Vardi 85
Sushil Jajodia 82
Tim Finin 81
Alice Eagly 78
Katia Sycara 77
Judea Pearl 75
Mark Musen 74
Michael Wooldridge 74
Steffen Staab 74
Allen Newell 73
John Mylopoulos 73
Robert M Gray 71
Daniel Weld 71
Oren Etzioni 68
Krishna Saraswat 68
Qiang Yang 68
Pedro Domingos 68
Anupam Joshi 66
Ravi Sandhu 66
Manuela M. Veloso 66
Raymond Mooney 66
Tuomas Sandholm 66
Trevor Darrell 66
K Anders Ericsson 66
Stuart K. Card 65
Tom Mitchell 64
Georg Gottlob 63
Stefano Ceri 63
Noam Nisan 61
Russ Altman 61
Salvatore Stolfo 60
Munindar P Singh 59
Peter Stone 59
Edmund Kieran Burke 59
Frank Van Harmelen 59
Craig A. Knoblock 59
Beatrice De Gelder 59
Paschal Sheeran 59
Milind Tambe 58
Yi Chen 58
Pierangela Samarati 58
Gio Wiederhold 57
Michelene T.H. Chi 57
William W. Cohen 56
Edward Marcotte 56
Craig Boutilier 55
Deborah L. Mcguinness 55
Wolfgang Nejdl 55
Stefan Decker 55
Thomas P. Moran 55
Benjamin Kuipers 54
Yin Zhang (章寅) 54
William Hersh 54
Jeffry A. Simpson 54
Toby Walsh 53
Peter F. Patel-Schneider 53
Nigel Shadbolt 53
Kurt Vanlehn 52
Sarit Kraus 52
Fabio Casati 52
Inderjit S. Dhillon 52
Richard Snodgrass 52
Robin Miles Hogarth 51
Bruce Buchanan 51
Riccardo Poli 51
Joydeep Ghosh 50
H. Van Dyke Parunak 49
Salim Roukos, Salim Roucos 49
Wendy Wood 48
Fausto Giunchiglia 47
Lynne Reder 47
Henry Lieberman 47
Shamkant B Navathe 47
Rada Mihalcea 47
Diane Litman 47
Martha Stone Palmer 47
M Dahlin 47
Michael Luck 46
Stephen Roberts 46
Jonathan Schaeffer 46
Peter Gärdenfors 46
Edward Y. Chang 張智威 46
Franco Zambonelli 46
Bruce Donald 46
Li Ding 46
Robert F. Murphy 45
David Klahr 45
Simon Parsons 45
John Laird 45
Frank Dignum 45
Carles Sierra 45
Stacy Marsella 45
Jude Shavlik 45
John Mccarthy 45
Yi Sun 45
Frank Wolter 45
Aaron Sloman 45
Dragomir Radev 45
Erhard Rahm 45
Patrick Hayes 44
Jean Côté 44
Ram Duvvuru Sriram 44
Sven Koenig 44
Kewen Wang 44
Samson Tu 44
Weixiong Zhang 44
Foster Provost 44
Jeffrey S. Rosenschein 43
David De Roure 43
Pat Hayes 43
David C. Parkes 43
Janyce Wiebe 43
Sharon Oviatt 43
Krzysztof R. Apt 43
Alessandro Cimatti 42
Alice F. Healy 42
Wiebe Van Der Hoek 42
Barbara Pernici 42
Lise Getoor 42
Jonathan Gratch 42
Ellen Riloff 41
Yolanda Gil 41
Paolo Giorgini 41
David Traum 41
William F Brewer 41
Carsten Lutz 41
Marco Pistore 41
Walter Daelemans 40
Luc Moreau 40
Steven N. Minton 40
Paul Rosenbloom 40
Bonnie E. John 40
Heiner Stuckenschmidt 40
Claire Cardie 40
Catherine Pelachaud 40
Nils Nilsson 40
Vincent Conitzer 40
Makoto Yokoo 40
Marianne Winslett 40
Jérôme Euzenat 40
Cyrus Shahabi 40
Subbarao Kambhampati 40
import operator
from collections import deque

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0',
}


def crawl(user_id):
    """Fetch one Google Scholar profile; return (name, h-index, linked profile ids)."""
    url = 'http://scholar.google.com/citations?user=%s&hl=en' % user_id
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    # These element ids and classes reflect the page markup when the script was written.
    name = soup.find('span', {'id': 'cit-name-display'}).text
    metrics = soup.find_all('td', {'class': 'cit-borderleft cit-data'})
    h_index = metrics[2].text
    # href=True skips anchors that have no href attribute.
    links = [get_id_from_url(link['href'])
             for link in soup.find_all('a', href=True)
             if '/citations?user' in link['href']]
    return name, int(h_index), links


def get_id_from_url(url):
    """Extract the value of the `user` parameter from a profile link."""
    start = url.find('user=') + len('user=')
    end = url.find('&hl=en')
    if end == -1:
        # No trailing '&hl=en': the id runs to the end of the URL.
        end = len(url)
    return url[start:end]


visited = set()
start_id = '9d7rMrkAAAAJ'
q = deque([start_id])
data = {}

# Breadth-first crawl until 150 profiles with an h-index of at least 40 are collected.
while q and len(data) < 150:
    node = q.popleft()
    if node in visited:
        continue
    visited.add(node)
    name, h_index, neighbors = crawl(node)
    if h_index >= 40:
        data[name] = h_index
    q.extend(neighbors)
    print(name, h_index, len(data))

print('-----')
# Print the collected names sorted by h-index, highest first.
sort = sorted(data.items(), key=operator.itemgetter(1), reverse=True)
for name, score in sort:
    print(name.lower().title(), score)
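
The `get_id_from_url` helper above slices the id out of a link by locating `user=` and `&hl=en`, so it relies on the parameters appearing in that order. A minimal sketch of an order-independent alternative follows, assuming Python 3 and the standard `urllib.parse` module. The name `get_id_from_query` is hypothetical and not part of the original script.

```python
from urllib.parse import urlparse, parse_qs


def get_id_from_query(url):
    """Return the value of the `user` query parameter, or None if it is absent.

    Hypothetical helper for illustration; not part of the original script.
    """
    # parse_qs maps each parameter name to a list of its values.
    query = parse_qs(urlparse(url).query)
    values = query.get('user')
    return values[0] if values else None


# The parameter order in the link no longer matters.
print(get_id_from_query('/citations?user=9d7rMrkAAAAJ&hl=en'))
print(get_id_from_query('/citations?hl=en&user=9d7rMrkAAAAJ'))
```

Returning `None` for links without a `user` parameter lets the caller filter them out instead of receiving a mis-sliced string.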