Scrape Google Scholar Co-Authors Results with Python
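from bs4 import BeautifulSoup
import os
import requests
import lxml  # not called directly; imported so a missing parser fails early

# Send a browser-like User-Agent; the default one used by requests may be blocked.
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Optional proxies read from the environment. The target URL is HTTPS, so an
# 'https' entry is needed (assumes HTTPS_PROXY is set alongside HTTP_PROXY).
proxies = {
    'http': os.getenv('HTTP_PROXY'),
    'https': os.getenv('HTTPS_PROXY'),
}

html = requests.get('https://scholar.google.com/citations?hl=en&user=m8dFEawAAAAJ', headers=headers, proxies=proxies).text
soup = BeautifulSoup(html, 'lxml')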
for container in soup.select('.gsc_rsb_aa'):
    author_name = container.select_one('#gsc_rsb_co a').text
    author_affiliations = container.select_one('.gsc_rsb_a_ext').text
    author_link = container.select_one('#gsc_rsb_co a')['href']
    print(f'{author_name}\n{author_affiliations}\nhttps://scholar.google.com{author_link}\n')
# Part of the output:
'''
Christoph Benzmüller
Professor, FU Berlin
https://scholar.google.com/citations?user=zD0vtfwAAAAJ&hl=en

Pascal Fontaine
LORIA, INRIA, Université de Lorraine, Nancy, France
https://scholar.google.com/citations?user=gHe6EF8AAAAJ&hl=en

Stephan Merz
Senior Researcher, INRIA
https://scholar.google.com/citations?user=jaO3Z3wAAAAJ&hl=en
'''
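
The script above builds each profile URL by concatenating `https://scholar.google.com` with the relative `href`. As a possible alternative, the sketch below uses only the standard library to join the URL and pull the `user` ID out of the query string. The helper names `build_profile_url` and `extract_user_id` are hypothetical and not part of the original gist.

```python
from urllib.parse import parse_qs, urljoin, urlparse

# Base URL taken from the links printed by the script above.
BASE_URL = 'https://scholar.google.com'


def build_profile_url(href):
    """Turn a relative co-author href into an absolute profile URL (hypothetical helper)."""
    return urljoin(BASE_URL, href)


def extract_user_id(url):
    """Return the value of the 'user' query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get('user')
    return values[0] if values else None


# Example with an href of the same shape as those in the output above.
profile_url = build_profile_url('/citations?user=zD0vtfwAAAAJ&hl=en')
print(profile_url)                   # https://scholar.google.com/citations?user=zD0vtfwAAAAJ&hl=en
print(extract_user_id(profile_url))  # zD0vtfwAAAAJ
```

The extracted ID could then be substituted into the `user=` parameter of the request URL to fetch a co-author's own profile, assuming that page follows the same layout.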