Skip to content
All gists
Back to GitHub
Sign in
Sign up
Sign in
Sign up
{{ message }}
Instantly share code, notes, and snippets.
dimitryzub
/
python_scrape_google_scholar_co_authors_results.py
Last active
May 20, 2021 08:25
Star
0
Fork
0
Star
Code
Revisions
2
Embed
What would you like to do?
Embed
Embed this gist in your website.
Share
Copy sharable link for this gist.
Clone via HTTPS
Clone with Git or checkout with SVN using the repository’s web address.
Learn more about clone URLs
Download ZIP
Scrape Google Scholar Co-Authors Results with Python
Raw
python_scrape_google_scholar_co_authors_results.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Show hidden characters
from bs4 import BeautifulSoup
import requests, lxml, os
headers = {
'User-agent':
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
proxies = {
'http': os.getenv('HTTP_PROXY')
}
html = requests.get('https://scholar.google.com/citations?hl=en&user=m8dFEawAAAAJ', headers=headers, proxies=proxies).text
soup = BeautifulSoup(html, 'lxml')
for container in soup.select('.gsc_rsb_aa'):
author_name = container.select_one('#gsc_rsb_co a').text
author_affiliations = container.select_one('.gsc_rsb_a_ext').text
author_link = container.select_one('#gsc_rsb_co a')['href']
print(f'{author_name}\n{author_affiliations}\nhttps://scholar.google.com{author_link}\n')
# Part of the output:
'''
Christoph Benzmüller
Professor, FU Berlin
https://scholar.google.com/citations?user=zD0vtfwAAAAJ&hl=en
Pascal Fontaine
LORIA, INRIA, Université de Lorraine, Nancy, France
https://scholar.google.com/citations?user=gHe6EF8AAAAJ&hl=en
Stephan Merz
Senior Researcher, INRIA
https://scholar.google.com/citations?user=jaO3Z3wAAAAJ&hl=en
'''
Sign up for free
to join this conversation on GitHub
. Already have an account?
Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window.
Reload
to refresh your session.
You signed out in another tab or window.
Reload
to refresh your session.