Here are a few useful Python packages I've found so far that can help with various aspects of scholcomm research. I make no claim as to the quality of these packages.
- pyOpenSci pyApiToolkit: 'Python 3 scripts to access, create, distribute and publish open research data or data about open science works.' Includes DOAJ, oaDOI (Unpaywall), ORCID, Zotero and Wikidata API wrappers.
- BeautifulSoup: Scrape webpages (including journal webpages, where permitted by the journal's T&Cs). I've used this to scrape acknowledgements and conflict-of-interest data from clinical trials published in journals.
- crossrefapi: Access the Crossref API for data on journal articles, journals, funding info, and more.
- refextract: Enter a link to a journal article, get back a scraped, structured list of the references included in that article.
- idutils: 'Small library for validating and normalising persistent identifiers used in scholarly communication.' I haven't used this one yet, but it looks promising.
- pyAltmetric: Counts of altmetrics (tweets, news articles, policy cites, etc.) for 9 million+ research outputs (journal articles, books, etc.). Query the API using a number of persistent identifiers or by fixed time range (1d, 3d, 1m, etc.).
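
To give a feel for the kind of data the Crossref entry above refers to, here is a minimal sketch that talks to the public Crossref REST API using only the standard library. It does not use the crossrefapi package, so the function names (`build_work_url`, `summarise_work`, `fetch_work`) are hypothetical helpers made up for illustration, and the sample record is hand-written rather than real API output. It assumes the usual shape of a single-work response, where the record sits under `message` and `title` and `container-title` are lists.

```python
import json
import urllib.parse
import urllib.request

# Base URL for single-work lookups on the public Crossref REST API.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def build_work_url(doi):
    """Return the lookup URL for one DOI (hypothetical helper)."""
    # Percent-encode unusual characters but keep "/" as a path separator.
    return CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")


def summarise_work(payload):
    """Pull a few fields out of a decoded single-work response."""
    message = payload.get("message", {})
    titles = message.get("title") or []
    journals = message.get("container-title") or []
    funders = [f["name"] for f in message.get("funder", []) if f.get("name")]
    return {
        "doi": message.get("DOI"),
        "title": titles[0] if titles else None,
        "journal": journals[0] if journals else None,
        "funders": funders,
    }


def fetch_work(doi, mailto=None):
    """Fetch and summarise one work. Requires network access."""
    url = build_work_url(doi)
    if mailto:
        # Crossref asks callers to identify themselves with a contact email.
        url += "?" + urllib.parse.urlencode({"mailto": mailto})
    request = urllib.request.Request(
        url, headers={"User-Agent": "scholcomm-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarise_work(json.load(response))


# Hand-written sample in the assumed response shape (not real data).
SAMPLE = {
    "message": {
        "DOI": "10.1234/example.5678",
        "title": ["An Example Article"],
        "container-title": ["Journal of Examples"],
        "funder": [{"name": "Example Funder", "award": ["ABC-1"]}],
    }
}

summary = summarise_work(SAMPLE)
print(summary["title"], "|", summary["journal"], "|", summary["funders"])
```

With network access, `fetch_work("<some DOI>", mailto="you@example.org")` should return the same kind of dictionary for a real record. The parsing is kept separate from the HTTP call so it can be tried offline on saved responses.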